RNAZoo
A Nextflow pipeline model zoo for RNA deep learning.
What's included
16 models across 5 tracks. Every container has its model weights baked in at build time — no runtime downloads. Image sizes below are the compressed download size from GHCR; on disk they roughly double after extraction.
| Model | Track | Training set | Input | Output | GPU image | CPU image |
|---|---|---|---|---|---|---|
| RiboNN | Translation | 78 human cell-type TE | TSV (tx_id, UTR5, CDS, UTR3) | per-cell-type TE TSV | 2.8 GB | 1.2 GB |
| Riboformer | Translation | ribo-seq, 5 species | Dir (WIG + GFF + FASTA) | model_prediction.txt | 4.0 GB | 2.3 GB |
| RiboTIE | Translation | human ribo-seq (8 SRRs) | Dir (FASTA + GTF + BAMs + YAML) | per-sample GTF / CSV / NPY | 3.9 GB | 1.3 GB |
| seq2ribo | Translation | 4 human cell-line ribo-seq + sTASEP sim | FASTA mRNA | seq2ribo_output.json | 10.3 GB | — (GPU only) |
| TranslationAI | Translation | 47K human RefSeq mRNAs | FASTA mRNA | *_predTIS / *_predTTS / *_predORFs.txt | 1.9 GB | 0.6 GB |
| Saluki | Translation | 66 mRNA-decay datasets (human + mouse) | FASTA (UTR lowercase, CDS UPPERCASE) | preds.npy | 4.2 GB | 1.4 GB |
| CodonTransformer | Translation | 1M genes across 164 organisms | FASTA protein | optimized DNA FASTA | 3.7 GB | 1.2 GB |
| RNA-FM | Foundation | 23M ncRNAs (RNAcentral) | FASTA RNA | sequence_embeddings.npy + labels.txt | 4.2 GB | 1.7 GB |
| RiNALMo | Foundation | 36M ncRNAs (RNAcentral) | FASTA RNA | sequence_embeddings.npy + labels.txt | 5.6 GB | 3.1 GB |
| ERNIE-RNA | Foundation | 20M ncRNAs (RNAcentral) | FASTA RNA | sequence_embeddings.npy + labels.txt | 5.7 GB | — (single image) |
| Orthrus | Foundation | 32.7M mRNAs (GENCODE+RefSeq+Zoonomia, contrastive) | FASTA mature mRNA (4-track) | sequence_embeddings.npy + labels.txt | ~5 GB | — (GPU only) |
| RNAformer | Structure | bpRNA + PDB (LoRA-finetuned) | FASTA RNA | structures.txt (dot-bracket) | 3.8 GB | — (single image) |
| RhoFold | Structure | PDB + bpRNA self-distillation | FASTA RNA | PDB + ss.ct + results.npz | 4.2 GB | 1.7 GB |
| SPOT-RNA | Structure | bpRNA + PDB + Rfam | FASTA RNA | structures.txt + per-seq bpseq / ct / prob | 2.7 GB | 0.6 GB |
| MultiRM | Modification | ~300K human modification sites | FASTA RNA | modification_scores.tsv + predicted_sites.tsv | 3.5 GB | 1.0 GB |
| UTR-LM | mRNA Design | 5'UTRs, 5 species + MPRA (MRL) | FASTA 5'UTR | predictions.tsv | 4.9 GB | 2.4 GB |
Totals: CPU set is ~28 GB across 14 images; GPU set is ~70 GB across 16 images. See the installation page for the matching pre-pull commands.
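The totals can be reproduced from the table. The sketch below sums the per-image sizes with awk. It assumes the CPU set counts the two single-image models (ERNIE-RNA and RNAformer) alongside the 12 dedicated CPU images, which is what makes the figures come out at 14 images, and it treats the "~5 GB" Orthrus entry as 5. `sum_gb` is a hypothetical helper name, not part of RNAZoo.

```shell
# Compressed image sizes in GB, copied from the table above.
gpu_sizes="2.8 4.0 3.9 10.3 1.9 4.2 3.7 4.2 5.6 5.7 5 3.8 4.2 2.7 3.5 4.9"
# Assumption: the CPU set = 12 CPU images + the 2 single images (5.7 and 3.8).
cpu_sizes="1.2 2.3 1.3 0.6 1.4 1.2 1.7 3.1 5.7 3.8 1.7 0.6 1.0 2.4"

# Hypothetical helper: print "<image count> <total GB>" for a space-separated list.
sum_gb() {
  echo "$1" | tr -s ' ' '\n' |
    awk 'NF { n++; total += $1 } END { printf "%d %.1f\n", n, total }'
}

sum_gb "$gpu_sizes"   # prints: 16 70.4
sum_gb "$cpu_sizes"   # prints: 14 28.0
```

Since images roughly double after extraction, budgeting about twice these totals in free disk space is a reasonable starting point.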
Quick start
With Nextflow (recommended for pipelines)
# Run the test suite (13 models on CPU, ~5 min)
nextflow run . -profile test,docker,cpu
# Run a single model — only models you provide input for will run
nextflow run . -profile docker,cpu --rnafm_input my_sequences.fa
# Run multiple models in parallel
nextflow run . -profile docker,cpu \
--rnafm_input seqs.fa \
--rnaformer_input seqs.fa \
--multirm_input seqs.fa
# Use a YAML params file for complex runs
nextflow run . -profile docker,cpu -params-file my_params.yml
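A params file carries the same options as the command-line flags. The following is a minimal sketch, assuming the YAML keys mirror the flag names used above with the leading `--` dropped (the usual Nextflow convention for `-params-file`). The FASTA paths are placeholders.

```shell
# Write a params file equivalent to the three-model command shown above.
# Keys are assumed to match the --<name> flags; seqs.fa is a placeholder path.
cat > my_params.yml <<'EOF'
rnafm_input: seqs.fa
rnaformer_input: seqs.fa
multirm_input: seqs.fa
EOF

# Then launch with:
#   nextflow run . -profile docker,cpu -params-file my_params.yml
```

Keeping the inputs in a file makes a multi-model run easier to version and repeat than a long command line.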
With plain Docker (no Nextflow required)
# Run one model against a FASTA (CPU)
docker run --rm \
-u $(id -u):$(id -g) -e HOME=/tmp -e USER=$(whoami) \
-v $PWD/seqs.fa:/data/input.fa -v $PWD/out:/out \
ghcr.io/ericmalekos/rnazoo-rnafm-cpu:latest \
rnafm_predict.py -i /data/input.fa -o /out
See the Direct Docker guide for invocations of every model.
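To run the same image over several FASTA files, the single invocation above can be wrapped in a loop. This is a sketch only: `run_rnafm_batch` and the `DRY_RUN` switch are hypothetical names introduced here, and the image tag and `rnafm_predict.py` arguments are copied unchanged from the example above.

```shell
# Hypothetical helper: run the RNA-FM CPU image once per *.fa file in a directory.
# Usage: run_rnafm_batch <input_dir> <output_root>
# Set DRY_RUN=1 to print each docker command instead of executing it.
# Variables are global because POSIX sh has no "local".
run_rnafm_batch() {
  in_dir=$(cd "$1" && pwd) || return 1        # absolute path for the bind mount
  mkdir -p "$2" && out_root=$(cd "$2" && pwd) || return 1
  for fasta in "$in_dir"/*.fa; do
    [ -e "$fasta" ] || continue               # skip the literal pattern if nothing matches
    name=$(basename "$fasta" .fa)
    mkdir -p "$out_root/$name"                # one output directory per input file
    ${DRY_RUN:+echo} docker run --rm \
      -u "$(id -u):$(id -g)" -e HOME=/tmp -e USER="$(whoami)" \
      -v "$fasta:/data/input.fa" -v "$out_root/$name:/out" \
      ghcr.io/ericmalekos/rnazoo-rnafm-cpu:latest \
      rnafm_predict.py -i /data/input.fa -o /out
  done
}

# Example (prints the commands without running them):
#   DRY_RUN=1 run_rnafm_batch ./fastas ./out
```

Creating the output directory before the run keeps it owned by the calling user rather than by the Docker daemon.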
Design principles
- One Docker image per model — weights baked in at build time, no runtime downloads
- CPU by default — GPU-only models auto-skip under -profile cpu
- Per-model input/output — each model uses its native format, no forced preprocessing
- Portable — runs anywhere with Docker or Singularity + Nextflow
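Because each model takes its native format, some inputs need light preparation before a run. As one example, the table lists Saluki's input as FASTA with UTRs in lowercase and the CDS in uppercase. The sketch below builds such a record from the three transcript regions. `saluki_record` is a hypothetical helper, and writing the sequence on a single line is an assumption; check the Saluki page for the exact requirements.

```shell
# Hypothetical helper: emit one FASTA record with lowercase UTRs and uppercase CDS.
# Usage: saluki_record <id> <5'UTR> <CDS> <3'UTR>
saluki_record() {
  utr5=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  cds=$(printf '%s' "$3" | tr '[:lower:]' '[:upper:]')
  utr3=$(printf '%s' "$4" | tr '[:upper:]' '[:lower:]')
  printf '>%s\n%s%s%s\n' "$1" "$utr5" "$cds" "$utr3"
}

saluki_record tx1 GGAC atggcctaa ttGA
# prints:
# >tx1
# ggacATGGCCTAAttga
```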
License
RNAZoo pipeline code is open source. Individual models carry their own licenses — see each model's page for details. Most are MIT/Apache-2.0; some have non-commercial restrictions noted on their pages.