RiNALMo¶
Extract RNA embeddings from a 650M-parameter RNA language model.
- Paper: NeurIPS 2024
- Upstream: https://github.com/lbcb-sci/RiNALMo
- License: Apache 2.0 (code), CC BY 4.0 (weights)
- Device: CPU or GPU (650M params — GPU strongly recommended). Two image variants:
- `rnazoo-rinalmo:latest`: CUDA-enabled (default, used with `-profile gpu`)
- `rnazoo-rinalmo-cpu:latest`: CPU-only (smaller, used with `-profile cpu`)
What it does¶
RiNALMo is a large RNA language model (650M parameters) pretrained on 36 million non-coding RNA sequences using masked language modeling. It produces 1280-dimensional embeddings for each nucleotide position. These embeddings achieve state-of-the-art results on downstream tasks including secondary structure prediction, splice site identification, and RNA family classification.
Compared to RNA-FM (99M params, 640-d), RiNALMo is larger and produces higher-dimensional embeddings but requires more compute.
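As an illustration of a typical downstream use, per-sequence embeddings can be compared with cosine similarity. This is a local sketch (`cosine_similarity` is a helper defined here, not part of the tool), shown on random vectors in place of real output:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for sequence_embeddings.npy: 3 random 1280-d embeddings.
rng = np.random.default_rng(0)
emb = rng.standard_normal((3, 1280))

for i in range(3):
    for j in range(i + 1, 3):
        print(f"seq {i} vs seq {j}: {cosine_similarity(emb[i], emb[j]):+.3f}")
```

With real embeddings, higher similarity loosely tracks shared RNA family or structural context.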
Input format¶
FASTA file of RNA sequences (A, C, G, U). DNA sequences (with T) are automatically converted.
No architectural length limit (uses Rotary Position Embeddings) — but attention memory scales O(L²), so practical limits are memory-bound. See Memory scaling below for exact ceilings on common hardware.
Example (reuses tests/data/rnafm_test.fa):
>test_rna_1
GGGUGCGAUCAUACCAGCACUAAUGCCCUCCUGGGAAGUCCUCGUGUUGCACCUGACUGUCUUUCCGAACGGGCGUUUCUUUUCCUCCGCGCUACCUGCCAGG
>test_rna_2
AUUCCGAGAGCUAACGGAGAACUCUGUUCGAUUUAAGCUGUAAGAUGGCAGUAGCUUACUAGGCAGGAAAAGACCCUGUUGAGCUUGACUCUAGUU
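The DNA-to-RNA conversion mentioned above amounts to a T→U substitution. A minimal sketch of that normalization (the tool does this internally; `to_rna` is an illustrative helper, not its actual function name):

```python
def to_rna(seq: str) -> str:
    """Uppercase a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")

print(to_rna("acgtACGT"))  # ACGUACGU
```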
Memory scaling¶
RiNALMo uses full self-attention (no sliding window), so attention memory scales O(L²) with sequence length. With 20 attention heads and float32 attention scores:
attention_memory = 20 × L² × 4 bytes
Model weights + activations add ~3–4 GB on top.
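The formula above can be turned into a quick estimator (an illustrative helper, not part of the tool) to check whether a sequence will fit before launching a run:

```python
def attention_bytes(L: int, heads: int = 20, dtype_bytes: int = 4) -> int:
    """Attention-score memory for full self-attention: heads * L^2 * bytes per score."""
    return heads * L * L * dtype_bytes

# Reproduces the orders of magnitude in the table below.
for L in (1022, 3000, 11933, 20635):
    print(f"{L:>6} nt -> {attention_bytes(L) / 2**20:,.0f} MiB")
```

Remember to add ~3–4 GB for weights and activations on top of this estimate.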
| Sequence length | Attention matrix | Observed behavior |
|---|---|---|
| 1,022 nt | ~80 MB | Fits easily on any hardware |
| 3,000 nt | ~690 MB | Fits on A10G (24 GB VRAM) |
| 5,000 nt | ~1.9 GB | Tight on A10G |
| 11,933 nt | ~10.6 GB | OOM on A10G (24 GB) |
| 14,859 nt | ~16.5 GB | OOM on A10G |
| 20,635 nt | ~31.8 GB | OOM even with 30 GB CPU RAM |
Practical maximums:
- A10G (24 GB VRAM): ~2–3k nt
- 30 GB system RAM: ~11k nt
- 64 GB system RAM: ~16k nt
For longer sequences, truncate to a sliding window (e.g. 1022 nt → ~80 MB attention, runnable anywhere).
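A minimal sketch of pre-chunking long sequences into fixed-size windows before embedding, assuming the 1022 nt window size from the example above (this is preprocessing the user performs; the tool does not window automatically):

```python
def window(seq, size=1022, step=None):
    """Split a sequence into fixed-size windows (non-overlapping by default)."""
    step = step or size
    return [seq[i:i + size] for i in range(0, len(seq), step)]

chunks = window("A" * 2500, size=1022)
print([len(c) for c in chunks])  # [1022, 1022, 456]
```

Write each window as its own FASTA record; each then fits comfortably within the ~80 MB attention budget.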
Output format¶
A directory containing:
- `sequence_embeddings.npy`: NumPy array of shape `(N, 1280)`, one 1280-d embedding per input sequence (mean-pooled over positions)
- `labels.txt`: one FASTA header per line, in the same order as the embedding rows
With --per-token:
- <label>_tokens.npy: per-sequence NumPy array of shape (L, 1280) — one 1280-d embedding per nucleotide position
Run with Docker¶
See the Direct Docker guide for the shared `docker run` recipe (UID, `HOME`, `USER` env vars, and GPU flag). Below are the model-specific parts.
# CPU
docker run --rm \
-v /path/to/input.fa:/data/input.fa \
-v /path/to/output:/out \
ghcr.io/ericmalekos/rnazoo-rinalmo-cpu:latest \
rinalmo_predict.py -i /data/input.fa -o /out
# GPU
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
-v /path/to/input.fa:/data/input.fa \
-v /path/to/output:/out \
ghcr.io/ericmalekos/rnazoo-rinalmo:latest \
rinalmo_predict.py -i /data/input.fa -o /out
Add --per-token to either invocation for per-token embeddings.
Run with Nextflow¶
# CPU
nextflow run main.nf -profile docker,cpu --rinalmo_input /path/to/input.fa
# GPU (recommended for production)
nextflow run main.nf -profile docker,gpu --rinalmo_input /path/to/input.fa
Only models with input provided will run — no ignore flags needed.
Results appear in results/rinalmo/rinalmo_out/.
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `--rinalmo_per_token` | `false` | Also output per-token (L × 1280) embeddings per sequence |
Reading the output¶
import numpy as np
# Per-sequence embeddings
embeddings = np.load("rinalmo_out/sequence_embeddings.npy") # (N, 1280)
labels = open("rinalmo_out/labels.txt").read().strip().split("\n")
for label, emb in zip(labels, embeddings):
print(f"{label}: {emb.shape}") # (1280,)
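With `--per-token`, each sequence also gets its own `<label>_tokens.npy`. A sketch of reading one back, using a dummy array written to a temp directory so the snippet is self-contained (the exact filename derivation from the FASTA header may involve sanitization not shown here):

```python
import os
import tempfile
import numpy as np

# Stand-in for rinalmo_out/: write a dummy (L, 1280) per-token array.
out = tempfile.mkdtemp()
np.save(os.path.join(out, "test_rna_1_tokens.npy"), np.zeros((103, 1280)))

# Load per-token embeddings for one sequence: one 1280-d row per nucleotide.
tokens = np.load(os.path.join(out, "test_rna_1_tokens.npy"))
print(tokens.shape)  # (103, 1280)
```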
Model variants¶
The giga model (650M params, 1280-d) is the default and only variant included. Smaller variants exist (mega: 150M/640-d, micro: 35M/480-d) but are not bundled.
Technical notes¶
- Built from the `non_flash` branch (no FlashAttention dependency; works on any GPU or CPU).
- Weights (~2.6 GB) are baked into the Docker image from Zenodo.
- The same pretrained weights work on both flash and non-flash branches.
- Skipped in the default test profile due to CPU inference time (~60s for model loading).