RhoFold+¶
Predict RNA 3D structure from sequence (single-sequence mode).
- Paper: Nature Methods 2024
- Upstream: https://github.com/ml4bio/RhoFold
- License: Apache 2.0
- Device: CPU or GPU. Two image variants:
rnazoo-rhofold:latest— CUDA-enabled (default, used with-profile gpu)rnazoo-rhofold-cpu:latest— CPU-only (smaller, used with-profile cpu)
What it does¶
RhoFold+ predicts full-atom RNA 3D structures from sequence, similar to AlphaFold for proteins. It combines an RNA language model (RNA-FM, embedded in the architecture) with an E2Eformer structure module and iterative refinement (10 recycling steps). It outputs PDB files with atomic coordinates and per-residue confidence scores (pLDDT).
This module runs in single-sequence mode (no MSA databases needed). For higher accuracy with MSA, users can provide a pre-computed A3M file via the upstream CLI.
Input format¶
FASTA file of RNA sequences (A, C, G, U alphabet; T is auto-converted):
Maximum sequence length: ~1022 nt (limited by the embedded RNA-FM positional embeddings).
Output format¶
Per-sequence output directory containing:
unrelaxed_model.pdb— predicted 3D structure in PDB format. B-factor column contains pLDDT confidence scores (0-100).ss.ct— predicted secondary structure in CT (connect table) formatresults.npz— NumPy archive with distograms, SS probability maps, and pLDDT scoreslog.txt— inference log with timing
Run with Docker¶
See the Direct Docker guide for the shared
docker runrecipe (UID,HOME,USERenv vars, and GPU flag). Below are the model-specific parts.
# CPU
docker run --rm \
-v /path/to/input.fa:/data/input.fa \
-v /path/to/output:/out \
ghcr.io/ericmalekos/rnazoo-rhofold-cpu:latest \
rhofold_predict.py -i /data/input.fa -o /out
# GPU
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
-v /path/to/input.fa:/data/input.fa \
-v /path/to/output:/out \
ghcr.io/ericmalekos/rnazoo-rhofold:latest \
rhofold_predict.py -i /data/input.fa -o /out
Run with Nextflow¶
# CPU
nextflow run main.nf -profile docker,cpu --rhofold_input /path/to/input.fa
# GPU (recommended for production)
nextflow run main.nf -profile docker,gpu --rhofold_input /path/to/input.fa
Only models with input provided will run — no ignore flags needed.
Results appear in results/rhofold/rhofold_out/<sequence_name>/.
Viewing the output¶
# View in PyMOL, ChimeraX, or any PDB viewer
pymol results/rhofold/rhofold_out/tRNA_fragment/unrelaxed_model.pdb
# Read pLDDT scores from B-factor column
grep "^ATOM" unrelaxed_model.pdb | awk '{print $6, $11}'
Performance notes¶
- CPU: ~5-15 minutes per sequence (76 nt tRNA). Very slow for long sequences.
- GPU: ~1-3 minutes per sequence. Recommended for production use.
- Memory scales quadratically with sequence length (pair representations are L x L x 128).
- Amber energy relaxation is disabled by default (
--relax-steps 0). Enable with--relax-steps 1000if OpenMM is installed.
Technical notes¶
- Runs in single-sequence mode by default — no external MSA databases needed
- Weights (~508 MB) baked into Docker image from HuggingFace
- The RNA-FM language model is embedded in the architecture (not a separate module)
- 10 recycling iterations per prediction (each is a full forward pass)
- OpenMM/Amber relaxation is not included in the Docker image to keep it small; the unrelaxed PDB is the primary output