RiboTIE (TRISTAN)
Detect translated ORFs from ribo-seq + genomic sequence using a transformer model.
- Paper: Nature Communications 2025
- Upstream: https://github.com/TRISTAN-ORF/TRISTAN (v1.1.1)
- License: Upstream repository license
- Device: CPU or GPU. Two image variants:
  - rnazoo-tristan:latest (CUDA-enabled; default, used with -profile gpu, recommended for real datasets)
  - rnazoo-tristan-cpu:latest (CPU-only; smaller, used with -profile cpu, viable for small datasets)
What it does
RiboTIE (implemented via the TRISTAN package) detects translated open reading frames (ORFs) by combining genomic sequence with ribo-seq data. It fine-tunes a pretrained transformer model on the user's ribo-seq data before predicting translated ORFs. The model outputs per-transcript translation initiation site (TIS) predictions as GTF annotations and CSV scores.
Input format
A directory containing:
- Genome FASTA (.fa): reference genome assembly
- GTF annotation (.gtf): gene/transcript annotation
- BAM files (.bam): transcriptome-mapped ribo-seq reads (one per sample)
- YAML config file: specifies paths, sample mapping, and fold assignments
Example config (config.yml):
fa_path: GRCh38v110_snippet.fa
gtf_path: GRCh38v110_snippet.gtf
ribo_paths:
  "sample1": SRR000001.bam
  "sample2": SRR000002.bam
  "sample3": SRR000003.bam
h5_path: ribotie_out/dbs/ribotie.h5
out_prefix: ribotie_out/out/ribotie
trained_model:
  folds:
    0:
      test: []
      train: ['sample2']
      transfer_checkpoint: /path/to/checkpoint_f0.ckpt
      val: ['sample3']
    1:
      test: ['sample3']
      train: ['sample1']
      transfer_checkpoint: /path/to/checkpoint_f1.ckpt
      val: ['sample2']
Note: sample IDs in ribo_paths must be strings (use quotes around numeric IDs).
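The quoting rule in the note can be enforced when a config is generated by a script. The following is a minimal sketch, not part of TRISTAN or this pipeline: `render_ribo_paths` is a hypothetical helper that emits the `ribo_paths` block with every sample ID double-quoted, so numeric IDs are never read as integers by a YAML parser.

```python
import json


def render_ribo_paths(ribo_paths):
    """Render the ribo_paths mapping as YAML text with quoted sample IDs.

    Hypothetical helper for illustration; only the key name `ribo_paths`
    comes from the example config above.
    """
    lines = ["ribo_paths:"]
    for sample_id, bam_path in ribo_paths.items():
        # str() turns a numeric ID such as 1 into "1"; json.dumps adds the
        # double quotes (a JSON string is also a valid YAML scalar).
        lines.append(f"  {json.dumps(str(sample_id))}: {bam_path}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The second ID is numeric on purpose to show the quoting.
    print(render_ribo_paths({"sample1": "SRR000001.bam", 2: "SRR000002.bam"}))
```

The rendered block can be pasted into config.yml in place of a hand-written `ribo_paths` section.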
Output format
- *.gtf: predicted ORFs in GTF format
- *.csv: per-ORF prediction scores
- *.npy (optional): raw prediction arrays
Run with Docker
See the Direct Docker guide for the shared docker run recipe (UID, HOME, USER env vars, and GPU flag). Below are the model-specific parts.
# CPU
docker run --rm \
-v /path/to/data:/work \
-w /work \
ghcr.io/ericmalekos/rnazoo-tristan-cpu:latest \
bash -c "mkdir -p ribotie_out/dbs ribotie_out/out && \
ribotie config.yml --accelerator cpu --overwrite_data --max_epochs 10 --patience 3"
# GPU
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
-v /path/to/data:/work \
-w /work \
ghcr.io/ericmalekos/rnazoo-tristan:latest \
bash -c "mkdir -p ribotie_out/dbs ribotie_out/out && \
ribotie config.yml --accelerator gpu --overwrite_data --max_epochs 10 --patience 3"
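The CPU and GPU commands differ only in the image name, the NVIDIA runtime flags, and the `--accelerator` value. For scripted runs, that difference could be captured in one place. The sketch below is illustrative only: `build_docker_cmd` is a hypothetical helper, and every flag and image name in it is copied from the two commands above rather than from any additional API.

```python
import shlex

# Image names as given in the Docker commands above.
IMAGES = {
    "cpu": "ghcr.io/ericmalekos/rnazoo-tristan-cpu:latest",
    "gpu": "ghcr.io/ericmalekos/rnazoo-tristan:latest",
}


def build_docker_cmd(data_dir, accelerator="cpu", max_epochs=10, patience=3):
    """Return the docker argument list for a CPU or GPU RiboTIE run."""
    if accelerator not in IMAGES:
        raise ValueError(f"accelerator must be 'cpu' or 'gpu', got {accelerator!r}")
    # Command executed inside the container: create the output directories,
    # then run ribotie on the mounted config.
    inner = (
        "mkdir -p ribotie_out/dbs ribotie_out/out && "
        f"ribotie config.yml --accelerator {accelerator} --overwrite_data "
        f"--max_epochs {max_epochs} --patience {patience}"
    )
    cmd = ["docker", "run", "--rm"]
    if accelerator == "gpu":
        # GPU runs add the NVIDIA runtime flags shown in the GPU example.
        cmd += ["--runtime=nvidia", "-e", "NVIDIA_VISIBLE_DEVICES=all"]
    cmd += ["-v", f"{data_dir}:/work", "-w", "/work",
            IMAGES[accelerator], "bash", "-c", inner]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell command; pass the list to subprocess.run
    # instead to execute it directly.
    print(shlex.join(build_docker_cmd("/path/to/data", "gpu")))
```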
Run with Nextflow
# CPU (slow but works for the bundled test data and small inputs)
nextflow run main.nf -profile docker,cpu \
--ribotie_input /path/to/data_dir \
--ribotie_config /path/to/config.yml
# GPU (recommended for real ribo-seq datasets)
nextflow run main.nf -profile docker,gpu \
--ribotie_input /path/to/data_dir \
--ribotie_config /path/to/config.yml
Only models whose input is provided will run; no ignore flags are needed.
Parameters
| Parameter | Default | Description |
|---|---|---|
| --ribotie_max_epochs | (upstream default) | Maximum training epochs |
| --ribotie_patience | (upstream default) | Early stopping patience |
| --ribotie_checkpoint | null | Path to a *.rt.ckpt saved from a previous run. When set, each fold's transfer_checkpoint in the YAML is rewritten to this file so the model skips re-fine-tuning. |
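To make the effect of --ribotie_checkpoint concrete, here is a rough sketch of what rewriting each fold's transfer_checkpoint could look like. It is not the pipeline's actual implementation: `rewrite_transfer_checkpoints` is a hypothetical function that does a line-based substitution on the YAML text, which is enough for a config laid out like the example but is not a general YAML editor.

```python
import re

# Matches a whole "transfer_checkpoint: <value>" line, keeping the indentation
# and key in group 1 so that only the value is replaced.
_TRANSFER_LINE = re.compile(r"^([ \t]*transfer_checkpoint:[ \t]*).*$", re.MULTILINE)


def rewrite_transfer_checkpoints(yaml_text, checkpoint_path):
    """Point every fold's transfer_checkpoint at the same checkpoint file."""
    # A function replacement avoids backslash-escape surprises in the path.
    return _TRANSFER_LINE.sub(lambda m: m.group(1) + checkpoint_path, yaml_text)


if __name__ == "__main__":
    config = (
        "trained_model:\n"
        "  folds:\n"
        "    0:\n"
        "      transfer_checkpoint: /path/to/checkpoint_f0.ckpt\n"
        "    1:\n"
        "      transfer_checkpoint: /path/to/checkpoint_f1.ckpt\n"
    )
    print(rewrite_transfer_checkpoints(config, "/path/to/previous_run.rt.ckpt"))
```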
Reusing a fine-tuned checkpoint
Every ribotie config.yml run writes per-sample, per-fold fine-tuned checkpoints at <out_prefix>_<sample>_f{0,1}.rt.ckpt. These are captured by the module's checkpoints output emit (matched by **.ckpt). To skip re-fine-tuning on a later run over new data, pass one of them as --ribotie_checkpoint:
nextflow run main.nf -profile docker,cpu \
--ribotie_input /path/to/new_data_dir \
--ribotie_config /path/to/new_config.yml \
--ribotie_checkpoint results/ribotie/ribotie_out/out/ribotie_SRR000001_f0.rt.ckpt
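When choosing a checkpoint to reuse, it can help to list the file names a run is expected to produce. The sketch below simply expands the `<out_prefix>_<sample>_f{0,1}.rt.ckpt` pattern described above. `expected_checkpoints` is a hypothetical helper, and the sample IDs are assumed to be the keys of `ribo_paths` in the config.

```python
def expected_checkpoints(out_prefix, samples, folds=(0, 1)):
    """Expand <out_prefix>_<sample>_f<fold>.rt.ckpt for every sample and fold."""
    return [
        f"{out_prefix}_{sample}_f{fold}.rt.ckpt"
        for sample in samples
        for fold in folds
    ]


if __name__ == "__main__":
    # out_prefix and sample IDs taken from the example config.
    for path in expected_checkpoints("ribotie_out/out/ribotie", ["sample1", "sample2"]):
        print(path)
```

Any one of the listed files that exists after a run can be passed as --ribotie_checkpoint.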
Bundled pretrained checkpoints
Located inside the Docker image at:
/opt/conda/envs/tristan/lib/python3.10/site-packages/transcript_transformer/pretrained/
tt_models/Homo_sapiens.GRCh38.113_f{0..4}.tt.ckpt # Human
tt_models/Mus_musculus.GRCm39.112_f{0..4}.tt.ckpt # Mouse
rt_models/50perc_06_23_f{0,1}.rt.ckpt # Pretrained RiboTIE
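The listing above uses shell brace notation, so each line stands for several files. As an illustration, the sketch below expands it into full in-container paths, reading `f{0..4}` as folds 0 through 4 and `f{0,1}` as folds 0 and 1. `bundled_checkpoints` is a hypothetical helper; the directory and file stems are copied from the listing.

```python
from pathlib import PurePosixPath

# Directory inside the Docker image, as given above.
PRETRAINED_DIR = PurePosixPath(
    "/opt/conda/envs/tristan/lib/python3.10/site-packages/"
    "transcript_transformer/pretrained"
)


def bundled_checkpoints():
    """Return the full in-container path of every bundled checkpoint."""
    paths = []
    # Human and mouse models: five folds each (f0..f4).
    for stem in ("Homo_sapiens.GRCh38.113", "Mus_musculus.GRCm39.112"):
        paths += [PRETRAINED_DIR / "tt_models" / f"{stem}_f{i}.tt.ckpt" for i in range(5)]
    # Pretrained RiboTIE model: two folds (f0, f1).
    paths += [PRETRAINED_DIR / "rt_models" / f"50perc_06_23_f{i}.rt.ckpt" for i in range(2)]
    return [str(p) for p in paths]


if __name__ == "__main__":
    print("\n".join(bundled_checkpoints()))
```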
Pipeline steps
- Build HDF5 database from FASTA + GTF + BAM (automatic)
- Fine-tune the pretrained model on the user's ribo-seq samples
- Predict translated ORFs and emit per-sample GTF/CSV outputs