MultiRM

Predict 12 RNA modification types from sequence.

Paper: Nature Communications 2021
Upstream: https://github.com/Tsedao/MultiRM
License: MIT
Device: CPU or GPU (lightweight LSTM, ~8MB weights). Two image variants:
- rnazoo-multirm:latest — CUDA-enabled (default, used with -profile gpu)
- rnazoo-multirm-cpu:latest — CPU-only (smaller, used with -profile cpu)

What it does

MultiRM is a multi-task deep learning model that predicts 12 types of RNA modification simultaneously from sequence. It combines Word2Vec 3-mer embeddings, a bidirectional LSTM, and Bahdanau attention feeding 12 task-specific classification heads. For each scorable position in the input sequence, it outputs a probability for each of the 12 modification types and a p-value against a null distribution.
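To make the "attention plus 12 task-specific heads" idea concrete, here is a minimal pure-Python sketch of additive (Bahdanau-style) attention pooling followed by independent sigmoid outputs. It assumes each modification type has its own attention module and output layer, and it uses random toy weights in place of the BiLSTM outputs and trained parameters; the function names are hypothetical and this is not the wrapper's re-implementation.

```python
import math
import random

# Modification codes, in the column order used by modification_scores.tsv
MODS = ["Am", "Cm", "Gm", "Um", "m1A", "m5C", "m5U",
        "m6A", "m6Am", "m7G", "Psi", "AtoI"]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(hidden, W, v):
    """Additive attention pooling over T hidden vectors (each of length H).

    score_t = v . tanh(W h_t); weights = softmax(scores); context = sum_t w_t h_t
    """
    scores = []
    for h in hidden:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(vi * p for vi, p in zip(v, proj)))
    weights = softmax(scores)
    context = [sum(a * h[j] for a, h in zip(weights, hidden))
               for j in range(len(hidden[0]))]
    return weights, context


def predict_window(hidden, heads):
    """Assumed layout: one attention module + one sigmoid output per modification."""
    probs, attn = {}, {}
    for mod, (W, v, w_out, b) in zip(MODS, heads):
        weights, context = additive_attention(hidden, W, v)
        logit = sum(wi * ci for wi, ci in zip(w_out, context)) + b
        probs[mod] = 1.0 / (1.0 + math.exp(-logit))  # independent sigmoid per task
        attn[mod] = weights
    return probs, attn


# Toy demo: random stand-ins for BiLSTM outputs over 49 3-mer steps
rng = random.Random(0)
T, H, A = 49, 8, 4  # time steps, hidden size, attention size (toy values)
hidden = [[rng.gauss(0, 1) for _ in range(H)] for _ in range(T)]


def rand_mat(rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Each head: (W: A x H, v: length A, w_out: length H, bias)
heads = [(rand_mat(A, H), rand_mat(1, A)[0], rand_mat(1, H)[0], 0.0) for _ in MODS]
probs, attn = predict_window(hidden, heads)
```

Using one sigmoid per head, rather than a single softmax across the 12 types, lets several modifications score high at the same position, which is what a multi-label model needs.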

The 12 modification types

Code	Full Name
Am	2'-O-methyladenosine
Cm	2'-O-methylcytidine
Gm	2'-O-methylguanosine
Um	2'-O-methyluridine
m1A	N1-methyladenosine
m5C	5-methylcytidine
m5U	5-methyluridine
m6A	N6-methyladenosine
m6Am	N6,2'-O-dimethyladenosine
m7G	7-methylguanosine
Psi	Pseudouridine
AtoI	Adenosine-to-inosine editing

Input format

FASTA file of RNA sequences (min 51 nt each). U is auto-converted to T internally.

>test_rna_modification
GGGGCCGTGGATACCTGCCTTTTAATTCTTTTTTATTCGCCCATCGGGGCCGCGGATACCTGCTTTTTATTTTTTTTTCCTTAGCCCATCGGGG

The first and last 25 nucleotides of each sequence cannot be scored, because a 51-nt window centred on them would run past the sequence ends.
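A minimal sketch of the input checks described above: parse FASTA, upper-case and convert U to T, reject sequences shorter than 51 nt, and list the 1-based positions a full 51-nt window can score (26 through length - 25, which matches the first scored position in the example output). The helper names are hypothetical, and the wrapper's own parsing may differ, for example in how it treats header descriptions or non-ACGU characters.

```python
WINDOW = 51
FLANK = WINDOW // 2  # 25 nt on each side of the scored (centre) base


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def normalise(seq):
    """Upper-case and convert U to T; reject sequences shorter than one window."""
    seq = seq.upper().replace("U", "T")
    if len(seq) < WINDOW:
        raise ValueError(f"sequence is {len(seq)} nt; at least {WINDOW} nt required")
    return seq


def scorable_positions(seq):
    """1-based positions whose full 51-nt window fits inside the sequence."""
    return range(FLANK + 1, len(seq) - FLANK + 1)
```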

Output format

modification_scores.tsv — per-position probabilities for all 12 modification types:

header  position    base    Am  Cm  Gm  Um  m1A m5C m5U m6A m6Am    m7G Psi AtoI
test_rna_modification   26  T   0.002129    0.000499    ... 0.839820    ... 0.017451

predicted_sites.tsv, containing only statistically significant predictions (p-value < `--multirm_alpha`, default 0.05):

header  modification    position    base    probability p_value
test_rna_modification   m6A 28  C   0.953859    0.000000
test_rna_modification   m6A 29  T   0.968842    0.000000
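A minimal sketch for loading both output files with Python's standard `csv` module, assuming they are tab-separated with the column names shown above (the listing here renders the separators as spaces). The helper names are hypothetical. Because predicted_sites.tsv already omits anything with p-value at or above the run's alpha, it can be tightened after the fact but not loosened; loosening requires a rerun with a larger `--multirm_alpha`.

```python
import csv

# Column order of modification_scores.tsv, as listed in its header
MODS = ["Am", "Cm", "Gm", "Um", "m1A", "m5C", "m5U",
        "m6A", "m6Am", "m7G", "Psi", "AtoI"]


def read_scores(handle):
    """Parse modification_scores.tsv into one dict per scored position."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rec = {"header": row["header"], "position": int(row["position"]),
               "base": row["base"]}
        rec.update({mod: float(row[mod]) for mod in MODS})
        rows.append(rec)
    return rows


def best_call(rec):
    """Return (modification, probability) with the highest score at one position."""
    mod = max(MODS, key=lambda m: rec[m])
    return mod, rec[mod]


def read_sites(handle, alpha=0.05):
    """Parse predicted_sites.tsv, keeping only rows with p_value < alpha.

    An alpha larger than the one used for the run cannot recover sites
    that were never written to the file.
    """
    sites = []
    for row in csv.DictReader(handle, delimiter="\t"):
        p = float(row["p_value"])
        if p < alpha:
            sites.append((row["header"], row["modification"],
                          int(row["position"]), row["base"],
                          float(row["probability"]), p))
    return sites
```

For example, `read_sites(open(path, newline=""), alpha=0.01)` keeps only the sites with p < 0.01.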

Run with Docker

See the Direct Docker guide for the shared docker run recipe (UID, HOME, USER env vars, and GPU flag). Below are the model-specific parts.

# CPU
docker run --rm \
  -v /path/to/input.fa:/data/input.fa \
  -v /path/to/output:/out \
  ghcr.io/ericmalekos/rnazoo-multirm-cpu:latest \
  multirm_predict.py -i /data/input.fa -o /out

# GPU
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
  -v /path/to/input.fa:/data/input.fa \
  -v /path/to/output:/out \
  ghcr.io/ericmalekos/rnazoo-multirm:latest \
  multirm_predict.py -i /data/input.fa -o /out

Run with Nextflow

# CPU
nextflow run main.nf -profile docker,cpu --multirm_input /path/to/input.fa

# GPU
nextflow run main.nf -profile docker,gpu --multirm_input /path/to/input.fa

Only models whose input is provided will run, so no ignore flags are needed.

Results appear in results/multirm/multirm_out/.

Parameters

Parameter	Default	Description
`--multirm_alpha`	`0.05`	Significance threshold for calling modification sites
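How an alpha threshold turns probabilities into site calls can be sketched as follows. This assumes an empirical p-value, the fraction of negative-sample (null) scores at least as large as the observed probability, with a separate null distribution per modification type; the wrapper's exact estimator and null data are not documented here, and the function names are hypothetical.

```python
import bisect


def empirical_p_value(score, sorted_null, add_one=False):
    """Fraction of null scores >= score (sorted_null must be sorted ascending).

    add_one=True uses (count + 1) / (n + 1), which never returns exactly 0.
    """
    n = len(sorted_null)
    n_ge = n - bisect.bisect_left(sorted_null, score)  # null scores >= score
    if add_one:
        return (n_ge + 1) / (n + 1)
    return n_ge / n


def call_sites(probs, nulls, alpha=0.05):
    """Keep (modification, probability, p_value) where p_value < alpha.

    probs: {modification: probability} for one position
    nulls: {modification: ascending list of negative-sample probabilities}
    """
    calls = []
    for mod, prob in probs.items():
        p = empirical_p_value(prob, nulls[mod])
        if p < alpha:
            calls.append((mod, prob, p))
    return calls
```

With the plain fraction and n null scores, any probability above every null score gets p = 0, and the smallest non-zero p-value is 1/n; lowering alpha therefore only helps down to that resolution.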

Technical notes

- Uses a 51-nt sliding window across the input. Each window is encoded as 49 overlapping 3-mers (51 - 3 + 1 = 49), each mapped to a 300-d Word2Vec embedding.
- The model is an LSTM + attention architecture (~8 MB of weights), making it one of the smallest models in the zoo.
- P-values are computed by comparing each predicted probability against a null distribution derived from negative samples.
- The wrapper re-implements the model architecture so that it runs on either CPU or GPU, rather than using the upstream code, which has hard-coded CUDA calls.
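The sliding window and 3-mer tokenisation from the first note can be sketched in a few lines. This assumes a 64-word ACGT vocabulary in `itertools.product` order and a toy embedding table; the real Word2Vec vocabulary order, its 300-d vectors, and how non-ACGT bases such as N are handled are not shown here (this sketch simply raises `KeyError` on them). All names are hypothetical.

```python
from itertools import product

WINDOW = 51  # nt per window; the centre base is the one being scored
K = 3        # k-mer size used for the embedding vocabulary

# Hypothetical vocabulary: all 64 ACGT 3-mers in itertools.product order
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def windows(seq, width=WINDOW):
    """Yield (1-based centre position, window) for every full-length window."""
    half = width // 2
    for start in range(len(seq) - width + 1):
        yield start + half + 1, seq[start:start + width]


def kmer_tokens(window, k=K):
    """Overlapping k-mers with stride 1: 51 - 3 + 1 = 49 tokens per window."""
    return [window[i:i + k] for i in range(len(window) - k + 1)]


def encode(window, embeddings):
    """Map each 3-mer to its embedding row (49 x dim); KeyError on non-ACGT 3-mers."""
    return [embeddings[KMER_INDEX[kmer]] for kmer in kmer_tokens(window)]


# Toy 64 x 4 embedding table standing in for the 300-d Word2Vec vectors
toy_embeddings = [[float(i)] * 4 for i in range(64)]
```

The resulting 49 x dim matrix per window is what the LSTM consumes, one 3-mer embedding per time step.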