Skip to content

SPOT-RNA

Predict RNA secondary structure including pseudoknots and non-canonical base pairs.

  • Paper: Nature Communications 2019
  • Upstream: https://github.com/jaswindersingh2/SPOT-RNA
  • License: MPL-2.0
  • Device: CPU or GPU (5-model TensorFlow ensemble). Two image variants:
    • rnazoo-spotrna:latest — CUDA-enabled (default, used with -profile gpu)
    • rnazoo-spotrna-cpu:latest — CPU-only (smaller, used with -profile cpu)

What it does

SPOT-RNA predicts RNA secondary structure from sequence using an ensemble of 5 deep learning models. Unlike many structure prediction methods, it can predict:

  • Canonical base pairs (AU, GC, GU)
  • Non-canonical base pairs
  • Pseudoknots
  • All types of base pair interactions

The ensemble averages predictions across 5 models with different architectures for robust results.

Input format

FASTA file of RNA sequences (A, C, G, U alphabet; T is auto-converted):

>2zzm_B
GGCAGAUCUGAGCCUGGGAGCUCUCUGCC

No hard length limit, but memory scales as O(L^2). Sequences under 500 nt are recommended for reasonable memory usage.

Output format

structures.txt — FASTA-like file with dot-bracket notation (pseudoknot-aware):

>2zzm_B
GGCAGAUCUGAGCCUGGGAGCUCUCUGCC
((((((...(((((...).))))))))))

Per-sequence files: - *.bpseq — base-pair format (index, base, pair partner; 0=unpaired) - *.ct — connectivity table format - *.prob — full L x L base-pair probability matrix

Run with Docker

See the Direct Docker guide for the shared docker run recipe (UID, HOME, USER env vars, and GPU flag). Below are the model-specific parts.

# CPU
docker run --rm \
  -v /path/to/input.fa:/data/input.fa \
  -v /path/to/output:/out \
  ghcr.io/ericmalekos/rnazoo-spotrna-cpu:latest \
  spotrna_predict.py -i /data/input.fa -o /out

# GPU
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
  -v /path/to/input.fa:/data/input.fa \
  -v /path/to/output:/out \
  ghcr.io/ericmalekos/rnazoo-spotrna:latest \
  spotrna_predict.py -i /data/input.fa -o /out

Run with Nextflow

# CPU
nextflow run main.nf -profile docker,cpu --spotrna_input /path/to/input.fa

# GPU
nextflow run main.nf -profile docker,gpu --spotrna_input /path/to/input.fa

Only models with input provided will run — no ignore flags needed.

Results appear in results/spotrna/spotrna_out/.

Reading the probability matrix

import numpy as np

# Load the full L x L pair probability matrix
prob = np.loadtxt("spotrna_out/2zzm_B.prob")
print(f"Shape: {prob.shape}")  # (29, 29)

# Find high-confidence pairs
pairs = np.argwhere(prob > 0.5)
print(f"Predicted pairs: {len(pairs)}")

Example output

>2zzm_B
GGCAGAUCUGAGCCUGGGAGCUCUCUGCC
((((((...(((((...).))))))))))

This structure shows a stem-loop with nested base pairs and no pseudoknots for this particular sequence. When pseudoknots are present, they are denoted with [] and {} bracket types.

Comparison with RNAformer

Feature SPOT-RNA RNAformer
Framework TensorFlow 2.x PyTorch
Architecture 5-model ensemble Single model with recycling
Pseudoknots Yes Yes
Non-canonical pairs Yes Yes
Output formats bpseq + ct + prob + dot-bracket dot-bracket + optional prob matrix
Pretrained on bpRNA + PDB + Rfam bpRNA + PDB
License MPL-2.0 Apache 2.0

Technical notes

  • Uses TensorFlow 2.15 with tf.compat.v1 (originally TF 1.14, updated for modern TF)
  • 5 ensemble models range from 8-58 MB each (~155 MB total)
  • Prediction threshold is hardcoded at 0.335
  • Post-processing includes hairpin loop constraints, multiplet resolution, and lone pair removal
  • The wrapper copies input FASTA to a temp file to avoid an upstream in-place overwrite bug