File formats for Secondary Structure Constraints
Constraints Definition File
The RNAlib can parse and apply data from constraint definition text files, where each constraint is given as a line of whitespace-delimited commands. The syntax extends the one used in mfold / UNAfold, where each line begins with a command character followed by a set of positions.
Additionally, we introduce several new commands and allow for an optional loop type context specifier in the form of a sequence of characters, as well as an orientation flag that forces a nucleotide to pair upstream or downstream.
Constraint commands
The following set of commands is recognized:
- F: Force
- P: Prohibit
- C: Conflicts/Context dependency
- A: Allow (for non-canonical pairs)
- E: Soft constraints for unpaired position(s), or base pair(s)
Specification of the loop type context
The optional loop type context specifier [WHERE] may be a combination of the following:
- E: Exterior loop
- H: Hairpin loop
- I: Interior loop (enclosing pair)
- i: Interior loop (enclosed pair)
- M: Multibranch loop (enclosing pair)
- m: Multibranch loop (enclosed pair)
- A: All loops
If no [WHERE] flags are set, all contexts are considered (equivalent to A).
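As a minimal sketch of how such a specifier could be decoded, the following Python snippet maps each flag character to its loop context and falls back to all contexts when no flag is given. The names `LOOP_CONTEXTS` and `parse_where` are hypothetical helpers for illustration only and are not part of RNAlib.

```python
# Hypothetical helper, not part of RNAlib: decode a [WHERE] specifier.
LOOP_CONTEXTS = {
    "E": "exterior loop",
    "H": "hairpin loop",
    "I": "interior loop (enclosing pair)",
    "i": "interior loop (enclosed pair)",
    "M": "multibranch loop (enclosing pair)",
    "m": "multibranch loop (enclosed pair)",
}


def parse_where(flags=""):
    """Return the set of loop contexts selected by a [WHERE] string."""
    unknown = [c for c in flags if c != "A" and c not in LOOP_CONTEXTS]
    if unknown:
        raise ValueError(f"unknown loop type flag(s): {''.join(unknown)}")
    if not flags or "A" in flags:
        # No flags, or the explicit 'A' flag: every context is considered.
        return set(LOOP_CONTEXTS.values())
    return {LOOP_CONTEXTS[c] for c in flags}
```

Note that the flags are case sensitive: `parse_where("Ii")` selects both interior loop contexts, while `parse_where("I")` selects only the enclosing-pair context.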
Controlling the orientation of base pairing
For particular nucleotides that are forced to pair, the following [ORIENTATION] flags may be used:
- U: Upstream
- D: Downstream
If no [ORIENTATION] flag is set, both directions are considered.
Sequence coordinates
Sequence positions of nucleotides/base pairs are 1-based and consist of three positions $i$, $j$, and $k$. Alternatively, four positions may be provided as a pair of two position ranges $[i:j]$ and $[k:l]$, using the '-' sign as delimiter within each range, i.e. $i-j$ and $k-l$.
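The two coordinate layouts can be told apart by the number of tokens and the presence of the '-' delimiter. The sketch below assumes 1-based positions and the layouts seen in the syntax lines of this page (`i j k` and `i-j k-l`); `parse_positions` is a hypothetical helper, not an RNAlib function.

```python
def parse_positions(tokens):
    """Parse the coordinate part of a constraint line.

    Accepts either three tokens ["i", "j", "k"] or two range tokens
    ["i-j", "k-l"] and returns a tuple tagged with the layout found.
    """
    if len(tokens) == 3:
        i, j, k = (int(t) for t in tokens)
        # j == 0 marks a range of single nucleotides rather than base pairs.
        if i < 1 or j < 0 or k < 1:
            raise ValueError("positions are 1-based; k must be positive")
        return ("triple", i, j, k)
    if len(tokens) == 2 and all("-" in t for t in tokens):
        (i, j), (k, l) = (tuple(int(x) for x in t.split("-")) for t in tokens)
        if min(i, j, k, l) < 1:
            raise ValueError("positions are 1-based")
        return ("ranges", (i, j), (k, l))
    raise ValueError(f"unrecognized coordinate layout: {tokens}")
```

For example, `parse_positions("2 20 3".split())` yields the triple layout, whereas `parse_positions("1-5 10-15".split())` yields two ranges.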
Valid constraint commands
Below are resulting general cases that are considered valid constraints:
- "Forcing a range of nucleotide positions to be paired":
Syntax: F i 0 k [WHERE] [ORIENTATION]
Description:
Enforces the set of $k$ consecutive nucleotides starting at position $i$ to be paired. The optional loop type specifier [WHERE] can be used to force them to appear as closing/enclosed pairs of certain types of loops.
- "Forcing a set of consecutive base pairs to form":
Syntax: F i j k [WHERE]
Description:
Enforces the base pairs $(i,j), \ldots, (i+(k-1), j-(k-1))$ to form. The optional loop type specifier [WHERE] specifies in which loop context the base pairs must appear.
- "Prohibiting a range of nucleotide positions to be paired":
Syntax: P i 0 k [WHERE]
Description:
Prohibits the set of $k$ consecutive nucleotides starting at position $i$ from participating in base pairing, i.e. makes these positions unpaired. The optional loop type specifier [WHERE] can be used to force the nucleotides to appear within loops of specific types.
- "Prohibiting a set of consecutive base pairs to form":
Syntax: P i j k [WHERE]
Description:
Prohibits the base pairs $(i,j), \ldots, (i+(k-1), j-(k-1))$ from forming. The optional loop type specifier [WHERE] specifies the types of loop in which they are not allowed to be the closing or an enclosed pair.
- "Prohibiting two ranges of nucleotides to pair with each other":
Syntax: P i-j k-l [WHERE]
Description:
Prohibits any nucleotide $p \in [i:j]$ from pairing with any other nucleotide $q \in [k:l]$. The optional loop type specifier [WHERE] specifies the types of loop in which such pairs are not allowed to be the closing or an enclosed pair.
- "Enforce a loop context for a range of nucleotide positions":
Syntax: C i 0 k [WHERE]
Description:
This command enforces nucleotides to be unpaired, similar to prohibiting nucleotides from being paired as described above. It also marks the corresponding nucleotides as unpaired; however, the [WHERE] flag can be used to enforce the specific loop types in which the nucleotides must appear.
- "Remove pairs that conflict with a set of consecutive base pairs":
Syntax: C i j k
Description:
Removes all base pairs that conflict with a set of consecutive base pairs $(i,j), \ldots, (i+(k-1), j-(k-1))$. Two base pairs $(i,j)$ and $(p,q)$ conflict with each other if $i < p < j < q$, or $p < i < q < j$.
- "Allow a set of consecutive (non-canonical) base pairs to form":
Syntax: A i j k [WHERE]
Description:
This command enables the formation of the consecutive base pairs $(i,j), \ldots, (i+(k-1), j-(k-1))$, regardless of whether they are canonical or non-canonical. In contrast to the above F and W commands, which remove conflicting base pairs, the A command does not. Therefore, it may be used to allow non-canonical base pair interactions. Since the RNAlib does not contain free energy contributions $E_{ij}$ for non-canonical base pairs $(i,j)$, they are scored as the maximum of similar, known contributions. In terms of a Nussinov-like scoring function, the free energy of non-canonical base pairs is therefore estimated as
$$E_{ij} = \min \left[ \max_{(i,k) \in \{GC, CG, AU, UA, GU, UG\}} E_{ik}, \max_{(k,j) \in \{GC, CG, AU, UA, GU, UG\}} E_{kj} \right].$$
The optional loop type specifier [WHERE] specifies in which loop context the base pairs may appear.
- "Apply pseudo free energy to a range of unpaired nucleotide positions":
Syntax: E i 0 k e
Description:
Use this command to apply a pseudo free energy of $e$ to the set of $k$ consecutive nucleotides, starting at position $i$. The pseudo free energy is applied only if these nucleotides are considered unpaired in the recursions or evaluations, and is expected to be given in kcal/mol.
- "Apply pseudo free energy to a set of consecutive base pairs":
Syntax: E i j k e
Description:
Use this command to apply a pseudo free energy of $e$ to the set of base pairs $(i,j), \ldots, (i+(k-1), j-(k-1))$. Energies are expected to be given in kcal/mol.
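The general cases above can be illustrated with a few small Python helpers. This is only a sketch under stated assumptions, not RNAlib code: `expand_pairs`, `pairs_conflict`, and `pseudo_energy` are hypothetical names; consecutive base pairs are assumed to be stacked as (i, j), (i+1, j-1), and so on; two pairs are assumed to conflict when they cross; and the pseudo energy of an E command is assumed to be added once per unpaired nucleotide, or once per formed base pair, in the affected set.

```python
def expand_pairs(i, j, k):
    """List the k consecutive (stacked) pairs (i, j), ..., (i+k-1, j-k+1)."""
    return [(i + n, j - n) for n in range(k)]


def pairs_conflict(a, b):
    """True if base pairs a = (i, j) and b = (p, q) cross each other."""
    (i, j), (p, q) = a, b
    return i < p < j < q or p < i < q < j


def pseudo_energy(structure, commands):
    """Sum the pseudo energies of E commands for a dot-bracket structure.

    commands is an iterable of (i, j, k, e) tuples (hypothetical layout),
    with j == 0 addressing unpaired nucleotides and j > 0 base pairs.
    """
    # Build a 1-based pairing table from the dot-bracket string.
    stack, partner = [], {}
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opening = stack.pop()
            partner[opening] = pos
            partner[pos] = opening

    total = 0.0
    for i, j, k, e in commands:
        if j == 0:
            # Count only positions that are actually unpaired.
            hits = sum(1 for p in range(i, i + k) if p not in partner)
        else:
            # Count only pairs that are actually formed.
            hits = sum(1 for p, q in expand_pairs(i, j, k) if partner.get(p) == q)
        total += e * hits
    return total
```

With these helpers, `expand_pairs(2, 20, 3)` lists the pairs addressed by a command such as `F 2 20 3`, and `pairs_conflict` can be used to filter out the pairs that a `C i j k` command would remove.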