File formats
All SPRAS file formats are tab-separated values.
Node file
Node files include a header row and rows providing attributes for each
node. One column is for the node identifier and must have the header
value NODEID. All other columns specify additional node attributes
such as prizes. Any nodes that are listed in a node file but are not
present in one or more edges in the edge file will be removed. For
example:
NODEID |
prize |
sources |
targets |
active |
dummy |
|---|---|---|---|---|---|
A |
1.0 |
True |
True |
True |
|
B |
3.3 |
True |
True |
||
C |
2.5 |
True |
True |
||
D |
1.9 |
True |
True |
True |
A secondary format provides only a list of node identifiers and uses the
filename as the node attribute, as in the example sources.txt. This
format may be deprecated.
Edge file
Edge files do not include a header row. Each row lists the two nodes that are connected with an edge, the weight for that edge, and, optionally, a directionality column to indicate whether the edge is directed or undirected. The directionality values are either a ‘U’ for an undirected edge or a ‘D’ for a directed edge. If the directionality column is not included, SPRAS will assume that the file’s edges are entirely undirected. The weights are typically in the range [0,1] with 1 being the highest confidence for the edge.
For example:
A |
B |
0.98 |
U |
|---|---|---|---|
B |
C |
0.77 |
D |
or
A |
B |
0.98 |
|---|---|---|
B |
C |
0.77 |
Gold Standard
Nodes
Gold standard node files are txt files and do not include a header row.
Each row in the file represents a single node identifier. The file is structured as a single column with one node per line. These nodes typically correspond to gene or protein identifiers that are relevant to the biological pathway of interest.
For example:
A
B
C
Pathway output format
Output pathway files in the standard SPRAS format include a header row
and rows providing attributes for each edge. The header row is
Node1 Node2 Rank Direction. Each row lists the two nodes
that are connected with an edge, the rank for that edge, and a
directionality column to indicate whether the edge is directed or
undirected. The directionality values are either a ‘U’ for an undirected
edge or a ‘D’ for a directed edge, where the direction is from Node1 to
Node2. Pathways that do not contain ranked edges can output all 1s in
the Rank column.
For example:
Node1 |
Node2 |
Rank |
Direction |
|---|---|---|---|
A |
B |
1 |
D |
B |
C |
1 |
D |
B |
D |
2 |
U |
D |
A |
3 |
U |