Match-Box Web Server 1.3

Guy Baudoux, Christophe Lambert, Ernest Feytmans and Eric Depiereux.

Molecular Biology Research Unit, The University of Namur, Belgium.

Project supported by the Walloon Government, Federal State of Belgium.

Information about the output.

The Match-Box server can produce seven types of output files:

an "align" report in plain text format;
an "explore" report in plain text format;
a PostScript file of the alignment;
a FASTA file of the alignment;
an MSF file of the alignment (produced by READSEQ; Gilbert, 1990);
an HSSP file of the alignment;
a log file of the execution of the Match-Box server.

By default, only the two reports are mailed.

The "explore" report presents a table and a graph showing the global similarity cumulated over all the sequences. This makes it possible to detect whether at least some sequences depart from randomness. A second table and graph show the similarity for the least related pair of sequences. This makes it possible to check whether the most distant sequences remain significantly similar. The factor analysis helps to delineate relevant groups of similar sequences.

The "align" report shows the boxes selected at the end of the four steps of matching, followed by the final alignment. Residues included in the selected boxes are printed in lowercase. Residues in uppercase are NOT aligned. A score from 1 to 9 is written below each position in the boxes. It corresponds to the run in which this position was included in the box and is thus related to the statistical significance of the alignment at this position. The lower the score, the higher the reliability of the alignment. A score of 5 corresponds to a level of similarity that occurs equally often in related and unrelated sequences. This score is related to the averaged OBSERVED confidence rate by a nearly perfect linear relationship. For scores below 5 on more than 2 non-redundant sequences, you can expect a confidence rate over 90% (less than 10% false positives), even when the percentage of identity is very low (10%). The following results (Table 1) were obtained on 20 families of 3 to 6 known structures sharing between 9% and 44% of conserved residues.

Table 1. Percentage of correctly predicted aligned residues, with respect to structure alignments. Results are expressed in % confidence (correctly predicted / total aligned); the values are the minimum and maximum observed over the 20 tests.
Reliability Score	Minimum %	Maximum %
6	41.3	86.8
5	48.8	100
4	73.9	100
3	84.6	100
2	100	100
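
As a rough illustration of how Table 1 might be used, the Python sketch below stores the observed minimum and maximum confidence for each reliability score and filters the lowercase (boxed) residues of one aligned row by score. The function names are hypothetical, and the assumption that the score line is column-aligned with the sequence line (one digit under each boxed column, blanks elsewhere) is ours rather than a documented property of the "align" report.

```python
# Observed confidence range (%) per reliability score, copied from Table 1.
TABLE_1 = {
    6: (41.3, 86.8),
    5: (48.8, 100.0),
    4: (73.9, 100.0),
    3: (84.6, 100.0),
    2: (100.0, 100.0),
}


def confidence_range(score):
    """Return (min %, max %) from Table 1, or None if the score is not tabulated."""
    return TABLE_1.get(score)


def reliable_positions(sequence_row, score_row, max_score=4):
    """List (column, residue, score) for boxed residues with score <= max_score.

    Assumption (hypothetical layout): score_row holds one digit under each
    boxed column of sequence_row and blanks elsewhere.
    """
    hits = []
    for column, (residue, mark) in enumerate(zip(sequence_row, score_row)):
        # Lowercase letters are inside a box; uppercase letters are not aligned.
        if residue.islower() and "1" <= mark <= "9" and int(mark) <= max_score:
            hits.append((column, residue, int(mark)))
    return hits
```

The default max_score=4 mirrors the "scores below 5" threshold mentioned above. For example, reliable_positions("MKtlvaGG", "  3256  ") keeps only the residues scored 3 and 2.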

When lowercase amino acids are aligned with gaps, the position of the gaps is not completely defined. If two successive selected boxes overlap by a maximum of k amino acids in one of the sequences, the final alignment will show a gap aligned with lowercase amino acids. Part of this gap, or the whole gap, can be moved to the right by r positions (r being less than or equal to k). In other words, Match-Box cannot fix the exact position of this gap, but the gap can be placed anywhere to the right within a range of k amino acids.
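
To make the idea of an undetermined gap position concrete, here is a minimal Python sketch that enumerates the equivalent placements of a gap in one aligned row for r = 0 to k. It only covers the case where the whole gap moves (not part of it), and the function and parameter names are illustrative assumptions, not part of Match-Box.

```python
def shift_gap_right(row, gap_start, gap_length, r):
    """Return row with the gap at gap_start moved r columns to the right.

    The r residues that follow the gap are moved in front of it, so the
    row keeps its length. All parameter names are illustrative.
    """
    gap_end = gap_start + gap_length
    if set(row[gap_start:gap_end]) != {"-"}:
        raise ValueError("no gap of that length at gap_start")
    moved = row[gap_end:gap_end + r]
    if len(moved) < r or "-" in moved:
        raise ValueError("cannot shift the gap by r positions")
    return row[:gap_start] + moved + "-" * gap_length + row[gap_end + r:]


def gap_placements(row, gap_start, gap_length, k):
    """Enumerate the placements obtained for r = 0..k."""
    return [shift_gap_right(row, gap_start, gap_length, r) for r in range(k + 1)]
```

With the toy row "abc--defg" and k = 2, the placements are "abc--defg", "abcd--efg" and "abcde--fg"; under the description above, all three would be equally acceptable.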

The reliability score is not printed with the FASTA, MSF and HSSP file formats.

In the PostScript version of the alignment, the aligned amino acids are coloured according to the reliability of the alignment at that position.

The PostScript files can be printed once the email header lines have been removed: the file must begin with the two characters that identify a PostScript file, "%!".
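
Removing the header can be done by hand in a text editor, or with a few lines of Python such as the sketch below. It assumes the saved message contains the PostScript data as plain text and that the first line beginning with "%!" is the start of the file; the function name is hypothetical.

```python
def strip_mail_header(text):
    """Return text starting at the first line that begins with '%!'.

    Raises ValueError if no such line exists.
    """
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith("%!"):
            return "".join(lines[index:])
    raise ValueError("no PostScript marker '%!' found")
```

The returned string can then be written to a file and sent to a PostScript printer or viewer.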

Please note that alignments must be displayed or printed with non-proportional fonts (such as Monaco or Courier) and that big files (such as PostScript files) can be split into several parts by some email client programs.

REFERENCES:

Gilbert, D.G. (1990). READSEQ, a program that reads and writes nucleic/protein sequences in various formats. Biology Dept., Indiana University, Bloomington, IN 47405, USA.
