Match-Box Web Server 1.3

Guy Baudoux, Christophe Lambert, Ernest Feytmans and Eric Depiereux.

Molecular Biology Research Unit, The University of Namur, Belgium.

Project supported by the Walloon Government, Federal State of Belgium.

Return to SUBMIT | HELP page.


Input of sequences to the Match-Box Web server.

You can send up to 50 sequences (of no more than 2000 amino acids each) in FREE, FASTA, MSF or HSSP format. However, the method is most reliable for 4 to 10 sequences, and CPU time increases exponentially with the number of sequences submitted. Please submit relevant subgroups of sequences and avoid highly redundant sequences.
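As an illustration, the limits above could be checked locally before submitting. This is only a sketch: the function name `check_submission` is hypothetical, and it assumes that the 2000 amino acid limit counts residues only, not gaps; the server's own checks may differ.

```python
MAX_SEQUENCES = 50            # upper limit stated above
MAX_RESIDUES = 2000           # per-sequence limit stated above
RECOMMENDED = range(4, 11)    # 4 to 10 sequences, the range described as optimal


def check_submission(sequences):
    """Return (errors, warnings) for a dict mapping name -> residues.

    Assumption: residues are given without gap characters.
    """
    errors, warnings = [], []
    if len(sequences) > MAX_SEQUENCES:
        errors.append(
            f"{len(sequences)} sequences submitted, limit is {MAX_SEQUENCES}"
        )
    for name, residues in sequences.items():
        if len(residues) > MAX_RESIDUES:
            errors.append(
                f"{name}: {len(residues)} residues, limit is {MAX_RESIDUES}"
            )
    if len(sequences) not in RECOMMENDED:
        warnings.append(
            "the method is described as most reliable for 4 to 10 sequences"
        )
    return errors, warnings
```

A submission with errors would exceed the stated limits; a warning only indicates that the number of sequences falls outside the recommended range.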

I. FREE format.

The FREE format requires each sequence to be preceded by a line of the form

>sequence_name

followed by a carriage return. The first 10 characters are taken as the sequence name.

Only the 20 standard one-letter amino acid codes are recognized. Numbers are deleted and all other characters are converted to gaps. Embedded comments are not detected and will be treated as sequence data. Dots (".") and hyphens ("-") are interpreted as gaps.

Example (example1 data):

>1ALC
KQFTKCE...LSQNLYDIDGY---GRIALP
ELICTMFHTSG
YDTQAIVENDE

STEYGLFQISNALWCKSSQSPQSRNIC
DITCDKFLDDDITDDIMCAKKIL
DIKGIDYWIAHKALCTEKLEQWLCEK*
>1LZ1
KVFE......RCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDR
STDYGIFQINSRYWCNDGKTPGAVNACHLSC----SALLQDNIADAVACAKR
VV
RDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
>2LZ2
KVYGR
CELAAAMKRLGLDNYRGYSLGNWVCAAKFESNFNTHATNRNTDG
STDYGILQINSRWWCNDGRTPGSKNLCNIPCSALLSSDITASVNCAKKIA
SGGNGMNAWVAWRNRCKGT....
DVHAWIRGCRL
>2LZT
KVFGRCELAAA
MKRHGLDNYRG
YSLGNWVCAAKFESN
FNTQATNRNTDG
STDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIV
SDGNGMNAWVAWRNRCKGTDVQAWIRGC
RL*
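
The FREE format rules described in this section can be sketched as a small parser. This is not the server's code: the name `parse_free` is hypothetical, and the sketch assumes that the 10-character name is counted after the ">" sign, that line breaks and spaces inside a sequence are ignored, that lowercase letters are accepted, and that text before the first ">" line is skipped.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_free(text):
    """Parse FREE-format text into a list of (name, sequence) pairs."""
    records = []
    name, chars = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chars)))
            # First 10 characters after ">" are kept as the name (assumption).
            name, chars = line[1:11].strip(), []
            continue
        if name is None:
            continue  # assumption: ignore anything before the first ">" line
        for ch in line:
            if ch.isspace() or ch.isdigit():
                continue  # whitespace skipped (assumption); numbers deleted
            if ch.upper() in STANDARD_AMINO_ACIDS:
                chars.append(ch.upper())
            else:
                chars.append("-")  # ".", "-" and all other characters: gaps
    if name is not None:
        records.append((name, "".join(chars)))
    return records
```

Under these assumptions, any character outside the 20 standard codes, including a trailing "*", ends up as a gap, as the rule on other characters suggests.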

II. FASTA format.

The FASTA format is also recognized.

Example (example1 data):

>1ALC, 122 bases, CE212E6E checksum.
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEY
GLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKG
IDYWIAHKALCTEKLEQWLCEK
>1LZ1, 130 bases, 41D151BD checksum.
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDR
STDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVV
RDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
>2LZ2, 129 bases, 69802993 checksum.
KVYGRCELAAAMKRLGLDNYRGYSLGNWVCAAKFESNFNTHATNRNTDGS
TDYGILQINSRWWCNDGRTPGSKNLCNIPCSALLSSDITASVNCAKKIAS
GGNGMNAWVAWRNRCKGTDVHAWIRGCRL
>2LZT, 129 bases, 69802993 checksum.
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGS
TDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVS
DGNGMNAWVAWRNRCKGTDVQAWIRGCRL

III. MSF format.

The MSF format is a multiple sequence format used in the GCG package (1). Multiple alignments produced with PILEUP in this package are in MSF format. Refer to the GCG documentation for a full specification. The symbols allowed in the sequences are the same as in the FREE format described above.

Example (example1 data):

 example1.msf  MSF: 130  Type: N  January 01, 1776  12:00  Check: 6909 ..

 Name: 1ALC             Len:   122  Check:    25  Weight:  1.00
 Name: 1LZ1             Len:   130  Check:  2703  Weight:  1.00
 Name: 2LZ2             Len:   129  Check:  1206  Weight:  1.00
 Name: 2LZT             Len:   130  Check:  2975  Weight:  1.00

//

           1ALC  KQFTKCELSQ NLYDIDGYGR IALPELICTM FHTSGYDTQA IVENDESTEY
           1LZ1  KVFERCELAR TLKRLGMDGY RGISLANWMC LAKWESGYNT RATNYNAGDR
           2LZ2  KVYGRCELAA AMKRLGLDNY RGYSLGNWVC AAKFESNFNT HATNRNTDGS
           2LZT  KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS

           1ALC  GLFQISNALW CKSSQSPQSR NICDITCDKF LDDDITDDIM CAKKILDIKG
           1LZ1  STDYGIFQIN SRYWCNDGKT PGAVNACHLS CSALLQDNIA DAVACAKRVV
           2LZ2  TDYGILQINS RWWCNDGRTP GSKNLCNIPC SALLSSDITA SVNCAKKIAS
           2LZT  TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS

           1ALC  IDYWIAHKAL CTEKLEQWLC EK
           1LZ1  RDPQGIRAWV AWRNRCQNRD VRQYVQGCGV
           2LZ2  GGNGMNAWVA WRNRCKGTDV HAWIRGCRL
           2LZT  DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
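
For readers who want to inspect such a file before submitting it, the interleaved layout shown above can be read with a few lines of code. This is a simplified sketch that only follows the layout of this example (header, "//" separator, then blocks of name plus residue groups); it is not a full implementation of the GCG specification, and the name `read_msf_alignment` is hypothetical.

```python
def read_msf_alignment(text):
    """Collect aligned rows from the interleaved blocks that follow "//"."""
    order, rows = [], {}
    in_alignment = False
    for line in text.splitlines():
        if not in_alignment:
            # Everything up to and including the "//" line is header.
            in_alignment = line.strip() == "//"
            continue
        fields = line.split()
        if len(fields) < 2 or all(f.isdigit() for f in fields):
            continue  # blank line, or a coordinate line of numbers only
        name, chunk = fields[0], "".join(fields[1:])
        if name not in rows:
            order.append(name)
            rows[name] = ""
        rows[name] += chunk  # join the 10-residue groups of each block
    return [(name, rows[name]) for name in order]
```

Each returned pair holds a sequence name and its full aligned row, in the order the names first appear after the separator.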

IV. HSSP format.

The HSSP format is a multiple sequence format used in the HSSP database (2) and the PHD (3) server. Multiple sequence alignments produced with the MaxHom (2) program of this server can be in this format. Refer to the HSSP database documentation for a full specification. The symbols allowed are the same as in the FREE format described above.

Example (example1.hssp file)

If the data are not submitted in the FREE, FASTA, MSF or HSSP formats as described, unpredictable results can occur.


V. REFERENCES

  1. Devereux, J., Haeberli, P., Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.

  2. Sander, C., Schneider, R. (1991). Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins: Struct. Funct. Genet. 9, 56-68.

  3. Rost, B., Sander, C., Schneider, R. (1994). PHD - An automatic mail server for protein secondary structure prediction. CABIOS 10, 53-60.

