.msf
From BITS wiki
GCG/MSF Format
- The file may begin with as many lines of comment or description as required.
- The comments are terminated with a line starting with two slashes.
- The first mandatory line that is recognised as part of the MSF file is the line containing the text "MSF:". This line also gives the alignment length, the sequence type (P for protein, N for nucleotide), the date and an overall checksum, and ends with two periods ("..").
- The next line is a mandatory blank line inserted before the sequence names.
- One line per sequence follows, giving the sequence name, length, checksum and weight. Only one name per line is allowed: the qualifier "Name: " is followed by the sequence name, and names are restricted to 10 characters or fewer. Extra characters between the sequence name and "Len: " (such as "oo" in the example) are acceptable if they contain no blank characters.
- The name list is closed by another blank line followed by a line starting with two slashes ("//").
- Another blank line follows the "//" line.
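- The aligned sequences are then interleaved in blocks, with one line per sequence in each block and gaps represented by periods ("."). Each sequence line starts with the sequence name, which is separated from the aligned residues by white space. In the example, residues are written in groups of ten separated by spaces, fifty residues per line.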
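Both the "MSF:" line and every "Name:" line carry a checksum. The page does not define how it is computed, so the Python sketch below shows the GCG checksum as it is commonly reimplemented by MSF readers and writers: each residue's uppercase character code is multiplied by a position weight that cycles 1, 2, ..., 57, the products are summed, and the result is taken modulo 10000; the overall "MSF:" check is the sum of the per-sequence checks, again modulo 10000. This is a sketch under stated assumptions: it assumes the checksum is taken over the aligned sequence exactly as written (gap periods included), and `gcg_checksum` and `msf_checksum` are hypothetical helper names.

```python
def gcg_checksum(seq):
    """Per-sequence GCG checksum (sketch; assumes gaps are counted as written)."""
    check = 0
    for i, ch in enumerate(seq):
        # Position weights cycle 1, 2, ..., 57, then restart at 1.
        check += (i % 57 + 1) * ord(ch.upper())
    return check % 10000


def msf_checksum(seqs):
    """Overall 'Check:' value on the MSF: line (assumed: sum of per-sequence checks mod 10000)."""
    return sum(gcg_checksum(s) for s in seqs) % 10000


if __name__ == "__main__":
    print(gcg_checksum("AC"), gcg_checksum("ac"), msf_checksum(["AC", "A"]))
```

Because the example below shows only the first 100 of 510 columns, its printed checksums cannot be reproduced from the excerpt alone.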
Example
MSF: 510  Type: P  Check: 7736  ..

Name: ACHE_BOVIN oo  Len: 510  Check: 7842  Weight: 16.0
Name: ACHE_HUMAN oo  Len: 510  Check: 8553  Weight: 17.8
Name: ACHE_MOUSE oo  Len: 510  Check:  229  Weight: 12.5
Name: ACHE_RAT   oo  Len: 510  Check: 8410  Weight: 14.2
Name: ACHE_XENLA oo  Len: 510  Check: 2702  Weight: 39.2

//

ACHE_BOVIN  MAGALLCALL LLQLLGRGEG KNEELRLYHY LFDTYDPGRR PVQEPEDTVT
ACHE_HUMAN  MARAPLGVLL LLGLLGRGVG KNEELRLYHH LFNNYDPGSR PVREPEDTVT
ACHE_MOUSE  MAGALLGALL LLTLFGRSQG KNEELSLYHH LFDNYDPECR PVRRPEDTVT
ACHE_RAT    MTMALLGTLL LLALFGRSQG KNEELSLYHH LFDNYDPECR PVRRPEDTVT
ACHE_XENLA  MESGVRILSL LILLHNSLAS ESEESRLIKH LFTSYDQKAR PSKGLDDVVP

ACHE_BOVIN  ISLKVTLTNL ISLNEKEETL TTSVWIGIDW QDYRLNYSKG DFGGVETLRV
ACHE_HUMAN  ISLKVTLTNL ISLNEKEETL TTSVWIGIDW QDYRLNYSKD DFGGIETLRV
ACHE_MOUSE  ITLKVTLTNL ISLNEKEETL TTSVWIGIDW HDYRLNYSKD DFAGVGILRV
ACHE_RAT    ITLKVTLTNL ISLNEKEETL TTSVWIGIEW QDYRLNFSKD DFAGVEILRV
ACHE_XENLA  VTLKLTLTNL IDLNEKEETL TTNVWVQIAW NDDRLVWNVT DYGGIGFVPV
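To show how a reader might walk the layout described above, here is a minimal Python sketch that skips leading comment lines, parses the "MSF:" line and the "Name:" lines up to "//", and then concatenates each sequence's residues across the interleaved blocks. The function `read_msf` and the patterns `HEADER_RE` and `NAME_RE` are hypothetical names, not part of any existing library. The sketch does not verify lengths or checksums, and it ignores any line whose first field is not a declared sequence name. The embedded text is the example above, which covers only the first 100 of the 510 alignment columns.

```python
import re

# The example alignment from this page (first 100 of 510 columns only).
EXAMPLE = """\
MSF: 510  Type: P  Check: 7736  ..

Name: ACHE_BOVIN oo  Len: 510  Check: 7842  Weight: 16.0
Name: ACHE_HUMAN oo  Len: 510  Check: 8553  Weight: 17.8
Name: ACHE_MOUSE oo  Len: 510  Check:  229  Weight: 12.5
Name: ACHE_RAT   oo  Len: 510  Check: 8410  Weight: 14.2
Name: ACHE_XENLA oo  Len: 510  Check: 2702  Weight: 39.2

//

ACHE_BOVIN  MAGALLCALL LLQLLGRGEG KNEELRLYHY LFDTYDPGRR PVQEPEDTVT
ACHE_HUMAN  MARAPLGVLL LLGLLGRGVG KNEELRLYHH LFNNYDPGSR PVREPEDTVT
ACHE_MOUSE  MAGALLGALL LLTLFGRSQG KNEELSLYHH LFDNYDPECR PVRRPEDTVT
ACHE_RAT    MTMALLGTLL LLALFGRSQG KNEELSLYHH LFDNYDPECR PVRRPEDTVT
ACHE_XENLA  MESGVRILSL LILLHNSLAS ESEESRLIKH LFTSYDQKAR PSKGLDDVVP

ACHE_BOVIN  ISLKVTLTNL ISLNEKEETL TTSVWIGIDW QDYRLNYSKG DFGGVETLRV
ACHE_HUMAN  ISLKVTLTNL ISLNEKEETL TTSVWIGIDW QDYRLNYSKD DFGGIETLRV
ACHE_MOUSE  ITLKVTLTNL ISLNEKEETL TTSVWIGIDW HDYRLNYSKD DFAGVGILRV
ACHE_RAT    ITLKVTLTNL ISLNEKEETL TTSVWIGIEW QDYRLNFSKD DFAGVEILRV
ACHE_XENLA  VTLKLTLTNL IDLNEKEETL TTNVWVQIAW NDDRLVWNVT DYGGIGFVPV
"""

# Hypothetical patterns; ".*?" tolerates an optional date or extra fields.
HEADER_RE = re.compile(r"MSF:\s*(\d+)\s+Type:\s*(\S+).*?Check:\s*(\d+)")
NAME_RE = re.compile(
    r"^\s*Name:\s*(\S+).*?Len:\s*(\d+).*?Check:\s*(\d+).*?Weight:\s*([\d.]+)"
)


def read_msf(text):
    """Return (header, entries, sequences) parsed from MSF text."""
    lines = text.splitlines()
    i = 0
    # 1. Skip free-form comment lines until the line containing "MSF:".
    while i < len(lines) and "MSF:" not in lines[i]:
        i += 1
    if i == len(lines):
        raise ValueError("no 'MSF:' line found")
    m = HEADER_RE.search(lines[i])
    if m is None:
        raise ValueError("malformed 'MSF:' line")
    header = {"length": int(m.group(1)), "type": m.group(2),
              "check": int(m.group(3))}

    # 2. Collect one entry per "Name:" line until the "//" separator.
    entries = []
    i += 1
    while i < len(lines) and not lines[i].lstrip().startswith("//"):
        n = NAME_RE.match(lines[i])
        if n:
            entries.append({"name": n.group(1), "len": int(n.group(2)),
                            "check": int(n.group(3)),
                            "weight": float(n.group(4))})
        i += 1
    if i == len(lines):
        raise ValueError("no '//' line ending the name list")

    # 3. Interleaved blocks: append each line's residues to its sequence.
    parts = {e["name"]: [] for e in entries}
    for line in lines[i + 1:]:
        fields = line.split()
        # Skip blank lines and lines not led by a declared name.
        if fields and fields[0] in parts:
            parts[fields[0]].append("".join(fields[1:]))
    sequences = {name: "".join(chunks) for name, chunks in parts.items()}
    return header, entries, sequences


header, entries, seqs = read_msf(EXAMPLE)
for name, seq in seqs.items():
    print(f"{name:<10} {len(seq):>4} {seq[:20]}")
```

Keying the residue lines on the names declared before "//" keeps the block-assembly step simple and lets the reader pass over any extra non-sequence lines between blocks.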