FastaValidator
A Java library to parse and validate FASTA-formatted sequences

Performance of FastaValidator and three common bioinformatics frameworks (BioJava, BioPerl and BioPython) on six different datasets. Tests were performed on a standard desktop PC (Pentium IV, 3 GHz, 1 GB RAM, Ubuntu 10.04.3 server amd64). Each test was repeated 10 times. Missing data (shown as "-") indicate that the corresponding test failed. The test scripts are available on request.

| Tool | Dataset | Mode | Average [ms] | Min [ms] | Max [ms] |
|---|---|---|---|---|---|
| biojava | E.coli K-12 (all genes) | protein | 2705.7 | 2675 | 2748 |
| bioperl | E.coli K-12 (all genes) | protein | 501 | 485 | 619 |
| biopython (pypy) | E.coli K-12 (all genes) | protein | 1002.1 | 987 | 1089 |
| biopython | E.coli K-12 (all genes) | protein | 2536.5 | 2492 | 2633 |
| fastavalidator | E.coli K-12 (all genes) | generic | 276.3 | 255 | 321 |
| fastavalidator | E.coli K-12 (all genes) | protein | 270 | 258 | 323 |
| biojava | E.coli K-12 (complete genome) | dna | 5028.3 | 4944 | 5386 |
| bioperl | E.coli K-12 (complete genome) | dna | 210.1 | 191 | 379 |
| biopython (pypy) | E.coli K-12 (complete genome) | dna | 1540.3 | 1501 | 1604 |
| biopython | E.coli K-12 (complete genome) | dna | 8107.9 | 7958 | 8354 |
| fastavalidator | E.coli K-12 (complete genome) | dna | 342.5 | 321 | 393 |
| fastavalidator | E.coli K-12 (complete genome) | generic | 344 | 325 | 417 |
| biojava | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | - | - | - |
| bioperl | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 92693.6 | 92127 | 93772 |
| biopython (pypy) | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 183596.8 | 179451 | 188090 |
| biopython | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 1214955.7 | 1181343 | 1303798 |
| fastavalidator | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 27124.1 | 24597 | 29289 |
| fastavalidator | GOS Sampling Site (JCVI_SMPL_1103283000001) | generic | 27414.1 | 24791 | 29132 |
| bioperl | SILVA 108 SSU Parc | dna | 327470.1 | 325995 | 330042 |
| biopython (pypy) | SILVA 108 SSU Parc | dna | 622445 | 608331 | 642292 |
| biopython | SILVA 108 SSU Parc | dna | 4258960.1 | 4177746 | 4324349 |
| fastavalidator | SILVA 108 SSU Parc | dna | 71758 | 64491 | 77298 |
| fastavalidator | SILVA 108 SSU Parc | generic | 70324.6 | 61121 | 77415 |
| biojava | SILVA 108 SSU Parc | dna | - | - | - |
| biojava | SWISSPROT database (all proteins) | protein | 409978.7 | 406525 | 416524 |
| bioperl | SWISSPROT database (all proteins) | protein | 60297.6 | 59884 | 61528 |
| biopython (pypy) | SWISSPROT database (all proteins) | protein | 65946.9 | 64608 | 66786 |
| biopython | SWISSPROT database (all proteins) | protein | 358254.8 | 351367 | 367870 |
| fastavalidator | SWISSPROT database (all proteins) | generic | 6697 | 6453 | 8080 |
| fastavalidator | SWISSPROT database (all proteins) | protein | 6585.8 | 5638 | 8109 |
| bioperl | SILVA 108 SSU reference dataset (aligned) | rna | 8581689.9 | 8404714 | 9025214 |
| fastavalidator | SILVA 108 SSU reference dataset (aligned) | generic | 747770.6 | 717921 | 810887 |
| fastavalidator | SILVA 108 SSU reference dataset (aligned) | rna | 745107.7 | 718090 | 750540 |
| biopython | SILVA 108 SSU reference dataset (aligned) | rna | - | - | - |
| biopython (pypy) | SILVA 108 SSU reference dataset (aligned) | rna | - | - | - |
| biojava | SILVA 108 SSU reference dataset (aligned) | rna | - | - | - |
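
The original test scripts are not reproduced here. As a rough illustration of how repeated runs can be reduced to the average, minimum and maximum columns reported above, the following minimal Java sketch times an arbitrary task ten times. The class `RepeatBenchmark`, its `Stats` type and the placeholder workload are hypothetical names chosen for this example; they are not part of FastaValidator or of the authors' test setup, and a real measurement would replace the workload with a call that parses a FASTA file.

```java
/** Hypothetical timing harness; an illustration, not the authors' test script. */
public class RepeatBenchmark {

    /** Summary of repeated wall-clock measurements, in milliseconds. */
    public static final class Stats {
        public final double averageMs;
        public final long minMs;
        public final long maxMs;

        Stats(double averageMs, long minMs, long maxMs) {
            this.averageMs = averageMs;
            this.minMs = minMs;
            this.maxMs = maxMs;
        }
    }

    /** Reduces raw per-run durations to average, minimum and maximum. */
    public static Stats summarize(long[] durationsMs) {
        if (durationsMs.length == 0) {
            throw new IllegalArgumentException("no measurements");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (long d : durationsMs) {
            min = Math.min(min, d);
            max = Math.max(max, d);
            sum += d;
        }
        return new Stats((double) sum / durationsMs.length, min, max);
    }

    /** Runs the task the given number of times and records each run's wall-clock time. */
    public static Stats measure(Runnable task, int repeats) {
        long[] durations = new long[repeats];
        for (int i = 0; i < repeats; i++) {
            long start = System.nanoTime();
            task.run();
            // Convert elapsed nanoseconds to whole milliseconds.
            durations[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return summarize(durations);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): a real test would parse a FASTA file here.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append("ACGT");
            }
        };
        Stats s = measure(workload, 10);
        System.out.printf("average=%.1f ms, min=%d ms, max=%d ms%n",
                s.averageMs, s.minMs, s.maxMs);
    }
}
```

In-process timing of this kind measures only the task itself. Comparing tools written in different languages, as in the table, would more plausibly require timing whole processes, including interpreter or JVM start-up, so the sketch should not be read as the method behind the reported numbers.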

Last update: March 23, 2012