FastaValidator
a Java library to parse and validate FASTA formatted sequences
Performance of FastaValidator and three common bioinformatics frameworks (BioJava, BioPerl, and BioPython) on six different datasets. Tests were performed on a standard desktop PC (Pentium IV; 3 GHz; 1 GB RAM; Ubuntu 10.04.3 server amd64). Each test was repeated 10 times. Missing data (shown as "-") indicate that the corresponding test failed. The test scripts are available on request.
| Tool | Dataset | Mode | Average [ms] | Min [ms] | Max [ms] |
|---|---|---|---|---|---|
| biojava | E.coli K-12 (all genes) | protein | 2705.7 | 2675 | 2748 |
| bioperl | E.coli K-12 (all genes) | protein | 501 | 485 | 619 |
| biopython (pypy) | E.coli K-12 (all genes) | protein | 1002.1 | 987 | 1089 |
| biopython | E.coli K-12 (all genes) | protein | 2536.5 | 2492 | 2633 |
| fastavalidator | E.coli K-12 (all genes) | generic | 276.3 | 255 | 321 |
| fastavalidator | E.coli K-12 (all genes) | protein | 270 | 258 | 323 |
| biojava | E.coli K-12 (complete genome) | dna | 5028.3 | 4944 | 5386 |
| bioperl | E.coli K-12 (complete genome) | dna | 210.1 | 191 | 379 |
| biopython (pypy) | E.coli K-12 (complete genome) | dna | 1540.3 | 1501 | 1604 |
| biopython | E.coli K-12 (complete genome) | dna | 8107.9 | 7958 | 8354 |
| fastavalidator | E.coli K-12 (complete genome) | dna | 342.5 | 321 | 393 |
| fastavalidator | E.coli K-12 (complete genome) | generic | 344 | 325 | 417 |
| biojava | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | - | - | - |
| bioperl | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 92693.6 | 92127 | 93772 |
| biopython (pypy) | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 183596.8 | 179451 | 188090 |
| biopython | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 1214955.7 | 1181343 | 1303798 |
| fastavalidator | GOS Sampling Site (JCVI_SMPL_1103283000001) | dna | 27124.1 | 24597 | 29289 |
| fastavalidator | GOS Sampling Site (JCVI_SMPL_1103283000001) | generic | 27414.1 | 24791 | 29132 |
| biojava | SILVA 108 SSU Parc | dna | - | - | - |
| bioperl | SILVA 108 SSU Parc | dna | 327470.1 | 325995 | 330042 |
| biopython (pypy) | SILVA 108 SSU Parc | dna | 622445 | 608331 | 642292 |
| biopython | SILVA 108 SSU Parc | dna | 4258960.1 | 4177746 | 4324349 |
| fastavalidator | SILVA 108 SSU Parc | dna | 71758 | 64491 | 77298 |
| fastavalidator | SILVA 108 SSU Parc | generic | 70324.6 | 61121 | 77415 |
| biojava | SWISSPROT database (all proteins) | protein | 409978.7 | 406525 | 416524 |
| bioperl | SWISSPROT database (all proteins) | protein | 60297.6 | 59884 | 61528 |
| biopython (pypy) | SWISSPROT database (all proteins) | protein | 65946.9 | 64608 | 66786 |
| biopython | SWISSPROT database (all proteins) | protein | 358254.8 | 351367 | 367870 |
| fastavalidator | SWISSPROT database (all proteins) | generic | 6697 | 6453 | 8080 |
| fastavalidator | SWISSPROT database (all proteins) | protein | 6585.8 | 5638 | 8109 |
| biojava | SILVA 108 SSU reference dataset (aligned) | rna | - | - | - |
| bioperl | SILVA 108 SSU reference dataset (aligned) | rna | 8581689.9 | 8404714 | 9025214 |
| biopython (pypy) | SILVA 108 SSU reference dataset (aligned) | rna | - | - | - |
| biopython | SILVA 108 SSU reference dataset (aligned) | rna | - | - | - |
| fastavalidator | SILVA 108 SSU reference dataset (aligned) | generic | 747770.6 | 717921 | 810887 |
| fastavalidator | SILVA 108 SSU reference dataset (aligned) | rna | 745107.7 | 718090 | 750540 |
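
The average, minimum, and maximum columns summarize repeated runs of the same test. The authors' test scripts are not reproduced here, so the following is only a minimal sketch of how such a summary could be collected, assuming wall-clock timing and ten repetitions. The names `benchmark` and `count_fasta_records` are hypothetical, and the workload is a stand-in that does not call any of the benchmarked libraries.

```python
import time
from statistics import mean


def benchmark(task, repeats=10):
    """Run `task` `repeats` times; return (average, min, max) wall-clock time in ms.

    Hypothetical helper, not the harness used for the table above.
    """
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms), min(timings_ms), max(timings_ms)


def count_fasta_records(lines):
    """Stand-in workload: count FASTA header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


# Small synthetic input; a real run would read one of the datasets from disk.
sample = [">seq1", "ACGT", ">seq2", "GGCC"] * 1000

avg_ms, min_ms, max_ms = benchmark(lambda: count_fasta_records(sample))
print(f"average {avg_ms:.1f} ms, min {min_ms:.1f} ms, max {max_ms:.1f} ms")
```

To time a real parser, the lambda would be replaced by a call that reads and parses one dataset in the chosen mode. A run that raises an exception would correspond to a "-" entry in the table.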