qblast runs slowly on human reads
When human reads reach qblast, they often align to many NCBI sequences, which can slow alignment by up to a factor of 10. Several techniques can remove human reads beyond the crude filter applied in fastq2fx. For example, the filter function of fx2fx can detect human reads more accurately.

When a sequencing run has high human genome coverage (>10x), most human reads will assemble into "contig-like groups" containing multiple read pairs. In contrast, reads originating from low-abundance microbes (coverage <0.2x) should generally not assemble into large groups. Thus, "contig-like groups" with, say, two or more read pairs can be deemed human and eliminated (no command currently does this automatically).

Finally, qblast can be run in two or more steps: first aligning only to human sequences in the "nt" database and discarding the human reads, then aligning the remaining reads against all the NCBI BLAST databases, including "nt", "human_genomic", "other_genomic" and "wgs". An example of this technique is shown below.

:: Fast primate alignment, with a word_length (i.e., seed length) of 20 nt.
echo word_length    =  20;                                                      > qblast_homo20_settings.txt
echo dust           =   1;                                                     >> qblast_homo20_settings.txt
echo dual_align_pct = 101;                                                     >> qblast_homo20_settings.txt
echo num_genus      =   1;                                                     >> qblast_homo20_settings.txt
echo num_species    =   1;                                                     >> qblast_homo20_settings.txt
echo num_consensus  =   1;                                                     >> qblast_homo20_settings.txt
echo score_gi_age   =  4097607; // 1999                                        >> qblast_homo20_settings.txt
echo score_taxid    =  9443;    // Primates                                     >> qblast_homo20_settings.txt
leif qblast qblast_homo20_settings.txt taxid.git blast_nt.fa.gz 120430_step6.fx
leif qbmajority single 70 50 120430_primate20.qb 120430_step7.qb 120430_step6.qb taxid.git Taxid 9443
leif qblowhom 101 120430_step8.fx 120430_step7.qb
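
Judging from the file names, the qbmajority step appears to move reads assigned to Primates (taxid 9443) into 120430_primate20.qb and keep the remainder in 120430_step7.qb, which qblowhom then converts back into reads for the next pass. The sketch below illustrates that kind of taxonomy-based majority filter in Python. It is hypothetical: it does not reproduce qbmajority's actual rules, does not interpret its `single 70 50` arguments, and the names `is_descendant`, `split_by_majority`, `min_fraction` and `TOY_PARENT` are illustrative only. The parent map is a simplified toy lineage, not the full NCBI taxonomy.

```python
# Hypothetical sketch of a taxonomy-based majority filter (not qbmajority itself).
PRIMATES = 9443

# Simplified toy lineage (child taxid -> parent taxid); real NCBI lineages
# contain many more ranks. The root (1) maps to itself.
TOY_PARENT = {9606: 9605, 9605: 9443, 9443: 1, 562: 2, 2: 1, 1: 1}


def is_descendant(taxid, ancestor, parent):
    """Walk up the taxonomy until `ancestor` is found or the root is reached."""
    seen = set()
    while taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        taxid = parent.get(taxid, taxid)  # unknown taxids act as their own root
    return False


def split_by_majority(read_hits, parent, ancestor=PRIMATES, min_fraction=0.5):
    """Split reads into (flagged, kept) sets of read ids.

    read_hits maps read id -> list of taxids of its alignment hits. A read is
    flagged when at least `min_fraction` of its hits fall under `ancestor`;
    reads without hits are kept.
    """
    flagged, kept = set(), set()
    for read_id, taxids in read_hits.items():
        if not taxids:
            kept.add(read_id)
            continue
        n_under = sum(is_descendant(t, ancestor, parent) for t in taxids)
        if n_under / len(taxids) >= min_fraction:
            flagged.add(read_id)
        else:
            kept.add(read_id)
    return flagged, kept
```

For example, with `TOY_PARENT`, a read whose hits are `[9606, 9606, 562]` is flagged (two of three hits fall under Primates), while a read whose hits are `[562, 562]` is kept for the next alignment pass.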

:: Slow primate alignment, with a word_length (i.e., seed length) of 10 nt.
echo word_length    =  10;                                                      > qblast_homo10_settings.txt
echo dust           =   1;                                                     >> qblast_homo10_settings.txt
echo dual_align_pct = 101;                                                     >> qblast_homo10_settings.txt
echo num_genus      =   1;                                                     >> qblast_homo10_settings.txt
echo num_species    =   1;                                                     >> qblast_homo10_settings.txt
echo num_consensus  =   1;                                                     >> qblast_homo10_settings.txt
echo score_gi_age   =  4097607; // 1999                                        >> qblast_homo10_settings.txt
echo score_taxid    =  9443;    // Primates                                     >> qblast_homo10_settings.txt
leif qblast qblast_homo10_settings.txt taxid.git blast_nt.fa.gz 120430_step8.fx
leif qbmajority single 70 50 120430_primate10.qb 120430_step9.qb 120430_step8.qb taxid.git Taxid 9443
leif qblowhom 101 120430_step9.fx 120430_step9.qb
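
The two primate settings files differ only in word_length (20 versus 10) and in their file names. As a sketch, both can be generated from one shared list of settings in Python, which keeps the two passes from drifting apart when a setting is changed. The helper names `render_settings` and `write_primate_settings` are hypothetical, and the sketch assumes Leif's settings parser accepts the single-space `key = value; // comment` layout (the batch commands above already mix spacing styles).

```python
from pathlib import Path

# Settings shared by both primate passes, copied from the batch commands above.
# Each entry is (key, value, optional trailing comment).
COMMON_SETTINGS = [
    ("dust", "1", None),
    ("dual_align_pct", "101", None),
    ("num_genus", "1", None),
    ("num_species", "1", None),
    ("num_consensus", "1", None),
    ("score_gi_age", "4097607", "1999"),
    ("score_taxid", "9443", "Primates"),
]


def render_settings(word_length):
    """Return the settings-file text for one pass with the given word_length."""
    lines = [f"word_length = {word_length};"]
    for key, value, comment in COMMON_SETTINGS:
        line = f"{key} = {value};"
        if comment:
            line += f" // {comment}"
        lines.append(line)
    return "\n".join(lines) + "\n"


def write_primate_settings(word_length, directory="."):
    """Write qblast_homo<word_length>_settings.txt and return its path."""
    path = Path(directory) / f"qblast_homo{word_length}_settings.txt"
    path.write_text(render_settings(word_length))
    return path
```

Calling `write_primate_settings(20)` and `write_primate_settings(10)` produces the two files used by the fast and slow primate passes.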

:: Complete (and slower) qblast run.
leif qblast qblast_settings.txt taxid.git blast_*.fa.gz 120430_step9.fx
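
Leif currently has no command for the "contig-like group" filter described at the top of this page. The Python sketch below shows one crude way to approximate it: two read pairs are placed in the same group when they share an exact canonical k-mer (a k-mer or its reverse complement), groups are built with union-find, and pairs in groups of two or more are dropped as likely human. Shared k-mers are only a rough stand-in for a real assembler's overlaps, the default `k=25` is an arbitrary assumption, and the in-memory k-mer table would be far too large for a full sequencing run. All function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical_kmers(seq, k):
    """Yield the canonical form (the smaller of a k-mer and its reverse
    complement) of every k-mer in seq, skipping k-mers that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue
        rc = kmer.translate(COMPLEMENT)[::-1]
        yield min(kmer, rc)


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def group_read_pairs(pairs, k=25):
    """Group read pairs (read1, read2) that share at least one canonical k-mer.
    Returns a list of groups, each a list of pair indices."""
    parent = list(range(len(pairs)))
    first_owner = {}  # canonical k-mer -> index of first pair containing it
    for idx, (read1, read2) in enumerate(pairs):
        for read in (read1, read2):
            for kmer in canonical_kmers(read, k):
                owner = first_owner.setdefault(kmer, idx)
                if owner != idx:
                    union(parent, owner, idx)
    groups = defaultdict(list)
    for idx in range(len(pairs)):
        groups[find(parent, idx)].append(idx)
    return list(groups.values())


def filter_contig_like_groups(pairs, k=25, max_group_size=1):
    """Return sorted indices of pairs whose group has at most max_group_size
    pairs; larger "contig-like groups" are treated as human and dropped."""
    keep = []
    for group in group_read_pairs(pairs, k):
        if len(group) <= max_group_size:
            keep.extend(group)
    return sorted(keep)
```

With `max_group_size=1`, any group of two or more read pairs is discarded, matching the threshold suggested above; the surviving indices identify the pairs to pass on to qblast.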