
Setup Commands

leif taxid <taxid.git> <nodes.dmp> <names.dmp> <gi_taxid_nucl.dmp>

  Ex: leif taxid taxid.git nodes.dmp names.dmp gi_taxid_nucl.dmp

Parameter Description
taxid.git Output file which contains compressed "gi" and taxid information. This file is used by other Leif tools such as qblast, qbmajority and qbconsensus. This file is in a proprietary binary format.
nodes.dmp Input file which contains the taxid hierarchy. It can be downloaded from the NCBI Taxonomy FTP site as described below.
names.dmp Input file which contains the taxid names. It can be downloaded from the NCBI Taxonomy FTP site as described below.
gi_taxid_nucl.dmp Input file which contains the association between "gi" numbers and taxids. It can be downloaded from the NCBI Taxonomy FTP site as described below.

The taxid command reads three NCBI Taxonomy files which allow the conversion of "gi" numbers to taxids, and from taxids to English. The input files can be downloaded from the NCBI FTP site using the following commands:

wget ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz
wget ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
gzip -d gi_taxid_nucl.dmp.gz
gzip -d taxdump.tar.gz
tar xvf taxdump.tar nodes.dmp
tar xvf taxdump.tar names.dmp
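
To make the two conversions concrete ("gi" number to taxid, then taxid to an English name), here is a minimal Python sketch that reads the plain-text NCBI files directly. It is only an illustration of the data involved, not a description of how leif builds or reads taxid.git. The function names and the sample "gi" number 12345 are made up for the example; the file layouts (two whitespace-separated integer columns in gi_taxid_nucl.dmp, and fields separated by tab, pipe, tab in names.dmp) are those of the NCBI dump files.

```python
def parse_gi_taxid(lines):
    """gi_taxid_nucl.dmp: one 'gi<TAB>taxid' pair per line."""
    gi_to_taxid = {}
    for line in lines:
        gi, taxid = line.split()
        gi_to_taxid[int(gi)] = int(taxid)
    return gi_to_taxid


def parse_scientific_names(lines):
    """names.dmp: taxid | name | unique name | name class |, tab-pipe-tab separated."""
    taxid_to_name = {}
    for line in lines:
        fields = line.rstrip("\t|\n").split("\t|\t")
        taxid, name, name_class = int(fields[0]), fields[1], fields[3]
        # Each taxid has several names; keep only the scientific name.
        if name_class == "scientific name":
            taxid_to_name[taxid] = name
    return taxid_to_name


# Tiny in-memory stand-ins for the real files (the gi number is invented).
gi_lines = ["12345\t9606\n"]
name_lines = [
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
]

gi_to_taxid = parse_gi_taxid(gi_lines)
taxid_to_name = parse_scientific_names(name_lines)
print(taxid_to_name[gi_to_taxid[12345]])
```

With the real files, the same functions can be passed open file handles, although gi_taxid_nucl.dmp is large enough that a plain dictionary may need a lot of memory.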


leif fasta2fa <out.fa> <wildin.fa.gz> <taxid.git> <taxid0> [<taxid1> [...]]

  Ex 0: leif fasta2fa blast_nt_ebv.fa   blast_nt.fa.gz taxid.git 10376
  Ex 1: leif fasta2fa blast_nt_phage.fa blast_nt.fa.gz taxid.git 10841
  Ex 2: leif fasta2fa blast_nt_human.fa blast_nt.fa.gz taxid.git  9606

Parameter Description
out.fa Output file which contains FASTA entries whose "gi" number matched any of the specified taxids.
wildin.fa.gz Path to input FASTA files which will be analyzed. These files can be gzipped. Wildcards '*' '?' can be used to specify multiple files.
taxid.git Proprietary file generated by taxid command which allows the conversion of "gi" numbers into taxids.
taxid? Integer taxid of FASTA entries which must be written to the output file. Taxid numbers can be retrieved by typing into the "Organism" field in NCBI blast (blast.ncbi.nlm.nih.gov). You can also browse the names.dmp and nodes.dmp files from the NCBI Taxonomy FTP site.

The fasta2fa command reads FASTA entries from the input files and writes the entries matching the user-specified taxids to the output file. This command is typically used to set up a custom filter dictionary.


leif fasta2fd [msb] <out.fd> <wildin.fa.gz>

  Ex 0: leif fasta2fd     ebv.fd       blast_nt_ebv.fa.gz
  Ex 1: leif fasta2fd 0x4 human_0x4.fd blast_nt_human.fa.gz
        leif fasta2fd 0x5 human_0x5.fd blast_nt_human.fa.gz
        leif fasta2fd 0x6 human_0x6.fd blast_nt_human.fa.gz
        leif fasta2fd 0x7 human_0x7.fd blast_nt_human.fa.gz

Parameter Description
msb Optional parameter which forces only a subset of the filter dictionary to be output. Since fasta2fd can be slow, it is sometimes beneficial to run it in parallel by having each command output only a part of the filter dictionary. The parts must then be assembled into a complete dictionary using the fdmerge command. To split the dictionary into two parts, use msb codes 0x2 0x3. To split into four parts, use msb codes 0x4 0x5 0x6 0x7. To split into eight parts, use msb codes 0x8 0x9 0xA 0xB 0xC 0xD 0xE 0xF.
out.fd Output file which contains sorted filter dictionary sequences in binary format. Sequences are written in 64-bit little-endian records with 32 bases coded as 0=A;1=C;2=G;3=T. The most significant bit pair contains the leftmost (5') base.
wildin.fa.gz Path to input FASTA files which will be converted into a filter dictionary. These files can be gzipped. Wildcards '*' '?' can be used to specify multiple files.

The fasta2fd command reads FASTA files and converts them into filter dictionary format (FD). Filter dictionaries can be used to align reads using the fastq2fx and fx2fx commands.
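
The record layout described for out.fd can be illustrated with a short Python sketch. It packs one 32-base sequence into a 64-bit little-endian record using the stated coding (0=A, 1=C, 2=G, 3=T, leftmost base in the most significant bit pair) and unpacks it again. The function names are made up for the example. How fasta2fd treats ambiguous bases such as N, or how it chooses which 32-base windows to emit, is not covered by the description above and is not modelled here.

```python
import struct

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_record(seq):
    """Pack a 32-base sequence into one 8-byte little-endian record."""
    if len(seq) != 32:
        raise ValueError("a record holds exactly 32 bases")
    value = 0
    for base in seq:
        # Shifting left as we go leaves the first (5') base in the top bit pair.
        value = (value << 2) | BASE_CODE[base]
    return struct.pack("<Q", value)


def unpack_record(record):
    """Inverse of pack_record: 8 bytes back to a 32-base string."""
    (value,) = struct.unpack("<Q", record)
    return "".join("ACGT"[(value >> shift) & 3] for shift in range(62, -2, -2))


seq = "T" + "A" * 31
record = pack_record(seq)
print(record.hex())           # the T lands in the last (most significant) byte
print(unpack_record(record))
```

Because the leftmost base occupies the most significant bits, comparing the 64-bit values as unsigned integers orders the sequences alphabetically (A < C < G < T), which is consistent with the file being described as sorted.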


leif fdmerge <out.fd> <wildin0.fd> [<wildin1.fd> [...]] [!<wildexcl0.fd> [...]]

  Ex: leif fdmerge human.fd human_*.fd

Parameter Description
out.fd Output file which contains sorted filter dictionary sequences in binary format. Sequences are written in 64-bit little-endian records with 32 bases coded as 0=A;1=C;2=G;3=T. The most significant bit pair contains the leftmost (5') base.
wildin?.fd Path to input FD files which will be merged. Wildcards '*' '?' can be used to specify multiple files.
wildexcl?.fd Path to exclusion FD files which will prevent certain entries from being output. Wildcards '*' '?' can be used to specify multiple files. Exclusion paths must begin with an exclamation point ("!"), and must be specified after wildin?.fd paths.

The fdmerge command reads FD files and merges them into a single filter dictionary file. This command is usually used to merge the output of fasta2fd when the msb parameter was used.
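
The split-and-merge workflow can be sketched as a small Python helper that only builds the command lines; it does not run leif. The helper names are hypothetical, and the rule used to generate the msb codes (for N parts, the codes N through 2N-1) is inferred from the three splits listed in the msb parameter description, so it should be treated as an assumption that holds for 2, 4 and 8 parts only.

```python
def msb_codes(parts):
    """Return the msb codes for a 2-, 4- or 8-way split, e.g. 4 -> 0x4..0x7."""
    if parts not in (2, 4, 8):
        raise ValueError("only 2, 4 or 8 parts are documented")
    return [f"0x{code:X}" for code in range(parts, 2 * parts)]


def split_commands(parts, prefix, fasta):
    """One fasta2fd command per part, followed by the fdmerge command."""
    commands = [
        f"leif fasta2fd {code} {prefix}_{code}.fd {fasta}"
        for code in msb_codes(parts)
    ]
    # The output name has no underscore, so the wildcard does not match it.
    commands.append(f"leif fdmerge {prefix}.fd {prefix}_*.fd")
    return commands


for command in split_commands(4, "human", "blast_nt_human.fa.gz"):
    print(command)
```

The fasta2fd commands are independent of each other and can be started in parallel; the fdmerge command should only be run once all of them have finished.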


leif facheck <missing_gi_taxid.txt> <in.fa.gz> <taxid.git>

  Ex: leif facheck blast_nt_missing_gi.txt blast_nt.fa.gz taxid.git

Parameter Description
missing_gi_taxid.txt Output text file which contains a list of FASTA entries whose "gi" number cannot be converted into a taxid. This usually happens because the gi_taxid_nucl.dmp file downloaded from the NCBI Taxonomy FTP site is incomplete. Each of these FASTA entries is given the default taxid of 1, which is the "root" node. If these FASTA entries align with any reads, such reads will be deemed to belong to the taxonomic node "root", which is the least specific taxonomic label.
in.fa.gz Path to input FASTA file to be analyzed. This file can be gzipped.
taxid.git Proprietary file generated by taxid command which allows the conversion of "gi" numbers into taxids.

The facheck command verifies that all FASTA entries have a "gi" number which can be converted into a taxid. Entries for which this is not possible are listed in an output file. This command also indexes the input FASTA file for qblast: it automatically outputs a ".split" file in the same directory as the input FASTA file, and qblast reads this file implicitly. If the ".split" file does not exist or is out of date when qblast is run in multi-core mode, qblast creates it automatically before the alignment process starts.
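
The check itself can be pictured with a minimal Python sketch. It assumes the legacy NCBI header style ">gi|<number>|...", uses a plain dictionary as a stand-in for the lookup that taxid.git provides, and uses invented "gi" numbers; the function names are hypothetical, and the exact contents facheck writes to its output file are not modelled.

```python
import re

GI_PATTERN = re.compile(r"^>gi\|(\d+)\|")
ROOT_TAXID = 1  # default taxid for entries that cannot be resolved


def header_gi(header):
    """Extract the gi number from a '>gi|123|...' header, or None."""
    match = GI_PATTERN.match(header)
    return int(match.group(1)) if match else None


def check_headers(headers, gi_to_taxid):
    """Return (taxid per header, headers whose gi could not be resolved)."""
    taxids, missing = [], []
    for header in headers:
        gi = header_gi(header)
        if gi is None or gi not in gi_to_taxid:
            missing.append(header)
            taxids.append(ROOT_TAXID)
        else:
            taxids.append(gi_to_taxid[gi])
    return taxids, missing


headers = [">gi|111|gb|X00001.1| example entry", ">gi|222|gb|X00002.1| unknown entry"]
taxids, missing = check_headers(headers, {111: 9606})
print(taxids)
print(missing)
```

In this sketch the second header ends up with taxid 1 and is reported as missing, which mirrors the "root" fallback described for missing_gi_taxid.txt.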


leif license

  Ex: leif license

The license command checks if the current license file is valid and displays information required to obtain a new license. The license file must be placed in the same directory as the leif.exe file, and be named leif_license.txt.
