Extracting data from BLAST databases with blastdbcmd
Created: June 23, 2008; Updated: January 7, 2021.
Extract lowercase masked FASTA from a BLAST database with masking information
If a BLAST database contains masking information, it can be extracted with blastdbcmd: the -info option reports the masking algorithms available in the database, and the -mask_sequence_with option takes the ID of the algorithm to apply, as follows:
blastdbcmd -info -db mask-data-db
blastdbcmd -db mask-data-db -mask_sequence_with 20 -entry 71022837
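Because masked residues come back in lowercase, one way to check the effect of a masking algorithm is to count the lowercase letters in the FASTA output. The following is a minimal sketch, not part of blastdbcmd: count_masked is a hypothetical helper name, the sample record is made up, and the final command runs only if blastdbcmd is on the PATH and assumes the mask-data-db database and algorithm ID 20 from the example above.

```shell
# Hypothetical helper: count lowercase (masked) residues in FASTA read from stdin.
count_masked() {
  # Drop header lines, delete every character that is not a lowercase letter,
  # then count what remains.
  grep -v '^>' | tr -cd 'a-z' | wc -c | tr -d ' '
}

# Self-contained check on a made-up record: "acgt" and "tt" are masked, so this prints 6.
printf '>example\nACGTacgtNN\nttAA\n' | count_masked

# With a real database (assumes mask-data-db exists and offers algorithm ID 20).
if command -v blastdbcmd >/dev/null 2>&1; then
  blastdbcmd -db mask-data-db -mask_sequence_with 20 -entry 71022837 | count_masked \
    || echo "blastdbcmd failed; check the database name and algorithm ID" >&2
fi
```

Running the same count without -mask_sequence_with should give 0 for sequences stored in uppercase, which makes the difference easy to see.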
Custom data extraction and formatting from a BLAST database
The following examples show how to extract selected information from a BLAST database and how to format it:
blastdbcmd -entry 71022837 -db Test/mask-data-db -outfmt "%a %l %m"
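In this format string, %a, %l and %m request the accession, the sequence length and the masking data. The sketch below post-processes such output under one assumption: that the masking data is printed as semicolon-separated start-end ranges, or the word none when no masking is available. The helper name split_mask_ranges and the sample line are hypothetical and are not real output from the database.

```shell
# Hypothetical helper: turn "accession length range;range;..." lines
# into one tab-separated row per masked range.
split_mask_ranges() {
  awk '{
    n = split($3, ranges, ";")
    for (i = 1; i <= n; i++)
      if (ranges[i] != "" && ranges[i] != "none")
        print $1 "\t" $2 "\t" ranges[i]
  }'
}

# Made-up sample line in the "%a %l %m" layout.
sample='ACC00001.1 1292 119-139;140-144;147-152;'
printf '%s\n' "$sample" | split_mask_ranges

# With a real database (assumes Test/mask-data-db from the example above).
if command -v blastdbcmd >/dev/null 2>&1; then
  blastdbcmd -entry 71022837 -db Test/mask-data-db -outfmt "%a %l %m" | split_mask_ranges \
    || echo "blastdbcmd failed; check the database name" >&2
fi
```

One row per range is convenient for downstream tools such as sort or join; entries without masking data simply produce no rows.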
Extract different sequence ranges from the BLAST databases
The command below will extract two different sequences: bases 40-80 of human chromosome Y (GI 13626247), with the masked regions in lowercase characters (note the argument 30, the ID of a masking algorithm available in this BLAST database), and bases 1-10 on the minus strand of human chromosome 20 (GI 14772189).
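A minimal sketch of such a command, assuming the batch-entry input of blastdbcmd (one entry per line: identifier, then optional range, strand and masking algorithm ID) read from standard input via -entry_batch. The database name held in DB is a placeholder, not taken from the text; substitute a database that contains both GIs and offers masking algorithm 30.

```shell
# Placeholder database name (an assumption, not from the original text).
DB="${DB:-my_human_genome_db}"

# One entry per line: identifier, range, strand, optional masking algorithm ID.
batch=$(printf '%s %s %s %s\n%s %s %s\n' \
  13626247 40-80 plus 30 \
  14772189 1-10 minus)
printf '%s\n' "$batch"

# "-entry_batch -" tells blastdbcmd to read the entries from standard input.
if command -v blastdbcmd >/dev/null 2>&1; then
  printf '%s\n' "$batch" | blastdbcmd -db "$DB" -entry_batch - \
    || echo "blastdbcmd failed; check that DB points to a suitable database" >&2
fi
```

Keeping the entries in a variable or a file makes it easy to request many ranges in a single blastdbcmd run.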
Display the locations where BLAST will search for BLAST databases
Display the available BLAST databases at a given directory
This is accomplished by using the -list option in blastdbcmd:
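As a rough sketch covering both headings above: -show_blastdb_search_path prints the directories BLAST searches for databases, and -list takes a directory to inspect. The DB_DIR variable and its default of the current directory are placeholders chosen for illustration.

```shell
# Directory to inspect; "." is only a placeholder default.
db_dir="${DB_DIR:-.}"

if command -v blastdbcmd >/dev/null 2>&1; then
  # Print the locations where BLAST looks for databases.
  blastdbcmd -show_blastdb_search_path
  # List the BLAST databases found in db_dir
  # (add -recursive to descend into subdirectories).
  blastdbcmd -list "$db_dir" \
    || echo "blastdbcmd -list failed for $db_dir" >&2
else
  echo "blastdbcmd not found on PATH" >&2
fi
```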
The first column of the default output is the file name of the BLAST database (usually provided as the -db argument to other BLAST+ applications); the second column is the molecule type of the BLAST database. This output is configurable via the -list_outfmt command line option.
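The two-column layout lends itself to simple filtering. The sketch below keeps only nucleotide databases; it assumes the molecule type is printed as the word Nucleotide and that database paths contain no spaces. The helper name nucleotide_dbs, the DB_DIR variable and the sample listing are hypothetical. The last command shows -list_outfmt with the specifiers %f (file name), %p (molecule type) and %t (title).

```shell
# Hypothetical helper: print the file names of nucleotide databases
# from default -list output (column 1 = file name, column 2 = molecule type).
nucleotide_dbs() {
  awk '$2 == "Nucleotide" { print $1 }'
}

# Made-up listing in the default two-column layout.
printf '%s\n' 'dbs/genome_a Nucleotide' 'dbs/proteins_b Protein' | nucleotide_dbs

if command -v blastdbcmd >/dev/null 2>&1; then
  # Filter a real listing; DB_DIR is a placeholder for the directory to inspect.
  blastdbcmd -list "${DB_DIR:-.}" | nucleotide_dbs
  # Custom columns: file name, molecule type, title.
  blastdbcmd -list "${DB_DIR:-.}" -list_outfmt "%f %p %t" \
    || echo "blastdbcmd -list failed" >&2
fi
```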
Use Windowmasker to filter the query sequence(s) in a BLAST search
Created: June 23, 2008; Updated: January 7, 2021.
The blastn executable can filter a query sequence using the windowmasker data files. This option can be used to mask interspersed repeats that may lead to spurious matches. The windowmasker data files should be created as discussed in step 1 of “Create masking information using windowmasker” or downloaded from the NCBI FTP site. Follow the instructions in Configuring BLAST to make sure BLAST will be able to find the windowmasker files in the examples below.