What is NCBI Virus?
Main features:
- Compare your sequence to those in the NCBI Virus database using the NCBI BLAST algorithm.
- Search, view, and download nucleotide and protein sequences by virus name or taxonomy group.
- Quickly access common data sets for all viruses, all human viruses, bacteriophages, or sequences released in the past month.
- Explore the massive, normalized datasets and identify data trends.
Ways to access NCBI Virus data
Select one of the three options to access NCBI Virus data.
Option 1:
In the navigation menu, open the Find data tab and select one of the drop-down links:
Search by sequence to use the virus-specific NCBI BLAST tool.
Search by virus to search virus sequences by virus name or taxonomy.
All viruses, Human viruses, Bacteriophages, New sequences (past one month), and Available SARS-CoV-2 sequences to view preselected data sets.
Option 2:
The same functionality is available through the Search by sequence and Search by virus buttons on the NCBI Virus home page.
The results can be viewed in the Results Table and further refined using the sequence attributes (metadata) in the Refine Results panel located on the left side of the table. Additionally, you can download the results, run multiple sequence alignments, and generate phylogenetic trees from the selected results.
Option 3:
Through the NCBI Visual Data Dashboard, via the statistics buttons located in the top row of the dashboard.
NCBI Virus BLAST™ tool
The NCBI Virus BLAST™ tool provides rapid insight into query sequences by presenting BLASTn and BLASTp results alongside normalized metadata, when available. These attributes include isolation source, host, country, and collection and release dates, as well as taxonomy and genetic attributes such as completeness and, when applicable, segment or protein names. The normalized metadata is generated by an internal, curator-guided data-processing pipeline that maps sequence-record attributes to standardized vocabularies to provide a user-friendly view of the data.
Compare your sequence to those in the NCBI Virus database using the BLAST algorithm
Click the Search by sequence button (or select this option from the Find data navigation tab at the top of the page).
Select the Nucleotide or Protein tab. The Nucleotide tab runs a BLASTn search against all NCBI Virus nucleotide sequences. The Protein tab runs a BLASTp search against all NCBI Virus protein sequences. Read more about BLAST™ searches in the NCBI BLAST Guide.
In the NCBI Virus Search by sequence input form, enter an NCBI sequence accession or a sequence in plain text or FASTA format, then click Start search.
The BLAST search results will open in a new window in tabular format (the Results Table).
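As a rough illustration of the three accepted input kinds, the sketch below guesses whether a pasted query is an accession, a FASTA record, or a plain-text sequence. The function name `classify_query` and the accession pattern are assumptions made for this example; they do not describe how NCBI Virus itself validates input, and real accession formats are more varied.

```python
import re

# Assumed pattern: 1-2 letters + 5-6 digits (e.g. MN908947) or a RefSeq-style
# prefix such as NC_ followed by digits, with an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^(?:[A-Z]{1,2}\d{5,6}|[A-Z]{2}_\d{6,9})(?:\.\d+)?$")
# Letters plus the gap/stop symbols commonly seen in pasted sequences.
SEQUENCE_RE = re.compile(r"^[A-Za-z*\-]+$")


def classify_query(text: str) -> str:
    """Return 'fasta', 'accession', 'sequence', or 'unknown' for a pasted query."""
    stripped = text.strip()
    if not stripped:
        return "unknown"
    if stripped.startswith(">"):
        # FASTA records begin with a ">" definition line.
        return "fasta"
    if ACCESSION_RE.match(stripped):
        return "accession"
    # Plain text: residues only, possibly wrapped over several lines.
    if SEQUENCE_RE.match("".join(stripped.split())):
        return "sequence"
    return "unknown"
```

A check like this can catch an accidentally truncated paste before a search is started.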
Compare your sequences to the sequences in the up-to-date Betacoronavirus database
The Betacoronavirus BLAST database was created in response to the SARS-CoV-2 outbreak. It is regularly updated and includes all sequences from the genus Betacoronavirus. To search your sequence against the Betacoronavirus database using BLAST:
Click the Search by sequence button (or select this option from the Find data navigation tab at the top of the page).
Select the Nucleotide or Protein tab.
In the NCBI Virus Search by sequence input form, enter an NCBI sequence accession or a sequence in plain text or FASTA format, then click the Search up-to-date Betacoronavirus DB button.
The BLAST search results will open in a separate window in tabular format (the Results Table).
Compare BLAST results in the Results Table
The Nucleotide tab runs a BLASTN search (using Megablast, which is optimized for highly similar sequences) against all NCBI Virus nucleotide sequences.
The Protein tab runs a BLASTP search against all NCBI Virus protein sequences. Read more about BLAST algorithms in the NCBI BLAST help documentation.
In the BLAST search Results Table, you can compare search results in a tabular display using the following sortable default columns:
Accession - the NCBI accession number of the NCBI Virus database sequence. Reference sequence accessions are marked with the label “RefSeq”.
Coverage - query coverage.
Identity - the highest percent identity of all query-subject alignments.
Submitters - the authors who submitted the sequence. Only the first submitter's name is displayed in the column (for example, Baranov,P.V., et al.). To obtain the full list of submitters, click the sequence accession number to open the details menu, then click the accession number in the details panel to open the GenBank Entrez page with all information available for the selected sequence. Alternatively, use the Download button with the CSV format option; the “Submitters” column in the downloaded table contains the names of all authors who submitted each sequence.
Release date - the date when the sequence was released (publicly appeared) in GenBank or other INSDC databases.
Isolate - the individual isolate from which the sequence was obtained, typically an alphanumeric sample ID. The isolate name is parsed from the “/isolate” field of the GenBank record. SARS-CoV-2 isolate names are formatted according to the definitions of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV).
Note: the isolate describes characteristic information about the sample that shows its uniqueness and helps distinguish among multiple samples. For example, “isolate: Han” indicates that the sample comes from a specific population, and “isolate: Prostate Cancer Cell Line” indicates that the sample comes from a specific type of cell.
Species - virus species name.
Molecule type - viral nucleic acid type. The molecule type is provided by the International Committee on Taxonomy of Viruses (ICTV) in the Master Species List and maintained in the NCBI Taxonomy database. RefSeqs with an “Unknown” molecule type belong to taxonomic groups that the ICTV has not yet recognized.
Length - sequence length.
Geo Location - country/region of virus specimen collection. May contain additional geographic information, for example, a US state.
BLAST results can be customized by adding or removing columns in the Results Table through the Select columns drop-down menu.
Additional columns include:
USA - if the sample was collected in the United States, the column shows the state abbreviation.
Host (the scientific name of the natural, non-laboratory host species of the organism the sample came from) - virus isolation host (read more about isolation host vocabulary mapping). If the isolation host is unknown (/host field of the GenBank record) but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host is shown in the host column of the Results Table. If both the isolation host and the laboratory host can be mapped, only the isolation host is shown in the host column.
Collection Date - virus specimen collection date.
SRA accession - NCBI Sequence Read Archive (SRA) accession number.
Score - the total alignment score (Total score) from all alignment segments.
Genus.
Family.
Sequence type - complete/partial/proviral/refseq (read more about sequence type here).
Nuc completeness - nucleotide completeness (note: this is preliminary data and is not always accurate).
Genotype.
Segment - segment name, for segmented viruses.
Publications - links to publications in PubMed associated with the sequences.
Country - country of specimen collection (country only, with no additional information).
Isolation source - sequence isolation source (read more about isolation source here).
BioSample - NCBI BioSample accession number.
BioProject - NCBI BioProject accession number.
GenBank title.
The default number of rows displayed in the Results Table is 200. You can change the number of rows by selecting the number of results per page (200, 100, 50, or 25) in the Select Columns menu.
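The Identity and Score columns are both aggregates over the alignment segments of one subject: Identity is the highest percent identity, and Score is the total of the segment scores. The sketch below shows that arithmetic on made-up numbers. The `Segment` structure and the `summarize` function are hypothetical names for illustration, not part of any NCBI tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned segment between the query and a subject (hypothetical structure)."""
    percent_identity: float
    score: float


def summarize(segments: list[Segment]) -> dict[str, float]:
    """Collapse a subject's segments into the Identity and Score column values."""
    if not segments:
        raise ValueError("a subject needs at least one aligned segment")
    return {
        # Identity: the highest percent identity of all query-subject alignments.
        "identity": max(s.percent_identity for s in segments),
        # Score: the total of the alignment scores from all segments.
        "score": sum(s.score for s in segments),
    }


hits = [Segment(98.5, 1200.0), Segment(91.0, 310.0)]  # made-up values
print(summarize(hits))  # {'identity': 98.5, 'score': 1510.0}
```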
View BLAST Alignment of selected sequences
To compare search results in pairwise alignment:
Select the sequences to display.
Click the View BLAST Alignment of selected sequences link displayed in the center of the Info panel above the Results Table.
The new page shows a graphical view of the pairwise alignments between the selected BLAST results and the query, along with a feature map of the query (if available) at the top of the view.
To learn how to use the alignment viewer, refer to the NCBI Multiple Sequence Alignment Viewer documentation.
Build multiple sequence alignment of selected BLAST results
To build a multiple sequence alignment from selected BLAST results:
Select the sequences that you want to align.
Click the Align button on the right above the Results Table.
The multiple sequence alignment opens on a new page. Multiple sequence alignments are calculated using MUSCLE.
To learn how to use the alignment viewer, refer to the NCBI Multiple Sequence Alignment Viewer documentation.
Build phylogenetic tree of selected BLAST results
To build a phylogenetic tree showing the relationships among selected sequences:
Select the sequences to display.
Click the Build Phylogenetic Tree button on the right above the Results Table.
The tree is calculated and shown in the tree viewer on a separate page.
For more about Tree Viewer and how to use it, refer to the NCBI Tree Viewer help documentation located here.
Refine tabular BLAST results via filters:
1. Virus name or taxonomy
To restrict search results to a particular virus group:
On the BLAST results page, click Virus in the Refine Results panel (upper left corner).
In the text box, paste or start typing a single virus taxonomy name or taxid (only the top 5 taxa will be shown).
Select your taxid (NCBI Taxonomy database ID) from the fly-out menu.
The filtered results are presented in the Results Table with the following default sortable columns: accession, coverage, identity, species, country, host, and collection date. Additional columns that display connected metadata can be added via the Customize Table menu. The query sequence is highlighted in the first row of the table.
2. Accession
You can search for particular accessions in the Results Table by entering them in the search form under the Accession filter. The results in the table will be limited to the entered accession numbers.
3. Sequence length
To restrict your results to a particular sequence length, enter the minimum and maximum length in nucleotides (for a nucleotide search) or amino acids (for a protein search).
4. Ambiguous Characters
Sets the maximum number of ambiguous characters (N in nucleotide sequences or X in protein sequences) allowed in each sequence shown in the Results Table.
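The length and ambiguous-character filters can be reproduced on downloaded sequences with a few lines of code. This is a minimal sketch under the definitions given above (N counts as ambiguous in nucleotides, X in proteins); the function names and the case-insensitive counting are assumptions for illustration.

```python
def count_ambiguous(sequence: str, molecule: str = "nucleotide") -> int:
    """Count N (nucleotide) or X (protein) characters, ignoring case."""
    symbol = "N" if molecule == "nucleotide" else "X"
    return sequence.upper().count(symbol)


def passes_filters(sequence: str, min_len: int, max_len: int,
                   max_ambiguous: int, molecule: str = "nucleotide") -> bool:
    """True if the length is within [min_len, max_len] and the number of
    ambiguous characters does not exceed max_ambiguous."""
    return (min_len <= len(sequence) <= max_len
            and count_ambiguous(sequence, molecule) <= max_ambiguous)
```

For example, `passes_filters("ACGTNN", 5, 10, 2)` accepts the sequence, while lowering `max_ambiguous` to 1 rejects it.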
5. Sequence type
All sequences (Nucleotide or Protein) available in the NCBI Virus resource can be filtered by the following sequence types: GenBank and RefSeq.
GenBank sequences include all sequences available in GenBank, except RefSeqs.
RefSeq-filtered nucleotide sequences include all reference sequences for the selected virus. Note that a few RefSeqs are partial genomes, based on the International Committee on Taxonomy of Viruses (ICTV) proposal.
6. RefSeq genome completeness
Complete or partial RefSeq genomes - filter for all complete (or partial) genomes, reference records (RefSeqs), and proteins from these RefSeqs. For segmented viruses, complete genomes contain all genome segments. Most RefSeq records are complete, but a few RefSeqs are partial, based on the International Committee on Taxonomy of Viruses (ICTV) proposal.
7. Nucleotide completeness
Complete nucleotide sequences - filter for all NCBI viral nucleotide sequences whose GenBank ASN.1 record contains the descriptor descr/molinfo/completeness=complete, or whose definition line (defline) contains the word ‘complete’. It also includes complete reference records (RefSeqs).
Partial nucleotide sequences - filter for sequences that are not complete according to the definition above.
If the Protein tab is selected and the complete nucleotide sequence type filter is applied, the results include all proteins from complete genomes or, for segmented viruses, from individual complete segments.
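The completeness rule above is a simple either/or test. The sketch below expresses it as a predicate; the function name and its two arguments (the molinfo completeness value and the defline text, already extracted from a record) are hypothetical, and matching ‘complete’ as a whole word is an assumption, chosen so that a defline containing "incomplete" is not counted.

```python
import re
from typing import Optional


def is_complete(molinfo_completeness: Optional[str], defline: str) -> bool:
    """Apply the rule: completeness=complete, or the word 'complete' in the defline."""
    if molinfo_completeness == "complete":
        return True
    # Whole-word, case-insensitive match (an assumption; see the lead-in).
    return re.search(r"\bcomplete\b", defline, re.IGNORECASE) is not None
```

Anything for which `is_complete` returns False would fall under the partial filter by the definition above.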
8. Isolate
Isolate - the individual isolate from which the sequence was obtained, typically an alphanumeric sample ID. The isolate name is parsed from the “/isolate” field of the GenBank record. SARS-CoV-2 isolate names are formatted according to the definitions of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV).
9. Proteins
The protein name is parsed from the “/product=” field of GenBank nucleotide and protein records.
10. Provirus
Provirus sequences - filter for sequences that have the “/proviral” source qualifier in the GenBank record.
11. Geographic region
The Geographic region filter lets you type a country of interest in the text box or select the continent(s) of interest. Selecting a continent automatically selects all countries within that continent.
Clicking the arrow next to a continent's name opens a secondary selection menu in which you can select or deselect the countries belonging to that continent. The selected countries are listed below the continent name.
If an entire continent is selected, the continent's name is shown in a pillbox below, indicating that all countries of the continent are selected. If only some of its countries are selected, the continent is no longer displayed; instead, a pillbox for each selected country is shown below the associated continent. Each continent behaves independently of the others.
Click a pillbox to remove that selection. Multiple concurrent selections are supported.
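The pillbox rule described above can be read as a small function from the current selection to the labels shown. The sketch below is one interpretation of that behavior; the function name, the input structures, and the alphabetical ordering of labels are all assumptions for illustration.

```python
def pill_labels(continents: dict[str, set[str]], selected: set[str]) -> list[str]:
    """Return the pillbox labels for a selection of countries.

    A continent whose countries are all selected collapses to one continent
    pill; otherwise each selected country gets its own pill.
    """
    labels: list[str] = []
    for continent in sorted(continents):  # ordering is an assumption
        countries = continents[continent]
        chosen = countries & selected
        if countries and chosen == countries:
            labels.append(continent)
        else:
            labels.extend(sorted(chosen))
    return labels


# Toy data: each continent is evaluated independently of the others.
regions = {"Europe": {"France", "Spain"}, "Oceania": {"Australia", "New Zealand"}}
print(pill_labels(regions, {"France", "Spain", "Australia"}))  # ['Europe', 'Australia']
```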
12. Isolation host or taxonomy
Enter a host name or taxid in the text box and several host terms will be suggested (only the top 20 taxids will be shown). Select the desired host term and press Enter. The results will be restricted to sequences in the database with the indicated host term. Multiple hosts can be filtered on simultaneously by adding further host terms to the filter.
The terms for isolation host are parsed from the source/host field of a sequence's GenBank record. Parsed terms are mapped to a standardized vocabulary, which curators derived by aggregating the variety of terms found in GenBank files. Common misspellings are also included in this mapping strategy. For example, “Accipter cooperii” is mapped to “Accipiter cooperii”.
The terms for isolation hosts are displayed in the host column of the Results Table. If the isolation host is unknown but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host is shown in the host column of the Results Table. If both the isolation host and the laboratory host can be mapped, only the isolation host is shown in the host column.
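The two rules above (map raw terms to a standardized vocabulary, then prefer the isolation host over the laboratory host) can be sketched as follows. The vocabulary here is a tiny hypothetical stand-in for the curated mapping; only the “Accipter cooperii” misspelling comes from the text, and the function names are illustrative.

```python
from typing import Optional

# Hypothetical slice of the curated vocabulary: raw term (lowercased) -> standard term.
HOST_VOCABULARY = {
    "accipiter cooperii": "Accipiter cooperii",
    "accipter cooperii": "Accipiter cooperii",  # misspelling example from the text
    "homo sapiens": "Homo sapiens",             # assumed entry for the example
}


def map_host(raw: Optional[str]) -> Optional[str]:
    """Map a raw /host or /lab_host value to the standardized term, if known."""
    if not raw:
        return None
    return HOST_VOCABULARY.get(raw.strip().lower())


def host_column(host: Optional[str], lab_host: Optional[str]) -> Optional[str]:
    """Value for the host column: isolation host if mappable, else laboratory host."""
    return map_host(host) or map_host(lab_host)
```

With this sketch, a record with an unmappable /host but a known /lab_host shows the laboratory host, and a record with both shows only the isolation host.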
13. Submitters
To search for sequences submitted by particular authors, enter the authors' last names with or without initials.
The following formats are supported:
- Chiang,T.Y. Forsyth,K.A. Knittig,L.C. Lim,O.P.
- Chiang,T.Y., Forsyth,K.A., Knittig,L.C., Lim,O.P.
- Chiang Forsyth Knittig Lim
- Chiang, Forsyth, Knittig, Lim
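To make the four supported formats concrete, the sketch below parses each of them into (last name, initials) pairs with one regular expression. It is an illustration only: the pattern and function name are assumptions, and multi-word last names are not handled.

```python
import re

# A last name, optionally followed by ",<initials>" such as ",T.Y."
SUBMITTER_RE = re.compile(r"([A-Za-z'\-]+)(?:,((?:[A-Z]\.)+))?")


def parse_submitters(text: str) -> list[tuple[str, str]]:
    """Split a submitter query into (last_name, initials) pairs.

    Initials are returned as an empty string when they are not given.
    """
    return [(m.group(1), m.group(2) or "") for m in SUBMITTER_RE.finditer(text)]


for query in (
    "Chiang,T.Y. Forsyth,K.A. Knittig,L.C. Lim,O.P.",
    "Chiang,T.Y., Forsyth,K.A., Knittig,L.C., Lim,O.P.",
    "Chiang Forsyth Knittig Lim",
    "Chiang, Forsyth, Knittig, Lim",
):
    print([last for last, _ in parse_submitters(query)])
```

All four queries yield the same four last names; only the first two carry initials.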
14. Isolation source
The terms for isolation source are parsed from the isolation source field of a sequence's GenBank record. Examples of parsed terms are serum and plasma, which are both mapped to the standardized vocabulary term blood.
Common misspellings as well as regional spelling differences are included in the mapping strategy. Multiple terms can be selected.
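Filtering on a standardized term therefore matches records whose raw isolation source differs. The sketch below shows the idea with a toy vocabulary and multiple selectable terms; only the serum and plasma mappings come from the text, and the record layout, the "blood" identity mapping, and the function names are assumptions.

```python
from typing import Optional

# serum/plasma -> blood is from the text; "blood" -> "blood" is an assumed entry.
SOURCE_VOCABULARY = {"serum": "blood", "plasma": "blood", "blood": "blood"}


def standardize_source(raw: str) -> Optional[str]:
    """Map a raw isolation source to its standardized term, or None if unknown."""
    return SOURCE_VOCABULARY.get(raw.strip().lower())


def filter_by_sources(records: list[dict], selected: set[str]) -> list[dict]:
    """Keep records whose standardized isolation source is one of the selected terms."""
    return [
        r for r in records
        if standardize_source(r.get("isolation_source", "")) in selected
    ]
```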
15. Sample collection date
Collection date (From, To) - the collection date of the sample from which the sequence was derived.
By default, the To: date is set to the current date.
Use the mm/dd/yyyy or yyyy format, or click the calendar icon and select dates.
16. Sequence release date
Release date (From, To) - the date when the sequence was released (publicly appeared) in GenBank or another INSDC database.
By default, the To: date is set to the current date.
Use the mm/dd/yyyy or yyyy format, or click the calendar icon and select dates.
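Both date filters accept the same two text formats and default the To: date to today. The sketch below parses such input into a date range. How a year-only value is expanded (January 1 for From, December 31 for To) is an assumption of this example, since the text does not specify it; the function names are also illustrative.

```python
from datetime import date, datetime
from typing import Optional


def parse_filter_date(text: str, end_of_range: bool = False) -> date:
    """Parse 'mm/dd/yyyy' or 'yyyy' into a date."""
    text = text.strip()
    if len(text) == 4 and text.isdigit():
        year = int(text)
        # Assumed expansion of a year-only value (see the lead-in).
        return date(year, 12, 31) if end_of_range else date(year, 1, 1)
    return datetime.strptime(text, "%m/%d/%Y").date()


def date_range(start: str, end: Optional[str] = None,
               today: Optional[date] = None) -> tuple[date, date]:
    """Return (from, to); the To date defaults to the current date."""
    upper = parse_filter_date(end, end_of_range=True) if end else (today or date.today())
    return parse_filter_date(start), upper
```

The `today` parameter exists only so the default can be pinned in tests.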
17. Environmental source
The Environmental source filter selects virus sequences isolated from environmental sources. Generally, environmental isolates are identified by searching for key terms, such as sewage or ocean water, in the /isolation_source and /note fields of GenBank records when the /host field is empty.
Select Include - to include all sequences isolated from environmental sources in the Results Table.
Select Exclude - to exclude all sequences isolated from environmental sources from the Results Table.
Select Only - to view only sequences isolated from environmental sources.
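The Include / Exclude / Only options form a three-way switch over a yes/no property of each record. The sketch below pairs that switch with a simplified version of the environmental rule described above. The record layout, the function names, and the two-item term list (taken from the examples in the text, not the full list NCBI uses) are assumptions.

```python
from enum import Enum

# Only the two example terms from the text; the real key-term list is not given.
ENVIRONMENTAL_TERMS = ("sewage", "ocean water")


class Mode(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    ONLY = "only"


def is_environmental(record: dict) -> bool:
    """True if /host is empty and a key term appears in /isolation_source or /note."""
    if record.get("host"):
        return False
    text = f"{record.get('isolation_source', '')} {record.get('note', '')}".lower()
    return any(term in text for term in ENVIRONMENTAL_TERMS)


def apply_mode(records: list[dict], mode: Mode, predicate=is_environmental) -> list[dict]:
    """Include keeps everything, Exclude drops matches, Only keeps just the matches."""
    if mode is Mode.INCLUDE:
        return list(records)
    keep_matches = mode is Mode.ONLY
    return [r for r in records if predicate(r) == keep_matches]
```

Because the predicate is a parameter, the same `apply_mode` sketch would also fit any other filter that offers the Include / Exclude / Only choice.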
18. Laboratory samples
The Lab host filter shows laboratory-isolated virus sequences. The lab host is identified by searching for a lab host name in the /lab_host field of the GenBank record. Additionally (for bacteriophages only), if the /host and /lab_host fields are empty, the lab host is identified by parsing the lab host name from the bacteriophage organism name in the GenBank record.
Select Include - to include all laboratory-isolated virus sequences in the Results Table.
Select Exclude - to exclude all laboratory-isolated virus sequences from the Results Table.
Select Only - to view only laboratory-isolated virus sequences.
Note: the lab host name is shown in the Results Table (in the host column) only when the isolation host cannot be identified (the /host field of the GenBank record is empty).
19. Vaccine strain
The Vaccine strain filter finds virus vaccine strain sequences. Vaccine strains are identified by searching for vaccine strain terms in the /isolation_source, /note, and /host fields and in the definition line of the GenBank record.
Select Include - to include all virus vaccine strain sequences in the Results Table.
Select Exclude - to exclude all virus vaccine strain sequences from the Results Table.
Select Only - to view only virus vaccine strain sequences.
Search for sequences by virus name or taxonomy group
Find your virus sequence(s)
Option 1:
Select the Search by virus drop-down option from the Find Data tab of the navigation menu on any NCBI Virus page. This opens the selection menu.
Start typing in the text box, then select your taxid (NCBI Taxonomy database ID). To select all viral sequences, enter and then select the term viruses.
The results will be shown in the table.
Note: You can view a list of all viral taxonomy terms on the NCBI Taxonomy pages.
Option 2:
Click the Search by virus button in the central part of the NCBI Virus home page.
Start typing in the text box, then select your taxid (NCBI Taxonomy database ID).
This opens the tabular interface with sequences from the selected taxonomy group.
Compare results in the Results Table
Click the Nucleotide tab to access genomic sequences, the Protein tab to access amino acid sequences for individual proteins, or the RefSeq Genome tab to access RefSeq genomes. For segmented viruses, each RefSeq genome includes all segments of the virus.
In the virus search Results Table, you can compare search results in a tabular display using the following sortable default columns:
==Accession== - the NCBI accession number of the NCBI Virus database sequence.
==Submitters== - the authors who submitted the sequence. Only the first submitter's name is displayed in the column (for example, Baranov,P.V., et al.). To obtain the full list of submitters, click the sequence accession number to open the details menu, then click the accession number in the details panel to open the GenBank Entrez page with all information available for the selected sequence. Alternatively, use the Download button with the CSV format option; the "Submitters" column in the downloaded table contains the names of all authors who submitted each sequence.
==Release date== - the date when the sequence was released (publicly appeared) in GenBank or other INSDC databases.
==Isolate== - the individual isolate from which the sequence was obtained, typically an alphanumeric sample ID. The isolate name is parsed from the "/isolate" field of the GenBank record. SARS-CoV-2 isolate names are formatted according to the definitions of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV).
==Species== - virus species name.
==Molecule type== - viral nucleic acid type. The molecule type is provided by the International Committee on Taxonomy of Viruses (ICTV) in the Master Species List and maintained in the NCBI Taxonomy database. RefSeqs with an "Unknown" molecule type belong to taxonomic groups that the ICTV has not yet recognized.
==Length== - sequence length.
==Geo Location== - country/region of virus specimen collection.
==USA== - if the sample was collected in the United States, the column shows the state abbreviation.
==Host== - virus isolation host (read more about isolation host vocabulary mapping here). If the isolation host is unknown (/host field of the GenBank record) but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host is shown in the host column of the Results Table. If both the isolation host and the laboratory host can be mapped, only the isolation host is shown in the host column.
Search results can be customized by adding or removing columns in the Results Table through the Select Columns drop-down menu.
Additional columns include:
Isolation source - sequence isolation source (read more about isolation source here).
Collection Date - virus specimen collection date.
SRA accession - NCBI Sequence Read Archive (SRA) accession number.
Genus.
Family.
Sequence type - complete/partial/refseq (read more about sequence type here).
Nuc completeness - nucleotide completeness (note: this is preliminary data and is not always accurate).
Genotype.
Segment - segment name, for segmented viruses.
Publications - links to publications in PubMed associated with the sequences.
BioSample - NCBI BioSample accession number.
BioProject - NCBI BioProject accession number.
GenBank title.
The default number of rows displayed in the Results Table is 200. You can change the number of rows by selecting the number of results per page (200, 100, 50, or 25) in the Select Columns menu.
Build multiple sequence alignment of selected results
Refer to the section Build multiple sequence alignment of selected BLAST results; the functionality is the same.
Build phylogenetic tree of selected results
Refer to the section Build phylogenetic tree of selected BLAST results; the functionality is the same.
Refine tabular results via filters
Refer to the section Refine tabular BLAST results via filters; the functionality is the same.
How to find, view and download SARS-CoV-2 sequences and related metadata?
To provide free and easy access to SARS-CoV-2 genome and protein sequences and their associated metadata, we created a dedicated Severe acute respiratory syndrome coronavirus 2 data hub.
You can access the Results Table of the SARS-CoV-2 data hub by clicking the “RefSeq genomes”, “nucleotide”, or “protein” links on the announcement banner on the NCBI home page, through the “Find data” navigation menu, or with the “Up-to-date SARS-CoV-2” shortcut button in the “Search by virus” form.
The SARS-CoV-2 data hub lets you search, retrieve, analyze, and visualize SARS-CoV-2 data available in GenBank. This page also provides links to Betacoronavirus BLAST, SARS-CoV-2 articles in PubMed, SRA data, NCBI SARS-CoV-2 resources, the Data Sets command line, and CDC outbreak information.
The SARS-CoV-2 data hub Results Table has a “Pangolin” column that is specific to SARS-CoV-2 data. Pango lineages are determined by Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages). All SARS-CoV-2 GenBank records are reprocessed nightly by the Pangolin pipeline using the UShER pipeline. The field is empty if the sequence was released after that day's Pangolin run. The field shows unclassifiable if the sequence does not meet the requirements for processing, and unassigned if the Pangolin tool cannot determine the lineage of the sequence. To view the Pango version, download the results in CSV format; the version strings are in the Pango Versions column. Each string lists the versions of the following sources: pangolin/pangolin-data/constellations/scorpio. For example, 4.0.6/1.8/v0.1.8/0.3.17.
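When working with a downloaded CSV, the Pango Versions string can be split into its four named components. A minimal sketch, assuming the string always has exactly the four slash-separated parts described above; the class and function names are hypothetical.

```python
from typing import NamedTuple


class PangoVersions(NamedTuple):
    pangolin: str
    pangolin_data: str
    constellations: str
    scorpio: str


def parse_pango_versions(value: str) -> PangoVersions:
    """Split 'pangolin/pangolin-data/constellations/scorpio' into named fields."""
    parts = value.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 slash-separated versions, got {len(parts)}: {value!r}")
    return PangoVersions(*parts)


versions = parse_pango_versions("4.0.6/1.8/v0.1.8/0.3.17")
print(versions.pangolin, versions.scorpio)  # 4.0.6 0.3.17
```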
There are two filters on “Refine Results” panel which are specific only to SARS-CoV-2 data:
“优化结果”面板上有两个过滤器,仅适用于严重急性呼吸系统综合征冠状病毒2型数据:
Pango lineage(血统 ; 世系 ; 家系 ; 宗系) - allows to filter sequences a particular Pango lineage assigned. Pango lineages are determined by Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages). All SARS-CoV-2 GenBank records reprocessed nightly by Pangolin pipeline using UShER pipeline. The field is empty if the sequence is unclassifiable or if it was released after a UShER run that day. You can view Pango version by downloading results in CSV format. You can view version strings in Pango Versions column. Each string includes the following sources: pangolin/pangolin-data/constellations/scorpio. For example, 4.0.6/1.8/v0.1.8/0.3.17.
Pango谱系-允许过滤特定Pango谱系分配的序列。Pango谱系由Pangolin(名为全球爆发谱系的系统发育分配)决定。Pangolin管道使用UShER管道每晚重新处理所有严重急性呼吸系统综合征冠状病毒2型GenBank记录。如果序列不可分类,或者在当天UShER运行后发布,则该字段为空。您可以通过下载CSV格式的结果来查看Pango版本。您可以在Pango Versions列中查看版本字符串。每个字符串包括以下来源:pangolin/pangolin-data/constellations/scorpio。例如,4.0.6/1.8/v0.1.8/0.3.17。
Random sampling - filters for sequences that were collected randomly for the purpose of baseline surveillance. For example, this filter can help if you want to know which lineages are increasing in frequency, or need a rough estimate of the infection rate in geographic regions where that data isn’t available yet. Randomly collected samples (that is, samples not collected for vaccine breakthrough or localized outbreak investigation) make these estimates more reliable.
NCBI Virus scans SARS-CoV-2 GenBank records and any linked BioSample records. If either of the following field/value pairs is found, the sequence is included in the “random sampling” filter:
GenBank: KEYWORDS - purposeofsampling:baselinesurveillance
BioSample: purpose of sequencing - Baseline surveillance
Select Include - to include all randomly sampled SARS-CoV-2 sequences in the Results Table.
Select Exclude - to exclude all randomly sampled SARS-CoV-2 sequences from the Results Table.
Select Only - to view only randomly sampled SARS-CoV-2 sequences.
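The rule and the three options above can be sketched in Python as follows. This is a minimal illustration, not NCBI's implementation: the record layout (a dict with a `keywords` list and a `biosample` attribute dict) and the function names are assumptions, and the case-insensitive comparison of the BioSample value is a choice made for the sketch.

```python
def is_baseline_surveillance(genbank_keywords, biosample_attributes):
    """Return True if either field/value pair described above is present."""
    # GenBank: KEYWORDS - purposeofsampling:baselinesurveillance
    if "purposeofsampling:baselinesurveillance" in genbank_keywords:
        return True
    # BioSample: purpose of sequencing - Baseline surveillance
    purpose = biosample_attributes.get("purpose of sequencing", "")
    return purpose.strip().lower() == "baseline surveillance"


def apply_random_sampling_filter(records, mode):
    """Mimic the Include / Exclude / Only options on a list of record dicts."""
    def flagged(record):
        return is_baseline_surveillance(record["keywords"], record["biosample"])

    if mode == "include":
        return list(records)  # keep everything, randomly sampled or not
    if mode == "only":
        return [r for r in records if flagged(r)]
    if mode == "exclude":
        return [r for r in records if not flagged(r)]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical records, for illustration only.
records = [
    {"id": "A", "keywords": ["purposeofsampling:baselinesurveillance"], "biosample": {}},
    {"id": "B", "keywords": [], "biosample": {"purpose of sequencing": "Baseline surveillance"}},
    {"id": "C", "keywords": [], "biosample": {}},
]
print([r["id"] for r in apply_random_sampling_filter(records, "only")])
```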
For descriptions of the other filters, refer to “Refine tabled BLAST results via filters”; the functionality is the same.
Click the “SARS-CoV-2 interactive dashboard” link on the announcement banner on the NCBI home page to access geographic and time distribution graphs. You can also reach it through the SARS-CoV-2 data hub.
Where can I find SARS-CoV-2 lineage-related information?
You can explore geo-temporal and mutation data for lineages using the interactive SARS-CoV-2 Variants Overview dashboard, which can be accessed through the announcement banner on the NCBI home page.
Learn more in the SARS-CoV-2 Variants Overview help center.
View and download specific virus sequence sets
Find specific data sets
Option 1:
From the Find data tab in the navigation menu, select the desired group of viruses to view a preselected data set: All viruses, Human viruses, Bacteriophages, New sequences (past one month), or Available SARS-CoV-2 sequences.
Bacteriophages include virus groups with the following NCBI Taxonomy IDs: 10472, 10656, 10659, 10841, 10860, 10877, 11989, 28883, 1714270, 12333, 79205, 2136181.
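As a rough sketch, the taxonomy ID list above can be used to decide whether a record falls in the Bacteriophages group. The assumption here (not stated in the source) is that each record's taxonomic lineage is available as a collection of NCBI Taxonomy IDs and that a record belongs to the group when any ID in its lineage is on the list. The function name is hypothetical.

```python
# Taxonomy IDs copied from the list above.
BACTERIOPHAGE_TAXIDS = frozenset({
    10472, 10656, 10659, 10841, 10860, 10877,
    11989, 28883, 1714270, 12333, 79205, 2136181,
})


def is_bacteriophage(lineage_taxids):
    """Return True if any taxid in the record's lineage is a listed group."""
    return not BACTERIOPHAGE_TAXIDS.isdisjoint(lineage_taxids)


# Hypothetical lineages: the first contains a listed ID, the second does not.
print(is_bacteriophage([10239, 28883]))
print(is_bacteriophage([10239, 11118]))
```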
You can also access selected virus groups through the “Popular Searchers” panel located on the Results Table. The following virus groups can be accessed:
Influenza virus - allows access to data for the following genera: Alphainfluenzavirus, Betainfluenzavirus, Gammainfluenzavirus and Deltainfluenzavirus. Capital letters A, B, C and D in brackets indicate the predominant species in each genus.
Rotavirus
Dengue virus
West Nile virus
Zika virus
MERS coronavirus
Ebolavirus
SARS-CoV-2 coronavirus
Option 2:
Click the Search by sequence button in the central part of the NCBI Virus home page.
Select the desired popular virus search group button beneath the text box.
Both options open the tabular display with information about viruses from the selected group.
Learn more about how to compare results in the tabular display, build a multiple sequence alignment of selected results, build a phylogenetic tree of selected results, or refine the Results Table via filters.
Option 3:
Use the NCBI Visual Data Dashboard to explore, view, and download the massive, normalized datasets. Learn more.
Download sequences
To download sequences in a variety of formats (FASTA, accession list, or the Results Table as CSV or XML), choose the Nucleotide, Protein, or RefSeq Genomes tab and optionally select individual sequences to download.
You can also specify whether you want to download a randomized or stratified randomized sequence set.
Download a randomized sequence set
Disclaimers
Please note that our current platform cannot generate repeatable randomized searches. We realize the importance of repeatability in the scientific community and are working diligently to include this feature in upcoming updates.
Downloading randomized subsets in either FASTA format or as an accession list is currently available for nucleotide, protein, and assembly records. We are working to make them available for coding region records in the future.
A randomized subset of sequences (also referred to as ‘downsampling’) lets a user work with a smaller set of sequences selected at random from a larger dataset, as an approximation of the full dataset.
A smaller, representative sequence set can make downstream analysis faster and less computationally intensive while still allowing interpretation of the larger collection. When downloading a randomized subset, the file name will include the date of download and the randomization seed used.
Filters can be applied prior to downsampling as described here. After you click the download button, a menu lets you select the download format; the second step then includes an option to download a randomized subset of all the records in your filtered dataset. You can download a set of randomized sequences in a variety of formats (FASTA, accession list, or the Results Table in CSV or XML format). Before opening the “Download” menu, make sure to select the tab above the Results Table that corresponds to the data type you want to download:
- “Nucleotide” tab: randomized sequence data can only be downloaded in FASTA Nucleotide, Nucleotide Accession List, XML, and CSV formats.
- “Protein” tab: randomized sequence data can only be downloaded in FASTA Protein, Protein Accession List, XML, and CSV formats.
- “RefSeq Genomes” tab: randomized sequence data can only be downloaded in Assembly Accession List, XML, and CSV formats.
Download a stratified randomized sequence set
Randomized subsets of sequences can be stratified, meaning equally distributed over the categories of a field (also referred to as ‘stratified downsampling’). This lets a user work with a subset that contains equal numbers of sequences from each value of a selected category, as an approximation of the full dataset. The fields currently available for stratification are Country and Host. Before opening the “Download” menu, make sure to select the tab above the Results Table that corresponds to the data type you want to download:
- “Nucleotide” tab: randomized sequence data can only be downloaded in FASTA Nucleotide, Nucleotide Accession List, XML, and CSV formats.
- “Protein” tab: randomized sequence data can only be downloaded in FASTA Protein, Protein Accession List, XML, and CSV formats.
- “RefSeq Genomes” tab: randomized sequence data can only be downloaded in Assembly Accession List, XML, and CSV formats.
When downloading a stratified randomized subset, the file name will include the date of download and the randomization seed used.
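To make the difference between a randomized and a stratified randomized subset concrete, here is a minimal local sketch in Python. It only illustrates the general technique on records you already hold; it does not reproduce the selection made by NCBI Virus. The function names, the record layout, and the `seed` parameter are assumptions for the example, and the stratification field values are assumed to be strings.

```python
import random
from collections import defaultdict


def random_subset(records, n, seed):
    """Pick up to n records uniformly at random (plain downsampling)."""
    rng = random.Random(seed)  # a fixed seed makes the local result repeatable
    records = list(records)
    return rng.sample(records, min(n, len(records)))


def stratified_subset(records, field, per_value, seed):
    """Pick up to per_value records at random for each value of `field`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    subset = []
    for value in sorted(groups):  # fixed order keeps the result repeatable
        members = groups[value]
        subset.extend(rng.sample(members, min(per_value, len(members))))
    return subset


# Hypothetical records: 10 from one country, 3 from another.
records = [{"id": i, "Country": "USA"} for i in range(10)]
records += [{"id": 100 + i, "Country": "Kenya"} for i in range(3)]

picked = stratified_subset(records, "Country", per_value=5, seed=42)
print(len(picked))  # 5 from USA plus all 3 from Kenya
```

A value with fewer records than requested contributes everything it has, which is why the counts per value are only roughly equal.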
Step-by-step instructions for downloading sequences
Click the Download button on the upper left side of the NCBI Virus Results Table page.
This opens the download menu, which consists of 3 steps.
Step 1: Select Data Type.
Nucleotide, protein, or coding region sequence (CDS) in FASTA format. Please note that randomized subsets are currently not available for coding region sequence (CDS) FASTA files.
Accession list for nucleotide, protein, or assembly records. Please note that randomized subsets are currently not available for coding region sequence (CDS) accession lists.
Results Table - the contents of the Results Table, including the metadata, in CSV (comma-separated values) format or in XML format.
Step 2: Select Records.
Select which records you would like to download:
- only the selected records, chosen using the checkboxes in the Results Table,
- all records in the Results Table,
- a randomized subset of up to 2,000 records in the Results Table (for Nucleotide FASTA, Protein FASTA, Nucleotide Accession List, Protein Accession List, Assembly Accession List, CSV, and XML formats only).
Randomized subsets contain a limited number of sequences randomly selected from all the available sequences in the Results Table. As an option, you can stratify your subset by a field (up to 20 records per country or per host), meaning that a roughly equal number of sequences will be randomly selected for each value of that field.
To use the options for randomized subsets, select ‘Download a randomized subset of records (up to 2,000)’ and then select either a fully randomized subset or a stratified subset. Enter the number of randomly sorted records that you want to download (up to 2,000 for a randomized subset, and up to 20 records per value for a stratified subset) into the input box, and select the category that you want to stratify across from the dropdown.
The fields currently available for stratification are Country and Host.
Click “Next” and follow the prompts in the third step of the menu to begin your download.
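The limits described in Step 2 can be summarized as a small check. The sketch below is only a restatement of those limits in Python (up to 2,000 records for a randomized subset, up to 20 records per value for a stratified subset, stratification by Country or Host); the function and constant names are hypothetical and do not correspond to any NCBI interface.

```python
MAX_RANDOM_RECORDS = 2000   # limit for a fully randomized subset
MAX_PER_VALUE = 20          # limit per country or per host when stratifying
STRATIFY_FIELDS = ("Country", "Host")


def validate_subset_request(count, stratify_by=None):
    """Check a requested record count against the limits in Step 2."""
    if stratify_by is None:
        limit = MAX_RANDOM_RECORDS
    elif stratify_by in STRATIFY_FIELDS:
        limit = MAX_PER_VALUE
    else:
        raise ValueError(f"cannot stratify by {stratify_by!r}")
    if not 1 <= count <= limit:
        raise ValueError(f"count must be between 1 and {limit}, got {count}")
    return count


validate_subset_request(500)                      # fully randomized subset
validate_subset_request(10, stratify_by="Host")   # 10 records per host
```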
Step 3.
If you selected Sequence Data (FASTA format) in Step 1, in Step 3 you can select the FASTA definition line for the sequences that you are going to download.
If nucleotide or protein sequence data were selected in Step 1, the default FASTA definition line has the format (accession) | (GenBank title), that is, the GenBank sequence accession number followed by the GenBank title:
AAO17794 |VP4 spike protein[Human rotavirus A].
If the coding region option was selected, the default definition line format is (nucleotide accession)(cds coordinates)| (GenBank title). It includes the related GenBank nucleotide sequence accession number, an indication that this is a coding region (cds), the related GenBank protein accession number, and the related protein GenBank title:
NC_045425.1:319…1659 |replication endonuclease [Thermus phage phiOH3].
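When post-processing a downloaded FASTA file, the default definition lines shown above can be split on the `|` separator. The following Python sketch assumes exactly the layouts in the two examples; the function names are hypothetical, and the coordinate text of a coding-region defline is returned unparsed.

```python
def parse_default_defline(defline):
    """Split '(accession) |(GenBank title)' into (accession, title)."""
    text = defline.lstrip(">").strip()
    accession, separator, title = text.partition("|")
    if not separator:
        raise ValueError(f"no '|' separator in defline: {defline!r}")
    return accession.strip(), title.strip()


def split_cds_accession(cds_field):
    """Split '(nucleotide accession):(coordinates)' at the first colon."""
    accession, _, coordinates = cds_field.partition(":")
    return accession, coordinates


accession, title = parse_default_defline(
    "AAO17794 |VP4 spike protein[Human rotavirus A]"
)
print(accession)
print(title)
```

For a coding-region defline, pass the first element returned by `parse_default_defline` to `split_cds_accession` to separate the nucleotide accession from the coordinates.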
You can change this default definition line (defline) to fit your own needs by selecting the Build custom sequence title option. There you can select the following options (columns):
Assembly
SRA accession
Submitters
Release date
Pangolin
Random Sampling
Isolate
Species
Genus
Family
Molecule type
Length
Sequence type
Nucleotide Completeness
Genotype
Segment
Publication
Geo Location
Country
Host isolation source
Collection date
BioSample
BioProject
You can view a description of each option in the description of the Results Table columns.
If you selected the Accession list in Step 1, you can download nucleotide, protein, and RefSeq genome assembly accession numbers with or without the version number. For example: NC_045512 (without version) or NC_045512.2 (with version).
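If you download accessions with version numbers and later need the unversioned form, the conversion is a simple string operation. A minimal Python sketch, with a hypothetical function name, using the example accession from the text:

```python
def strip_version(accession):
    """Return the accession without its trailing '.<version>' suffix, if any."""
    base, dot, version = accession.rpartition(".")
    if dot and version.isdigit():
        return base
    return accession  # already unversioned


print(strip_version("NC_045512.2"))  # NC_045512
print(strip_version("NC_045512"))    # NC_045512
```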
If you selected the Results Table in CSV format in Step 1, the downloaded results will contain the data for all selected columns. You can change which columns are included in Step 3: Select columns to include in results set. You can also choose whether accession numbers are included with or without the version number.
NCBI Visual Data Dashboards
NCBI Virus visual data dashboards support data exploration and discovery across our normalized datasets. They can be used to identify trends in data and to select specific subsets based on those trends.
Visual dashboards in NCBI Virus include:
- The dashboard on the NCBI Virus home page, which provides virus sequence statistics, a Virus Taxonomy Sunburst Chart, and a Host Distribution Bar Chart.
- The “Visual Filters for GenBank Sequences” dashboard, which displays data for specific viral taxa and includes Sequence Type links with calculated virus sequence statistics, a Geographic Distribution choropleth map that shows the geographic distribution of sequence records based on collection locations, and time sliders for Collection and Release Date that dynamically show the number of sequences for each time interval.
1: Home Page Dashboard
Access sequence data via the buttons in the top row, which show the following statistics:
RefSeq Nucleotides - all viral nucleotide reference sequences available at NCBI (find more about reference sequences here).
All Proteins - all NCBI viral protein sequences, including RefSeq proteins.
All Nucleotides - all viral nucleotide records available at NCBI, including RefSeqs.
RefSeq Proteins - all viral protein reference sequences available at NCBI.
Complete Nucleotides - all NCBI viral nucleotide sequences whose GenBank ASN.1 record contains the descriptor descr/molinfo/completeness=complete, or whose definition line (defline) contains the word ‘complete’. This also includes complete reference records (RefSeqs).
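The rule for the Complete Nucleotides count can be read as a two-part test, sketched below in Python. This is an interpretation, not NCBI's code: the function name and arguments are hypothetical, and matching ‘complete’ as a whole word, case-insensitively, is an assumption made for the sketch.

```python
import re


def is_complete_nucleotide(completeness, defline):
    """Apply the 'Complete Nucleotides' rule described above to one record.

    completeness: value of descr/molinfo/completeness, or None if absent.
    defline: the record's definition line.
    """
    if completeness is not None and completeness.strip().lower() == "complete":
        return True
    # Whole-word match, so 'incomplete' does not count as 'complete'.
    return re.search(r"\bcomplete\b", defline, flags=re.IGNORECASE) is not None


# Hypothetical deflines, for illustration only.
print(is_complete_nucleotide(None, "Example virus isolate X1, complete genome"))
print(is_complete_nucleotide(None, "Example virus isolate X2, partial cds"))
```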
Clicking a button shows a Results Table with the corresponding sequences. Those results can be further refined by using the filters for various sequence attributes (metadata) located on the left side of the Results Table page (learn more here).
Explore virus taxonomy hierarchy using sunburst chart
Virus taxonomy can be explored via an interactive sunburst chart. The default view represents the classification of all available NCBI viral taxa. The inner layer (ring) represents four non-taxonomic groups of viruses: RNA viruses, DNA viruses, DNA/RNA viruses (which includes reverse-transcribing viruses), and Unclassified viruses. Only 4 levels of the whole hierarchy are visible on the plot at a given time.
To explore virus taxonomy, click any slice (section) of any layer of the sunburst chart. The plot zooms into the selected taxon and displays any additional taxa below the selection. Each viral taxon name is displayed on the corresponding slice and can also be viewed in the hover-over tooltip by placing your cursor over the slice. Dynamic breadcrumbs with viral taxon names are located above the sunburst plot. The breadcrumbs also serve as a secondary navigation system: they show the location of the taxon in the hierarchy, and clicking one refocuses the plot on the selected taxon. You can also see breadcrumbs by hovering over any slice in the sunburst. Clicking the center of the sunburst chart returns you to the parent taxon.
Select a specific virus taxonomy group and view statistics for specific sequence sets with quick links to download them
After selecting a specific taxonomy group on the sunburst chart, you can view and explore the updated statistics in the top row of the dashboard.
Select a host term from the Host Distribution bar chart and see the distribution of that host among the various viral taxa
The interactive Host Distribution chart shows the distribution of virus host species. Each host bar is proportional to the number of virus sequences isolated from that host. The total number of virus sequences for each bar can be viewed by hovering over the bar.
To select a host species, click a bar or the corresponding host name. This highlights the selected host, as well as all virus taxonomy groups containing sequences isolated from it. Only one host can be selected at a time. Clicking the selected host a second time de-selects it, or you can use the Reset option in the top right corner of the host chart. The statistics in the top row of the dashboard are updated based on the selected host.
You can look for a host species by scrolling the Host Distribution chart or by using the keyboard shortcut “CTRL+F”.
You can reset the Host Distribution chart to the original view by pressing the “Reset” button in the upper right corner of the chart.
Explore viral taxonomy hierarchy within a given taxon highlighted by the host selection
By clicking a highlighted taxonomy group, you can further explore the viral taxonomy hierarchy on the sunburst chart. The lower layers that include taxa with sequences from the selected host are highlighted. As you zoom in, only taxa that include sequences from the selected host are highlighted.
2: “Visual Filters for GenBank Sequences” Dashboard
“Visual Filters for GenBank Sequences” is a dashboard that lets you filter your virus search results by important attributes, such as geographic location, collection date, and release date, using visual, graphical filters.
How to access “Visual Filters for GenBank Sequences”?
There are several ways to access Visual Filters for GenBank Sequences.
1. From the NCBI Virus home page, follow the steps below:
Select ‘Search by Virus’.
Type the virus name, then select an option from the autocomplete list.
View the Results Table for your virus of interest.
Find the tab named “Visual Filters for GenBank Sequences” above the Results Table.
Click the “Visual Filters for GenBank Sequences” tab to switch to visual filtering.
2. From the Results Table page, open the “Visual Filters for GenBank Sequences” tab in the header above the Results Table.
Please note that if any filters were applied to the Results Table, switching to the “Visual Filters for GenBank Sequences” dashboard resets all filters except the virus name.
3. By appending the virus “taxid” number directly to the page URL: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/dashboard?taxid=
For example, for Zika virus (taxid=64320), enter the following URL: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/dashboard?taxid=64320
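Building such a URL for several viruses is easy to script. The Python sketch below simply appends a taxid to the URL prefix given above; the function name is hypothetical, and the code does not check that the taxid actually exists.

```python
DASHBOARD_URL = "https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/dashboard?taxid="


def dashboard_url(taxid):
    """Return the Visual Filters dashboard URL for an NCBI taxid."""
    taxid = int(taxid)
    if taxid <= 0:
        raise ValueError(f"taxid must be a positive integer, got {taxid}")
    return f"{DASHBOARD_URL}{taxid}"


# Zika virus example from the text above.
print(dashboard_url(64320))
```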
How to use Visual Filters for GenBank Sequences?
Visual filters let you filter your search by geographic location, collection time, and release time. The filtering features on the dashboard are interactive and connected, so a filter applied in one feature is also reflected in the others. When you use these filters, the top summary section is automatically updated to reflect the number of records in the NCBI RefSeq, Nucleotide, and Protein sets in the NCBI Virus database that fit the combined conditions of your search.
The Geographic Distribution choropleth map lets you select sequence records by the location where they were collected.
Click a geographic location to filter sequences by collection location.
The map lets you select multiple international locations or multiple locations in the USA. The selections reset if you switch between the International and USA maps.
To select a single location, start typing the name of the region and select it from the dropdown list.
Please note that color shades on the map are based on the number of nucleotide records for the virus: darker shades correspond to higher numbers, and lighter shades to lower numbers.
By using the Collection Time and Release Time sliders, you can view a histogram of the number of nucleotide records in different time intervals.
Use the sliders or click date columns to select records by the sample collection date or the GenBank release date. Weekly, monthly, and yearly time intervals can be selected.
Collection Time graph:
Select the collection date range of the samples by either selecting one time interval bar or dragging the ends of the sliders.
The slider displays data from the earliest collection year for this virus to the current year.
If the collection time for a record is incomplete, it is collapsed as follows: if the record only has a year, it is shown as Jan 1 of that year; if the record only has a year and month, it is shown on the first day of that month.
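The collapsing rule for incomplete collection dates can be sketched in Python as follows. The input format (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`) and the function name are assumptions made for the example; the source describes only the rule itself.

```python
from datetime import date


def collapse_collection_date(value):
    """Fill a missing month and/or day with 1, as described above."""
    parts = [int(part) for part in value.strip().split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported collection date: {value!r}")
    # Pad with 1s: year only -> Jan 1; year and month -> first day of month.
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)


print(collapse_collection_date("2020"))        # 2020-01-01
print(collapse_collection_date("2020-05"))     # 2020-05-01
print(collapse_collection_date("2020-05-17"))  # 2020-05-17
```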
Release Time graph:
Select the release date range of the samples by either selecting one bar or dragging the ends of the sliders.
The slider displays data from the year this virus data was first released to the current year.
You can also select different bi-yearly intervals, which shows the portion of the graph for that time frame. However, you still have to click a bar or select the time interval with the sliders to apply filtering.
The top header of the dashboard includes a link back to the Results Table page, where you can review your results in tabular format, apply more filters, and download FASTA sequences, an accession list, or the table itself.
Note that all filters applied in the graphical view remain in effect on the Results Table page. However, if you switch from the Results Table page back to the visual filters, all applied filters are lost, except for the selected virus name.
How to find, view and download HIV-1 sequences and related metadata?
Public HIV-1 nucleotide and protein sequence data are displayed in the HIV-1 data hub.
The HIV-1 data hub can be accessed by typing and selecting HIV-1 in the Search by virus name or taxonomy input form.
Alternatively, it can be accessed from the NCBI home page by typing HIV-1 in the search window. This opens another page with HIV-1 virus genome assembly information. Click the NCBI Virus button to access the HIV-1 data hub.
These are early days for HIV-1 data support in NCBI Virus. Please stay tuned for updates and further details relevant to HIV-1.