Newbie's Guide to Biopython, Part 1

When you hear the word Biopython, what is the first thing that comes to mind? A Python library for handling biological data? You are correct! Biopython provides a set of tools for bioinformatics computations on biological data such as DNA and protein sequences. I have been using Biopython ever since I started studying bioinformatics, and it has never let me down. It is an amazing library that covers a wide range of tasks, from reading large files of biological data to aligning sequences. In this article, I will introduce some basic Biopython functions that can make implementations much easier, often with just a single call.

Getting started

The latest version available when I’m writing this article is biopython-1.77 released in May 2020.

You can install Biopython using pip

pip install biopython

or using conda.

conda install -c conda-forge biopython

You can test whether Biopython is properly installed by executing the following line in the python interpreter.

import Bio

If you get an error such as ModuleNotFoundError: No module named 'Bio' (reported as ImportError: No module named Bio on older Python versions), Biopython is not installed in the environment you are running. If no error message appears, we are good to go.

In this article, I will be walking you through some examples where Seq, SeqRecord and SeqIO come in handy. We will go through the functions that perform the following tasks.

  1. Creating a sequence
  2. Getting the reverse complement of a sequence
  3. Counting the number of occurrences of a nucleotide
  4. Finding the starting index of a subsequence
  5. Reading a sequence file
  6. Writing sequences to a file
  7. Converting a FASTQ file to a FASTA file
  8. Separating sequences by ids from a list of ids

1. Creating a sequence

To create your own sequence, you can use the Biopython Seq object. Here is an example.

>>> from Bio.Seq import Seq
>>> my_sequence = Seq("ATGACGTTGCATG")
>>> print("The sequence is", my_sequence)
The sequence is ATGACGTTGCATG
>>> print("The length of the sequence is", len(my_sequence))
The length of the sequence is 13
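A Seq object behaves much like a Python string, which is worth seeing once before moving on. The short sketch below reuses the same sequence; the names first_base, first_codon and extended are illustrative only.

```python
from Bio.Seq import Seq

my_sequence = Seq("ATGACGTTGCATG")

# Indexing returns a single letter; slicing returns a new Seq.
first_base = my_sequence[0]
first_codon = my_sequence[:3]

# Seq objects can be concatenated with + and converted with str().
extended = my_sequence + Seq("TTT")

print(first_base)      # A
print(first_codon)     # ATG
print(str(extended))   # ATGACGTTGCATGTTT
```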

2. Get the reverse complement of a sequence

You can easily get the reverse complement of a sequence using a single function call reverse_complement().

>>> print("The reverse complement of the sequence is", my_sequence.reverse_complement())
The reverse complement of the sequence is CATGCAACGTCAT
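To make clear what that single call computes, here is a plain-Python sketch of the same operation: complement each base, then reverse the string. The helper name manual_reverse_complement is hypothetical, and the sketch only handles the four unambiguous DNA letters, whereas Biopython also supports ambiguity codes.

```python
# Pairing rules for unambiguous DNA bases (ambiguity codes are not handled here).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def manual_reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


print(manual_reverse_complement("ATGACGTTGCATG"))  # CATGCAACGTCAT
```

In practice the Biopython method is the better choice; the sketch is only meant to show the logic behind it.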

3. Count the number of occurrences of a nucleotide

You can get the number of occurrences of a particular nucleotide using the count() function.

>>> print("The number of As in the sequence", my_sequence.count("A"))
The number of As in the sequence 3
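One common use of count() is a quick GC-content estimate. The sketch below assumes the same example sequence; the variable names gc_count, gc_fraction and poly_a are illustrative. It also shows that count() skips overlapping matches, while count_overlap() includes them.

```python
from Bio.Seq import Seq

my_sequence = Seq("ATGACGTTGCATG")

# GC content: the share of bases that are G or C.
gc_count = my_sequence.count("G") + my_sequence.count("C")
gc_fraction = gc_count / len(my_sequence)
print("GC count:", gc_count)                # GC count: 6
print("GC fraction: %.2f" % gc_fraction)    # GC fraction: 0.46

# count() is non-overlapping, like str.count(); count_overlap() is not.
poly_a = Seq("AAAA")
print(poly_a.count("AA"))          # 2
print(poly_a.count_overlap("AA"))  # 3
```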

4. Find the starting index of a subsequence

You can find the starting index of the first occurrence of a subsequence using the find() function. It returns -1 when the subsequence is not present.

>>> print("Found TTG in the sequence at index", my_sequence.find("TTG"))
Found TTG in the sequence at index 6
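Because find() only reports the first match, collecting every match takes a small loop. The sketch below is one way to do it, using the optional start argument of find(); the helper name find_all is hypothetical and not part of Biopython.

```python
from Bio.Seq import Seq


def find_all(sequence, subsequence):
    """Return the start index of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(subsequence)
    while start != -1:
        positions.append(start)
        # Resume the search one position past the previous hit.
        start = sequence.find(subsequence, start + 1)
    return positions


my_sequence = Seq("ATGACGTTGCATG")
print(find_all(my_sequence, "TG"))  # [1, 7, 11]
print(my_sequence.find("CCC"))      # -1
```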

5. Reading a sequence file

Biopython’s SeqIO (Sequence Input/Output) interface can be used to read sequence files. The parse() function takes a file (a filename or file handle) and a format name, and returns an iterator of SeqRecord objects. The following example reads a FASTA file.

from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)

record.id will return the identifier of the sequence. record.seq will return the sequence itself. record.description will return the sequence description.
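The difference between these three attributes is easiest to see on a concrete record. The sketch below parses two made-up FASTA records from an in-memory string (via io.StringIO) so that it runs without any file on disk; the record names and sequences are invented for illustration.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up records standing in for the contents of a FASTA file.
fasta_text = ">seq1 first example\nATGACG\n>seq2 second example\nTTGCATG\n"

records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    # id is the first word of the header; description is the whole header line.
    print(record.id, "|", record.description, "|", record.seq, "|", len(record))
```

Wrapping the parser in list() loads every record into memory, which is fine for small inputs; for large files, iterating over SeqIO.parse() directly avoids that.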

6. Writing sequences to a file

Biopython’s SeqIO (Sequence Input/Output) interface can also be used to write sequences to files. The following example writes a list of sequences to a FASTA file.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Note: Bio.Alphabet (generic_dna) was removed in Biopython 1.78, so Seq is
# built from the string alone here. On 1.77 and earlier, Seq(seq, generic_dna) also works.
sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]

# Use each sequence's position in the list as its id.
records = (SeqRecord(Seq(seq), str(index)) for index, seq in enumerate(sequences))

with open("example.fasta", "w") as output_handle:
    SeqIO.write(records, output_handle, "fasta")

This code will result in a FASTA file with sequence ids starting from 0. If you want to give a custom id and a description you can create the records as follows.

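from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
new_sequences = []

i = 1
for sequence in sequences:
    # The id and description values here are illustrative; substitute your own.
    record = SeqRecord(Seq(sequence), id="seq_" + str(i), description="my sequence description")
    new_sequences.append(record)
    i += 1

with open("example.fasta", "w") as output_handle:
    SeqIO.write(new_sequences, output_handle, "fasta")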

The SeqIO.write() function will return the number of sequences written.
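To see the return value without touching the disk, the sketch below writes two records to an in-memory handle (io.StringIO) instead of a file. The ids and the empty descriptions are assumptions made for illustration; passing description="" keeps the FASTA header down to the id alone.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq("AAACGTGG"), id="0", description=""),
    SeqRecord(Seq("TGAACCG"), id="1", description=""),
]

# An in-memory text handle stands in for a file opened in "w" mode.
output_handle = StringIO()
written = SeqIO.write(records, output_handle, "fasta")
fasta_text = output_handle.getvalue()

print("Records written:", written)
print(fasta_text)
```

Checking the returned count against the number of records you expected is a cheap sanity check after a write.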

7. Convert a FASTQ file to FASTA file

We need to convert DNA data file formats in certain applications. For example, we can do file format conversions from FASTQ to FASTA as follows.

from Bio import SeqIO

with open("path/to/fastq/file.fastq", "r") as input_handle, open("path/to/fasta/file.fasta", "w") as output_handle:
    sequences = SeqIO.parse(input_handle, "fastq")
    count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)

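If you want to convert a GenBank file to FASTA format, only the input format name changes:

from Bio import SeqIO

# genbank_path and fasta_path are placeholders for your own input and output file paths.
with open(genbank_path, "r") as input_handle, open(fasta_path, "w") as output_handle:
    sequences = SeqIO.parse(input_handle, "genbank")
    count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)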
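The parse-then-write pattern can also be collapsed into a single call with SeqIO.convert(), which takes the input, its format, the output and its format, and returns the number of records converted. The sketch below runs on in-memory handles with two invented FASTQ reads; with real data you would pass file paths or open file handles instead.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTQ reads: header, sequence, separator, quality string.
fastq_text = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\nIIIII\n"

input_handle = StringIO(fastq_text)
output_handle = StringIO()

# One call replaces the separate parse() and write() steps.
count = SeqIO.convert(input_handle, "fastq", output_handle, "fasta")
fasta_text = output_handle.getvalue()

print("Converted %i records" % count)
print(fasta_text)
```

Note that the quality scores are dropped in this direction, since FASTA has no place to store them.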

8. Separate sequences by ids from a list of ids

Assume that you have a file named list.lst containing one sequence identifier per line, and you want to extract the corresponding sequences from a sequence file (a FASTQ file in this example). You can run the following to write those sequences to a new file.

from Bio import SeqIO

# path is the directory prefix that holds the input files; set it for your environment.
with open(path + "list.lst") as id_handle:
    # strip() removes the trailing newline without cutting a character from the last line
    ids = set(line.strip() for line in id_handle)

with open(path + "list.fq", mode="a") as my_output:
    for seq in SeqIO.parse(path + "list_sequences.fq", "fastq"):
        if seq.id in ids:
            my_output.write(seq.format("fastq"))
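The same filtering idea can be tried end to end without any files. The sketch below is a minimal, self-contained variant that uses FASTA rather than FASTQ and in-memory handles; the ids, sequences and variable names (wanted_ids, selected) are all invented for illustration.

```python
from io import StringIO

from Bio import SeqIO

# Stand-ins for the id list file and the sequence file.
id_list_text = "seq1\nseq3\n"
fasta_text = ">seq1\nAAACGTGG\n>seq2\nTGAACCG\n>seq3\nGGTGCA\n"

# A set gives constant-time membership checks; blank lines are skipped.
wanted_ids = {line.strip() for line in StringIO(id_list_text) if line.strip()}

# A generator keeps only the matching records without loading them all at once.
selected = (
    record
    for record in SeqIO.parse(StringIO(fasta_text), "fasta")
    if record.id in wanted_ids
)

output_handle = StringIO()
count = SeqIO.write(selected, output_handle, "fasta")
filtered_text = output_handle.getvalue()

print("Kept %i records" % count)
print(filtered_text)
```

Passing the generator straight to SeqIO.write() means records stream from input to output, which matters when the sequence file is large.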

Final Thoughts

I hope this gave you an idea of how to use Biopython's Seq, SeqRecord and SeqIO, and that it will be useful in your research work.

Thank you for reading. I would love to hear your thoughts. Stay tuned for the next part of this article with more usages and Biopython functions.

Cheers, and stay safe!

Translated from: https://medium.com/computational-biology/newbies-guide-to-biopython-part-1-9ec82c3dfe8f
