Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://spike.blog.csdn.net/article/details/136170304
Paper: CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2
GitHub: https://github.com/dina-lab3D/CombFold
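Combinatorial Assembly is an algorithm for predicting the structures of large protein complexes. It takes the pairwise interactions between subunits predicted by AlphaFold2 and builds up the full complex step by step, in a combinatorial and hierarchical manner. The process has three stages:
- Structure prediction: AlphaFold2 predicts the monomer structure of every subunit and the structure of every subunit pair. A pair is predicted by joining the two subunit sequences, running AlphaFold2 on the joined sequence, and then cutting out the linker. These structures serve as candidate models of the interactions between subunits.
- Combinatorial search: a branch-and-bound style algorithm searches the pairwise structures for the best combination, so that the assembled complex agrees as closely as possible with the pairwise structures. Its core is a scoring function that measures how well a complex matches the pairwise structures, together with a pruning strategy that discards infeasible combinations. The output is one or more top-ranked assemblies, each specifying the relative positions and orientations of the subunits.
- Structure refinement: Rosetta energy minimization is applied to each assembled complex to improve its stability and reliability. The output is one or more final complex structures, with their scores and confidence values.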
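The first stage needs one prediction job per pair of subunits, including a subunit paired with itself when the complex contains more than one copy of it. Below is a minimal sketch of that enumeration. The function name `enumerate_pair_jobs` and the name-to-copy-number dict are assumptions made for illustration; they are not part of CombFold.

```python
from itertools import combinations_with_replacement


def enumerate_pair_jobs(copies):
    """Return the subunit pairs that need a pairwise structure prediction.

    `copies` maps a subunit name to its copy number in the complex
    (a hypothetical input format chosen for this sketch).
    A homo-pair (X, X) is only kept when X has at least two copies.
    """
    names = sorted(copies)
    pairs = []
    for a, b in combinations_with_replacement(names, 2):
        if a == b and copies[a] < 2:
            continue  # a single-copy subunit cannot interact with itself
        pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Stoichiometry of the A6B6 example used later in this post.
    print(enumerate_pair_jobs({"A0": 6, "G0": 6}))
```

For the two-subunit example this yields three jobs: A0 with A0, A0 with G0, and G0 with G0, which matches the three pair types reported in the log further down.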
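Combinatorial Assembly is an efficient and accurate algorithm that can predict the structures of large, asymmetric protein complexes, even when no experimental data is available. It also supports distance restraints from crosslinking mass spectrometry and fast enumeration of possible complex stoichiometries. This accuracy makes it a strong tool for extending the structural coverage of proteins.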
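Running run_on_pdbs_folder produces a log with the following steps:
- Search for subunits in the supplied PDB files
- Extract representative subunits (for each subunit, its best-scored model in the PDBs folder)
- Extract pairwise transformations between subunits (from each PDB file with 2 or more subunits)
- Finish building the unified representation
- Run the combinatorial assembly algorithm, which may take a while
- Finish combinatorial assembly and write the output models
- 3 complexes assembled, confidence: 88.9885-88.9885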
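The "representative subunit" step picks, for each subunit, the model with the best score. A rough sketch of one way to do this is shown here, assuming the score is the mean pLDDT that AlphaFold writes into the B-factor column of its PDB output, averaged over CA atoms. This is not CombFold's own code, and `mean_plddt` and `pick_representative` are hypothetical names.

```python
def mean_plddt(pdb_text, chain_id):
    """Average the B-factor column over the CA atoms of one chain.

    AlphaFold stores per-residue pLDDT in the B-factor field
    (columns 61-66 of a PDB ATOM record).
    """
    values = []
    for line in pdb_text.splitlines():
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[21] == chain_id:
            values.append(float(line[60:66]))
    if not values:
        raise ValueError(f"no CA atoms found for chain {chain_id!r}")
    return sum(values) / len(values)


def pick_representative(candidates):
    """Return the label of the candidate with the highest mean pLDDT.

    `candidates` is a list of (label, pdb_text, chain_id) tuples,
    a layout assumed for this sketch.
    """
    best = max(candidates, key=lambda c: mean_plddt(c[1], c[2]))
    return best[0]
```

Averaging over CA atoms gives one value per residue, so long side chains do not weigh more than short ones. Whether CombFold averages over CA atoms or over all atoms is not stated in this post.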
The raw log:
--- Searching for subunits in supplied PDB files
found full A0 in AFM_A0_A0_A0_unrelaxed_rank_1_model_3.pdb chain B
found full A0 in AFM_A0_A0_A0_unrelaxed_rank_1_model_3.pdb chain C
found full A0 in AFM_A0_A0_A0_unrelaxed_rank_1_model_3.pdb chain D
# ...
--- Extracting representative subunits (for each subunit, its best scored model in the PDBs folder)
rep A0 has plddt score 96.00915686274507
rep G0 has plddt score 89.0577211394304
--- Extracting pairwise transformations between subunits (from each PDB file with 2 or more subunits)
- Extracting pairwise transformations from file /nfs_beijing_ai/chenlong/workspace/combfold-by-chenlong/example/pdbs/AFM_A0_A0_A0_unrelaxed_rank_1_model_3.pdb
# ...
- Extracting pairwise transformations from file /nfs_beijing_ai/chenlong/workspace/combfold-by-chenlong/example/pdbs/AFM_A0_A0_A0_unrelaxed_rank_2_model_1.pdb
# ...
found 8 transformations between A0 and A0
found 6 transformations between A0 and G0
found 2 transformations between G0 and G0
--- Finished building unified representation
--- Running combinatorial assembly algorithm, may take a while
--- Finished combinatorial assembly, writing output models
--- Assembled 3 complexes, confidence: 88.9885-88.9885
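When running many targets it helps to pull the key numbers out of this log automatically. The sketch below assumes the log lines keep exactly the wording shown above; `parse_combfold_log` is a hypothetical helper, not something CombFold provides.

```python
import re

# Patterns written against the log lines shown in this post.
PLDDT_RE = re.compile(r"rep (\S+) has plddt score ([\d.]+)")
TRANSFORM_RE = re.compile(r"found (\d+) transformations between (\S+) and (\S+)")
SUMMARY_RE = re.compile(r"Assembled (\d+) complexes, confidence: ([\d.]+)-([\d.]+)")


def parse_combfold_log(text):
    """Collect pLDDT scores, transformation counts and the final summary."""
    result = {"plddt": {}, "transformations": {}, "assembled": None, "confidence": None}
    for line in text.splitlines():
        m = PLDDT_RE.search(line)
        if m:
            result["plddt"][m.group(1)] = float(m.group(2))
            continue
        m = TRANSFORM_RE.search(line)
        if m:
            result["transformations"][(m.group(2), m.group(3))] = int(m.group(1))
            continue
        m = SUMMARY_RE.search(line)
        if m:
            result["assembled"] = int(m.group(1))
            result["confidence"] = (float(m.group(2)), float(m.group(3)))
    return result
```

A pair with zero transformations is worth checking first when an assembly fails, because two subunits with no predicted interaction give the assembler nothing to place them with.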
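Prediction result: the reference structure is PDB entry 6YBQ, a 12-chain protein complex with A6B6 stoichiometry.
The run script is shown below, adapted from CombFold.ipynb in the source repository: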
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Copyright (c) 2024. All rights reserved.
Created by C. L. Wang on 2024/2/19
"""

import os
import shutil
import sys

# Make the project root importable when this file is run directly.
p = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if p not in sys.path:
    sys.path.append(p)

from scripts import run_on_pdbs
from root_dir import ROOT_DIR


class RunExamples(object):
    """Run CombFold on the example folder."""

    def __init__(self):
        pass

    def process(self):
        path_on_drive = os.path.join(ROOT_DIR, "example")  # @param {type:"string"}
        max_results_number = "5"  # @param [1, 5, 10, 20]
        create_cif_instead_of_pdb = False  # @param {type:"boolean"}

        subunits_path = os.path.join(path_on_drive, "subunits.json")
        pdbs_folder = os.path.join(path_on_drive, "pdbs")
        assembled_folder = os.path.join(path_on_drive, "assembled")
        tmp_assembled_folder = os.path.join(path_on_drive, "tmp_assembled")

        # Do not create assembled_folder up front: shutil.copytree below needs
        # it to be absent, and creating it first would trigger the prompt on
        # every run.
        if os.path.exists(assembled_folder):
            answer = input(f"[Info] {assembled_folder} already exists, Should delete? (y/n)")
            if answer in ("y", "Y"):
                print("[Info] Deleting")
                shutil.rmtree(assembled_folder)
            else:
                print("[Info] Stopping")
                sys.exit()

        # Start from a clean temporary output folder.
        if os.path.exists(tmp_assembled_folder):
            shutil.rmtree(tmp_assembled_folder)

        # Core run logic
        run_on_pdbs.run_on_pdbs_folder(
            subunits_path, pdbs_folder, tmp_assembled_folder,
            output_cif=create_cif_instead_of_pdb,
            max_results_number=int(max_results_number))

        shutil.copytree(os.path.join(tmp_assembled_folder, "assembled_results"),
                        assembled_folder)
        print("[Info] Results saved to", assembled_folder)


def main():
    runner = RunExamples()
    runner.process()


if __name__ == '__main__':
    main()
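The script reads a subunits.json file that describes the subunits of the complex. A sketch of how such a file might be generated for an A6B6 target follows. The field names (`name`, `chain_names`, `start_res`, `sequence`) reflect my reading of the CombFold README and should be checked against the repository; the chain letters are assumed, and the sequences are short placeholders, not the real 6YBQ sequences.

```python
import json


def make_subunit(name, chain_names, sequence, start_res=1):
    """Build one subunit entry (field names are an assumption, see above)."""
    return {
        "name": name,
        "chain_names": list(chain_names),
        "start_res": start_res,
        "sequence": sequence,
    }


def build_subunits(entries):
    """Index entries by name and reject chains claimed by two subunits."""
    subunits = {}
    seen_chains = set()
    for entry in entries:
        overlap = seen_chains.intersection(entry["chain_names"])
        if overlap:
            raise ValueError(f"chains used twice: {sorted(overlap)}")
        seen_chains.update(entry["chain_names"])
        subunits[entry["name"]] = entry
    return subunits


if __name__ == "__main__":
    subunits = build_subunits([
        # Placeholder sequences only; replace with the real ones.
        make_subunit("A0", "ABCDEF", "MKT"),
        make_subunit("G0", "GHIJKL", "MSE"),
    ])
    print(json.dumps(subunits, indent=2))
```

Listing six chain names per subunit is how the 6+6 stoichiometry is expressed in this sketch: each subunit is described once, and every chain name stands for one copy.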
Adjusting the protein display in ChimeraX:
- Show the command line: Tools - General - Command Line Interface
- At the command prompt, enter: preset ribbon (this switches the display to ribbon style)