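The MmcifObject class in AlphaFold3 is the core data structure produced when an mmCIF file is parsed. It stores the parsed protein structure information, including the PDB header, the structure parsed by Biopython, and the per-chain sequence information.
The code below defines the data classes Monomer, AtomSite, ResiduePosition, ResidueAtPosition, MmcifObject, and ParsingResult.
Source code: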
# Imports required by the definitions below.
import dataclasses
from typing import Any, Mapping, Optional, Sequence, Tuple

from Bio import PDB

# Type aliases:
ChainId = str
PdbHeader = Mapping[str, Any]
PdbStructure = PDB.Structure.Structure
SeqRes = str
MmCIFDict = Mapping[str, Sequence[str]]


@dataclasses.dataclass(frozen=True)
class Monomer:
  id: str
  num: int


# Note - mmCIF format provides no guarantees on the type of author-assigned
# sequence numbers. They need not be integers.
@dataclasses.dataclass(frozen=True)
class AtomSite:
  residue_name: str
  author_chain_id: str
  mmcif_chain_id: str
  author_seq_num: str
  mmcif_seq_num: int
  insertion_code: str
  hetatm_atom: str
  model_num: int


# Used to map SEQRES index to a residue in the structure.
@dataclasses.dataclass(frozen=True)
class ResiduePosition:
  chain_id: str
  residue_number: int
  insertion_code: str


@dataclasses.dataclass(frozen=True)
class ResidueAtPosition:
  position: Optional[ResiduePosition]
  name: str
  is_missing: bool
  hetflag: str


@dataclasses.dataclass(frozen=True)
class MmcifObject:
  """Representation of a parsed mmCIF file.

  Contains:
    file_id: A meaningful name, e.g. a pdb_id. Should be unique amongst all
      files being processed.
    header: Biopython header.
    structure: Biopython structure.
    chain_to_seqres: Dict mapping chain_id to 1 letter amino acid sequence. E.g.
      {'A': 'ABCDEFG'}
    seqres_to_structure: Dict; for each chain_id contains a mapping between
      SEQRES index and a ResidueAtPosition. e.g. {'A': {0: ResidueAtPosition,
                                                        1: ResidueAtPosition,
                                                        ...}}
    raw_string: The raw string used to construct the MmcifObject.
  """
  file_id: str
  header: PdbHeader
  structure: PdbStructure
  chain_to_seqres: Mapping[ChainId, SeqRes]
  seqres_to_structure: Mapping[ChainId, Mapping[int, ResidueAtPosition]]
  raw_string: Any


@dataclasses.dataclass(frozen=True)
class ParsingResult:
  """Returned by the parse function.

  Contains:
    mmcif_object: A MmcifObject, may be None if no chain could be successfully
      parsed.
    errors: A dict mapping (file_id, chain_id) to any exception generated.
  """
  mmcif_object: Optional[MmcifObject]
  errors: Mapping[Tuple[str, str], Any]
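To make the role of seqres_to_structure more concrete, here is a minimal sketch of how the mapping might be populated and queried. ResiduePosition and ResidueAtPosition are copied from the source above so the snippet runs without Biopython. The chain data, the blank hetflag and insertion code values, and the helper functions resolved_indices and try_mutate are hypothetical and are not part of the original module.

```python
import dataclasses
from typing import List, Mapping, Optional


@dataclasses.dataclass(frozen=True)
class ResiduePosition:
  chain_id: str
  residue_number: int
  insertion_code: str


@dataclasses.dataclass(frozen=True)
class ResidueAtPosition:
  position: Optional[ResiduePosition]
  name: str
  is_missing: bool
  hetflag: str


# Hypothetical data: a three-residue chain 'A' whose second residue appears in
# SEQRES but has no resolved coordinates, so its position is None.
chain_to_seqres = {'A': 'MKV'}
seqres_to_structure: Mapping[str, Mapping[int, ResidueAtPosition]] = {
    'A': {
        0: ResidueAtPosition(ResiduePosition('A', 1, ' '), 'MET', False, ' '),
        1: ResidueAtPosition(None, 'LYS', True, ' '),
        2: ResidueAtPosition(ResiduePosition('A', 3, ' '), 'VAL', False, ' '),
    }
}


def resolved_indices(chain_id: str) -> List[int]:
  """Returns the SEQRES indices of residues that are not marked missing."""
  return [
      index
      for index, residue in sorted(seqres_to_structure[chain_id].items())
      if not residue.is_missing
  ]


def try_mutate(residue: ResidueAtPosition) -> bool:
  """Returns True if attribute assignment is rejected by frozen=True."""
  try:
    residue.name = 'ALA'
  except dataclasses.FrozenInstanceError:
    return True
  return False
```

Because every class is declared with frozen=True, instances cannot be modified after construction, which makes a parsed result safe to share between callers.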
Code walkthrough:
Type Aliases
A type alias is a short name for a more complex type; it makes the code easier to read and maintain.
ChainId = str
PdbHeader = Mapping[str, Any]
PdbStructure = PDB.Structure.Structure
SeqRes = str
MmCIFDict = Mapping[str, Sequence[str]]
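As a small illustration of what these aliases do at runtime, the sketch below shows that an alias is simply another name for the underlying type and only serves to make signatures more descriptive. The function chain_lengths and the sample sequences are hypothetical and are not part of the original module.

```python
from typing import Dict, Mapping

ChainId = str
SeqRes = str


def chain_lengths(chain_to_seqres: Mapping[ChainId, SeqRes]) -> Dict[ChainId, int]:
  """Returns the sequence length of each chain."""
  return {chain_id: len(seq) for chain_id, seq in chain_to_seqres.items()}


# The alias introduces no new type: ChainId and str are the same object.
alias_is_plain_str = ChainId is str

lengths = chain_lengths({'A': 'ABCDEFG', 'B': 'ABC'})
```

A signature written with ChainId and SeqRes documents the intent of each string argument, while ordinary str values can still be passed without any conversion.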
ChainId = str
- The ID of a protein chain, for example "A" or "B".
PdbHeader = Mapping[str, Any]
- 表