Multiple-template structural feature extraction

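The `get_templates` method of the `HhsearchHitFeaturizer` and `HmmsearchHitFeaturizer` classes returns a `TemplateSearchResult`. A `TemplateSearchResult` holds `features` (a dict keyed by the names in `TEMPLATE_FEATURES`), plus `errors` (a list) and `warnings` (a list). Every value in the template feature dict is an `np.array` whose first dimension is the number of templates. In this example, `template_aatype` has shape (3, 396, 22) and `template_all_atom_positions` has shape (3, 396, 37, 3), where 3 is the number of templates, 396 is the length of the query sequence, 22 is the size of the amino-acid one-hot encoding, 37 is the number of atom types in the peptide chain, and the final 3 holds each atom's x, y, z coordinates. Compared with single-template feature extraction, every feature has one extra leading dimension: the number of templates.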
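A minimal sketch of how per-hit feature dicts can be merged into arrays with that extra leading template dimension. It assumes each accepted hit yields one feature dict and that the featurizer stacks them with `np.stack(..., axis=0)` and casts to the dtypes listed in `TEMPLATE_FEATURES` (AlphaFold's featurizers work this way); the placeholder zeros, the sizes and `FEATURE_DTYPES` are illustrative only.

```python
import numpy as np

# Illustrative sizes taken from the example above.
NUM_TEMPLATES, QUERY_LEN, NUM_AA_CLASSES, NUM_ATOM_TYPES = 3, 396, 22, 37

# Dtypes per feature, mirroring part of TEMPLATE_FEATURES.
FEATURE_DTYPES = {
    'template_aatype': np.float32,
    'template_all_atom_positions': np.float32,
    'template_sum_probs': np.float32,
}

# One dict per accepted hit; zeros stand in for real template data.
per_hit_features = [
    {
        'template_aatype': np.zeros((QUERY_LEN, NUM_AA_CLASSES), dtype=np.int32),
        'template_all_atom_positions': np.zeros((QUERY_LEN, NUM_ATOM_TYPES, 3)),
        'template_sum_probs': [0.5 * (i + 1)],
    }
    for i in range(NUM_TEMPLATES)
]

# Stack every feature along a new leading "template" axis, then cast its dtype.
template_features = {
    name: np.stack([hit[name] for hit in per_hit_features], axis=0).astype(dtype)
    for name, dtype in FEATURE_DTYPES.items()
}

for name, value in template_features.items():
    print(name, value.shape)
# template_aatype (3, 396, 22)
# template_all_atom_positions (3, 396, 37, 3)
# template_sum_probs (3, 1)
```

Stacking only works because every hit's arrays already share the query length (396), regardless of how long the template itself is.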

### Multiple-template feature extraction
```python
import dataclasses
from typing import Optional, List, Sequence, Tuple, Mapping, Any, Dict
import datetime
import abc
import glob
import os  # needed by os.path.join in _process_single_hit
from absl import logging
import numpy as np
import re
import functools
from Bio import PDB
import io
import collections
# Note: newer Biopython releases deprecate Bio.Data.SCOPData
# (Bio.Data.PDBData replaces it).
from Bio.Data import SCOPData

atom_types = [
    'N', 'CA', 'C', 'CB', 'O', 'CG', 'CG1', 'CG2', 'OG', 'OG1', 'SG', 'CD',
    'CD1', 'CD2', 'ND1', 'ND2', 'OD1', 'OD2', 'SD', 'CE', 'CE1', 'CE2', 'CE3',
    'NE', 'NE1', 'NE2', 'OE1', 'OE2', 'CH2', 'NH1', 'NH2', 'OH', 'CZ', 'CZ2',
    'CZ3', 'NZ', 'OXT'
]
atom_order = {atom_type: i for i, atom_type in enumerate(atom_types)}
```
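Each residue is stored in a fixed 37-slot layout indexed by `atom_order`, which is where the 37 in `template_all_atom_positions` comes from. Below is a minimal sketch of packing one residue's atoms into that layout. `residue_to_atom37` is a hypothetical helper that mirrors the inner loop of `_get_atom_positions` further down (including putting MSE selenium into the SD slot), but it takes plain `(name, xyz)` pairs instead of Biopython objects and skips the arginine NH1/NH2 fix.

```python
import numpy as np

# The 37 atom types, in the same order as atom_types above.
ATOM_TYPES = [
    'N', 'CA', 'C', 'CB', 'O', 'CG', 'CG1', 'CG2', 'OG', 'OG1', 'SG', 'CD',
    'CD1', 'CD2', 'ND1', 'ND2', 'OD1', 'OD2', 'SD', 'CE', 'CE1', 'CE2', 'CE3',
    'NE', 'NE1', 'NE2', 'OE1', 'OE2', 'CH2', 'NH1', 'NH2', 'OH', 'CZ', 'CZ2',
    'CZ3', 'NZ', 'OXT',
]
ATOM_ORDER = {name: i for i, name in enumerate(ATOM_TYPES)}


def residue_to_atom37(res_name, atoms):
    """Hypothetical helper: pack one residue into (37, 3) positions + (37,) mask.

    `atoms` is a list of (atom_name, (x, y, z)) pairs. Atoms without an atom37
    slot (e.g. hydrogens) are skipped; selenium in MSE goes into the SD slot.
    """
    pos = np.zeros((len(ATOM_TYPES), 3), dtype=np.float32)
    mask = np.zeros(len(ATOM_TYPES), dtype=np.float32)
    for name, xyz in atoms:
        if name in ATOM_ORDER:
            idx = ATOM_ORDER[name]
        elif name.upper() == 'SE' and res_name == 'MSE':
            idx = ATOM_ORDER['SD']
        else:
            continue
        pos[idx] = xyz
        mask[idx] = 1.0
    return pos, mask


# Toy glycine: backbone atoms plus one hydrogen that has no atom37 slot.
gly_atoms = [('N', (0.0, 1.0, 0.0)), ('CA', (1.5, 1.0, 0.0)),
             ('C', (2.0, 2.4, 0.0)), ('O', (1.3, 3.4, 0.0)),
             ('H', (0.0, 0.0, 0.0))]
pos, mask = residue_to_atom37('GLY', gly_atoms)
print(pos.shape, int(mask.sum()))  # (37, 3) 4
```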
```python
atom_type_num = len(atom_types)  # := 37.

HHBLITS_AA_TO_ID = {
    'A': 0, 'B': 2, 'C': 1, 'D': 2, 'E': 3, 'F': 4, 'G': 5, 'H': 6, 'I': 7,
    'J': 20, 'K': 8, 'L': 9, 'M': 10, 'N': 11, 'O': 20, 'P': 12, 'Q': 13,
    'R': 14, 'S': 15, 'T': 16, 'U': 1, 'V': 17, 'W': 18, 'X': 20, 'Y': 19,
    'Z': 3, '-': 21,
}


class Error(Exception):
    """Base class for exceptions."""


class NoChainsError(Error):
    """An error indicating that template mmCIF didn't have any chains."""


class SequenceNotInTemplateError(Error):
    """An error indicating that template mmCIF didn't contain the sequence."""


class PrefilterError(Exception):
    """A base class for template prefilter exceptions."""


MmCIFDict = Mapping[str, Sequence[str]]

TEMPLATE_FEATURES = {
    'template_aatype': np.float32,
    'template_all_atom_masks': np.float32,
    'template_all_atom_positions': np.float32,
    'template_domain_names': object,
    'template_sequence': object,
    'template_sum_probs': np.float32,
}


@dataclasses.dataclass(frozen=True)
class TemplateHit:
    """Class representing a template hit."""
    index: int
    name: str
    aligned_cols: int
    sum_probs: Optional[float]
    query: str
    hit_sequence: str
    indices_query: List[int]
    indices_hit: List[int]


@dataclasses.dataclass(frozen=True)
class TemplateSearchResult:
    features: Mapping[str, Any]
    errors: Sequence[str]
    warnings: Sequence[str]


class NoAtomDataInTemplateError(Error):
    """An error indicating that template mmCIF didn't contain atom positions."""


class TemplateAtomMaskAllZerosError(Error):
    """An error indicating that template mmCIF had all atom positions masked."""


class AlignRatioError(PrefilterError):
    """An error indicating that the hit align ratio to the query was too small."""


class CaDistanceError(Error):
    """An error indicating that a CA atom distance exceeds a threshold."""


# Raised by the functions below; names and base classes follow AlphaFold's
# templates.py.
class MultipleChainsError(Error):
    """An error indicating that multiple chains were found for a given ID."""


class DateError(PrefilterError):
    """An error indicating that the hit date was after the max allowed date."""


class DuplicateError(PrefilterError):
    """An error indicating that the hit was an exact subsequence of the query."""


class LengthError(PrefilterError):
    """An error indicating that the hit was too short."""


####### start: processing mmCIF-format strings ##########
```
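`HHBLITS_AA_TO_ID` folds ambiguous codes into standard residues (B→D, Z→E, U→C), sends J, O and X to 20, and reserves 21 for the gap `-`, so its values run from 0 to 21: 22 classes, which is the 22 in `template_aatype`. A vectorized sketch of the encoding that `sequence_to_onehot` further down performs (without its `map_unknown_to_x` option); `one_hot` is a hypothetical name used only here.

```python
import numpy as np

HHBLITS_AA_TO_ID = {
    'A': 0, 'B': 2, 'C': 1, 'D': 2, 'E': 3, 'F': 4, 'G': 5, 'H': 6, 'I': 7,
    'J': 20, 'K': 8, 'L': 9, 'M': 10, 'N': 11, 'O': 20, 'P': 12, 'Q': 13,
    'R': 14, 'S': 15, 'T': 16, 'U': 1, 'V': 17, 'W': 18, 'X': 20, 'Y': 19,
    'Z': 3, '-': 21,
}


def one_hot(sequence: str, mapping: dict) -> np.ndarray:
    """Hypothetical helper: one-hot encode `sequence` using `mapping`."""
    num_classes = max(mapping.values()) + 1  # 22 for HHBLITS_AA_TO_ID
    ids = np.array([mapping[aa] for aa in sequence], dtype=np.int64)
    encoded = np.zeros((len(sequence), num_classes), dtype=np.int32)
    encoded[np.arange(len(sequence)), ids] = 1  # exactly one 1 per row
    return encoded


encoded = one_hot('MB-Z', HHBLITS_AA_TO_ID)
print(encoded.shape)                    # (4, 22)
print(encoded.argmax(axis=1).tolist())  # [10, 2, 21, 3]
```

Here B is encoded like D (2) and Z like E (3), and the gap gets its own column, so a template's unaligned positions stay distinguishable from unknown residues.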
```python
# Type aliases:
ChainId = str
SeqRes = str
PdbHeader = Mapping[str, Any]
PdbStructure = PDB.Structure.Structure


@dataclasses.dataclass(frozen=True)
class ResiduePosition:
    chain_id: str
    residue_number: int
    insertion_code: str


@dataclasses.dataclass(frozen=True)
class ResidueAtPosition:
    position: Optional[ResiduePosition]
    name: str
    is_missing: bool
    hetflag: str


@dataclasses.dataclass(frozen=True)
class SingleHitResult:
    features: Optional[Mapping[str, Any]]
    error: Optional[str]
    warning: Optional[str]


@dataclasses.dataclass(frozen=True)
class Monomer:
    id: str
    num: int


@dataclasses.dataclass(frozen=True)
class MmcifObject:
    """Representation of a parsed mmCIF file.

    Contains:
        file_id: A meaningful name, e.g. a pdb_id. Should be unique amongst all
            files being processed.
        header: Biopython header.
        structure: Biopython structure.
        chain_to_seqres: Dict mapping chain_id to 1 letter amino acid sequence.
            E.g. {'A': 'ABCDEFG'}
        seqres_to_structure: Dict; for each chain_id contains a mapping between
            SEQRES index and a ResidueAtPosition. e.g. {'A': {0: ResidueAtPosition,
            1: ResidueAtPosition, ...}}
        raw_string: The raw string used to construct the MmcifObject.
    """
    file_id: str
    header: PdbHeader
    structure: PdbStructure
    chain_to_seqres: Mapping[ChainId, SeqRes]
    seqres_to_structure: Mapping[ChainId, Mapping[int, ResidueAtPosition]]
    raw_string: Any


@dataclasses.dataclass(frozen=True)
class ParsingResult:
    """Returned by the parse function.

    Contains:
        mmcif_object: A MmcifObject, may be None if no chain could be
            successfully parsed.
        errors: A dict mapping (file_id, chain_id) to any exception generated.
    """
    mmcif_object: Optional[MmcifObject]
    errors: Mapping[Tuple[str, str], Any]


@dataclasses.dataclass(frozen=True)
class AtomSite:
    residue_name: str
    author_chain_id: str
    mmcif_chain_id: str
    author_seq_num: str
    mmcif_seq_num: int
    insertion_code: str
    hetatm_atom: str
    model_num: int


def _is_set(data: str) -> bool:
    """Returns False if data is a special mmCIF character indicating 'unset'."""
    return data not in ('.', '?')


def mmcif_loop_to_dict(prefix: str,
                       index: str,
                       parsed_info: MmCIFDict,
                       ) -> Mapping[str, Mapping[str, str]]:
    """Extracts loop associated with a prefix from mmCIF data as a dictionary.

    Args:
        prefix: Prefix shared by each of the data items in the loop.
            e.g. '_entity_poly_seq.', where the data items are _entity_poly_seq.num,
            _entity_poly_seq.mon_id. Should include the trailing period.
        index: Which item of loop data should serve as the key.
        parsed_info: A dict of parsed mmCIF data, e.g. _mmcif_dict from a
            Biopython parser.

    Returns:
        Returns a dict of dicts; each dict represents 1 entry from an mmCIF loop,
        indexed by the index column.
    """
    entries = mmcif_loop_to_list(prefix, parsed_info)
    return {entry[index]: entry for entry in entries}


def _get_atom_site_list(parsed_info: MmCIFDict) -> Sequence[AtomSite]:
    """Returns list of atom sites; contains data not present in the structure."""
    return [AtomSite(*site) for site in zip(  # pylint:disable=g-complex-comprehension
        parsed_info['_atom_site.label_comp_id'],
        parsed_info['_atom_site.auth_asym_id'],
        parsed_info['_atom_site.label_asym_id'],
        parsed_info['_atom_site.auth_seq_id'],
        parsed_info['_atom_site.label_seq_id'],
        parsed_info['_atom_site.pdbx_PDB_ins_code'],
        parsed_info['_atom_site.group_PDB'],
        parsed_info['_atom_site.pdbx_PDB_model_num'],
    )]


def _get_atom_positions(
        mmcif_object: MmcifObject,
        auth_chain_id: str,
        max_ca_ca_distance: float) -> Tuple[np.ndarray, np.ndarray]:
    """Gets atom positions and mask from a list of Biopython Residues."""
    num_res = len(mmcif_object.chain_to_seqres[auth_chain_id])

    relevant_chains = [c for c in mmcif_object.structure.get_chains()
                       if c.id == auth_chain_id]
    if len(relevant_chains) != 1:
        raise MultipleChainsError(
            f'Expected exactly one chain in structure with id {auth_chain_id}.')
    chain = relevant_chains[0]

    all_positions = np.zeros([num_res, atom_type_num, 3])
    all_positions_mask = np.zeros([num_res, atom_type_num], dtype=np.int64)
    for res_index in range(num_res):
        pos = np.zeros([atom_type_num, 3], dtype=np.float32)
        mask = np.zeros([atom_type_num], dtype=np.float32)
        res_at_position = mmcif_object.seqres_to_structure[auth_chain_id][res_index]
        if not res_at_position.is_missing:
            res = chain[(res_at_position.hetflag,
                         res_at_position.position.residue_number,
                         res_at_position.position.insertion_code)]
            for atom in res.get_atoms():
                atom_name = atom.get_name()
                x, y, z = atom.get_coord()
                if atom_name in atom_order.keys():
                    pos[atom_order[atom_name]] = [x, y, z]
                    mask[atom_order[atom_name]] = 1.0
                elif atom_name.upper() == 'SE' and res.get_resname() == 'MSE':
                    # Put the coordinates of the selenium atom in the sulphur column.
                    pos[atom_order['SD']] = [x, y, z]
                    mask[atom_order['SD']] = 1.0

            # Fix naming errors in arginine residues where NH2 is incorrectly
            # assigned to be closer to CD than NH1.
            cd = atom_order['CD']
            nh1 = atom_order['NH1']
            nh2 = atom_order['NH2']
            if (res.get_resname() == 'ARG' and
                    all(mask[atom_index] for atom_index in (cd, nh1, nh2)) and
                    (np.linalg.norm(pos[nh1] - pos[cd]) >
                     np.linalg.norm(pos[nh2] - pos[cd]))):
                pos[nh1], pos[nh2] = pos[nh2].copy(), pos[nh1].copy()
                mask[nh1], mask[nh2] = mask[nh2].copy(), mask[nh1].copy()

        all_positions[res_index] = pos
        all_positions_mask[res_index] = mask
    _check_residue_distances(
        all_positions, all_positions_mask, max_ca_ca_distance)
    return all_positions, all_positions_mask


def _get_protein_chains(
        *, parsed_info: Mapping[str, Any]) -> Mapping[ChainId, Sequence[Monomer]]:
    """Extracts polymer information for protein chains only.

    Args:
        parsed_info: _mmcif_dict produced by the Biopython parser.

    Returns:
        A dict mapping mmcif chain id to a list of Monomers.
    """
    # Get polymer information for each entity in the structure.
    entity_poly_seqs = mmcif_loop_to_list('_entity_poly_seq.', parsed_info)

    polymers = collections.defaultdict(list)
    for entity_poly_seq in entity_poly_seqs:
        polymers[entity_poly_seq['_entity_poly_seq.entity_id']].append(
            Monomer(id=entity_poly_seq['_entity_poly_seq.mon_id'],
                    num=int(entity_poly_seq['_entity_poly_seq.num'])))

    # Get chemical compositions. Will allow us to identify which of these polymers
    # are proteins.
    chem_comps = mmcif_loop_to_dict('_chem_comp.', '_chem_comp.id', parsed_info)

    # Get chains information for each entity. Necessary so that we can return a
    # dict keyed on chain id rather than entity.
    struct_asyms = mmcif_loop_to_list('_struct_asym.', parsed_info)

    entity_to_mmcif_chains = collections.defaultdict(list)
    for struct_asym in struct_asyms:
        chain_id = struct_asym['_struct_asym.id']
        entity_id = struct_asym['_struct_asym.entity_id']
        entity_to_mmcif_chains[entity_id].append(chain_id)

    # Identify and return the valid protein chains.
    valid_chains = {}
    for entity_id, seq_info in polymers.items():
        chain_ids = entity_to_mmcif_chains[entity_id]

        # Reject polymers without any peptide-like components, such as DNA/RNA.
        if any(['peptide' in chem_comps[monomer.id]['_chem_comp.type'].lower()
                for monomer in seq_info]):
            for chain_id in chain_ids:
                valid_chains[chain_id] = seq_info
    return valid_chains


def get_release_date(parsed_info: MmCIFDict) -> str:
    """Returns the oldest revision date."""
    revision_dates = parsed_info['_pdbx_audit_revision_history.revision_date']
    return min(revision_dates)


def mmcif_loop_to_list(prefix: str,
                       parsed_info: MmCIFDict) -> Sequence[Mapping[str, str]]:
    """Extracts loop associated with a prefix from mmCIF data as a list.

    Reference for loop_ in mmCIF:
        http://mmcif.wwpdb.org/docs/tutorials/mechanics/pdbx-mmcif-syntax.html

    Args:
        prefix: Prefix shared by each of the data items in the loop.
            e.g. '_entity_poly_seq.', where the data items are _entity_poly_seq.num,
            _entity_poly_seq.mon_id. Should include the trailing period.
        parsed_info: A dict of parsed mmCIF data, e.g. _mmcif_dict from a
            Biopython parser.

    Returns:
        Returns a list of dicts; each dict represents 1 entry from an mmCIF loop.
    """
    cols = []
    data = []
    for key, value in parsed_info.items():
        if key.startswith(prefix):
            cols.append(key)
            data.append(value)

    assert all([len(xs) == len(data[0]) for xs in data]), (
        'mmCIF error: Not all loops are the same length: %s' % cols)

    return [dict(zip(cols, xs)) for xs in zip(*data)]


def _get_first_model(structure: PdbStructure) -> PdbStructure:
    """Returns the first model in a Biopython structure."""
    return next(structure.get_models())


def _get_header(parsed_info: MmCIFDict) -> PdbHeader:
    """Returns a basic header containing method, release date and resolution."""
    header = {}

    experiments = mmcif_loop_to_list('_exptl.', parsed_info)
    header['structure_method'] = ','.join([
        experiment['_exptl.method'].lower() for experiment in experiments])

    # Note: The release_date here corresponds to the oldest revision. We prefer to
    # use this for dataset filtering over the deposition_date.
    if '_pdbx_audit_revision_history.revision_date' in parsed_info:
        header['release_date'] = get_release_date(parsed_info)
    else:
        logging.warning('Could not determine release_date: %s',
                        parsed_info['_entry.id'])

    header['resolution'] = 0.00
    for res_key in ('_refine.ls_d_res_high', '_em_3d_reconstruction.resolution',
                    '_reflns.d_resolution_high'):
        if res_key in parsed_info:
            try:
                raw_resolution = parsed_info[res_key][0]
                header['resolution'] = float(raw_resolution)
            except ValueError:
                logging.debug('Invalid resolution format: %s', parsed_info[res_key])

    return header


@functools.lru_cache(16, typed=False)
def mmcif_parse(*,
                file_id: str,
                mmcif_string: str,
                catch_all_errors: bool = True) -> ParsingResult:
    """Entry point, parses an mmcif_string.

    Args:
        file_id: A string identifier for this file. Should be unique within the
            collection of files being processed.
        mmcif_string: Contents of an mmCIF file.
        catch_all_errors: If True, all exceptions are caught and error messages
            are returned as part of the ParsingResult. If False exceptions will be
            allowed to propagate.

    Returns:
        A ParsingResult.
    """
    print("in function mmcif_parse")
    errors = {}
    try:
        parser = PDB.MMCIFParser(QUIET=True)
        handle = io.StringIO(mmcif_string)
        full_structure = parser.get_structure('', handle)
        first_model_structure = _get_first_model(full_structure)
        # Extract the _mmcif_dict from the parser, which contains useful fields not
        # reflected in the Biopython structure.
        parsed_info = parser._mmcif_dict  # pylint:disable=protected-access
        # print(f"parsed_info :{parsed_info}")

        # Ensure all values are lists, even if singletons.
        for key, value in parsed_info.items():
            if not isinstance(value, list):
                parsed_info[key] = [value]

        header = _get_header(parsed_info)

        # Determine the protein chains, and their start numbers according to the
        # internal mmCIF numbering scheme (likely but not guaranteed to be 1).
        valid_chains = _get_protein_chains(parsed_info=parsed_info)
        if not valid_chains:
            return ParsingResult(
                None, {(file_id, ''): 'No protein chains found in this file.'})
        seq_start_num = {chain_id: min([monomer.num for monomer in seq])
                         for chain_id, seq in valid_chains.items()}

        # Loop over the atoms for which we have coordinates. Populate two mappings:
        # -mmcif_to_author_chain_id (maps internal mmCIF chain ids to chain ids used
        # by the authors / Biopython).
        # -seq_to_structure_mappings (maps idx into sequence to ResidueAtPosition).
        mmcif_to_author_chain_id = {}
        seq_to_structure_mappings = {}
        for atom in _get_atom_site_list(parsed_info):
            if atom.model_num != '1':
                # We only process the first model at the moment.
                continue

            mmcif_to_author_chain_id[atom.mmcif_chain_id] = atom.author_chain_id

            if atom.mmcif_chain_id in valid_chains:
                hetflag = ' '
                if atom.hetatm_atom == 'HETATM':
                    # Water atoms are assigned a special hetflag of W in Biopython. We
                    # need to do the same, so that this hetflag can be used to fetch
                    # a residue from the Biopython structure by id.
                    if atom.residue_name in ('HOH', 'WAT'):
                        hetflag = 'W'
                    else:
                        hetflag = 'H_' + atom.residue_name
                insertion_code = atom.insertion_code
                if not _is_set(atom.insertion_code):
                    insertion_code = ' '
                position = ResiduePosition(chain_id=atom.author_chain_id,
                                           residue_number=int(atom.author_seq_num),
                                           insertion_code=insertion_code)
                seq_idx = int(atom.mmcif_seq_num) - seq_start_num[atom.mmcif_chain_id]
                current = seq_to_structure_mappings.get(atom.author_chain_id, {})
                current[seq_idx] = ResidueAtPosition(position=position,
                                                     name=atom.residue_name,
                                                     is_missing=False,
                                                     hetflag=hetflag)
                seq_to_structure_mappings[atom.author_chain_id] = current

        # Add missing residue information to seq_to_structure_mappings.
        for chain_id, seq_info in valid_chains.items():
            author_chain = mmcif_to_author_chain_id[chain_id]
            current_mapping = seq_to_structure_mappings[author_chain]
            for idx, monomer in enumerate(seq_info):
                if idx not in current_mapping:
                    current_mapping[idx] = ResidueAtPosition(position=None,
                                                             name=monomer.id,
                                                             is_missing=True,
                                                             hetflag=' ')

        author_chain_to_sequence = {}
        for chain_id, seq_info in valid_chains.items():
            author_chain = mmcif_to_author_chain_id[chain_id]
            seq = []
            for monomer in seq_info:
                code = SCOPData.protein_letters_3to1.get(monomer.id, 'X')
                seq.append(code if len(code) == 1 else 'X')
            seq = ''.join(seq)
            author_chain_to_sequence[author_chain] = seq

        mmcif_object = MmcifObject(
            file_id=file_id,
            header=header,
            structure=first_model_structure,
            chain_to_seqres=author_chain_to_sequence,
            seqres_to_structure=seq_to_structure_mappings,
            raw_string=parsed_info)

        return ParsingResult(mmcif_object=mmcif_object, errors=errors)
    except Exception as e:  # pylint:disable=broad-except
        errors[(file_id, '')] = e
        if not catch_all_errors:
            raise
        return ParsingResult(mmcif_object=None, errors=errors)


@functools.lru_cache(16, typed=False)
def _read_file(path):
    with open(path, 'r') as f:
        file_data = f.read()
    return file_data

######## end: processing mmCIF-format strings ##########


def _find_template_in_pdb(
        template_chain_id: str,
        template_sequence: str,
        mmcif_object: MmcifObject) -> Tuple[str, str, int]:
    """Tries to find the template chain in the given pdb file.

    This method tries the three following things in order:
        1. Tries if there is an exact match in both the chain ID and the sequence.
           If yes, the chain sequence is returned. Otherwise:
        2. Tries if there is an exact match only in the sequence.
           If yes, the chain sequence is returned. Otherwise:
        3. Tries if there is a fuzzy match (X = wildcard) in the sequence.
           If yes, the chain sequence is returned.
    If none of these succeed, a SequenceNotInTemplateError is thrown.

    Args:
        template_chain_id: The template chain ID.
        template_sequence: The template chain sequence.
        mmcif_object: The PDB object to search for the template in.

    Returns:
        A tuple with:
        * The chain sequence that was found to match the template in the PDB object.
        * The ID of the chain that is being returned.
        * The offset where the template sequence starts in the chain sequence.

    Raises:
        SequenceNotInTemplateError: If no match is found after the steps described
            above.
    """
    # Try if there is an exact match in both the chain ID and the (sub)sequence.
    pdb_id = mmcif_object.file_id
    chain_sequence = mmcif_object.chain_to_seqres.get(template_chain_id)
    if chain_sequence and (template_sequence in chain_sequence):
        logging.info(
            'Found an exact template match %s_%s.', pdb_id, template_chain_id)
        mapping_offset = chain_sequence.find(template_sequence)
        return chain_sequence, template_chain_id, mapping_offset

    # Try if there is an exact match in the (sub)sequence only.
    for chain_id, chain_sequence in mmcif_object.chain_to_seqres.items():
        if chain_sequence and (template_sequence in chain_sequence):
            logging.info('Found a sequence-only match %s_%s.', pdb_id, chain_id)
            mapping_offset = chain_sequence.find(template_sequence)
            return chain_sequence, chain_id, mapping_offset

    # Return a chain sequence that fuzzy matches (X = wildcard) the template.
    # Make parentheses unnamed groups (?:_) to avoid the 100 named groups limit.
    regex = ['.' if aa == 'X' else '(?:%s|X)' % aa for aa in template_sequence]
    regex = re.compile(''.join(regex))
    for chain_id, chain_sequence in mmcif_object.chain_to_seqres.items():
        match = re.search(regex, chain_sequence)
        if match:
            logging.info('Found a fuzzy sequence-only match %s_%s.', pdb_id, chain_id)
            mapping_offset = match.start()
            return chain_sequence, chain_id, mapping_offset

    # No hits, raise an error.
    raise SequenceNotInTemplateError(
        'Could not find the template sequence in %s_%s. Template sequence: %s, '
        'chain_to_seqres: %s' % (pdb_id, template_chain_id, template_sequence,
                                 mmcif_object.chain_to_seqres))


def _extract_template_features(
        mmcif_object: MmcifObject,
        pdb_id: str,
        mapping: Mapping[int, int],
        template_sequence: str,
        query_sequence: str,
        template_chain_id: str,
        kalign_binary_path: str) -> Tuple[Dict[str, Any], Optional[str]]:
    """Parses atom positions in the target structure and aligns with the query.

    Atoms for each residue in the template structure are indexed to coincide
    with their corresponding residue in the query sequence, according to the
    alignment mapping provided.

    Args:
        mmcif_object: mmcif_parsing.MmcifObject representing the template.
        pdb_id: PDB code for the template.
        mapping: Dictionary mapping indices in the query sequence to indices in
            the template sequence.
        template_sequence: String describing the amino acid sequence for the
            template protein.
        query_sequence: String describing the amino acid sequence for the query
            protein.
        template_chain_id: String ID describing which chain in the structure proto
            should be used.
        kalign_binary_path: The path to a kalign executable used for template
            realignment.

    Returns:
        A tuple with:
        * A dictionary containing the extra features derived from the template
          protein structure.
        * A warning message if the hit was realigned to the actual mmCIF sequence.
          Otherwise None.

    Raises:
        NoChainsError: If the mmcif object doesn't contain any chains.
        SequenceNotInTemplateError: If the given chain id / sequence can't
            be found in the mmcif object.
        QueryToTemplateAlignError: If the actual template in the mmCIF file
            can't be aligned to the query.
        NoAtomDataInTemplateError: If the mmcif object doesn't contain
            atom positions.
        TemplateAtomMaskAllZerosError: If the mmcif object doesn't have any
            unmasked residues.
    """
    if mmcif_object is None or not mmcif_object.chain_to_seqres:
        raise NoChainsError('No chains in PDB: %s_%s' % (pdb_id, template_chain_id))

    warning = None
    try:
        seqres, chain_id, mapping_offset = _find_template_in_pdb(
            template_chain_id=template_chain_id,
            template_sequence=template_sequence,
            mmcif_object=mmcif_object)
    except SequenceNotInTemplateError:
        # If PDB70 contains a different version of the template, we use the sequence
        # from the mmcif_object.
        chain_id = template_chain_id
        warning = (
            f'The exact sequence {template_sequence} was not found in '
            f'{pdb_id}_{chain_id}. Realigning the template to the actual sequence.')
        logging.warning(warning)
        # This throws an exception if it fails to realign the hit.
        seqres, mapping = _realign_pdb_template_to_query(
            old_template_sequence=template_sequence,
            template_chain_id=template_chain_id,
            mmcif_object=mmcif_object,
            old_mapping=mapping,
            kalign_binary_path=kalign_binary_path)
        logging.info('Sequence in %s_%s: %s successfully realigned to %s',
                     pdb_id, chain_id, template_sequence, seqres)
        # The template sequence changed.
        template_sequence = seqres
        # No mapping offset, the query is aligned to the actual sequence.
        mapping_offset = 0

    try:
        # Essentially set to infinity - we don't want to reject templates unless
        # they're really really bad.
        all_atom_positions, all_atom_mask = _get_atom_positions(
            mmcif_object, chain_id, max_ca_ca_distance=150.0)
    except (CaDistanceError, KeyError) as ex:
        raise NoAtomDataInTemplateError(
            'Could not get atom data (%s_%s): %s' % (pdb_id, chain_id, str(ex))
        ) from ex

    all_atom_positions = np.split(all_atom_positions, all_atom_positions.shape[0])
    all_atom_masks = np.split(all_atom_mask, all_atom_mask.shape[0])

    output_templates_sequence = []
    templates_all_atom_positions = []
    templates_all_atom_masks = []

    for _ in query_sequence:
        # Residues in the query_sequence that are not in the template_sequence:
        templates_all_atom_positions.append(np.zeros((atom_type_num, 3)))
        templates_all_atom_masks.append(np.zeros(atom_type_num))
        output_templates_sequence.append('-')

    for k, v in mapping.items():
        template_index = v + mapping_offset
        templates_all_atom_positions[k] = all_atom_positions[template_index][0]
        templates_all_atom_masks[k] = all_atom_masks[template_index][0]
        output_templates_sequence[k] = template_sequence[v]

    # Alanine (AA with the lowest number of atoms) has 5 atoms (C, CA, CB, N, O).
    if np.sum(templates_all_atom_masks) < 5:
        raise TemplateAtomMaskAllZerosError(
            'Template all atom mask was all zeros: %s_%s. Residue range: %d-%d' %
            (pdb_id, chain_id, min(mapping.values()) + mapping_offset,
             max(mapping.values()) + mapping_offset))

    output_templates_sequence = ''.join(output_templates_sequence)

    templates_aatype = sequence_to_onehot(
        output_templates_sequence, HHBLITS_AA_TO_ID)

    return (
        {
            'template_all_atom_positions': np.array(templates_all_atom_positions),
            'template_all_atom_masks': np.array(templates_all_atom_masks),
            'template_sequence': output_templates_sequence.encode(),
            'template_aatype': np.array(templates_aatype),
            'template_domain_names': f'{pdb_id.lower()}_{chain_id}'.encode(),
        },
        warning)


def _is_after_cutoff(
        pdb_id: str,
        release_dates: Mapping[str, datetime.datetime],
        release_date_cutoff: Optional[datetime.datetime]) -> bool:
    """Checks if the template date is after the release date cutoff.

    Args:
        pdb_id: 4 letter pdb code.
        release_dates: Dictionary mapping PDB ids to their structure release dates.
        release_date_cutoff: Max release date that is valid for this query.

    Returns:
        True if the template release date is after the cutoff, False otherwise.
    """
    if release_date_cutoff is None:
        raise ValueError('The release_date_cutoff must not be None.')
    if pdb_id in release_dates:
        return release_dates[pdb_id] > release_date_cutoff
    else:
        # Since this is just a quick prefilter to reduce the number of mmCIF files
        # we need to parse, we don't have to worry about returning True here.
        return False


def _build_query_to_hit_index_mapping(
        hit_query_sequence: str,
        hit_sequence: str,
        indices_hit: Sequence[int],
        indices_query: Sequence[int],
        original_query_sequence: str) -> Mapping[int, int]:
    """Gets mapping from indices in original query sequence to indices in the hit.

    hit_query_sequence and hit_sequence are two aligned sequences containing gap
    characters. hit_query_sequence contains only the part of the original query
    sequence that matched the hit. When interpreting the indices from the .hhr, we
    need to correct for this to recover a mapping from original query sequence to
    the hit sequence.

    Args:
        hit_query_sequence: The portion of the query sequence that is in the .hhr
            hit
        hit_sequence: The portion of the hit sequence that is in the .hhr
        indices_hit: The indices for each aminoacid relative to the hit sequence
        indices_query: The indices for each aminoacid relative to the original
            query sequence
        original_query_sequence: String describing the original query sequence.

    Returns:
        Dictionary with indices in the original query sequence as keys and indices
        in the hit sequence as values.
    """
    # If the hit is empty (no aligned residues), return empty mapping
    if not hit_query_sequence:
        return {}

    # Remove gaps and find the offset of hit.query relative to original query.
    hhsearch_query_sequence = hit_query_sequence.replace('-', '')
    hit_sequence = hit_sequence.replace('-', '')
    hhsearch_query_offset = original_query_sequence.find(hhsearch_query_sequence)

    # Index of -1 used for gap characters. Subtract the min index ignoring gaps.
    min_idx = min(x for x in indices_hit if x > -1)
    fixed_indices_hit = [
        x - min_idx if x > -1 else -1 for x in indices_hit
    ]

    min_idx = min(x for x in indices_query if x > -1)
    fixed_indices_query = [x - min_idx if x > -1 else -1 for x in indices_query]

    # Zip the corrected indices, ignore case where both seqs have gap characters.
    mapping = {}
    for q_i, q_t in zip(fixed_indices_query, fixed_indices_hit):
        if q_t != -1 and q_i != -1:
            if (q_t >= len(hit_sequence) or
                    q_i + hhsearch_query_offset >= len(original_query_sequence)):
                continue
            mapping[q_i + hhsearch_query_offset] = q_t

    return mapping


def _process_single_hit(
        query_sequence: str,
        hit: TemplateHit,
        mmcif_dir: str,
        max_template_date: datetime.datetime,
        release_dates: Mapping[str, datetime.datetime],
        obsolete_pdbs: Mapping[str, Optional[str]],
        kalign_binary_path: str,
        strict_error_check: bool = False) -> SingleHitResult:
    """Tries to extract template features from a single HHSearch hit."""
    print("in function _process_single_hit")
    # print(f"release_dates:{release_dates}")
    # print(f"obsolete_pdbs:{obsolete_pdbs}")
    # Fail hard if we can't get the PDB ID and chain name from the hit.
    hit_pdb_code, hit_chain_id = _get_pdb_id_and_chain(hit)
    # print(f"hit_pdb_code {hit_pdb_code}")
    # print(f"hit_chain_id {hit_chain_id}")

    # This hit has been removed (obsoleted) from PDB, skip it.
    if hit_pdb_code in obsolete_pdbs and obsolete_pdbs[hit_pdb_code] is None:
        return SingleHitResult(
            features=None, error=None, warning=f'Hit {hit_pdb_code} is obsolete.')

    if hit_pdb_code not in release_dates:
        if hit_pdb_code in obsolete_pdbs:
            hit_pdb_code = obsolete_pdbs[hit_pdb_code]

    # Pass hit_pdb_code since it might have changed due to the pdb being obsolete.
    try:
        _assess_hhsearch_hit(
            hit=hit,
            hit_pdb_code=hit_pdb_code,
            query_sequence=query_sequence,
            release_dates=release_dates,
            release_date_cutoff=max_template_date)
    except PrefilterError as e:
        msg = f'hit {hit_pdb_code}_{hit_chain_id} did not pass prefilter: {str(e)}'
        print("got PrefilterError")
        print(msg)
        logging.info(msg)
        if strict_error_check and isinstance(e, (DateError, DuplicateError)):
            # In strict mode we treat some prefilter cases as errors.
            return SingleHitResult(features=None, error=msg, warning=None)

        return SingleHitResult(features=None, error=None, warning=None)

    mapping = _build_query_to_hit_index_mapping(
        hit.query, hit.hit_sequence, hit.indices_hit, hit.indices_query,
        query_sequence)

    # The mapping is from the query to the actual hit sequence, so we need to
    # remove gaps (which regardless have a missing confidence score).
    template_sequence = hit.hit_sequence.replace('-', '')

    cif_path = os.path.join(mmcif_dir, hit_pdb_code + '.cif')
    logging.debug('Reading PDB entry from %s. Query: %s, template: %s', cif_path,
                  query_sequence, template_sequence)
    # Fail if we can't find the mmCIF file.
    cif_string = _read_file(cif_path)

    parsing_result = mmcif_parse(file_id=hit_pdb_code, mmcif_string=cif_string)
    # print(f"cif_string:{cif_string}")
    # print(f"parsing_result:{parsing_result}")

    if parsing_result.mmcif_object is not None:
        hit_release_date = datetime.datetime.strptime(
            parsing_result.mmcif_object.header['release_date'], '%Y-%m-%d')
        if hit_release_date > max_template_date:
            error = ('Template %s date (%s) > max template date (%s).' %
                     (hit_pdb_code, hit_release_date, max_template_date))
            if strict_error_check:
                return SingleHitResult(features=None, error=error, warning=None)
            else:
                logging.debug(error)
                return SingleHitResult(features=None, error=None, warning=None)

    try:
        features, realign_warning = _extract_template_features(
            mmcif_object=parsing_result.mmcif_object,
            pdb_id=hit_pdb_code,
            mapping=mapping,
            template_sequence=template_sequence,
            query_sequence=query_sequence,
            template_chain_id=hit_chain_id,
            kalign_binary_path=kalign_binary_path)
        if hit.sum_probs is None:
            features['template_sum_probs'] = [0]
        else:
            features['template_sum_probs'] = [hit.sum_probs]

        # It is possible there were some errors when parsing the other chains in the
        # mmCIF file, but the template features for the chain we want were still
        # computed. In such case the mmCIF parsing errors are not relevant.
        return SingleHitResult(
            features=features, error=None, warning=realign_warning)
    except (NoChainsError, NoAtomDataInTemplateError,
            TemplateAtomMaskAllZerosError) as e:
        # These 3 errors indicate missing mmCIF experimental data rather than a
        # problem with the template search, so turn them into warnings.
        warning = ('%s_%s (sum_probs: %s, rank: %s): feature extracting errors: '
                   '%s, mmCIF parsing errors: %s'
                   % (hit_pdb_code, hit_chain_id, hit.sum_probs, hit.index,
                      str(e), parsing_result.errors))
        if strict_error_check:
            return SingleHitResult(features=None, error=warning, warning=None)
        else:
            return SingleHitResult(features=None, error=None, warning=warning)
    except Error as e:
        # %s rather than %.2f: hit.sum_probs may be None (e.g. hmmsearch hits).
        error = ('%s_%s (sum_probs: %s, rank: %s): feature extracting errors: '
                 '%s, mmCIF parsing errors: %s'
                 % (hit_pdb_code, hit_chain_id, hit.sum_probs, hit.index,
                    str(e), parsing_result.errors))
        return SingleHitResult(features=None, error=error, warning=None)


def _get_pdb_id_and_chain(hit: TemplateHit) -> Tuple[str, str]:
    """Returns PDB id and chain id for an HHSearch Hit."""
    # PDB ID: 4 letters. Chain ID: 1+ alphanumeric letters or "." if unknown.
    id_match = re.match(r'[a-zA-Z\d]{4}_[a-zA-Z0-9.]+', hit.name)
    if not id_match:
        raise ValueError(f'hit.name did not start with PDBID_chain: {hit.name}')
    pdb_id, chain_id = id_match.group(0).split('_')
    return pdb_id.lower(), chain_id


def _assess_hhsearch_hit(
        hit: TemplateHit,
        hit_pdb_code: str,
        query_sequence: str,
        release_dates: Mapping[str, datetime.datetime],
        release_date_cutoff: datetime.datetime,
        max_subsequence_ratio: float = 0.95,
        min_align_ratio: float = 0.05) -> bool:  # Default is 0.1; lowered to 0.05 for this demo.
    """Determines if template is valid (without parsing the template mmcif file).

    Args:
        hit: HhrHit for the template.
        hit_pdb_code: The 4 letter pdb code of the template hit. This might be
            different from the value in the actual hit since the original pdb might
            have become obsolete.
        query_sequence: Amino acid sequence of the query.
        release_dates: Dictionary mapping pdb codes to their structure release
            dates.
        release_date_cutoff: Max release date that is valid for this query.
        max_subsequence_ratio: Exclude any exact matches with this much overlap.
        min_align_ratio: Minimum overlap between the template and query.

    Returns:
        True if the hit passed the prefilter. Raises an exception otherwise.

    Raises:
        DateError: If the hit date was after the max allowed date.
        AlignRatioError: If the hit align ratio to the query was too small.
        DuplicateError: If the hit was an exact subsequence of the query.
        LengthError: If the hit was too short.
    """
    print("in function _assess_hhsearch_hit")
    aligned_cols = hit.aligned_cols
    align_ratio = aligned_cols / len(query_sequence)
    print(f"align_ratio {align_ratio}")

    template_sequence = hit.hit_sequence.replace('-', '')
    length_ratio = float(len(template_sequence)) / len(query_sequence)
    print(f"length_ratio {length_ratio}")

    # Check whether the template is a large subsequence or duplicate of original
    # query. This can happen due to duplicate entries in the PDB database.
    duplicate = (template_sequence in query_sequence and
                 length_ratio > max_subsequence_ratio)

    if _is_after_cutoff(hit_pdb_code, release_dates, release_date_cutoff):
        raise DateError(f'Date ({release_dates[hit_pdb_code]}) > max template date '
                        f'({release_date_cutoff}).')

    if align_ratio <= min_align_ratio:
        raise AlignRatioError('Proportion of residues aligned to query too small. '
                              f'Align ratio: {align_ratio}.')

    if duplicate:
        raise DuplicateError('Template is an exact subsequence of query with large '
                             f'coverage. Length ratio: {length_ratio}.')

    if len(template_sequence) < 10:
        raise LengthError(f'Template too short. Length: {len(template_sequence)}.')

    return True


def _check_residue_distances(all_positions: np.ndarray,
                             all_positions_mask: np.ndarray,
                             max_ca_ca_distance: float):
    """Checks if the distance between unmasked neighbor residues is ok."""
    ca_position = atom_order['CA']
    prev_is_unmasked = False
    prev_calpha = None
    for i, (coords, mask) in enumerate(zip(all_positions, all_positions_mask)):
        this_is_unmasked = bool(mask[ca_position])
        if this_is_unmasked:
            this_calpha = coords[ca_position]
            if prev_is_unmasked:
                distance = np.linalg.norm(this_calpha - prev_calpha)
                if distance > max_ca_ca_distance:
                    raise CaDistanceError(
                        'The distance between residues %d and %d is %f > limit %f.' % (
                            i, i + 1, distance, max_ca_ca_distance))
            prev_calpha = this_calpha
        prev_is_unmasked = this_is_unmasked


def sequence_to_onehot(
        sequence: str,
        mapping: Mapping[str, int],
        map_unknown_to_x: bool = False) -> np.ndarray:
    """Maps the given sequence into a one-hot encoded matrix.

    Args:
        sequence: An amino acid sequence.
        mapping: A dictionary mapping amino acids to integers.
        map_unknown_to_x: If True, any amino acid that is not in the mapping will
            be mapped to the unknown amino acid 'X'. If the mapping doesn't contain
            amino acid 'X', an error will be thrown. If False, any amino acid not in
            the mapping will throw an error.

    Returns:
        A numpy array of shape (seq_len, num_unique_aas) with one-hot encoding of
        the sequence.

    Raises:
        ValueError: If the mapping doesn't contain values from 0 to
            num_unique_aas - 1 without any gaps.
    """
    num_entries = max(mapping.values()) + 1

    if sorted(set(mapping.values())) != list(range(num_entries)):
        raise ValueError('The mapping must have values from 0 to num_unique_aas-1 '
                         'without any gaps. Got: %s' % sorted(mapping.values()))

    one_hot_arr = np.zeros((len(sequence), num_entries), dtype=np.int32)

    for aa_index, aa_type in enumerate(sequence):
        if map_unknown_to_x:
            if aa_type.isalpha() and aa_type.isupper():
                aa_id = mapping.get(aa_type, mapping['X'])
            else:
                raise ValueError(f'Invalid character in the sequence: {aa_type}')
        else:
            aa_id = mapping[aa_type]
        one_hot_arr[aa_index, aa_id] = 1

    return one_hot_arr


class TemplateHitFeaturizer(abc.ABC):
    """An abstract base class for turning template hits to template features."""

    def __init__(
            self,
            mmcif_dir: str,
            max_template_date: str,
            max_hits: int,
            kalign_binary_path: str,
            release_dates_path: Optional[str],
            obsolete_pdbs_path: Optional[str],
            strict_error_check: bool = False):
        """Initializes the Template Search.

        Args:
            mmcif_dir: Path to a directory with mmCIF structures. Once a template ID
                is found by HHSearch, this directory is used to retrieve the template
                data.
            max_template_date: The maximum date permitted for template structures.
                No template with date higher than this date will be returned. In
                ISO8601 date format, YYYY-MM-DD.
            max_hits: The maximum number of templates that will be returned.
            kalign_binary_path: The path to a kalign executable used for template
                realignment.
            release_dates_path: An optional path to a file with a mapping from PDB
                IDs to their release dates.
```
Thanks to this we don't have to redundantlyparse mmCIF files to get that information.obsolete_pdbs_path: An optional path to a file containing a mapping fromobsolete PDB IDs to the PDB IDs of their replacements.strict_error_check: If True, then the following will be treated as errors:* If any template date is after the max_template_date.* If any template has identical PDB ID to the query.* If any template is a duplicate of the query.* Any feature computation errors."""self._mmcif_dir = mmcif_dirif not glob.glob(os.path.join(self._mmcif_dir, '*.cif')):logging.error('Could not find CIFs in %s', self._mmcif_dir)raise ValueError(f'Could not find CIFs in {self._mmcif_dir}')try:self._max_template_date = datetime.datetime.strptime(max_template_date, '%Y-%m-%d')except ValueError:raise ValueError('max_template_date must be set and have format YYYY-MM-DD.')self._max_hits = max_hitsself._kalign_binary_path = kalign_binary_pathself._strict_error_check = strict_error_checkif release_dates_path:logging.info('Using precomputed release dates %s.', release_dates_path)self._release_dates = _parse_release_dates(release_dates_path)else:self._release_dates = {}if obsolete_pdbs_path:logging.info('Using precomputed obsolete pdbs %s.', obsolete_pdbs_path)self._obsolete_pdbs = _parse_obsolete(obsolete_pdbs_path)else:self._obsolete_pdbs = {}@abc.abstractmethoddef get_templates(self,query_sequence: str,hits: Sequence[TemplateHit]) -> TemplateSearchResult:"""Computes the templates for given query sequence."""class HhsearchHitFeaturizer(TemplateHitFeaturizer):"""A class for turning a3m hits from hhsearch to template features."""def get_templates(self,query_sequence: str,hits: Sequence[TemplateHit]) -> TemplateSearchResult:"""Computes the templates for given query sequence (more details above)."""logging.info('Searching for template for: %s', query_sequence)template_features = {}for template_feature_name in TEMPLATE_FEATURES:template_features[template_feature_name] = []num_hits = 0errors = 
[]warnings = []#print(f"sorted hits:{sorted(hits, key=lambda x: x.sum_probs, reverse=True)}")for hit in sorted(hits, key=lambda x: x.sum_probs, reverse=True):# We got all the templates we wanted, stop processing hits.if num_hits >= self._max_hits:breakresult = _process_single_hit(query_sequence=query_sequence,hit=hit,mmcif_dir=self._mmcif_dir,max_template_date=self._max_template_date,release_dates=self._release_dates,obsolete_pdbs=self._obsolete_pdbs,strict_error_check=self._strict_error_check,kalign_binary_path=self._kalign_binary_path)#print(f"_process_single_hit result: {result}")if result.error:errors.append(result.error)# There could be an error even if there are some results, e.g. thrown by# other unparsable chains in the same mmCIF file.if result.warning:warnings.append(result.warning)if result.features is None:logging.info('Skipped invalid hit %s, error: %s, warning: %s',hit.name, result.error, result.warning)else:# Increment the hit counter, since we got features out of this hit.num_hits += 1for k in template_features:template_features[k].append(result.features[k])for name in template_features:if num_hits > 0:template_features[name] = np.stack(template_features[name], axis=0).astype(TEMPLATE_FEATURES[name])else:# Make sure the feature has correct dtype even if empty.template_features[name] = np.array([], dtype=TEMPLATE_FEATURES[name])return TemplateSearchResult(features=template_features, errors=errors, warnings=warnings)class HmmsearchHitFeaturizer(TemplateHitFeaturizer):"""A class for turning a3m hits from hmmsearch to template features."""def get_templates(self,query_sequence: str,hits: Sequence[TemplateHit]) -> TemplateSearchResult:"""Computes the templates for given query sequence (more details above)."""logging.info('Searching for template for: %s', query_sequence)template_features = {}for template_feature_name in TEMPLATE_FEATURES:template_features[template_feature_name] = []already_seen = set()errors = []warnings = []if not hits or 
hits[0].sum_probs is None:sorted_hits = hitselse:sorted_hits = sorted(hits, key=lambda x: x.sum_probs, reverse=True)#print(f"sorted_hits:{sorted_hits}")for hit in sorted_hits:# We got all the templates we wanted, stop processing hits.if len(already_seen) >= self._max_hits:breakresult = _process_single_hit(query_sequence=query_sequence,hit=hit,mmcif_dir=self._mmcif_dir,max_template_date=self._max_template_date,release_dates=self._release_dates,obsolete_pdbs=self._obsolete_pdbs,strict_error_check=self._strict_error_check,kalign_binary_path=self._kalign_binary_path)if result.error:errors.append(result.error)# There could be an error even if there are some results, e.g. thrown by# other unparsable chains in the same mmCIF file.if result.warning:warnings.append(result.warning)if result.features is None:logging.debug('Skipped invalid hit %s, error: %s, warning: %s',hit.name, result.error, result.warning)else:already_seen_key = result.features['template_sequence']if already_seen_key in already_seen:continue# Increment the hit counter, since we got features out of this hit.already_seen.add(already_seen_key)for k in template_features:template_features[k].append(result.features[k])if already_seen:for name in template_features:template_features[name] = np.stack(template_features[name], axis=0).astype(TEMPLATE_FEATURES[name])else:num_res = len(query_sequence)# Construct a default template with all zeros.print("Construct a default template with all zeros.")template_features = {'template_aatype': np.zeros((1, num_res, len(residue_constants.restypes_with_x_and_gap)),np.float32),'template_all_atom_masks': np.zeros((1, num_res, residue_constants.atom_type_num), np.float32),'template_all_atom_positions': np.zeros((1, num_res, residue_constants.atom_type_num, 3), np.float32),'template_domain_names': np.array([''.encode()], dtype=object),'template_sequence': np.array([''.encode()], dtype=object),'template_sum_probs': np.array([0], dtype=np.float32)}return 
TemplateSearchResult(features=template_features, errors=errors, warnings=warnings)### Hhsearch软件搜索pdb结构数据库得到的模版特征提取
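Each hit is screened using only its name and alignment, before any mmCIF file is downloaded or parsed. The minimal sketch below mirrors the PDB-ID/chain regex of `_get_pdb_id_and_chain` and the ratio checks of `_assess_hhsearch_hit`, including the demo threshold `min_align_ratio=0.05`. It leaves out the release-date check. `SimpleHit`, `parse_pdb_id_and_chain`, and `prefilter` are hypothetical names used only for illustration, and the query and hit sequences are made up.

```python
import dataclasses
import re
from typing import Tuple


@dataclasses.dataclass(frozen=True)
class SimpleHit:
    """Hypothetical stand-in for TemplateHit with only the fields used here."""
    name: str
    aligned_cols: int
    hit_sequence: str


def parse_pdb_id_and_chain(name: str) -> Tuple[str, str]:
    """Same regex as _get_pdb_id_and_chain: 4-char PDB ID, '_', chain ID."""
    match = re.match(r'[a-zA-Z\d]{4}_[a-zA-Z0-9.]+', name)
    if not match:
        raise ValueError(f'hit.name did not start with PDBID_chain: {name}')
    pdb_id, chain_id = match.group(0).split('_')
    return pdb_id.lower(), chain_id


def prefilter(hit: SimpleHit, query_sequence: str,
              max_subsequence_ratio: float = 0.95,
              min_align_ratio: float = 0.05) -> str:
    """Returns 'ok' or the name of the error the real check would raise."""
    align_ratio = hit.aligned_cols / len(query_sequence)
    template_sequence = hit.hit_sequence.replace('-', '')  # drop alignment gaps
    length_ratio = len(template_sequence) / len(query_sequence)
    duplicate = (template_sequence in query_sequence
                 and length_ratio > max_subsequence_ratio)
    # Same order as _assess_hhsearch_hit (after its date check).
    if align_ratio <= min_align_ratio:
        return 'AlignRatioError'
    if duplicate:
        return 'DuplicateError'
    if len(template_sequence) < 10:
        return 'LengthError'
    return 'ok'


query = 'MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ'  # made-up 33-residue query
hit = SimpleHit(name='5UXX_C sigma factor {Bartonella quintana}',
                aligned_cols=20, hit_sequence='KTAYIAKQ--RQISFVKSHFS')
print(parse_pdb_id_and_chain(hit.name))  # ('5uxx', 'C')
print(prefilter(hit, query))  # ok
print(prefilter(SimpleHit('1ABC_A', 33, query), query))  # DuplicateError
```

The sketch returns the name of the failing check so it stays short. The real `_assess_hhsearch_hit` raises `AlignRatioError`, `DuplicateError`, or `LengthError` instead.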
import pickle
with open('test_pdb_hits.pkl', 'rb') as file:
    #with open('/home/zheng/test/test_pdb_hits.pkl', 'rb') as file:
    # Load the data from the file
    pdb_template_hits = pickle.load(file)
pdb_template_hits = pdb_template_hits[0:5]  # Keep only part of the hits for the demo
#print(type(pdb_template_hits))
#print(f"pdb_template_hits:{pdb_template_hits}")

## Based on pdb_template_hits, download the mmCIF files to the specified directory
pdb_ids = []
for hit in pdb_template_hits:
    # e.g. name='5UXX_C BaquA.17208.a, BaquA.17842.a; SSGCID, Bartonella quintana, sigma factor; HET: SO4, MSE; 2.45A {Bartonella quintana}'
    pdb_id = hit.name.split()[0]
    pdb_id = pdb_id.split("_")[0]
    pdb_ids.append(pdb_id)

from Bio.PDB import PDBList
import os

# Create a PDBList object
pdbl = PDBList()
# Set the download directory
template_mmcif_dir = "/home/zheng/test/mmcif"
print("Starting mmCIF download")
## Batch-download the structure files
for pdb_id in pdb_ids:
    pdbl.retrieve_pdb_file(pdb_code=pdb_id, pdir=template_mmcif_dir, file_format='mmCif')
print(f"mmCIF files downloaded to: {template_mmcif_dir}")

max_template_date = "2023-11-27"  # format YYYY-MM-DD
#max_template_date = datetime.datetime.strptime(
#          max_template_date, '%Y-%m-%d')
MAX_TEMPLATE_HITS = 3
kalign_binary_path = "/home/zheng/anaconda3/envs/deep_learning/bin/kalign"
#print(max_template_date)

# Instantiate the HhsearchHitFeaturizer class
#template_featurizer = HhsearchHitFeaturizer(mmcif_dir=template_mmcif_dir,
#                                            max_template_date=max_template_date,
#                                            max_hits=MAX_TEMPLATE_HITS,
#                                            kalign_binary_path=kalign_binary_path,
#                                            release_dates_path=None,
#                                            obsolete_pdbs_path=None)
# Instantiate the HmmsearchHitFeaturizer class
template_featurizer = HmmsearchHitFeaturizer(mmcif_dir=template_mmcif_dir,
                                             max_template_date=max_template_date,
                                             max_hits=MAX_TEMPLATE_HITS,
                                             kalign_binary_path=kalign_binary_path,
                                             release_dates_path=None,
                                             obsolete_pdbs_path=None)
print(template_featurizer)

## Input sequence
input_fasta_file = '/home/zheng/test/Q94K49.fasta'
## Extract query_sequence (as a str) from the FASTA file
input_sequence = ""
with open(input_fasta_file) as f:
    for line in f.readlines():
        if line.startswith(">"):
            continue
        input_sequence += line.strip()

templates_result = template_featurizer.get_templates(query_sequence=input_sequence,
                                                     hits=pdb_template_hits)
print(f"templates_result.errors: {templates_result.errors}")
print(f"templates_result.warnings:{templates_result.warnings}")
print(f"Input sequence: {input_sequence}, length: {len(input_sequence)}")
for k, v in templates_result.features.items():
    print(k)
    print(f"Value type: {type(v)}")
    print(f"Value shape: {v.shape}")
    print(v)
#print(f"[2,120,:]:{templates_result.features['template_all_atom_positions'][2,120,     :]}")
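Each feature array printed above gets its leading template dimension from `np.stack(..., axis=0)` over the per-hit features. That is why `template_aatype` has shape (3, 396, 22) in this run. The minimal sketch below uses synthetic data and assumes per-hit arrays shaped like those `_process_single_hit` returns. The template sequences are assumed to be already aligned to a 6-residue query, with `-` at unaligned positions, and all coordinates are zero. `fake_hit_features` and `stack_templates` are hypothetical helpers, not part of the original code.

```python
import numpy as np

# HHblits amino-acid mapping used in this script: 22 classes (20 aa + X + gap).
HHBLITS_AA_TO_ID = {
    'A': 0, 'B': 2, 'C': 1, 'D': 2, 'E': 3, 'F': 4, 'G': 5, 'H': 6, 'I': 7,
    'J': 20, 'K': 8, 'L': 9, 'M': 10, 'N': 11, 'O': 20, 'P': 12, 'Q': 13,
    'R': 14, 'S': 15, 'T': 16, 'U': 1, 'V': 17, 'W': 18, 'X': 20, 'Y': 19,
    'Z': 3, '-': 21,
}
NUM_AATYPES = max(HHBLITS_AA_TO_ID.values()) + 1  # 22
ATOM_TYPE_NUM = 37  # number of atom types in a residue (atom_types above)


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a template sequence aligned to the query, shape (L, 22)."""
    arr = np.zeros((len(seq), NUM_AATYPES), dtype=np.float32)
    for i, aa in enumerate(seq):
        arr[i, HHBLITS_AA_TO_ID[aa]] = 1.0
    return arr


def fake_hit_features(aligned_seq: str) -> dict:
    """Hypothetical per-hit features with single-template shapes."""
    num_res = len(aligned_seq)
    return {
        'template_aatype': one_hot(aligned_seq),                                # (L, 22)
        'template_all_atom_masks': np.zeros((num_res, ATOM_TYPE_NUM)),          # (L, 37)
        'template_all_atom_positions': np.zeros((num_res, ATOM_TYPE_NUM, 3)),   # (L, 37, 3)
        'template_sum_probs': [0.0],                                            # (1,)
    }


def stack_templates(per_hit: list) -> dict:
    """Stack per-hit features along a new leading 'number of templates' axis."""
    return {k: np.stack([f[k] for f in per_hit], axis=0).astype(np.float32)
            for k in per_hit[0]}


# Three made-up templates aligned to a 6-residue query.
hits = [fake_hit_features(s) for s in ('MKTAY-', '-KTAYI', 'MK--YI')]
stacked = stack_templates(hits)
for name, value in stacked.items():
    print(name, value.shape)
# template_aatype (3, 6, 22)
# template_all_atom_masks (3, 6, 37)
# template_all_atom_positions (3, 6, 37, 3)
# template_sum_probs (3, 1)
```

When no hit survives, `HmmsearchHitFeaturizer` falls back to a single all-zero template, so its arrays have a leading dimension of 1 instead.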

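This article was contributed by an Internet user. The views expressed are the author's own and do not represent the position of this site, which only provides information storage space, does not own the copyright, and bears no related legal responsibility. If reposting, please cite the source: http://www.mzph.cn/news/190592.shtml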
