Helper functions#
Helper functions — prolif.utils
#
- prolif.utils.get_residues_near_ligand(lig: BaseRDKitMol, prot: BaseRDKitMol, cutoff: float = 6.0, use_segid: bool = False) list[prolif.residue.ResidueId] [source]#
Detects residues close to a reference ligand
- Parameters:
lig (prolif.molecule.Molecule or prolif.residue.Residue) – Select residues that are near this ligand/residue
prot (prolif.molecule.Molecule) – Protein containing the residues
cutoff (float) – If any interatomic distance between the ligand reference points and a residue is below or equal to this cutoff, the residue will be selected
use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain.
- Returns:
residues (list) – A list of unique
ResidueId
that are close to the ligand.. versionchanged:: 2.1.0 – Added use_segid.
- prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, selections: str, **kwargs: Any) AtomGroup [source]#
- prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, *selections: str, **kwargs: Any) list[MDAnalysis.core.groups.AtomGroup]
Returns AtomGroup(s) that satisfy each distance-based
selection
over the entire trajectory rather than only the first frame. Only useful for distance-based selections. Usegroup {0}
in any additional selection to refer to the AtomGroup generated by the first selection, orgroup {N-1}
for the Nth selection.
- prolif.utils.to_bitvectors(df: DataFrame) list[rdkit.DataStructs.cDataStructs.ExplicitBitVect] [source]#
Converts an interaction DataFrame to a list of RDKit ExplicitBitVector
- Parameters:
df (pandas.DataFrame) – A DataFrame where each column corresponds to an interaction between two residues
- Returns:
bv – A list of
ExplicitBitVect
for each frame- Return type:
Example
>>> from rdkit.DataStructs import TanimotoSimilarity >>> bv = prolif.to_bitvectors(df) >>> TanimotoSimilarity(bv[0], bv[1]) 0.42
- prolif.utils.to_dataframe(ifp: IFPResults, interactions: Collection[str], count: bool = False, dtype: type | None = None, drop_empty: bool = True, index_col: str = 'Frame') DataFrame [source]#
Converts IFPs to a pandas DataFrame
- Parameters:
ifp (dict) – A dict in the format
{<frame number>: {(<residue_id>, <residue_id>): <interactions>}}
.<interactions>
is either anumpy.ndarray
bitvector, or a tuple of dict in the format{<interaction name>: <metadata dict>}
.interactions (list) – A list of interactions, in the same order as used to detect the interactions.
count (bool) – Whether to output a count fingerprint or not.
dtype (object or None) – Cast the dataframe values to this type. If
None
, usesnp.uint8
ifcount=True
, elsebool
.drop_empty (bool) – Drop columns with only empty values
index_col (str) – Name of the index column in the DataFrame
- Returns:
df – A 3-levels DataFrame where each ligand residue, protein residue, and interaction type are in separate columns
- Return type:
Example
>>> df = prolif.to_dataframe(results, fp.interactions.keys(), dtype=int) >>> print(df) ligand LIG1.G protein ILE59 ILE55 TYR93 interaction Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking Frame 0 0 1 0 0 0 ...
Changed in version 0.3.2: Moved the
return_atoms
parameter from therun
methods to the dataframe conversion codeChanged in version 2.0.0: Removed the
return_atoms
parameter. Added thecount
parameter. Removed support forifp
containingnp.ndarray
bitvectors.
Protein helper functions — prolif.io.protein_helper
#
This module provides helper functions for working with protein structures (but not limited to protein structures).
Yu-Yuan (Stuart) Yang, 2025
- class prolif.io.protein_helper.ProteinHelper(templates: list[dict] | dict | None = None)[source]#
ProteinHelper
is a class to standardize the residue names and fix the bond orders when reading the non-standard residues with RDKit for a molecule.New in version 2.1.0.
- Parameters:
templates (list[dict] or dict or None, optional) – The templates to use for standardizing the protein molecule. If None, the standard amino acid template is used. If a dict is provided, it should contain the templates for residues. If a list is provided, it should contain dictionaries for each template. Default is None.
Note
This class is designed to work with ProLIF Molecule instances or PDB files. It reads the input topology, standardizes the residue names, and fixes the bond orders based on the provided templates or standard amino acid template.
Example
>>> import prolif as plf >>> from prolif.io import ProteinHelper >>> protein_helper = ProteinHelper(templates=[{"ALA": {"SMILES": "CC(C(=O)O)N"}}]) >>> mol = protein_helper.standardize_protein(input_topology="path/to/protein.pdb") >>> plf.display_residues(mol)
- static check_resnames(resnames_to_check: set[str], templates: list[dict] | None = None) None [source]#
Check if the residue names are standard or in templates and raise a warning if not.
- Parameters:
- Raises:
ValueError – If any residue name is not standard or not in the templates.
- static convert_to_standard_resname(resname: str, forcefield_name: str = 'unknown') str [source]#
Convert a residue name to its standard form based on the forcefield.
- Parameters:
- Returns:
The standard residue name.
- Return type:
Note
This conversion is designed to distinguish residues with different possible H-bond donors at side chains, instead of the actual protonated states of residues.
For example, neutral and protonated arginine are both assigned to ARG, while neutral and deprotonated arginine in GROMOS force field are assigned to ARGN and ARG, respectively.
- static fix_molecule_bond_orders(residue: Residue, templates: list[dict] | None = None) Residue [source]#
Fix the bond orders of a molecule.
- Parameters:
residue (Residue) – The residue to fix the bond orders for.
templates (list[dict] or None, optional) –
The templates to use for fixing the bond orders. If None, the standard amino acid template is used. Default is None. If the residue is not a standard amino acid, the user should provide a custom template.
Also, note that the order of templates is relevant, as the first template that matches the residue name will be used.
Note
1. If the user provides a SMILES template, it will be converted to an RDKit molecule, and the bond orders will be assigned from the template.
2. SMILES templates are prioritized over CIF templates.
3. For CIF templates, any bonds and chiral designation on the input molecule will be removed at the start of the process. This function is adapted from the pdbinf/_pdbinf.py module’s assign_pdb_bonds function, which is used to assign bonds and aromaticity based on the standard amino acid templates.
Source: OpenFreeEnergy/pdbinf
- static forcefield_guesser(conv_resnames: set[str]) str [source]#
Guess the forcefield based on the residue names.
- static n_residue_heavy_atoms(residue: Residue) int [source]#
Count the number of heavy atoms in a residue.
- static n_template_residue_heavy_atoms(templates: list[dict] | None = None) dict[str, int] [source]#
Count the number of heavy atoms in a residue based on the templates.
- standardize_protein(input_topology: prolif.molecule.Molecule | str | pathlib.Path) Molecule [source]#
Standardize the protein molecule.
This function will standardize the residue names, fix the bond orders, and check the residue names against the templates.
- Parameters:
input_topology (Molecule | str | Path) – The input topology to standardize. It can be a ProLIF Molecule or a path to a PDB file.
- Returns:
The standardized protein molecule.
- Return type:
Example
>>> from prolif.io import ProteinHelper >>> protein_helper = ProteinHelper( templates=[{"ALA": {"SMILES": "CC(C(=O)O)N"}}] ) >>> mol = protein_helper.standardize_protein( "path/to/protein.pdb" )
Important
If your input for standardize_protein is a
prolif.Molecule
, it will modify your original molecule in place. Your residue names will be updated to the standardized names and residue’s bond orders will be fixed to the corresponding protonated states.
- prolif.io.protein_helper.assign_bond_orders_from_smiles(template_smiles: str, mol: prolif.residue.Residue | rdkit.Chem.rdchem.Mol) Mol [source]#
Assign bond orders from a SMILES template to a residue.
- prolif.io.protein_helper.assign_intra_props(mol: Mol, reference_block: dict) Mol [source]#
Assign bonds and aromaticity based on residue and atom names from a reference block (cif template molecule).
- Parameters:
mol (rdkit.Chem.Mol) – The input molecule, this is modified in place and returned.
reference_block (dict) – The cif template molecule from which to assign bonds.
- Returns:
The modified molecule with assigned bonds and aromaticity.
- Return type:
rdkit.Chem.Mol
Note
This function is adapted from the pdbinf/_pdbinf.py module’s assign_intra_props function, which is used to assign bonds and aromaticity based on the standard amino acid templates.
Source: OpenFreeEnergy/pdbinf
- prolif.io.protein_helper.strip_bonds(m: Mol) Mol [source]#
Strip all bonds and chiral tags from a molecule.
- Parameters:
m (rdkit.Chem.Mol) – The input molecule to strip bonds from.
- Returns:
The modified molecule with all bonds and chiral tags removed.
- Return type:
rdkit.Chem.Mol
Note
This function is adapted from the pdbinf/_pdbinf.py module’s strip_bonds function.
Source: OpenFreeEnergy/pdbinf
- prolif.io.cif.cif_parser_lite(cif_string: str) dict [source]#
Parses a CIF string and returns a dictionary of data blocks.
New in version 2.1.0.
- Parameters:
cif_string (str) – The CIF string to parse.
- prolif.io.cif.cif_template_reader(cif_filepath: pathlib.Path | str) dict [source]#
Reads a CIF file and returns a dictionary of data blocks.
New in version 2.1.0.