Helper functions#

Helper functions — prolif.utils#

prolif.utils.get_residues_near_ligand(lig: BaseRDKitMol, prot: BaseRDKitMol, cutoff: float = 6.0, use_segid: bool = False) list[prolif.residue.ResidueId][source]#

Detects residues close to a reference ligand

Parameters:
  • lig (prolif.molecule.Molecule or prolif.residue.Residue) – Select residues that are near this ligand/residue

  • prot (prolif.molecule.Molecule) – Protein containing the residues

  • cutoff (float) – If any interatomic distance between the ligand reference points and a residue is below or equal to this cutoff, the residue will be selected

  • use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain.

Returns:

  • residues (list) – A list of unique ResidueId that are close to the ligand

  • .. versionchanged:: 2.1.0 – Added use_segid.

prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, selections: str, **kwargs: Any) AtomGroup[source]#
prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, *selections: str, **kwargs: Any) list[MDAnalysis.core.groups.AtomGroup]

Returns AtomGroup(s) that satisfy each distance-based selection over the entire trajectory rather than only the first frame. Only useful for distance-based selections. Use group {0} in any additional selection to refer to the AtomGroup generated by the first selection, or group {N-1} for the Nth selection.

prolif.utils.to_bitvectors(df: DataFrame) list[rdkit.DataStructs.cDataStructs.ExplicitBitVect][source]#

Converts an interaction DataFrame to a list of RDKit ExplicitBitVector

Parameters:

df (pandas.DataFrame) – A DataFrame where each column corresponds to an interaction between two residues

Returns:

bv – A list of ExplicitBitVect for each frame

Return type:

list

Example

>>> from rdkit.DataStructs import TanimotoSimilarity
>>> bv = prolif.to_bitvectors(df)
>>> TanimotoSimilarity(bv[0], bv[1])
0.42
prolif.utils.to_dataframe(ifp: IFPResults, interactions: Collection[str], count: bool = False, dtype: type | None = None, drop_empty: bool = True, index_col: str = 'Frame') DataFrame[source]#

Converts IFPs to a pandas DataFrame

Parameters:
  • ifp (dict) – A dict in the format {<frame number>: {(<residue_id>, <residue_id>): <interactions>}}. <interactions> is either a numpy.ndarray bitvector, or a tuple of dict in the format {<interaction name>: <metadata dict>}.

  • interactions (list) – A list of interactions, in the same order as used to detect the interactions.

  • count (bool) – Whether to output a count fingerprint or not.

  • dtype (object or None) – Cast the dataframe values to this type. If None, uses np.uint8 if count=True, else bool.

  • drop_empty (bool) – Drop columns with only empty values

  • index_col (str) – Name of the index column in the DataFrame

Returns:

df – A 3-levels DataFrame where each ligand residue, protein residue, and interaction type are in separate columns

Return type:

pandas.DataFrame

Example

>>> df = prolif.to_dataframe(results, fp.interactions.keys(), dtype=int)
>>> print(df)
ligand             LIG1.G
protein             ILE59                  ILE55       TYR93
interaction   Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking
Frame
0                       0          1           0           0          0
...

Changed in version 0.3.2: Moved the return_atoms parameter from the run methods to the dataframe conversion code

Changed in version 2.0.0: Removed the return_atoms parameter. Added the count parameter. Removed support for ifp containing np.ndarray bitvectors.

Protein helper functions — prolif.io.protein_helper#

This module provides helper functions for working with protein structures (but not limited to protein structures).

Yu-Yuan (Stuart) Yang, 2025

class prolif.io.protein_helper.ProteinHelper(templates: list[dict] | dict | None = None)[source]#

ProteinHelper is a class to standardize the residue names and fix the bond orders when reading the non-standard residues with RDKit for a molecule.

New in version 2.1.0.

Parameters:

templates (list[dict] or dict or None, optional) – The templates to use for standardizing the protein molecule. If None, the standard amino acid template is used. If a dict is provided, it should contain the templates for residues. If a list is provided, it should contain dictionaries for each template. Default is None.

templates#

The templates used for standardizing the protein molecule.

Type:

list[dict]

Note

This class is designed to work with ProLIF Molecule instances or PDB files. It reads the input topology, standardizes the residue names, and fixes the bond orders based on the provided templates or standard amino acid template.

Example

>>> import prolif as plf
>>> from prolif.io import ProteinHelper
>>> protein_helper = ProteinHelper(templates=[{"ALA": {"SMILES": "CC(C(=O)O)N"}}])
>>> mol = protein_helper.standardize_protein(input_topology="path/to/protein.pdb")
>>> plf.display_residues(mol)
static check_resnames(resnames_to_check: set[str], templates: list[dict] | None = None) None[source]#

Check if the residue names are standard or in templates and raise a warning if not.

Parameters:
  • resnames_to_check (set[str]) – Set of residue names to check.

  • templates (list[dict] or None, optional) – The templates to use for checking the residue names. If None, the standard amino acid template is used. Default is None.

Raises:

ValueError – If any residue name is not standard or not in the templates.

static convert_to_standard_resname(resname: str, forcefield_name: str = 'unknown') str[source]#

Convert a residue name to its standard form based on the forcefield.

Parameters:
  • resname (str) – The residue name to convert.

  • forcefield_name (str) – The name of the forcefield for assigning the correct standard name for CYS. Default is “unknown”.

Returns:

The standard residue name.

Return type:

str

Note

This conversion is designed to distinguish residues with different possible H-bond donors at side chains, instead of the actual protonated states of residues.

For example, neutral and protonated arginine are both assigned to ARG, while neutral and deprotonated arginine in GROMOS force field are assigned to ARGN and ARG, respectively.

static fix_molecule_bond_orders(residue: Residue, templates: list[dict] | None = None) Residue[source]#

Fix the bond orders of a molecule.

Parameters:
  • residue (Residue) – The residue to fix the bond orders for.

  • templates (list[dict] or None, optional) –

    The templates to use for fixing the bond orders. If None, the standard amino acid template is used. Default is None. If the residue is not a standard amino acid, the user should provide a custom template.

    Also, note that the order of templates is relevant, as the first template that matches the residue name will be used.

Note

1. If the user provides a SMILES template, it will be converted to an RDKit molecule, and the bond orders will be assigned from the template.

2. SMILES templates are prioritized over CIF templates.

3. For CIF templates, any bonds and chiral designation on the input molecule will be removed at the start of the process. This function is adapted from the pdbinf/_pdbinf.py module’s assign_pdb_bonds function, which is used to assign bonds and aromaticity based on the standard amino acid templates.

Source: OpenFreeEnergy/pdbinf

static forcefield_guesser(conv_resnames: set[str]) str[source]#

Guess the forcefield based on the residue names.

Parameters:

conv_resnames (set[str]) – Set of residue names in the protein molecule.

Returns:

The guessed forcefield name.

Return type:

str

static n_residue_heavy_atoms(residue: Residue) int[source]#

Count the number of heavy atoms in a residue.

Parameters:

residue (Residue) – The residue to count the heavy atoms in.

Returns:

The number of heavy atoms in the residue.

Return type:

int

static n_template_residue_heavy_atoms(templates: list[dict] | None = None) dict[str, int][source]#

Count the number of heavy atoms in a residue based on the templates.

Parameters:

templates (list or None, optional) – The templates to use for counting the heavy atoms. If None, the standard amino acid template is used. Default is None.

Returns:

The dictionary with residue names as keys and the number of heavy atoms as values.

Return type:

dict[str, int]

standardize_protein(input_topology: prolif.molecule.Molecule | str | pathlib.Path) Molecule[source]#

Standardize the protein molecule.

This function will standardize the residue names, fix the bond orders, and check the residue names against the templates.

Parameters:

input_topology (Molecule | str | Path) – The input topology to standardize. It can be a ProLIF Molecule or a path to a PDB file.

Returns:

The standardized protein molecule.

Return type:

Molecule

Example

>>> from prolif.io import ProteinHelper
>>> protein_helper = ProteinHelper(
        templates=[{"ALA": {"SMILES": "CC(C(=O)O)N"}}]
    )
>>> mol = protein_helper.standardize_protein(
       "path/to/protein.pdb"
    )

Important

If your input for standardize_protein is a prolif.Molecule, it will modify your original molecule in place. Your residue names will be updated to the standardized names and residue’s bond orders will be fixed to the corresponding protonated states.

prolif.io.protein_helper.assign_bond_orders_from_smiles(template_smiles: str, mol: prolif.residue.Residue | rdkit.Chem.rdchem.Mol) Mol[source]#

Assign bond orders from a SMILES template to a residue.

Parameters:
  • template_smiles (str) – The SMILES string of the template to assign bond orders from.

  • residue (Residue | Chem.Mol) – The residue or molecule to assign bond orders to.

Returns:

The residue or molecule with assigned bond orders from the template.

Return type:

Chem.Mol

prolif.io.protein_helper.assign_intra_props(mol: Mol, reference_block: dict) Mol[source]#

Assign bonds and aromaticity based on residue and atom names from a reference block (cif template molecule).

Parameters:
  • mol (rdkit.Chem.Mol) – The input molecule, this is modified in place and returned.

  • reference_block (dict) – The cif template molecule from which to assign bonds.

Returns:

The modified molecule with assigned bonds and aromaticity.

Return type:

rdkit.Chem.Mol

Note

This function is adapted from the pdbinf/_pdbinf.py module’s assign_intra_props function, which is used to assign bonds and aromaticity based on the standard amino acid templates.

Source: OpenFreeEnergy/pdbinf

prolif.io.protein_helper.strip_bonds(m: Mol) Mol[source]#

Strip all bonds and chiral tags from a molecule.

Parameters:

m (rdkit.Chem.Mol) – The input molecule to strip bonds from.

Returns:

The modified molecule with all bonds and chiral tags removed.

Return type:

rdkit.Chem.Mol

Note

This function is adapted from the pdbinf/_pdbinf.py module’s strip_bonds function.

Source: OpenFreeEnergy/pdbinf

prolif.io.cif.cif_parser_lite(cif_string: str) dict[source]#

Parses a CIF string and returns a dictionary of data blocks.

New in version 2.1.0.

Parameters:

cif_string (str) – The CIF string to parse.

prolif.io.cif.cif_template_reader(cif_filepath: pathlib.Path | str) dict[source]#

Reads a CIF file and returns a dictionary of data blocks.

New in version 2.1.0.

Parameters:

cif_filepath (str) – The path to the CIF file to read.

Returns:

A dictionary containing the parsed data blocks.

Return type:

dict