Advanced usage#
Prerequisite#
We’ll setup the protein and ligand files for the different usecases here:
import MDAnalysis as mda
import prolif as plf
# load protein
protein_file = str(plf.datafiles.datapath / "vina" / "rec.pdb")
u = mda.Universe(protein_file)
protein_mol = plf.Molecule.from_mda(u)
# load docking poses
poses_path = str(plf.datafiles.datapath / "vina" / "vina_output.sdf")
pose_iterable = plf.sdf_supplier(poses_path)
# load 1 ligand from poses
ligand_mol = pose_iterable[0]
/home/docs/checkouts/readthedocs.org/user_builds/prolif/envs/302/lib/python3.11/site-packages/MDAnalysis/topology/TPRParser.py:161: DeprecationWarning: 'xdrlib' is deprecated and slated for removal in Python 3.13
import xdrlib
Interactions#
This section shows how to modify existing interactions or defined new ones.
Modifying interaction parameters#
In ProLIF, each interaction is defined as a Python class. You can get a list of all the available interactions (including base abstract classes) with the following code snippet:
plf.Fingerprint.list_available(show_hidden=True)
['BasePiStacking',
'Distance',
'DoubleAngle',
'SingleAngle',
'Anionic',
'CationPi',
'Cationic',
'EdgeToFace',
'FaceToFace',
'HBAcceptor',
'HBDonor',
'Hydrophobic',
'ImplicitHBAcceptor',
'ImplicitHBDonor',
'MetalAcceptor',
'MetalDonor',
'PiCation',
'PiStacking',
'VdWContact',
'XBAcceptor',
'XBDonor']
When you create an interaction fingerprint with prolif.Fingerprint
, those classes are
created with default parameters and attached to the new fingerprint object as methods:
fp = plf.Fingerprint(["Hydrophobic", "HBDonor", "HBAcceptor"])
fp.hydrophobic
<prolif.Hydrophobic at 0x71045e4c6c90>
These methods will yield all occurences for that specific interaction between 2 residues (1 for the ligand and 1 for the protein).
Note
To know which parameters are available for interactions, see the
interactions
module.
Example 1: modifying distance
in Hydrophobic
#
In this example, we’ll reparametrize the hydrophobic interaction with a shorter distance and see how this affects the number of occurences of an interaction for a given pair of residues.
Let’s start with a test case. With the default parameters in Hydrophobic
class, TYR109 is interacting with our ligand in 4 different occasions:
fp = plf.Fingerprint()
# calculate hydrophobic interactions betweem first ligand pose and TYR109
hydrophobic_interactions_tyr109 = list(
fp.hydrophobic(ligand_mol[0], protein_mol["TYR109.A"])
)
# print the number of hydrophobic interactions
len(hydrophobic_interactions_tyr109)
0
Now, we’ll simply change the distance threshold to 4.0
instead of the default 4.5
. To do this, we simply provide the parameters
argument with a dictionary mapping the name of the interaction to reconfigure with updated parameters.
This modification will affect all the interaction analysis code run by the fingerprint object, i.e. when using run
or run_from_iterable
.
fp = plf.Fingerprint(parameters={"Hydrophobic": {"distance": 4.0}})
# calculate hydrophobic interactions betweem first ligand pose and TYR109
hydrophobic_interactions_tyr109 = list(
fp.hydrophobic(ligand_mol[0], protein_mol["TYR109.A"])
)
# print the number of hydrophobic interactions
len(hydrophobic_interactions_tyr109)
0
As you can see, by reducing the distance threshold, the hydrophobic interaction between our ligand and TYR109 is now ignored.
Example 2: modifying preset
in VdWContact
#
Here, we show another example to modify the preset
parameter within the VdWContact
class to use a different set of van der Waals radii definitions. This example is particularly useful when your topology includes transition metals or other less common elements (such as cobalt in vitamin B12) that are not covered by the default preset (mdanalysis
).
Let’s start with a test case. First, we check the default preset of the vdwradii
.
fp = plf.Fingerprint(["VdWContact"])
default_vdwradii = fp.vdwcontact.vdwradii
# show the first 5 vdwradii of the dictionary
print(dict(list(default_vdwradii.items())[:5]))
# show the number of elements in the preset
default_vdw_elements = set(default_vdwradii.keys())
print(len(default_vdw_elements))
{'H': 1.1, 'He': 1.4, 'Li': 1.82, 'Be': 1.53, 'B': 1.92}
54
Next, we create a new fingerprint object with the preset
of rdkit
and check its vdwradii
.
fp_modified = plf.Fingerprint(
["VdWContact"], parameters={"VdWContact": {"preset": "rdkit"}}
)
rdkit_vdwradii = fp_modified.vdwcontact.vdwradii
# show the first 5 vdwradii of the dictionary
print(dict(list(rdkit_vdwradii.items())[:5]))
# show the number of elements in the rdkit preset
rdkit_vdw_elements = set(rdkit_vdwradii.keys())
print(len(rdkit_vdw_elements))
{'H': 1.2, 'He': 1.4, 'Li': 2.2, 'Be': 1.9, 'B': 1.8}
118
As you can see, the values and the number of the vdwradii
in two presets are different. The default preset (mdanalysis
) has merely 54 elements whereas the rdkit
preset defines the van der Waals radii for 118 elements.
Furthermore, we can look at the defined elements in two presets.
# show the elements that are in the default preset but not in the rdkit preset
print(sorted(default_vdw_elements - rdkit_vdw_elements))
['Aa', 'Hh', 'Rr']
# show the elements that are in the rdkit preset but not in the default preset
print(sorted(rdkit_vdw_elements - default_vdw_elements))
['Ac', 'Am', 'As', 'Bh', 'Bk', 'Ce', 'Cf', 'Cm', 'Cn', 'Co', 'Cr', 'Db', 'Ds', 'Dy', 'Er', 'Es', 'Eu', 'Fe', 'Fl', 'Fm', 'Gd', 'Hf', 'Hg', 'Ho', 'Hs', 'Ir', 'La', 'Lr', 'Lu', 'Lv', 'Mc', 'Md', 'Mn', 'Mo', 'Mt', 'Nb', 'Nd', 'Nh', 'No', 'Np', 'Og', 'Os', 'Pa', 'Pm', 'Pr', 'Pu', 'Rb', 'Re', 'Rf', 'Rg', 'Rh', 'Ru', 'Sc', 'Sg', 'Sm', 'Ta', 'Tb', 'Tc', 'Th', 'Ti', 'Tm', 'Ts', 'V', 'W', 'Y', 'Yb', 'Zr']
From this example, you can see most of the transition metals are not in the default preset, so you might consider to use alternative presets for your calculation.
Note
To know which alternative presets are available now, see the preset
parameter in
VdWContact
class. You can also manually update the vdwradii
certain atoms by the following example:
fp = plf.Fingerprint(
["VdWContact"],
parameters={
"VdWContact": {
"vdwradii": {"C": 1.65, "H": 1.13}
}})
>>> fp.vdwcontact.vdwradii # {'H': 1.13, ..., 'C': 1.65, ...}
However, it is recommended to select an preset rather than to define the vdwradii
manually or to mix different presets (see issue #222 (comment)).
Finally, let’s run the fingerprint by using different presets and compare the results.
# run fingerprint
fp.run_from_iterable(pose_iterable, protein_mol)
fp_modified.run_from_iterable(pose_iterable, protein_mol)
<prolif.fingerprint.Fingerprint: 1 interactions: ['VdWContact'] at 0x71045e3b8e90>
By comparing two results in the same view, you can see how the values of the preset affect the calculation of van der Waals contacts.
from prolif.plotting.complex3d import Complex3D
pose_index = 4 # change this to see different poses
# create Complex3D objects (default)
comp3D = Complex3D.from_fingerprint(
fp, pose_iterable[pose_index], protein_mol, frame=pose_index
)
# (modified)
other_comp3D = Complex3D.from_fingerprint(
fp_modified, pose_iterable[pose_index], protein_mol, frame=pose_index
)
# compare the two Complex3D objects
view = comp3D.compare(other_comp3D)
view
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<prolif.plotting.complex3d.Complex3D at 0x71045e628f90>
Repurposing an existing interaction#
In case you want to reuse an existing class for a different type of interaction, the
easiest way is to define an interaction class that inherits one of the classes listed
in the interactions
module, and just update its __init__
method with the
appropriate parameters.
There are some generic interactions, like the Distance
class, if you just need to define two chemical moieties within a certain distance. Both
the Hydrophobic, Ionic, and Metallic interactions inherit from this class!
In most cases, defining an interaction only based on a distance is not enough and
requires one or two angles constraints as well. For this purpose, the
SingleAngle
and DoubleAngle
interactions can be used.
Here we’ll define a C-H...O
HBond interaction by reusing the existing HBond classes:
class CHOAcceptor(plf.interactions.HBAcceptor):
def __init__(
self,
acceptor="O",
donor="[#6]-[H]",
distance=3.5,
DHA_angle=(90, 180),
):
super().__init__(
acceptor=acceptor, donor=donor, distance=distance, DHA_angle=DHA_angle
)
# create inverse interaction as well
CHODonor = CHOAcceptor.invert_role(
"CHODonor",
"C-H...O Hbond interaction between a ligand (donor) and a residue (acceptor)",
)
# calculate both classical and weak hbonds
fp = plf.Fingerprint(["HBAcceptor", "CHOAcceptor", "HBDonor", "CHODonor"])
fp.run_from_iterable(pose_iterable, protein_mol)
# show dataframe
df = fp.to_dataframe()
df
ligand | UNL1 | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
protein | TYR38.A | TYR109.A | ASP123.A | LEU126.A | ASP129.A | PRO184.A | ARG188.A | ... | SER334.B | MET337.B | PHE346.B | HSE347.B | LEU348.B | PHE351.B | ASP352.B | THR355.B | TYR359.B | ||||
interaction | HBAcceptor | CHODonor | CHODonor | CHODonor | CHOAcceptor | CHODonor | HBDonor | CHODonor | CHOAcceptor | CHODonor | ... | CHODonor | CHOAcceptor | CHODonor | CHODonor | CHODonor | CHOAcceptor | CHODonor | CHOAcceptor | CHODonor | CHODonor |
Frame | |||||||||||||||||||||
0 | False | False | True | False | False | False | True | True | False | False | ... | False | False | False | False | False | False | True | False | False | False |
1 | False | False | False | False | False | True | False | True | False | False | ... | False | False | False | False | False | False | True | False | True | False |
2 | False | False | False | False | False | True | False | True | False | False | ... | False | False | False | False | False | False | True | False | True | False |
3 | False | False | False | False | False | False | False | True | False | False | ... | False | False | False | False | False | False | True | False | False | False |
4 | True | False | False | False | False | False | False | False | False | False | ... | False | True | True | False | True | True | False | False | False | False |
5 | False | False | False | False | False | False | False | True | False | False | ... | False | False | False | False | False | False | True | False | True | False |
6 | False | True | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | True |
7 | False | False | False | False | False | False | False | True | False | False | ... | True | False | False | True | True | False | False | False | False | False |
8 | False | False | False | True | True | True | False | True | True | True | ... | False | False | False | False | False | False | False | False | False | False |
9 rows × 29 columns
We can also display these new interactions:
from prolif.plotting.complex3d import Complex3D
# assign colors for the new interactions on the 3D plot
Complex3D.COLORS.update(
{
"CHODonor": "red",
"CHOAcceptor": "blue",
}
)
# show specific docking pose
pose_index = 4
fp.plot_3d(pose_iterable[pose_index], protein_mol, frame=pose_index)
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<prolif.plotting.complex3d.Complex3D at 0x71045e11ec50>
Defining a custom interaction#
Important
Before you dive into this section, make sure that there isn’t already an interaction that could just be repurposed to do what you want!
For this, the best is to check the interactions
module.
With that being said, there are a few rules that you must respect when writing your own interaction:
Inherit the ProLIF Interaction class#
The Interaction
class is the base class that provides
some functionalities to automatically register the interactions in Fingerprint
objects.
Naming convention#
For non-symmetrical interactions, like hydrogen bonds or salt-bridges, the convention used here is to name the class after the role of the ligand.
For example, the class HBDonor
detects if a ligand acts as a hydrogen bond donor, and
the class Cationic
detects if a ligand acts as a cation.
Define a detect
method#
This method takes exactly two positional arguments: a ligand Residue and a protein Residue (in this order).
Return value for the detect
method#
You must yield a dictionary containing some basic metadata about the interaction when it
is detected. To help with this process, the metadata
method should be used (see
example below) for which the arguments are listed here:
the input residues (
lig_res
andprot_res
arguments, of typerdkit.Chem.Mol
),the indices of atoms responsible for the interaction, (
lig_indices
andprot_indices
arguments, of typetuple[int, ...]
),any other relevant metric (distances or angles), named as you want. Distances should be in angstroms, and preferably named
distance
, and angles should be in degrees.
Note
You don’t have to return anything if no interaction is detected for a pair of residues.
Here’s an example implementing a close-contact interaction using numpy:
import numpy as np
from scipy.spatial import distance_matrix
class CloseContact(plf.interactions.Interaction):
def __init__(self, contact_threshold=2.0):
self.contact_threshold = contact_threshold
def detect(self, ligand_residue, protein_residue):
# distance matrix between atoms of both residues
dist_matrix = distance_matrix(ligand_residue.xyz, protein_residue.xyz)
below_threshold = dist_matrix < self.contact_threshold
for ligand_indices in np.argwhere(below_threshold.any(axis=1)):
ligand_index = int(ligand_indices[0])
for protein_indices in np.argwhere(below_threshold[ligand_index]):
protein_index = int(protein_indices[0])
# yield dict with metadata on the interaction
# required arguments: input residues, and tuple of indices of atoms
# responsible for the interaction
# optional arguments: any additional `key=value` pair (e.g. distance)
yield self.metadata(
lig_res=ligand_residue,
prot_res=protein_residue,
lig_indices=(ligand_index,),
prot_indices=(protein_index,),
distance=dist_matrix[ligand_index, protein_index],
)
# run analysis
fp = plf.Fingerprint(["CloseContact"])
fp.run_from_iterable([ligand_mol], protein_mol)
# show results
df = fp.to_dataframe()
df
ligand | UNL1 | |||||
---|---|---|---|---|---|---|
protein | VAL200.A | VAL201.A | THR209.A | THR213.A | PHE330.B | PHE351.B |
interaction | CloseContact | CloseContact | CloseContact | CloseContact | CloseContact | CloseContact |
Frame | ||||||
0 | True | True | True | True | True | True |
# assign colors for the new interactions on the 3D plot
Complex3D.COLORS.update(
{
"CloseContact": "brown",
}
)
# display
fp.plot_3d(ligand_mol, protein_mol, frame=0)
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<prolif.plotting.complex3d.Complex3D at 0x71045e0fca90>
Fingerprint generation#
This section contains information about modifying some aspects of the fingerprint generation.
Ignoring the protein backbone#
In some cases, you might want to dismiss backbone interactions. You cannot simply remove the backbone from your protein input file(s), as it will either result in charges added on the side-chains’ end, or you would need to add dummy atoms at the end, but these could also result in artifacts during the interaction detection.
One workaround is to use a substructure search (SMARTS) to delete the backbone atoms after the structures has been parsed by MDAnalysis.
from rdkit import Chem
from rdkit.Chem.AllChem import DeleteSubstructs
# SMARTS for backbone
backbone_smarts = "[C^2](=O)-[C;X4](-[H])-[N;+0]-[H]"
backbone_query = Chem.MolFromSmarts(backbone_smarts)
def make_mol_and_strip_backbone(atomgroup, **kwargs):
mol = atomgroup.convert_to.rdkit(**kwargs)
mol = DeleteSubstructs(mol, backbone_query)
return plf.Molecule(mol)
# patch the `from_mda` method with our modified version
plf.Molecule.from_mda = make_mol_and_strip_backbone
You can then prepare your system and run the analysis as you normally would.
Accessing results#
Once the fingerprint analysis has been run, there are multiple ways to access the data. The most convenient one showcased in the tutorials is through a pandas DataFrame, however this only shows the residues involved in each interaction.
fp.to_dataframe()
ligand | UNL1 | |||||
---|---|---|---|---|---|---|
protein | VAL200.A | VAL201.A | THR209.A | THR213.A | PHE330.B | PHE351.B |
interaction | CloseContact | CloseContact | CloseContact | CloseContact | CloseContact | CloseContact |
Frame | ||||||
0 | True | True | True | True | True | True |
The complete data is stored on the ifp
attribute of the fingerprint object as a dictionary indexed by residues:
frame_number = 0
ligand_residue = "UNL1"
protein_residue = "VAL200.A"
fp.ifp[frame_number][(ligand_residue, protein_residue)]
{'CloseContact': ({'indices': {'ligand': (51,), 'protein': (9,)},
'parent_indices': {'ligand': (51,), 'protein': (2619,)},
'distance': 1.9710418679398418},)}
To make it easier to work with this deeply nested data structure, the results can also be accessed in a flatter structure like so:
for interaction_data in fp.ifp[frame_number].interactions():
print(interaction_data)
break
InteractionData(ligand=ResidueId(UNL, 1, None), protein=ResidueId(THR, 209, A), interaction='CloseContact', metadata={'indices': {'ligand': (67,), 'protein': (9,)}, 'parent_indices': {'ligand': (67,), 'protein': (2767,)}, 'distance': 1.6563646792698756})