Implicit hydrogen bond interaction#

This tutorial showcases how to calculate hydrogen bond (H-bond) interations for a topology with only heavy atoms (implicit hydrogens). Also, the difference between the implicit and explicit methods are demonstrated.

Before getting started to the implicit H-bond calculation, we need to make sure the bond orders of molecules are corrected. We introduce a protein helper function (ProteinHelper) to do all in once.

Protein helper function#

The protein helper function is designed to fix the bond orders in your topology file based on a given molecule template. The algorithms of fixing the bond orders are inspired by pdbinf and RDKit’s function (rdkit.Chem.AllChem.AssignBondOrdersFromTemplate).

A small example: ACE-TPO-NME#

This example is originally come from pdbinf. This example shows a molecule with three residues: ACE, TPO, NME. You can find the correct bond orders at RCSB PDB.

from rdkit import Chem
from rdkit.Chem.rdDetermineBonds import DetermineConnectivity

from prolif import Molecule
from prolif.datafiles import datapath

protein_path = datapath / "protein_helper/tpo.pdb"

# Read the data with RDKit
mol = Chem.MolFromPDBFile(str(protein_path), removeHs=False)
DetermineConnectivity(mol, useHueckel=True)
for atm in mol.GetAtoms():
    atm.SetNoImplicit(False)  # set no implicit to False

# Convert to ProLIF Molecule
mol = Molecule.from_rdkit(mol)
print([res for res in mol.residues])
mol
/home/docs/checkouts/readthedocs.org/user_builds/prolif/envs/302/lib/python3.11/site-packages/MDAnalysis/topology/TPRParser.py:161: DeprecationWarning: 'xdrlib' is deprecated and slated for removal in Python 3.13
  import xdrlib
[ResidueId(ACE, 1, ), ResidueId(TPO, 2, ), ResidueId(NME, 3, )]
../_images/73b8c1186bd9f7fa77be62ce2deaf46a5f4abbde770a5831d71c6e9beeb806f2.png
import prolif as plf

plf.display_residues(mol)
../_images/2c1224fc55066f48eb5249775eb9912b1b59cdad0311faec1a27dfda5b1e0c2a.svg

Considering these three residues are not standard protein residues (20 amino acids), you will need to provide a template for the additional residues. We provide two ways to load your templates into your protein helper function.

  1. CIF template: you can look up your residue (ligand) at RCSB PDB’s ligand page and download it by clicking Download Files > Definition (CIF format).

  2. SMILES string template: simply provide a SMILES string for your residue or ligand.

from prolif.datafiles import datapath
from prolif.io.cif import cif_template_reader
from prolif.io.protein_helper import ProteinHelper

# load templates
ace_template = {"ACE": {"SMILES": "CC=O"}}
tpo_template = {"TPO": {"SMILES": "C[C@H]([C@@H](C(=O))N)OP(=O)(O)O"}}
# tpo_template = cif_template_reader(
#     datapath / "protein_helper/templates/TPO.cif"
# )  # alternative
nme_template = cif_template_reader(datapath / "protein_helper/templates/NME.cif")

# fix molecule by the templates
protein_helper = ProteinHelper(templates=[ace_template, tpo_template, nme_template])

Important

If you would like to use SMILES templates, you have to first indicate your residue name, and then define your SMILES string in a dictionary with key SMILES. For instance:

{
    "[residue name]": {
        "SMILES": "[your smiles string]"
    },
    ...
}

Tip

If using SMILES templates, you might need to modify your SMILES string by pruning one oxygen in an acetate group (especially, if your residue is not in the terminus). For instance in the above example, original TPO residue’s SMILES is C[C@H]([C@@H](C(=O)O)N)OP(=O)(O)O (if you look at RCSB TPO page), but your template should be C[C@H]([C@@H](C(=O))N)OP(=O)(O)O. It is because prolif splits your molecule into residues at the amide bonds before calculating molecular interactions. The ProteinHelper is correcting your residues, so you actually have to provide templates without -OH in the acetate group forming amide bond.

Let’s fix the residues.

# fix molecule by the templates
fixed_mol = protein_helper.standardize_protein(mol)

plf.display_residues(mol)
../_images/c862350878f4bef7b9276647f87ea502ab0a216b3adaf9045fbf6c4d734cdd0f.svg

Now, it works.

After learning how to use protein helper function, we can now learn to calculate the implicit H-bond interactions.

Implicit H-bond interactions#

First, we read the receptor (protein) and fix the bond orders with ProteinHelper.

# read the receptor
from rdkit import Chem

from prolif.datafiles import datapath
from prolif.io.protein_helper import ProteinHelper

data_path = datapath / "implicitHbond"
protein_path_i = data_path / "receptor.pdb"
protein_mol_i = Molecule.from_rdkit(Chem.MolFromPDBFile((str(protein_path_i))))

protein_helper = ProteinHelper()

# fix the receptor
protein_mol_i = protein_helper.standardize_protein(protein_mol_i)

plf.display_residues(protein_mol_i, slice(100, 108))
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue MET1.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS2.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue VAL5.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue ARG22.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS27.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU30.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS34.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS55.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU58.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/home/docs/checkouts/readthedocs.org/user_builds/prolif/checkouts/302/prolif/io/protein_helper.py:161: UserWarning: Could not guess the forcefield based on the residue names. CYS is assigned to neutral CYS (charge = 0).
  standardized_resname = self.convert_to_standard_resname(
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU122.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU124.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS125.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS126.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue THR143.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU146.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS147.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU155.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue TYR167.A has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS2.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue VAL5.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue SER18.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue LYS55.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue ASN56.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU58.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU119.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
/tmp/ipykernel_1277/216093026.py:14: UserWarning: Residue GLU146.B has a different number of heavy atoms than the standard residue. This may affect H-bond detection.
  protein_mol_i = protein_helper.standardize_protein(protein_mol_i)
../_images/764e53b80f04ad65f6c09cb2ccf4991e642a74964fec7f850bb91bc3632bcdbb.svg

Don’t worry if there is a lot of warning message generated. It is because your topology file is not complete for some residues (e.g., only have backbone and miss some heavy atoms at the side chain). If the residues are not close to your ligand, it should be fine.

Note

The protein helper function automatically determines the appropriate forcefield based on the residue name. This is important because, in certain forcefields, specific residue names correspond to particular protonation states of the residue. The function ensures that your residue is adjusted to match the correct protonation state associated with the selected forcefield.

In the above example, the protein helper function fails to guess the forcefield, and thus it assigns the residues’ protonated states based on default settings (basically, pH = 7 and CYS is neutralized).

Tip

You can modify the residue name for your residue to get different protonated state of the residue when using protein helper function. This approach is particularly useful when investigating different protonation states of residues.

Next, we read the ligand. Because the file is in .sdf format, we used prolif.sdf_supplier to read it.

# Read the ligand
ligand_i = plf.sdf_supplier(data_path / "1.D.sdf")[0]
plf.display_residues(ligand_i, size=(400, 200))
../_images/e99fbe4094beadce5a33e64c166d82c70c77cb2edacf2c28b3c76820dcdfadb2.svg

We then used prolif.Fingerprint to find the implicit hydrogen bonds by setting interactions= ["ImplicitHBAcceptor", "ImplicitHBDonor"].

Then, we used run_from_iterable to calculate the implicit H-bond interactions.

fp_i = plf.Fingerprint(
    interactions=["ImplicitHBAcceptor", "ImplicitHBDonor"],
    count=True,
)
fp_i.run_from_iterable([ligand_i], protein_mol_i)
<prolif.fingerprint.Fingerprint: 2 interactions: ['ImplicitHBAcceptor', 'ImplicitHBDonor'] at 0x7165ba67d690>

Note

Some parameters for the implicit H-bond method can be set.

For instance, if you want to include waters for detecting interactions, use "include_water": True.

Geometric checks play an important role in the implicit hydrogen bond method by filtering out impossible interaction pairs. You can make the method more flexible by adjusting the tolerance thresholds for acceptor or donor atom angles and plane angles (plane angles only for sp2-hybridized heavy atoms). This allows you to include more or fewer interactions based on your needs. Increasing the tolerance reduces the geometric restrictions. Our default settings are fine-tuned by a dataset from RCSB PDB. If you don’t want any geometry check, you can simply apply "ignore_geometry_checks": True.

Here is an example to adjust the parameters:

fp_i = plf.Fingerprint(
    interactions=["ImplicitHBAcceptor", "ImplicitHBDonor"],
    count=True,
    parameters={
        "ImplicitHBAcceptor": {
            "include_water": True,  # include water residues in the detection
            "tolerance_dev_daa": 30,  # atom angle deviation tolerance for donor
            "tolerance_dev_dpa": 30,  # plane angle deviation tolerance for donor
            "ignore_geometry_checks": True,  # to skip geometry checks
        },
        "ImplicitHBDonor": {
            "include_water": True,  # include water residues in the detection
            "tolerance_dev_daa": 30,  # atom angle deviation tolerance for donor
            "tolerance_dev_dpa": 30,  # plane angle deviation tolerance for donor
            "ignore_geometry_checks": False,  # not to skip geometry checks
        },
    },

)

Similar to the previous tutorials, we converted the fingerprint into a table.

df_i = fp_i.to_dataframe().T
df_i
Frame 0
ligand protein interaction
UNL1 SER14.A ImplicitHBAcceptor 1
ImplicitHBDonor 1
TYR17.A ImplicitHBAcceptor 1
ImplicitHBDonor 1
ASP95.A ImplicitHBDonor 1
GLU101.A ImplicitHBDonor 2
TYR167.B ImplicitHBDonor 1

Metadata#

If you would like to have more details (metadata), use .ifp with the key ("[ligand's resname][resid].[chainid]", "[protine's resname][resid].[chainid]"). Mostly, metadata are related to the geometric checks, including raw atom and plane angles and their deviations. We also included vina_hbond_potential to assess the likelihood of an interaction occurring. Higher values of vina_hbond_potential suggest a stronger probability of the interaction.

fp_i.ifp[0][("UNL1", "TYR167.B")]
{'ImplicitHBDonor': ({'indices': {'ligand': (11,), 'protein': (3,)},
   'parent_indices': {'ligand': (11,), 'protein': (2508,)},
   'distance': 2.7428045865500525,
   'ideal_acceptor_angle': 120.0,
   'acceptor_atom_angles': [119.56582217170318],
   'acceptor_atom_angle_deviation': 0.43417782829682494,
   'acceptor_plane_angle': 14.191233519808888,
   'ideal_donor_angle': 120.0,
   'donor_atom_angles': [140.58040843212183],
   'donor_atom_angle_deviation': 20.58040843212183,
   'donor_plane_angle': 1.9892680854814626,
   'vina_hbond_potential': 0.9921165792423713},)}

Tip

If you have underscore (_) in your chain ID, you might have KeyError. Try to use ResidueId rather than string format as the dictionary key.

>>> fp_i.ifp[0][("UNL1", "HOH1._")] # Not working
KeyError: (ResidueId(UNL, 1, None), ResidueId(HOH, 1, None))

# try this
>>> from prolif.residue import ResidueId

>>> fp_i.ifp[0][(ResidueId("UNL", 1, None), ResidueId("HOH", 1, "_"))]
{'ImplicitHBAcceptor': ... }

Visualization#

For 2D ligand-centered plot, use plot_lignetwork with your ligand.

view = fp_i.plot_lignetwork(ligand_i, kind="frame", frame=0, display_all=True)
view

For a protein-ligand complex, plot_3d is a better way to explore the interactions. It is noted that you might need to set the water molecules into "sphere" to visualize it.

view = fp_i.plot_3d(
    ligand_i, protein_mol_i, frame=0, display_all=True, remove_hydrogens=False
)
view.setStyle(
    {
        "resn": "HOH",
    },
    {"sphere": {"radius": 0.5, "color": "red"}},
)
view

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<prolif.plotting.complex3d.Complex3D at 0x7165ba6096d0>

Explicit H-bond interaction method (standard method)#

As a reference, we also calculated the explicit H-bond interactions for the same system.

import MDAnalysis as mda

from prolif import Molecule
from prolif.datafiles import datapath

# Read the protonated protein file
data_path = datapath / "implicitHbond"
protein_path = data_path / "receptor_ph7_amber2.pdb"
u = mda.Universe(str(protein_path))
protein_mol = Molecule.from_mda(u)

plf.display_residues(protein_mol, slice(100, 108))
../_images/cf0212e01101f4ca5001007e71088fec7d3e035ff47c0e2e178f7c722981906f.svg
# Read the protonated ligand file
ligand = plf.sdf_supplier(str(data_path / "1.D_protonated.sdf"))[0]
plf.display_residues(ligand, size=(400, 200))
../_images/e99fbe4094beadce5a33e64c166d82c70c77cb2edacf2c28b3c76820dcdfadb2.svg
# Calculate default H-bond interactions
fp_count = plf.Fingerprint(["HBDonor", "HBAcceptor"], count=True)
fp_count.run_from_iterable([ligand], protein_mol)
<prolif.fingerprint.Fingerprint: 2 interactions: ['HBDonor', 'HBAcceptor'] at 0x7165ba61fb90>
# Get the fingerprint DataFrame
df = fp_count.to_dataframe().T
df
Frame 0
ligand protein interaction
UNL1 SER14.A HBAcceptor 1
TYR17.A HBDonor 1
TYR167.B HBDonor 1

2D/3D Visualization#

view = fp_count.plot_lignetwork(ligand, kind="frame", frame=0, display_all=True)
view
view = fp_count.plot_3d(
    ligand, protein_mol, frame=0, display_all=False, remove_hydrogens=False
)
view

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<prolif.plotting.complex3d.Complex3D at 0x7165b3f57b50>

Comparison between explicit and implicit H-bond interaction methods#

Finally, let’s compare the explicit and implicit hydrogen bond interaction methods. As shown in the visualization below, the implicit method tends to detect more interactions. This is because the explicit hydrogen bond method relies on how hydrogens are added to the ligand and protein structures. For some rotatable hydrogen donors, you can imagine these donors as being flexible during molecular dynamics simulations. The explicit method typically captures a single conformational state of hydrogen bond interactions, whereas the implicit method can be seen as integrating results from multiple conformational states. This is the fundamental difference between the two methods, and neither is inherently superior. The choice depends on the type of information you aim to extract from your data.

from prolif.plotting.complex3d import Complex3D

# create Complex3D objects (explicit, left)
comp3D = Complex3D.from_fingerprint(fp_count, ligand, protein_mol, frame=0)
# (implicit, right)
other_comp3D = Complex3D.from_fingerprint(fp_i, ligand_i, protein_mol_i, frame=0)

# compare the two Complex3D objects
view = comp3D.compare(other_comp3D, display_all=True)
view

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<prolif.plotting.complex3d.Complex3D at 0x7165b3f606d0>