Docking ligands to P450 enzymes with PyRosetta

This protocol has been adpted from Rosetta Ligand Docking tutorial 1, Ligand Dock Application 2 and 08.01 PyRosetta notebook 3 and utilised during our study on docking various inhibitors to P450 orthologs. Comparative modelling for the twelve CYP51 orthologus sequences has been completed as described in comparative modelling with Modeller section.

The protocol has been utilised in our large scale CYP51 docking study:

O.O. Akapo, J.M. Macnar, J.D. Kryś, P.R. Syed, K. Syed and D. Gront, In silico structural modeling and analysis of interactions of Tremellomycetes cytochrome P450 monooxygenases CYP51s with substrates and azoles, submitted to Scientific Reports

The following Rosetta version was employed:

INFO:pyrosetta.rosetta:PyRosetta-4 2019
PyRosetta4.conda.linux.CentOS.python36.Release2019.44+release.5aed75f2e796a33ab71515b6c1daa321eb2294a2
2019-10-29T08:37:43 retrieved from: http://www.pyrosetta.org

Preparation of docking

Protein preparation

We used Rosetta clean-pdb.py script (distributed with the software package) to prepare a protein chain for further steps:

python2.7 $ROSETTA/tools/protein_tools/scripts/clean_pdb.py protein.pdb chain_letter

where chain_letter is a single-letter ID of the chain (typically ‘a’) and $ROSETTA shell variable provides the locations of your Rosetta package installation. As a result you will get a protein_chain_letter.pdb and protein_chain_letter.fasta files.

Cofactor preparation

To prepare HEM .params file you will need a HEM.sdf file that have to be prepared from a protein-HEM complex to have the correct coordinates and have added hydrogens. You can easily do that using for example PyMOL 5 or any other software you like.

python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
      -n HEM -p HEM --chain=B HEM.sdf

As a result, you will get a HEM_0001.pdb and HEM.params files in which HEM will be present in chain B.

Ligand preparation

Ligands subjected for docking also needs a .params file but this time the conformers will be required as well. To get them use bcl 8 following the protocole used in Meiler Lab. The input ligand.sdf file can be downloaded from PubChem 6, PDB 4 or prepared using PyMOL 5 or openbabel 7 from other type files. First, check if the molecule has reasonable bonds/geometries/clashing/etc. Ligands that pass are the matched and those that fail are the unmatched.

bcl.exe molecule:Filter -add_h -neutralize -defined_atom_types \
      -3d -input_filenames ligand.sdf \
      -output_matched ligand.CLEANED.sdf \
      -output_unmatched ligand.UNCLEANED.sdf -message_level Debug

bcl.exe molecule:ConformerGenerator -rotamer_library cod \
  -top_models 100 -ensemble_filenames ligand.CLEANED.sdf \
  -conformers_single_file ligand.CLEANED.conf.sdf \
  -conformation_comparer 'Dihedral(method=Max)' 30 -max_iterations 1000

Then .params file can be prepared as follow:

python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
      -n LIG -p LIG --chain=X --conformers-in-one-file ligand.CLEANED.conf.sdf

Output should contain LIG.pdb, LIG_conformers.pdb and LIG.params files. Check the .params if PDB_ROTAMERS line is present at the end with the correct conformers filename.

Putting protein, cofactor and ligand together

All those components must be in the same .pdb file. First, add HEM and minimize the side chians:

cat protein_chain_letter.pdb HEM_0001.pdb > protein_HEM.pdb
$ROSETTA/main/source/bin/ligand_rpkmin.linuxgccrelease \
    -database $ROSETTA/main/database \
      -in:file:extra_res_fa HEM.params \
      -ex1 -ex2 -no_optH false -flip_HNQ \
      -docking:ligand:old_estat \
      -docking:ligand:soft_rep \
      -nstruct 10 \
      -s protein_HEM.pdb

Then the structure with the lowest energy should be choosen for the next steps with the following command:

k=` awk '/^pose / {print FILENAME, $NF}' protein_HEM_*.pdb | sort -nk2 |head -1|awk '{print $1}'` && cp $k protein_HEM_input.pdb

which simply sorts file names by the respective scores and selects the file corresponding to the lowest energy value. The last step, which adds the ligand .pdb at the end of the input file, is done by:

cat protein_HEM_input.pdb LIG.pdb > protein_HEM_LIG.pdb

Docking Procedures

To perform docking the following Python script have to be run:

"""
This script was prepared on the canva of
https://github.com/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/08.01-Ligand-Docking-XMLObjects.ipynb

Before you use this script remove "outputs" folder from your curent directory!
"""


# Setup

import sys
import logging
logging.basicConfig(level=logging.INFO)
import matplotlib
import pandas as pd
import os
import pyrosetta
from pyrosetta import *
from pyrosetta.rosetta.protocols.rosetta_scripts import *
import pyrosetta.distributed.viewer as viewer
from pyrosetta.rosetta.protocols.ligand_docking import *


# Declare path to input files

pdb_filename = sys.argv[1]      #"protein_HEM_LIG.pdb"
ligand_params = sys.argv[2]     # "LIG.params"
ligand_conformers = sys.argv[3] # "LIG_conformers.pdb"
ligand_filename = sys.argv[4]   # "LIG.pdb"
HEM_params = "HEM.params"       # assuming that it is present in the current directory

# Flags normaly used in options.short

flags = f"""
-extra_res_fa {HEM_params} {ligand_params}
-s '{pdb_filename} {ligand_filename}'
-mute all
-ex1
-ex2
-no_optH false
-flip_HNQ true
-ignore_ligand_chi true
-overwrite
"""
pyrosetta.distributed.init(flags)

# Uncomment if you would like to use options.short as input parameter file
#pyrosetta.init('-no_fconfig @options2.short')


# Create a pose object - we need it for further docking
pose = pyrosetta.io.pose_from_file(filename=pdb_filename)


# xml file
# information about coordinates of a center of box <Coordinates x="2000.0" y="2000.0" z="2000.0" pdb_tag=""/>
# Complex filename  <PDB filename="./4RM4_HEM_OLA.pdb"/>
# Ligand to SlideTogether (I'm not sure if we still need it) <SlideTogether name="OLA" chains="X"/>

xml_ = """
<ROSETTASCRIPTS>

    <SCOREFXNS>

        <ScoreFunction name="ligand_soft_rep" weights="ligand_soft_rep">
            <Reweight scoretype="fa_elec" weight="0.42"/>
            <Reweight scoretype="hbond_bb_sc" weight="1.3"/>
            <Reweight scoretype="hbond_sc" weight="1.3"/>
            <Reweight scoretype="rama" weight="0.2"/>
        </ScoreFunction>

        <ScoreFunction name="hard_rep" weights="ligand">
            <Reweight scoretype="fa_intra_rep" weight="0.004"/>
            <Reweight scoretype="fa_elec" weight="0.42"/>
            <Reweight scoretype="hbond_bb_sc" weight="1.3"/>
            <Reweight scoretype="hbond_sc" weight="1.3"/>
            <Reweight scoretype="rama" weight="0.2"/>
        </ScoreFunction>

    </SCOREFXNS>

    <LIGAND_AREAS>
        <LigandArea name="docking_sidechain" chain="X" cutoff="6.0"     add_nbr_radius="true" all_atom_mode="true" minimize_ligand="10"/>
        <LigandArea name="final_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true"/>

        <LigandArea name="final_backbone" chain="X" cutoff="7.0" add_nbr_radius="false" all_atom_mode="true" Calpha_restraints="0.3"/>
    </LIGAND_AREAS>

    <INTERFACE_BUILDERS>
        <InterfaceBuilder name="side_chain_for_docking" ligand_areas="docking_sidechain"/>
        <InterfaceBuilder name="side_chain_for_final" ligand_areas="final_sidechain"/>
        <InterfaceBuilder name="backbone" ligand_areas="final_backbone" extension_window="3"/>
    </INTERFACE_BUILDERS>

    <MOVEMAP_BUILDERS>
        <MoveMapBuilder name="docking" sc_interface="side_chain_for_docking" minimize_water="true"/>
        <MoveMapBuilder name="final" sc_interface="side_chain_for_final" bb_interface="backbone" minimize_water="true"/>
    </MOVEMAP_BUILDERS>

    <SCORINGGRIDS ligand_chain="X" width="30.0">
        <ClassicGrid grid_name="vdw" weight="1.0"/>
    </SCORINGGRIDS>

    <MOVERS>

        <StartFrom name="start_from" chain="X">
            <Coordinates x="27.739"  y="11.822"  z="18.856" pdb_tag=""/>
            <PDB filename="{}"/>
        </StartFrom>

        <Transform name="transform" chain="X" box_size="10.0" move_distance="0.1" angle="5.0" cycles="500" repeats="1" temperature="5" initial_perturb="3.0" />

        <HighResDocker name="high_res_docker" cycles="6" repack_every_Nth="3" scorefxn="ligand_soft_rep" movemap_builder="docking"/>

        <FinalMinimizer name="final" scorefxn="hard_rep" movemap_builder="final"/>

        <InterfaceScoreCalculator name="add_scores" chains="X" scorefxn="hard_rep" compute_grid_scores="0"/>

        <ParsedProtocol name="low_res_dock">
            <Add mover_name="start_from"/>
            <Add mover_name="transform"/>
        </ParsedProtocol>

        <ParsedProtocol name="high_res_dock">
            <Add mover_name="high_res_docker"/>
            <Add mover_name="final"/>
        </ParsedProtocol>

        <ParsedProtocol name="reporting">
            <Add mover_name="add_scores"/>
        </ParsedProtocol>

    </MOVERS>

    <PROTOCOLS>
        <Add mover_name="low_res_dock"/>
        <Add mover_name="high_res_dock"/>
        <Add mover_name="reporting"/>
    </PROTOCOLS>
</ROSETTASCRIPTS>

""".format(pdb_filename)
#print(xml_)

xml = pyrosetta.rosetta.protocols.rosetta_scripts.XmlObjects.create_from_string(xml_).get_mover("ParsedProtocol")
# Produce 5 local ligand docking trajectories in "outputs" directory, using `PyJobDistributor`:

scorefxn = pyrosetta.create_score_function("ligand_soft_rep")
if not os.getenv("DEBUG"):
    working_dir = os.getcwd()
    output_dir = "output"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)
    os.chdir(output_dir)

    jd = pyrosetta.toolbox.py_jobdistributor.PyJobDistributor(pdb_name=pdb_filename, nstruct=5, scorefxn=scorefxn) # Output name, number of structures, scorefunction used
    jd.native_pose = pose
    #df = pd.DataFrame()
    while not jd.job_complete:
        test_pose = pose.clone()
        xml.apply(test_pose)
        #test_df = pd.DataFrame.from_records(dict(test_pose.scores), index=[jd.current_name])
        #df = df.append(test_df)
        jd.output_decoy(test_pose)
    os.chdir(working_dir)

Results clustering using hierarchical agglomerative algorithm

Docking calculations are followed by results analysis, which starts from hierarchical clustering of the obtained poses. This step has been done according to this protocol

References

1

https://www.rosettacommons.org/demos/latest/tutorials/ligand_docking/ligand_docking_tutorial/

2

https://www.rosettacommons.org/manuals/archive/rosetta3.1_user_guide/app_ligand_docking.html

3

https://github.com/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/08.01-Ligand-Docking-XMLObjects.ipynb

4

http://www.rcsb.org/

5(1,2)

https://pymol.org/2/

6

https://pubchem.ncbi.nlm.nih.gov/

7

https://openbabel.org/docs/dev/Command-line_tools/babel.html

8

http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1