Docking ligands to P450 enzymes with RosettaScripts

This protocol has been adpted from Rosetta Ligand Docking tutorial 1 and Ligand Dock Application 2 and utilised during our study on docking various inhibitors to P450 orthologs. Comparative modelling for the twelve CYP51 orthologus sequences has been completed as described in comparative modelling with Modeller section.

The following Rosetta version was employed:

INFO:pyrosetta.rosetta:PyRosetta-4 2019
PyRosetta4.conda.linux.CentOS.python36.Release2019.44+release.5aed75f2e796a33ab71515b6c1daa321eb2294a2
2019-10-29T08:37:43 retrieved from: http://www.pyrosetta.org

Note

This is the version 2.0 of the docking protocol. The previous version, that has been used in the following paper:

O.O. Akapo, J.M. Macnar, J.D. Kryś, P.R. Syed, K. Syed and D. Gront, In silico structural modeling and analysis of interactions of Tremellomycetes cytochrome P450 monooxygenases CYP51s with substrates and azoles, submitted to IJMS

is available on this page

Preparation of docking

Protein preparation

We used Rosetta clean-pdb.py script (distributed with the software package) to prepare a protein chain for further steps:

python2.7 $ROSETTA/tools/protein_tools/scripts/clean_pdb.py protein.pdb chain_letter

where protein.pdb is the PDB input file (e.g. 4LXJ.pdb), chain_letter is a single-letter ID of the chain (typically ‘A’) and $ROSETTA shell variable provides the locations of your Rosetta package installation. As a result you will get a protein_chain_letter.pdb (e.g. 4LXJ_A.pdb) and protein_chain_letter.fasta (e.g. 4LXJ_A.fasta) files.

Cofactor preparation

To prepare HEM .params file you will need a HEM.cif file (It can be downloaded from PDB 3 database). It is needed to prepare HEM.mol2 file with correct atom names and hydrogens present in the output. To get this use PyBioShell script cif_to_mol2.py as described below:

python3.9 $BIOSHELL/doc_examples/py-examples/core/data/cif_to_mol2/cif_to_mol2.py HEM.cif >HEM.mol2

$BIOSHELL shell variable provides the locations of your BIOSHELL package installation.

Then you can run following script to finally produce the `.params file.

python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
      -n HEM -p HEM --chain=B --keep-names HEM.mol2

As a result, you will get a HEM_0001.pdb and HEM.params files in which HEM will be present in chain B. The HEM_0001.pdb will not be used in further steps and should be deleted.

Ligand preparation

Ligands subjected for docking also needs a .params file but this time the conformers will be required as well. To get them use bcl 7 following the protocole used in Meiler Lab. The input ligand.sdf file can be downloaded from PubChem 5, PDB 3 or prepared using PyMOL 4 or openbabel 6 from other type files. An example of using openbabel 6 to prepare .sdf from .pdb file is described below (followig the Meiler Lab protocol).

obabel -ipdb ligand.pdb -osdf -O ligand.sdf --gen3d

First, check if the molecule has reasonable bonds/geometries/clashing/etc. Ligands that pass are the matched and those that fail are the unmatched.

bcl.exe molecule:ConformerGenerator -rotamer_library cod \
  -top_models 100 -ensemble_filenames ligand.sdf \
  -conformers_single_file ligand.conf.sdf \
  -conformation_comparer 'Dihedral(method=Max)' 30 -max_iterations 1000

Then .params file can be prepared as follow:

python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
      -n LIG -p LIG --chain=X --conformers-in-one-file --keep-names ligand.conf.sdf

Check the .params if PDB_ROTAMERS line is present at the end with the correct conformers filename:

grep PDB_ROTAMERS LIG.params

Putting protein, cofactor and ligand together

All those components must be in the same .pdb file. First, prepare HEM.pdb from a protein-HEM complex to have the correct coordinates, using grep. Then sed is used to substitute 22nd character to the letter B (Chain ID is the 22nd character in the line).

grep HETATM protein.pdb | grep HEM | sed 's/./B/22' > HEM.pdb

It is important that the chain in the HEM.pdb is named B. This can be also achieved by using your favourite text editor like gedit, vim, nano; to change the chain letter.

Add the cofactor file to the protein and minimize the side chains:

cat protein_chain_letter.pdb HEM.pdb > protein_HEM.pdb
$ROSETTA/main/source/bin/ligand_rpkmin.hdf5.linuxgccrelease \
    -database $ROSETTA/main/database \
      -in:file:extra_res_fa HEM.params \
      -ex1 -ex2 -no_optH false -flip_HNQ \
      -docking:ligand:old_estat \
      -docking:ligand:soft_rep \
      -nstruct 10 \
      -s protein_HEM.pdb

Then the structure with the lowest energy should be chosen for the next steps with the following command:

k=` awk '/^pose / {print FILENAME, $NF}' protein_HEM_*.pdb | sort -nk2 | head -1 | awk '{print $1}'` && cp $k protein_HEM_input.pdb

which simply sorts file names by the respective scores and selects the file corresponding to the lowest energy value. The last step, which adds the ligand .pdb at the end of the input file, is done by:

cat protein_HEM_input.pdb LIG.pdb > protein_HEM_LIG.pdb

It is important that output file should be named protein_HEM_LIG.pdb or the name has to be changed in dock_protocol.xml and options.short files below.

Docking Procedures

To perform docking the following XML file, named dock_protocole.xml, have to be provided to Rosetta:

"""
TUTAJ POTRZEBA INFORMACJI SKĄD TEN XML - jak to się robi w Rosetta?
"""

<ROSETTASCRIPTS>

    <SCOREFXNS>

        <ScoreFunction name="ligand_soft_rep" weights="ligand_soft_rep">

            <Reweight scoretype="fa_elec" weight="0.42"/>

            <Reweight scoretype="hbond_bb_sc" weight="1.3"/>

            <Reweight scoretype="hbond_sc" weight="1.3"/>

            <Reweight scoretype="rama" weight="0.2"/>

        </ScoreFunction>



        <ScoreFunction name="hard_rep" weights="ligand">

            <Reweight scoretype="fa_intra_rep" weight="0.004"/>

            <Reweight scoretype="fa_elec" weight="0.42"/>

            <Reweight scoretype="hbond_bb_sc" weight="1.3"/>

            <Reweight scoretype="hbond_sc" weight="1.3"/>

            <Reweight scoretype="rama" weight="0.2"/>

        </ScoreFunction>

    </SCOREFXNS>

    <LIGAND_AREAS>

        <LigandArea name="docking_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true" minimize_ligand="10"/>

        <LigandArea name="final_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true"/>

        <LigandArea name="final_backbone" chain="X" cutoff="7.0" add_nbr_radius="false" all_atom_mode="true" Calpha_restraints="0.3"/>

    </LIGAND_AREAS>



    <INTERFACE_BUILDERS>

        <InterfaceBuilder name="side_chain_for_docking" ligand_areas="docking_sidechain"/>

        <InterfaceBuilder name="side_chain_for_final" ligand_areas="final_sidechain"/>

        <InterfaceBuilder name="backbone" ligand_areas="final_backbone" extension_window="3"/>

    </INTERFACE_BUILDERS>
    <MOVEMAP_BUILDERS>

        <MoveMapBuilder name="docking" sc_interface="side_chain_for_docking" minimize_water="true"/>

        <MoveMapBuilder name="final" sc_interface="side_chain_for_final" bb_interface="backbone" minimize_water="true"/>

    </MOVEMAP_BUILDERS>



    <SCORINGGRIDS ligand_chain="X" width="30.0">

        <ClassicGrid grid_name="vdw" weight="1.0"/>

    </SCORINGGRIDS>



    <MOVERS>

        <StartFrom name="start_from" chain="X">

            <Coordinates x="27.739"  y="11.822"  z="18.856" pdb_tag=""/>
            <PDB filename="protein_HEM_LIG.pdb"/>
        </StartFrom>

        <Transform name="transform" chain="X" box_size="10.0" move_distance="0.1" angle="5.0" cycles="500" repeats="1" temperature="5" initial_perturb="3.0" />

        <HighResDocker name="high_res_docker" cycles="6" repack_every_Nth="3" scorefxn="ligand_soft_rep" movemap_builder="docking"/>

        <FinalMinimizer name="final" scorefxn="hard_rep" movemap_builder="final"/>

        <InterfaceScoreCalculator name="add_scores" chains="X" scorefxn="hard_rep" compute_grid_scores="0"/>



        <ParsedProtocol name="low_res_dock">

            <Add mover_name="start_from"/>

            <Add mover_name="transform"/>

        </ParsedProtocol>



        <ParsedProtocol name="high_res_dock">

            <Add mover_name="high_res_docker"/>

            <Add mover_name="final"/>

        </ParsedProtocol>




        <ParsedProtocol name="reporting">

            <Add mover_name="add_scores"/>

        </ParsedProtocol>

    </MOVERS>



    <PROTOCOLS>

        <Add mover_name="low_res_dock"/>

        <Add mover_name="high_res_dock"/>

        <Add mover_name="reporting"/>

    </PROTOCOLS>
</ROSETTASCRIPTS>

with this flags placed in option.short file:

-parser:protocol dock_protocol.xml
-in:auto_setup_metals
-nstruct 5
-s 'protein_HEM_LIG.pdb'
-extra_res_fa LIG.params HEM.params
-ignore_ligand_chi true
-overwrite
-flip_HNQ true
-no_optH false
-ex1
-ex2

assuming that all inputs files are present in the current directory. To run Rosetta just type:

$ROSETTA/main/source/bin/rosetta_scripts.default.EXT  -database $ROSETTA/main/database @options.short

where EXT depend on your system (most commons are linuxgccrelease or macosclangrelease). The outputs will be present in the current directory.

Results clustering using hierarchical agglomerative algorithm

Docking calculations are followed by results analysis, which starts from hierarchical clustering of the obtained poses. This step has been done according to this protocol

References

1

https://www.rosettacommons.org/demos/latest/tutorials/ligand_docking/ligand_docking_tutorial/

2

https://www.rosettacommons.org/manuals/archive/rosetta3.1_user_guide/app_ligand_docking.html

3(1,2)

http://www.rcsb.org/

4

https://pymol.org/2/

5

https://pubchem.ncbi.nlm.nih.gov/

6(1,2)

https://openbabel.org/docs/dev/Command-line_tools/babel.html

7

http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1