Docking ligands to P450 enzymes with RosettaScripts¶
This protocol has been adpted from Rosetta Ligand Docking tutorial 1 and Ligand Dock Application 2 and utilised during our study on docking various inhibitors to P450 orthologs. Comparative modelling for the twelve CYP51 orthologus sequences has been completed as described in comparative modelling with Modeller section.
The following Rosetta version was employed:
INFO:pyrosetta.rosetta:PyRosetta-4 2019
PyRosetta4.conda.linux.CentOS.python36.Release2019.44+release.5aed75f2e796a33ab71515b6c1daa321eb2294a2
2019-10-29T08:37:43 retrieved from: http://www.pyrosetta.org
Note
This is the version 2.0 of the docking protocol. The previous version, that has been used in the following paper:
O.O. Akapo, J.M. Macnar, J.D. Kryś, P.R. Syed, K. Syed and D. Gront, In silico structural modeling and analysis of interactions of Tremellomycetes cytochrome P450 monooxygenases CYP51s with substrates and azoles, submitted to IJMS
is available on this page
Preparation of docking¶
Protein preparation¶
We used Rosetta clean-pdb.py script (distributed with the software package) to prepare a protein chain for further steps:
python2.7 $ROSETTA/tools/protein_tools/scripts/clean_pdb.py protein.pdb chain_letter
where protein.pdb
is the PDB input file (e.g. 4LXJ.pdb), chain_letter
is a single-letter ID of the chain (typically ‘A’) and $ROSETTA
shell variable provides the locations of your Rosetta package installation. As a result you will get a protein_chain_letter.pdb
(e.g. 4LXJ_A.pdb) and protein_chain_letter.fasta
(e.g. 4LXJ_A.fasta) files.
Cofactor preparation¶
To prepare HEM .params file you will need a HEM.cif
file (It can be downloaded from PDB 3 database). It is needed to prepare HEM.mol2
file with correct atom names and hydrogens present in the output. To get this use PyBioShell script cif_to_mol2.py
as described below:
python3.9 $BIOSHELL/doc_examples/py-examples/core/data/cif_to_mol2/cif_to_mol2.py HEM.cif >HEM.mol2
$BIOSHELL
shell variable provides the locations of your BIOSHELL package installation.
Then you can run following script to finally produce the `.params file.
python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
-n HEM -p HEM --chain=B --keep-names HEM.mol2
As a result, you will get a HEM_0001.pdb
and HEM.params
files in which HEM will be present in chain B. The HEM_0001.pdb
will not be used in further steps and should be deleted.
Ligand preparation¶
Ligands subjected for docking also needs a .params file but this time the conformers will be required as well. To get them use bcl 7 following the protocole used in Meiler Lab. The input ligand.sdf file can be downloaded from PubChem 5, PDB 3 or prepared using PyMOL 4 or openbabel 6 from other type files. An example of using openbabel 6 to prepare .sdf from .pdb file is described below (followig the Meiler Lab protocol).
obabel -ipdb ligand.pdb -osdf -O ligand.sdf --gen3d
First, check if the molecule has reasonable bonds/geometries/clashing/etc. Ligands that pass are the matched and those that fail are the unmatched.
bcl.exe molecule:ConformerGenerator -rotamer_library cod \
-top_models 100 -ensemble_filenames ligand.sdf \
-conformers_single_file ligand.conf.sdf \
-conformation_comparer 'Dihedral(method=Max)' 30 -max_iterations 1000
Then .params file can be prepared as follow:
python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
-n LIG -p LIG --chain=X --conformers-in-one-file --keep-names ligand.conf.sdf
Check the .params if PDB_ROTAMERS line is present at the end with the correct conformers filename:
grep PDB_ROTAMERS LIG.params
Putting protein, cofactor and ligand together¶
All those components must be in the same .pdb file. First, prepare HEM.pdb
from a protein-HEM complex to have the correct coordinates, using grep
. Then sed
is used to substitute 22nd character to the letter B (Chain ID is the 22nd character in the line).
grep HETATM protein.pdb | grep HEM | sed 's/./B/22' > HEM.pdb
It is important that the chain in the HEM.pdb
is named B. This can be also achieved by using your favourite text editor like gedit, vim, nano; to change the chain letter.
Add the cofactor file to the protein and minimize the side chains:
cat protein_chain_letter.pdb HEM.pdb > protein_HEM.pdb
$ROSETTA/main/source/bin/ligand_rpkmin.hdf5.linuxgccrelease \
-database $ROSETTA/main/database \
-in:file:extra_res_fa HEM.params \
-ex1 -ex2 -no_optH false -flip_HNQ \
-docking:ligand:old_estat \
-docking:ligand:soft_rep \
-nstruct 10 \
-s protein_HEM.pdb
Then the structure with the lowest energy should be chosen for the next steps with the following command:
k=` awk '/^pose / {print FILENAME, $NF}' protein_HEM_*.pdb | sort -nk2 | head -1 | awk '{print $1}'` && cp $k protein_HEM_input.pdb
which simply sorts file names by the respective scores and selects the file corresponding to the lowest energy value. The last step, which adds the ligand .pdb at the end of the input file, is done by:
cat protein_HEM_input.pdb LIG.pdb > protein_HEM_LIG.pdb
It is important that output file should be named protein_HEM_LIG.pdb
or the name has to be changed in dock_protocol.xml and options.short files below.
Docking Procedures¶
To perform docking the following XML file, named dock_protocole.xml, have to be provided to Rosetta:
"""
TUTAJ POTRZEBA INFORMACJI SKĄD TEN XML - jak to się robi w Rosetta?
"""
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ligand_soft_rep" weights="ligand_soft_rep">
<Reweight scoretype="fa_elec" weight="0.42"/>
<Reweight scoretype="hbond_bb_sc" weight="1.3"/>
<Reweight scoretype="hbond_sc" weight="1.3"/>
<Reweight scoretype="rama" weight="0.2"/>
</ScoreFunction>
<ScoreFunction name="hard_rep" weights="ligand">
<Reweight scoretype="fa_intra_rep" weight="0.004"/>
<Reweight scoretype="fa_elec" weight="0.42"/>
<Reweight scoretype="hbond_bb_sc" weight="1.3"/>
<Reweight scoretype="hbond_sc" weight="1.3"/>
<Reweight scoretype="rama" weight="0.2"/>
</ScoreFunction>
</SCOREFXNS>
<LIGAND_AREAS>
<LigandArea name="docking_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true" minimize_ligand="10"/>
<LigandArea name="final_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true"/>
<LigandArea name="final_backbone" chain="X" cutoff="7.0" add_nbr_radius="false" all_atom_mode="true" Calpha_restraints="0.3"/>
</LIGAND_AREAS>
<INTERFACE_BUILDERS>
<InterfaceBuilder name="side_chain_for_docking" ligand_areas="docking_sidechain"/>
<InterfaceBuilder name="side_chain_for_final" ligand_areas="final_sidechain"/>
<InterfaceBuilder name="backbone" ligand_areas="final_backbone" extension_window="3"/>
</INTERFACE_BUILDERS>
<MOVEMAP_BUILDERS>
<MoveMapBuilder name="docking" sc_interface="side_chain_for_docking" minimize_water="true"/>
<MoveMapBuilder name="final" sc_interface="side_chain_for_final" bb_interface="backbone" minimize_water="true"/>
</MOVEMAP_BUILDERS>
<SCORINGGRIDS ligand_chain="X" width="30.0">
<ClassicGrid grid_name="vdw" weight="1.0"/>
</SCORINGGRIDS>
<MOVERS>
<StartFrom name="start_from" chain="X">
<Coordinates x="27.739" y="11.822" z="18.856" pdb_tag=""/>
<PDB filename="protein_HEM_LIG.pdb"/>
</StartFrom>
<Transform name="transform" chain="X" box_size="10.0" move_distance="0.1" angle="5.0" cycles="500" repeats="1" temperature="5" initial_perturb="3.0" />
<HighResDocker name="high_res_docker" cycles="6" repack_every_Nth="3" scorefxn="ligand_soft_rep" movemap_builder="docking"/>
<FinalMinimizer name="final" scorefxn="hard_rep" movemap_builder="final"/>
<InterfaceScoreCalculator name="add_scores" chains="X" scorefxn="hard_rep" compute_grid_scores="0"/>
<ParsedProtocol name="low_res_dock">
<Add mover_name="start_from"/>
<Add mover_name="transform"/>
</ParsedProtocol>
<ParsedProtocol name="high_res_dock">
<Add mover_name="high_res_docker"/>
<Add mover_name="final"/>
</ParsedProtocol>
<ParsedProtocol name="reporting">
<Add mover_name="add_scores"/>
</ParsedProtocol>
</MOVERS>
<PROTOCOLS>
<Add mover_name="low_res_dock"/>
<Add mover_name="high_res_dock"/>
<Add mover_name="reporting"/>
</PROTOCOLS>
</ROSETTASCRIPTS>
with this flags placed in option.short file:
-parser:protocol dock_protocol.xml
-in:auto_setup_metals
-nstruct 5
-s 'protein_HEM_LIG.pdb'
-extra_res_fa LIG.params HEM.params
-ignore_ligand_chi true
-overwrite
-flip_HNQ true
-no_optH false
-ex1
-ex2
assuming that all inputs files are present in the current directory. To run Rosetta just type:
$ROSETTA/main/source/bin/rosetta_scripts.default.EXT -database $ROSETTA/main/database @options.short
where EXT
depend on your system (most commons are linuxgccrelease
or macosclangrelease
). The outputs will be present in the current directory.
Results clustering using hierarchical agglomerative algorithm¶
Docking calculations are followed by results analysis, which starts from hierarchical clustering of the obtained poses. This step has been done according to this protocol
References
- 1
https://www.rosettacommons.org/demos/latest/tutorials/ligand_docking/ligand_docking_tutorial/
- 2
https://www.rosettacommons.org/manuals/archive/rosetta3.1_user_guide/app_ligand_docking.html
- 3(1,2)
- 4
- 5
- 6(1,2)
https://openbabel.org/docs/dev/Command-line_tools/babel.html
- 7
http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1