.. _rosetta_cyp51_docking: Docking ligands to P450 enzymes with RosettaScripts =================================================== This protocol has been adpted from `Rosetta Ligand Docking tutorial`_ and `Ligand Dock Application`_ and utilised during our study on docking various inhibitors to P450 orthologs. Comparative modelling for the twelve CYP51 orthologus sequences has been completed as described in :ref:`comparative modelling with Modeller` section. The following Rosetta version was employed: .. code-block:: bash INFO:pyrosetta.rosetta:PyRosetta-4 2019 PyRosetta4.conda.linux.CentOS.python36.Release2019.44+release.5aed75f2e796a33ab71515b6c1daa321eb2294a2 2019-10-29T08:37:43 retrieved from: http://www.pyrosetta.org .. note:: This is the version 2.0 of the docking protocol. The previous version, that has been used in the following paper: O.O. Akapo, J.M. Macnar, J.D. Kryś, P.R. Syed, K. Syed and D. Gront, *In silico structural modeling and analysis of interactions of Tremellomycetes cytochrome P450 monooxygenases CYP51s with substrates and azoles*, submitted to *IJMS* is available :ref:`on this page` .. contents:: :local: Preparation of docking ~~~~~~~~~~~~~~~~~~~~~~~~~ Protein preparation ^^^^^^^^^^^^^^^^^^^^^^ We used Rosetta `clean-pdb.py` script (distributed with the software package) to prepare a protein chain for further steps: .. code-block:: bash python2.7 $ROSETTA/tools/protein_tools/scripts/clean_pdb.py protein.pdb chain_letter where ``protein.pdb`` is the PDB input file (e.g. `4LXJ.pdb`), ``chain_letter`` is a single-letter ID of the chain (typically 'A') and ``$ROSETTA`` shell variable provides the locations of your Rosetta package installation. As a result you will get a ``protein_chain_letter.pdb`` (e.g. `4LXJ_A.pdb`) and ``protein_chain_letter.fasta`` (e.g. `4LXJ_A.fasta`) files. Cofactor preparation ^^^^^^^^^^^^^^^^^^^^^^^ To prepare HEM `.params` file you will need a ``HEM.cif`` file (It can be downloaded from `PDB`_ database). It is needed to prepare ``HEM.mol2`` file with correct atom names and hydrogens present in the output. To get this use PyBioShell script ``cif_to_mol2.py`` as described below: .. code-block:: bash python3.9 $BIOSHELL/doc_examples/py-examples/core/data/cif_to_mol2/cif_to_mol2.py HEM.cif >HEM.mol2 ``$BIOSHELL`` shell variable provides the locations of your BIOSHELL package installation. Then you can run following script to finally produce the `.params file. .. code-block:: bash python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \ -n HEM -p HEM --chain=B --keep-names HEM.mol2 As a result, you will get a ``HEM_0001.pdb`` and ``HEM.params`` files in which HEM will be present in chain B. The ``HEM_0001.pdb`` will not be used in further steps and should be deleted. Ligand preparation ^^^^^^^^^^^^^^^^^^^^^ Ligands subjected for docking also needs a `.params` file but this time the conformers will be required as well. To get them use `bcl`_ following the protocole used in Meiler Lab. The input `ligand.sdf` file can be downloaded from `PubChem`_, `PDB`_ or prepared using `PyMOL`_ or `openbabel`_ from other type files. An example of using `openbabel`_ to prepare `.sdf` from `.pdb` file is described below (followig the Meiler Lab protocol). .. code-block:: bash obabel -ipdb ligand.pdb -osdf -O ligand.sdf --gen3d First, check if the molecule has reasonable bonds/geometries/clashing/etc. Ligands that pass are the *matched* and those that fail are the *unmatched*. .. code-block:: bash bcl.exe molecule:ConformerGenerator -rotamer_library cod \ -top_models 100 -ensemble_filenames ligand.sdf \ -conformers_single_file ligand.conf.sdf \ -conformation_comparer 'Dihedral(method=Max)' 30 -max_iterations 1000 Then `.params` file can be prepared as follow: .. code-block:: bash python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \ -n LIG -p LIG --chain=X --conformers-in-one-file --keep-names ligand.conf.sdf Check the `.params` if `PDB_ROTAMERS` line is present at the end with the correct conformers filename: .. code-block:: bash grep PDB_ROTAMERS LIG.params Putting protein, cofactor and ligand together ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ All those components must be in the same `.pdb` file. First, prepare ``HEM.pdb`` from a protein-HEM complex to have the correct coordinates, using ``grep``. Then ``sed`` is used to substitute 22nd character to the letter **B** (`Chain ID` is the 22nd character in the line). .. code-block:: bash grep HETATM protein.pdb | grep HEM | sed 's/./B/22' > HEM.pdb It is important that the chain in the ``HEM.pdb`` is named **B**. This can be also achieved by using your favourite text editor like gedit, vim, nano; to change the chain letter. Add the cofactor file to the protein and minimize the side chains: .. code-block:: bash cat protein_chain_letter.pdb HEM.pdb > protein_HEM.pdb .. code-block:: bash $ROSETTA/main/source/bin/ligand_rpkmin.hdf5.linuxgccrelease \ -database $ROSETTA/main/database \ -in:file:extra_res_fa HEM.params \ -ex1 -ex2 -no_optH false -flip_HNQ \ -docking:ligand:old_estat \ -docking:ligand:soft_rep \ -nstruct 10 \ -s protein_HEM.pdb Then the structure with the lowest energy should be chosen for the next steps with the following command: .. code-block:: bash k=` awk '/^pose / {print FILENAME, $NF}' protein_HEM_*.pdb | sort -nk2 | head -1 | awk '{print $1}'` && cp $k protein_HEM_input.pdb which simply sorts file names by the respective scores and selects the file corresponding to the lowest energy value. The last step, which adds the ligand `.pdb` at the end of the input file, is done by: .. code-block:: bash cat protein_HEM_input.pdb LIG.pdb > protein_HEM_LIG.pdb It is important that output file should be named ``protein_HEM_LIG.pdb`` or the name has to be changed in `dock_protocol.xml` and `options.short` files below. Docking Procedures ~~~~~~~~~~~~~~~~~~~~~ To perform docking the following XML file, named `dock_protocole.xml`, have to be provided to Rosetta: .. code-block:: bash """ TUTAJ POTRZEBA INFORMACJI SKĄD TEN XML - jak to się robi w Rosetta? """ with this flags placed in `option.short` file: .. code-block:: bash -parser:protocol dock_protocol.xml -in:auto_setup_metals -nstruct 5 -s 'protein_HEM_LIG.pdb' -extra_res_fa LIG.params HEM.params -ignore_ligand_chi true -overwrite -flip_HNQ true -no_optH false -ex1 -ex2 assuming that all inputs files are present in the current directory. To run Rosetta just type: .. code-block:: bash $ROSETTA/main/source/bin/rosetta_scripts.default.EXT -database $ROSETTA/main/database @options.short where ``EXT`` depend on your system (most commons are ``linuxgccrelease`` or ``macosclangrelease``). The outputs will be present in the current directory. Results clustering using hierarchical agglomerative algorithm ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Docking calculations are followed by results analysis, which starts from hierarchical clustering of the obtained poses. This step has been done according to :ref:`this protocol` .. rubric:: References .. target-notes:: .. _`Rosetta Ligand Docking tutorial` : https://www.rosettacommons.org/demos/latest/tutorials/ligand_docking/ligand_docking_tutorial/ .. _`Ligand Dock Application` : https://www.rosettacommons.org/manuals/archive/rosetta3.1_user_guide/app_ligand_docking.html .. _`08.01 PyRosetta notebook` : https://github.com/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/08.01-Ligand-Docking-XMLObjects.ipynb .. _`PDB`: http://www.rcsb.org/ .. _`MODELLER`: https://salilab.org/modeller/ .. _`PyMOL`: https://pymol.org/2/ .. _`PubChem` : https://pubchem.ncbi.nlm.nih.gov/ .. _`openbabel`: https://openbabel.org/docs/dev/Command-line_tools/babel.html .. _`bcl` : http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1 .. _`BioShell 3.0` : https://bioshell.readthedocs.io/en/latest/