.. _rosetta_cyp51_docking:

Docking ligands to P450 enzymes with RosettaScripts
===================================================

This protocol has been adpted from `Rosetta Ligand Docking tutorial`_ and `Ligand Dock Application`_  and utilised during our study on docking various inhibitors to P450 orthologs. Comparative modelling for the twelve CYP51 orthologus  sequences has been completed as described in 
:ref:`comparative modelling with Modeller<comparative_modelling_modeller>` section.
  

The following Rosetta version was employed:

.. code-block:: bash

 INFO:pyrosetta.rosetta:PyRosetta-4 2019
 PyRosetta4.conda.linux.CentOS.python36.Release2019.44+release.5aed75f2e796a33ab71515b6c1daa321eb2294a2
 2019-10-29T08:37:43 retrieved from: http://www.pyrosetta.org

.. note::
  This is the version 2.0 of the docking protocol. The previous version, that has been used in the following paper:

  O.O. Akapo, J.M. Macnar, J.D. Kryś, P.R. Syed, K. Syed and D. Gront,  *In silico structural modeling and analysis of interactions of Tremellomycetes cytochrome P450 monooxygenases CYP51s with substrates and azoles*, submitted to *IJMS*

  is available :ref:`on this page<pyrosetta_cyp51_docking>`
 
.. contents:: :local:

Preparation of docking
~~~~~~~~~~~~~~~~~~~~~~~~~

Protein preparation
^^^^^^^^^^^^^^^^^^^^^^

We used Rosetta `clean-pdb.py` script (distributed with the software package) to prepare a protein chain for further steps:

.. code-block:: bash
    
  python2.7 $ROSETTA/tools/protein_tools/scripts/clean_pdb.py protein.pdb chain_letter

where ``protein.pdb`` is the PDB input file (e.g. `4LXJ.pdb`), ``chain_letter`` is a single-letter ID of the chain (typically 'A') and ``$ROSETTA`` shell variable provides the locations of your Rosetta package installation. As a result you will get a ``protein_chain_letter.pdb`` (e.g. `4LXJ_A.pdb`) and ``protein_chain_letter.fasta`` (e.g. `4LXJ_A.fasta`) files.

Cofactor preparation
^^^^^^^^^^^^^^^^^^^^^^^
To prepare HEM `.params` file you will need a ``HEM.cif`` file (It can be downloaded from `PDB`_ database). It is needed to prepare ``HEM.mol2`` file with correct atom names and hydrogens present in the output. To get this use PyBioShell script ``cif_to_mol2.py`` as described below:

.. code-block:: bash

   python3.9 $BIOSHELL/doc_examples/py-examples/core/data/cif_to_mol2/cif_to_mol2.py HEM.cif >HEM.mol2

``$BIOSHELL`` shell variable provides the locations of your BIOSHELL package installation.

Then you can run following script to finally produce the `.params file.

.. code-block:: bash
  
  python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
        -n HEM -p HEM --chain=B --keep-names HEM.mol2

As a result, you will get a ``HEM_0001.pdb`` and ``HEM.params`` files in which HEM will be present in chain B. The ``HEM_0001.pdb`` will not be used in further steps and should be deleted.


Ligand preparation
^^^^^^^^^^^^^^^^^^^^^

Ligands subjected for docking also needs a `.params` file but this time the conformers will be required as well. To get them
use `bcl`_ following the protocole used in Meiler Lab. The input `ligand.sdf` file can be downloaded
from `PubChem`_, `PDB`_ or prepared using `PyMOL`_ or `openbabel`_ from other type files. An example of using `openbabel`_ to prepare `.sdf` from `.pdb` file is described below (followig the Meiler Lab protocol).

.. code-block:: bash

  obabel -ipdb ligand.pdb -osdf -O ligand.sdf --gen3d

First, check if the molecule has reasonable bonds/geometries/clashing/etc. Ligands that pass are
the *matched* and those that fail are the *unmatched*.

.. code-block:: bash
  
  bcl.exe molecule:ConformerGenerator -rotamer_library cod \
    -top_models 100 -ensemble_filenames ligand.sdf \
    -conformers_single_file ligand.conf.sdf \
    -conformation_comparer 'Dihedral(method=Max)' 30 -max_iterations 1000

Then `.params` file can be prepared as follow:

.. code-block:: bash

  python2.7 $ROSETTA/main/source/scripts/python/public/molfile_to_params.py \
        -n LIG -p LIG --chain=X --conformers-in-one-file --keep-names ligand.conf.sdf

Check the `.params` if `PDB_ROTAMERS` line is present at the end with the correct conformers
filename:

.. code-block:: bash

  grep PDB_ROTAMERS LIG.params

Putting protein, cofactor and ligand together
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All those components must be in the same `.pdb` file. First, prepare ``HEM.pdb`` from a protein-HEM complex to have the correct coordinates, using ``grep``. Then ``sed`` is used to substitute 22nd character to the letter **B** (`Chain ID` is the 22nd character in the line).

.. code-block:: bash

  grep HETATM protein.pdb | grep HEM | sed 's/./B/22' > HEM.pdb

It is important that the chain in the ``HEM.pdb`` is named **B**. This can be also achieved by using your favourite text editor like gedit, vim, nano; to change the chain letter.

Add the cofactor file to the protein and  minimize the side chains:

.. code-block:: bash

  cat protein_chain_letter.pdb HEM.pdb > protein_HEM.pdb

.. code-block:: bash

  $ROSETTA/main/source/bin/ligand_rpkmin.hdf5.linuxgccrelease \
      -database $ROSETTA/main/database \
        -in:file:extra_res_fa HEM.params \
        -ex1 -ex2 -no_optH false -flip_HNQ \
        -docking:ligand:old_estat \
        -docking:ligand:soft_rep \
        -nstruct 10 \
        -s protein_HEM.pdb

Then the structure with the lowest energy should be chosen for the next steps with the following command:

.. code-block:: bash

  k=` awk '/^pose / {print FILENAME, $NF}' protein_HEM_*.pdb | sort -nk2 | head -1 | awk '{print $1}'` && cp $k protein_HEM_input.pdb

which simply sorts file names by the respective scores and selects the file corresponding to the lowest energy value. The last step, which adds the ligand `.pdb` at the end of the input file, is done by:

.. code-block:: bash

  cat protein_HEM_input.pdb LIG.pdb > protein_HEM_LIG.pdb

It is important that output file should be named ``protein_HEM_LIG.pdb`` or the name has to be changed in `dock_protocol.xml` and `options.short` files below.


Docking Procedures
~~~~~~~~~~~~~~~~~~~~~

To perform docking the following XML file, named `dock_protocole.xml`, have to be provided to Rosetta:

.. code-block:: bash
    
    """
    TUTAJ POTRZEBA INFORMACJI SKĄD TEN XML - jak to się robi w Rosetta?
    """

    <ROSETTASCRIPTS>

        <SCOREFXNS>

            <ScoreFunction name="ligand_soft_rep" weights="ligand_soft_rep">
    
                <Reweight scoretype="fa_elec" weight="0.42"/>
    
                <Reweight scoretype="hbond_bb_sc" weight="1.3"/>
    
                <Reweight scoretype="hbond_sc" weight="1.3"/>
    
                <Reweight scoretype="rama" weight="0.2"/>
    
            </ScoreFunction>
    
    
            <ScoreFunction name="hard_rep" weights="ligand">
    
                <Reweight scoretype="fa_intra_rep" weight="0.004"/>
    
                <Reweight scoretype="fa_elec" weight="0.42"/>
    
                <Reweight scoretype="hbond_bb_sc" weight="1.3"/>
    
                <Reweight scoretype="hbond_sc" weight="1.3"/>
    
                <Reweight scoretype="rama" weight="0.2"/>
    
            </ScoreFunction>
    
        </SCOREFXNS>
    
        <LIGAND_AREAS>
    
            <LigandArea name="docking_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true" minimize_ligand="10"/>
    
            <LigandArea name="final_sidechain" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="true"/>
    
            <LigandArea name="final_backbone" chain="X" cutoff="7.0" add_nbr_radius="false" all_atom_mode="true" Calpha_restraints="0.3"/>
    
        </LIGAND_AREAS>
    
    
        <INTERFACE_BUILDERS>
    
            <InterfaceBuilder name="side_chain_for_docking" ligand_areas="docking_sidechain"/>
    
            <InterfaceBuilder name="side_chain_for_final" ligand_areas="final_sidechain"/>
    
            <InterfaceBuilder name="backbone" ligand_areas="final_backbone" extension_window="3"/>
    
        </INTERFACE_BUILDERS>
        <MOVEMAP_BUILDERS>
    
            <MoveMapBuilder name="docking" sc_interface="side_chain_for_docking" minimize_water="true"/>
    
            <MoveMapBuilder name="final" sc_interface="side_chain_for_final" bb_interface="backbone" minimize_water="true"/>
    
        </MOVEMAP_BUILDERS>
    
    
        <SCORINGGRIDS ligand_chain="X" width="30.0">
    
            <ClassicGrid grid_name="vdw" weight="1.0"/>
    
        </SCORINGGRIDS>
    
    
        <MOVERS>
    
            <StartFrom name="start_from" chain="X">
    
                <Coordinates x="27.739"  y="11.822"  z="18.856" pdb_tag=""/>
                <PDB filename="protein_HEM_LIG.pdb"/>
            </StartFrom>
    
            <Transform name="transform" chain="X" box_size="10.0" move_distance="0.1" angle="5.0" cycles="500" repeats="1" temperature="5" initial_perturb="3.0" />
    
            <HighResDocker name="high_res_docker" cycles="6" repack_every_Nth="3" scorefxn="ligand_soft_rep" movemap_builder="docking"/>
    
            <FinalMinimizer name="final" scorefxn="hard_rep" movemap_builder="final"/>
    
            <InterfaceScoreCalculator name="add_scores" chains="X" scorefxn="hard_rep" compute_grid_scores="0"/>
    
    
            <ParsedProtocol name="low_res_dock">
    
                <Add mover_name="start_from"/>
    
                <Add mover_name="transform"/>
    
            </ParsedProtocol>
    
    
            <ParsedProtocol name="high_res_dock">
    
                <Add mover_name="high_res_docker"/>
    
                <Add mover_name="final"/>
    
            </ParsedProtocol>
    
    
            <ParsedProtocol name="reporting">
    
                <Add mover_name="add_scores"/>
    
            </ParsedProtocol>
    
        </MOVERS>
    
    
        <PROTOCOLS>
    
            <Add mover_name="low_res_dock"/>
    
            <Add mover_name="high_res_dock"/>
    
            <Add mover_name="reporting"/>
    
        </PROTOCOLS>
    </ROSETTASCRIPTS>
             
with this flags placed in `option.short` file:

.. code-block:: bash

    -parser:protocol dock_protocol.xml
    -in:auto_setup_metals
    -nstruct 5
    -s 'protein_HEM_LIG.pdb'
    -extra_res_fa LIG.params HEM.params
    -ignore_ligand_chi true
    -overwrite
    -flip_HNQ true
    -no_optH false
    -ex1
    -ex2
    
assuming that all inputs files are present in the current directory.
To run Rosetta just type:

.. code-block:: bash
    
    $ROSETTA/main/source/bin/rosetta_scripts.default.EXT  -database $ROSETTA/main/database @options.short

where ``EXT`` depend on your system (most commons are ``linuxgccrelease`` or ``macosclangrelease``). The outputs will be present in the current directory.

Results clustering using hierarchical agglomerative algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Docking calculations are followed by results analysis, which starts from hierarchical clustering of  the obtained poses. This step has been done according to :ref:`this protocol<ligand_clustering>`


.. rubric::
  References

.. target-notes::

.. _`Rosetta Ligand Docking tutorial` : https://www.rosettacommons.org/demos/latest/tutorials/ligand_docking/ligand_docking_tutorial/

.. _`Ligand Dock Application` : https://www.rosettacommons.org/manuals/archive/rosetta3.1_user_guide/app_ligand_docking.html

.. _`08.01 PyRosetta notebook` : https://github.com/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/08.01-Ligand-Docking-XMLObjects.ipynb

.. _`PDB`: http://www.rcsb.org/

.. _`MODELLER`: https://salilab.org/modeller/

.. _`PyMOL`: https://pymol.org/2/

.. _`PubChem` : https://pubchem.ncbi.nlm.nih.gov/

.. _`openbabel`: https://openbabel.org/docs/dev/Command-line_tools/babel.html

.. _`bcl` : http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1

.. _`BioShell 3.0` : https://bioshell.readthedocs.io/en/latest/