Transition to v2 in progress, do not use for now.


About

Scientists have to perform multiple experiments producing qualitative and quantitative data to determine whether a compound is able to bind to a given target. Because the potential ligand chemical space is so diverse, experimentally testing a large number of compounds on a target rapidly becomes out of reach. Scientists therefore use virtual screening methods to determine the putative binding mode of ligands on a protein, and then post-process the raw docking results with a dedicated scoring function in relation to experimental data.

Two of the major difficulties for comparing docking predictions with experiments mostly come from the lack of transferability of experimental data and the lack of standardisation in molecule names. Although large portals like PubChem or ChEMBL are available for general purpose, there is no service allowing a formal expert annotation of both experimental data and docking studies. To address these issues, researchers build their own collection of data in flat files, often in spreadsheets, with limited possibilities of extensive annotations or standardisation of ligand descriptions allowing cross-database retrieval.

We designed the dockNmine platform to provide a service allowing expert, authenticated annotation of ligands and targets. First, this portal allows a scientist to incorporate controlled information in the database using reference identifiers for the protein (Uniprot ID) and the ligand (SMILES description), together with the data and the associated publication. Second, it allows the incorporation of docking experiments using forms that automatically parse useful parameters and results. Last, the web interface provides many pre-computed outputs to assess the degree of correlation between docking experiments and experimental data.

General overview

Overview of dockNmine usage, following a project-based logic for data incorporation and analysis. A classical workflow is presented below.



Figure 1. General overview of dockNmine usage

(A) (1) Link to the documentation of each service, (2-7) access to each service independently, (8) contact link, (9) login and registration links. A simple demonstration of the functionalities is available by logging in with the demo account via the log-in glyphicon, or by registering via the briefcase icon.
(B) Once connected, the user can create or join a project where all their data will be assembled (rounded-corner rectangle). They then add the required proteins and ligands (parallelogram), and link them to experimental data and docking results (diamond).

Target

The figure below shows how to add a target into the portal. The example provided is for the Solute carrier family 2, facilitated glucose transporter member 1.



Figure 2. Target management

(A) The screenshot presents a request for the retrieval of data for the Uniprot ID P11166, and of known ligands from ChEMBL.
(B) A condensed view of the targets for the project is provided. Some glyphicons are provided to see the details of the entry (magnifier icon), to get detailed statistics (histogram), to download existing data in csv format (circled arrow), to add a comment (pen), or to add a three-dimensional structure for the target (orange arrow towards a cloud).
(C) Detail of a given entry.
(D) If required, the user can upload one or many structures for the target. As structure files can be processed in virtual screening experiments, only the structure file is mandatory, all other fields being optional.

Ligand

There are two ways to incorporate ligands into the database: (i) by adding one ligand at a time via the dedicated form, or (ii) via the upload form for multiple ligands, which will be grouped into a library.



Figure 3. Ligand management

Addition of a single ligand, or of multiple ligands into a library.
(A) The form allows adding a single ligand from either the PDB, PubChem or ChEMBL. The query using the PDB request is shown here.
(B) After a short period, the details of the added ligand can be accessed. If available, a 2D depiction of the molecule is displayed.
(C) For more extensive data incorporation, the simplest way is to add a library alongside a valid sdf file.
(D) In this case all ligands available in the sdf file are processed, de-duplicated, and added to the library.
(E) If necessary, a small subset of ligands can be arranged in another library.
(F) This new library will be referenced


Docking

Sample docking data were calculated with AutoDock Vina for a small set of ligands referenced in the publication of Siebenecher and co-workers.



Figure 4. Docking management

Vina docking results import for the beta-D-glucose docked in the glucose transporter.
(A) Upon selection of the docking method, a dedicated form allows linking protein, ligand and docking results. Detailed docking parameters must be provided to allow a later comparison of docking profiles between experiments. If required, the plus glyphicon allows adding a target, a ligand or a target structure prior to entering the docking results.
(B) Detail of the docking analysis. This view indicates the principal features of the docking method.
(C) After docking processing, the cluster energy reported by Vina is automatically converted into kJ/mol, LE and SILE, to ease comparison against other experiments or other ligands.
(D) Interactive graph depicting the discrete cluster and associated energy. This graph, which can be easily downloaded as an image, allows a rapid overview of the docking energy dispersion for the ligand.
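
The conversion mentioned in (C) can be sketched as follows. This is a minimal illustration and not the dockNmine implementation: it assumes that Vina reports energies in kcal/mol, that LE is the usual ligand efficiency (negative energy divided by the number of heavy atoms), and that SILE is the size-independent ligand efficiency (negative energy divided by the number of heavy atoms raised to the power 0.3). The function name docking_metrics and the example score are hypothetical.

```python
KCAL_TO_KJ = 4.184  # 1 kcal = 4.184 kJ


def docking_metrics(energy_kcal_mol: float, heavy_atoms: int) -> dict:
    """Derive kJ/mol, LE and SILE from a docking energy in kcal/mol.

    Assumed definitions (not taken from the dockNmine source):
      LE   = -energy / N_heavy
      SILE = -energy / N_heavy ** 0.3
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be a positive integer")
    return {
        "energy_kj_mol": energy_kcal_mol * KCAL_TO_KJ,
        "le": -energy_kcal_mol / heavy_atoms,
        "sile": -energy_kcal_mol / heavy_atoms ** 0.3,
    }


# Hypothetical score for beta-D-glucose (C6H12O6, 12 heavy atoms).
metrics = docking_metrics(-6.0, 12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Normalising by ligand size in this way is what makes scores comparable between small and large ligands; the units actually used for LE and SILE in dockNmine may differ from this sketch.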

Experiments

Experimental binding data from the study of Siebenecher and co-workers were added to the demonstration project.

Figure 5. Experiments management

Experimental data addition for a selection of ligands from the study of Siebenecher and co-workers.
(A) After IC50 is selected from the drop-down menu, a method-specific form is shown to the user. Pre-defined values are provided for pH, temperature, target and ligand concentrations since these fields are seldom used. The user can complete the free text box to indicate the data origin, either from the literature or from private laboratory experiments.
(B) Upon form validation, all experimental data are listed.

Analysis

If enough virtual and experimental data are present, a classification is performed and compared for each category. This can globally be visualized under the StatsDrudesign tab.

Global analysis



Figure 6. Global analysis

Analysis of ligand classification using reference methods. Experimental data were taken from the work of Siebenecher and co-workers; the docking results were computed for this study.
(A) Single ligand analysis for CHEMBL3780153. Both the experimental and docking values classify it as a good ligand.
(B) A more complete analysis of the overall virtual screening makes it possible to evaluate the progress of the ongoing project.

Local analysis

Data per protein or per ligand can be visualized by clicking on the magnifier icon of each entry. In this case, each data value, its corresponding score, and a comparison of all data are provided. These data can then be downloaded in csv format to perform more complex analyses in dedicated software.



Figure 7. Local analysis

Comparison of experimental and virtual data for GLUT proteins and their ligands. Experimental data were taken from Siebenecher and co-workers; the docking results were computed for this study.
(A) Comparison of ligand results for CHEMBL3780153. The docking was performed on all proteins, but experimental values are only available for GLUT1 and GLUT3.
(B) Tabular results and graphical representation of docking results for GLUT2.

Authentication

Authentication and permissions are handled using Django's built-in authentication mechanism. Per-object permission control is ensured by the Guardian module.

To ease permission management, pre-defined roles are available (see Table below). These permissions can be finely tuned by the dockNmine administrator, for each object and/or role.

Role       Project  Target  Ligand  Experimental method  Docking  Library
SuperUser  CRUD     CRUD    CRUD    CRUD                 CRUD     CRUD
Manager    CRU      CRU     CRU     CRU                  CRU      CRU
Member     R        R       CR      CR                   CR       CR
Anonymous  R        R       R       R                    R        R
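
The role table can be restated as a small lookup, which may help when reasoning about what a given role is allowed to do. This is only an illustrative plain-Python sketch: dockNmine itself relies on Django authentication and Guardian, not on this code. It assumes the letters stand for create (C), read (R), update (U) and delete (D), and it reads the Member row as R, R, CR, CR, CR, CR. The names RESOURCES, ROLE_PERMISSIONS and is_allowed are hypothetical.

```python
# Object types, in the column order of the table.
RESOURCES = (
    "project", "target", "ligand",
    "experimental_method", "docking", "library",
)

# Default permissions per role; letters are C(reate), R(ead), U(pdate), D(elete).
ROLE_PERMISSIONS = {
    "SuperUser": dict.fromkeys(RESOURCES, "CRUD"),
    "Manager": dict.fromkeys(RESOURCES, "CRU"),
    "Member": {
        "project": "R", "target": "R", "ligand": "CR",
        "experimental_method": "CR", "docking": "CR", "library": "CR",
    },
    "Anonymous": dict.fromkeys(RESOURCES, "R"),
}


def is_allowed(role: str, resource: str, action: str) -> bool:
    """Return True if the default role table grants `action` on `resource`."""
    if action not in ("C", "R", "U", "D"):
        raise ValueError(f"unknown action: {action!r}")
    return action in ROLE_PERMISSIONS.get(role, {}).get(resource, "")


print(is_allowed("Member", "docking", "C"))    # Members can add docking results
print(is_allowed("Manager", "ligand", "D"))    # Managers cannot delete ligands
```

Unknown roles or object types fall back to no permission at all, which mirrors a deny-by-default policy; per-object tuning by an administrator is outside the scope of this sketch.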

File formats for ligands and proteins

Ligands

The 3D coordinates in sdf format for small molecules (ligands) are downloaded from the upstream source when possible. The sdf format is the reference three-dimensional structure file format for all ligands.

When a ligand is uploaded using the PDB form, its ideal conformation (not the bound one from a crystal structure) is downloaded, which corresponds to "idealized coordinates (generated using Molecular Networks' Corina, and if there are issues, OpenEye's OMEGA)".

When a ligand is uploaded using the PubChem form, the 3D sdf file is downloaded from PubChem; this file was processed using the CACTVS toolkit.

When a ligand is uploaded using the ChEMBL form, its 3D reference conformation is computed from its SMILES description using RDKit and the UFF optimization procedure.
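
A conformer generation of this kind can be sketched with RDKit as shown below. This is not the dockNmine code: the text above only states that RDKit and UFF are used, so the explicit hydrogen addition, the default embedding method, the random seed and the iteration limit are all assumptions, and the function name smiles_to_sdf_record is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_sdf_record(smiles: str, seed: int = 42, max_iters: int = 500) -> str:
    """Build one 3D conformer from a SMILES string and return it as an sdf record."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")

    # Explicit hydrogens are needed for a sensible 3D geometry.
    mol = Chem.AddHs(mol)

    # EmbedMolecule returns the conformer id, or -1 if embedding failed.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed")

    # Relax the embedded geometry with the UFF force field.
    AllChem.UFFOptimizeMolecule(mol, maxIters=max_iters)

    # An sdf record is a mol block followed by the "$$$$" delimiter.
    return Chem.MolToMolBlock(mol) + "$$$$\n"


if __name__ == "__main__":
    print(smiles_to_sdf_record("CCO"))  # ethanol, used here only as a small test case
```

Fixing the random seed makes the generated conformer reproducible between runs, which is convenient when the same reference conformation has to be reused across docking experiments.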

Proteins

For proteins, no automatic download is offered, since receptor preparation requires particular attention in virtual docking experiments, for instance for pKa and charge adaptation (process your protein file online here). The default format for protein file upload is PDB/CIF, but the pdbqt format is accepted for Vina or AutoDock.