Frequently Asked Questions
1. What kind of information can I find in Promiscuous 2.0?
To provide comprehensive, easily accessible data that scientists can use for drug repositioning, Promiscuous 2.0 integrates data from a variety of sources into a unified dataset. Drugs and drug-like small-molecule compounds, together with their associated target relations, are obtained from DrugBank, ChEMBL and Superdrug. To complement the drug-target relations, additional information on the targets is extracted from UniProt and displayed. In addition, side-effect data from the SIDER database and indications from the Therapeutic Target Database are integrated to provide the information needed for repositioning drugs.
2. Which filtering steps were applied to the interaction data?
The interaction data obtained from the ChEMBL database for the more than 900,000 small-molecule compounds was additionally filtered using four criteria:
- 1. Standard relation: "="
- 2. Standard unit: "nM", "mM" or "uM" (which were all converted to nM in the final database)
- 3. Confidence score >= 4
- 4. Activity comment does not contain the expressions "inconclusive", "not determined", "not tested", "not active", "inactive", "no effect"
This filtering reduced the number of interactions from 15,207,914 to 2,727,520.
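The four criteria above can be sketched as a single filter function. This is only an illustration, not the pipeline that was actually used: the flattened record layout and the field names (loosely modelled on ChEMBL column names) are assumptions, as is the case-insensitive substring match on the activity comment.

```python
# Expressions that disqualify a record when found in its activity comment.
EXCLUDED_COMMENTS = ("inconclusive", "not determined", "not tested",
                     "not active", "inactive", "no effect")

# Multipliers that convert each accepted unit to nM.
TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6}


def filter_activity(record):
    """Return a copy of the record converted to nM, or None if any criterion fails.

    The record is a hypothetical flat dict; the real data layout may differ.
    """
    # Criterion 1: only exact measurements.
    if record["standard_relation"] != "=":
        return None
    # Criterion 2: only concentration units that can be converted to nM.
    factor = TO_NM.get(record["standard_units"])
    if factor is None:
        return None
    # Criterion 3: confidence score of at least 4.
    if record["confidence_score"] < 4:
        return None
    # Criterion 4: no disqualifying expression in the activity comment.
    comment = (record.get("activity_comment") or "").lower()
    if any(term in comment for term in EXCLUDED_COMMENTS):
        return None
    converted = dict(record)
    converted["standard_value"] = record["standard_value"] * factor
    converted["standard_units"] = "nM"
    return converted
```

A record that fails any single criterion is dropped, so the order of the checks affects only speed, not the result.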
3. What are the different options to search for information on Promiscuous 2.0?
Promiscuous 2.0 aims to be a resource that is easy to use for experts in their field as well as for non-experts. For this purpose, different options for retrieving the compiled data are offered, depending on the user's objective. Experts looking for specific targets or drugs can access information on their area of interest directly by using the drug or target search with parameters such as UniProt and PubChem identifiers or names. For non-experts, or experts who want a broader overview of specific indications or organs, the "Browse" option can be used to search for all targets or drugs associated with specific indications or areas of effect.
4. How do I use the drug search?
The search for specific drugs is based on structural similarity, and there are several ways to submit the drug structure of interest. After you enter a PubChem name and click the "start" button, the corresponding structure is loaded and previewed in the ChemDoodle structure view. Alternatively, you can provide a SMILES string to query Promiscuous 2.0 for similar structures, or import a molecule structure directly from a file. It is also possible to draw a molecule structure using the drawing tools provided. Once a structure has been obtained and displayed by any of these methods, the similarity search can be started via the corresponding button below the structure view.
5. Why is a specific point (for instance "predicted targets") missing in the drug result page?
If the result page for a drug lacks a section that is present for other drugs (such as predicted targets or known side effects), we unfortunately have no information on that point for the drug in question. In that case the whole section is omitted rather than shown with empty tables.
6. How do I use the target search?
To access the information for specific targets, a UniProt identifier is required: an accession number, a UniProt name, or the gene name as specified in UniProt. A query with a UniProt accession number triggers an exact-match search, and the corresponding target is reported. For a gene or UniProt name, in contrast, all UniProt (gene) name synonyms are searched and all matching proteins are reported.
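The two search modes can be illustrated with a short sketch. It is not the server's implementation: the in-memory index, the function name and the choice of case-insensitive exact matching for names are assumptions made for illustration. The accession pattern is the one published in the UniProt documentation.

```python
import re

# Accession number format as documented by UniProt.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Tiny illustrative index; the real database schema is not described here.
TARGETS = [
    {"accession": "P00533", "entry_name": "EGFR_HUMAN",
     "gene_names": ["EGFR", "ERBB", "ERBB1", "HER1"]},
    {"accession": "P04626", "entry_name": "ERBB2_HUMAN",
     "gene_names": ["ERBB2", "HER2", "NEU"]},
]


def search_targets(query, targets=TARGETS):
    """Exact match for accession numbers, synonym search for any other query."""
    query = query.strip()
    if ACCESSION_RE.fullmatch(query.upper()):
        # Accession number: report only the exactly matching target.
        return [t for t in targets if t["accession"] == query.upper()]
    # UniProt or gene name: report every protein with a matching name or synonym.
    needle = query.lower()
    return [
        t for t in targets
        if needle == t["entry_name"].lower()
        or needle in (g.lower() for g in t["gene_names"])
    ]
```

With this index, `search_targets("P00533")` takes the exact-match branch, while `search_targets("her2")` is resolved through the synonym list.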
7. What is the purpose of the Browse tab and how do I use it?
The "Browse" tab is intended as an easily understandable entry point that requires no detailed prior knowledge of drug or target identifiers. It is therefore well suited to non-experts and to anyone who wants a broader overview. Here, drugs or targets can be retrieved based on the indications they are associated with and, in the case of drugs, on their anatomical site of action and their therapeutic, pharmacological and chemical properties (ATC). To start a search, simply select an ICD-10 or ATC category; this retrieves all drugs or targets associated with the selected category. ICD-10 and ATC codes are displayed hierarchically, so it is also possible to select a broader category (e.g. all "Intestinal infectious diseases" instead of a specific disease like "Cholera"). Additionally, you can search the ICD-10 and ATC selection for terms of interest (e.g. Hepatitis).
8. Why is a specific ICD-10 or ATC Code not provided in the browsing options?
Only ICD-10 and ATC codes for which there are known drugs or targets are displayed in the browsing options. If you are interested in a specific ICD-10 or ATC code and cannot find it in the selection, we unfortunately do not have any information on it.
9. How can I use the drug repositioning option?
Because the drug repositioning feature is based on the structural similarity of the query compound to compounds contained in the database, a structure is needed as a starting point. As in the drug search, there are several ways to provide the molecule structure of interest. After you enter a PubChem name and click the "start" button, the corresponding structure is loaded and previewed in the ChemDoodle structure view. Alternatively, you can provide a SMILES string to query Promiscuous 2.0 for similar structures, or import a molecule structure directly from a file. It is also possible to draw a molecule structure using the drawing tools provided. Once a structure has been entered, you can start the search for similar compounds below the structure viewer.
In the subsequent calculation, all similar compounds contained in Promiscuous 2.0 are retrieved and their known targets are analyzed for potential indications. To ensure the novelty of the suggestions, all known targets of the query compound, should it be contained in Promiscuous 2.0, are excluded from further consideration in the search for indications to propose.
Additionally, for structures contained in Promiscuous 2.0, their targets and associated indications are displayed.
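The logic described above can be sketched in a few lines. This is a simplified illustration under several assumptions: fingerprints are represented as sets of on-bit indices, similarity is measured with the Tanimoto coefficient, and the threshold value and all names are hypothetical. The FAQ does not state which similarity measure or cutoff the server uses.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def suggest_indications(query_fp, compounds, target_indications,
                        known_targets=frozenset(), threshold=0.75):
    """Count, per indication, the similar compounds that support it.

    compounds: name -> {"fp": set of bits, "targets": set of target ids}
    target_indications: target id -> list of indication codes
    known_targets: targets already known for the query (excluded for novelty)
    """
    supporters = defaultdict(set)
    for name, entry in compounds.items():
        # Skip compounds that are not similar enough to the query.
        if tanimoto(query_fp, entry["fp"]) < threshold:
            continue
        # Drop targets the query compound is already known to hit.
        for target in entry["targets"] - set(known_targets):
            for indication in target_indications.get(target, ()):
                supporters[indication].add(name)
    return {indication: len(names) for indication, names in supporters.items()}
```

The returned counts are the number of database compounds supporting each suggested indication.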
10. What determines the size of a slice of the result pie chart from the drug repositioning?
The size of a particular pie chart slice in the drug repositioning results is determined by the number of database compounds on which the suggested indication is based. Similarly, for the pie charts that give an overview of all known targets with indications, the number of targets associated with each ICD-10 category determines the size of the slices.
11. What is the basis for the machine learning indication prediction?
The machine learning is based on molecular structures. To predict indications for a given input structure, a machine learning model was developed for each ICD-10 category with sufficient data. For this, the molecular structures of the drugs mapped to the indication of interest were transformed into MACCS molecular fingerprints, which served as input for the machine learning models. Only small-molecule drugs that can be meaningfully represented as molecular structure files were considered in this approach.
12. Which machine learning technique was used and how accurate are the models?
To create the machine learning models, different alternatives were evaluated and validated. To assess performance, randomly chosen portions of the data were used as training and test sets. Additionally, a 10-fold cross-validation was performed for each training set. Among the tested models (which also included logistic regression, k-nearest neighbors, linear discriminant analysis, decision tree and Gaussian naive Bayes), the random forest classifier and support vector machines performed best on average. According to cross-validation and precision/recall/accuracy analysis on the test sets, the different models were on average 84% accurate in their predictions, with a standard deviation of 0.08. Since the difference in performance between random forests and support vector machines was minimal across the disease categories, and to retain consistency, the random forest model was chosen for the final model calculation.
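As a reminder of what 10-fold cross-validation involves, the following sketch splits a training set into ten parts, each of which serves once as the validation fold while the other nine are used for training. It uses only the Python standard library; the function name and the seed are hypothetical, and model training itself is not shown.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so that folds are random but reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Averaging a model's accuracy over the ten validation folds gives the cross-validation estimate for that training set.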
13. How is the machine learning score obtained and what does it represent?
For each model, a random forest consisting of 100 decision trees was used as the machine learning predictor. The score represents the fraction of trees that voted in favor of a specific prediction. Given an input structure, a score of 0.96 for a specific ICD-10 category therefore means that 96 of the 100 trees in the forest evaluated the structure as associated with that category. This should not be interpreted as a 96% chance of activity, however, because every model has an unavoidable inaccuracy (see the previous question for further information).
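The arithmetic behind the score is simple enough to show directly. The function below is a hypothetical illustration that takes one boolean vote per tree; it is not the code used by the server.

```python
def forest_score(votes):
    """Fraction of trees that voted for the category.

    votes: sequence with one boolean per decision tree.
    """
    if not votes:
        raise ValueError("at least one tree vote is required")
    return sum(1 for vote in votes if vote) / len(votes)


# 96 of 100 trees vote in favor of the category.
votes = [True] * 96 + [False] * 4
score = forest_score(votes)
```

Here `score` is the vote fraction for the example from the text: 96 votes out of 100 trees.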
14. Which drugs were used as reference for the Covid-19 model?
- Amodiaquine
- Anakinra
- Azithromycin
- Bromhexine
- Budesonide
- Camostat
- Camostat mesylate
- Captopril
- Chlorpromazine hydrochloride
- Clomipramine
- Cyclosporin A
- Danoprevir
- Dasatinib
- Dexamethasone
- Dexmedetomidine
- Disulfiram
- Doxycycline
- Enalapril
- Epoprostenol
- Estradiol
- Famotidine
- Favipiravir (Avigan)
- Fluoxetine
- Fluvoxamine
- Gemcitabine hydrochloride
- Hydroxychloroquine
- Imatinib
- Imatinib mesylate
- Ivermectin
- Lenalidomide
- Loperamide
- Lopinavir
- Losartan
- Mefloquine
- Metformin
- Methylprednisolone
- Nitazoxanide
- Nitric Oxide
- Oseltamivir (Tamiflu)
- Promethazine hydrochloride
- Remdesivir
- Ritonavir
- Ruxolitinib
- Sirolimus
- Spironolactone
- Tamoxifen
- Teicoplanin
- Terconazole
- Tetrandrine
- Toremifene
- Umifenovir (Arbidol)
- Valsartan
- Vitamin C (Ascorbic Acid)