Abstract
Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. Here, we analyze in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Among these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. These associations provide a comprehensive resource to support drug development and human biology studies.
Toxicity is one of the major causes of termination, withdrawal, or labeling of a drug candidate or drug, other than lack of efficacy 1–3. There is an urgent need to better identify toxic on- and off-target effects on vital organ systems especially for cardiovascular, renal, hepatic and central nervous system (CNS)-related toxicities; furthermore, there is a desire to reduce cost and labor in preclinical assays and drug testing on non-human species 4–6. In vitro pharmacological assays have been widely used to screen for possible off-targets and potential adverse effects and eliminate compounds that are not safe enough in the drug development stage as early as possible 5,7. However, systematic prediction of compound safety and potential adverse events associated with a compound is still a challenge for the pharmaceutical industry.
Machine learning can be very insightful for many different stages of drug discovery and development, such as automation in pharmacology assays, clinical trials, and basic science research. Previous studies have focused on predicting structure-function relationships based on chemical structure of small molecules and potency assays that probe the physicochemical properties of compounds to estimate associations with off-targets 8. However, the diversity of structures that interact with targets, even when they are well described like human Ether-a-go-go-related gene (hERG), make it challenging to produce reliable models 9. Several papers provide small, hand-curated databases providing up to 70 pharmacological targets (i.e. receptors, ion channels, transporters, etc.) with established links to adverse side effects based on a scientific literature search 5,7,10,11. Mirams et al. recently described how integration of data from multiple ion channels (e.g. hERG, sodium, L-type calcium) provided improved in silico prediction of torsadogenic risk 12. Chen et al. proposed a machine learning approach to predict adverse drug reaction (ADR) outcomes for given patient characteristics and drug usage 13. Another study highlights the importance of predicting the likelihood of clinical trial side effects using human genetic studies of drug-targeted proteins 14. From a pharmacogenomics perspective, predicting drug-target interactions using pharmacological similarities of drugs and the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS 15) can be beneficial for drug repositioning and repurposing 16.
FAERS is a voluntary, post-marketing pharmacovigilance tool that can be used to monitor the clinical performance of drugs. In this study, we explore an alternative use of FAERS data to predict compound safety using Medical Dictionary for Regulatory Activities (MedDRA® 17) terms, which we envision to be useful for future preclinical studies. Our machine learning approach is different from the aforementioned approaches because we not only predict adverse drug reaction occurrences of drugs but most importantly also extract biologically meaningful target-ADR links. Using an in vitro secondary pharmacology dataset of more than 2,000 marketed or withdrawn drugs (see Methods), we built a random forest model to predict drug-ADR and target-ADR associations. We validate drug-ADR predictions through systematic Side Effect Resource (SIDER) drug label analysis and 221 target-ADR predictions through systematic literature co-occurrence analysis. Furthermore, we find canonical target-ADR associations, such as hERG binding causing cardiac arrhythmias. We also encountered unexpected associations which warrant further investigations, such as a link between Phosphodiesterase 3 (PDE3) and several ADRs, including congenital renal and urinary tract disorders. We conclude our study with potential targets that are associated with cardiovascular and renal ADRs to demonstrate the utility and possible impact of this method in drug development and preclinical safety sciences by enabling prediction of ADRs from in vitro pharmacological profiles.
Results
Systematic in vitro pharmacology of marketed and withdrawn drugs
To link gene targets to ADR occurrence, we utilized in vitro pharmacology assay data for 2134 marketed or withdrawn drugs, generated by Novartis, and ADR reports from FAERS (Figure 1A, Supplementary Table 1). Withdrawn drugs and their assay data are also included due to the fact that they are associated with a plethora of ADRs, and thereby constitute an important resource for our predictive approach. Figure 1B summarizes the top 50% of frequently occurring primary indications, classified by the Anatomical Therapeutic Chemical (ATC) codes, of the 2134 drugs using a word cloud. The categories that have the highest number of compounds are antibacterial, ophthalmological, and antineoplastic drugs. The in vitro pharmacology assay data includes AC50 values for each drug at up to 218 different assays for 184 gene targets (see Supplementary Table 2 for a list of target assays). There are 6 classes of these 184 gene targets, with the majority (47%) of targets falling into G protein-coupled receptors (GPCRs) (Figure 1C), which is a dominant, widely studied drug target family, broadly represented by marketed drugs 18. Figure 1D is a heatmap visualization of the in vitro pharmacology assay data, where each row is a drug, grouped by their ATC anatomical main group terms 19; each column is a target assay, grouped by target class; and each value is the AC50 of drugs for target assays. Even though 70% of drug-assay combinations have not been tested, i.e. these combinations have NA value for AC50, our data indicate relatively uniform assaying with respect to the different drug classes.
Analysis of adverse event reports from FAERS connects drugs with human ADRs
We queried FAERS 15 using openFDA 20 for 2134 marketed or withdrawn drugs in October 2018 (FAERS Q4_2018 version; covering all reports from January 2004 to October 2018) and retrieved 671,143 adverse event reports using our data extraction criteria (Figure 2A). We only included reports which were submitted by physicians and were annotated as the primary suspect drug 21. There are 464 drugs that did not have a matching name in FAERS, 341 drugs that did not have any adverse event reports, and 1329 drugs with at least 1 adverse event report. We developed a significance test based on a binomial null distribution and false discovery rate (FDR) multiple testing correction to determine if the observed ADR occurrence was significantly high to be classified as an association (or alternatively no association) between ADR and drug (see Methods for detail). The resulting drug-ADR associations corresponded strongly (odds ratio = 11, χ2-test, p-value < 10−16) with those identified with ERAM (Empirical-Bayes Regression-adjusted Arithmetic Mean), an established Bayesian method based on the proportional reporting ratio adjusted for covariates (age group, sex and reporting year) and concomitant drugs 22–25. Overall, we observe a positive trend between the number of adverse event reports and the number of ADR associations (Figure 2B). Antineoplastic and immunomodulatory drugs (Figure 2B, blue, N=155) have many ADR associations while the extent of ADR association for antihypertensive drugs (Figure 2B, red, N=35) varies more widely. As an example, we visualized our drug-ADR associations (Figure 2C), in which ADRs are grouped by MedDRA System Organ Class (SOC) level terms and drugs are grouped by ATC anatomical main group terms 19, revealing that ADRs are widespread across organs caused by antineoplastic and immunomodulating agents (Figure 2C, label L), as well as nervous system drugs (Figure 2C, label N).
Random forest model learns relationship between in vitro pharmacology and reported ADRs in humans
We deployed a machine learning approach to predict ADRs for a given drug from their in vitro secondary pharmacology profiles (Figure 3A). We consider this a multi-label classification problem because a given drug can cause multiple ADRs based on its possible engagement with multiple targets and because a single target may be associated with multiple ADRs. We discretized and one-hot encoded our “input” in vitro pharmacology assay data (AC50 values) into 3 classes: highly active (AC50 < 3 μM), active (3 μM ≤ AC50 ≤ 30 μM) and inactive (AC50 > 30 μM), which reflect commonly used ranges in the field 4. In total, 413 features (assay information) were used to predict 321 High Level Group Term (HLGT) ADRs or 26 System Organ Class (SOC) ADRs for each drug. The observed drug-ADR associations from FAERS, as described above, constitute the dependent variable that the model is learning. We constructed a unifying binary relevance random forest model, which consists of 321 random forest HLGT ADR models. The models were first trained and tested, using 5-fold cross validation where each fold is selected sequentially (Figure 3B). We used 1329 drugs for model construction because these drugs had at least 1 adverse event report in FAERS Q4_2018. The remaining 805 drugs, which did not have any ADR reports, were excluded from training and cross-validation. We confirmed that the distribution of drug classes in our training set (1329 drugs) is comparable to the distribution of drug classes in each 5-fold cross validation split (1063 drugs for training and 226 drugs for testing; Supplementary Figure 1A, χ2-test: p-value > 0.99). The model predictions are in probability format, which is used later for target-ADR predictions, and in boolean format (Figure 3A), to enable assessment of model performance via accuracy; macro-precision; macro-recall; Matthew’s correlation coefficient (MCC), a performance measure that takes class imbalance into account; and the area under the receiver operating characteristic curve (macro-AUROC) (Figure 3B). The unifying random forest model performance of SOC ADRs and HLGT ADRs using the full training set (1329 drugs) and the 5-fold cross validation sets (266 drugs, averaged) are depicted in Figure 3B. Accuracy ranges from 0.82 to 0.98, macro-precision ranges from 0.5 to 0.85, macro-recall ranges from 0.29 to 0.74, MCC ranges from 0.37 to 0.83, and macro-AUROC ranges from 0.80 to 0.96. Compared to SOC level (21 ADR terms), the finer grain HLGT level (321 ADR terms) had proportionally fewer drug-ADR associations; additionally, the performance of the HLGT and SOC models are comparable. We therefore proceeded with the HLGT level models for further investigation.
For 55 of the 321 HLGT ADRs, the corresponding random forest models simply predicted zero for all drugs as mostly none (and at most 4) of the 1329 drugs with adverse event reports were associated with those ADRs (Supplementary Table 3). Since these models were not predictive, we did not consider them for further analyses. For the remaining 266 ADRs, we could determine performance metrics (Figure 3C). Accuracy and precision were high, ranging between 0.9 and 1, whilst the recall and MCC range more widely (Figure 3C). This variability occurs for ADRs that have only a few drugs associated with them (Figure 3D). As the number of associated drugs increases, the models learn to better distinguish true positives from false negatives, subsequently leading to an increase in recall and MCC values (Figure 3D).
Predictive power of the random forest model for multiple FAERS reporting time periods
To test if our random forest model framework is sensitively dependent on the FAERS reporting period, we constructed new random forest models and performed 5-fold cross validations for both SOC and HLGT levels using FAERS data from 2 different time points: Q4_2014 (including all reports from January 2004 to December 2014) and Q2_2019 (including all reports from January 2004 to June 2019). For proper comparison, the model constructions and cross validations were identical to our above described “main” model based on FAERS Q4_2018. Overall, the performance metrics (accuracy, MCC, macro-precision, macro-recall, macro-AUROC) of both SOC and HLGT level models are comparable between Q4_2014, Q4_2018 and Q2_2019 (Supplementary Table 4). This analysis demonstrates that our random forest modeling framework has a comparable predictive power despite changes in the FAERS reporting time period; therefore, it is not sensitive to different versions of FAERS.
Chronological validation of predicted drug-ADR associations
To validate the predictive power of our random forest modeling framework further, we performed a chronological validation analysis, through identification of initial false predictions (false positives and false negatives) from the random forest model trained on FAERS Q4_2014 which was then validated using a dataset from the subsequent time period, 2015-2019. The random forest model trained on Q4_2014 data has 421 (0.1% of a total of N=433,671 model predictions) false positive drug - ADR associations, i.e. based on a drug’s pharmacology profile, the model predicted a probability > 0.5 (Figure 3A) for an ADR even though there was no association observed from the adverse event reports up until 2014. However, when compared to the observed Q2_2019 FAERS data, which also include adverse event reports from the time period 2014-2019, 3.1% (13) of the false positives turned into observed drug-ADR associations (true positive), which is 4.4-fold more than expected by chance (χ2-test: p-value = 2×10−5). Similarly, the Q4_2014 random forest model made 8519 false negative predictions, of which 2.2% (184), 40-fold more than expected by chance (χ2-test: p-value < 10−16), turned into true negative predictions when compared to the Q2_2019 observed drug-ADR associations. This analysis indicates that significant proportions of our model predictions on drug-ADR associations that were initially “false predictions” became “true predictions” through accumulation of new adverse events reports over time.
Random forest model predicts expected ADR profiles for anti-hypertensive drugs
As another demonstration of model validation, we analyzed the ADR profiles of 6 subclasses of antihypertensive drugs: adrenergic alpha, adrenergic beta, ACE inhibitors, angiotensin AT2 inhibitors, calcium channel blockers and diuretics (Supplementary Table 5). The signature of the anti-hypertensive drug subclass represents a set of ADRs that were common to all drugs in this subclass. Each antihypertensive drug subclass has a unique ADR fingerprint in the Q4_2018 FAERS version which was closely predicted by our random forest model (Figure 3E). The accuracy ranged from 0.984 to 1, with perfect specificity and precision (Supplementary Table 6). The sensitivity ranged from 0.882 to 1, except for the diuretics sub-class, which had a sensitivity of 0.167. This may be because diuretics target the kidney, and not the cardiovascular system as the rest of the anti-hypertensive drugs do. Of note, the adrenergic alpha and adrenergic beta receptor subclasses maintain distinct profiles in the predicted data. Specifically, the model correctly predicts that adrenergic alpha receptor drugs are associated with suicidal and self injurious behaviors, which has been reported in the literature 26,27.
Random forest model validation through comparison with drug label ADRs
To demonstrate the predictive power of our random forest model on a test set of drugs that were not used for model construction, we utilized the model to predict drug-ADR associations for 805 drugs that did not have any reported ADRs in the FAERS Q4_2018 version, either because there was no match with the drug name or there were no ADR reports for that drug submitted to FAERS by October 2018. For validation, we queried the Side Effect Resource (SIDER) database 28, which contains drug-ADR pairs extracted from FDA drug labels by text mining 28. For these 805 drugs, we obtained 95 drug matches, which were further reduced to 75 drugs that did not share active ingredients with drugs in the training set. Overall, 57% of positive drug-ADR pairs (i.e. drugs where the model predicts ADRs) were reported in SIDER, compared to 9% of negative pairs (N = 24075; χ2-test: p-value < 10−16; Supplementary Table 7). For instance, methysergide, a 5-HT receptor antagonist used to treat migraine and cluster headaches, has predicted ADRs from 6 HLGT categories, all of which are supported by specific ADRs from SIDER (Figure 3F). “Cardiovascular disorders with murmurs” appears in the Warnings and Precautions section of the label. Other adverse events under gastrointestinal symptoms and CNS symptoms from SIDER were confirmed in the Adverse Events section. Oxprenolol, a lipophilic beta blocker used for treating angina pectoris, abnormal heart rhythms and high blood pressure, has predicted ADRs from 3 HLGT categories. The specific SIDER ADRs of bradycardia, dizziness and asthenia were also confirmed in the label from the Electronic Medicines Compendium (https://www.medicines.org.uk/emc/product/3235; accessed 09/11/2019). Overall, our random forest model proves to be a powerful tool to predict both on- and off-target related drug-ADR associations from in vitro pharmacological drug profiles.
Random forest model predicts 221 target-ADR associations
To predict which target genes are associated with which ADRs, we utilized the Gini importance score to rank features for their importance in random forest models for each ADR (Figure 4A). For a given ADR, we selected assays that had multiple AC50 features represented in the top 5% of Gini scores ranking (see Methods for detail). We then generated ADR probability predictions for an in silico compound that is assumed to target only the selected assay with an AC50 value corresponding to a represented feature. We also assumed no available data for all other assays. Using this in silico AC50 profile as an input to the ADR model, we could generate the ADR probability. By assessing differences in ADR probabilities (two sample t-test, FDR corrected p-value < 0.1) between different AC50 classes, e.g. highly active (0-3 μM) vs inactive (>30 μM), we predict positive or negative correlations, collectively termed associations, between the selected target assay and ADR. Unsurprisingly, some ADRs did not generate any target associations.
To find biologically meaningful associations, we first filtered out HLGT terms belonging to SOC classes that are not specific to human body parts or only procedural or intervention related (see Methods for detail). Secondly, we filtered out terms that fall under the SOC class neoplasms, since genes are often severely misregulated in cancers and therefore not representative of neoplasm-related ADRs in the organ where the tumor resides. After filtering, we found 221 statistically significant target-ADR associations (Figure 4B, full details including p-values in Supplementary Table 8); 51 out of 184 target assays and 132 out of 321 ADRs are represented (Figure 4B). The assay class distribution of these 51 targets, represented among the 221 predicted target-ADR associations, is similar to the class distribution of all target assays (Supplementary Figure 1B, χ2-test: p-value = 0.09). This demonstrates that our algorithm does not bias towards certain target classes. In the following sections, we investigate these 221 target-ADR associations in more detail.
Systematic literature validation of target-ADR associations
To validate our ADR-target predictions, we performed a systematic literature co-occurrence analysis. First, we mapped all genes corresponding to the assays and HLGT level ADRs to their respective MeSH terms (Supplementary Table 9). Next, we queried PubMed for the publication identifiers linked to these MeSH terms and determined the number of publications that corresponded to both a gene and HLGT term (i.e. co-occurrence). We found at least one co-occurrence publication for 66% (145) of 219 predicted unique gene-HLGT MeSH pairs, which was higher (Fisher Exact test: odds ratio=1.8, p-value=6×10−5) than for all possible negative unique gene-HLGT pairs (N=26705). In order to control for the fact that some ADRs and genes are studied more intensively than others, we also compared our set of positive predictions to a negative control set (N=4890) formed by permuted pairs from the positive set and obtained similar results (Fisher Exact test: odds ratio=1.5, p-value=3×10−3). Furthermore, as quantified by the co-occurrence “lift” over the reporting probability when assuming independence, (see Methods for details), we found 4-fold higher co-occurrence median lift values for our predictions compared to all negative pairs (Mann Whitney U-test: p-value=2×10−5), and 3-fold higher lift than permuted negative pairs (Mann Whitney U-test: p-value=3×10−4). We conclude that our target-ADR identification method provides association predictions that are supported by the literature in higher proportion than random selection of target-ADR pairs.
Evidence for targets that are predicted to cause cardiovascular-related ADRs
To further validate our model’s ability to predict target-ADR associations, we investigated a group of cardiovascular ADRs. We found that hERG binding was associated with cardiac arrhythmias and heart failure (Table 1). hERG encodes for a subunit of the cardiac potassium ion channel and contributes to cardiac electrical activity, which is necessary to regulate the heartbeat. The mechanism of action for drug-induced arrhythmias by blocking hERG has been described in numerous human 29 and animal studies 30, as well as structural modeling 31 studies (Table 1). Consistently, our systematic PubMed queries found 753 co-occurrence publications in support of this predicted association and 6 co-occurrences for hERG binding increasing the risk of heart failure. We did not find an ADR probability associated within the range of 0-3 μM AC50 of hERG binding, likely because such strong binding to hERG is a common reason for deprioritizing drug candidates in development 32.
The model predictions also suggest that PDE3 inhibition is associated with cardiac valve disorders (Table 1, 3 co-occurrence publications). PDE3 inhibition is used clinically to treat dilated cardiomyopathy 33, which encapsulates valvular heart disorder. However, the PDE3 therapeutic window is narrow, partially due to complex signaling networks 34, and careful dosing is required to avoid increased mortality in response to treatment.
Furthermore, our model predicts that adenosine transporter (AdT) inhibition increases the risk of pericardial disorders (Table 1). For this scenario, we did not find direct supporting evidence in the literature, however there is evidence that disturbed adenosine homeostasis in pathological cardiac conditions could result in pericardial effusion or pericarditis 35.
The model suggests that glucocorticoid receptor (GR) binding is more likely to lead to myocardial disorders if the drug has high affinity for GR (Table 1, 8 co-occurrence publications). This is supported by the finding that glucocorticoid treatment of patients with rheumatoid arthritis increased the risk of myocardial infarction 36. Furthermore, it is known that dysregulation of glucocorticoids can give rise to cardiotoxicity 37.
Taken together, this investigation of genes associated with cardiovascular ADRs confirms the well-known association of hERG with cardiac arrhythmia, and also highlights ADR associations that would merit further experimental investigation.
COX-2, PDE3, and hERG associations with kidney related ADRs
Another important class of ADRs involve the kidney (Figure 4B, label: renal). We found COX-2 associated with nephropathies (Table 2), which has been well recognized (398 co-occurrence publications) and evidenced previously 38–40. Interestingly, another model prediction is PDE3 sensitivity correlating with congenital renal and urinary tract disorders (Table 2). According to a mouse model study 41, PDE3 inhibition could be a contributing factor in Polycystic Kidney Disease (PKD), as PDE3 protein levels are downregulated in PKD compared to healthy control kidneys. Lastly, we found an unexpected association between hERG and renal disorders (excluding nephropathy) (Table 2). One study has found a loss of hERG function in renal cell carcinoma 42. In humans, hERG expression in the kidney is much lower than in the heart 43. Therefore, we conclude that a link between hERG and renal disorders remains a prediction that warrants further investigation.
PDE3 and nuclear hormone receptors AR, ERa, and PR are overrepresented in ADR associations
To investigate if the number of different drugs tested for a target assay is predictive to the number of ADRs associated with that target (Figure 4C), we calculated their Spearman correlation coefficient and found a moderate correlation (ρ=0.5; Figure 4C). However, some targets had considerably more associated ADRs than other targets that were tested a similar number of times, indicating that more frequently performed assays do not necessarily result in a higher number of associated ADRs (Figure 4C). Out of all target assays, PDE3 was associated with the most ADRs (Figure 4C), falling in a wide range of SOC classes (Figure 4B, Supplementary Table 8). Furthermore, nuclear hormone receptors for androgen (AR), estrogen (ERa) and progesterone (PR) binding assays also have disproportionately many ADR associations, compared to their frequency of testing (Figure 4C). As expected, AR (7/14 ADRs), ERa (9/10 ADRs) but not PR (0/17 ADRs) are associated with sexual reproductive organ- and pregnancy-related ADRs (Figure 4B, Supplementary Table 8). Androgen is produced in the adrenal gland 44 and we predict a link between AR with adrenal gland disorders, with evidence in mouse studies 45. Interestingly, the model predicted 6 ocular ADRs associated to PR, including vision disorders, anterior eye structural change (deposit and degeneration), infections, irritations and inflammations and structural changes (Figure 4B, Supplementary Table 8), for which we could find supporting evidence 46.
GABAA receptor associations with psychoactive ADRs
GABAA receptor is the primary target of benzodiazepines (BZD), a drug class known to be psychoactive with potential of addiction 47. Consistently, our model predicts that this ligand-gated chloride ion channel assay is associated with 14 ADRs, 13 of which are neurologically and psychiatrically related, including disturbances in thinking and perception, sleep disorders, depression and suicidal behaviors (Figure 4B, Supplementary Table 8).
Bile salt export pump BSEP associations with ADRs in various organs
BSEP, encoded by ABCB11 and a member of the superfamily of ATP-binding cassette (ABC) transporters, is most highly expressed in the liver 43. Drugs that target BSEP are often associated with hepatotoxicity 48. However, initially, we did not find a BSEP association with hepatic and hepatobiliary disorders. To investigate this false negative prediction, we noted that the dynamic range of the BSEP assay specifically extends up to 300 μM because the first pass effect for orally delivered drugs results in high concentrations in the liver 49; as a result, most of our data falls into the ‘inactive’ (>30 uM class). Consistently, the BSEP inactive feature had the highest Gini score for this HLGT term, while its two active features had much lower Gini scores, falling outside of the top 5%. To take the extended dynamic range into account, we altered the BSEP assay class boundaries to 0-30 μM, 30-300 μM and >300 μM and retrained the random forest model. In this case, we did find BSEP associated with hepatic and hepatobiliary disorders (Table 3, 354 publication co-occurrences), according to our association criteria (Figure 4A). We repeated this procedure whilst replacing the first class boundary (30 μM) with 100 μM and found the same association again, indicating the robustness of our results. Interestingly, with our original AC50 discretizations (Figure 1D), we found BSEP associated with 7 other ADRs from various organ classes (Table 3), much more than other targets that were assayed at a similar frequency (Figure 4C). This suggests that compounds potent against BSEP (AC50 < 30 μM) could cause adverse effects in addition to hepatotoxicity, which already occurs at lower potency. We found BSEP associated with urolithiasis and with disorders of the thyroid gland, upper respiratory tract disorders (excl infections), lipid metabolism and central nervous system (Table 3). Since BSEP expression is much lower in these organs 43, we searched the literature for evidence including a substrate of BSEP, bile acid. We could find previous studies linking bile acid to these disorders (Table 3), which suggests an indirect relation between BSEP and these ADRs through bile acid metabolism. Lastly, we found BSEP associated with foetal complications and pregnancy conditions (Table 3), both supported through prior studies that link BSEP with transient neonatal cholestasis and intrahepatic cholestasis of pregnancy, respectively 50,51.
Discussion
In this study we have taken a machine learning approach to predict human ADRs from the in vitro secondary pharmacology profiles of a large number of marketed and withdrawn drugs. Several prior studies focus on predicting ADRs directly from chemical drug structure 52,53. However, utilizing functional information such as in vitro pharmacological targeting of common (off) targets represents a viable alternative to bridge the complex relationship between drugs and their effects in the human body 4.
Our random forest model performance metrics are good considering the sparse coverage (2134 drugs) over a large input space (3184 possibilities) and partial overlap with ADR reporting for these drugs, making ADR occurrence prediction effectively a one shot learning task. Importantly, our model performances were strong enough to discover drug-ADR and biologically meaningful target-ADR associations. To determine the target-ADR associations, we utilized all available input data for model training and made use of Gini scores to robustly select relevant features for ADR probability predictions. We rigorously validated our model predictions with multiple independent analyses (e.g. chronological validation on drug-ADR associations and systematic literature validation on target-ADR associations). Our novel method for target-ADR associations was able to recapitulate well recognized causal relations, such as hERG with cardiac arrhythmias. For others, we were able to find literature evidence in animal or in vitro studies but our study is, to our knowledge, a first in human report. Another fraction of target-ADR associations represents predictions of novel, unexpected or little known associations, such as Adenosine Transporter (AdT) and pericardial disorders, for which we could find little evidence other than our analysis of adverse event reports. Similar to genome-wide association studies, our quantitative methodology extracts statistically significant relations from human population data. With this framework in mind, our 221 associations form a rich resource that can be used for further mechanistic studies in the drug discovery process.
Our random forest model is agnostic to molecular mechanisms; therefore, resulting associations could arise from indirect regulation. A likely example is the bile transporter BSEP, which is associated with numerous ADRs, although it is most highly expressed in the liver and kidney. We have related our findings to evidence that misregulation of its substrate, bile acid, could result in disorders related to kidney stones, lipid metabolism, thyroid gland, respiratory system, and central nervous system. This also indicates the strength of our approach, which can relate genes to physiological processes unbiasedly in humans, without any interventions or large scale genome-wide association studies, but solely with voluntary adverse event reporting.
Some of the predicted target-ADR associations could be hard to validate, such as the PDE3 enzyme association with congenital renal disorders association. While the association is valid, the modality has to be clarified: PDE3 inhibitors are proposed to ameliorate certain forms of chronic kidney disease 56, instead of causing it. Thus, predictions of congenital disorders should be considered but confirmed by checking the modality of the effects.
While we recommend this approach to find target-ADR associations to impact safety awareness in drug discovery, we are also aware of the limitations. Firstly, targets in the in vitro pharmacology panel cover a fraction of the biological target space and not all drugs were tested in all assays. We recognize that 47% of all targets belong to the GPCR target family with limited representation of other therapeutic or ADR-associated targets such as ion channels and kinases. However, our model predictions are not biased towards the GPCR target family; the target classes of 51 targets in 221 target-ADR association predictions have a similar distribution compared to all target classes in our input data (Supplementary Figure 1B). Also, data are influenced by prior knowledge; for example, more than 87% of all drugs in the set were tested for hERG activity. High affinity (lower AC50 value) for hERG is associated with higher probability for QT prolongation for human and non-human preclinical species 29,30. As discussed earlier, there are not many drugs with a hERG AC50 value in the highly active class (0-3 μM), which is a commonly encountered roadblock for drug candidates to progress towards clinical trials 32. Only about 10% of all drugs fall into the highly active class in our assay data. To limit feature engineering, our AC50 discretization into three classes (Figure 1D) was kept uniform across all assays. Notably for the BSEP assay only, the dynamic range extends up to 300 μM and as a result most of our data falls into the ‘inactive’ (>30 μM) class. Consequently, we initially did not find the expected association with hepatotoxicity. We rectified this by reclassifying the BSEP assay data according to levels required for hepatotoxicity of BSEP inhibition 54,55 and indeed recovered the expected association.
Secondly, in vitro potency is an initial marker of clinical effect, and does not take into account prolonged dosing, comorbidity or pharmacokinetic/pharmacodynamic (PK/PD) relationships (e.g. therapeutic window). For 9 of 184 assays, non-human proteins were assayed (e.g. rat brain was used as a source for the benzodiazepine receptor) which may not be a direct correlate of the human protein. One way of further improvement of our approach is to include additional occupancy parameters and PK/PD components for higher precision and enhanced predictive value.
Lastly, in the FAERS database, drug-ADR associations may be mislabeled, e.g. anti-hypertensives are often reported as associated with hypertension as an ADR, rather than as the indication. Additionally, the FAERS database does not contain information on the total number of patients exposed to a particular drug, nor is it necessarily a reflection of the true incidence or frequency of ADRs. These and other limitations are discussed by Maciejewski et al. 21 with suggestions and methodology for further refinement of the FAERS database curation and maintenance.
We investigated one-to-one associations between targets and ADRs because these relationships are biologically meaningful and have a high utility in preclinical drug development. However, in some cases, a given ADR can be a prerequisite for others (e.g. hypotension leading to reflex tachycardia). We leave a model extension to incorporate these dependencies as future work. For target-ADR associations, we utilized our random forest model for a single drug at a time. Our model can be repurposed to predict possible ADRs from combination drug therapies and likelihood of drug-drug interactions. In principle, this can be extended for combination therapies by merging the in vitro data from the individual compounds and the predictions can be validated by querying Offside and Twosides databases 57. Similarly, our model can be utilized for drug repositioning and repurposing, using similar drug-ADR and target-ADR profiles. In conclusion, our random forest model and the target-ADR associations provide a validated, comprehensive resource to support drug development and future human biology studies.
Methods
In vitro secondary pharmacology assays for marketed drugs
AC50 values of 2134 marketed drugs (Supplementary Table 1) were measured in up to 218 different in vitro secondary pharmacology assays. Compounds were obtained from the Novartis Institutes of Biomedical Research (NIBR) compound library and tested in a panel of in vitro biochemical and cell-based assays at Eurofins and at NIBR in concentration-response (8 concentrations, half-log dilutions starting at 30 µM). Assay formats varied from radioligand binding to isolated protein to cellular assays. Example protocols may be found at https://www.eurofinsdiscoveryservices.com/cms/cms-content/services/in-vitro-assays/. Normalized concentration response curves were fitted using a four parameter logistic equation with internally developed software (Helios). The equation used is for a one site sigmoidal dose response curve Y as a function of tested concentrations X: Y(X)=A+(B-A)/(1+(X/C)D), with fitted parameters A=min(Y), B=max(Y), C=AC50 and exponent D. By default, A is fixed at 0, whereas B is not fixed.
If a drug was not tested against a specific assay, the AC50 value was set to NA (not available). AC50 values from similar assays with the same gene target were merged to reduce the NA data and features in the random forest model; this procedure resulted in 184 different target assays (Supplementary Table 2). In case any merged assays had multiple AC50 values for the same drug, we averaged these geometrically to take into account variation over orders of magnitudes. In figures 1D and 2C, the drugs are classified according to their annotated Anatomical Therapeutic Chemical (ATC) code 19. In case of multiple ATC codes, we assigned the most frequent level 1 code.
Mining adverse event reports of marketed drugs using OpenFDA
In this study, we utilized openFDA to acquire FAERS reports related to the query compounds 15,20. This Elasticsearch-based API provides a raw download access to a large volume of structured datasets, including adverse events reports from FAERS.
We used generic compound names (e.g. “Amoxicillin”) to query through the openFDA interface, accessed programmatically using Python. In order to maximize the coverage over FDA datasets, we normalized generic names to uppercase format followed by a name similarity metric to filter out unrelated records in our analysis. We included reports when the Jaro similarity between the query generic name and reported compound name was equal or greater than 0.8. To illustrate, to query “3alpha-Androstanediol”, we acquired reports including “3α-Androstanediol”, “Androstanediol”, “3-alpha-Androstanediol” as different lexical variations of the generic name and collated the resulting adverse event reports.
As the FAERS database contains information voluntarily submitted by healthcare professionals, consumers, lawyers and manufacturers, adverse event reports may be duplicated by multiple parties per event, and may be more likely to contain incorrect information if submitted by a non-medical professional. To reduce reporting bias and increase report information accuracy, we only analyzed reports submitted by physicians (data field: ‘qualification’ = 1). In this subset of adverse event reports, the data were further filtered by reported drug characterization, which indicates how the physician characterized the role of the drug in the patient’s adverse event. A drug can be characterized as a primary suspect drug, holding a primary role in the cause of the adverse event (data field: ‘drugcharacterization’ = 1); a concomitant drug (‘drugcharacterization’ = 2); or an interacting drug (‘drugcharacterization’ = 3). Here, we included only primary suspect drug reports. Without this restriction, model performances did not improve. We obtained all adverse events reports corresponding to the query compound that passed through the aforementioned filters.
Adverse event report descriptions are coded as medical terms of MedDRA terminology 17. Medical observations can be reported using 5 hierarchical levels of medical terminology, ranging from a very general System Organ Class term (e.g. gastrointestinal disorders) to a very specific Lowest Level Term (e.g. feeling queasy). Each term is linked to only one term on a higher level. For each report, we recorded all MedDRA Reaction terms (data field: “reactionmeddrapt”) at the Preferred Term level and mapped these Preferred Terms to Higher Level Group Term and System Organ Class level. For each (ADR term, drug) tuple, we then calculated the ADR occurrence, defined as the following fraction: number of adverse event reports containing that ADR term relative to the total number of ADR reports for that drug.
For different FAERS versions (Q4_2014, Q4_2018 and Q2_2019), we used the same query except the time parameter TO, which was set to 12/30/2014 for the Q4_2014 query. For other two queries, we didn’t set the limit parameter which was filled with the query time by default (query date was 10/10/2018 for Q4_2018 and 08/12/2019 for Q2_2019).
Random forest models
To construct and train our models (Figure 3A), we used AC50 values for a panel of target assays for marketed drugs (model input; independent variable) and ADR occurrences of the compounds (model output/predictions; dependent variable). Since there may be several ADRs associated with any given drug, we considered this a multi-label learning problem. We took a “first-order strategy”, i.e. we assume there is no correlation between different ADRs, and a “divide and conquer” strategy, i.e. we decompose our multi-label learning task into n independent binary classification problems, where n is the number of different ADR terms in our output data (n = 26 for SOC and n = 321 for HLGT level respectively). We built a random forest58 binary classifier for each ADR using Binary Relevance with the random forest modeling option in mldr package 59 and utiml package in R 60.
To define the features for the random forest models, we discretized and one-hot encoded our input AC50 values. Discretization was essential to limit the number of features and enhance the predictive power of the model. We defined 3 classes (levels) of AC50 ranges for each target assay.
Highly active class: AC50 in [0, 3 μM)
Active class: AC50 in [3 μM, 30 μM]
Inactive class: AC50 greater than 30 μM
If the AC50 value is NA, the values for all Classes are 0. Each drug has AC50 values for 184 (merged) assays, so there are 184×3 = 552 binary features to represent our input data. Features consisting of only 0 values were removed, resulting in 413 input features used for model construction.
The observed ADR occurrences were discretized into binary dependent variables. To achieve this, first let Nd be the total number of ADR reports for a given drug. The probability to observe an ADR occurrence OADR = X / Nd at random is equivalent to choosing that ADR X times out of Nd with X distributed binomially: X∼bin(Nd, p=1/n). Here, n represents the total number of ADRs as defined above. Under this null distribution, we calculate the p-values for all observed ADR occurrences OADR for a given drug, and then perform a Benjamini-Hochberg False Discovery Rate (FDR) correction (using the Python statsmodels package). If an FDR-corrected p-value is < 0.01, then the ADR value for that drug is 1, reflecting an association; 0 otherwise.
All random forest models were first trained using 5-fold cross validation and each fold is selected sequentially. 1063 drugs were used for training and 266 drugs were used for testing in each fold, and the distribution of drug classes (ATC) in our training set (1329 drugs) is preserved in 5-fold cross validation splits (Supplementary Figure 1A), i.e. the drug classes are represented in 5-fold cross validation splits the way they are represented in our training set. Then, the drugs with at least 1 ADR report are used as a training set. For a given (drug) input of AC50 values and ADR, the random forest model output, termed ADR probability, can be interpreted as the probability that the ADR is associated with the drug. To enable direct comparison of model predictions with binarized ADR occurrences, we binarized these ADR probabilities with a simple threshold value of 0.5. These binary values were used for training, cross validation and to calculate classification performance metrics (Figure 3B,C). All models have been constructed the same way regardless of different FAERS versions.
We evaluated our models based on five metrics: accuracy, Matthew’s correlation coefficient (MCC), macro-precision, macro-recall and area under the receiver operating characteristic curve (macro-AUROC). These metrics are calculated using their definitions below, except 2 metrics: (1) MCC, which is calculated using mltools package in R (https://github.com/ben519/mltools) and (2) AUROC, which is calculated using precrec package in R 61.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
MCC = (TP * TN − FP * FN) / SQRT ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
where TPR (true positive rate = TP / (TP + FN)) and FPR (false positive rate = FP / (F P + TN)). The corresponding metrics for each ADR model (Figure 3C, 3D) are accuracy, precision, recall, and MCC, which is calculated using mltools package in R (https://github.com/ben519/mltools).
Determination of target-ADR associations
To find associations between gene target assays and ADRs (Figure 4), we first generated ADR probabilities specific to a given assay. As a model input, one out of its three random forest input features’ value was set to 1 and all others to 0. This simulates the scenario of an in silico compound that is potent with an AC50 value in the range corresponding to the positive feature only. We then utilized the ADR’s random forest model, pre-trained on all available marketed drug data (see previous section), to calculate the resulting ADR probability. We repeated this procedure for each feature of all assays and each ADR.
To select the predictive features for a given ADR, we ordered the pre-trained random forest model’s input features according to their Gini importance score 62 and denote the top 5% as significant features. Our criteria for a gene (target assay) - ADR pair were:
For a given ADR: at least 2 out of 3 assay features need to be significant in order to make a reliable comparison between the ADR probabilities with respect to AC50 values.
At least one of the ADR probabilities of the significant features has to be larger than zero.
We filtered out target-ADR pairs if the ADR term maps to the following SOC classes, which are not specific to body parts or underlying human biology:
general disorders and administration site conditions
injury, poisoning and procedural complications
investigations
neoplasms benign, malignant and unspecified (incl cysts and polyps)
poisoning and procedural complications
social circumstances
surgical and medical procedures
To ensure the reproducibility of the target-ADR pair selection procedure, we repeated the random forest model training with different seeds for a total of 5 times. We then took the union of the 5 sets of target-ADR pairs and discarded pairs that were only found once out of 5 runs. Finally, to determine if the mean ADR probabilities between the selected AC50 classes were statistically significantly different, we performed a two-sample t-test with sample sizes equal to the number of times a class was selected (ranging from 2 to 5 times) using the Python scikit.stats package. In case all three AC50 classes were represented, we tested the highly active versus inactive class. We then performed a Benjamini-Hochberg FDR correction. If the FDR-corrected p-value is < 0.1, then the target-ADR pair is considered a statistically significant association (Figure 4B, Supplementary Table 8).
To evaluate the relation between the HLGT level ADR term hepatic and hepatobiliary disorders and target assay BSEP, we also trained and analyzed two random forest models as described above to find target-ADR pairs but with only the BSEP assay data discretized with class boundaries [0, 30 μM), [30, 300 μM] and >300 μM or [0, 100 μM), [100, 300 μM] and >300 μM.
Side Effect Resource (SIDER) analysis
The Side Effect Resource (SIDER; version 4.1) was downloaded (http://sideeffects.embl.de/download/; accessed on 09/16/2019). The file meddra_all_se.tsv.gz contains drug-ADR pairs extracted from drug labels using text mining 28. The supplied MedDRA preferred term (PT) was mapped to HLGT used for the random forest modeling. The file drug_atc.txt provides mappings from drug names as used in SIDER to Anatomical Therapeutic Chemical (ATC) codes. ATC codes for the 805 drugs in the test set were obtained from the NIBR compound database, and matched to ATC codes from SIDER. For drugs that could not be matched via ATC codes, additional matches were obtained by mapping the compound name, first trying the name in its entirety (e.g. “butriptyline hydrochloride”, then on the first word in the drug name (e.g. “butriptyline”). All matches, whether obtained on ATC codes or by drug name, were reviewed manually for accuracy.
Systematic validation of predicted target-ADR association using PubMed database
We built a query based on 254 unique HLGT level ADR terms and 106 unique target genes (corresponding to the assays), for which we could find a corresponding MeSH term (Supplementary Table 9), to retrieve linked publication identifiers (PMIDs) from the PubMed database. All PMIDs were acquired by submitting a query for every MeSH entity separately via the PubMed API engine, a search engine that provides access to the MEDLINE database of references and abstracts on life sciences and biomedical articles. Next, we determined the PMIDs for a gene-ADR pair as the intersection of the two PMID sets of each corresponding MeSH term query. Furthermore, for each possible gene-ADR pair we determined whether it was part of the 221 predicted associations from the Random Forest model or not. In this way, we obtained 219 unique positive gene-ADR pairs and a total 26705 unique negative pairs. Lastly, we generated a set of negative pairs corresponding to all permutation pairs from the 39 unique genes and 131 unique ADRs that are part of the positive set, resulting in 4890 unique negative pairs in this negative control set. To assess any statistical overrepresentation, we calculated the number of pairs with at least one co-occurrence publication for both negative and positive sets and assessed significance with a Fisher Exact test (Python function scipy.stats.fisher_exact). Furthermore, we calculated the co-occurrence “lift” over the reporting probability when assuming independence, defined as , With Npubmed = 29138919 the total number of PMIDs in the Pubmed database in 2019 (https://www.nlm.nih.gov/bsd/licensee/2019_stats/2019_LO.html). N(A, T), N(A), and N(T) are respectively the number of retrieved PMIDs for a unique gene-ADR pair, ADR, or gene target separately. To assess the location differences of the above described positive versus negative distribution of lift values, we performed a Mann Whitney U test (Python function scipy.stats.mannwhitneyu, two-sided, continuity correction=True).
Data availability
Data retrieved from public repositories is made available in Supplementary Tables 1-9 and on GitHub (https://github.com/samanfrm/ADRtarget).
Code availability
Code to query FAERS and PubMed, to construct the random forest models and identify the target-ADR associations is made available on GitHub (https://github.com/samanfrm/ADRtarget).
Figure Legends
Author contributions
R.I., S.A., A.X.C., S.F., B.K., D.A. and L.U. conceived the study. D.A., A.F. and L.U. provided the Novartis in vitro pharmacology data, advice and mentorship. S.A., R.I., A.X.C., S.F., B.K., W.D.M. and J.S. performed data analysis. S.A. developed the random forest modeling. R.I. developed the formalism for target-ADR association inference. S.F., R.I. and A.X.C. developed the query of OpenFDA. J.S. performed the SIDER analysis. S.F., J.S. and R.I. performed the systematic PubMed query. S.A., R.I., and A.X.C. wrote the paper and designed the figures with input from all the authors.
Conflict of interest
Authors declare no conflict of interest.
Acknowledgements
We are grateful to Mirjam Trame and Andy Stein for giving us the opportunity to participate in the 2018 Novartis Quantitative Sciences Academia-to-Industry Hackathon organized at the Novartis Institutes for Biomedical Research. We also thank Changchang Liu and Xinrui (Sandy) Zou for their contributions to the project at the Hackathon.
Footnotes
Supplemental Figure 1 added. Supplemental data sets provided. Link to Github code repository provided.