Probing the origins of human acetylcholinesterase inhibition via QSAR modeling and molecular docking

Saw Simeon; Nuttapat Anuwongcharoen; Watshara Shoombuatong; Aijaz Ahmad Malik; Virapong Prachayasittikul; Jarl E.S. Wikberg; Chanin Nantasenamat

doi:10.7717/peerj.2322

Probing the origins of human acetylcholinesterase inhibition via QSAR modeling and molecular docking

Saw Simeon¹, Nuttapat Anuwongcharoen¹, Watshara Shoombuatong¹, Aijaz Ahmad Malik¹, Virapong Prachayasittikul², Jarl E.S. Wikberg³, Chanin Nantasenamat ¹

1Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand

2Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand

3Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

DOI: 10.7717/peerj.2322

Published: 2016-08-09
Accepted: 2016-07-13
Received: 2016-05-09

Academic Editor: Walter de Azevedo Jr

Subject Areas: Bioinformatics, Computational Biology, Drugs and Devices, Neuroscience, Pharmacology
Keywords: Acetylcholinesterase, Acetylcholinesterase inhibitor, Alzheimer’s disease, Dementia, Neurodegenerative disease, Quantitative structure-activity relationship, QSAR, Data mining

Copyright: © 2016 Simeon et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Simeon S, Anuwongcharoen N, Shoombuatong W, Malik AA, Prachayasittikul V, Wikberg JES, Nantasenamat C. 2016. Probing the origins of human acetylcholinesterase inhibition via QSAR modeling and molecular docking. PeerJ 4:e2322 https://doi.org/10.7717/peerj.2322

The authors have chosen to make the review history of this article public.

Abstract

Alzheimer’s disease (AD) is a chronic neurodegenerative disease which leads to the gradual loss of neuronal cells. Several hypotheses for AD exists (e.g., cholinergic, amyloid, tau hypotheses, etc.). As per the cholinergic hypothesis, the deficiency of choline is responsible for AD; therefore, the inhibition of AChE is a lucrative therapeutic strategy for the treatment of AD. Acetylcholinesterase (AChE) is an enzyme that catalyzes the breakdown of the neurotransmitter acetylcholine that is essential for cognition and memory. A large non-redundant data set of 2,570 compounds with reported IC₅₀ values against AChE was obtained from ChEMBL and employed in quantitative structure-activity relationship (QSAR) study so as to gain insights on their origin of bioactivity. AChE inhibitors were described by a set of 12 fingerprint descriptors and predictive models were constructed from 100 different data splits using random forest. Generated models afforded R², $Q_{CV}^{2}$ and $Q_{Ext}^{2}$ values in ranges of 0.66–0.93, 0.55–0.79 and 0.56–0.81 for the training set, 10-fold cross-validated set and external set, respectively. The best model built using the substructure count was selected according to the OECD guidelines and it afforded R², $Q_{CV}^{2}$ and $Q_{Ext}^{2}$ values of 0.92 ± 0.01, 0.78 ± 0.06 and 0.78 ± 0.05, respectively. Furthermore, Y-scrambling was applied to evaluate the possibility of chance correlation of the predictive model. Subsequently, a thorough analysis of the substructure fingerprint count was conducted to provide informative insights on the inhibitory activity of AChE inhibitors. Moreover, Kennard–Stone sampling of the actives were applied to select 30 diverse compounds for further molecular docking studies in order to gain structural insights on the origin of AChE inhibition. Site-moiety mapping of compounds from the diversity set revealed three binding anchors encompassing both hydrogen bonding and van der Waals interaction. Molecular docking revealed that compounds 13, 5 and 28 exhibited the lowest binding energies of −12.2, −12.0 and −12.0 kcal/mol, respectively, against human AChE, which is modulated by hydrogen bonding, π–π stacking and hydrophobic interaction inside the binding pocket. These information may be used as guidelines for the design of novel and robust AChE inhibitors.

Introduction

Neurodegenerative diseases is caused by the progressive loss of neural cells thereby leading to nervous system dysfunction (Beal, 1995; Kuca et al., 2016). In particular, Alzheimer’s disease (AD) is a debilitating illness that is expected to triple by the year 2050 (Brookmeyer et al., 2007). AD is characterized by gradual cognitive impairment, memory loss and decline in speech, behavioral abnormality and eventually death. The pathological changes in AD are mainly attributed to the dramatic loss of neurons in many areas of the central nervous system accompanied by a great reduction in the levels of neurotransmitters. Acetylcholine (ACh) is a neurotransmitter possessing important cognitive and muscular functions. Particularly, in the peripheral nervous system, ACh is found at the neuromuscular junction where it is involved in muscle contraction while in the central nervous system, it is involved in cognitive functions such as thought, learning and memory.

Acetylcholinesterases (AChE) is an enzyme that catalyzes the breakdown of ACh to choline and acetic acid (Quinn, 1987). Thus, a promising therapeutic approach is to maintain the level of ACh by inhibiting the enzyme that is responsible for its breakdown. The structure of AChE is comprised of four main subsites consisting of anionic subsite, esteratic site, oxyanion hole and the acyl pocket (Bourne, Taylor & Marchot, 1995). The anionic site is involved in the binding of the positive quaternary amine of ACh (Ordentlich et al., 1993). The substrate interacts with the 14 aromatic residues that forms the active site. Of these 14 aromatic residues, Trp84 is important for the enzyme activity because when it is replaced by alanine, the activity of the enzyme decreased by 3,000-fold (Tougu, 2001). The esteratic site contains the catalytic triad consisting of Ser203, His447 and Glu334 (i.e., resembling that of chymotrypsin and other serine proteases) that hydrolyzes ACh to acetate and choline (Harel et al., 1993). The mechanism of the hydrolysis starts from the carboxyl ester leads to the formation of an acyl-enzyme and choline. Finally, the acyl-enzyme undergoes nucleophilic attack by water molecules thereby regenerating the enzyme (Tougu, 2001). The oxyanion hole consisting of Gly121, Gly122 and Ala204 contribute hydrogen bond donors to help stabilize the tetrahedral intermediate of ACh form during catalysis. The acyl pocket consisting of Phe295 and Phe297 are gatekeepers that limit the dimension of substrates that can enter the active site.

AChE inhibitors form one of the most actively investigated classes of compounds having been labeled as a potential agent for the treatment of AD by inhibiting AChE from hydrolyzing ACh, thereby leading to increases in the level of ACh (Birks, 2006). Generally, AChE inhibitors can be classified into reversible and irreversible inhibitors. Reversible inhibitor bind to the enzyme at allosteric sites as to reduce the activity of the enzyme whether or not the enzyme has already bind the substrate or not. Tacrine is a reversible AChE inhibitor that was synthesized nearly five decades ago and in 1993 it has become the first drug to be marketed for the treatment of AD with approval from the US. Food and Drug Administration (Racchi et al., 2004). On the other hand, irreversible inhibitors such as metrifonate (Morris et al., 1998) bind to the target enzymes and dissociates very slowly from the enzyme via either covalent or non-covalent interactions (Kitz & Wilson, 1962).

Quantitative structure–activity relationship (QSAR) is a paradigm that enables the prediction of biological activities for compounds of interest as a function of their descriptors through the use of statistical or machine learning methods (Nantasenamat et al., 2009). Aside from the ability to predict the activity, QSAR models have been instrumental in enabling understanding on the origin of these biological activities by means of interpreting the descriptors used in building such models.

Historically, the first QSAR investigation of AChE inhibitors was reported by Mundy et al. (1978) almost 40 years ago in which the log(1/LD₅₀ for a series of twelve substituted 0,0-dimethyl 0-(p-nitrophenyl) phosphorothioates and 0-analogs was predicted as a function of the octanol/water partition coefficient. Analysis of the literature of QSAR studies of AChE revealed that much of the early studies are classical QSAR models (i.e., Hansch and Free-Wilson approach) that are based on small congeneric compound set and primarily aimed at predicting AChE inhibition as to investigate the toxic effect of pesticides of various chemotypes belonging to either organophosphates (Mager, 1983; Aaviksaar, 1990) or carbamates (Su & Lien, 1980; Goldblum, Yoshimoto & Hansch, 1981; Walters & Hopfinger, 1986). Recent QSAR studies are based on the use of large and heterogeneous data sets comprising of structurally diverse chemotypes. This include the study from Yan & Wang (2012) where they predicted AChE inhibition for a large set of 404 compounds using multiple linear regression and support vector machine. Furthermore, Lee & Barron (2016) performed a 3D-QSAR investigation on a large set of 341 compounds comprising of organophosphates and carbamates. Moreover, Veselinović et al. (2015) compiled a set of 278 organophosphates for which they developed QSAR models for predicting AChE inhibition using SMILES-based descriptors.

Research in this field had experienced two distinct transitions when viewed from biological and computational viewpoints. Biologically, early QSAR studies treat AChE as a biomarker of toxicity from pesticides while investigations from later years had shifted the focus by viewing AChE as a therapeutic target for the treatment of AD. In regards to the latter point, viewpoint on targeting AChE as a single target for treating AD is starting to be replaced by the multi-target concept in which the treatment for AD can be approached by a panel of key targets (Fang et al., 2015; Huang et al., 2011). Computationally, early studies are predominantly based on simple 2D-QSAR (Mundy et al., 1978; Su & Lien, 1980) while later years started to use more sophisticated approach for understanding AChE inhibition encompassing 3D-QSAR (Deb et al., 2012; Lee & Barron, 2016; Prado-Prado et al., 2012), molecular dynamics (Shen et al., 2002), molecular docking (Lu et al., 2011; Deb et al., 2012; Giacoppo et al., 2015), pharmacophore modeling (Lu et al., 2011; Gupta & Mohan, 2014) and statistical molecular design (Andersson et al., 2014; Prado-Prado, Escobar & Garcia-Mera, 2013).

Herein, we propose the first large-scale QSAR investigation for predicting AChE inhibition, which to the best of our knowledge represents the largest collection of 2,570 non-redundant compounds. QSAR models were built using interpretable learning methods (e.g., random forest) and descriptors (i.e., molecular fingerprints) as to unravel the underlying AChE inhibitory activity, which was performed in accordance with guidelines of the Organisation for Economic Cooperation and Development (OECD). Molecular docking was also performed on a chemically diverse set of compounds selected from active AChE inhibitors. Together, the ligand and structure-based approach employed in this study is anticipated to be useful in the design and development of robust AChE inhibitors.

Materials and Methods

A summary of the workflow of this study is provided in Fig. 1. Briefly, this included a large-scale QSAR model for predicting and analyzing the AChE inhibition, which was performed in accordance with the OECD guidelines as follows: (i) a data set with a defined endpoint; (ii) an unambiguous learning algorithm; (iii) a defined applicability domain of the QSAR model; (iv) using appropriate measures of goodness-of-fit, robustness and predictivity; (v) a mechanistic interpretation of the QSAR model. Furthermore, molecular docking was also performed on a chemically diverse data set as to elucidate the underlying binding mechanism. To facilitate the reproducibility of the research work performed herein, the data set (Data S1) and codes (Data S2) used to perform the multivariate analysis are provided as Supplemental Information.

Figure 1: Workflow of QSAR modeling and molecular docking for investigating AChE inhibitory activity.

Download full-size image

DOI: 10.7717/peerj.2322/fig-1

Data set

A data set of inhibitors against human AChE (Target ID CHEMBL220) were compiled from the ChEMBL 20 database (Gaulton et al., 2012) that is comprised of a total number of 9,242 bioactivity data points from 5,049 compounds. SMILES notations of the compounds were curated with ChemAxon’s Standardizer (ChemAxon Kft., 2015) using the same parameter settings as described in our previous study (Simeon et al., 2016). The initial data set was assembled from several bioactivity measurement units including (in order of decreasing data size) IC₅₀, K_i, % activity, % inhibition, MIC, EC₅₀, etc. IC₅₀ was selected for further investigation as they constituted the largest subset with 4,910 compounds. A closer look revealed that 1,301 compounds had no reported IC₅₀ values or had lesser/greater than signs which were subjected to removal thereby leaving 3,609 compounds. Only data points having nM as the bioactivity unit were selected for further study, which produced 3,596 compounds. Furthermore, redundant compounds having different bioactivity values were kept if the standard deviation of IC₅₀ was less than 2 and this resulted in 2,571 compounds. Moreover, some compounds were found to have no SMILES notation associated with it and were thus removed. A final data set comprising of 2,570 compounds was obtained.

Description of inhibitors

AChE inhibitors were encoded by a vector of fingerprint descriptors accounting for its molecular constituents. Prior to calculating descriptors, salts were removed and tautomers were standardized using the built-in function of the PaDEL-Descriptor software (Yap, 2011).

Although fingerprint descriptors are able to capture the feature space of chemical compounds, their ability to be used as descriptors for bioactivity modeling can vary. In fact, performance differences existing amongst the different fingerprint type has been the subject of several investigations into its utilization for bioactivity modeling. Riniker & Landrum (2013) benchmarked and assessed the performance of predictive models constructed from 2D fingerprint descriptors obtained from RDKit.

In this study, the suitability of 12 different fingerprint descriptors for predicting the bioactivity of AChE inhibitors was investigated. Table 1 summarizes the employed fingerprints along with their corresponding size, description and reference.

Table 1:

Summary of 12 sets of fingerprint descriptors employed in this study.

No.	Fingerprint	Number	Description	Reference
1	CDK	1024	Fingerprint of length 1024 and search depth of 8	Steinbeck et al. (2003)
2	CDK extended	1024	Extends the fingerprint with additional bits describing ring features	Steinbeck et al. (2003)
3	CDK graph only	1024	A special version that considers only the connectivity and not bond order	Steinbeck et al. (2003)
4	E-state	79	Electrotopological state atom types	Hall & Kier (1995)
5	MACCS	166	Binary representation of chemical features defined by MACCS keys	Durant et al. (2002)
6	PubChem	881	Binary representation of substructures defined by PubChem	NCBI (2009)
7	Substructure	307	Presence of SMARTS patterns for functional groups	Laggner (2005)
8	Substructure count	307	Count of SMARTS patterns for functional groups	Laggner (2005)
9	Klekota–Roth	4860	Presence of chemical substructures	Klekota & Roth (2008)
10	Klekota–Roth count	4860	Count of chemical substructures	Klekota & Roth (2008)
11	2D atom pairs	780	Presence of atom pairs at various topological distances	Carhart, Smith & Venkataraghavan (1985)
12	2D atom pairs count	780	Count of atom pairs at various topological distances	Carhart, Smith & Venkataraghavan (1985)

DOI: 10.7717/peerj.2322/table-1

Additionally, the four molecular descriptors that are used to define the Lipinski’s rule-of-five comprising of molecular weight (MW), logarithm of the octanol/water partition coefficient (ALogP), number of hydrogen bond donor (nHBDon) and number of hydrogen bond acceptor (nHBAcc) were also computed by the PaDEL-Descriptor software.

Feature selection

Collinearity is a condition where descriptor pairs are known to have intercorrelation, which not only add complexity to the model but could potentially give rise to bias. To remedy this, the cor function from the R package stats was used to find the pairwise correlation among descriptors, and descriptors in a pair with a Pearson’s correlation coefficient greater than the threshold of 0.7 was filtered out using the findCorrelation function from the R package caret to obtain a smaller subset of descriptors (Kuhn, 2008).

Data splitting

To avoid the possibility of bias that may arise from a single data split when building predictive models (Puzyn et al., 2011), predictive models were constructed from 100 independent data splits and the mean and standard deviation values of statistical parameters were reported. The data set was split into internal and external sets in which the former comprises 80% whereas the latter constitutes 20% of the initial data set. The sample function from the R base package was used to split the data.

Multivariate analysis

Supervised learning is to learn a model from labeled training data which can be used to make prediction about unseen or future data (James et al., 2013). This study constructs regression models, which affords the prediction of the continuous response variable (i.e., pIC₅₀) as a function of predictors (i.e., fingerprint descriptors).

Random forest (RF) is an ensemble classifier that is composed of several decision trees (Breiman, 2001). Briefly, the main idea behind RF is that instead of building a deep decision tree with an ever-growing number of nodes, which may be at risk for overfitting and overtraining of the data, rather multiple trees are generated as to minimize the variance instead of maximizing the accuracy. As such, the results will be more noisier when compared to a well-trained decision tree, yet these results are usually reliable and robust. The ranger function from the R package ranger, which is a fast implementation of the RF algorithm that was used for constructing the models (Wright & Ziegler, 2015).

Validation of QSAR models

Model validation is an important process, which should be performed to ensure that a fitted model can accurately predict responses for future or unknown subjects. Two statistical parameters were used to evaluate the performance of the QSAR models consisting of Pearson’s correlation coefficient (r) and root mean squared error (RMSE). The r value is a commonly used metric to represent the degree of relationship between two variables of interest. It can range from −1 to +1 in which negative values are indicative of negative correlation between two variables and vice versa. RMSE is a commonly used parameter to assess the relative error of the predictive model. The predictive performance of the QSAR models was verified by 10-fold cross-validation, external validation and Y-scrambling test.

The 10-fold cross-validation technique does not used the entire data set to build predictive model. Instead, it splits the data into training and testing data set by allowing model that is built with training data set us allow to assess the performance of the model on the testing data set. By performing repeats of the 10-fold validation, the average accuracies can be used to truly assess the performance of the predictive model.

Y-scrambling test was used to ensure the robustness of the predictive model not only to rule out the possibility of chance correlations but also to assess the statistical significance of R² and Q², ensuring the generalizability of QSAR model. The true Y-dependent variable (i.e., pIC₅₀) was randomly scrambled and the statistical assessment parameters are recalculated. Performance of the Y-scrambling test can be deduced from the regression line of the plot of R² versus Q². Intercept values for R² and Q² as denoted by iR² and iQ², respectively, were calculated. Negative iQ² is indicative of an acceptable QSAR model and that there is no chance correlation from the real model (Eriksson et al., 2003). Furthermore, $r_{m}^{2}$ and $Δ r_{m}^{2}$ metrics as introduced by Roy et al. (2013) were used to verify the robustness of the proposed QSAR model in which an acceptable QSAR model should give $r_{m}^{2} > 0.5$ and $Δ r_{m}^{2} < 0.2$ .

Applicability domain analysis

The applicability domain (AD) estimates the likelihood of reliable prediction for compounds on the basis of how similar they are to compounds used to build the model. Thus, compounds falling outside the AD may lead to unreliable predictions. The most common approach for determining AD is described by Gramatica (2007) and Tropsha, Gramatica & Gombar (2003), which is to compute the leverage values for each compound. The leverage value allows one to identify whether new compounds will lie within or outside the domain. Leverage values for all compounds are calculated via adjustment of X to give the hat matrix H: (1) $H = X {(X^{T} X)}^{- 1} X^{T}$

where X is a two-dimensional matrix comprising of n compounds and m descriptors while X^T is the transpose of X. Meanwhile, the leverage value of the ith compound (h_i) is the ith diagonal element of H: (2) $h_{i} = x_{i}^{T} {(X^{T} X)}^{- 1} x_{i}$

where x_i is the descriptor row-vector of the ith compound. The warning leverage h^∗ is calculated by: (3) $h^{*} = 3 (p + 1) ∕ n .$

Practically, the leverage value along with the William’s plot is often used to assess the AD of QSAR models. The William’s plot is constructed by depicting the standardized residuals versus the leverage value for each compound’s h_i. If the ith compound has h_i > h^∗ then it means that the ith compound exerts a great influence on the QSAR model and may be excluded from the AD. In spite of this, it does not appear to be an outlier because its standardized residual may be small.

Molecular docking

The co-crystal structure of human AChE with donepezil (PDB ID: 4EY7) was retrieved from the Protein Data Bank and initially prepared by removing alternative side chains and water molecules. The protein was prepared via the rebuilding of bonds and the addition of missing hydrogen atoms. Subsequently, the protein was cleaned by merging the atomic charge and removing lone pair atoms, non-polar hydrogen atoms and non-standard amino acid residues. Grid box was set up to provide coverage of the active site of human AChE with a dimension of 40 × 30 × 40 Å (X, Y and Z axes of −13.987, −41.668 and 27.109, respectively). Molecular docking was consequently performed with AutoDock Vina (Trott & Olson, 2010) using default parameters. The docking protocol was validated in order to ensure its reliability for subsequent analysis of the studied compounds. This was performed by extracting the co-crystal ligand, donepezil, from the PDB file and re-docked to the co-crystal human AChE protein. The root mean squared deviation (RMSD) of the atomic position between the original orientation of the co-crystal ligand and the re-docked ligand is computed and is deemed acceptable if the RMSD value is less than or equal to 2.0 Å.

A set of 30 representative and chemically diverse compounds, which will be referred hereafter as the diversity set, were extracted from the full set of active AChE inhibitors (i.e., IC₅₀ <1 µM) using the Kennard–Stone algorithm (Kennard & Stone, 1969). These compounds were used as ligands for molecular docking against the human AChE. The binding energy (kcal/mol) of AChE inhibitors were calculated according to the built-in scoring function of Autodock Vina and conformers providing the lowest binding energy were selected for further analysis of the binding mode. Furthermore, key-interacting residues and their moiety preferences were analyzed using LigPlot+ (Wallace, Laskowski & Thornton, 1995), Maestro (Schrödinger, 2015b) and the SiMMap web server (Bollback, 2006). Finally, three-dimensional structure of protein–ligand interaction was created and visualized using Pymol (Schrödinger, 2015a).

Figure 2: Chemical space of AChE inhibitors.
Actives, inactives and intermediates are shown in green, red and blue colors, respectively.

Download full-size image

DOI: 10.7717/peerj.2322/fig-2

Figure 3: Box plot of AChE inhibitors using Lipinski’s rule-of-five descriptors.

Download full-size image

DOI: 10.7717/peerj.2322/fig-3

Results and Discussion

Chemical space of AChE inhibitors

Navigation of the chemical space of AChE inhibitors was performed to gain insights into the structure–activity relationship by analyzing the Lipinski’s rule-of-five descriptors. Chemical space analysis may provide important knowledge on the general character of compounds governing inhibitory properties of compounds. Exploratory data analysis was performed using the Lipinski’s rule-of-five descriptors comprising of MW, ALogP, nHBDon and nHBAcc. MW represents the molecular size of a compound that is commonly used because of it can be easily interpreted and calculated as well as appropriate size of a compound is important for its passage via lipid membrane. ALogP is a widely used parameter for determining the lipophilicity of a compound and used in calculating the membrane penetration and permeability of compounds. nHBDon and nHBAcc describe the number of hydrogen bond donors and hydrogen bond acceptors, respectively, which is used to measuring hydrogen bonding capacity. Visualization of the chemical space of ALogP as a function of MW is shown in Fig. 2, as to investigate the chemical space of AChE inhibitors. A dense distribution of inhibitors was observed within the space of MW starting from approximately 300–600 Da and within the space of ALogP ranging from approximately −2.5 to 2.5. In addition, the box plot of the Lipinski’s descriptors is shown in Fig. 3. Compounds with negative ALogP values approximately of closer to 0.0 can be found in inactive inhibitors whereas most of the active inhibitors tend to possess approximately lower values in average of ALogP values.

Visual representation of the overall distribution of data values of Lipinski’s descriptors is shown as box plots in Fig. 3 in which the ALogP, nHBAcc, nHBDon and MW are shown in Figs. 3A, 3B, 3C and 3D, respectively. Analysis of the box plots revealed that there were no differences amongst the three bioactivity classes for nHBAcc and nHBAcc as deduced from the boundaries of the boxes (i.e., representing the first and third quartiles). ALogP and MW were found to display differences amongst the bioactivity classes. Particularly, ALogP values for actives were the lowest while negligible differences were observed for the other two classes. Furthermore, MW for actives were the largest amongst the three bioactivity classes, which is followed by the intermediates while inactives were smallest.

QSAR model for predicting AChE inhibitory activity

A data set comprising of 2,570 compounds were used for construction of QSAR models. Particularly, twelve sets of fingerprint descriptors were benchmarked in order to find the best performing set. Prior to modeling, feature selection was applied to remove collinear descriptors. Each of the twelve models were then built using a data split ratio of 80/20 in which 80% of the data set was used as the internal set and 20% as the external set. This procedure was iteratively performed in which each of the 100 independent data splits were used for model construction and the performance results given in Table 2 are the mean and standard deviation values derived from these runs.

Table 2:

Performance summary of QSAR models for predicting pIC₅₀.

Descriptor class	N	Training set		10–fold CV set		External set		$R^{2} - Q_{CV}^{2}$	$R^{2} - Q_{Ext}^{2}$
		R²	RMSE_Tr	$Q_{CV}^{2}$	RMSE_CV	$Q_{Ext}^{2}$	RMSE_Ext
CDK	960	0.93 ± 0.01	0.44 ± 0.04	0.79 ± 0.07	0.76 ± 0.15	0.80 ± 0.04	0.73 ± 0.09	0.14	0.13
CDK extended	948	0.94 ± 0.01	0.42 ± 0.03	0.79 ± 0.06	0.76 ± 0.12	0.81 ± 0.04	0.72 ± 0.08	0.15	0.13
CDK graph only	198	0.87 ± 0.01	0.61 ± 0.03	0.72 ± 0.06	0.87 ± 0.13	0.72 ± 0.05	0.87 ± 0.09	0.15	0.15
E-State	21	0.66 ± 0.03	1.00 ± 0.05	0.55 ± 0.09	1.11 ± 0.13	0.56 ± 0.05	1.10 ± 0.08	0.11	0.10
MACCS	77	0.89 ± 0.01	0.56 ± 0.03	0.77 ± 0.07	0.81 ± 0.15	0.77 ± 0.04	0.80 ± 0.09	0.12	0.12
PubChem	103	0.90 ± 0.01	0.55 ± 0.03	0.76 ± 0.05	0.80 ± 0.11	0.78 ± 0.03	0.79 ± 0.08	0.14	0.12
Substructure	30	0.75 ± 0.01	0.85 ± 0.03	0.64 ± 0.06	1.00 ± 0.13	0.65 ± 0.05	0.98 ± 0.08	0.11	0.10
Substructure count	26	0.92 ± 0.01	0.50 ± 0.02	0.78 ± 0.06	0.77 ± 0.14	0.78 ± 0.05	0.77 ± 0.10	0.14	0.14
Klekota–Roth	111	0.89 ± 0.01	0.59 ± 0.03	0.74 ± 0.07	0.85 ± 0.14	0.76 ± 0.05	0.81 ± 0.10	0.15	0.13
Klekota–Roth count	72	0.91 ± 0.01	0.52 ± 0.03	0.78 ± 0.06	0.79 ± 0.14	0.78 ± 0.05	0.77 ± 0.11	0.13	0.13
2D atom pairs	42	0.75 ± 0.03	0.85 ± 0.06	0.61 ± 0.08	1.03 ± 0.15	0.60 ± 0.06	1.05 ± 0.12	0.14	0.15
2D atom pairs count	38	0.92 ± 0.01	0.51 ± 0.02	0.74 ± 0.07	0.84 ± 0.15	0.76 ± 0.05	0.82 ± 0.10	0.18	0.16

DOI: 10.7717/peerj.2322/table-2

It can be observed that all twelve models are capable of capturing the inhibitory activity space of AChE inhibitors as they provided R² and Q² (i.e., both 10-fold CV and external sets) greater than the threshold values proposed by Golbraikh & Tropsha (2002) of 0.6 and 0.5, respectively, which is indicative of robust model performance. The possibility of chance correlation can be assessed from the R²–Q² margin as described by Eriksson & Johansson (1996) in which values <0.2–0.3 are indicative of predictive and reliable models while values >0.2–0.3 suggests possible chance correlation or the presence of outliers in the data set. Furthermore, observation of the $Q_{CV}^{2}$ – $Q_{Ext}^{2}$ margin revealed that the difference was negligible with values in the range of 0 and 0.01.

Generally, it can be seen that models with larger descriptor size, namely CDK and CDK extended, afforded the best performance with $Q_{CV}^{2}$ of 0.79 ± 0.07 and 0.79 ± 0.06, respectively, and $Q_{Ext}^{2}$ of 0.80 ± 0.04 and 0.81 ± 0.04, respectively. The opposite also holds true as the model with the least number of descriptors were also found to perform the worst amongst the other fingerprints with $Q_{CV}^{2}$ of 0.55 ± 0.09 and $Q_{Ext}^{2}$ of 0.56 ± 0.05. In a nutshell, the model performance in order of decreasing value is as follows: CDK extended > CDK > MACCS ≈ Substructure count ≈ Klekota–Rota count > PubChem > Klekota–Roth ≈ 2D atom pairs count > CDK graph only > Substructure > 2D atom pairs > E-state.

The best performing model is not necessarily the best choice considering the fact that the descriptor size for the best models were quite high and is consequently prone to overfitting. It was found that the substructure count provided reasonably good predictive performance (i.e., $Q_{CV}^{2}$ and $Q_{Ext}^{2}$ of 0.78 ± 0.06 and 0.78 ± 0.05, respectively) with the advantage of making use of a small set of 26 descriptors. Therefore, this fingerprint was selected for further interpretation of the feature importance.

Table 3:

Performance summary of QSAR models assessed using additional statistical metrics.

Descriptor class	N	Training set		10-fold CV set		External set		iR²	iQ²
		$r_{m}^{2}$	$Δ r_{m}^{2}$	$r_{m}^{2}$	$Δ r_{m}^{2}$	$r_{m}^{2}$	$Δ r_{m}^{2}$
CDK	960	0.82 ± 0.02	0.07 ± 0.01	0.62 ± 0.09	0.20 ± 0.06	0.64 ± 0.05	0.19 ± 0.03	0.0003	−0.0003
CDK extended	948	0.83 ± 0.01	0.07 ± 0.01	0.62 ± 0.08	0.20 ± 0.05	0.65 ± 0.05	0.18 ± 0.03	0.0006	−0.0005
CDK graph only	198	0.70 ± 0.02	0.14 ± 0.01	0.51 ± 0.07	0.27 ± 0.05	0.53 ± 0.05	0.26 ± 0.03	0.0007	−0.0006
E-State	21	0.35 ± 0.03	0.38 ± 0.02	0.27 ± 0.07	0.40 ± 0.04	0.28 ± 0.05	0.41 ± 0.03	0.0011	−0.0009
MACCS	77	0.73 ± 0.01	0.12 ± 0.01	0.57 ± 0.09	0.23 ± 0.05	0.58 ± 0.05	0.23 ± 0.03	0.0005	−0.0004
PubChem	103	0.74 ± 0.02	0.12 ± 0.01	0.57 ± 0.07	0.23 ± 0.04	0.59 ± 0.05	0.22 ± 0.03	0.0006	−0.0005
Substructure	30	0.50 ± 0.02	0.28 ± 0.01	0.39 ± 0.07	0.34 ± 0.05	0.41 ± 0.05	0.33 ± 0.03	0.0033	–0.0027
Substructure count	26	0.77 ± 0.01	0.10 ± 0.01	0.60 ± 0.08	0.22 ± 0.05	0.61 ± 0.06	0.21 ± 0.04	0.0015	−0.0013
Klekota–Roth	111	0.71 ± 0.02	0.14 ± 0.01	0.54 ± 0.08	0.25 ± 0.05	0.56 ± 0.06	0.24 ± 0.03	0.0006	–0.0004
Klekota–Roth count	72	0.76 ± 0.02	0.11 ± 0.01	0.60 ± 0.08	0.22 ± 0.05	0.61 ± 0.07	0.21 ± 0.04	0.0006	−0.0005
2D atom pairs	42	0.49 ± 0.03	0.28 ± 0.02	0.35 ± 0.08	0.36 ± 0.04	0.35 ± 0.06	0.36 ± 0.03	0.0010	–0.0008
2D atom pairs count	38	0.75 ± 0.01	0.10 ± 0.01	0.52 ± 0.05	0.26 ± 0.05	0.54 ± 0.06	0.25 ± 0.04	0.0006	−0.0005

DOI: 10.7717/peerj.2322/table-3

To further check the reliability and validity of the selected model, Y-scrambling test was performed for 100 iterations. Table 3 demonstrates that QSAR models built using substructure count has a low Q² (−0.0013), which rules out the possibility of chance correlation. Furthermore, model afforded an $r_{m}^{2}$ value of 0.61 ± 0.06 thereby revealing its robustness. It is observed in Table 3 that the value of $Δ r_{m}^{2}$ is greater that 0.2 but also close to 0.20.

As shown in Fig. 4, it can also be seen that scatter plots of experimental versus predicted pIC₅₀ of panels A, C and F displayed narrower variance of the data points than the other methods as assessed via 10-fold cross-validation and external set.

Plot of experimental versus predicted pIC50 values for models constructed with 12 different fingerprint descriptors. — Figure 4: Plot of experimental versus predicted pIC₅₀ values for models constructed with 12 different fingerprint descriptors.
Shown are models built with CDK fingerprint (A), CDK extended fingerprint (B), E-State fingerprint (C), CDK graph only fingerprint (D), MACCS fingerprint (E), PubChem fingerprint (F), substructure fingerprint (G), substructure fingerprint count (H), Klekota–Roth fingerprint (I), Klekota–Roth fingerprint count (J), 2D atom pairs (K) and 2D atom pairs count (L).

Download full-size image

DOI: 10.7717/peerj.2322/fig-4

Applicability domain

The AD of the proposed QSAR model was defined as provided by the Williams plot shown in Fig. 5. The employed data set consisting of 2,570 compounds was randomly split to two separate subset in which the first subset constituting 80% of the data set was used as an internal set while the second subset constituting the remaining 20% were used as an external set. Compounds representing the internal set (blue dots) and external set (red dots) are shown in the Williams plot and it can be clearly seen that almost all of the 2,570 compounds were located within the boundaries of applicability domain, which indicated that our proposed QSAR model had a well-defined AD.

As can be seen in Fig. 5, very few compounds indeed fall outside the ±3 standardized residual range. This consisted of six compounds (997, 1829, 62, 1096, 13, 677) from the internal set and 11 compounds (2116, 2120, 2388, 2117, 2392, 2323, 2423, 2424, 2507, 2219, 2422) from the external set that had standardized residual higher than 3. On the other hand, 7 compounds (1567, 576, 1644, 1098, 2022, 1447 and 322) from the internal set and 8 compounds (2486, 2353, 2130, 2553, 2389, 2072, 2103, 2125) from the external set had standardized residual lower than −3. The corresponding chemical structures are provided in Table S1.

Mechanistic interpretation of feature importance

Feature importance analysis help reveal features that are important toward bioactivity. There are essentially two parameters for evaluating the relative importance of features used in models using the RF algorithm: (i) accuracy and (ii) Gini index (i.e., variance of the responses). The latter was selected as a metric for ranking important features (i.e., mean decrease of the Gini index) for predicting the pIC₅₀ of AChE inhibitors (Fig. 6). Table 4 lists the substructure fingerprints along with their respective descriptions.

Table 4:

List of top substructure fingerprints and their corresponding description.

Fingerprints	Description
SubFPC1	Primary carbon
SubFPC2	Secondary carbon
SubFPC3	Tertiary carbon
SubFPC5	Alkene
SubFPC18	Alkylaryether
SubFPC23	Amine
SubFPC26	Tertiary aliphatic amine
SubFPC28	Primary aromatic amine
SubFPC32	Secondary mixed amine
SubFPC35	Ammonium
SubFPC49	Ketone
SubFPC88	Carboxylic acid derivative
SubFPC100	Secondary amide
SubFPC135	Carbonyl derivative
SubFPC137	Vinylogous ester
SubFPC143	Carbonic acid diester
SubFPC153	Urethan
SubFPC171	Arylchloride
SubFPC180	Hetero N basic H
SubFPC181	Hetero N nonbasic
SubFPC182	Hetero O
SubFPC184	Heteroaromatic ring
SubFPC274	Aromatic ring
SubFPC275	Heterocyclic ring
SubFPC276	Epoxide
SubFPC287	Conjugated double bond
SubFPC295	C ONS bond
SubFPC296	Charged
SubFPC298	Cation
SubFPC300	1,3-Tautomerizable
SubFPC301	1,5-Tautomerizable
SubFPC302	Rotatable bond
SubFPC303	Michael acceptor
SubFPC307	Chiral center specified

DOI: 10.7717/peerj.2322/table-4

As can be seen in Fig. 6, the top ranking feature is secondary carbon (SubFPC2), which is a carbon atom with two carbon neighbors. In the context of drug design, such central carbon atom may be more difficult to be accessed and metabolized by cytochrome P450 (Uetrecht & Trager, 2007) and therefore are more metabolically stable.

Figure 6: Plot of feature importance as exemplified by the Gini index.

Download full-size image

DOI: 10.7717/peerj.2322/fig-6

The second most important feature is the rotatable bond (SubFPC302). Based on the rule of three for defining lead-like compounds, a compound may have a lead-like characters if it does have rotatable bonds of no more than 3. On the other hand, Veber et al. (2002) noted that the the upper limit of a orally bioavailable drugs is of seven rotatable bonds. Nevertheless, it has been found that number of rotatable bonds provide better discrimination between compounds that are orally active and those that are not. Kryger, Silman & Sussman (1999) claimed that E2020 (i.e., also known as donepezil and marketed as Aricept) needs at least two rotatable bonds on each side of the piperidine in which two aromatic moieties of E2020 interact with Trp86 and Trp286 (human AChE numbering), suggesting that links between aromatic systems of the inhibitor against its AChE counterparts are essential to yield high affinity.

The third important substructure is the aromatic ring (SubFPC274). Findings from X-ray crystallographic study showed that in the binding site of the co-crystal structure of AChE with tacrine, the aromatic ring of acridine engages in a π–π stacking interaction with the indole of Trp86 (human AChE numbering), thereby indicating the importance of the aromatic ring for AChE inhibition (Chen et al. 2012).

The fourth important feature is C ONS bond (SubFPC295), which is defined as the presence of any carbon connected with either oxygen, nitrogen or sulfur atom in a molecule. These atoms are considered as high electron density atoms, which exerted from higher electronegativity comparing with a carbon atom. Unequal sharing of electron pair making covalent bond contribute polarity and afford dipole moment to a molecule, which able to generate a dipole–dipole attraction such as hydrogen bond between two polar molecules. Furthermore, increasing of polarity and presenting of hydrogen bond improve water solubility, which is essential characteristic of a drug.

The fifth important substructure is secondary mixed amine (SubFPC32). The importance of the moiety was demonstrated in the work of Bembenek et al. (2008) in which a structure-based approach was used to reveal that in the catalytic triad, Trp86 interacts with the quaternary amine of ACh through a cation–π interaction. Furthermore, in the ‘anionic’ site Trp286 appears to attract the amine moiety via cation and/or hydrophobic interactions.

The sixth, seventh and eighth important substructures are heterocyclic rings (SubFPC275), tertiary carbon (SubFPC3) and primary carbon (SubFPC1). Heterocycles are of high relevance in the design of AChE inhibitors as it allows π–π stacking interaction with key amino acid residues in the binding site of AChE. It is observed that the binding site of the AChE are highly hydrophobic in nature. Particularly, the Trp286 (human AChE numbering) which is a part of the peripheral anionic site of the AChE is involved in the π–π interaction with heterocycles of AChE inhibitors (Lu et al., 2011). The aforementioned explanation made for the secondary carbon is also applicable for the tertiary carbon in which the higher number of carbon neighbors would also confer high stability against cytochrome P450 metabolism.

The ninth and tenth important substructures are Hetero N nonbasic (SubFPC181) and tertiary aliphatic amine (SubFPC26) in which the former is defined as aromatic nitrogen. The presence of the nitrogen atom, which is a cationic moiety deemed to interact with aromatic residues through π-cation interaction as observed in E2020 against Torpedo californica AChE (tcAChE). The charge of nitrogen atom located in piperidine ring provide π-cation interaction with the side chain of Phe337 (Phe330 in tcAChE numbering) (Guo et al., 2004). Since the active site gorge of AChE is comprised of several aromatic residues known as the aromatic patch, the addition of cationic moiety could possibly increase the binding affinity when the ligand is arranged in a suitable conformation with respect to the aromatic side chain of residues in the active site.

It is worthy to note that substructures pertaining to the covalent inhibitors, carbamates and organophosphates, were not found in the top ten important features. A manual inspection of the 307 substructure fingerprints revealed that there were none specifically describing the characteristic feature of carbamate and organophosphates. However, there were quite a few substructures that resembled partial features of the carbamate (e.g., carboxylic acid (SubFPC84), carboxylic ester (SubFPC85), “NOS methylen ester and similar” (SubFPC65), etc.) as well as substructures resembling partial features of organophosphates (e.g., phosphonic monoester (SubFPC230), phosphonic diester (SubFPC231), phosphonic monoamide (SubFPC232), phosphonic esteramide (SubFPC234), phosphonic acid derivative (SubFPC235), etc.). In spite of the presence of these descriptors, important features obtained from predictive model as revealed by the Gini index did not contain these features in the top ten. A possible explanation for such observation could be that there were a few carbamates (123 compounds) and organophosphates (18 compounds) present in the data set and its relative importance may have been masked by other features that represented the other structural class. Thus, it seems very lucrative for a future large-scale study to be performed focusing on these two compound classes.

Figure 7: Plot showing the distribution of active AChE inhibitors (gray circles) and the diversity set (red circles) selected for molecular docking.

Download full-size image

DOI: 10.7717/peerj.2322/fig-7

Figure 8: Chemical structures, binding energy and bioactivity of the diversity set consisting of 30 representative compounds from active AChE inhibitors.

Download full-size image

DOI: 10.7717/peerj.2322/fig-8

Molecular Docking of AChE Inhibitors

To gain a further understanding on the non-covalent interaction between AChE and their inhibitors, a chemically diverse set of 30 representative compounds was extracted from active AChE inhibitors (i.e., having IC₅₀ <1 µM) using the Kennard–Stone algorithm and subjected to an investigation on its binding modality against the active site of AChE. Figure 7 shows the distribution of the selected subset of compounds in the context of the full set of actives, which was found to provide a full coverage of the original chemical space. The chemical structures of these compounds are shown in Fig. 8.

The active site of this enzyme is buried inside a narrow gorge of 20 Å deep, which permits multiple enzyme-substrate interaction thereby facilitating the formation of the transition state of ACh (Silman & Sussman, 2008; Zhou, Wang & Zhang, 2010; Cheung et al., 2012). The entry of the active site gorge is lined up by peripheral anionic site (PAS), which is composed of Tyr72, Tyr124, Trp286 and Try341. The function of PAS is to trap the substrate via π-cation interaction and proceed through the constriction residues Tyr124 and Phe338 and onto the catalytic site (Silman & Sussman, 2008; Dvir et al., 2010). As a serine hydrolase, AChE contains Ser203, Glu334 and His447 in the catalytic triad that catalyzed the acylation and deacylation of ACh. The catalytic triad is surrounded by the catalytic anionic site (CAS) (i.e., contains Trp86, Glu202 and Tyr337), oxyanion hole (i.e., comprising of Glu121, Glu122 and Ala204) and the acyl pocket (i.e., comprised of Phe295 and Phe297). These sites interact with ACh and positions it in a suitable orientation for interacting with the catalytic triad as well as providing proton transfer that is essential for nucleophilic substitution during the catalytic reaction (Zhou, Wang & Zhang, 2010). As a result of the acylation process, the proton transfer from Ser203 to His447 induces the oxygen atom of Ser203 to engage in the nucleophilic attack of the carbonyl group of ACh, which consequently breaks down the choline moiety and forms a covalent acylenzyme complex between Ser203 and the acetyl group. This complex consequently proceeds with deacylation, which follows a similar mechanism with the acylation stage. The protonation of Glu202 provides a water molecule and the proton transfer from His447 to the water molecule leads to the nucleophilic attack against the acetyl group of the complex. Finally, this results in the breaking down of the complex thereby restoring wild-type AChE and causing the release of acetic acid from the active site (Zhou, Wang & Zhang, 2010). In the case of organophosphate and carbamate poisoning, the active site of AChE is phosphorylated or carbamylated, which is no longer capable to hydrolyze the ACh substrate. Organophosphates target and phosphorylates Ser203 in the same manner as that of ACh. However, the phosphoryl moiety is highly stable and leads to the irreversible inhibition of AChE. The mechanism of carbamate inhibition is virtually identical to organophosphate poisoning with the exception that the carbamylated serine moiety is less stable and is therefore able to be regenerated to the active enzyme form. Such understanding on the catalytic mechanism of substrate and inhibitory mechanism of AChE provides useful insights for the development of therapeutic agents targeting AChE (Fukuto, 1990).

Prior to carrying out the molecular docking calculations, the docking protocol was validated by re-docking the co-crystal ligand and protein. It was found that the re-docked ligand exhibited negligible deviation from the co-crystal conformation with an RMSD value of 0.963 Å, which was deemed to be suitable for further molecular docking investigation and its subsequent interpretation. Consequently, the binding modality was analyzed in order to gain understanding on the contribution of key residues in interacting with the investigated set of 30 compounds. This was performed using the SiMMap web server, which revealed three major binding anchors: Hbond1, vdW1 and vdW2 along with their site-moiety preferences. The first anchor site involves hydrogen bond interaction between Tyr124 (i.e., an important residue in the PAS that is spatially located as a bottleneck between the peripheral region and the catalytic site of AChE) and the following ligand moieties: secondary amide, secondary amine, nitrogen moiety in aromatic ring, ketone and ester. Such interaction can be observed in the co-crystal structure of huperzine A with human AChE (Cheung et al., 2012). Interestingly, analysis of the important features from QSAR models also revealed the importance of “C ONS bond,” “secondary mixed amine,” “heterocyclic” and “hetero N non-basic” as they were found to be in the top ten important substructures and is therefore crucial for forming hydrogen bonds. Furthermore, the other anchor sites involve van der Waals interaction in which members of the first van der Waals interaction site (vdW1) are comprised of Tyr124, Phe338 and Tyr341, which has a preference to interact with heterocyclic, aromatic, phenol and other non-polar moieties from representative inhibitors. The second van der Waals’ interaction site (vdW2) is consisted of Trp86 and Gly121 with preference for the following ligand moieties: aromatic ring, heterocyclic ring, aliphatic moiety with alkene linkage and phenol moiety. These residues contain either bulky aromatic ring or non-polar moiety as their side chain to provide the van der Waal’s surface contact against non-polar moiety from ligands. Notably, aromatic and heterocyclic substructures were also observed in the top ten important substructures for predicting the inhibitory activity of human AChE inhibitors.

Figure 9: Molecular docking of the top three representative compounds against AChE.
The binding between three subsites of AChE with 13, 5 and 28 are shown in (A, B) and (C), respectively. The representative compounds are highlighted as yellow sticks while residues participating in the peripheral site, constriction site and catalytic site are labeled as green, white and blue sticks, respectively. Hydrogen bonds are shown by blue dash line while the red dash line represents the π–π stacking interaction between ligand and residues in the active site of AChE.

Download full-size image

DOI: 10.7717/peerj.2322/fig-9

Furthermore, analysis of the binding energy from the 30 representative compounds revealed that compounds 13, 5 and 28 exhibited the lowest binding energy of −12.2, −12.0 and −12.0 kcal/mol, respectively, when interacting with the human AChE binding site, which is comparable to donepezil (−12.2 kcal/mol) as indicated in Fig. 8. Key interacting residues and their moiety preference was deduced from the protein–ligand interaction diagram generated by LigPlot+ (Wallace, Laskowski & Thornton, 1995) in combination with Maestro (Schrödinger, 2015b) and their three-dimensional structure was visualized by PyMOL (Schrödinger, 2015a).

Figure 9A revealed three major interaction subsites for 13 inside the binding pocket of human AChE. The first interaction site (Fig. 9A) is formed by residues from the PAS at the gorge opening consisting of Trp286 and Tyr341 both of which engages in π–π stacking interaction where the terminal benzene attached with pyrazole moiety of the ligand interacts with the former residue while the pyrazole moiety of 13 provide contact with the phenol moiety of the latter residue, which are deemed to increase the binding fitness against the active site of human AChE. The second interaction subsite (Fig. 9A) is dominated by Tyr124 and Phe338 in which the former interacts with the ketone moiety of 13 by means of a hydrogen bond, which tends to increase the binding affinity of this compound. The side chain of Phe338 is involved in hydrophobic interaction with carbon atoms from the piperidine ring and the central aliphatic chain of 13. It can be noted that these residues are members of constriction site, which arranged as bottleneck of active site. The third interaction subsite (Fig. 9A) is moderated by π–π stacking between Trp86 and the terminal benzene with an attached piperidine moiety as well as hydrophobic interaction between Tyr337 and the piperidine moiety of 13 in which both residues belonged to the CAS. In addition, Gly121 of oxyanion hole also interact with terminal benzene of the ligand through hydrophobic contact increasing binding affinity against catalytic site of AChE.

Analysis of the binding modality of compound 5 revealed interactions with all subsites of the AChE active site gorge as illustrated in Fig. 9B. PAS was the first subsite dominated by π-π stacking between Trp286 and the benzisoxazole moiety of compound 5, which is essential for stabilizing the binding affinity of the ligand against entry into the gorge. The second interaction site was observed at the constriction site in which the piperidine moiety makes contact with Phe338 via hydrophobic interaction thereby increasing the binding fitness against the bottleneck region of the active site. Similar hydrophobic interaction was also observed in the binding pocket in which Tyr337 from CAS interacts with the piperidine moiety and Gly121 of the oxyanion hole interacts with the terminal benzene of compound 5. The π-π interaction between the terminal benzene and Trp86 from the CAS was deemed to increase the binding fitness with the catalytic site of AChE. In addition, hydrogen bond interaction is facilitated by the nitrogen atom from Phe295 at the acyl pocket to the oxygen atom from the benzisoxazole moiety. Notably, all sites from the active site gorge are snugly bound by compound 5, which is deemed to exhibit strong intermolecular interaction with human AChE.

The binding energy of compound 28 was similar to that of compound 5 as indicated in Fig. 8. These compounds possessed several aromatic rings at both terminal, which are favourable for interacting with aromatic residues lining up the surface of the gorge and these are known as the aromatic patch. The 5,7-dihydrophenanthridine moiety facilitates π–π stacking with the side chain of Trp286 from PAS at the gorge opening. Meanwhile, this moiety also engages in hydrophobic interaction with Phe295, which tends to increase the binding fitness for the acyl pocket. Aside from the former moiety, 1,2,3,4-tetrahydroacridine at the opposite terminal provides π–π interaction with Trp86 and hydrophobic contact with Tyr337 where both of which are members of CAS in the catalytic site. Furthermore, long aliphatic chain linking the two aromatic moieties provide hydrophobic contact with several aromatic residues in the aromatic patch consisting of Tyr72, Tyr124, Trp286, Tyr337, Phe338 and Tyr341 (i.e., these residues are the members of PAS, CAS and constriction site of the gorge). Moreover, this chain contain several nitrogen atoms, which can act as hydrogen bond donor to Tyr124 and Tyr337 from PAS and CAS, respectively. This would tighten the binding between compound 28 and the active site gorge. The binding modality of this compound was shown in Fig. 9C.

It should be noted that compounds exhibiting strong binding fitness against AChE are those that interact with residues from both PAS and CAS at the entry and inner pocket of the gorge, respectively, as dual-binding site inhibitor through either π–π stacking or π-cation interaction together with hydrophobic contact. These compounds competes with the natural substrate in interacting with these residues. For non-covalent inhibitors, the aromatic moiety is preferred for occupying the interaction sites while hydrophobic moieties are preferred for making contact with the aromatic residues surrounding the catalytic site. Hydrogen bond donors such as secondary amine and heterocyclic ring can be employed for interacting with the oxygen atom on the side chain of Tyr residues. Interestingly, this finding is corroborated by the feature importance results obtained from the QSAR model as shown in Fig. 6 in which the aromatic moiety, C ONS bond, secondary mixed amine, heterocyclic ring and the hetero N non-basic moiety were found amongst the top ten important substructures that are essential for the bioactivity of AChE inhibitor.

Conclusion

In conclusion, twelve sets of fingerprint descriptors were used for constructing QSAR models and their performances were comparatively evaluated. It was observed that several fingerprint descriptors afforded good performance for the constructed models indicating that they could capture the feature space of AChE inhibitors. By taking advantage of the built-in feature importance estimator from RF known as the Gini index, the following important features that are critical for AChE inhibition were identified: secondary carbon (SubFPC2), rotatable bond (SubFPC302), aromatic (SubFPC274), C ONS bond (SubFPC295), secondary mixed amine (SubFPC32) and heterocyclic (SubFPC275). Results from molecular docking also support the aforementioned findings from the QSAR models in which the aromatic, heteroaromatic and heterocyclic rings were preferable moieties for interacting with the hydrophobic pocket of AChE. It is anticipated that the knowledge gained from this study could be used as general guidelines for the design of novel AChE inhibitors.

Supplemental Information

Outlying compounds as deduced from the Williams plot

DOI: 10.7717/peerj.2322/supp-1

Download

The initial and pre-processed data set of human AChE inhibitors as compiled from ChEMBL20

DOI: 10.7717/peerj.2322/supp-2

Download

R code for performing the multivariate analysis described in this study is provided in the enclosed R markdown files along with its output notebook in PDF file format

DOI: 10.7717/peerj.2322/supp-3

Download

[1] Aaviksaar A. 1990. QSAR in reactions of organophosphorus inhibitors with acetylcholinesterase. Phosphorus, Sulfur, and Silicon and the Related Elements 51(1–4):47-50

[2] Andersson CD, Hillgren JM, Lindgren C, Qian W, Akfur C, Berg L, Ekström F, Linusson A. 2014. Benefits of statistical molecular design, covariance analysis, and reference models in QSAR: a case study on acetylcholinesterase. Journal of Computer-Aided Molecular Design 29(3):199-215

[3] Beal MF. 1995. Aging, energy, and oxidative stress in neurodegenerative diseases. Annals of Neurology 38(3):357-366

[4] Bembenek SD, Keith JM, Letavic MA, Apodaca R, Barbier AJ, Dvorak L, Aluisio L, Miller KL, Lovenberg TW, Carruthers NI. 2008. Lead identification of acetylcholinesterase inhibitors-histamine H3 receptor antagonists from molecular modeling. Bioorganic & Medicinal Chemistry 16(6):2968-2973

[5] Birks J. 2006. Cholinesterase inhibitors for Alzheimer’s disease. Cochrane Database of Systematic Reviews (1):CD005593

[6] Bollback JP. 2006. SIMMAP: stochastic character mapping of discrete traits on phylogenies. BMC Bioinformatics 7:88

[7] Bourne Y, Taylor P, Marchot P. 1995. Acetylcholinesterase inhibition by fasciculin: crystal structure of the complex. Cell 83(3):503-512

[8] Breiman L. 2001. Random forests. Machine Learning 45(1):5-32

[9] Brookmeyer R, Johnson E, Ziegler-Graham K, Arrighi HM. 2007. Forecasting the global burden of Alzheimer’s disease. Alzheimer’s & Dementia 3(3):186-191

[10] Carhart RE, Smith DH, Venkataraghavan R. 1985. Atom pairs as molecular features in structure–activity studies: definition and applications. Journal of Chemical Information and Computer Sciences 25(2):64-73

[11] ChemAxon Kft. 2015. Standardizer. Version 15.9.14.0 software

[12] Chen Y, Fang L, Peng S, Liao H, Lehmann J, Zhang Y. 2012. Discovery of a novel acetylcholinesterase inhibitor by structure-based virtual screening techniques. Bioorganic & Medicinal Chemistry Letters 22(9):3181-3187

[13] Cheung J, Rudolph MJ, Burshteyn F, Cassidy MS, Gary EN, Love J, Franklin MC, Height JJ. 2012. Structures of human acetylcholinesterase in complex with pharmacologically important ligands. Journal of Medicinal Chemistry 55(22):10282-10286

[14] Deb PK, Sharma A, Piplani P, Akkinepally RR. 2012. Molecular docking and receptor-specific 3D-QSAR studies of acetylcholinesterase inhibitors. Molecular Diversity 16(4):803-823

[15] Durant JL, Leland BA, Henry DR, Nourse JG. 2002. Reoptimization of MDL keys for use in drug discovery. Journal of Chemical Information and Computer Sciences 42(6):1273-1280

[16] Dvir H, Silman I, Harel M, Rosenberry TL, Sussman JL. 2010. Acetylcholinesterase: from 3D structure to function. Chemico–Biological Interactions 187(1–3):10-22

[17] Eriksson L, Jaworska J, Worth AP, Cronin MT, McDowell RM, Gramatica P. 2003. Methods for reliability and uncertainty assessment and for applicability evaluations of classification-and regression-based QSARs. Environmental Health Perspectives 111(10):1361-1375

[18] Eriksson L, Johansson E. 1996. Multivariate design and modeling in QSAR. Chemometrics and Intelligent Laboratory Systems 34(1):1-19

[19] Fang J, Li Y, Liu R, Pang X, Li C, Yang R, He Y, Lian W, Liu A-L, Du G-H. 2015. Discovery of multitarget-directed ligands against Alzheimer’s disease through systematic prediction of chemical–protein interactions. Journal of Chemical Information and Modeling 55(1):149-164

[20] Fukuto TR. 1990. Mechanism of action of organophosphorus and carbamate insecticides. Environmental Health Perspectives 87:245-254

[21] Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP. 2012. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Research 40(D1):D1100-D1107

[22] Giacoppo JO, CC França T, Kuča K, Cunha EFD, Abagyan R, Mancini DT, Ramalho TC. 2015. Molecular modeling and in vitro reactivation study between the oxime BI-6 and acetylcholinesterase inhibited by different nerve agents. Journal of Biomolecular Structure and Dynamics 33(9):2048-2058

[23] Golbraikh A, Tropsha A. 2002. Beware of q²! Journal of Molecular Graphics and Modelling 20(4):269-276

[24] Goldblum A, Yoshimoto M, Hansch C. 1981. Quantitative structure–activity relationship of phenyl N-methylcarbamate inhibition of acetylcholinesterase. Journal of Agricultural and Food Chemistry 29(2):277-288

[25] Gramatica P. 2007. Principles of QSAR models validation: internal and external. QSAR & Combinatorial Science 26(5):694-701

[26] Guo J, Hurley MM, Wright JB, Lushington GH. 2004. A docking score function for estimating ligand–protein interactions: application to acetylcholinesterase inhibition. Journal of Medicinal Chemistry 47(22):5492-5500

[27] Gupta S, Mohan CG. 2014. Dual binding site and selective acetylcholinesterase inhibitors derived from integrated pharmacophore models and sequential virtual screening. BioMed Research International 2014:291214

[28] Hall LH, Kier LB. 1995. Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. Journal of Chemical Information and Computer Sciences 35(6):1039-1045

[29] Harel M, Schalk I, Ehret-Sabatier L, Bouet F, Goeldner M, Hirth C, Axelsen P, Silman I, Sussman J. 1993. Quaternary ligand binding to aromatic residues in the active-site gorge of acetylcholinesterase. Proceedings of the National Academy of Sciences of the United States of America 90(19):9031-9035

[30] Huang W, Tang L, Shi Y, Huang S, Xu L, Sheng R, Wu P, Li J, Zhou N, Hu Y. 2011. Searching for the multi-target-directed ligands against Alzheimer’s disease: discovery of quinoxaline-based hybrid compounds with AChE, H3R and BACE 1 inhibitory activities. Bioorganic & Medicinal Chemistry 19(23):7158-7167

[31] James G, Witten D, Hastie T, Tibshirani R. 2013. An introduction to statistical learning: with applications in R. New York: Springer.

[32] Kennard RW, Stone LA. 1969. Computer aided design of experiments. Technometrics 11(1):137-148

[33] Kitz R, Wilson IB. 1962. Esters of methanesulfonic acid as irreversible inhibitors of acetylcholinesterase. Journal of Biological Chemistry 237(10):3245-3249

[34] Klekota J, Roth FP. 2008. Chemical substructures that enrich for biological activity. Bioinformatics 24(21):2518-2525

[35] Kryger G, Silman I, Sussman JL. 1999. Structure of acetylcholinesterase complexed with E2020 (Aricept^®): implications for the design of new anti-Alzheimer drugs. Structure 7(3):297-307

[36] Kuca K, Soukup O, Maresova P, Korabecny J, Nepovimova E, Klimova B, Honegr J, Ramalho TC, França TC. 2016. Current approaches against Alzheimer’s disease in clinical trials. Journal of the Brazilian Chemical Society 27(4):641-649

[37] Kuhn M. 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28(5):1-26

[38] Laggner C. 2005. SMARTS patterns for functional group classification. Inte:Ligand Software-Entwicklungs und Consulting GmbH. software

[39] Lee S, Barron MG. 2016. A mechanism-based 3D-QSAR approach for classification and prediction of acetylcholinesterase inhibitory potency of organophosphate and carbamate analogs. Journal of Computer-Aided Molecular Design 30(4):347-363

[40] Lu S-H, Wu JW, Liu H-L, Zhao J-H, Liu K-T, Chuang C-K, Lin H-Y, Tsai W-B, Ho Y. 2011. The discovery of potential acetylcholinesterase inhibitors: a combination of pharmacophore modeling, virtual screening, and molecular docking studies. Journal of Biomedical Science 18(8) Article 22

[41] Mager P. 1983. QSAR applied to aging of phosphylated acetylcholinesterase. Pharmazie 38(4):271-272

[42] Morris JC, Cyrus PA, Orazem J, Mas J, Bieber F, Ruzicka BB, Gulanski B. 1998. Metrifonate benefits cognitive, behavioral, and global function in patients with Alzheimer’s disease. Neurology 50(5):1222-1230

[43] Mundy R, Bowman M, Farmer J, Haley T. 1978. Quantitative structure activity study of a series of substituted 0,0-dimethyl 0-(p-nitrophenyl) phosphorothioates and 0-analogs. Archives of Toxicology 41(2):111-123

[44] Nantasenamat C, Isarankura-Na-Ayudhya C, Naenna T, Prachayasittikul V. 2009. A practical overview of quantitative structure–activity relationship. EXCLI Journal 8(7):74-88

[45] NCBI. 2009. PubChem Substructure Fingerprint. Version 1.3 software

[46] Ordentlich A, Barak D, Kronman C, Flashner Y, Leitner M, Segall Y, Ariel N, Cohen S, Velan B, Shafferman A. 1993. Dissection of the human acetylcholinesterase active center determinants of substrate specificity. Identification of residues constituting the anionic site, the hydrophobic site, and the acyl pocket. Journal of Biological Chemistry 268(23):17083-17095

[47] Prado-Prado FJ, Escobar M, Garcia-Mera X. 2013. Review of bioinformatics and theoretical studies of acetylcholinesterase inhibitors. Current Bioinformatics 8(4):496-510

[48] Prado-Prado F, Garcia-Mera X, Escobar M, Alonso N, Caamano O, Yanez M, Gonzalez-Diaz H. 2012. 3D MI-DRAGON: new model for the reconstruction of US FDA drug-target network and theoretical-experimental studies of inhibitors of rasagiline derivatives for AChE. Current Topics in Medicinal Chemistry 12(16):1843-1865

[49] Puzyn T, Mostrag-Szlichtyng A, Gajewicz A, Skrzyński M, Worth AP. 2011. Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Structural Chemistry 22(4):795-804

[50] Quinn DM. 1987. Acetylcholinesterase: enzyme structure, reaction dynamics, and virtual transition states. Chemical Reviews 87(5):955-979

[51] Racchi M, Mazzucchelli M, Porrello E, Lanni C, Govoni S. 2004. Acetylcholinesterase inhibitors: novel activities of old molecules. Pharmacological Research 50(4):441-451

[52] Riniker S, Landrum GA. 2013. Open-source platform to benchmark fingerprints for ligand-based virtual screening. Journal of Cheminformatics 5 Article 26

[53] Roy K, Chakraborty P, Mitra I, Ojha PK, Kar S, Das RN. 2013. Some case studies on application of “ $r_{m}^{2}$ ” metrics for judging quality of quantitative structure–activity relationship predictions: emphasis on scaling of response data. Journal of Computational Chemistry 34(12):1071-1082

[54] Schrödinger LLC. 2015a. PyMOL Molecular Graphics System. Version 1.7.6.3 software

[55] Schrödinger LLC. 2015b. Maestro. Version 10.4 software

[56] Shen T, Tai K, Henchman RH, McCammon JA. 2002. Molecular dynamics of acetylcholinesterase. Accounts of Chemical Research 35(6):332-340

[57] Silman I, Sussman JL. 2008. Acetylcholinesterase: how is structure related to function? Chemico-Biological Interactions 175(1–3):3-10

[58] Simeon S, Möller R, Almgren D, Li H, Phanus-umporn C, Prachayasittikul V, Bülow L, Nantasenamat C. 2016. Unraveling the origin of splice switching activity of hemoglobin β-globin gene modulators via QSAR modeling. Chemometrics and Intelligent Laboratory Systems 151:51-60

[59] Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. 2003. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences 43(2):493-500

[60] Su C, Lien E. 1980. QSAR of acetylcholinesterase inhibitors: a reexamination of the role of charge-transfer. Research Communications in Chemical Pathology and Pharmacology 29(3):403-415

[61] Tougu V. 2001. Acetylcholinesterase: mechanism of catalysis and inhibition. Current Medicinal Chemistry—Central Nervous System Agents 1(2):155-170

[62] Tropsha A, Gramatica P, Gombar VK. 2003. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR & Combinatorial Science 22(1):69-77

[63] Trott O, Olson AJ. 2010. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry 31(2):455-461

[64] Uetrecht J, Trager W. 2007. Drug metabolism: chemical and enzymatic aspects. Boca Raton: CRC Press.

[65] Veber DF, Johnson SR, Cheng H-Y, Smith BR, Ward KW, Kopple KD. 2002. Molecular properties that influence the oral bioavailability of drug candidates. Journal of Medicinal Chemistry 45(12):2615-2623

[66] Veselinović J, Nikolić G, Trutić N, Živković J, Veselinović A. 2015. Monte Carlo QSAR models for predicting organophosphate inhibition of acetycholinesterase. SAR and QSAR in Environmental Research 26(6):449-460

[67] Wallace AC, Laskowski RA, Thornton JM. 1995. LIGPLOT: a program to generate schematic diagrams of protein–ligand interactions. Protein Engineering 8(2):127-134

[68] Walters D, Hopfinger A. 1986. Case studies of the application of molecular shape analysis to elucidate drug action. Journal of Molecular Structure 134(3–4):317-323