← Back
A Ru(II) polypyridyl complex bearing aldehyde functions as a versatile synthetic precursor for long-wavelength absorbing photodynamic therapy photosensitizers.
TYPE Methods
PUBLISHED 15 November 2023
DOI 10.3389/fphar.2023.1260349
OPEN ACCESS
EDITED BY
Wenying He,
Hainan Normal University, China
REVIEWED BY
Ailin Zhao,
Sichuan University, China
Amit Kumar Halder,
University of Porto, Portugal
Prediction of histone deacetylase
inhibition by triazole compounds
based on artificial intelligence
Yiran Wang and Peijian Zhang*
College of Computer Science and Technology, Qingdao University, Qingdao, Shandong Province, China
*CORRESPONDENCE
Peijian Zhang,
peijianzh@126.com
RECEIVED 17 July 2023
ACCEPTED 30 October 2023
PUBLISHED 15 November 2023
CITATION
Wang Y and Zhang P (2023), Prediction of
histone deacetylase inhibition by triazole
compounds based on
artificial intelligence.
Front. Pharmacol. 14:1260349.
doi: 10.3389/fphar.2023.1260349
COPYRIGHT
© 2023 Wang and Zhang. This is an openaccess article distributed under the terms
of the Creative Commons Attribution
License (CC BY). The use, distribution or
reproduction in other forums is
permitted, provided the original author(s)
and the copyright owner(s) are credited
and that the original publication in this
journal is cited, in accordance with
accepted academic practice. No use,
distribution or reproduction is permitted
which does not comply with these terms.
A quantitative structure-activity relationship (QSAR) study was conducted to
predict the anti-colon cancer and HDAC inhibition of triazole-containing
compounds. Four descriptors were selected from 579 descriptors which have
the most obvious effect on the inhibition of histone deacetylase (HDAC). Four
QSAR models were constructed using heuristic algorithm (HM), random forest
(RF), radial basis kernel function support vector machine (RBF-SVM) and support
vector machine optimized by particle swarm optimization (PSO-SVM).
Furthermore, the robustness of four QSAR models were verified by K-fold
cross-validation method, which was described by Q2. In addition, the R2 of the
four models are greater than 0.8, which indicates that the four descriptors
selected are reasonable. Among the four models, model based on PSO-SVM
method has the best prediction ability and robustness with R2 of 0.954, root mean
squared error (RMSE) of 0.019 and Q2 of 0.916 for the training set and R2 of 0.965,
RMSE of 0.017 and Q2 of 0.907 for the test set. In this study, four key descriptors
were discovered, which will help to screen effective new anti-colon cancer drugs
in the future.
KEYWORDS
cancer, HDAC inhibition, quantitative structure-activity relationship, support vector
machine, particle swarm optimization
1 Introduction
Colon and rectal cancers are the most common gastrointestinal tumors (Chen et al.,
2019; Pan et al., 2023). Currently, colon cancer is one of the most common solid
malignancies, which is the third leading cause of cancer-related new cases and deaths
worldwide (Chen et al., 2019; Shi et al., 2022a; Shi et al., 2022b). Most patients with colon
cancer present with advanced disease, whose survival rate is very low. More than 95% of the
patients with colorectal cancer in the diagnosis of aged 50 or older, and the overall survival
rate for advanced, metastatic, and recurrent colon cancer is less than 50%. (Wang et al., 2015;
Shi et al., 2022a). Due to the numerous factors in the development and progression of colon
cancer, its pathogenesis is still unclear. Current treatment options for colon cancer include
surgery, chemotherapy, and molecularly targeted therapy. Treatments for colon cancer are
limited, and a large percentage of patients develop resistance to current treatments (Schmoll
and Stein, 2014). Therefore, the study of new therapeutic strategies and the development of
new drugs are the key points in the research of colon cancer (Xiong et al., 2023).
At present, an emerging therapeutic approach for colon cancer is the use of
corresponding histone deacetylase (HDAC) inhibitors (Tavares et al., 2017). HDAC is a
kind of epigenetic antitumor drug targets (Choi et al., 2023). Because of the important role of
HDAC in various biological processes such as cell proliferation, metastasis and apoptosis,
Frontiers in Pharmacology
01
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
experimental environment for 72 h to ensure the accuracy of the
experiment17. The compounds used in this study and their IC50
values are shown in Supplementary Table S1. All compounds were
randomly divided into the training set and test set in the ratio of 4:1, of
which 48 compounds in training set were used to construct models and
the remaining 12 compounds in test set were used to evaluate the
performance of the models (Fan et al., 2018).
HDAC inhibitors has been widely studied as a novel anticancer drug
target (Gillette, 2021; Roy et al., 2023). Though HDAC inhibitors
have not been approved by FDA to treat colon cancer, some
preclinical studies have discovered its efficacy to treat colon
cancer in vitro and in vivo (Kang et al., 2009; Asklund et al.,
2012; Yao et al., 2014). However, there are some limitations of
current approved HDAC inhibitors, such as pan-inhibition, etc.
Thus, it is in urgent need to develop novel HDAC inhibitors to
improve colon cancer treatment (Place et al., 2005; Sang et al., 2023)
Nan Sun et al. designed and synthesized a series of triazolecontaining compounds as novel HDAC inhibitors, which have
significant anti-proliferation effect on murine and human colon
cancer cell lines MC38 and HCT116 (Sun et al., 2023). In the studies
of discovering new HDAC inhibitors, the measurement of HDAC
inhibition IC50 values of compounds has great influence for the
design of new effective anti-colon cancer drugs. Since numerous
chemical experiments are costly and time-consuming, a new and
effective method for predicting the IC50 of untested compounds
should be found (Zhao et al., 2020).
In 1964, the concept of the quantitative structure-activity
relationship (QSAR) was first proposed by Free et al. and then
was widely used (Free and Wilson, 1964; Hansch and Steward, 1964;
Myint and Xie, 2010). QSAR is based on the general principle of
medicinal chemistry that the biological activities of a ligand or
compound is related to its molecular structure or properties, and
molecules with similar structures may have similar biological
activities (Myint and Xie, 2010). Model established based on
QSAR can reveal quantitative structure-activity relationship
between biological activities and part of descriptors of set of
known compounds with similar structure (Huang et al., 2021).
Then QSAR model can be used to predict the activities of
unknown compounds that have similar structure with the
previous known compounds, which has widely used in the
process of screening out efficient and novel drugs (Sun et al., 2023).
Therefore, four QSAR models were established in this study to
predict the HDAC inhibition IC50 of 60 selected compounds based
on descriptors selected by the heuristic method (HM). The methods
establishing models in study are HM, random forest (RF), support
vector machine with radial basis kernel function (RBF-SVM) and
RBF kernel function support vector machine with particle swarm
optimization (PSO-SVM). In addition, the HM method was also
used to select descriptors. Among the four models, the model
constructed by PSO-SVM has the best performance and strongest
robustness. In addition, the R2 of the established four models meet
the requirement of predicting IC50 of compounds, which indicates
the descriptors used in the models were enough and good for drug
design. Overall, this study will provide efficient guidance and help
for the screening of a new type of colon cancer drug.
2.2 Calculating descriptors
The calculation of molecular descriptors and the selection of
appropriate molecular descriptors are the key prerequisites of
establishing QSAR models, which directly affects the
performance of QSAR models, such as accuracy of prediction.
Comprehensive descriptors for structural and statistical analysis
(CODESSA) is currently the more common package to calculate
molecular descriptors and perform statistical analysis (Katritzky
et al., 2005; Zhang et al., 2013). The steps for calculating and
selecting molecular descriptors are as follows. Firstly, ChemDraw
was used to draw the structure of the compounds to get mol file
and skc file (Evans, 2014). Secondly, the compounds in mol
format were optimized using HyperChem software under the
guidance of the theory that the lower the energy of a molecule is,
the more stable its structure is. In the optimization process, MM+
molecular mechanical force field was used to preliminarily
optimize
the
compound,
and
then
semi-empirical
AM1 method was used to further optimize the compound to
obtain the most stable structure, which is beneficial to improve
the calculation accuracy of molecular descriptors (Lima et al.).
The optimized structures by HyperChem were stored in hin and
zmt formats as the input of MOPAC. After that, MOPAC was
used to generate mno files as the input files of CODESSA to
calculate descriptors. Five categories of molecular descriptors
were obtained using CODESSA, which are structural, topological,
geometric, electrostatic, and quantum chemical (Coi et al., 2006;
Madugula and Yarasi, 2017).
2.3 Linear model by HM
HM is a common and effective method for selecting descriptors
which was widely used at present (Wang et al., 2020; Gao et al.,
2022). This method is not limited by the size of data set, and is highly
efficient (Yang et al., 2023; Li et al., 2023). Therefore, HM was used
to select the appropriate descriptor from the specific descriptors
computed by CODESSA. This method can select the descriptors
responsible for activity from the descriptor set by building a multiple
linear regression models. The following descriptors should be
excluded before building linear regression models by HM. 1)
Special descriptors that not all compounds possess. 2)
Descriptors with correlation coefficients greater than 0.8, namely
collinear descriptors.
In this study, square of correlation coefficient (R2) and root
mean square error (RMSE) were used to evaluate and analyze the
performance of models established by HM, RF, SVM and PSOSVM. In addition, the robustness of the models were verified by
K-fold cross-validation.
2 Material and methods
2.1 Data set
The 60 compounds containing triazole studied in this paper are all
from the same literature, which eliminates unforeseen problems due to
data from different sources17]. The IC50 value of HDAC inhibition was
determined by Sun et al. after exposing the compound to the same
Frontiers in Pharmacology
02
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
FIGURE 1
Influence of the Number of descriptors on R2 and Rcv2.
2.4 Nonlinear model by RF
summarized as transforming the linearly inseparable samples of the
low-dimensional input space into the high-dimensional feature
space by using the nonlinear mapping algorithm, so that the
linear regression can be performed in the high-dimensional
feature space (Collobert and Bengio, 2001; Khemchandani and
Jayadeva, 2009; Huang and Zhao, 2018).
By introducing ε-insensitive loss function, regularization
constant C, relaxation variable ξ i , ξ^i , normal vector w,
displacement b to be determined, the SVR is formalized as
follows in Eq. 1, and the constraint as follows in Eq. 2:
The IC50 value of HDAC inhibition is influenced by many
factors, so the linear model cannot accurately predict the IC50 of
HDAC. Therefore, three kinds of nonlinear models were established
by RF, SVM and PSO-SVM. Random forest is a supervised machine
learning algorithm based on ensemble learning (Feng et al., 2019). It
can effectively reduce the risk of overfitting and is more conducive to
obtain a robust model. Therefore, it is a novel and efficient method
to establish QSAR nonlinear models (Zhang et al., 2015a; Fang et al.,
2022.).
The steps to build a RF regression model are: 1) First, build a
dataset containing the descriptor values and -lg (IC50) values for
training set and test set. 2) Build RF model. Set the number of
decision trees to control the RF’s behavior (Li et al., 2012; Song,
2015; Ao et al., 2019). 3) Train the RF using the training set. RF
method constructed multiple decision trees based on descriptor data
and IC50 values in training set, and performs feature selection and
partitioning in each tree. 4) Use the trained RF model to predict the
samples in the test set. In this model, the prediction results of each
decision tree were weighted to obtain the predicted IC50 value. 5)
Use a variety of performance indicators to evaluate prediction
accuracy and generalization ability of RF model.
f(x) min
m
w2 + Cξ i + ξ^i
(1)
i1
T
⎪
⎪
⎧
⎫
⎪
⎨ w · xiT+ b − yi ≤ ε + ξ i , ⎪
⎬
s.t.⎪ yi − w · xi − b ≤ ε + ξ^ ⎪
⎪
⎩ ξ ≥ 0,ξ^ ≥ 0,i 1, 2...m ⎪
⎭
i
(2)
Therefore, the final linear regression function of SVR is obtained
as shown in Eq. 3:
m
f(x) (α^i − αi )κ xTi x + b
(3)
i1
In Eq. 3, κ(xTi x) is the kernel function, αi and α^i are Lagrange
multipliers. Several kernel functions commonly used in support
vector machines include linear kernel function, polynomial
kernel function, radial basis kernel function and so on. The
core idea of RBF is to map each sample point to an infinite
dimensional feature space, so as to make linearly indivisible data
linearly separable. It is the most commonly used kernel function
and shown in Eq. 4:
2.5 Nonlinear model by RBF-SVM
Support vector machine (SVM) proposed by Vapnik and
colleagues in 1964 is a generalized linear classifier that classifies
data through supervised learning (Girosi, 1998). SVM can also be
used for regression, which is called support vector regression (SVR)
(Santamaria-Bonfil et al., 2016). The main idea of SVR can be
Frontiers in Pharmacology
1
w,b,ξ i ,ξ^ 2
κ(xi , x) exp −γxi − x2
03
(4)
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
FIGURE 2
The plot of measured and predicted -lg (IC50) by HM.
3 Result
where γ is a hyperparameter of a radial basis kernel function.
3.1 Results of HM
2.6 Nonlinear model by PSO-SVM
579 descriptors were calculated by CODESSA. To obtain
several descriptors most relevant to the HDAC inhibition, the
number of molecular descriptors in linear models were increased
from 1 to 7, and the corresponding R2 and Rcv2 were recorded.
Considering that excessive selection of descriptors is not
conducive to drug screening and design, the number of
descriptors were determined to be 4 of which R2 and Rcv2 both
reached about 0.9. The influence of the number of descriptors on
R2 and Rcv2 is shown in Figure 1.
The four selected molecular descriptors and their
physicochemical meanings are shown in Supplementary Table S2.
Their correlation coefficients are shown in Supplementary Table S3,
all of which are less than 0.8.
The multiple regression linear model established by HM, is
shown in Eq. 7.
Because complexity of optimizing many parameters in RBFSVM is high, particle swarm optimization (PSO) algorithm was
introduced in the parameter optimization process, which can
converge to the global optimal solution with high efficiency. PSO
was proposed by Kennedy and Eberhart in 1995, which is one of the
most widely used optimization algorithm (Eberhart and Shi, 2004;
Zhang et al., 2015b).
The basic concept of PSO is derived from the study of the
foraging behavior of birds (Gong et al., 2016; Bonyadi and
Michalewicz, 2017; Pervaiz et al., 2021). PSO can be expressed as:
each particle can be regarded as a search individual in the
N-dimensional search space, which iterates continuously, updates
the speed and position, and finally obtains the optimal solution
satisfying the termination condition (Li et al., 2002; Lin et al., 2008;
Subasi, 2013). The formula for PSO update speed and location is
shown below:
-lg(IC50 ) -0.351132-2.165127*d1 +0.940460*d2 -0.669793*d3
(7)
+0.800716*d4
Vid ωVid + C1 random(0, 1)(Pid − Xid )
+ C2 random(0, 1)Pgd − Xid
Xid Xid + Vid
where d1, d2, d3 and d4 represented MERHN, MREHN, MNRIN and
MVO, respectively.
The R2 and RMSE of the training set and the test set in the model
are 0.917, 0.832 and 0.044, 0.056 respectively. The plot for HM
model is shown in Figure 2. In addition, the Q2 of the training set and
the test set are 0.832 and 0.804 respectively through K-fold crossvalidation, where the K value is 5.
(5)
(6)
where ω is the inertial factor that specifies search step size. C1 and C2
are acceleration constants, Xid represents the d-dimensional position
of each particle i, and Pgd represents the D-dimension of the global
optimal solution.
Frontiers in Pharmacology
04
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
FIGURE 3
The plot of measured and predicted -lg (IC50) by RF.
underfit. The greater the ε is, the less the support vector is, and the
greater the support vector is.
Grid search is the simplest and most widely used
hyperparameter search algorithm, which determines the optimal
value by looking for all the points within the search range. The
optimal values of C and γ were determined by using grid search. The
optimal values of C, γ and ε are 9.11,10.24 and 0.1, respectively. The
optimal prediction results of RBF-SVM are shown in Figure 4. The
R2 and RMSE of the training set and prediction set using RBF-SVM
are 0.957,0.022 and 0.944,0.025, respectively. When K-fold (K = 5)
cross-validation was executed, Q2 for the training set and the test set
are 0.897 and 0.871, respectively.
3.2 Results of RF
In order to ensure that the results of HM, RF, RBF-SVM and
PSO-SVM can be compared with each other, the 4 descriptors
selected by HM were used to build other three models by RF,
RBF-SVM and PSO-SVM.
When using the RF method, it is necessary to determine the values
of some parameters in RF, such as the number of decision trees n, the
maximum depth of the tree d, the number of samples contained in each
internal node s1, and the number of samples contained in each leaf node
s2. Among them, the larger n is, the better the effect of the model tends
to be. However, when n is larger to a certain extent, the decision
boundary is reached, and the accuracy of the random forest usually
stops rising or starts to fluctuate.
Set n to 100, d to the default value of the python library, s1 to 2,
and s2 to 1. The R2 and RMSE for the training set are 0.982 and 0.009,
and the R2 and RMSE for the test set are 0.841 and 0.063. The
optimal prediction results of RF are shown in Figure 3. The results
Q2 of K-fold cross-validation for training set and test set are
0.886 and 0.823 respectively, where the K value is 5.
3.4 Results of PSO-SVM
Many parameters need to be determined during the modeling
process using RBF-SVM, and PSO algorithm with character of easy
implement, high precision, fast convergence can quickly find the
optimal solution. Therefore, PSO algorithm was used instead of grid
search to find the optimal parameters. The particle swarm size and
number of iterations were set to 600 and 1,500, respectively.
The optimal values of C, γ and ε are 9.13, 1.82 and 0.0,
respectively. The R2 and RMSE of the training and prediction
sets using RBF-SVM were 0.966,0.018 and 0.975,0.012, and Q2
were 0.903 and 0.896, respectively. The optimal prediction results
of PSO-SVM are shown in Figure 5.
3.3 Results of RBF-SVM
The RBF-SVM method needs to determine the values of some
parameters, such as the penalty coefficient C, ε-insensitive loss
function, and the parameter γ of the kernel function. As the
penalty coefficient used to control the loss function, if C is too
large, the penalty for false regression predictions is too large, which
is easy to lead to overfitting. If C is too small, the penalty for false
regression predictions is too small, which easily leads to underfitting.
As a parameter of the radial basis kernel function, the larger the γ is,
the easier it is to overfit, and the smaller the γ is, the easier it is to
Frontiers in Pharmacology
4 Comparison of different results
The prediction results of the four models are shown in
Supplementary Table S4, and the K-fold cross-validation results
05
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
FIGURE 4
The plot of measured and predicted -lg (IC50) by RBF-SVM.
FIGURE 5
The plot of measured and predicted -lg (IC50) by PSO-SVM.
RBF-SVM and PSO-SVM methods are stronger than that of the
linear model HM. However, the prediction accuracy of the model
built by RF method on the training set was very high, but the
prediction accuracy of the test set was relatively ordinary, which
indicates that the model built by RF method is overfitting. The
prediction accuracy of the models built by RBF-SVM and PSO-SVM
are shown in Supplementary Table S5, where the value of K was set
to 5. Moreover, it can be seen from Supplementary Table S4 that the
RMSE of PSO-SVM is the smallest, indicating the best degree of
fitting.
It can be seen from Supplementary Table S4 that the prediction
accuracy and robustness of the nonlinear model established by RF,
Frontiers in Pharmacology
06
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
robustness, indicating that the models constructed by RBF-SVM
and PSO-SVM have a broad application prospect in the study of the
inhibitory effect of triazole-containing compounds on colon cancer.
In addition, this study revealed 4 key descriptors that influence the
inhibition of HDAC: Max e–e repulsion for a H–N bond, Max
resonance energy for a H-N bond, Max nucleoph react index for a N
atom and Min valency of a O atom.
In addition to some traditional and common QSAR models, this
study also used particle swarm optimization to optimize the SVM
model, which greatly improved the prediction accuracy, making the
accuracy of the test set increased to 0.975, which will provide
guidance and help for the future research on anti-colon cancer
drugs.
are very high. Among the four models, the model established by
PSO-SVM has the strongest prediction ability and stability, for the
PSO algorithm can efficiently find more optimized parameters.
5 Discussion
In this study, 4 descriptors were selected from 579 descriptors of
60 compounds containing triazole, and 4 QSAR models were
established using HM, RF, SVM and PSO-SVM methods to
predict the HDAC inhibition. Among the four models, the
prediction ability and stability of PSO-SVM are the best,
indicating that the model established by PSO-SVM method has a
broad application prospect in searching for compounds with
significant anti-colon cancer effect, and can be used as an
effective method to assist drug design. In addition, this study also
revealed four descriptors with significant inhibitory effects on
HDAC: Max e-e repulsion for a H-N bond, Max resonance
energy for a H-N bond, Max nucleoph react index for a N atom
and Min valency of a O atom.
Among the four descriptors, Max e-e repulsion for a h-n bond
is the one that has the most significant inhibiting effect on
HDAC. Because it is always selected as the top node and has
the highest Gini coefficient in the model built by RF. It reflects the
repulsion force between electrons and plays a key role in forming
the momentum distribution of the final correlated double
electrons. The second descriptor is Max resonance energy for
an H-N bond. The bond resonance energy represents the
contribution of a given bond in a molecule to the topological
resonance energy. If a molecule has one or more bonds with a
large negative bond resonance energy, the molecule is very
chemically reactive. The third descriptor Max nucleoph react
index for a N atom is a quantum chemical descriptor, which
indicates the strength of covalent bond in the molecule,
represents the maximum nuclear reaction index of N atoms.
Min valency of a O atom is a quantum chemical descriptor whose
scope goes beyond the strength of intramolecular adhesion and
accounts for the stability of the molecule and its conformational
flexibility. Attempts to increase the valence of the O atom in the
substituent may help to reduce the IC50 value of HDAC
inhibition.
The compounds numbered 20 and 27 in Supplementary
Table S1 have lower IC 50 values, so other similar compounds
with the above descriptors may be novel anti-colon cancer
inhibitors that could be designed as potential drugs. Overall,
this study revealed four descriptors with significant HDAC
inhibition, which could help in the design of novel anti-colon
cancer drugs in the future.
Data availability statement
The original contributions presented in the study are included in
the article/Supplementary Material, further inquiries can be directed
to the corresponding author.
Author contributions
YW: Data curation, Methodology, Validation, Writing–original draft.
PZ: Formal Analysis, Supervision, Validation, Writing–review and editing.
Funding
The author(s) declare that no financial support was received for
the research, authorship, and/or publication of this article.
Conflict of interest
The authors declare that the research was conducted in the
absence of any commercial or financial relationships that could be
construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the
authors and do not necessarily represent those of their affiliated
organizations, or those of the publisher, the editors and the
reviewers. Any product that may be evaluated in this article, or
claim that may be made by its manufacturer, is not guaranteed or
endorsed by the publisher.
6 Conclusion
Supplementary material
PyCharm Community Edition 2022.2.1 was used for
experiments, and the scikit learn library was used to build
machine learning models. The models established by RBF-SVM
and PSO-SVM have good prediction performance and strong
Frontiers in Pharmacology
The Supplementary Material for this article can be found online
at: https://www.frontiersin.org/articles/10.3389/fphar.2023.1260349/
full#supplementary-material
07
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
References
Khemchandani, R., and Jayadeva, C. S. (2009). Regularized least squares fuzzy support
vector regression for financial time series forecasting. EXPERT Syst. Appl. 36 (1),
132–138. doi:10.1016/j.eswa.2007.09.035
Ao, Y. L., Li, H. Q., Zhu, L. P., Ali, S., and Yang, Z. G. (2019). The linear random
forest algorithm and its advantages in machine learning assisted logging
regression modeling. J. PETROLEUM Sci. Eng. 174, 776–789. doi:10.1016/j.
petrol.2018.11.067
Li, A., Qin, Z., Bao, F., and He, S. (2002). Particle swarm optimization algorithms.
Comput. Eng. Appl. 38 (21), 1–3.
Asklund, T., Kvarnbrink, S., Holmlund, C., Wibom, C., Bergenheim, T., Henriksson,
R., et al. (2012). Synergistic killing of glioblastoma stem-like cells by bortezomib and
HDAC inhibitors. ANTICANCER Res. 32 (7), 2407–2413. doi:10.1093/annonc/mds166
Li, G. L., Wang, X. Q., Li, A. Q., and Zhang, P. J. (2023). QSAR study on the IC50 of
thiosemicarbazone derivatives as PC-3 inhibitors based on mixed kernel function
support vector machine. Lat. Am. J. Pharm. 42 (3), 543–553.
Bonyadi, M. R., and Michalewicz, Z. (2017). Particle swarm optimization for single
objective continuous space problems: a review. Evol. Comput. 25 (1), 1–54. doi:10.1162/
EVCO_r_00180
Li, Z., Zhang, T., and Wu, X. (2012). Methodology of regression by random forest and
its application on metabolomics. Chin. J. Health Statistics 29 (2), 158–160.
Chen, N., Chen, J., Yao, B., and Li, Z. G. (2018). QSAR study on antioxidant
tripeptides and the antioxidant activity of the designed tripeptides in free radical
systems. MOLECULES 23 (6), 1407. doi:10.3390/molecules23061407
Lima, N., Rocha, G., Freire, R., and Simas, A. (2018). RM1 semiempirical model:
chemistry, pharmaceutical research, molecular biology and materials science. J. Braz.
Chem. Soc. doi:10.21577/0103-5053.20180239
Chen, Z., Li, K., Yin, X., Li, H., Li, Y., Zhang, Q., et al. (2019). Lower expression of
gelsolin in colon cancer and its diagnostic value in colon cancer patients. J. Cancer 10
(5), 1288–1296. doi:10.7150/jca.28529
Lin, S.-W., Ying, K.-C., Chen, S.-C., and Lee, Z.-J. (2008). Particle swarm
optimization for parameter determination and feature selection of support
vector machines. Expert Syst. Appl. 35 (4), 1817–1824. doi:10.1016/j.eswa.2007.
08.088
Choi, J., Hwang, J., Ramalingam, M., Jeong, H. S., and Jang, S. (2023). Effects of
HDAC inhibitors on neuroblastoma SH-SY5Y cell differentiation into mature neurons
via the Wnt signaling pathway. BMC Neurosci. 24 (1), 28. doi:10.1186/s12868-02300798-0
Madugula, S. S., and Yarasi, S. (2017). Molecular design of porphyrin dyes for dye
sensitized solar cells: a quantitative structure property relationship study. Int.
J. QUANTUM Chem. 117 (14), e25385. doi:10.1002/qua.25385
Coi, A., Massarelli, I., Murgia, L., Saraceno, M., Calderone, V., and Bianucci, A. M.
(2006). Prediction of hERG potassium channel affinity by the CODESSA approach.
Bioorg. Med. Chem. 14 (9), 3153–3159. doi:10.1016/j.bmc.2005.12.030
Myint, K. Z., and Xie, X. Q. (2010). Recent advances in fragment-based QSAR and
multi-dimensional QSAR methods. Int. J. Mol. Sci. 11 (10), 3846–3866. doi:10.3390/
ijms11103846
Collobert, R., and Bengio, S. (2001). SVMTorch: support vector machines for largescale regression problems. J. Mach. Learn. Res. 1 (2), 143–160. doi:10.1162/
15324430152733142
Pan, G., Zhang, P., Chen, A., Deng, Y., Zhang, Z., Lu, H., et al. (2023). Aerobic
glycolysis in colon cancer is repressed by naringin via the HIF1A pathway. J. Zhejiang
Univ. Sci. B. Mar. 24 (3), 221–231. doi:10.1631/jzus.B2200221
Eberhart, R. C., and Shi, Y. H. (2004). Guest editorial special issue on particle swarm
optimization. IEEE Trans. Evol. Comput. JUN 8 (3), 201–203. doi:10.1109/tevc.2004.
830335
Pervaiz, S., Ul-Qayyum, Z., Bangyal, W. H., Gao, L., and Ahmad, J. (2021). A
systematic literature review on particle swarm optimization techniques for medical
diseases detection. Comput. Math. METHODS Med. 2021, 2021–2110. doi:10.1155/
2021/5990999
Evans, D. A. (2014). History of the harvard ChemDraw project. Angew. CHEMIEINTERNATIONAL Ed. 53 (42), 11140–11145. doi:10.1002/anie.201405820
Fan, T. J., Sun, G. H., Zhao, L. J., Cui, X., and Zhong, R. G. (2018). QSAR and
classification study on prediction of acute oral toxicity of N-nitroso compounds. Int.
J. Mol. Sci. 19 (10), 3015. doi:10.3390/ijms19103015
Place, R. F., Noonan, E. J., and Giardina, C. (2005). HDAC inhibition prevents NFkappa B activation by suppressing proteasome activity: down-regulation of proteasome
subunit expression stabilizes I kappa B alpha. Biochem. Pharmacol. 70 (3), 394–406.
doi:10.1016/j.bcp.2005.04.030
Fang, Z. J., Yu, X. L., and Zeng, Q. (2022). Random forest algorithm-based
accurate prediction of chemical toxicity to Tetrahymena pyriformis.
TOXICOLOGY, 480. doi:10.1016/j.tox.2022.153325
Roy, R., Ria, T., RoyMahaPatra, D., and Uh, Sk (2023). Single inhibitors versus dual
inhibitors: role of HDAC in cancer. ACS Omega 8 (19), 16532–16544. doi:10.1021/
acsomega.3c00222
Feng, W., Boukir, S., and Huang, W. (2019). “Ieee. MARGIN-BASED random forest
for imbalanced land cover classification,” in Ieee international geoscience and remote
sensing symposium (China: IGARSS), 3085–3088.
Sang, D. M., Na, I. H., Anh, D. T., Thi Mai Dung, D., Thi Thu Hang, N., PhuongAnh, N. T., et al. (2023). Novel (E)-3-(3-Oxo-4-substituted-3,4-dihydro-2H-benzo
[b] [1,4]oxazin-6-yl)-N-hydroxypropenamides as histone deacetylase inhibitors:
design, synthesis and bioevaluation. Chem. Biodivers. 20. doi:10.1002/cbdv.
202201030
Free, S. M., Jr., and Wilson, J. W. (1964). A MATHEMATICAL CONTRIBUTION
TO STRUCTURE-ACTIVITY STUDIES. J. Med. Chem. 7, 395–399. doi:10.1021/
jm00334a001
Santamaria-Bonfil, G., Reyes-Ballesteros, A., and Gershenson, C. (2016). Wind speed
forecasting for wind farms: a method based on support vector regression. Renew.
ENERGY 85, 790–809. doi:10.1016/j.renene.2015.07.004
Gao, Z., Xia, R. Z., and Zhang, P. J. (2022). Prediction of anti-proliferation effect
of [1,2,3]Triazolo[4,5-d]pyrimidine derivatives by random forest and mix-kernel
function SVM with PSO. Chem. Pharm. Bull. 70 (10), 684–693. doi:10.1248/cpb.
c22-00376
Schmoll, H. J., and Stein, A. (2014). COLORECTAL CANCER IN 2013 towards
improved drugs, combinations and patient selection. Nat. Rev. Clin. Oncol. 11 (2),
79–80. doi:10.1038/nrclinonc.2013.254
Gillette, T. G. (2021). HDAC inhibition in the heart: erasing hidden fibrosis.
Circulation 143 (19), 1891–1893. doi:10.1161/CIRCULATIONAHA.121.054262
Shi, Y., Li, J., Tang, M., Liu, J., Zhong, Y., and Huang, W. (2022a). CircHADHAaugmented autophagy suppresses tumor growth of colon cancer by regulating
autophagy-related gene via miR-361. Front. Oncol. 12, 937209. doi:10.3389/fonc.
2022.937209
Girosi, F. (1998). An equivalence between sparse approximation and support vector
machines. NEURAL Comput. 10 (6), 1455–1480. doi:10.1162/089976698300017269
Gong, Y. J., Li, J. J., Zhou, Y. C., Chung, H. S. H., Shi, Y. H., et al. (2016). Genetic
learning particle swarm optimization. IEEE Trans. Cybern. 46 (10), 2277–2290. doi:10.
1109/TCYB.2015.2475174
Shi, Y., Li, J. Y., Tang, M., Liu, J. W., Zhong, Y. L., and Huang, W. (2022b).
CircHADHA-augmented autophagy suppresses tumor growth of colon cancer by
regulating autophagy-related gene via miR-361. Front. Oncol. 12, 12. doi:10.3389/
fonc.2022.937209
Hansch, C., and Steward, A. R. (1964). THE USE OF SUBSTITUENT
CONSTANTS IN THE ANALYSIS OF THE STRUCTURE-ACTIVITY
RELATIONSHIP IN PENICILLIN DERIVATIVES. J. Med. Chem. 7, 691–694.
doi:10.1021/jm00336a001
Song, J. (2015). Bias corrections for Random Forest in regression using
residual rotation. J. KOREAN Stat. Soc. 44 (2), 321–326. doi:10.1016/j.jkss.
2015.01.003
Huang, T., Sun, G., Zhao, L., Zhang, N., Zhong, R., and Peng, Y. (2021).
Quantitative structure-activity relationship (QSAR) studies on the toxic effects
of nitroaromatic compounds (NACs): a systematic review. Int. J. Mol. Sci. Aug 9
(16), 22. doi:10.3390/ijms22168557
Subasi, A. (2013). Classification of EMG signals using PSO optimized SVM for
diagnosis of neuromuscular disorders. Comput. Biol. Med. Jun 43 (5), 576–586. doi:10.
1016/j.compbiomed.2013.01.020
Huang, Y., and Zhao, L. (2018). Review on landslide susceptibility mapping
using support vector machines. CATENA 165, 520–529. doi:10.1016/j.catena.2018.
03.003
Sun, N., Yang, K., Yan, W., Yao, M., Yu, C., Duan, W., et al. (2023). Design and
synthesis of triazole-containing HDAC inhibitors that induce antitumor effects and
immune response. J. Med. Chem. 66 (7), 4802–4826. doi:10.1021/acs.jmedchem.
2c01985
Kang, M. R., Kang, J. S., Han, S. B., Kim, J. H., Kim, D. M., Lee, K., et al. (2009). A
novel delta-lactam-based histone deacetylase inhibitor, KBH-A42, induces cell cycle
arrest and apoptosis in colon cancer cells. Biochem. Pharmacol. 78 (5), 486–494. doi:10.
1016/j.bcp.2009.05.010
Tavares, M. T., Shen, S., Knox, T., Hadley, M., Kutil, Z., Bařinka, C., et al. (2017).
Synthesis and pharmacological evaluation of selective histone deacetylase 6 inhibitors in
melanoma models. ACS Med. Chem. Lett. 8 (10), 1031–1036. doi:10.1021/
acsmedchemlett.7b00223
Katritzky, A. R., Kuanar, M., Fara, D. C., Karelson, M., Acree, W. E., Solov’ev, V. P.,
et al. (2005). QSAR modeling of blood: air and tissue: air partition coefficients using
theoretical descriptors. Bioorg. Med. Chem. 13 (23), 6450–6463. doi:10.1016/j.bmc.2005.
06.066
Frontiers in Pharmacology
Wang, J., Du, Y., Liu, X. M., Cho, W. C., and Yang, Y. X. (2015). MicroRNAs as
regulator of signaling networks in metastatic colon cancer. BIOMED Res. Int. 2015,
823620. doi:10.1155/2015/823620
08
frontiersin.org
Wang and Zhang
10.3389/fphar.2023.1260349
Wang, Y., Liu, Z., Qu, A. L., Zhang, P. J., Si, H. Z., and Zhai, H. L. (2020). Study of
tacrine derivatives for acetylcholinesterase inhibitors based on artificial intelligence. Lat.
Am. J. Pharm. 39 (6), 1159–1170.
Zhang, H., Li, J., and Kim, C. K. (2013). Quantitative structure-properties relationship
studies on physicochemical properties of organic molecules using CODESSA. ASIAN
J. Chem. 25 (10), 5670–5672. doi:10.14233/ajchem.2013.oh58
Xiong, X., Wang, S., Gao, Z., and Ye, Y. (2023). C6orf15 acts as a potential novel
marker of adverse pathological features and prognosis for colon cancer. Pathol. Res.
Pract. 245, 154426. doi:10.1016/j.prp.2023.154426
Zhang, L., Wang, P. K., Jiang, T. Y., Fan, G. H., and Dan, C. H. (2015a). Ieee. Prediction
of Torpedo initial velocity based on random forests regression. Int. Conf. INTELLIGENT
HUMAN-MACHINE Syst. Cybern. IHMSC I, 337–339. doi:10.1109/IHMSC.2015.17
Yao, Y. W., Liao, C. H., Li, Z., Wang, Z., Sun, Q., Liu, C., et al. (2014). Design,
synthesis, and biological evaluation of 1, 3-disubstituted-pyrazole derivatives as new
class I and IIb histone deacetylase inhibitors. Eur. J. Med. Chem. 86, 639–652. doi:10.
1016/j.ejmech.2014.09.024
Zhang, Y. D., Wang, S. H., and Ji, G. L. (2015b). A comprehensive survey on particle
swarm optimization algorithm and its applications. Math. PROBLEMS Eng. 2015, 1–38.
doi:10.1155/2015/931256
Zhao, Y. T., Liu, X. Q., Ouyang, J., Wang, Y., Xu, S., Tian, D., et al. (2020). Studies on
the IC50 of metabolically stable 1-(3,3-diphenylpropyl)-piperidinyl amides and ureas as
human CCR5 receptor antagonists based on QSAR. Lett. DRUG Des. Discov. 17 (8),
1036–1046. doi:10.2174/1570180817666200320105725
Yang, X., Qiu, H., Zhang, Y., and Zhang, P. (2023). Quantitative structure–activity
relationship study of amide derivatives as xanthine oxidase inhibitors using machine
learning. Front. pharmacol., 14.
Frontiers in Pharmacology
09
frontiersin.org