Machine-Learning-based prediction of human SERT protein ligand affinity using molecular docking and interaction analysis

Papanagnou, Dimitrios

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19694

Title:	Machine-Learning-based prediction of human SERT protein ligand affinity using molecular docking and interaction analysis
Authors:	Papanagnou, Dimitrios; Ματσόπουλος Γιώργος
Keywords:	Serotonin Transporter (SERT); Molecular Docking; Supervised Classification; Machine Learning; Binding Affinity
Issue Date:	25-Jun-2025
Abstract:	This thesis presents a multidisciplinary computational approach for classifying potential inhibitors of the human serotonin transporter (SERT), the primary target of many antidepressants, into three categories: strong binders, moderate binders, and non-binders. The pipeline integrates molecular docking, molecular descriptor analysis, residue-level interaction profiling, and a supervised machine learning model in order to extract and characterize ligand-SERT interactions. A total of 74 compounds, with and without known pharmacological action, were studied. They belong mainly to broad antidepressant classes such as SSRIs, SNRIs, and TCAs, together with compounds from other, unrelated categories that are treated as non-binders. Each ligand was assigned to one of the three classes according to its inhibition constant (Ki) at human SERT, as reported by authoritative pharmacological sources. First, molecular docking with “AutoDock Vina” and “Chimera” was used to generate the ten top-ranked binding poses for each ligand. The docking protocol was validated by comparing the predicted binding conformation of the known SSRI “Paroxetine” with the reference crystallographic structure from the Protein Data Bank (PDB: 5I6X), which yielded a nearly perfect alignment. A custom Python script then selected the five best of the ten poses, ranking them by binding affinity and root-mean-square deviation (RMSD). Detailed molecular and residue-level features, including “Surface_Area”, geometric angles, and distance-based features, were extracted with “BIOVIA Discovery Studio”. Statistical analyses examined the correlation of each feature with the target variable (the class label) and checked for multicollinearity among the features.
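The abstract does not say how the custom script combines binding affinity and RMSD when choosing five of the ten poses. As a hedged sketch only, the following assumes the poses are read from AutoDock Vina's standard results table (mode, affinity in kcal/mol, and RMSD lower and upper bounds relative to the best mode) and applies a hypothetical rank-sum rule: each pose is ranked once by affinity and once by RMSD lower bound, and the poses with the lowest rank sum are kept. The function names `parse_vina_table` and `select_top_poses` and the sample values are illustrative, not taken from the thesis.

```python
"""Hypothetical sketch: keep the top-k docking poses from an AutoDock Vina log.

Assumptions (not from the thesis): poses are read from Vina's standard results
table, and each pose is scored by the sum of its rank by affinity and its rank
by RMSD lower bound (distance from Vina's best mode).
"""
from dataclasses import dataclass


@dataclass
class Pose:
    mode: int
    affinity: float  # kcal/mol; more negative means stronger predicted binding
    rmsd_lb: float   # RMSD lower bound from the best mode, in angstroms
    rmsd_ub: float   # RMSD upper bound from the best mode, in angstroms


def parse_vina_table(log_text: str) -> list[Pose]:
    """Extract the numeric rows of Vina's mode/affinity/RMSD results table."""
    poses = []
    for line in log_text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields starting with an integer mode number;
        # header, ruler, and other log lines are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            try:
                poses.append(Pose(int(fields[0]), *map(float, fields[1:])))
            except ValueError:
                continue
    return poses


def select_top_poses(poses: list[Pose], k: int = 5) -> list[Pose]:
    """Keep the k poses with the lowest (affinity rank + RMSD-lower-bound rank)."""
    rank_sum = {pose.mode: 0 for pose in poses}
    for rank, pose in enumerate(sorted(poses, key=lambda q: q.affinity)):
        rank_sum[pose.mode] += rank
    for rank, pose in enumerate(sorted(poses, key=lambda q: q.rmsd_lb)):
        rank_sum[pose.mode] += rank
    # Ties in rank sum are broken by the stronger (more negative) affinity.
    return sorted(poses, key=lambda q: (rank_sum[q.mode], q.affinity))[:k]


# Made-up values in Vina's table layout, for illustration only.
SAMPLE_LOG = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -10.2      0.000      0.000
   2        -9.8      1.512      2.140
   3        -9.6      6.903      9.871
   4        -9.5      1.204      1.988
   5        -9.1      2.310      3.455
   6        -8.9      7.445     10.020
   7        -8.7      1.876      2.901
   8        -8.6      5.120      7.330
   9        -8.4      3.002      4.118
  10        -8.3      8.210     11.450
"""

if __name__ == "__main__":
    best = select_top_poses(parse_vina_table(SAMPLE_LOG))
    print([pose.mode for pose in best])  # [1, 2, 4, 5, 3]
```

With these sample values, pose 3 is kept over pose 7 because both have a rank sum of 9 and pose 3 has the stronger affinity. Replacing the rank-sum rule with a weighted score or a pure affinity sort would only change `select_top_poses`; the parser does not depend on the ranking choice.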
Notably, hydrophobic residues such as “ALA_173”, “ILE_172”, and “PHE_341” were found to be critical for strong binders. Distinct distributions were also observed for “Polar_Surface_Area” and for angular features such as “ANGLE_HAY” and “GAMMA”. Several machine learning algorithms were trained: Random Forest, XGBoost, LightGBM, Logistic Regression, SVM, and a Voting Classifier. Nested cross-validation was used to minimize the risk of overfitting; nevertheless, performance was moderate because the descriptor distributions of the moderate class overlapped with those of the adjacent classes. Tree-based models outperformed the others and also made the model decisions interpretable through SHAP summary and partial dependence plots. These plots highlighted the most predictive features for the “STRONG BINDING” and “MODERATE BINDING” classes and confirmed that moderate binders were the main source of confusion for the model. Given the models' mixed success, the thesis also outlines the assumptions and limitations under which it was conducted. The most decisive are the limited sample size, the static nature of the docking simulations, and the custom script that selected the five best poses. For future work, the study suggests incorporating more extensive molecular dynamics simulations, docking against additional targets responsible for antidepressant activity, such as NET and DAT, and using molecular fingerprints that capture atomic-level interactions. These extensions are expected to substantially improve classification accuracy and the validity of the results. This thesis lays the foundation for a novel computational strategy for detecting potential antidepressant drugs, but it requires substantial optimization before it can be considered reliable.
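The abstract names nested cross-validation but not its fold counts, search grids, or scoring metric. The following is a minimal sketch, assuming scikit-learn, a Random Forest, and a synthetic 74-sample, three-class dataset standing in for the real docking descriptors; the grid, fold counts, and macro-F1 scoring are illustrative assumptions, not the thesis settings. The inner loop tunes hyperparameters, and the outer loop scores each tuned model on a fold it never saw during tuning.

```python
# Minimal nested cross-validation sketch. Assumptions: scikit-learn is installed,
# synthetic data stands in for the thesis descriptors, and the grid, fold counts,
# and scoring metric are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in: 74 samples, 3 classes (e.g. strong / moderate / non-binder).
X, y = make_classification(
    n_samples=74,
    n_features=20,
    n_informative=6,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Inner loop: hyperparameter search, run only on each outer training split.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="f1_macro",
    cv=inner_cv,
)

# Outer loop: cross_val_score clones and refits the whole search on each outer
# training split, then scores it on the held-out outer fold.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
outer_scores = cross_val_score(search, X, y, scoring="f1_macro", cv=outer_cv)

if __name__ == "__main__":
    print("Outer-fold macro-F1:", outer_scores.round(3))
    print(f"Mean +/- SD: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Macro-F1 is used here because it weights all three classes equally, so a model that systematically misclassifies the moderate binders cannot hide behind good scores on the other two classes. Stratified folds keep the class proportions similar across splits, which matters with only 74 samples.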
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19694
Appears in Collections:	M.Sc. Theses

Files in This Item:

File	Description	Size	Format
Papanagnou_Dimitrios_25_June_2025.pdf	Diploma Thesis	7.69 MB	Adobe PDF
