
Posts

Showing posts from May, 2026

DFT VS DFTB VS MLP IN AMSTERDAM MODELING SUITE

  Amsterdam Modeling Suite provides several computational methodologies, including Density Functional Theory (DFT), Density Functional Tight Binding (DFTB), and Machine Learning Potentials (MLPs), each designed to balance computational accuracy and efficiency differently. DFT is widely recognized as a highly accurate quantum mechanical approach for investigating electronic structures, molecular properties, reaction pathways, and spectroscopic behavior. By explicitly describing electron density through quantum mechanical formalisms, DFT offers reliable predictions of energies, optimized geometries, charge distributions, and chemical reactivity. Nevertheless, its computational requirements increase rapidly with system size, which generally limits its application to small and medium-sized systems or highly detailed mechanistic studies. DFTB serves as a computationally efficient approximation to DFT by employing parameterized interactions that simplify the electronic structure calculat...

Data Science in Chemistry to find Drug-Drug Similarity

  SMILES → RDKit Mol → Morgan fingerprint → Dice similarity

1. SMILES (Simplified Molecular Input Line Entry System): A short text string that represents a molecule's structure.
2. RDKit Mol: Converts the SMILES text into a molecule object, which makes it usable for computation. Internally, RDKit represents the molecule as a graph (atoms = nodes, bonds = edges).
3. Morgan fingerprint: A "structural barcode" generated from the molecule that captures which substructures/features are present in the drug.
4. Dice Similarity Coefficient (DSC): A 0-1 score that measures how much the two "barcodes" overlap (1 = very similar, 0 = no overlap).

Key takeaways from my code results: Caffeine and Theophylline show a Dice similarity of 0.62, demonstrating that fingerprint-based similarity can quantify how structurally alike two drugs are.

✅ Why this is useful: This workflow is used in the chemistry industry to quickly compare compounds at scale, supporting tasks like similarity search, clustering, and early-sta...
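The Dice score in step 4 is DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of "on" bits in the two fingerprints. Below is a minimal pure-Python sketch of just that scoring step, assuming each fingerprint has already been reduced to a set of on-bit indices (the post does not show the author's code; in RDKit itself this comparison on bit vectors is what `DataStructs.DiceSimilarity` performs). The bit indices are made-up toy values, not real caffeine or theophylline fingerprints, so they do not reproduce the 0.62 result.

```python
def dice_similarity(bits_a: set[int], bits_b: set[int]) -> float:
    """Dice coefficient between two fingerprints given as sets of on-bit indices.

    DSC = 2 * |A & B| / (|A| + |B|), from 0 (no shared bits) to 1 (identical).
    """
    total = len(bits_a) + len(bits_b)
    if total == 0:
        # Assumed convention: two empty fingerprints score 0.0
        return 0.0
    return 2 * len(bits_a & bits_b) / total


# Toy fingerprints with hypothetical bit indices (not real Morgan bits)
fp_drug_a = {1, 5, 9, 42, 77}
fp_drug_b = {1, 5, 9, 100, 200}

print(dice_similarity(fp_drug_a, fp_drug_b))  # 2*3 / (5+5) = 0.6
```

Working on sets keeps the formula visible; real pipelines usually keep fixed-length bit vectors (e.g. 2048 bits) so that every molecule is comparable in the same bit space.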

ML MODEL FOR DRUG TOXICITY

  ⚗️ How I Built ML Models to Predict Drug Toxicity Before Synthesis

40% of drugs fail clinically due to ADMET issues discovered too late. I developed an ML pipeline that predicts toxicity from molecular structure, enabling smarter synthesis decisions.

THE PROBLEM 🎯
Traditional: Synthesise → Test → 60% fail ADMET. Each compound costs $10K-50K to make and test.
Goal: Predict absorption, distribution, metabolism, excretion, and toxicity (ADMET) computationally.

MY ML PIPELINE
1. PROBLEM DEFINITION
15 ADMET endpoints, including hERG cardiotoxicity, hepatotoxicity, BBB penetration, CYP450 inhibition, solubility, permeability, clearance, and half-life. Target: >80% accuracy and a 50% reduction in lab testing.
2. DATA COLLECTION
150K molecules (ChEMBL, PubChem, ToxCast), 2.3M ADMET measurements, 70/15/15 train/validation/test split.
3. MOLECULAR FEATURIZATION
Morgan fingerprints (2048-bit), RDKit descriptors (LogP, TPSA, MW), graph representations (atoms = nodes, bonds = edges), 200+ properties.
4. MODEL ARCHITECTURE
Ensemble: Random Forest + XGBoost + Graph N...
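Two of the pipeline steps, the 70/15/15 data split and the model ensemble, can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the author's pipeline: the function names are hypothetical, the compound IDs are placeholders, and because the post's description of how the ensemble is combined is cut off, simple soft voting (averaging each model's predicted probability) is assumed here.

```python
import random
from statistics import mean


def split_70_15_15(items, seed=0):
    """Shuffle items reproducibly, then split into ~70% train, 15% validation, 15% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * 70 // 100    # integer maths avoids float rounding
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # any remainder goes to the test set
    return train, val, test


def soft_vote(model_probabilities):
    """Average the toxicity probabilities that several models give one compound."""
    return mean(model_probabilities)


compound_ids = [f"mol_{i}" for i in range(100)]  # placeholder IDs, not real data
train, val, test = split_70_15_15(compound_ids, seed=42)
print(len(train), len(val), len(test))  # 70 15 15

# Hypothetical outputs from three models (e.g. RF, XGBoost, GNN) for one endpoint
print(soft_vote([0.25, 0.5, 0.75]))  # 0.5
```

A random split is the simplest choice. For molecules, scaffold-based splits are often preferred because they test generalisation to unseen chemotypes; the post does not say which kind was used.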

GITHUB GRAPH NEURAL NETWORK PROJECT

  From Molecules to Graphs: Graph Neural Network Project

In drug discovery, molecules are not just chemical formulas; they are graphs. Atoms connect through bonds, forming complex structures. Instead of treating molecules as simple data points, I wanted to model them the way they truly are: as networks. So I built a Graph Convolutional Network (GCN) to predict molecular toxicity using the Tox21 dataset.

The dataset contains thousands of chemical compounds, each labelled with 12 toxicity targets. It downloads automatically through PyTorch Geometric, which converts the SMILES strings into graph structures internally, allowing me to focus directly on modelling and learning.

What I did
- Converted molecules into graph structures (atoms = nodes, bonds = edges)
- Implemented a 2-layer GCN using PyTorch Geometric
- Trained the model on 12 toxicity prediction tasks
- Handled missing experimental labels (not every compound was tested on every target)
- Evaluated performance using ROC-AUC instead of accuracy

Result: The model achi...
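The post does not show how the missing labels were handled or how ROC-AUC was averaged across the 12 tasks. A common approach, assumed here, is to drop unmeasured labels task by task and then average the per-task ROC-AUC. The sketch below does that in plain Python, assuming missing labels are encoded as NaN; the function names and toy data are hypothetical, and in practice one would likely call scikit-learn's `roc_auc_score` per task instead of the hand-written `roc_auc`.

```python
import math


def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked correctly.

    This equals the Mann-Whitney U statistic divided by (#positives * #negatives);
    tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def mean_masked_auc(label_matrix, score_matrix):
    """Average ROC-AUC over tasks, skipping missing labels (assumed to be NaN)."""
    n_tasks = len(label_matrix[0])
    aucs = []
    for t in range(n_tasks):
        ys, ss = [], []
        for row_labels, row_scores in zip(label_matrix, score_matrix):
            if not math.isnan(row_labels[t]):  # keep only compounds measured on task t
                ys.append(int(row_labels[t]))
                ss.append(row_scores[t])
        auc = roc_auc(ys, ss)
        if auc is not None:  # tasks with a single class are left out of the mean
            aucs.append(auc)
    return sum(aucs) / len(aucs) if aucs else float("nan")


nan = float("nan")
# 4 toy compounds x 2 tasks; NaN marks a label that was never measured
labels = [[1, 0], [0, nan], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.15], [0.4, 0.1]]
print(mean_masked_auc(labels, scores))  # (1.0 + 0.5) / 2 = 0.75
```

ROC-AUC suits Tox21 better than accuracy because most compounds are inactive on most targets, so a model that always predicts "non-toxic" can score high accuracy while ranking nothing correctly. The same masking idea is typically applied to the training loss, so unmeasured labels contribute no gradient.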

ADME PROCESS

  90% of drug candidates fail during development. It’s time we change the maths!

If you’re in lead optimization or planning IND-enabling work, I can offer a free 48-hour ADMET benchmark on 5 compounds (SMILES) to show signal on your own chemistry, with no obligation. We can provide end-to-end services for the drug discovery and clinical stages of drug development.

As CSO at Prognica Labs, I engage every week with discovery teams facing common industry challenges: unpredictable ADMET liabilities, late-stage toxicity failures, and lengthy synthesis cycles. We developed Prognica’s AI/ML platform to address these issues. By integrating predictive models directly into workflows, we help biotechs and pharma companies reduce synthesis cycles by 40-60% and accelerate hit-to-lead timelines by over 6 months.

Currently, our partners are seeing the fastest ROI in three key areas:
🔹 ADMET Hit-to-Lead Optimization: Simultaneous optimization of potency and ADMET properties, generatin...