From Molecules to Graphs: Graph Neural Network Project
In drug discovery, molecules are not just chemical formulas; they are graphs.
Atoms connect through bonds, forming complex structures. Instead of treating molecules as simple data points, I wanted to model them the way they truly are: as networks.
So I built a Graph Convolutional Network (GCN) to predict molecular toxicity using the Tox21 dataset.
The dataset contains thousands of chemical compounds, each labelled against 12 toxicity targets, although not every compound was measured on every target. It downloads automatically through PyTorch Geometric, which converts the SMILES strings into graph structures internally, so I could focus directly on modelling and learning.
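To make the "molecule as graph" idea concrete, here is a toy sketch that writes out ethanol (SMILES `CCO`) by hand. It is not the featurization PyTorch Geometric applies internally: the three-element vocabulary and the helper names `to_edge_index` and `one_hot_atoms` are assumptions made for illustration. The 2 x E edge layout does mirror the `edge_index` convention PyTorch Geometric uses, with each bond stored in both directions.

```python
# Ethanol, SMILES "CCO", written out by hand (hydrogens left implicit).
atoms = ["C", "C", "O"]      # node i is atoms[i]
bonds = [(0, 1), (1, 2)]     # undirected bonds as (atom, atom) index pairs


def to_edge_index(bonds):
    """Return a 2 x (2 * num_bonds) edge list with every bond in both directions."""
    sources, targets = [], []
    for a, b in bonds:
        sources += [a, b]
        targets += [b, a]
    return [sources, targets]


def one_hot_atoms(atoms, vocabulary=("C", "N", "O")):
    """Toy node features: one-hot over a hypothetical 3-element vocabulary."""
    return [[1.0 if symbol == v else 0.0 for v in vocabulary] for symbol in atoms]


edge_index = to_edge_index(bonds)
node_features = one_hot_atoms(atoms)
print(edge_index)     # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(node_features)  # [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Real datasets attach richer atom and bond features, but the shape of the data is the same: one feature row per atom and one column of `edge_index` per directed bond.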
What I did
- Converted molecules into graph structures (atoms = nodes, bonds = edges)
- Implemented a 2-layer GCN using PyTorch Geometric
- Trained the model on 12 toxicity prediction tasks
- Handled missing experimental labels properly
- Evaluated performance using ROC-AUC (instead of accuracy)
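The modelling steps above can be sketched from scratch in plain Python. This is not the project code and not PyTorch Geometric's implementation: it is a minimal illustration of what a two-layer GCN computes (the propagation rule D^-1/2 (A + I) D^-1/2 H W, without bias terms), followed by mean pooling and a binary cross-entropy that skips missing labels. The weights are made-up numbers, `None` stands in for a missing label, and two tasks are used instead of twelve to keep it short.

```python
import math


def normalized_adjacency(num_nodes, edge_index):
    """Build D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for src, dst in zip(*edge_index):
        adj[src][dst] = 1.0
    degree = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(a_hat, h, weight, relu=True):
    """One graph convolution: aggregate neighbours (a_hat @ h), then transform (@ weight)."""
    out = matmul(matmul(a_hat, h), weight)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


def mean_pool(h):
    """Average node embeddings into one molecule-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


def masked_bce(logits, labels):
    """Binary cross-entropy averaged over measured tasks only; None marks a missing label."""
    losses = []
    for z, y in zip(logits, labels):
        if y is None:
            continue  # no experimental result for this task, so no gradient signal
        p = 1.0 / (1.0 + math.exp(-z))
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses) if losses else 0.0


# Ethanol (C-C-O) with toy one-hot atom features and made-up weights.
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
x = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]   # 3 input features -> 2 hidden units
w2 = [[0.3, -0.1], [0.2, 0.4]]                # 2 hidden units -> 2 task logits

a_hat = normalized_adjacency(3, edge_index)
hidden = gcn_layer(a_hat, x, w1)                        # layer 1 with ReLU
node_logits = gcn_layer(a_hat, hidden, w2, relu=False)  # layer 2, raw outputs
logits = mean_pool(node_logits)                         # one logit per task
loss = masked_bce(logits, [1, None])                    # second task is unmeasured
print(logits, loss)
```

The masking is the important part for Tox21-style data: an unmeasured target contributes nothing to the loss, rather than being treated as "non-toxic".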
Result:
The model achieved a test ROC-AUC of ~0.76 using a simple 80/20 split.
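For a multi-task dataset with missing labels, one common way to arrive at a single ROC-AUC number is to compute it per task on the measured molecules only and then average across tasks. The sketch below assumes that scheme; the helper names `roc_auc` and `mean_task_auc` and the toy labels and scores are hypothetical, and `None` again marks a missing label. ROC-AUC depends only on how positives are ranked against negatives, which makes it more informative than accuracy when toxic compounds are rare.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_task_auc(label_rows, score_rows):
    """Average per-task AUC, skipping entries whose label is None."""
    num_tasks = len(label_rows[0])
    aucs = []
    for t in range(num_tasks):
        pairs = [(labels[t], scores[t])
                 for labels, scores in zip(label_rows, score_rows)
                 if labels[t] is not None]
        auc = roc_auc([y for y, _ in pairs], [s for _, s in pairs])
        if auc is not None:
            aucs.append(auc)
    return sum(aucs) / len(aucs) if aucs else None


# Four molecules, two tasks; the second molecule has no label for task 1.
labels = [[1, 0], [0, None], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.7], [0.4, 0.5], [0.1, 0.6]]
print(mean_task_auc(labels, scores))  # 0.75
```

In the toy data, task 0 is ranked perfectly (AUC 1.0) and task 1 gets one of two positive/negative pairs right (AUC 0.5), giving a mean of 0.75.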
What excited me most was seeing how the model gradually learned structural patterns in molecules, improving from 0.69 to 0.76 ROC-AUC over training.
This project strengthened my understanding of:
• Graph Neural Networks
• Multi-task learning
• Molecular data representation
• AI applications in computational chemistry
Here’s the full project code:
GitHub