SMILES → RDKit Mol → Morgan fingerprint → Dice similarity
1. SMILES (Simplified Molecular Input Line Entry System): A short text string that represents a molecule’s structure.
2. RDKit Mol: The molecule object that RDKit builds from SMILES text, which makes the structure usable for computation. Internally it represents the molecule as a graph (atoms = nodes, bonds = edges).
3. Morgan fingerprint: A “structural barcode” generated from the molecule. It encodes the circular neighborhood around each atom, capturing which substructures/features are present in the drug.
4. Dice Similarity Coefficient (DSC): A 0-1 score that measures how much the two “barcodes” overlap (1 = identical fingerprints, 0 = no features in common).
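To make step 4 concrete, here is a minimal pure-Python sketch of the Dice coefficient, treating each fingerprint as the set of bit positions that are switched on. The function name `dice_similarity` and the example bit sets are made up for illustration, and returning 0.0 for two empty fingerprints is a convention chosen here, not something taken from RDKit.

```python
def dice_similarity(bits_a, bits_b):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|) for two sets of "on" bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        # Assumption for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical fingerprints: the numbers are bit positions, not real chemistry.
fp_a = {1, 4, 7, 9}
fp_b = {4, 7, 8}

# Two shared bits (4 and 7), seven bits in total: 2 * 2 / 7.
print(round(dice_similarity(fp_a, fp_b), 3))
```

Shared bits are counted twice in the numerator because they appear in both fingerprints, which is why two identical fingerprints score exactly 1.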
Key takeaways from my code results:
Caffeine and Theophylline show a Dice similarity of 0.62, demonstrating that fingerprint-based similarity can quantify how structurally alike two drugs are.
✅ Why this is useful: This workflow is used in the chemistry industry to compare compounds quickly at scale, supporting tasks such as similarity search, clustering, and early-stage screening.
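For readers who want to try the caffeine and theophylline comparison themselves, the sketch below runs the four steps with RDKit. It is not the code behind the result above: the helper name `dice_from_smiles`, the radius of 2, and the 2048-bit fingerprint size are assumptions, and the SMILES strings are the commonly listed ones for the two compounds. The score depends on these settings, so it may not come out as exactly 0.62.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Commonly listed SMILES for the two compounds (assumed inputs).
CAFFEINE = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
THEOPHYLLINE = "CN1C2=C(C(=O)N(C1=O)C)NC=N2"


def dice_from_smiles(smiles_a, smiles_b, radius=2, n_bits=2048):
    """SMILES -> RDKit Mol -> Morgan fingerprint -> Dice similarity."""
    # radius and n_bits are illustrative defaults, not the author's settings.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    fingerprints = []
    for smiles in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
        if mol is None:
            raise ValueError(f"Could not parse SMILES: {smiles}")
        fingerprints.append(generator.GetFingerprint(mol))  # bit-vector fingerprint
    return DataStructs.DiceSimilarity(fingerprints[0], fingerprints[1])


score = dice_from_smiles(CAFFEINE, THEOPHYLLINE)
print(f"Dice similarity: {score:.2f}")
```

Checking for `None` after `Chem.MolFromSmiles` matters in practice, because RDKit signals an invalid SMILES string by returning `None` rather than raising an exception.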