Three steps connect centuries of traditional knowledge to modern drug targets. Here's what happens at each stage — and why the convergence signal works.
The Discovery Funnel
Confidence ≥ 0.7 = composite score: binding evidence × literature co-occurrence × pathway centrality (each 0–1, multiplicative)
Why convergence across traditions is the first signal
Traditional medicine systems encode thousands of years of empirical observation. When traditions with limited historical cross-pollination document the same plant for the same therapeutic purpose, that convergence is a meaningful pharmacological signal — even accounting for documented trade-route contact, mechanism-level agreement is unlikely to be coincidental.
We aggregate usage data from four ethnobotanical corpora — IMPPAT, TCMBank, ETCM, and the Unani pharmacopeia — cross-referenced by plant name, family, and chemical synonym. Compound–organism pairs are bridged via LOTUS. LOTUS provides chemical structures; usage data comes exclusively from the ethnobotanical corpora. A plant documented across three or more traditions earns elevated priority in the compound pipeline — we require independent documentation, so cross-citations within a single tradition don't count.
Turmeric: 3 independent traditions → Ayurveda, TCM, Western (teal = convergent theme across traditions)
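The convergence rule can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the record layout, the sample rows, and the function names are all hypothetical, and keying on the (plant, therapeutic use) pair is an assumption drawn from the "same plant for the same therapeutic purpose" wording above.

```python
from collections import defaultdict

# Hypothetical usage records: (plant, tradition, therapeutic_use).
# Real corpora would need name, family, and synonym reconciliation first.
RECORDS = [
    ("Curcuma longa", "ayurveda", "inflammation"),
    ("Curcuma longa", "ayurveda", "inflammation"),  # repeat within one tradition
    ("Curcuma longa", "tcm", "inflammation"),
    ("Curcuma longa", "western", "inflammation"),
    ("Example plant", "tcm", "inflammation"),
    ("Example plant", "tcm", "inflammation"),
]


def traditions_by_plant_use(records):
    """Map (plant, use) to the set of distinct traditions documenting it."""
    seen = defaultdict(set)
    for plant, tradition, use in records:
        # A set collapses repeated citations inside one tradition,
        # so only independent traditions are counted.
        seen[(plant.lower(), use.lower())].add(tradition.lower())
    return seen


def elevated_priority(records, min_traditions=3):
    """Return (plant, use) pairs documented by at least min_traditions traditions."""
    grouped = traditions_by_plant_use(records)
    return sorted(key for key, trads in grouped.items() if len(trads) >= min_traditions)


if __name__ == "__main__":
    for plant, use in elevated_priority(RECORDS):
        print(f"{plant}: convergent use '{use}'")
```

Counting distinct traditions rather than raw records is what keeps within-tradition cross-citations from inflating the signal.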
Scoring druggability from first principles
Every documented plant yields dozens to hundreds of phytochemicals. We profile each compound against Lipinski's Rule of Five — MW ≤ 500, LogP ≤ 5, ≤ 5 H-bond donors, ≤ 10 H-bond acceptors — and compute a Quantitative Estimate of Druglikeness (QED) from eight composite molecular properties.
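As a rough sketch, the four Rule of Five thresholds above can be checked against precomputed descriptors. The `Descriptors` type, the function names, and the sample values are hypothetical. The descriptors would come from a cheminformatics toolkit in practice, and the strict zero-violation default is an assumption (many workflows tolerate a single violation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(d):
    """List the Rule of Five thresholds that a compound exceeds."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "LogP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_donors > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(d, max_violations=0):
    """True if the number of violations is within the allowed budget (assumed 0)."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Illustrative values only, not measured data for any real compound.
    small = Descriptors(mol_weight=216.3, logp=4.0, h_donors=0, h_acceptors=1)
    large = Descriptors(mol_weight=720.0, logp=6.2, h_donors=7, h_acceptors=12)
    print(passes_rule_of_five(small), lipinski_violations(large))
```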
QED ranges from 0 to 1; median QED for approved oral drugs is ~0.5 (Bickerton et al., 2012). We classify compounds as drug-like (QED ≥ 0.5), lead-like (QED ≥ 0.35, lower MW), or fragment-like (MW < 250 — small, high-efficiency starting points).
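The three classes can be expressed as a small decision function. This is a sketch under stated assumptions: the text does not give a molecular-weight cutoff for "lower MW", so `lead_mw_max=350.0` is a hypothetical placeholder, and the order in which the classes are tested (drug-like, then lead-like, then fragment-like) is likewise assumed rather than taken from the platform.

```python
def classify(qed, mol_weight, lead_mw_max=350.0):
    """Assign a coarse class from a QED score and molecular weight.

    lead_mw_max is a hypothetical cutoff; the precedence of the
    checks below is an assumption, not a documented rule.
    """
    if not 0.0 <= qed <= 1.0:
        raise ValueError("QED must lie in [0, 1]")
    if qed >= 0.5:
        return "drug-like"
    if qed >= 0.35 and mol_weight <= lead_mw_max:
        return "lead-like"
    if mol_weight < 250:
        return "fragment-like"
    return "unclassified"


if __name__ == "__main__":
    # QED 0.61 is the ar-turmerone figure quoted in the text; 216.3 is its
    # approximate molecular weight (C15H20O). Other rows are illustrative.
    print(classify(0.61, 216.3))
    print(classify(0.40, 300.0))
    print(classify(0.20, 180.0))
```

A compound that clears none of the thresholds falls through to "unclassified", a label introduced here only so the function is total.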
ar-Turmerone (QED 0.61) outscores curcumin on drug-likeness, illustrating why systematic profiling surfaces better candidates than literature prominence alone. (Curcumin itself scores well on QED but is a known pan-assay interference (PAINS) compound, a reminder that no single metric replaces orthogonal screening.)
"When the targets from a plant's compounds align with the pathways its traditions documented — inflammation, neuroprotection, antimicrobial — that overlap is convergence at the molecular level."
The signal the platform is built to surface
Confidence scoring across three evidence sources
Protein targets are identified by cross-referencing STITCH (chemical–protein interactions from text mining and experiments) and ChEMBL (binding assay data curated from literature). Open Targets contributes pathway centrality scores for identified targets — informing one dimension of the composite confidence, not the compound–target binding signal itself.
For each compound–target pair we compute a composite confidence: binding evidence strength × literature co-occurrence × pathway centrality, each normalized to 0–1. Multiplicative scoring penalizes weak evidence in any single dimension: because every factor is at most 1, a pair can reach 0.7 only if all three dimensions score at least 0.7. Scores of 0.7 or higher indicate high confidence; scores from 0.5 up to 0.7 indicate moderate confidence. Edge weight in the network graph reflects that score.
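The multiplicative score and its tiers can be sketched as follows. Function names are hypothetical, the "low" label for scores under 0.5 is an assumption (the text names only the high and moderate bands), and the 0.7 boundary is treated as inclusive to match the "Confidence ≥ 0.7" funnel label.

```python
def composite_confidence(binding, cooccurrence, centrality):
    """Multiply three normalized evidence scores into one confidence value."""
    for name, value in (
        ("binding", binding),
        ("cooccurrence", cooccurrence),
        ("centrality", centrality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return binding * cooccurrence * centrality


def confidence_tier(score):
    """Bucket a composite score; the 'low' label is an assumption."""
    if score >= 0.7:
        return "high"
    if score >= 0.5:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Illustrative inputs only, not scores for any real compound-target pair.
    for scores in [(0.9, 0.9, 0.9), (0.9, 0.9, 0.7), (0.95, 0.95, 0.3)]:
        value = composite_confidence(*scores)
        print(scores, round(value, 3), confidence_tier(value))
```

The last row shows the intended penalty: two strong dimensions cannot compensate for one weak one, which an additive or averaged score would allow.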
COX-2 (gene PTGS2) · Inflammation
NF-κB p65 (gene RELA) · Inflammation / Immunity
AKT1 (gene AKT1) · PI3K / Cancer signaling
GSK-3β (gene GSK3B) · Neuroprotection / Tau
The full platform goes deeper — formulation builder, ADMET profiles, scaffold analysis, and clinical pathway linkages.