The Discovery Pipeline

How We Trace Plants to Protein Targets

Three steps connect centuries of traditional knowledge to modern drug targets. Here's what happens at each stage — and why the convergence signal works.

The Discovery Funnel

28K medicinal plants
768K source compounds

Pipeline filters

  • Convergence ranking (≥ 3 traditions = priority)
  • QED + Lipinski RO5 screening
  • STITCH · ChEMBL · Open Targets, confidence ≥ 0.7

~155K high-confidence compound–target pairs (of 940K total · 1 in 6 clears the threshold)

Confidence ≥ 0.7 = composite score: binding evidence × literature co-occurrence × pathway centrality (each 0–1, multiplicative)

I

Traditional Knowledge

Why convergence across traditions is the first signal

Traditional medicine systems encode thousands of years of empirical observation. When traditions with limited historical cross-pollination document the same plant for the same therapeutic purpose, that convergence is a meaningful pharmacological signal — even accounting for documented trade-route contact, mechanism-level agreement is unlikely to be coincidental.

We aggregate usage data from four ethnobotanical corpora — IMPPAT, TCMBank, ETCM, and the Unani pharmacopeia — cross-referenced by plant name, family, and chemical synonym. Compound–organism pairs are bridged via LOTUS. LOTUS provides chemical structures; usage data comes exclusively from the ethnobotanical corpora. A plant documented across three or more traditions earns elevated priority in the compound pipeline — we require independent documentation, so cross-citations within a single tradition don't count.
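The convergence rule can be sketched as a count of distinct traditions per plant. This is a minimal illustration, assuming usage records arrive as (plant, corpus) pairs; the corpus-to-tradition mapping, the record layout, and the function name `convergent_plants` are assumptions made for the example, not the platform's actual schema. TCMBank and ETCM both cover Chinese medicine, so here they collapse into one tradition and do not count twice.

```python
from collections import defaultdict

# Assumed corpus-to-tradition mapping (illustrative, not the platform's schema).
CORPUS_TRADITION = {
    "IMPPAT": "indian",
    "TCMBank": "tcm",
    "ETCM": "tcm",
    "Unani pharmacopeia": "unani",
}


def convergent_plants(records, min_traditions=3):
    """Return {plant: sorted traditions} for plants documented in at least
    `min_traditions` distinct traditions."""
    traditions = defaultdict(set)
    for plant, corpus in records:
        # A set keeps one entry per tradition, so repeated citations
        # inside the same tradition are ignored.
        traditions[plant].add(CORPUS_TRADITION[corpus])
    return {
        plant: sorted(seen)
        for plant, seen in traditions.items()
        if len(seen) >= min_traditions
    }


# Hypothetical records for illustration only.
records = [
    ("Plant A", "IMPPAT"),
    ("Plant A", "TCMBank"),
    ("Plant A", "ETCM"),
    ("Plant A", "Unani pharmacopeia"),
    ("Plant B", "TCMBank"),
    ("Plant B", "ETCM"),
    ("Plant B", "IMPPAT"),
]
priority = convergent_plants(records)
```

In this toy input, Plant B appears in three corpora but only two traditions, so it does not reach the priority threshold.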

Turmeric: 3 independent traditions

Ayurveda

  • inflammation
  • skin disorders
  • digestive aid
  • wound healing

TCM

  • blood stasis
  • pain relief
  • jaundice
  • menstrual disorders

Western

  • anti-inflammatory
  • antioxidant
  • joint health
  • digestive support

teal = convergent theme across traditions

II

Bioactive Compounds

Scoring druggability from first principles

Every documented plant yields dozens to hundreds of phytochemicals. We profile each compound against Lipinski's Rule of Five (MW ≤ 500, LogP ≤ 5, ≤ 5 H-bond donors, ≤ 10 H-bond acceptors) and compute a Quantitative Estimate of Druglikeness (QED), a composite of eight molecular properties.
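The Rule of Five check can be sketched as four threshold comparisons. The example below assumes the descriptors have already been computed upstream by a cheminformatics toolkit; the `Descriptors` container, the function name `ro5_violations`, and the sample values are hypothetical and only illustrate the thresholds quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mw: float    # molecular weight in Da
    logp: float  # octanol-water partition coefficient
    hbd: int     # number of H-bond donors
    hba: int     # number of H-bond acceptors


def ro5_violations(d):
    """List which Rule of Five limits a compound exceeds."""
    checks = [
        ("MW > 500", d.mw > 500),
        ("LogP > 5", d.logp > 5),
        ("H-bond donors > 5", d.hbd > 5),
        ("H-bond acceptors > 10", d.hba > 10),
    ]
    return [label for label, violated in checks if violated]


# Hypothetical descriptor values for illustration only.
small = Descriptors(mw=320.0, logp=2.5, hbd=2, hba=5)
bulky = Descriptors(mw=650.0, logp=6.1, hbd=3, hba=12)
```

Returning the list of violations rather than a single pass/fail flag leaves the tolerance (for example, whether one violation is acceptable) as a separate decision, which the text does not specify.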

QED ranges from 0 to 1; median QED for approved oral drugs is ~0.5 (Bickerton et al., 2012). We classify compounds as drug-like (QED ≥ 0.5), lead-like (QED ≥ 0.35, lower MW), or fragment-like (MW < 250 — small, high-efficiency starting points).
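The three classes can be sketched as a short decision function. Two points are assumptions here, because the text does not settle them: the fragment-like test (MW < 250) is applied first, so a small compound with a high QED is still labeled fragment-like, and the unquantified "lower MW" condition for lead-like compounds is left out. The function name `classify` and the fallback label "unclassified" are also illustrative.

```python
def classify(qed, mw):
    """Assign a class label using the thresholds quoted in the text.

    Assumption: fragment-like takes precedence over the QED-based classes.
    """
    if mw < 250:
        return "fragment-like"
    if qed >= 0.5:
        return "drug-like"
    if qed >= 0.35:
        return "lead-like"
    return "unclassified"  # assumed label for everything else


# (QED, MW) values for the example compounds as listed on this page.
compounds = {
    "Curcumin": (0.52, 368.4),
    "Bisdemethoxycurcumin": (0.48, 338.4),
    "ar-Turmerone": (0.61, 218.3),
}
labels = {name: classify(qed, mw) for name, (qed, mw) in compounds.items()}
```

With that precedence, the sketch reproduces the labels shown for the three example compounds.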

ar-Turmerone (QED 0.61) outscores curcumin (QED 0.52) on drug-likeness, illustrating why systematic profiling surfaces better candidates than literature prominence alone. (Curcumin still scores well on QED but is a known PAINS compound, a reminder that no single metric replaces orthogonal screening.)

Compound               Class           QED    MW      LogP
Curcumin               drug-like       0.52   368.4   3.2
Bisdemethoxycurcumin   lead-like       0.48   338.4   2.9
ar-Turmerone           fragment-like   0.61   218.3   3.5

"When the targets from a plant's compounds align with the pathways its traditions documented (inflammation, neuroprotection, antimicrobial), that overlap is convergence at the molecular level."

The signal the platform is built to surface

III

Target Network

Confidence scoring across three evidence sources

Protein targets are identified by cross-referencing STITCH (chemical–protein interactions from text mining and experiments) and ChEMBL (binding assay data curated from literature). Open Targets contributes pathway centrality scores for identified targets — informing one dimension of the composite confidence, not the compound–target binding signal itself.

For each compound–target pair we compute a composite confidence: binding evidence strength × literature co-occurrence × pathway centrality (each normalized 0–1; multiplicative scoring penalizes weak evidence in any single dimension). Scores of 0.7 or above indicate high confidence, matching the ≥ 0.7 pipeline threshold; scores from 0.5 up to 0.7 indicate moderate confidence. Edge weight in the network graph reflects that score.
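The multiplicative score and its bands can be sketched in a few lines. This is an illustration under stated assumptions: the three inputs are taken as already normalized to 0-1, the high-confidence cutoff is treated as inclusive (≥ 0.7, as in the funnel summary), and the "low" label, the function names, and the sample component scores are hypothetical.

```python
def composite_confidence(binding, literature, centrality):
    """Multiply three evidence scores, each already normalized to 0-1."""
    for name, value in (
        ("binding", binding),
        ("literature", literature),
        ("centrality", centrality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return binding * literature * centrality


def confidence_band(score):
    """Map a composite score to the bands described in the text."""
    if score >= 0.7:
        return "high"
    if score >= 0.5:
        return "moderate"
    return "low"  # assumed label; the text names only high and moderate


# Hypothetical component scores for illustration only.
balanced = composite_confidence(0.9, 0.9, 0.9)    # about 0.729
lopsided = composite_confidence(0.95, 0.95, 0.5)  # about 0.451
```

The lopsided case shows why the product is stricter than an average: its arithmetic mean would be 0.8, yet one weak dimension pulls the composite below 0.5. It also follows that a pair can only reach 0.7 if every component is itself at least 0.7.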

Target      Gene    Pathway                   Confidence
COX-2       PTGS2   Inflammation              0.91
NF-κB p65   RELA    Inflammation / Immunity   0.87
AKT1        AKT1    PI3K / Cancer signaling   0.74
GSK-3β      GSK3B   Neuroprotection / Tau     0.68
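The idea that edge weight reflects the confidence score can be sketched with a plain adjacency map. The four scores are the example values listed above; the compound node is a placeholder, since the page does not state which compound these scores belong to, and the function names are hypothetical.

```python
# Example target confidences as listed above, keyed by gene symbol.
scores = {"PTGS2": 0.91, "RELA": 0.87, "AKT1": 0.74, "GSK3B": 0.68}
COMPOUND = "compound_X"  # hypothetical placeholder identifier


def build_graph(compound, target_scores):
    """Return an undirected adjacency map: node -> {neighbor: weight}."""
    graph = {compound: {}}
    for target, weight in target_scores.items():
        graph[compound][target] = weight
        graph.setdefault(target, {})[compound] = weight
    return graph


def high_confidence_targets(graph, compound, cutoff=0.7):
    """Targets linked to `compound` with weight >= cutoff, strongest first."""
    edges = graph[compound]
    kept = [target for target, weight in edges.items() if weight >= cutoff]
    return sorted(kept, key=lambda target: edges[target], reverse=True)


graph = build_graph(COMPOUND, scores)
strong = high_confidence_targets(graph, COMPOUND)
```

With the 0.7 cutoff, GSK3B (0.68) stays in the graph as a moderate-confidence edge but is excluded from the high-confidence list.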

The full platform goes deeper — formulation builder, ADMET profiles, scaffold analysis, and clinical pathway linkages.

© 2026 3IT Phyto — Medicinal Plant Informatics