Rephetio: Repurposing drugs on a hetnet [rephetio]

Development and evaluation of a crowdsourcing methodology for knowledge base construction: identifying relationships between clinical problems and medications

Extracting indications from the ehrlink resource

ehrlink is our name for a study in which an electronic health record (EHR) system prompted clinicians to report the problem for which a medication was prescribed [1]. The resulting high-confidence set contained 11,166 problem-medication pairs with precision exceeding 95%. Until now, comments about ehrlink have been scattered, so this discussion consolidates them and provides a home for further analysis.

Here is the history of this collaborative integration effort:

  1. @b_good initially suggested the resource and located the data supplement.
  2. @dhimmel converted the PDF data supplement to a TSV file (comment, notebook, download).
  3. @dhimmel determined that the identifiers were not from a standard terminology.
  4. @allisonmccoy joined the discussion, confirming the proprietary identifiers and providing additional related studies.
  5. @allisonmccoy and @TIOprea discussed the reliability of the resource.
  6. @alizee mapped the medication terms from ehrlink to RxNorm (comment, repository).
  7. @dhimmel mapped the RxNorm concepts matched by @alizee to RxNorm ingredients (comment, notebook, download).

Mapping ehrlink diseases to the DO

The ehrlink high-confidence set contains indications for 1,596 problems (download). We used a simple string-matching scheme to map these terms to the Disease Ontology (DO): lowercased ehrlink problem names were matched against lowercased DO names and synonyms (notebook, results).
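The matching step amounts to an exact lookup after lowercasing. Below is a minimal sketch, not the code from the linked notebook: it assumes the DO terms are already available as (ID, name, synonyms) tuples, and the function names `build_label_index` and `match_problems` are hypothetical. The inputs are toy rows for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_label_index(
    do_terms: Iterable[Tuple[str, str, List[str]]],
) -> Dict[str, Set[str]]:
    """Index every lowercased DO name and synonym by the DO IDs that carry it."""
    index: Dict[str, Set[str]] = {}
    for do_id, name, synonyms in do_terms:
        for label in [name, *synonyms]:
            index.setdefault(label.lower(), set()).add(do_id)
    return index


def match_problems(
    problems: Iterable[str], index: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Map each problem whose lowercased name exactly equals a DO label."""
    matches: Dict[str, Set[str]] = {}
    for problem in problems:
        hits = index.get(problem.lower())
        if hits:  # unmatched problems are simply left out
            matches[problem] = set(hits)
    return matches


# Toy inputs for illustration only.
do_terms = [
    ("DOID:10763", "hypertension", []),
    ("DOID:9352", "type 2 diabetes mellitus", ["type II diabetes mellitus"]),
]
problems = ["Hypertension", "Type II Diabetes Mellitus", "Elevated blood pressure"]
matches = match_problems(problems, build_label_index(do_terms))
print(matches)
```

Exact matching after lowercasing favors precision: a near-miss such as "Elevated blood pressure" stays unmapped rather than being guessed.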

22.9% (365 of 1,596) of the ehrlink problems mapped to the Disease Ontology. Of the 137 DO slim terms, 50 had a directly matching ehrlink problem. When matches are propagated to DO slim terms, 5 additional diseases are matched. While these recall numbers appear low, we recover a decent proportion of the major complex diseases with few to no false positives.
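The text does not spell out how propagation works. A plausible reading, assumed here, is that a DO slim term counts as matched when any of its descendants in the DO hierarchy matched an ehrlink problem. The sketch below follows that assumption, using a placeholder hierarchy stored as a parent-to-children dictionary; the term IDs and function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def descendants(term: str, children: Dict[str, Set[str]]) -> Set[str]:
    """Collect every term below `term` by walking child links depth-first."""
    found: Set[str] = set()
    stack = list(children.get(term, ()))
    while stack:
        child = stack.pop()
        if child not in found:
            found.add(child)
            stack.extend(children.get(child, ()))
    return found


def slim_coverage(
    slim_terms: Iterable[str],
    children: Dict[str, Set[str]],
    matched: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Split slim terms into those matched directly and those matched only via a descendant."""
    slim = set(slim_terms)
    direct = slim & matched
    propagated = {t for t in slim - direct if descendants(t, children) & matched}
    return direct, propagated


# Toy hierarchy: parent -> children. IDs are placeholders, not real DO terms.
children = {"slim_A": {"A1"}, "A1": {"A2"}, "slim_B": {"B1"}}
matched = {"A2", "slim_C"}
direct, propagated = slim_coverage(["slim_A", "slim_B", "slim_C"], children, matched)
print(sorted(direct), sorted(propagated))

# The problem-level recall reported above: 365 of 1,596 problems.
print(f"{365 / 1596:.1%}")
```

Here `slim_C` is matched directly, `slim_A` is matched only through its grandchild `A2`, and `slim_B` stays unmatched.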

Mapping ehrlink to DO and RxNorm ingredient terms

We created a version of ehrlink containing the subset of problem-medication pairs that mapped to standardized terminologies (notebook, download). First, we converted problems to DO terms (see above). Then we converted medications to RxNorm concepts using the mapping produced by @alizee, excluding any RxNorm matches with a score below 55, since errors were observed below this threshold. Overall, the approximateTerm function of the RxNorm API performed impressively. Finally, we converted RxNorm concepts into their active ingredients and restricted to single-ingredient medications.
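The filtering chain can be sketched offline as follows. This is a minimal illustration, not the notebook's code: it assumes the approximateTerm matches (an RxNorm concept ID plus score per medication name) and each concept's active ingredients have already been retrieved into plain dictionaries, and the names `RxMatch`, `to_ingredient`, and `map_pairs` are hypothetical. Only the 55-point cutoff and the single-ingredient restriction come from the text; the RxCUIs and scores below are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

MIN_SCORE = 55  # from the text: matches scoring below 55 were discarded


@dataclass(frozen=True)
class RxMatch:
    rxcui: str    # RxNorm concept matched to the free-text medication name
    score: float  # approximate-match score reported for that concept


def to_ingredient(
    match: Optional[RxMatch], ingredients: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the sole active ingredient of a confident match, else None."""
    if match is None or match.score < MIN_SCORE:
        return None
    active = ingredients.get(match.rxcui, set())
    if len(active) != 1:  # drop unknown and multi-ingredient products
        return None
    return next(iter(active))


def map_pairs(
    pairs: Iterable[Tuple[str, str]],
    problem_to_do: Dict[str, Set[str]],
    med_to_match: Dict[str, RxMatch],
    ingredients: Dict[str, Set[str]],
) -> Tuple[Set[Tuple[str, str]], int]:
    """Translate (problem, medication) pairs to (DO ID, ingredient) pairs.

    Returns the distinct mapped pairs and how many input pairs survived.
    """
    mapped: Set[Tuple[str, str]] = set()
    n_survived = 0
    for problem, medication in pairs:
        do_ids = problem_to_do.get(problem, set())
        ingredient = to_ingredient(med_to_match.get(medication), ingredients)
        if do_ids and ingredient is not None:
            n_survived += 1
            mapped.update((do_id, ingredient) for do_id in do_ids)
    return mapped, n_survived


# Toy inputs; the RxCUIs and scores are made up for illustration.
pairs = [
    ("Hypertension", "lisinopril 10 mg tablet"),
    ("Hypertension", "lisinopril-hydrochlorothiazide"),
    ("Asthma", "albuterol inhaler"),
    ("Fatigue", "cyanocobalamin"),
]
problem_to_do = {"Hypertension": {"DOID:10763"}, "Asthma": {"DOID:2841"}}
med_to_match = {
    "lisinopril 10 mg tablet": RxMatch("rx-1", 100),
    "lisinopril-hydrochlorothiazide": RxMatch("rx-2", 90),
    "albuterol inhaler": RxMatch("rx-3", 40),
    "cyanocobalamin": RxMatch("rx-4", 100),
}
ingredients = {
    "rx-1": {"lisinopril"},
    "rx-2": {"lisinopril", "hydrochlorothiazide"},
    "rx-3": {"albuterol"},
    "rx-4": {"cyanocobalamin"},
}
mapped, n_survived = map_pairs(pairs, problem_to_do, med_to_match, ingredients)
print(mapped, f"{n_survived / len(pairs):.1%}")
```

Each of the three rejected toy pairs fails a different filter: the combination product has two ingredients, the albuterol match scores below the cutoff, and "Fatigue" has no DO mapping. Only one of four pairs survives, which mirrors how the precision-first design trades away recall.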

33.3% (3,719 of 11,166) of the original problem-medication pairs mapped successfully to both an ingredient and a DO term. Users should note that our mapping procedure prioritized precision and automation over recall.

Status: Completed
Cite this as
Daniel Himmelstein (2015) Extracting indications from the ehrlink resource. Thinklab. doi:10.15363/thinklab.d62

Creative Commons License