## MEDI indications data — discrepancy in resource-specific counts

Update: See reply — this issue has been resolved. Links to our analyses and code have been changed to archived versions in this post.

MEDI is a publicly available indication resource standardized to ICD9/UMLS concepts for diseases and RxNorm ingredients for drugs. The accompanying publication clearly and concisely presents the analysis, which follows a rational, resourceful, and thorough methodology [1].

The data is also already online, which is not the case for some other indication resources we've evaluated. However, when processing the data (source on GitHub), we came across a potential discrepancy between Table 2 of the manuscript and the statistics we generated. The problem could have arisen from a mistake in our data processing or in MEDI's data export.

Specifically, from the manuscript [1]:

Table 2: Number of unique medications, ICD9 codes, and indication pairs extracted from each resource

| Resource | Medications (% of total) | ICD9 codes (% of total) | Indication pairs (% of total) |
| --- | --- | --- | --- |
| RxNorm | 1,726 (56) | 999 (33) | 8,040 (13) |
| SIDER 2 | 1,554 (50) | 1,703 (57) | 17,702 (28) |
| MedlinePlus | 1,629 (52) | 869 (29) | 16,581 (26) |
| Wikipedia | 2,608 (84) | 2,624 (87) | 34,911 (55) |
| Union of all resources | 3,112 | 3,009 | 63,343 |

Our analysis found different resource-specific counts. The comparison is complicated because the mapping from numeric identifiers to resources is unknown:

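| resource | medications | diseases | indications |
| --- | --- | --- | --- |
| 1 | 3,091 | 2,985 | 53,615 |
| 2 | 1,648 | 1,075 | 6,279 |
| 3 | 984 | 551 | 2,497 |
| 4 | 447 | 222 | 952 |
| all | 3,112 | 3,009 | 63,343 |
| hps | 2,139 | 1,345 | 13,379 |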
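A quick arithmetic check on the two tables, sketched in Python with the counts copied from above: the four per-identifier indication counts from our analysis sum exactly to the union total, whereas the per-resource counts in manuscript Table 2 sum to more than the union, as expected when resources overlap. The variable names are ours, for illustration only.

```python
# Indication counts per numeric identifier, copied from our table above.
ours_by_identifier = {1: 53_615, 2: 6_279, 3: 2_497, 4: 952}

# Indication pairs per resource, copied from manuscript Table 2.
manuscript_by_resource = {
    "RxNorm": 8_040,
    "SIDER 2": 17_702,
    "MedlinePlus": 16_581,
    "Wikipedia": 34_911,
}

union_total = 63_343

ours_sum = sum(ours_by_identifier.values())
manuscript_sum = sum(manuscript_by_resource.values())

# True when the identifier groups do not overlap and together cover the union.
ours_is_partition = ours_sum == union_total
# True when at least some indications are shared between resources.
manuscript_overlaps = manuscript_sum > union_total

print(ours_sum, ours_is_partition)
print(manuscript_sum, manuscript_overlaps)
```

An exact partition suggests that each indication carries a single identifier value in our processed data, which genuinely overlapping resources would not produce.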

We will reach out to the MEDI authors for assistance. Currently the discrepancy seems to have a negligible effect on the high-precision subset.

Daniel Himmelstein Researcher

After contacting Dr. Wei-Qi Wei, we located the cause of the discrepancy. The integer values in the MENTIONEDBYRESOURCES column of MEDI_01212013_0.csv and MEDI_01212013_UMLS.csv refer to how many resources reported the indication. We had incorrectly assumed that this column referred to which resources reported the indication. Therefore, it appeared that each indication was only reported by a single resource.
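To illustrate the corrected reading, here is a minimal Python sketch on made-up rows. Only the MENTIONEDBYRESOURCES column name comes from the MEDI files; the DRUG and DISEASE columns and all values are hypothetical placeholders, not the actual file layout. Grouping by this column tallies indications by how many resources reported them, not by which resource did.

```python
import csv
import io
from collections import Counter

# Toy data. DRUG and DISEASE are hypothetical placeholder columns;
# MENTIONEDBYRESOURCES is the only column name taken from the MEDI files.
toy_csv = """DRUG,DISEASE,MENTIONEDBYRESOURCES
drug_a,disease_x,1
drug_a,disease_y,3
drug_b,disease_x,1
drug_c,disease_z,2
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))

# Each row is one indication; the integer is the number of resources
# that reported it, so every indication falls into exactly one group.
counts = Counter(int(row["MENTIONEDBYRESOURCES"]) for row in rows)

for n_resources in sorted(counts):
    print(f"{counts[n_resources]} indication(s) reported by {n_resources} resource(s)")
```

Because every row has exactly one value in this column, the groups partition the indications, which is why each indication appeared to come from a single resource under our original assumption.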

Resource-specific indications data is not available from the MEDI website. However, the true counts for each resource combination are provided in manuscript Figure 2 [1].

We would like to thank the authors for their prompt response and clarification.

• Daniel Himmelstein: The authors do not plan on releasing the resource-specific indication data for the current MEDI database. However, they will consider doing so for future releases.

Cite this as
Daniel Himmelstein (2015) MEDI indications data — discrepancy in resource-specific counts. Thinklab. doi:10.15363/thinklab.d31