Natural Language Processing is Changing How We Research Rare Diseases

Diagnosing Rare Diseases

Diagnosing rare diseases has historically been difficult. Physicians are taught the adage “when you hear hoofbeats, think horses, not zebras,” because horses are far more common. For instance, if you’ve been tired for a week, it’s much more likely that you haven’t been sleeping well because of stress than that you have a rare disease like narcolepsy. However, this means that for those who do have rare conditions, diagnosis can take a very long time.

One physician explains that she was taught this principle. Even so, as a new mom she worried that her daughter had a rare condition called Hirschsprung disease. It was quickly ruled out, but the experience made her think more deeply about rare disease patients who have to wait a long time for a diagnosis.

Although each individual illness is rare, about one in every 10 people in the United States has a rare disease, so we must be more vigilant in considering all possibilities. The challenge is that there are upwards of 7,000 known rare conditions to consider, on top of more common diagnoses.

As a result, rare diseases are typically not considered until all other options have been ruled out. Once these have been exhausted, a physician often has to comb through a patient’s record by hand, documenting seemingly disjointed symptoms and evaluating how they might be connected. This task is time-consuming and prone to human error.

Thankfully, new tools are becoming available that supplement doctors’ manual work with technology, allowing for faster diagnosis and better data. One of these is natural language processing (NLP).

Natural Language Processing

Natural Language Processing (NLP) allows data to be collected from patient portals, electronic health records (EHRs), and peer-reviewed literature. The technology can examine both discrete fields and free text to extract information relevant to the patient. This provides a 360-degree view of the patient, which can help clinicians reach an accurate diagnosis sooner.
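To make the idea of pulling information out of free text a little more concrete, here is a deliberately simplified Python sketch. It is not how any of the systems mentioned in this post work; production clinical NLP relies on much richer vocabularies and trained models. The phenotype lexicon, the negation cues, the example record, and the `PatientRecord`, `extract_phenotypes`, and `build_patient_view` names are all hypothetical, chosen only to illustrate combining discrete EHR fields with terms found in free-text notes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicon: surface phrases seen in notes -> normalized phenotype label.
PHENOTYPE_LEXICON = {
    "hepatomegaly": "hepatomegaly",
    "enlarged liver": "hepatomegaly",
    "joint stiffness": "joint stiffness",
    "stiff joints": "joint stiffness",
    "hearing loss": "hearing loss",
    "recurrent ear infections": "recurrent otitis media",
}

# Crude negation check: a cue earlier in the same sentence cancels the mention.
NEGATION_RE = re.compile(r"\b(no|denies|negative for)\b")


@dataclass
class PatientRecord:
    structured: dict                                # discrete EHR fields (age, sex, ...)
    notes: list[str] = field(default_factory=list)  # free-text clinical notes


def extract_phenotypes(record: PatientRecord) -> set[str]:
    """Return normalized phenotype labels mentioned (and not negated) in the notes."""
    found = set()
    for note in record.notes:
        # Split each note into rough sentences so negation stays local.
        for sentence in re.split(r"[.;\n]", note.lower()):
            for phrase, label in PHENOTYPE_LEXICON.items():
                idx = sentence.find(phrase)
                if idx == -1:
                    continue
                if NEGATION_RE.search(sentence[:idx]):
                    continue
                found.add(label)
    return found


def build_patient_view(record: PatientRecord) -> dict:
    """Merge discrete fields with phenotypes extracted from free text."""
    return {**record.structured, "phenotypes": sorted(extract_phenotypes(record))}


# Invented example record, not real patient data.
record = PatientRecord(
    structured={"age_years": 3, "sex": "F"},
    notes=[
        "Mother reports stiff joints and recurrent ear infections.",
        "Exam: enlarged liver. No hearing loss noted.",
    ],
)
print(build_patient_view(record))
# {'age_years': 3, 'sex': 'F', 'phenotypes': ['hepatomegaly', 'joint stiffness', 'recurrent otitis media']}
```

Even this toy version shows why negation handling matters: “No hearing loss noted” mentions hearing loss but must not be counted as a finding.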

For example, the University of Iowa has used NLP to assess clinical phenotypes of infants 200 times faster than with its typical manual methods.

NLP has also been used to enhance rare disease registries and to identify the right patient cohorts for clinical trials.

Anonymized data gathered this way can also help pharmaceutical companies develop new therapies for rare diseases.

Shire, for instance, has used NLP to assess both the disease severity and the genetic characteristics of Hunter syndrome.

The company developed a therapy designed to improve the lives of people living with the severe form of the condition, which typically presents between 2 and 4 years of age, is progressive, and leads to early death. Before starting a clinical trial of this intervention, the team wanted to make sure it chose participants with the greatest potential to benefit from the treatment. This was especially important because the treatment is invasive.

Shire accomplished this goal by using NLP to map genotypes to phenotypes. The team classified patients diagnosed with the condition along with their associated gene mutations, obtained a high yield of relevant results, and found that its results surpassed those of available databases.
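A toy version of genotype-phenotype mapping might look like the Python sketch below. It is not Shire’s method. It assumes that notes mention mutations in the IDS gene (the gene associated with Hunter syndrome) using simple HGVS-style substitutions such as “IDS c.100A>G”, and that severity is recorded with the words “severe” or “attenuated”. The notes, patient IDs, and variants are invented placeholders, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed note conventions (illustrative only): simple substitutions in HGVS coding
# notation after the gene symbol, e.g. "IDS c.100A>G", and one of two severity words.
VARIANT_RE = re.compile(r"\bIDS\s+(c\.\d+[ACGT]>[ACGT])")
SEVERITY_RE = re.compile(r"\b(severe|attenuated)\b", re.IGNORECASE)


def patient_profile(notes: list[str]) -> tuple[set[str], set[str]]:
    """Collect every variant and severity label mentioned across a patient's notes."""
    variants, severities = set(), set()
    for note in notes:
        variants.update(VARIANT_RE.findall(note))
        severities.update(s.lower() for s in SEVERITY_RE.findall(note))
    return variants, severities


def genotype_phenotype_table(notes_by_patient: dict[str, list[str]]) -> dict[str, Counter]:
    """Count how often each variant co-occurs with each severity label."""
    table = defaultdict(Counter)
    for notes in notes_by_patient.values():
        variants, severities = patient_profile(notes)
        if len(severities) != 1:  # skip patients with missing or conflicting severity
            continue
        (severity,) = severities
        for variant in variants:
            table[variant][severity] += 1
    return dict(table)


def severe_candidates(notes_by_patient: dict[str, list[str]]) -> list[str]:
    """Patients whose notes consistently describe the severe form."""
    return sorted(
        pid for pid, notes in notes_by_patient.items()
        if patient_profile(notes)[1] == {"severe"}
    )


# Invented notes; the variants are placeholders, not real clinical findings.
NOTES = {
    "pt-001": ["Hunter syndrome, severe form. Genetic testing: IDS c.100A>G."],
    "pt-002": ["Attenuated phenotype. IDS c.200C>T identified."],
    "pt-003": ["Severe course with developmental regression. IDS c.100A>G."],
}
print(genotype_phenotype_table(NOTES))
print(severe_candidates(NOTES))  # ['pt-001', 'pt-003']
```

Restricting candidates to patients whose records consistently say “severe” loosely mirrors the trial-selection goal described above, though a real pipeline would need far broader variant parsing and clinician review of every match.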

This example and others show the value of NLP and how it may improve collaboration, communication, and overall outcomes for rare disease research and, most importantly, for rare disease patients.
