Big Data in Medicine: Easier Said Than Done

According to an article recently published by Forbes, one data scientist wants to explain why data science has proven to be slow at identifying particular biomarkers in medical patients.

Imran Haque

In addition to an impressive list of professional accomplishments, Imran Haque has a PhD in computer science from Stanford University. This is relevant mostly because the application of artificial intelligence and big data might seem like a job for a computer scientist, but as Haque explains, in medicine, problems arise before a computer even boots up.

At a Rhode Island conference about big data hosted by the American Association for Cancer Research, Haque gave an address where he identified two core problems with applying traditional big data theory to medicine.

Physiological Limitations

Humans are soft and squishy fleshbags, as many a sci-fi villain has been quick to point out. Our bodies, in many cases, just aren’t that good at “feeding” data to scientists.

Big data has so far been collected mostly online. That’s logical – as the name implies, huge sample sizes are required before complex algorithms can yield any sort of comprehensive information on a subject. The internet provides researchers with millions of potential subjects who produce an almost constant stream of data as soon as they log in. On the other hand, people with certain illnesses or conditions may be few and far between.

For example, cystic fibrosis, a rare but not unheard-of disease, affects some 30,000 individuals in the United States. By comparison, Facebook averages around 1.49 billion visits every day. Every post and every login is a new data point on a big data chart. But when collecting a data point for a cystic fibrosis patient, it might take several weeks to run just one test. On top of that, there are often physical limitations when attempting to collect medical information on this scale.

For example, when trying to detect tumors early, it might take a blood sample as large as 80 mL to yield a single molecule of mutated DNA that suggests a tumor’s presence. Many molecules of mutated DNA would no doubt be needed from each patient to create enough data to apply big data principles, and that would require a large amount of blood collected over a long period of time.
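To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It takes the 80 mL figure from above at face value and adds two assumptions that are not from Haque's talk: that a single blood draw is about 10 mL, and that mutated molecules are scattered randomly through the blood (a Poisson model). Under those assumptions, one draw has only about a 12% chance of containing even one mutated molecule, and collecting 100 molecules would take around 8,000 mL of blood.

```python
import math

# From the article: roughly 80 mL of blood per molecule of mutated DNA.
ML_PER_MOLECULE = 80.0


def expected_molecules(draw_ml):
    """Average number of mutated molecules expected in a draw of this size."""
    return draw_ml / ML_PER_MOLECULE


def chance_of_at_least_one(draw_ml):
    """Chance a draw holds at least one molecule.

    Assumption (ours, for illustration): molecules are randomly scattered,
    so counts follow a Poisson distribution.
    """
    return 1.0 - math.exp(-expected_molecules(draw_ml))


def blood_needed_ml(molecules_wanted):
    """Average blood volume needed to collect this many molecules."""
    return molecules_wanted * ML_PER_MOLECULE


draw_ml = 10.0  # hypothetical draw size, not a figure from the article
print(f"Expected molecules in a {draw_ml:.0f} mL draw: {expected_molecules(draw_ml):.3f}")
print(f"Chance of at least one: {chance_of_at_least_one(draw_ml):.1%}")
print(f"Blood needed for 100 molecules: {blood_needed_ml(100):.0f} mL")
```

The exact numbers matter less than the scale: every extra data point costs a meaningful amount of blood.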

Limitations of the Nature of Big Data

Haque also pointed out that big data does not always yield satisfying, clear-cut answers as some seem to expect. Big data relies on pattern recognition to help researchers draw meaningful conclusions about a given subject, but all too often finding patterns to look for in medicine is easier said than done.

Variations between individuals’ biological markers can be considerable. This can create the suggestion of patterns that may not be there, or lead scientists to draw inaccurate conclusions about a given subject – especially when the input data pool is small, as it is in the rare disease community.
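A small simulation shows how easily this can happen. The sketch below is only an illustration, not anything Haque presented: it invents 200 "biomarkers" that are pure random noise, compares each one to an equally random "outcome", and reports the strongest correlation it finds. The function names and the patient and marker counts are made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_patients, n_markers, seed=0):
    """Largest |correlation| between random markers and a random outcome.

    Every value is noise, so any "pattern" found here is a coincidence.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    best = 0.0
    for _ in range(n_markers):
        marker = [rng.gauss(0, 1) for _ in range(n_patients)]
        best = max(best, abs(pearson(marker, outcome)))
    return best


small = strongest_chance_correlation(n_patients=10, n_markers=200)
large = strongest_chance_correlation(n_patients=1000, n_markers=200)
print(f"Strongest chance correlation with 10 patients:   {small:.2f}")
print(f"Strongest chance correlation with 1000 patients: {large:.2f}")
```

With only 10 patients, at least one of the noise markers will usually look strongly linked to the outcome. With 1,000 patients, the strongest coincidence is far weaker. Nothing real is being detected in either case.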

Just determining which biological markers to focus on, as well as accounting for all the various differences that may be present between any two subjects being studied, would be a monumental task for data scientists – and would be unlikely to yield much conclusive evidence for anything!

Exercise Skepticism

Haque understands that big data is a tempting technology. It seems so promising in so many applications that of course we would try to apply it in medicine – but Haque wants to quell at least some of the hype. Data, he points out, isn’t dishonest. But what data is collected, how, and by whom can be misleading.

Big data and machine learning will almost always produce some kind of result, Haque explained, but they will often yield the result you “wanted” in the first place. If you look at a data set expecting to find something, chances are you’ll find a way to find it.


Regardless of Haque’s reservations about big data’s current effectiveness in medicine, many remain hopeful the technology will play an important role in assisting the doctors of tomorrow. How might big data be medically useful beyond observing biomarkers? Share your thoughts with Patient Worthy!
