The Challenges of Big Data in Medicine

There is no denying that the amount of data we create every day is growing by leaps and bounds, but the amount of data stored in the human body is truly incredible. Yevgeniy Grigoryev, PhD, a Biology Lecturer at City College of New York, estimates that the human body stores approximately 150 zettabytes of data. One zettabyte is 1,099,511,627,776 gigabytes, so the challenges big data presents in medicine are huge, to say the least.
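The gigabyte figure quoted above matches the binary convention (2^70 bytes per zettabyte, 2^30 bytes per gigabyte); in decimal SI units, one zettabyte is exactly one trillion gigabytes. A minimal sketch of that arithmetic, with the constant and function names chosen here purely for illustration:

```python
# Binary-unit arithmetic behind the figures quoted in the article.
# The names below are illustrative, not taken from the source.
BYTES_PER_GIGABYTE = 2 ** 30   # binary gigabyte (gibibyte)
BYTES_PER_ZETTABYTE = 2 ** 70  # binary zettabyte (zebibyte)


def zettabytes_to_gigabytes(zettabytes: int) -> int:
    """Convert zettabytes to gigabytes using binary units."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


print(f"{zettabytes_to_gigabytes(1):,}")    # 1,099,511,627,776
print(f"{zettabytes_to_gigabytes(150):,}")  # 164,926,744,166,400
```

Under that convention, the 150-zettabyte estimate works out to roughly 165 trillion gigabytes.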

Bioinformaticians face significant challenges as they try to sift through all of the data they aggregate in order to mine, investigate, and find useful information. An article by Andy Oram on Forbes, titled "Sequencing, Cloud Computing, and Analytics Meet Around Genetics and Pharma," looked at how these challenges are being addressed and the tools being used to do so.

Oram attended the recent Bio-IT World Conference and Expo, where 2,700 attendees were tackling the size and chaos of clinical research data. The data is chaotic because technology now exists to collect information from many different touch points. Whether it comes from electronic medical records (EMR), research papers, customer feedback, health monitoring or other medical records, it is not organized. Making sense of the structured and unstructured data will require big data solutions capable of analyzing all of this information.

According to Oram, an estimated 80 percent of patient information in doctors’ records is unstructured. If you have seen how a doctor writes, you can understand the difficulty in digitizing this information so it can be part of a bigger data source to improve treatment, medications and other methodologies.

One of the technologies on offer at the conference was natural language processing (NLP), which companies such as Cambridge Semantics use to collect data from free-form text. With NLP, a computer program can interpret human language as it is actually written or spoken, using deep analytics, sentence segmentation, part-of-speech tagging, parsing, named entity extraction and co-reference resolution.
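To make two of those steps concrete, here is a toy sketch of sentence segmentation and named entity extraction using only the Python standard library. It is not how Cambridge Semantics or any production system works: the vocabulary, labels and sample note are hypothetical, and real NLP tools rely on trained models and clinical ontologies rather than a lookup table.

```python
import re

# Hypothetical, tiny vocabulary for illustration only; a real system would
# use trained models or a clinical ontology instead of a hand-written table.
ENTITY_TERMS = {
    "metformin": "MEDICATION",
    "lisinopril": "MEDICATION",
    "type 2 diabetes": "CONDITION",
    "hypertension": "CONDITION",
}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Dictionary lookup: return (term, label) pairs in order of appearance."""
    lowered = sentence.lower()
    hits = []
    for term, label in ENTITY_TERMS.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((match.start(), term, label))
    return [(term, label) for _, term, label in sorted(hits)]


# Made-up free-text note standing in for an unstructured doctor's record.
note = "Patient has type 2 diabetes and hypertension. Started metformin today."
for sentence in split_sentences(note):
    print(sentence, extract_entities(sentence))
```

Even this crude version shows why the approach matters: once terms are pulled out of free text and labeled, they can be stored and queried like any other structured data.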

The technologies Cambridge Semantics uses include OWL, Cassandra and Hadoop, which are designed to manage big data more effectively. Web Ontology Language, or OWL, is the ontology language of the Semantic Web, used for fast, flexible data modeling and efficient automated reasoning. This is just one of many solutions, from just one company, for extracting valuable information from the data the medical field generates.

Taking advantage of these innovations will require changing the way doctors, researchers, healthcare facilities and institutions use technology, so that the information they gather is easily available. That is easier said than done.

Edited by Maurice Nagle

By Frank Griffin, HealthTechZone Contributing Writer