Opinion

More than meets the eye: Unstructured data's untapped potential

This is a guest post from John Schneider, CTO at Apixio.

It’s a paradox: today we have more healthcare data than ever, yet we can’t seem to do meaningful work with it. Why? Because much of the new data is unstructured.

Nearly 80% of the data in the 1.2 billion clinical care documents that the U.S. produces annually is unstructured. Unstructured data includes written doctors’ notes, scanned documents, images and other free-form files. It is notoriously difficult to use because, unlike structured data, which is easy to view and use, it is unorganized, text-heavy and hostile to easy processing. Put simply: It’s way harder for a computer to understand a paragraph than it is to understand a checkbox.

Because of this, healthcare organizations often rely entirely on structured data, like billing data, to guide their decisions. But this data is noisy and provides a cloudy, incomplete picture of patient health. It’s the difference between knowing that a patient was billed for a diagnostic test, such as an electrocardiogram, and reading deep into the patient chart to learn that the patient has poorly controlled diabetes that puts them at higher risk for stroke and heart disease. The billing data isn’t even a decent map of the territory; you can get lost fast if that’s what you are depending on.

A closer look at unstructured data can catalyze healthcare transformation

While dealing with this complex mess of data is daunting, it is a necessary step toward true healthcare transformation. By joining together insights from structured and unstructured data, we have the ability to solve some of the most pressing healthcare issues and ensure the delivery of high-quality, low-cost care.

Thankfully, things are changing rapidly. Data science teams have made large strides toward processing unstructured data over the past couple of years. Using machine learning, the same technology that Google and Tesla are using to create self-driving cars, data scientists can extract unstructured data from medical records, coax insights from it and pass those insights on to humans for confirmation. This workflow can help address a whole host of tricky healthcare issues for stakeholders such as insurers, physicians and researchers.

For insurers

Insurers can benefit from unstructured data for many reporting needs, including risk adjustment. The government uses risk adjustment to calibrate payment to Medicare Advantage plans based on the health of their population. Through risk adjustment, diagnosis codes on insurance claims are used to determine what chronic conditions each plan member was treated for and establish an overall “risk” score.

However, this exercise is fraught with challenges. Patient conditions are often not included in the diagnosis codes captured on insurance claims. To fill these holes, insurance plans go through a slow and cumbersome end-of-year process of retrieving printed medical charts and reviewing them for relevant information that will fill a gap in the coded information about a patient. This is a process that could be streamlined with access to unstructured data.

The ability to access the free-form text in clinical charts and pair it with medical billing data enables insurers to understand the chronic conditions of members in a faster and more accurate way. This gives insurers a more complete view of population health so they can better target care. Better care and a healthier population make for more resilient insurance companies and a stronger industry.

For physicians

With the ability to join together disparate unstructured data, physicians can access important facets of a patient's medical history that were previously out of reach.

Advanced analytics are already being used to supplement physicians’ expertise, serving as second opinions or personal assistants.

For example, radiologists and ophthalmologists have started using computers to assist in reading images. Google’s DeepMind has partnered with Moorfields Eye Hospital NHS Foundation Trust in the U.K. to improve the diagnosis and treatment of diabetic retinopathy and macular degeneration through the analysis of retinal images that give early warning of disease progression. Analyzing these images greatly improved the quality of diagnosis and gave doctors a consistent means to confidently classify the various stages of these conditions.

The ability to take analysis beyond structured data and apply it to things like images opens up a world of possibilities for care providers. With the ability to quickly analyze data, both unstructured and structured, physicians have a powerful tool to help them detect and diagnose disease sooner, improve care outcomes and lower costs.

These positive effects will be amplified when the insights are streamlined, stored and shared among healthcare professionals, giving patients comprehensive and well-informed care at every healthcare system touch point.

For researchers

For researchers, understanding unstructured data paves the way for the development of more sophisticated care standards, ones that are based on cause-and-effect patterns across enormous data sets.

Researchers at Stanford University are leveraging unstructured medical notes to perform post-market surveillance of FDA-approved drugs. Every year, drug-related adverse events account for almost half of adverse events that occur in hospital settings. Stanford’s researchers have developed machine learning algorithms that consume unstructured clinical notes to detect negative treatment outcomes related to specific drugs or drug-to-drug interactions. This work demonstrates the feasibility of using large amounts of free-text notes to monitor the effects of medical drugs after they have been licensed for use — including previously unreported adverse interactions.

Clearly, we can’t expect humans to quickly make sense of such a huge amount of data from so many sources on their own. Thankfully, we have achieved a level of sophistication in technology that puts large-scale, self-improving analysis of unstructured data well within our reach. Now that we have the means to achieve this, there are no excuses for not taking advantage of all clinical data for improvement across the healthcare system.
