27-Jul-2017

Graph Database Technology – The Power To Probe Complex Pharma Datasets

Summary

Neo4j’s Emil Eifrem takes a look at how life science researchers can use graph databases to get granular insight from big data and make real advances in research

Last Updated: 27-Jul-2017

Big data, defined as large, complex data sets, has the potential to shed light on every link in the life sciences value chain, which is why data mining has become so important to researchers. Having the right technology, however, is key to its success.

Traditional database tools, namely SQL and relational database technology, struggle with both the volume and the unstructured nature of these complex datasets. Why? Because they model the world as a set of tables and columns, and need ever more complex joins as the data becomes more interconnected. Queries are technically difficult to write and notoriously expensive to run, and performance can degrade as data sizes increase.
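To make the join problem concrete, here is a minimal Python sketch, not taken from any real system. The interaction rows and the names INTERACTIONS, reachable_by_joins, build_adjacency and reachable_by_traversal are all hypothetical. The first function mimics a relational query that re-reads the whole table for every extra hop; the second follows links stored with each node, which is roughly how a graph model approaches the same question. It is deliberately simplified: a real relational database would use indexes rather than a full scan.

```python
from collections import defaultdict

# Hypothetical "interacts with" rows, as a relational table would store them.
INTERACTIONS = [
    ("compound_A", "protein_1"),
    ("protein_1", "protein_2"),
    ("protein_2", "pathway_X"),
    ("compound_B", "protein_2"),
]


def reachable_by_joins(rows, start, hops):
    """Relational style: every extra hop is another pass over the whole table."""
    frontier = {start}
    for _ in range(hops):
        # Simplified stand-in for one self-join: match each row's source
        # against the current frontier and keep the destinations.
        frontier = {dst for src, dst in rows if src in frontier}
    return frontier


def build_adjacency(rows):
    """Graph style: store each node's outgoing links next to the node."""
    adjacency = defaultdict(set)
    for src, dst in rows:
        adjacency[src].add(dst)
    return adjacency


def reachable_by_traversal(adjacency, start, hops):
    """Follow stored links outward; only neighbours of the frontier are touched."""
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for node in frontier for dst in adjacency.get(node, ())}
    return frontier
```

Both functions return the same set of entities for a given start point and hop count. The difference is in how much data each hop has to touch, which is the cost the paragraph above describes.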

However, a here-and-now solution has emerged in the form of graph database technology, which models the relationships in data directly, and those relationships are of key interest to life science researchers. Graph database technology has been around for a while, but its innate ability to focus on the relationships between entities, rather than only on the entities themselves, has recently caught the eye of the life sciences industry.

The unique power of graph database technology lies in discovering relationships between data points and understanding them at very large scale. Research is about exploring the unknown and diving into uncharted territory. This is why graph technology is so good at enabling medical researchers to handle vast datasets and to expose hidden patterns, in areas such as new molecule research and large clinical trials, that are difficult to detect using SQL-based relational database management systems (RDBMS) or other approaches.

Medical and pharma use cases spotlight graphs’ capabilities

Tim Williamson, a data scientist at Monsanto, whose role is centered on gaining better research inferences from genomic datasets, has harnessed the power of graph technology and is enthusiastic about the results.

Monsanto is researching plant varieties and which genetic traits cause them to thrive in different environmental conditions. These genetic problems rely on being able to see a dataset as an ancestral family tree. Williamson has found that these family tree datasets are naturally graph-structured, which makes it exceptionally easy to write graph queries for them. Data analysis that used to take minutes or hours now takes seconds, freeing up time to concentrate on more important genetic features.
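The family tree idea can be illustrated with a short Python sketch. This is not Monsanto's data or method: the pedigree, the variety names and the functions ancestors and common_ancestors are all hypothetical, and a graph database would express the same question as a single traversal query rather than hand-written code.

```python
from collections import deque

# Hypothetical pedigree: each variety maps to the varieties it was bred from.
PARENTS = {
    "variety_D": ["variety_B", "variety_C"],
    "variety_B": ["variety_A"],
    "variety_C": ["variety_A"],
    "variety_A": [],
}


def ancestors(parents, variety):
    """Collect every ancestor of `variety` by walking parent links breadth-first."""
    seen = set()
    queue = deque(parents.get(variety, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another line of descent
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def common_ancestors(parents, first, second):
    """Ancestors shared by two varieties: the intersection of their ancestor sets."""
    return ancestors(parents, first) & ancestors(parents, second)
```

For example, ancestors(PARENTS, "variety_D") walks up through both parents and finds variety_A only once, even though it is reachable by two routes.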

The EU FP7 HOMAGE consortium, which focuses on early detection and prevention of heart failure, is another example. It has been using graph technology to study data sets from 45,000 patients in 22 cohort studies, which have been connected with existing biomedical information in public databases to create an in-depth analysis platform. Graph technology has been invaluable in examining the data’s complexity. To highlight this, the graph database for just one heart failure network analysis platform contains over 130,000 nodes and seven million relationships.

In yet another case, the Novartis Institutes for BioMedical Research has created a large graph database of heterogeneous biological data, which is being combined with text mining results. There are half a billion relationships in the database and Novartis plans to triple this number. Stephan Reiling, Senior Scientist on the project, is an advocate for graph databases, citing the flexibility they have shown in navigating these large data sources.

Mining complex data

Data is connected. Finding the connections and joining the dots has been a problem for life science researchers, especially given the sheer size of some datasets. Being able to mine this data and quickly find the links is proving imperative to speeding up research.

Graph technology, thanks to its built-in capabilities to model complexity, scale and connections, is rapidly becoming the chosen data tool for serious medical research.

The author is CEO of Neo4j, the company behind the world’s leading graph database (http://neo4j.com/)

For sharing and learning about Graph Databases in Life and Health Sciences, visit the Neo4j portal https://neo4j.com/developer/life-sciences-and-healthcare/