Data4Cure featured in GEN

Genetic Engineering & Biotechnology News (GEN) featured Data4Cure, Inc. in the article “Machines Learn to Sift Big Biodata” by Kristen Slawinski, Ph.D. GEN has been the premier biotech publication since its launch in 1981.

Here are some highlights from the article:

“As more relevant connections are made among biological data, more meaningful conclusions can be made to drive drug discovery and development. Large datasets are a starting point for integrating and connecting data; however, there is also critical biological information that exists discreetly buried within millions of scientific publications. Accessing and linking this information requires an intelligence beyond human processing.”

“Data4Cure’s all-inclusive bioinformatics platform, the Biomedical Intelligence Cloud, is powered by a dynamic graphical knowledge base named CURIE, which uses advanced machine learning to automatically mine molecular datasets as well as published texts.”

“The robust network of biological information contained within the Biomedical Intelligence Cloud allows users to visualize their favorite biological molecule at a systems level. Even more impressive is the way in which the data can be analyzed to stratify patients into disease subtypes, a capability that can lead to predictions of therapeutic efficacy.

For example, a pathway activity analysis using patient expression data can identify immune cell-infiltrated and noninfiltrated tumor subtypes and associates them with specific genomic alterations. This information can then be used in immuno-oncology clinical trial planning by linking subtypes to treatment response.

Use of Data4Cure’s bioinformatic platform ensures an informed drug development plan which is beneficial for everyone.”

Read the full article here.

Data4Cure to present at BioData World West 2017

How can AI and machine learning extend biomedical knowledge and not add to the noise and complexity that hamper progress in the field?

Meet us at the BioData West conference in San Francisco, April 26-27. We will present new applications on our Biomedical Intelligence Cloud platform and discuss approaches which apply advanced machine learning and AI algorithms to continuously update biomedical knowledge.

Featured applications merge algorithms with prior knowledge of how cells and biological systems work. This prior knowledge can significantly limit the search space of possible solutions and guide algorithms towards biologically-sound solutions that have a greater chance of success in clinical development.

We will focus on Data4Cure’s model-based molecular and immune-infiltrate stratification engine which identifies cancer subtypes and associates them with disease pathway activities, immune cell infiltration, and predicted response to targeted drugs and immunotherapies.

If you’re at the conference please meet us at booth #6 and see our presentations in the sessions below:

Panel Discussion: 4/26 2:50pm Artificial Intelligence Track
Presentation: 4/27 11:40am Genomics and Health Track

Watch videos from our PMWC 2017 session

In January Data4Cure hosted a session at the Precision Medicine World Conference (PMWC) in Mountain View, CA focused on molecular subtyping in clinical drug development. The session highlighted some of the unique new aspects of the Biomedical Intelligence® Cloud in the context of molecular and immune-infiltrate stratification of tumors and mapping disease and therapeutic pathways.

Videos of our session are now available online.

See talks from Janusz Dutkowski, CEO, Data4Cure, and Pallavi Sachdev, Director of Oncology Biomarker Research at Eisai to learn more:

Join us at Biomarker and NGS Data Analysis and Informatics Conferences

Please join Data4Cure at the annual Biomarker and NGS Data Analysis and Informatics Conferences, to be held on February 6-7 in San Diego, CA.

In pursuit of precision medicine and a refined biomarker model for immuno-oncology drug development, the event aims to gather experts from pharma and biopharma companies and research institutes to discuss current challenges and opportunities in biomarker discovery, CDx, NGS, regulatory and reimbursement issues, and novel technologies.

We hope to see you there!

Data4Cure’s AI-Powered Biomedical Intelligence Cloud Features New Tools for Immuno-Oncology

La Jolla, CA, January 24, 2017 /PRNewswire/ – Data4Cure, Inc. today announced a series of updates and new case studies using its Biomedical Intelligence® Cloud – a semantic data-driven discovery platform designed to continuously grow knowledge from a multitude of genomic, molecular and clinical data that are accumulating rapidly in the field. These updates were presented at a session hosted by Data4Cure at the Precision Medicine World Conference (PMWC) in Mountain View, CA.

Watch videos from the PMWC session here:

The company’s Biomedical Intelligence® Cloud now includes new tools for immuno-oncology research, improved applications for mapping the molecular basis of disease, and a new multidimensional version of the company’s molecular stratification engine.

Data4Cure’s platform is powered by CURIE™, a dynamic biomedical knowledge graph that is continuously updated with information dynamically extracted from a variety of data sources. Cross-referencing tens of thousands of datasets and bioinformatics analyses, literature, clinical trials and external databases, CURIE provides immediate data-driven answers to over 100 million biomedical questions.

“CURIE understands biology,” said Janusz Dutkowski, Ph.D., CEO, Data4Cure, who chaired the PMWC session. “It combines advanced machine learning, systems biology and semantic search capabilities providing a new way to make discoveries. Our platform allows biological and clinical researchers to leverage diverse datasets to discover relationships between entities in a molecular system, and uncover how these entities and relations are affected by the disease and environment, and how they respond to drug interventions.”

CURIE is used by multiple applications running on Data4Cure’s Biomedical Intelligence Cloud. One of these presented in the update is the Disease Maps application, which leverages a plethora of public and proprietary data to create up-to-date molecular maps for over 500 disease conditions including multiple cancer types, Alzheimer’s disease, rheumatoid arthritis, diabetes, and other metabolic, immunological and neurological disorders.

The newest additions to the Biomedical Intelligence Cloud platform include tools for immuno-oncology research that Data4Cure has been testing with selected pharmaceutical partner companies. “These tools include algorithms for inferring immune cell types infiltrating tumors as well as systems biology tools for interrogating immune pathway activities associated with specific disease subtypes,” said Roy Ronen, Ph.D., VP, Computational Biology at Data4Cure. The new tools are integrated with CURIE and other applications including the stratification and pathway analysis tools.

The platform also includes a redesigned network-based stratification engine which uses network maps along with genetic, epigenetic and transcriptional evidence to identify disease subtypes that are directly tied to the network and pathway-level activities and aberrations, and can be further associated with predicted sensitivity or resistance to selected drugs. In oncology applications the stratification process can now be informed by immune-infiltrate profiles inferred by the platform.

Data4Cure @ PMWC 2017

Updates were also introduced to the Multiscale Pathway Maps, which combine with the Stratifier to analyze subtype-specific pathway activities, including those of immune pathway components. The updated Disease Maps application – providing an integrative, data-driven view of disease networks and drivers – features enhanced omics integration at the pathway level and annotates maps with detailed information about drug targets, approved therapies and clinical trials.

Data4Cure Disease Maps

Launched in 2016, the Biomedical Intelligence Cloud has attracted customers and partners across some of the largest global pharmaceutical companies and leading research institutes. “We are very excited to work with some of the best research groups in the industry and academia,” said Dutkowski. “By dramatically accelerating the access to data-driven knowledge and uncovering non-obvious connections hidden in millions of datasets, scientific papers and clinical trials we can really start to turn data into cures.”

Learn more at www.data4cure

Data4Cure to organize a session at the upcoming PMWC 2017

Janusz Dutkowski, CEO, Data4Cure will be chairing a session on “How Molecular Subtyping Can Aid Clinical Drug Development” at the upcoming 2017 Precision Medicine World Conference (PMWC) Silicon Valley.

Pallavi Sachdev, MPH, PhD, Director of Oncology Biomarker Research, Eisai Inc. will be the second speaker at the session which is scheduled for the morning of January 24.

Update: Videos from the session are now available to watch online.

The 11th PMWC, co-hosted with Stanford Health Care, UCSF, Intermountain Healthcare, Duke University, and Duke Health, takes place January 22-25, 2017 in the heart of Silicon Valley, gathering global thought leaders across medicine, industry, research, regulatory, and the payor community to discuss the breakthroughs and challenges the world of precision medicine faces. Learn more at:

Session Synopsis: Deep NGS-based molecular and immune-infiltrate characterization and subtyping of tumors has the potential to identify molecular subsets that will benefit from immunotherapy as well as propose combination therapy to target single-agent resistance pathways. This session will showcase the utility of molecular subtyping to aid clinical drug development.

Session Speaker Profile: Dr. Sachdev, Director of Oncology Biomarker Research at Eisai, a human health care (hhc) company, supports the clinical development of a diverse in-house pipeline of oncology compounds. During her time at Eisai, Dr. Sachdev has increasingly focused on cross-functional and global coordination of biomarker research activities to support the development of targeted therapeutics by alignment of drug and diagnostic strategies to achieve the oncology pipeline clinical objectives. Previously, Dr. Sachdev was a scientist at The Rockefeller University focusing on stem cell biology, cancer signaling networks and systems biology. Dr. Sachdev received her B.S. from Cornell University and her Ph.D. in Biochemistry and Molecular Biology from Mount Sinai-New York University School of Medicine. She has co-authored numerous research publications in the fields of cancer signaling networks, cancer genomics and personalized medicine and presented at international scientific conferences.

Session Chair Profile: Janusz Dutkowski, PhD, co-founded Data4Cure with the goal of changing how we acquire, integrate and apply biomedical data towards development of new therapies. He has a background in mathematics and computer science and worked in business intelligence and operational research before transitioning to computational biology for his PhD and postdoc. Before founding Data4Cure, he was a scientist at the University of California, San Diego where he developed methods for multiscale network analysis and cancer biomarker discovery from large omics data. He has co-authored 20 research papers published in acclaimed scientific journals including Nature Biotechnology, Science and Cell and has been a speaker at numerous international conferences and scientific meetings.

We are looking forward to seeing you at the session!

How knowledge is born: putting data in context

Data is transforming biology and medicine. In many biomedical domains we now generate more data each year than in all previous years combined – the consequence of exponential growth.

Yet data is not knowledge. And growing knowledge has proven a lot more challenging than sheer data accumulation.

Finding new drug targets, biomarkers, and personalized treatment strategies will increasingly depend on our ability to translate data to knowledge. Can we do this? Can we do this more effectively?

Why doesn’t the rate of knowledge growth reflect in any way the (exponential) rate of data accumulation?


We think the answer lies in context, or really the lack of it. Here is why.

Imagine you are analyzing data at a pharmaceutical company or a research lab. Your new data suggests that gene X is aberrantly activated in disease Y and you think that X might be a possible target for Y. You want to quickly find out if the gene has been linked to disease Y in the literature. Is X genetically linked to Y? Is there other data in cell lines, model organisms or humans that can support your finding? What are the downstream pathways for X, what is it regulated by and what are its interactors? What drugs target gene X, what might be the molecular and phenotypic consequences of inhibiting X, and so on…

Now imagine your data suggests 1000 aberrantly activated genes. What you’ll likely see is that the time spent on your initial analysis (i.e. identifying the 1000 aberrantly activated genes) is a fraction of what it takes to make sense of the data in the context of all other data and results. And you are never really done. In a rapidly advancing field prior knowledge is always changing and context needs constant updates.

Data just doesn’t come with this kind of rich context included. We need to make the right connections to prior knowledge, other datasets and other results. By doing so we develop an understanding of how our results fit in, what novel insights they provide, and how they change the current state of knowledge. That’s how most new knowledge is born.

Building Biomedical Intelligence

What if we could automate this process? What if data could automatically connect to literature, other relevant datasets and prior knowledge?

At Data4Cure we developed an advanced dynamic ontology platform that puts data into context making it possible to find connections and rapidly grow knowledge from large biomedical datasets.

Our platform is based on a dynamic biomedical ontology – the CURIE Knowledge Graph – and a growing set of applications that continuously extract, annotate and integrate contextual information from various sources. These applications:

1. gather, import and process vast amounts of molecular and clinical data, including genome-wide DNA, RNA, epigenetic, and proteomics profiles,

2. ingest reference databases, including databases of clinically-associated variants, genotype-phenotype, drug-disease, and drug-target associations,

3. crawl and parse through millions of scientific papers and clinical trials to mine meaningful relationships and information from free text,

4. aggregate molecular and clinical information with genome-wide molecular networks and pathways to identify network/pathway-level activities and aberrations.

The resulting contextual data and results flow into the CURIE Knowledge Graph which provides a search engine for end users and an API used by all other applications on the platform.
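To make the idea of "putting data into context" a bit more concrete, here is a minimal sketch of a typed-edge graph that stores facts together with their source and answers "what do we know about X?". This is only an illustration, not the CURIE implementation: the class name, the relation labels and the entities (GENE_X, DISEASE_Y, DRUG_Z, PATHWAY_P) are all hypothetical.

```python
class TinyKnowledgeGraph:
    """Toy triple store: (subject, relation, object) plus an evidence source."""

    def __init__(self):
        self._out = {}  # subject -> list of (relation, object, source)
        self._in = {}   # object  -> list of (relation, subject, source)

    def add(self, subject, relation, obj, source):
        self._out.setdefault(subject, []).append((relation, obj, source))
        self._in.setdefault(obj, []).append((relation, subject, source))

    def context(self, entity):
        """Return every stored fact that mentions the entity, in either direction."""
        facts = [(entity, rel, obj, src) for rel, obj, src in self._out.get(entity, [])]
        facts += [(subj, rel, entity, src) for rel, subj, src in self._in.get(entity, [])]
        return facts

    def links(self, a, b):
        """Return (relation, source) pairs for direct edges from a to b."""
        return [(rel, src) for rel, obj, src in self._out.get(a, []) if obj == b]


# Hypothetical facts, as if produced by separate data, database and text-mining steps.
kg = TinyKnowledgeGraph()
kg.add("GENE_X", "overexpressed_in", "DISEASE_Y", "dataset:hypothetical_001")
kg.add("GENE_X", "member_of", "PATHWAY_P", "pathway_db:hypothetical")
kg.add("DRUG_Z", "inhibits", "GENE_X", "literature:hypothetical")

print(kg.links("GENE_X", "DISEASE_Y"))
for fact in kg.context("GENE_X"):
    print(fact)
```

Keeping the source on every edge is the design choice that matters here: it lets a result be traced back to the dataset, database or paper it came from, which is what makes the context usable for the kind of questions about gene X described earlier.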

Across a wide range of disease areas and applications, Biomedical Intelligence applications use biology-informed statistical and algorithmic priors to effectively aggregate signal and filter out biological and technical noise. Together with the CURIE Knowledge Graph, they comprise the Data4Cure Biomedical Intelligence Cloud.

In subsequent posts we will dive deeper into individual components of the Biomedical Intelligence Cloud and discuss new technologies that enable them.

The first post in this series is about our Multiscale Maps. It is already available here.

Interested in learning more? Contact us for a free demo or drop us a line at

Boosting statistical power and context with Multiscale Maps of the Cell

The hierarchy

It is not about components, it’s about the machine. Like many other complex systems, biological machines are hierarchical: proteins interacting in complexes which work as part of pathways forming more general processes and so on.
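As a small illustration of what a functional hierarchy looks like as a data structure, the sketch below stores parent-to-child links between terms and rolls gene annotations up from complexes to pathways to processes. The term and gene names are hypothetical, and the hierarchy is assumed to be acyclic; this is not the data model used by the Multiscale Maps.

```python
def genes_under(term, children, direct_genes, cache=None):
    """Collect genes annotated to a term or to any of its descendants.

    Assumes the hierarchy has no cycles (a tree or a DAG).
    """
    if cache is None:
        cache = {}
    if term not in cache:
        genes = set(direct_genes.get(term, ()))
        for child in children.get(term, ()):
            genes |= genes_under(child, children, direct_genes, cache)
        cache[term] = genes
    return cache[term]


# Hypothetical hierarchy: one process, two pathways, one complex.
children = {
    "process:P": ["pathway:P1", "pathway:P2"],
    "pathway:P1": ["complex:C1"],
}
direct_genes = {
    "complex:C1": {"G1", "G2"},
    "pathway:P1": {"G3"},
    "pathway:P2": {"G4", "G5"},
}

cache = {}
for term in ["complex:C1", "pathway:P1", "pathway:P2", "process:P"]:
    print(term, sorted(genes_under(term, children, direct_genes, cache)))
```

In this toy example, perturbations in G1 and G4 fall into different pathways but the same process, which is exactly the kind of relationship that is invisible when genes are analyzed one at a time.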

To better understand disease mechanisms we need to be able to interrogate all levels of the functional hierarchy. It is important to understand when perturbations to different components of the same pathway or process lead to similar phenotypes and when their effects are distinct (component-specific). This has major implications for many practical applications, for example:

1. Deciphering complex genotype-phenotype relationships, e.g. mapping rare variants to common pathways and processes underlying a disease of interest,

2. Predicting the effects of chemical or genetic perturbations, such as gross transcriptional changes in multiple system components,

3. Identifying and optimizing biomarkers towards increased power and specificity, e.g. identifying homogeneous pathway activities and aberrations distinguishing responders from non-responders in clinical trials.

Each of the above tasks can get complex very quickly. One needs to process and integrate large pathway repositories; engineer effective strategies for determining pathway and network activities based on gene-level data; and develop effective tools for visualizing and interrogating the results (pathway analysis results are often complex in their own right).

Our goal for the Multiscale Maps was to simplify the process and make comprehensive systems-level analyses much easier and more effective so that they can be one of the go-to tools for many bioinformatics problems.

Multiscale visualization

Data4Cure Maps combine vast repositories of manually curated pathways with an unbiased data-driven ontology, NeXO 2.0 (if you’re interested, take a look at the early papers on NeXO [1, 2, 3]).

To visualize the pathway hierarchies we developed a new multiscale graph visualization framework. It lets you see all the hierarchical system components at once and easily zoom into selected regions of the system to dig deeper.

We tried to make the data, statistics and information associated with each pathway as easily accessible as possible. In fact, molecular data for thousands of biological and medical conditions is available in the system and pre-integrated with the maps (disease areas include oncology, immune system, cardiovascular, metabolic and neurologic conditions). You can start exploring immediately. You can also easily extend the Maps by bringing in your own data for analysis.


Integrating multidimensional data

Maps work with all major types of molecular data, not just gene expression. Genetic variants and somatic mutations, gene, miRNA, and protein expression and DNA methylation profiles can be correlated with each other or with available clinical measurements.

For example, the Maps allow you to quickly find dysregulated pathways associated with genetic variants and mutations or genetic/chemical perturbations. You can also investigate pathways and processes enriched for mutations (small indels and CNVs) and/or epigenetic alterations across various cancer types and find which alterations are significantly associated with clinical variables such as response to treatment and overall survival. The association engine analyzes molecular data in the context of the Multiscale Maps to detect significantly perturbed or activated hierarchical components, rather than single genes only, taking into account the type of molecular data, gene size and position, as well as the hierarchical pathway information from the Multiscale Map.
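For readers who want to see what "enriched for mutations" means in its simplest form, here is a sketch of a plain hypergeometric over-representation test. It is a textbook baseline and not the association engine described above: it deliberately ignores data type, gene size and position, and the hierarchy. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(overlap >= k) when n altered genes are drawn at random from N genes,
    K of which belong to the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20,000 genes in total, a 50-gene pathway,
# 200 altered genes in the cohort, 5 of which fall in the pathway.
# Under random placement the expected overlap is 200 * 50 / 20000 = 0.5 genes.
p_value = hypergeom_upper_tail(k=5, K=50, n=200, N=20000)
print(f"enrichment p-value: {p_value:.2e}")
```

In a real analysis this test would be run for every term in the hierarchy, so the resulting p-values need multiple-testing correction, and overlapping pathways are not independent of each other.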

Increasing power

Importantly, these associations can often be made even when they could not be made at the single-gene level (due to lack of power). Standard approaches – identifying individual genes mutated at high frequencies – detect many potential drivers but lack power to identify genes mutated less frequently that still may play key roles in cancer (see the comprehensive analysis from Gad Getz’s lab here).
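A toy calculation shows why aggregation helps. The sketch below uses a one-sided Fisher's exact test on a made-up cohort of 50 responders and 50 non-responders, in which each of five pathway genes is mutated in 4 responders (different patients each time) and in no non-responders. All numbers are invented for illustration and have nothing to do with the TCGA results discussed in this post.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table
        [[a, b],   # responders:     mutated, wild-type
         [c, d]]   # non-responders: mutated, wild-type
    Returns P(mutated responders >= a) with all margins fixed."""
    responders = a + b
    mutated = a + c
    total = a + b + c + d
    upper = min(responders, mutated)
    hits = sum(
        comb(mutated, i) * comb(total - mutated, responders - i)
        for i in range(a, upper + 1)
    )
    return hits / comb(total, responders)


# Single gene: 4 of 50 responders mutated, 0 of 50 non-responders.
p_gene = fisher_one_sided(4, 46, 0, 50)

# Pathway level: 5 such genes hit 20 distinct responders and no non-responders.
p_pathway = fisher_one_sided(20, 30, 0, 50)

print(f"single gene: p = {p_gene:.3f}")
print(f"pathway:     p = {p_pathway:.1e}")
```

No single gene reaches p < 0.05 here, even before any multiple-testing correction, while the pathway-level feature is significant by a wide margin. Real data is messier (mutations overlap between patients, and pathways overlap with each other), but the principle is the same.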

It is often of interest to find associations between putative drivers and clinical phenotypes such as patient survival, response to treatment or cancer stage. By analyzing molecular and clinical data across 33 cancers profiled by The Cancer Genome Atlas (TCGA) we found that we are able to detect up to three times more statistically significant associations (while correcting for pathway/gene overlap) with Multiscale Map features than with single-gene features. Molecular features analyzed include somatic mutations, CNV and DNA hypermethylation events:


Moreover, the number of phenotypes across the 33 cancers for which at least one biomarker candidate was identified increased ~2 fold by including multiscale markers:


For many problems, Multiscale Maps can provide a tool to help address biological complexity and small sample size.

We think it’s time to move toward more comprehensive systems-level views of health and disease. Multiscale Maps provide a tool that makes it easy to explore molecular data and find key insights at any level of molecular hierarchy: from individual genes and proteins, to complexes, pathways and higher-order processes.

Let us know what you think! Drop us a line at or request a free demo of the Multiscale Maps and our Biomedical Intelligence Cloud at

Until next time,

have a great week!

Janusz Dutkowski

Further reading

  1. Dutkowski et al., Nature Biotechnology 31, 38–45 (2013) doi:10.1038/nbt.2463
  2. Dutkowski et al., Nucleic Acids Research 42, D1–D6 (2014) doi:10.1093/nar/gkt1192
  3. Dolinski and Botstein, Nature Biotechnology 31, 34–35 (2013) doi:10.1038/nbt.2476
  4. Ideker, Dutkowski, Hood, Cell 144(6), 860–863 (2011) doi:10.1016/j.cell.2011.03.007
  5. Yu, et al., Cell Systems 2(2): 77–88. (2016) doi:10.1016/j.cels.2016.02.003
  6. Carter, Hofree, Ideker, Current Opinion in Genetics & Development 23 (6): 611–621. (2013) doi:10.1016/j.gde.2013.10.003
  7. Carvunis, Ideker, Cell (2014) doi:10.1016/j.cell.2014.03.009

Data4Cure at the 2016 Bio-IT World Conference & Expo

Janusz Dutkowski, PhD, CEO and co-founder of Data4Cure will give a talk at the 2016 Bio-IT World Conference & Expo in Boston – a premier event featuring IT and informatics enabling technologies that drive biomedical research, drug discovery & development, and clinical and healthcare initiatives.

His talk titled Genomic Variants in Context and at Scale – Integrative Approaches to Predict Pathogenicity and Stratify Patient Cohorts will be part of the Clinical Genomics Track on Thursday, April 7 at 11 am.

The 2016 Bio-IT World Conference & Expo will bring together more than 3,300 attendees from 41 countries to build a global network for precision medicine and bioinformatics and to form collaborations across the industry.

Learn more about the conference here.


Join us at the 2016 NGS Data Analysis and Biomarker Conferences

Join us at the 2016 NGS Data Analysis Conference and the Biomarker Conference co-organized in our home town and learn more about bioinformatics and NGS. Data4Cure’s CEO, Janusz Dutkowski, PhD, will be presenting on Thursday, February 18 at 10 am. His talk is titled “Translating data into knowledge, translating knowledge into cures”. Hope to see you there!

The world-leading Next Generation Sequencing meeting, the NGS Data Analysis and Informatics Conference, will be held on 18–19 February 2016 in San Diego, CA, USA. During the meeting we will discuss strategies to accurately analyze and scale up informatics for the data generated by NGS technology while ensuring the quality and scalability of the data.

Click here to learn more about the 2016 NGS Data Analysis Conference and the Biomarker Conference.