The AI & Insights Layer: Synthesizing Knowledge, Powering Intelligent Discovery

The AI & Insights Layer is the intelligence core of the Data4Cure Biomedical Intelligence® Platform, where harmonized data, scalable computation, and structured knowledge converge to generate actionable insight. Building on the Data Hub, Biomedical App Engine, and CURIE Knowledge Graph, this layer applies advanced AI, graph-based learning, and generative reasoning to help researchers uncover mechanisms, predict outcomes, and accelerate discovery.

The Architecture for Intelligence

The Architecture for Intelligence is Data4Cure’s unifying framework for continuous biomedical discovery. It connects data, analytics, and AI across four integrated layers—each transforming raw information into structured knowledge and predictive insight:

  1. Data Hub – Ingests, harmonizes, and annotates large-scale public and proprietary datasets within a unified semantic data lakehouse.
  2. Biomedical App Engine – Powers scalable analytics, modeling, and visualization built on harmonized data.
  3. CURIE Knowledge Graph – Contextualizes findings within an interconnected network of over four billion biological and clinical relationships, enabling reasoning across domains.
  4. AI & Insights Layer – Integrates advanced AI models and generative tools to synthesize new hypotheses, predictions, and insights—completing the feedback loop across the platform.

Together, these layers form a self-reinforcing cycle of data → analysis → knowledge → insight, ensuring every study contributes to an expanding foundation of biomedical intelligence.

From Data to Insight: AI That Synthesizes Biomedical Knowledge

Within this architecture, the AI & Insights Layer transforms harmonized, semantically enriched data into predictive, interpretable knowledge. Its models learn from heterogeneous evidence spanning millions of biological and clinical relationships, and they support three main capabilities:

  • The RNA1 Foundation Model and Subtype Intelligence – The RNA1 transformer model, trained on more than 182,000 RNA-seq samples, powers Subtype Intelligence across roughly 150,000 tumor samples. RNA1 supports both the assignment of published subtypes and the discovery of novel molecular classes across 25 cancer types, capturing disease heterogeneity at high resolution.
  • Knowledge Graph AI for Target Discovery – The Target Intelligence app applies graph embedding models trained on the CURIE Knowledge Graph to identify and prioritize novel drug targets and infer new target indications. This structured, explainable approach has highlighted promising examples such as KIF11 in hepatocellular carcinoma.
  • AI-Driven Knowledge Synthesis and Exploration – The CURIE AI Reports and CURIE AI Assistant offer two complementary ways to work with biomedical knowledge.
    • CURIE AI Reports automatically generate traceable, evidence-grounded summaries for genes, diseases, and drugs. Built on structured graph evidence, they integrate literature and data-driven findings into cohesive, interpretable outputs that support decision-making.
    • The CURIE AI Assistant extends this capability interactively, allowing researchers to query the platform in natural language, explore related datasets and atlases, and surface contextualized answers grounded in graph-based evidence.
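
As a rough illustration of the graph-embedding idea behind target prioritization, the sketch below ranks candidate genes by cosine similarity between their embedding vectors and a disease embedding. It is a toy example under stated assumptions, not the Target Intelligence implementation: the function names, the gene labels, and the three-dimensional vectors are all invented for illustration, and real embeddings would be learned from the knowledge graph.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_targets(disease_vec, gene_vecs):
    """Return (gene, score) pairs sorted from most to least similar to the disease."""
    scored = [(gene, cosine(disease_vec, vec)) for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings: invented values, not taken from any real model.
TOY_DISEASE = [1.0, 0.0, 1.0]
TOY_GENES = {
    "GENE_A": [1.0, 0.0, 1.0],  # same direction as the disease vector
    "GENE_B": [0.0, 1.0, 0.0],  # orthogonal to the disease vector
    "GENE_C": [1.0, 1.0, 0.0],  # partially aligned (cosine of about 0.5)
}

ranking = rank_targets(TOY_DISEASE, TOY_GENES)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

In this toy setup the candidate pointing in the same direction as the disease vector ranks first and the orthogonal one ranks last. A production system would typically add evidence paths from the graph so that each score can be explained.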

Exploring Biology Through Atlases and Sample Universes

The AI & Insights Layer powers a family of large-scale, AI-driven resources that make complex biological data explorable and comparable at scale.

Single-Cell Atlases

The Data4Cure Single-Cell Atlases are powered by a transformer-based foundation model for single-cell data, enabling consistent harmonization and representation of cellular states across diverse studies. These atlases combine millions of single-cell profiles to provide a comprehensive, contextual view of cell biology across multiple therapeutic areas.

Atlases currently available:

  • Cancer Cell Atlas v2.5 – Integrates 5.2 million cells from 83 datasets covering ~90 cancer types, extending malignant cell annotations and enabling exploration of the tumor microenvironment at greater resolution than previous releases.
  • Endothelial Cell Atlas – Combines 852,000 endothelial cells from ~800 donors across 20+ cardiovascular and metabolic diseases, capturing vascular remodeling and endothelial subtype diversity across tissues.
  • IBD Atlas – Integrates immune, epithelial, and stromal cell profiles across inflammatory bowel disease studies to reveal cell-type–specific mechanisms of intestinal inflammation.
  • Chronic Respiratory Disease Atlas – Captures airway and lung tissue heterogeneity across asthma, COPD, and related respiratory conditions.
  • Immune Checkpoint Inhibition (ICI) Atlas – Provides a single-cell view of immune response and resistance mechanisms across ICI-treated tumors.
  • Neuroinflammation Atlas – Aggregates data from neuroinflammatory and neurodegenerative models to uncover cell-type–specific molecular changes.
  • Multiple Sclerosis Atlas – Focuses on immune–glial interactions and demyelination mechanisms across MS patient datasets.

Each atlas includes harmonized metadata, deep annotations, and high-performance single-cell browsers for interactive visualization and cross-study analysis—offering an integrated, foundation model–driven view of cell biology across therapeutic areas.

Sample Universes

Complementing the single-cell resources, the Sample Universes provide harmonized, large-scale bulk RNA-seq datasets that combine thousands of studies within each disease domain. They leverage the RNA1 Foundation Model and CuratorAI for uniform integration, annotation, and discovery of cross-study relationships.

Currently available:

  • Oncology Sample Universe – Incorporates 152,793 bulk RNA-seq samples from 7,456 datasets spanning 25 cancer types. It includes detailed annotations on tissue, treatment, and outcomes, all harmonized through CuratorAI. The Universe supports Subtype Intelligence, enabling AI-driven discovery of tumor subtypes and clinical associations.
  • Neurodegeneration Sample Universe – Aggregates bulk RNA-seq datasets across major neurodegenerative diseases, providing a harmonized resource for comparative molecular profiling.

Together, these Universes provide harmonized, AI-ready resources for exploring gene expression, biomarker patterns, and treatment-associated signatures across disease areas. Sample Universes for additional disease areas are in development.

CuratorAI: Harmonizing Data Through Intelligent Annotation

Behind every Sample Universe lies CuratorAI—Data4Cure’s large language model–powered metadata harmonization system. CuratorAI automatically extracts, standardizes, and links key metadata entities (such as diseases, tissues, treatments, and phenotypes) to the CURIE Knowledge Graph ontologies.

This semantic alignment allows data from disparate studies to be explored coherently and enables downstream AI models to reason across datasets. In practice, CuratorAI:

  • Recognizes and expands abbreviations (e.g., resolving “AA” to alopecia areata).
  • Identifies key concepts such as disease, treatment, and time point, and maps them to structured ontologies.
  • Resolves missing or inconsistent metadata using contextual inference.
  • Continuously improves through feedback from new datasets and user-curated annotations.

By combining LLM-powered understanding with ontology-based grounding, CuratorAI transforms unstructured metadata into a structured semantic layer—making large-scale data discoverable, comparable, and ready for AI-driven analysis.
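
The harmonization steps listed above can be sketched in a few lines. This is a deliberately simplified stand-in, not CuratorAI itself: it replaces the LLM with a dictionary lookup, ignores surrounding context (which a real system would need to disambiguate an abbreviation like “AA”), and uses made-up ontology identifiers with a `TOY:` prefix. All names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables. A real system would use an LLM plus curated ontologies.
ABBREVIATIONS = {"AA": "alopecia areata"}
TOY_ONTOLOGY = {"alopecia areata": "TOY:0001", "skin": "TOY:0002"}

@dataclass
class Annotation:
    field: str              # metadata field name, e.g. "disease"
    raw: str                # value exactly as submitted
    label: str              # normalized, lower-cased label
    term_id: Optional[str]  # ontology identifier, or None if unmapped

def harmonize(record):
    """Normalize each raw metadata value and link it to a toy ontology term."""
    annotations = []
    for field, raw in record.items():
        text = raw.strip()
        # Expand known abbreviations, then lower-case for matching.
        label = ABBREVIATIONS.get(text.upper(), text).lower()
        annotations.append(Annotation(field, raw, label, TOY_ONTOLOGY.get(label)))
    return annotations

result = harmonize({"disease": "AA", "tissue": " Skin ", "treatment": "drug X"})
for ann in result:
    print(ann)
```

Values that cannot be mapped keep a `term_id` of `None`, which makes unmapped metadata easy to route to manual curation or a feedback step.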

CURIE AI Assistant: A New Gateway to Biomedical Intelligence

The CURIE AI Assistant provides a natural-language interface to the platform’s structured knowledge and data. Powered by large language models and deeply integrated with the CURIE Knowledge Graph, it allows researchers to:

  • Ask scientific questions in natural language.
  • Retrieve contextualized, evidence-based answers grounded in data and literature.
  • Explore linked datasets, atlases, and reports suggested by the Assistant.
  • Move seamlessly between exploration, analysis, and interpretation.

By combining semantic knowledge representation with generative reasoning, the CURIE AI Assistant transforms how researchers interact with biomedical information—making exploration intuitive, contextual, and dynamic.
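
To make the idea of evidence-grounded answers concrete, the following sketch retrieves statements about an entity from a tiny list of graph edges and keeps the source attached to each one. It is only an illustration under assumed names: the `Edge` structure, the entity labels, and the source tags are invented, and the Assistant's actual retrieval and generation steps are not described in this document.

```python
from collections import namedtuple

# Each edge carries its provenance so that any statement can be traced back.
Edge = namedtuple("Edge", "subject relation obj source")

# Hypothetical toy graph: entities and sources are placeholders.
TOY_GRAPH = [
    Edge("GENE_A", "associated_with", "DISEASE_X", "toy-dataset-1"),
    Edge("GENE_A", "targeted_by", "DRUG_Y", "toy-paper-2"),
    Edge("GENE_B", "associated_with", "DISEASE_X", "toy-dataset-3"),
]

def evidence_for(entity, graph):
    """Return every edge that mentions the entity as subject or object."""
    return [e for e in graph if entity in (e.subject, e.obj)]

def summarize(entity, graph):
    """Render one traceable line per supporting edge, or a fallback message."""
    lines = [
        f"{e.subject} {e.relation} {e.obj} [source: {e.source}]"
        for e in evidence_for(entity, graph)
    ]
    return lines or [f"No evidence found for {entity}"]

report = summarize("DISEASE_X", TOY_GRAPH)
for line in report:
    print(line)
```

Keeping the source on every edge is the design point here: a generated summary can then cite the exact records it was built from.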

Toward Intelligent, Contextual Discovery

The AI & Insights Layer is where the Data4Cure Biomedical Intelligence® Cloud realizes its full potential—transforming harmonized data and structured knowledge into explainable, AI-powered insight.

From foundation models and knowledge-graph AI to generative reporting and the CURIE AI Assistant, this layer empowers scientists to reason with data, uncover hidden relationships, and advance translational research—continuously and collaboratively.

This is where biomedical data becomes living knowledge.