
Introduction

The Open Health Stack (OHS) Analytics components provide a scalable, flexible collection of tools for transforming complex HL7 FHIR data into formats suited to running analytics workloads and building downstream applications.

With OHS, developers can use familiar languages and tools to build analytics solutions for different use cases, from generating reports and powering dashboards to exploratory data science and machine learning.

[Figure: FHIR Data Pipes]

Key features

  • Apache Beam-based ETL pipelines that continuously transform FHIR resources into a "near lossless" 'FHIR-in-Parquet' representation, based on a natural schema for projecting FHIR resources to Parquet.

  • A Pipelines Controller module that provides pipeline management and scheduling capabilities.

  • Flexible deployment modes to meet the needs of different projects and teams, from a simple single machine to multi-worker, horizontally scalable distributed environments, with support for local, on-prem or cloud-based runners.

  • Support for different target databases, including a traditional RDBMS (such as PostgreSQL) or any OLAP engine that can load Parquet files (such as SparkSQL or DuckDB).

  • Simplified querying: define views in SQL or as ViewDefinition resources to create flattened tables, then build analytics applications with common languages (e.g. SQL, Python) and BI or data visualization tools (e.g. Apache Superset).
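
To make the idea of a flattened table concrete, here is a minimal sketch in Python. It is not the FHIR Data Pipes implementation and does not use ViewDefinition resources or Parquet: the sample patients, the `flatten_patient` helper, and the `patient_flat` table are all hypothetical, and an in-memory SQLite database stands in for the target database. It only illustrates how nested FHIR resources might be projected onto flat columns that plain SQL can query.

```python
import sqlite3

# Two hypothetical, heavily trimmed FHIR Patient resources (illustrative data only).
patients = [
    {
        "resourceType": "Patient",
        "id": "p1",
        "gender": "female",
        "birthDate": "1990-04-12",
        "name": [{"use": "official", "family": "Doe", "given": ["Jane", "A."]}],
    },
    {
        "resourceType": "Patient",
        "id": "p2",
        "gender": "male",
        "birthDate": "1985-11-02",
        "name": [{"family": "Roe", "given": ["John"]}],
    },
]


def flatten_patient(resource):
    """Project one nested Patient resource onto a flat row (hypothetical helper)."""
    # Use the first name entry if present; fall back to an empty one.
    names = resource.get("name") or [{}]
    first_name = names[0]
    return (
        resource.get("id"),
        resource.get("gender"),
        resource.get("birthDate"),
        first_name.get("family"),
        " ".join(first_name.get("given", [])),
    )


def build_flat_table(resources):
    """Load flattened rows into an in-memory SQLite table (stand-in for a real target)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_flat "
        "(id TEXT, gender TEXT, birth_date TEXT, family TEXT, given TEXT)"
    )
    conn.executemany(
        "INSERT INTO patient_flat VALUES (?, ?, ?, ?, ?)",
        [flatten_patient(r) for r in resources],
    )
    return conn


conn = build_flat_table(patients)

# Once the data is flat, an ordinary SQL aggregate is enough.
rows = conn.execute(
    "SELECT gender, COUNT(*) FROM patient_flat GROUP BY gender ORDER BY gender"
).fetchall()
print(rows)
```

The choice to take only the first `name` entry is a simplification for this sketch; a real view would need an explicit rule for repeated elements.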

Use cases

  • The primary use case for FHIR Data Pipes is to enable continuous transformation of FHIR data into analytics-friendly representations, making it easier for developers to build dashboards, generate reports, perform data science tasks, and create features for machine learning models.

  • A secondary use case is piping FHIR data from a FHIR source to another FHIR server, e.g. for integration into a central FHIR repository.