Additional topics
Authentication
Any OpenID Connect provider can be used to supply OAuth credentials.
The required parameters are:
- oidConnectUrl
- clientId
- clientSecret
- oAuthUsername
- oAuthPassword
You can add OAuth authentication to the pipeline by providing the required
parameters via the command line or in the
/pipelines/controller/config/application.yaml file. For example, on the
command line:
$ java -cp ./pipelines/batch/target/batch-bundled-0.1.0-SNAPSHOT.jar org.openmrs.analytics.FhirEtl \
  --fhirServerUrl=[FHIR_SERVER_URL] --outputParquetPath=[PATH] \
  --resourceList=Patient,Encounter,Observation --batchSize=200 \
  --clientId=[CLIENT_ID] --clientSecret=[CLIENT_SECRET] \
  --OAuthUsername=[USERNAME] --OAuthPassword=[PASSWORD] \
  --oidConnectUrl=[OPENID_CONNECT_URL]
Or in the application.yaml file:
fhirdata:
  .....
  # The following client credentials should be set if the FHIR server accepts
  # OAuth access tokens. Note the client credentials, e.g., the secret, are
  # sensitive, and it is probably a better practice to set these through
  # command-line arguments.
  fhirServerOAuthTokenEndpoint: "https://path_to_endpoint_for_token"
  fhirServerOAuthClientId: "THE_CLIENT_ID"
  fhirServerOAuthClientSecret: "THE_CLIENT_SECRET"
Config properties
The main configuration for the FHIR Data Pipes Pipeline and Controller is the
/pipelines/controller/config/application.yaml file, which is well documented.
When using the provided Docker images, this file is found at
/docker/config/application.yaml.
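For illustration, here is a brief sketch of the kind of properties the fhirdata: section of that file holds (the values are placeholders; consult the file's inline documentation for the authoritative list of keys):

```yaml
fhirdata:
  # Source FHIR server to fetch resources from.
  fhirServerUrl: "http://localhost:8091/fhir"
  # Root prefix under which data warehouse (DWH) snapshots are written.
  dwhRootPrefix: "dwh/controller_DEV_DWH"
  # Comma-separated list of resource types to fetch.
  resourceList: "Patient,Encounter,Observation"
```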
Parquet on FHIR schema
Apache Parquet is a horizontally scalable columnar format that is optimized for performance.
- FHIR Data Pipes transforms FHIR resources into a "near lossless" 'Parquet on FHIR' representation based on the "Simplified SQL Projection of FHIR Resources" ('SQL-on-FHIR-v1') schema.
- The conversion is done using a forked version of the Bunsen library to transform from FHIR (with current support for STU3 and R4) to the SQL-on-FHIR-v1 schema.
- The conversion is done by going from StructureDefinition --> AvroConverter --> Parquet.
- Configurable support for FHIR versions, profiles, and extensions is provided.
Monitoring pipelines
The pipelines controller exposes management endpoints that can help with monitoring the health of pipelines.
- The application integrates Spring Boot Actuator and exposes REST API endpoints for monitoring, health checks, metrics, etc.
- The endpoints can be customized in the management: section of the application.yaml config file; see the sketch after this list.
- It can easily be integrated with tools like Prometheus for monitoring metrics.
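A sketch of that section, using standard Spring Boot Actuator properties (the exact endpoints exposed in your application.yaml may differ):

```yaml
management:
  endpoints:
    web:
      exposure:
        # Actuator endpoints exposed over HTTP, e.g. for Prometheus scraping.
        include: "health,info,metrics,prometheus"
```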
Via the Web Control Panel
The Web Control Panel provides a quick glimpse of the latest state of the application, including:
- Controls for triggering pipeline runs on demand
- A readable view of the application configuration
- Location and time of the latest snapshot created by a pipeline run
- Metrics of the most recent pipeline run
- Error logs of the last pipeline run, if any
Web Control Panel
The web control panel is a basic Spring application provided to make interacting with the pipeline controller easier.
It is not designed to be a full, production-ready “web admin” panel.
The web control panel has the following features:
- Initiate full and incremental pipeline runs
- Monitor errors when running pipelines
- Recreate view tables
- View configuration settings
- Access sample Jupyter notebooks and the ViewDefinition editor

Controller CLI
This is a simple command-line interface (CLI) tool that interacts with the pipeline controller. It provides most of the functionality available in the Web Control Panel, enabling you to run the pipelines, view configuration settings, manage data warehouse snapshots, and perform other pipeline operations.
The CLI tool is packaged using setuptools and exposes the application via the
controller utility command.
You can get the list of available commands by running it with the help flags
-h or --help. See sample output below.
usage: controller [-h] [--verbose] url {config,next,status,run,tables,logs,dwh} ...

The CLI tool for fhir-data-pipes

positional arguments:
  url                   url of the pipeline controller's REST API
  {config,next,status,run,tables,logs,dwh}
                        dwh, next, status, run, config, logs, tables are the
                        available commands.
    config              show config values
    next                show the next scheduled run
    status              show the status of the pipeline
    run                 run the pipeline
    tables              create resource tables
    logs                show logs
    dwh                 show a list of dwh snapshots

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Enable verbose output
Show config values
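For example, assuming the controller's REST API is reachable at http://localhost:8080 (a placeholder; substitute your controller's actual URL):

```shell
controller http://localhost:8080 config
```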
Show specific config value
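The help output above does not document the syntax for selecting a single value; a plausible invocation, assuming the config command accepts an optional key argument:

```shell
# Hypothetical: passing a key to `config` is an assumption, not confirmed by the help output.
controller http://localhost:8080 config fhirServerUrl
```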
Show the next scheduled run
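For example (same placeholder URL as above):

```shell
controller http://localhost:8080 next
```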
Show the status of the pipeline
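For example:

```shell
controller http://localhost:8080 status
```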
Run the pipeline
Note: To run a pipeline you must supply a run mode using the -m or --mode
flag. The value of mode can be one of full, incremental, or views.
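For example, to trigger an incremental or a full run (the mode values come from the note above; the URL is a placeholder):

```shell
controller http://localhost:8080 run --mode incremental
controller http://localhost:8080 run -m full
```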
Create resource tables
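For example:

```shell
controller http://localhost:8080 tables
```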
Download error logs
You can pass an optional file name for the downloaded file. The default is
error.log.
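For example:

```shell
controller http://localhost:8080 logs
# Hypothetical: passing the output file name positionally is an assumption,
# not confirmed by the help output above.
controller http://localhost:8080 logs my-errors.log
```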
Show a list of dwh snapshots
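For example:

```shell
controller http://localhost:8080 dwh
```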
Delete a specific snapshot
Note: You can get the snapshot id by first running controller <url> dwh.
A valid snapshot-id is the full id as shown in the list, e.g.
dwh/controller_DEV_DWH_TIMESTAMP_2025_08_14T17_47_15_357080Z
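The exact delete syntax is not shown in the help output above; a plausible form, assuming the dwh command accepts a delete action followed by the snapshot id:

```shell
# Hypothetical: the `delete` action and its argument order are assumptions.
controller http://localhost:8080 dwh delete dwh/controller_DEV_DWH_TIMESTAMP_2025_08_14T17_47_15_357080Z
```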
**Note on CLI Access with Docker** If you are running the pipeline controller in a Docker container as defined in ./Dockerfile, e.g. when using the Single Machine Docker Compose configuration, you can access the CLI tool by running the following command from the host machine:
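A minimal sketch, assuming the container is named pipeline-controller and the controller's REST API listens on port 8080 inside the container (check docker ps for the actual container name):

```shell
# Container name and port are assumptions; adjust to your compose setup.
docker exec -it pipeline-controller controller http://localhost:8080 status
```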