Clinical data lakes store, cleanse and unify data that can come from different source systems. Some sources of data include:
- Clinical Trial Management Systems (CTMS)
- Electronic Data Capture (EDC)
- Third-party data sources:
- Files from Application Programming Interface (API)-based connections
- Trial Master File (TMF) systems
- Electronic Medical Records (EMR) or Electronic Health Records (EHR)
- Lab data
- Wearable devices
Because these sources vary so widely, clinical data lakes can become complex, even though the same datasets can serve multiple business use cases. More source systems feeding the data lake mean greater data redundancy, which makes maintenance and support difficult. Moreover, as more use cases emerge for the data stored in clinical data lakes, complexities and redundancies also emerge across data models, data transformation jobs and data fetching (read operations).
Metadata-driven design and architecture can greatly reduce or solve these challenges.
It lets a system treat each data element based on what is available in the system and what a use case needs. A repository of metadata, together with dynamic artificial intelligence/machine learning (AI/ML)-driven data pipelines, is then used to ingest, standardize, transform and read datasets without creating redundancy or support and maintenance overhead.
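As a rough illustration of the idea, the sketch below drives standardization entirely from a metadata repository rather than from per-source code. Everything in it is an assumption made for illustration: the `METADATA_REPOSITORY` layout, the source names, the column names and the `standardize` function are hypothetical, and no AI/ML step is shown.

```python
from typing import Any, Dict, Tuple

# Hypothetical metadata repository: one entry per source system, mapping each
# source-specific column name to a canonical name and a target type.
METADATA_REPOSITORY: Dict[str, Dict[str, Tuple[str, type]]] = {
    "edc": {
        "SUBJID": ("subject_id", str),
        "VISITNUM": ("visit_number", int),
    },
    "lab": {
        "patient": ("subject_id", str),
        "visit": ("visit_number", int),
        "hgb_g_dl": ("hemoglobin_g_dl", float),
    },
}


def standardize(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and cast the fields of one record using only repository metadata."""
    mapping = METADATA_REPOSITORY[source]
    standardized: Dict[str, Any] = {}
    for source_name, value in record.items():
        if source_name not in mapping:
            continue  # fields without metadata are skipped in this sketch
        canonical_name, cast = mapping[source_name]
        standardized[canonical_name] = cast(value)
    return standardized


# The same function handles both sources; only the metadata differs.
edc_row = standardize("edc", {"SUBJID": "1001", "VISITNUM": "2"})
lab_row = standardize("lab", {"patient": 1001, "visit": "2", "hgb_g_dl": "13.5"})
```

In a design like this, onboarding another source system would mean adding a repository entry rather than writing a new transformation job.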
The key components of a clinical data lake that uses a metadata-driven approach are:
- Metadata Repository
- Metadata Identification and Parsing Service
- AI/ML Models as a Service for inference
- Workflow Automation Service
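One way these four components could fit together is sketched below. It is a simplified illustration under stated assumptions, not a description of any particular product: the class and function names, the `CANONICAL_FIELDS` list and the sample columns are all hypothetical, and a string-similarity lookup from Python's standard `difflib` module stands in for a real AI/ML inference service.

```python
import csv
import difflib
import io
from typing import Dict, List, Optional

# Hypothetical canonical field names the data lake standardizes on.
CANONICAL_FIELDS = ["subject_id", "visit_number", "hemoglobin"]


class MetadataRepository:
    """Metadata repository: stores one column mapping per source system."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Optional[str]]] = {}

    def register(self, source: str, mapping: Dict[str, Optional[str]]) -> None:
        self._entries[source] = mapping

    def lookup(self, source: str) -> Dict[str, Optional[str]]:
        return self._entries[source]


def identify_columns(raw_csv: str) -> List[str]:
    """Metadata identification and parsing: read column names from the header row."""
    reader = csv.reader(io.StringIO(raw_csv))
    return next(reader)


def suggest_canonical_name(column: str) -> Optional[str]:
    """Stand-in for model inference: pick the closest canonical name, if any."""
    normalized = column.strip().lower().replace(" ", "_")
    matches = difflib.get_close_matches(normalized, CANONICAL_FIELDS, n=1, cutoff=0.6)
    return matches[0] if matches else None


def onboard_source(
    source: str, raw_csv: str, repository: MetadataRepository
) -> Dict[str, Optional[str]]:
    """Workflow automation: chain identification, inference and registration."""
    columns = identify_columns(raw_csv)
    mapping = {column: suggest_canonical_name(column) for column in columns}
    repository.register(source, mapping)
    return mapping


repository = MetadataRepository()
sample = "Subject ID,Visit Number,Site Comment\n1001,2,ok\n"
lab_mapping = onboard_source("lab", sample, repository)
```

Columns that have no close canonical match are mapped to `None` here, which a real workflow might route to manual review.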
Key AWS services can be leveraged to build a robust metadata-driven clinical data lake. These include:
- S3
- Lambda
- EC2
- ECS
- Elastic Beanstalk
- DynamoDB/Redis
- Redshift
Join this free webinar to learn how a metadata-driven approach helps data analysts and bioinformaticians focus on data analysis without worrying about data management-related activities. Learn how this approach helps scale clinical data lakes by onboarding new source systems and enabling more use cases without rebuilding data pipelines or redesigning data models, all with the added benefit of easier support and maintenance and richer audit trails for governance.
Join Saama VP of Engineering, Krunal Patel, and AWS Partner Network Healthcare & Life Sciences Technical Lead, Dr. Aaron Friedman, as they lead this webinar on Thursday, March 21, 2019, at 10:00am PT and explore metadata-driven solutions.
Click here for more information and to sign up.