Clinical data lakes store, cleanse and unify data that can come from different source systems. Some sources of data include:
- Clinical Trial Management Systems (CTMS)
- Electronic Data Capture (EDC)
- Third-party data sources:
- Files from Application Programming Interface (API)-based connections
- Trial Master File (TMF) systems
- Electronic Medical Records (EMR) or Electronic Health Records (EHR)
- Lab data
- Wearable devices
Because these sources vary so widely, clinical data lakes can become complex, even though the same datasets often serve multiple business use cases. The more source systems feed the data lake, the greater the data redundancy, which makes maintenance and support harder. Moreover, as more use cases emerge for the data stored in clinical data lakes, complexity and redundancy also grow across data models, data transformation jobs and data fetching (read operations).
Metadata-driven design and architecture can greatly reduce or solve these challenges.
It lets a system treat each data element based on what is available in the system and what a use case needs. A repository of metadata, together with dynamic Artificial Intelligence/Machine Learning (AI/ML)-driven data pipelines, is then used to ingest, standardize, transform and read datasets without creating redundancy or support and maintenance overhead.
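As a rough illustration of the idea, the sketch below shows one generic standardization step driven entirely by a small metadata repository. It is a minimal sketch under stated assumptions, not the architecture presented in the webinar: the in-memory `METADATA_REPOSITORY`, the `FieldRule` type, the `standardize` function and all field names (`SUBJID`, `site_no` and so on) are hypothetical, and the AI/ML-driven parts of the pipeline are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldRule:
    """One metadata entry: how a source field maps to a canonical field."""
    source_field: str
    canonical_field: str
    transform: Callable[[Any], Any]


# Hypothetical metadata repository, keyed by source system.
# A real system would keep this in a database rather than in code.
METADATA_REPOSITORY: Dict[str, List[FieldRule]] = {
    "edc": [
        FieldRule("SUBJID", "subject_id", str),
        FieldRule("VISITDT", "visit_date", lambda v: v.strip()),
    ],
    "ctms": [
        FieldRule("subject", "subject_id", str),
        FieldRule("site_no", "site_id", lambda v: str(v).zfill(4)),
    ],
}


def standardize(source_system: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw record to canonical fields using only the metadata rules.

    Fields with no rule are dropped; rules whose source field is absent
    from the record are skipped.
    """
    rules = METADATA_REPOSITORY[source_system]
    return {
        rule.canonical_field: rule.transform(record[rule.source_field])
        for rule in rules
        if rule.source_field in record
    }


if __name__ == "__main__":
    print(standardize("edc", {"SUBJID": 101, "VISITDT": " 2024-01-05 "}))
    print(standardize("ctms", {"subject": "101", "site_no": 7}))
```

In this sketch, onboarding a new source system means adding an entry to the repository; the `standardize` function itself does not change, which is the property the metadata-driven approach relies on.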
The key components of a clinical data lake that uses a metadata-driven approach are:
- Metadata Repository
- Metadata Identification and Parsing Service
- AI/ML Models as a Service for inference
- Workflow Automation Service
Several Amazon Web Services (AWS) offerings can be used to build a robust metadata-driven clinical data lake. These include:
- Amazon Simple Storage Service (Amazon S3)
- AWS Lambda
- Amazon Elastic Compute Cloud (Amazon EC2)
- Amazon Elastic Container Service (Amazon ECS)
- AWS Elastic Beanstalk
- Amazon DynamoDB and Redis
- Amazon Redshift
Join this free webinar to learn how a metadata-driven approach helps data analysts and bioinformaticians focus on data analysis without worrying about data management activities. Learn how this approach helps scale clinical data lakes by onboarding new source systems and enabling more use cases without rebuilding data pipelines or redesigning data models. It also brings easier support and maintenance, along with richer audit trails for governance.