AI in Clinical Data Management

Today we’ll be looking at AI’s applicability across clinical data management (CDM) in life sciences and how it improves CDM as a business process. Artificial Intelligence (AI) can augment data management in several ways: reducing the time to issue a query, accelerating time to database lock, automating and streamlining routine data management processes, and allowing data managers to focus on high-value, complex queries. Moreover, it’s fully scalable across your portfolio.

The common challenges in data management

One of the biggest challenges in data management is the shrinking share of clinical data that is captured in electronic data capture (EDC) systems. As recently as 5 to 10 years ago, it was common practice for up to 95% of clinical data to be captured in an EDC system. This meant that legacy EDC-centric processes worked well enough to manage, curate, review, and monitor clinical trial data.

Now studies have shown a significant drop in the prevalence of EDC systems as the primary point of data capture, with between 40% and 70% of clinical trial data being collected outside of EDC. As a result, four core challenges have arisen in effective data collection, monitoring, and management:

  • Data chaos – Non-EDC data is resource intensive to collect and integrate because the sources are more numerous and varied and often include large or time-series datasets.
  • Lack of self-service – Data Managers typically can’t access and transform the data they need across these sources to get timely, actionable insights to drive their decision-making.
  • Many roles for a single task – Within clinical trial management, individual tasks often require multiple roles, skill sets, and hand-offs, which makes them difficult and time-consuming to complete efficiently.
  • Manual processes don’t scale – At certain data volumes and formats, manual processes cannot keep up, for example when exception listings include many thousands of observations. Large-scale data queries, listings, and manual reviews are not possible when working with large or complex datasets.

What are the biggest challenges faced by Clinical Data Managers?

In a report shared by PerkinElmer, 63% of Clinical Data Managers named maintaining data quality as their biggest challenge, 50% named trial complexity, and 47% named inadequate technology to support the trial.

When the majority of clinical data was still being captured via paper forms and transcribed into an EDC system, Clinical Data Managers could be assured that the data they needed for their daily tasks arrived in a single system transfer, at a standard frequency and a low to moderate volume, with a separate source at the site that could be consulted if any questions arose.

The explosion of different data capture solutions in the past few years has resulted in extreme differences in data volume, complexity, and transfer frequency. A steady increase in clinical data collected directly from patients and clinicians means that legacy CDM processes based on querying data at a later date are rendered largely obsolete. This complicates Clinical Data Managers’ ability to synthesize data issues across multiple systems, domains, and formats.

As for the technology used by Clinical Data Managers to support clinical trials, most tools cannot support self-service data acquisition, data wrangling, review, and analysis. Additionally, most of the industry-standard tools don’t offer the ability to customize reviews according to the specifics of a given trial or therapeutic area.

Current tools may be sufficient for elementary reviews, basic enrollment summaries, adverse event reporting rates, or patient disposition, but have limited viability when it comes to the review of disease-specific assessments, functional tests, or questionnaires. 

On top of this, manually processing clinical data is time-consuming, resource intensive, and requires input and feedback from different stakeholders working across siloed systems, delaying progress and increasing the risks of data errors and inaccuracies slipping through the cracks.

How Saama uses AI to drive efficiency across the data management lifecycle

At Saama, we leverage AI as an operational asset that drives efficiency across all data management processes. Specifically, we’ll be focusing on how supervised machine learning (ML) and Generative AI (GenAI) can be used to automate the processes that make up clinical data management.

Saama’s AI-powered solutions for data management 

Saama’s Data Hub automatically aggregates and stores all clinical data in a single location and a standardized, universal format. From there, Smart Data Quality (SDQ) uses AI and ML models to analyze the data. First, the models identify and flag data discrepancies, and then determine whether any of the identified discrepancies fall under the categories of review for the corresponding data manager. Then, our solution automatically generates query text that can be edited (if needed) and posted to the EDC system.

A Data Manager can use these models and solutions to automate routine data review processes.
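To make the three steps above concrete (flag a discrepancy, check whether it falls within a reviewer's categories, draft query text), here is a minimal Python sketch. It is not Saama's implementation: a simple date-consistency rule stands in for the ML models, and every name in it (Record, Discrepancy, flag_discrepancies, in_review_scope, draft_query_text, the "date_consistency" category) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class Record:
    subject_id: str
    form: str                  # e.g. "AE" for an adverse event form
    start_date: date
    end_date: Optional[date]   # None while the event is ongoing


@dataclass
class Discrepancy:
    subject_id: str
    form: str
    category: str
    detail: str


def flag_discrepancies(records: List[Record]) -> List[Discrepancy]:
    """Step 1: flag records whose end date precedes the start date.

    A hand-written rule stands in here for a trained model.
    """
    found = []
    for r in records:
        if r.end_date is not None and r.end_date < r.start_date:
            detail = (
                f"end date {r.end_date.isoformat()} is before "
                f"start date {r.start_date.isoformat()}"
            )
            found.append(Discrepancy(r.subject_id, r.form, "date_consistency", detail))
    return found


def in_review_scope(d: Discrepancy, reviewer_categories: Set[str]) -> bool:
    """Step 2: keep only discrepancies in the reviewer's categories."""
    return d.category in reviewer_categories


def draft_query_text(d: Discrepancy) -> str:
    """Step 3: draft query text that a human can edit before posting."""
    return (
        f"Subject {d.subject_id}, {d.form} form: {d.detail}. "
        "Please verify and correct or confirm."
    )


# Made-up sample records, for illustration only.
records = [
    Record("1001", "AE", date(2024, 3, 10), date(2024, 3, 14)),
    Record("1002", "AE", date(2024, 4, 2), date(2024, 3, 28)),
    Record("1003", "AE", date(2024, 5, 1), None),
]

drafts = [
    draft_query_text(d)
    for d in flag_discrepancies(records)
    if in_review_scope(d, {"date_consistency"})
]
for text in drafts:
    print(text)
```

Only the second record produces a draft query. In this sketch the draft is just a string, which mirrors the point that the text stays editable by the Data Manager before it goes anywhere near the EDC system.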

Using GenAI, data managers can ask SDQ to show subjects with certain characteristics or that match certain criteria – all in natural language. Data Managers can review and easily edit the resulting code, view a dry run, and add the listing to their study. The GenAI functionality reduces the time to create targeted listings – without any programming required.
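As a rough illustration of the review-and-dry-run step, the Python sketch below treats a listing as a small filter function that can be read, edited, and tried against sample data before it is saved. It is a sketch under stated assumptions, not SDQ's actual output or interface: the filter stands in for code that a GenAI tool might produce from a request such as "show subjects aged 65 or older with a serious adverse event", and the names listing_filter, dry_run, study_listings, and the sample fields are all hypothetical.

```python
def listing_filter(subject):
    """Hypothetical generated filter: subjects aged 65+ with a serious adverse event."""
    return subject["age"] >= 65 and subject["serious_ae"]


def dry_run(filter_fn, subjects, preview_size=5):
    """Apply the filter without saving anything; return a match count and a preview."""
    matches = [s["subject_id"] for s in subjects if filter_fn(s)]
    return {"match_count": len(matches), "preview": matches[:preview_size]}


# Made-up sample data, for illustration only.
sample_subjects = [
    {"subject_id": "1001", "age": 71, "serious_ae": True},
    {"subject_id": "1002", "age": 58, "serious_ae": True},
    {"subject_id": "1003", "age": 66, "serious_ae": False},
    {"subject_id": "1004", "age": 80, "serious_ae": True},
]

study_listings = {}  # stands in for the listings attached to a study

result = dry_run(listing_filter, sample_subjects)
print(result)

# The listing is added only after the reviewer has seen the dry run.
if result["match_count"] > 0:
    study_listings["age_65_plus_serious_ae"] = listing_filter
```

Keeping the dry run separate from the save step is the design choice that matters here: the Data Manager sees what a listing would return before it becomes part of the study.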

Another application of GenAI for Clinical Data Management is the ability for Data Managers to generate their own data quality (DQ) checks, similar to SAS or R data checks, without any knowledge of clinical programming. A large language model translates a Data Manager’s natural-language request into functional programming code, along with generated test data, so that the test results can be confirmed as accurate in a final, human-driven validation process.
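The sketch below shows what a generated DQ check and its accompanying test data might look like, written in Python rather than SAS or R. It is an illustration only and not output from SDQ: the function check_blood_pressure, the field names, and the TEST_CASES list are hypothetical, and the numeric thresholds are placeholders rather than clinical guidance.

```python
def check_blood_pressure(row):
    """Return a list of issue strings for one vital-signs row (empty if clean).

    Thresholds are illustrative placeholders, not clinical guidance.
    """
    issues = []
    systolic = row.get("systolic")
    diastolic = row.get("diastolic")
    if systolic is None or diastolic is None:
        issues.append("missing value")
        return issues
    if not 60 <= systolic <= 250:
        issues.append("systolic out of range")
    if diastolic >= systolic:
        issues.append("diastolic not below systolic")
    return issues


# Test data paired with the result each row is expected to produce.
TEST_CASES = [
    ({"systolic": 120, "diastolic": 80}, []),
    ({"systolic": 300, "diastolic": 80}, ["systolic out of range"]),
    ({"systolic": 110, "diastolic": 115}, ["diastolic not below systolic"]),
    ({"systolic": None, "diastolic": 70}, ["missing value"]),
]


def run_test_cases():
    """Run the check on every test row and return (row, expected, actual) tuples."""
    return [(row, expected, check_blood_pressure(row)) for row, expected in TEST_CASES]


results = run_test_cases()
all_passed = all(expected == actual for _, expected, actual in results)
print("all test cases passed:", all_passed)
```

The test cases give the human reviewer something concrete to validate: each row states the input and the expected finding, so a reviewer can confirm the check does what was asked without reading every line of the code.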

What needs to change within clinical data management to make way for AI?

As efficient and effective as it is, we can’t take a tool as disruptive as AI, drop it into an existing working model or ecosystem, and expect it to be a success. The biggest impact of applied AI on clinical data management falls on legacy processes and on the consolidation of tasks across the data management lifecycle.

With these ground-breaking AI technologies now available, a longstanding technical barrier has been removed: the gap between Clinical Data Managers, who understand which DQ checks need to be created and what data needs to be reviewed, and the ability to program these activities in a variety of modalities. However, many current processes are formalized in SOPs and work instructions that clearly divide the work among multiple roles, each with limited scope.

Without changes to these current processes that require each of these roles to be present and accounted for, Data Managers won’t be able to capitalize on the speed, efficiency, and quality opportunities that AI presents. Historical skill sets will also need to evolve in tandem with processes and technology advancements. 


No matter where or how AI is deployed within clinical data management, it always needs to be monitored and reviewed by at least one human to ensure all decisions taken remain firmly in the hands of CDM personnel. If you’d like to learn more about how Saama’s proprietary suite of AI-powered platforms (including Data Hub and SDQ) can transform your data management workflows, book a demo with us and we’ll show you. 
