Case Study

Clinical Data Sandbox

Our client wanted better visibility into its clinical trial data. The company needed to combine legacy and incoming data from a range of internal and external data sets, which required building new data pipelines.


  • Modular, reusable code has been developed in Python, following established best practices, to automatically read and format the following for downstream use:

a. Clinical data in SDTM format from multiple clinical trials

b. Clinical metadata from an API and from Excel files covering multiple clinical trials

c. Public databases, such as CTTI's AACT clinical trials database, that contain data on past and current trials along with related metrics and metadata.

  • Data structures have been designed to hold key data elements and their associated variables (e.g., patients, sites, visits, and principal investigators) and have been documented for reuse by other developers.
  • The ingested data has been delivered in a format that data analysts can use to easily create metrics and visualizations. Samples and a tutorial have been provided.
  • A framework for testing ML models has been built, including visualizations of predictive features and variable importance metrics.
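As a rough illustration of the kind of data structures described above, the sketch nests trials, sites, patients, and visits and populates them from SDTM Demographics (DM) and Subject Visits (SV) records. It assumes each domain has already been read into a list of dicts (for example with `csv.DictReader`); the class and function names (`Trial`, `Site`, `Patient`, `Visit`, `build_trials`) are hypothetical and are not the delivered code. Only a few standard SDTM variables are used: `STUDYID`, `USUBJID`, `SITEID`, `INVNAM`, `VISITNUM`, `VISIT`, and `SVSTDTC`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical classes for illustration; not the structures delivered to the client.


@dataclass
class Visit:
    visit_num: float          # SV.VISITNUM
    name: str                 # SV.VISIT
    start: Optional[date]     # parsed from SV.SVSTDTC when a full date is present


@dataclass
class Patient:
    usubjid: str              # DM.USUBJID
    site_id: str              # DM.SITEID
    visits: List[Visit] = field(default_factory=list)


@dataclass
class Site:
    site_id: str
    investigator: Optional[str]   # DM.INVNAM, None when blank
    patients: Dict[str, Patient] = field(default_factory=dict)


@dataclass
class Trial:
    study_id: str             # DM.STUDYID
    sites: Dict[str, Site] = field(default_factory=dict)


def _parse_dtc(value: str) -> Optional[date]:
    # SDTM --DTC values are ISO 8601 and may be partial (e.g. "2021-03");
    # this sketch keeps only complete dates and drops any time component.
    if value and len(value) >= 10:
        try:
            return date.fromisoformat(value[:10])
        except ValueError:
            return None
    return None


def build_trials(dm_rows: List[Mapping[str, str]],
                 sv_rows: List[Mapping[str, str]]) -> Dict[str, Trial]:
    """Build Trial -> Site -> Patient -> Visit objects from DM and SV rows."""
    trials: Dict[str, Trial] = {}
    # Key patients by (STUDYID, USUBJID) so subjects from different studies never collide.
    index: Dict[Tuple[str, str], Patient] = {}

    for row in dm_rows:
        trial = trials.setdefault(row["STUDYID"], Trial(row["STUDYID"]))
        site = trial.sites.setdefault(
            row["SITEID"], Site(row["SITEID"], row.get("INVNAM") or None))
        patient = Patient(row["USUBJID"], row["SITEID"])
        site.patients[patient.usubjid] = patient
        index[(row["STUDYID"], row["USUBJID"])] = patient

    for row in sv_rows:
        patient = index.get((row["STUDYID"], row["USUBJID"]))
        if patient is None:
            continue  # visit for a subject absent from DM; skipped in this sketch
        patient.visits.append(Visit(float(row["VISITNUM"]), row["VISIT"],
                                    _parse_dtc(row.get("SVSTDTC", ""))))

    # Keep each patient's visits in protocol order.
    for patient in index.values():
        patient.visits.sort(key=lambda v: v.visit_num)
    return trials
```

Keying on dicts keeps the loader independent of how the files were read; rows from a pandas DataFrame can be passed in via `DataFrame.to_dict("records")`.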


Arrayo built an interactive platform using a mix of open-source and proprietary technologies. We also developed proof-of-concept machine learning algorithms to identify the trial attributes that adversely affected the company's trial outcomes. This allowed clinical teams and upper management to view trial statistics and understand how to improve trial design.
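The case study does not say which algorithms or importance measures were used. One common, model-agnostic way to rank the attributes a model relies on is permutation importance: shuffle one feature column at a time and measure how much the model's score drops. The stdlib-only sketch assumes a fitted classifier exposed as a callable that maps rows to 0/1 predictions; the feature names, data, and `toy_model` are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[List[Row]], List[int]]  # assumed interface: rows -> 0/1 predictions


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(model: Model, X: List[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = accuracy(y, model(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, model(X_perm)))
        importances.append(mean(drops))
    return importances


if __name__ == "__main__":
    # Hypothetical features per trial: [number of sites, number of eligibility criteria]
    # Hypothetical label: 1 = trial terminated early, 0 = completed
    X = [[12, 45], [30, 18], [8, 52], [25, 20], [40, 35], [15, 12]]
    y = [1, 0, 1, 0, 1, 0]

    def toy_model(rows: List[Row]) -> List[int]:
        # Stand-in for a fitted model: flags trials with many eligibility criteria
        return [1 if r[1] > 30 else 0 for r in rows]

    print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores the first feature, shuffling that column never changes its predictions and its importance is zero. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` implements the same technique.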