Traversing data silos and understanding the connections across datasets is the biggest hurdle our clients face, and it is the one we solve with TDMS. We were asked to design a solution that makes it easy to use and extract data for decision-making without prior knowledge of the data's structure or source.
In short, the Traceable Data Management System (TDMS) is a robust data management solution that helps our client:
- Manage the workflows of research lab data
- Understand the underlying scientific semantics of the data/metadata
- Provide the operational capability to run data generation and processing
- Allow for data versioning
- Have an intuitive front-end browser (example below) for users to access data, control workflows, etc.
We developed a solution for standardized data provenance management and traceability that enables R&D organizations to keep track of data origin, transformations, ownership, and usage. The main components of the system include:
- Traceable Scientific Data Network Browser (TDNB) – exploration of scientific data provenance and context with domain knowledge and science-driven visualizations
- Traceable Data Capture – automatic discovery agents (for historical data and ongoing sources such as lab instruments), human-curated data entry, an add-on component, and an API for existing external data-processing pipelines and new software development efforts
- Traceability Data Store – an advanced, flexible data model; infrastructure- and datapoint-level security; built-in DevOps and DataOps
- Analytics – built-in MLOps and visualization dashboards (both commercial and custom-made), science-aware solutions (cheminformatics, bioinformatics, etc.), and algorithm and data versioning (GitLab-style)
- Platform Apps and Components
- Pipelining App: an IDE for data pipelines and an interpreter/runner for pipelines designed externally, such as AWS Step Functions, Airflow, and KNIME
- Visualization App: replacement for Spotfire/Tableau/Neo4j, core implementation of the TDNB concept
- Search Engine App: replacement for Elasticsearch, with a science-driven algorithm
- Parser App: converts files (CSV, PDF, Excel, images, etc.) into structured scientific data
- Lists management: grouping of data into named lists that become metadata
- Hardware management: off-the-shelf connectors for specific instrument lifecycles
- Scientific R&D Apps and Workflows
- Data packaging (standardized datasets)
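To make the provenance idea above concrete, here is a minimal sketch of the kind of record the Traceability Data Store could hold to track origin, transformations, ownership, and versioning. The class and field names (`ProvenanceRecord`, `derive`, etc.) are illustrative assumptions, not the actual TDMS data model:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical provenance record; field names are illustrative
# assumptions, not the actual TDMS schema.
@dataclass
class ProvenanceRecord:
    dataset_id: str                 # unique identifier of the dataset
    origin: str                     # e.g., instrument name or parent dataset
    owner: str                      # responsible person or team
    version: int                    # data version (GitLab-style versioning)
    transformations: list = field(default_factory=list)  # applied processing steps
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def derive(self, new_id: str, transformation: str) -> "ProvenanceRecord":
        """Create a child record that preserves the full transformation lineage."""
        return ProvenanceRecord(
            dataset_id=new_id,
            origin=self.dataset_id,  # lineage points back to the parent dataset
            owner=self.owner,
            version=1,
            transformations=self.transformations + [transformation],
        )

# Example: raw instrument data followed by a normalization step
raw = ProvenanceRecord("run-001", origin="hplc-instrument-3", owner="lab-a", version=1)
normalized = raw.derive("run-001-norm", "normalize_peaks")
print(normalized.origin)           # -> run-001
print(normalized.transformations)  # -> ['normalize_peaks']
```

Chaining `derive` calls accumulates the transformation history, which is what lets a browser like the TDNB walk a dataset back to its origin.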
TDMS Platform Apps help you answer business questions like:
- How much data am I holding? What’s my data footprint?
- How is the data connected? Who/What is generating it?
- How is it used?
- What kind of data do I have?
- What is the replication/redundancy of that data?
- What is the rate of data growth?
- What’s the opportunity to save cost on data storage?
- What kind of data do I need to keep or delete? What should my organizational data policies be? How are they applied?
- How does the data flow out/in SaaS?
- Who can access datasets and datapoints? Who has accessed them in the past, and how?
- How many data personnel do I need?
- What software version are we using to analyze that dataset (infrastructure metadata)?
The platform's logic is AI-driven: it discovers and answers these questions using metadata drawn from different data islands (data files, infrastructure metadata, pipeline parameters, SaaS data).
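As a minimal sketch of how metadata from different islands could be aggregated to answer a question like "What's my data footprint?", assuming a simple list of harvested metadata entries (the dictionary keys and source names are hypothetical, not the actual TDMS interfaces):

```python
from collections import defaultdict

# Hypothetical metadata entries harvested from different data islands;
# keys and values are illustrative assumptions, not the TDMS schema.
metadata = [
    {"source": "data_files", "dataset": "assay-results",  "size_gb": 120, "owner": "lab-a"},
    {"source": "pipeline",   "dataset": "assay-results",  "size_gb": 40,  "owner": "lab-a"},
    {"source": "saas",       "dataset": "screening-runs", "size_gb": 75,  "owner": "lab-b"},
]

def data_footprint(entries):
    """Aggregate storage size per owner across all metadata sources."""
    footprint = defaultdict(int)
    for entry in entries:
        footprint[entry["owner"]] += entry["size_gb"]
    return dict(footprint)

print(data_footprint(metadata))  # -> {'lab-a': 160, 'lab-b': 75}
```

The same aggregation pattern, grouped instead by `dataset` or `source`, would answer the redundancy and data-flow questions above.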
The solution feeds into our clients' existing data architecture rather than replacing it.