Intelligent Insight Engine and Data Extraction & Integration from Unstructured Documents

Our client had generated a tremendous volume of data that was processed and stored in a variety of unstructured documents, such as ELN outputs, spreadsheets, and PDFs. Because the data was embedded in these documents, it could not be easily extracted or transformed and ended up in a document store (e.g., SharePoint). The client needed a solution to extract, aggregate, and index data from unstructured documents, particularly scientific data found in tables and graphs. The primary challenge was handling a variety of complex table layouts, including merged cells, differing header types, and units of measurement placed in varying locations.
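To make the table-layout challenge concrete, the sketch below shows one simple way such layouts could be normalized in Python. It is an illustration only, not the delivered Engine: the function names (fill_merged_cells, split_header, split_cell, normalize_table) and the sample table are hypothetical. It assumes vertically merged cells arrive as None after the first row, and that units appear either in the header, as in "Conc. (mg/mL)", or inside the cell, as in "25 C".

```python
import re

# Assumption: a unit in a header is written in trailing brackets, e.g. "Conc. (mg/mL)".
UNIT_IN_HEADER = re.compile(r"^(?P<name>.*?)\s*[\(\[](?P<unit>[^)\]]+)[\)\]]\s*$")
# Assumption: a unit in a cell follows a plain decimal number, e.g. "25 C".
VALUE_WITH_UNIT = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[^\d\s].*)$")


def fill_merged_cells(rows):
    """Forward-fill None cells down each column (vertical merges only).

    Caveat: a genuinely empty cell is indistinguishable from a merged one here.
    """
    filled = []
    for row in rows:
        if filled:
            previous = filled[-1]
            row = [prev if cell is None else cell for cell, prev in zip(row, previous)]
        filled.append(list(row))
    return filled


def split_header(header):
    """Return (name, unit) for a header; unit is None when none is found."""
    match = UNIT_IN_HEADER.match(header)
    if match:
        return match.group("name").strip(), match.group("unit").strip()
    return header.strip(), None


def split_cell(cell):
    """Return (value, unit) for a cell; numeric strings with a unit become floats."""
    if isinstance(cell, str):
        match = VALUE_WITH_UNIT.match(cell.strip())
        if match:
            return float(match.group("value")), match.group("unit").strip()
    return cell, None


def normalize_table(headers, rows):
    """Turn a raw table into one dict per row, keyed by header name."""
    parsed_headers = [split_header(h) for h in headers]
    records = []
    for row in fill_merged_cells(rows):
        record = {}
        for (name, header_unit), cell in zip(parsed_headers, row):
            value, cell_unit = split_cell(cell)
            # A unit found in the cell takes precedence over one in the header.
            record[name] = {"value": value, "unit": cell_unit or header_unit}
        records.append(record)
    return records


# Hypothetical sample: "Sample" is merged across both rows.
sample_headers = ["Sample", "Conc. (mg/mL)", "Temp"]
sample_rows = [["A", 1.5, "25 C"], [None, 2.0, "37 C"]]
sample_records = normalize_table(sample_headers, sample_rows)
```

Real documents would also need handling for horizontally merged cells, multi-row headers, and unit vocabularies, which this sketch leaves out.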

Delivery

Arrayo provided a customized, machine learning-based Intelligent Insight Engine that leverages NLP and controlled vocabularies to extract data from tables and graphs in unstructured ELN output documents, PDF reports, PowerPoint presentations, and Excel spreadsheets. The Engine was developed with custom Python code and open-source technologies and delivered as a containerized application hosted in an Amazon Web Services (AWS) cloud environment, the client's cloud platform of choice. It integrated successfully with the client's digital ecosystem.

Value

The Intelligent Insight Engine successfully identified complex tables, extracted scientific data, and stored the extracted data in a database. The solution enabled the client, a pharmaceutical company, to perform downstream data science activities with the newly extracted data, such as AI modeling, analysis, and visualization. Because the data was stored as key-value pairs, the Engine also gave the client the ability to search and explore it.
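As an illustration of how key-value storage can support search, the sketch below stores extracted values in a SQLite table and looks them up by key. The source does not say which database or schema the Engine uses, so everything here is an assumption: the table name extracted, its columns, the helper functions, and the sample document name are all hypothetical.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a SQLite database and create the hypothetical key-value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS extracted (
            document  TEXT NOT NULL,
            row_index INTEGER NOT NULL,
            key       TEXT NOT NULL,
            value     TEXT,
            unit      TEXT
        )
        """
    )
    # An index on key keeps lookups by field name fast as the table grows.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_extracted_key ON extracted (key)")
    return conn


def store_record(conn, document, row_index, record):
    """Store one extracted table row; record maps key -> (value, unit)."""
    conn.executemany(
        "INSERT INTO extracted (document, row_index, key, value, unit) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (document, row_index, key, str(value), unit)
            for key, (value, unit) in record.items()
        ],
    )
    conn.commit()


def search(conn, key, value=None):
    """Return (document, row_index, value, unit) rows matching a key and optional value."""
    query = "SELECT document, row_index, value, unit FROM extracted WHERE key = ?"
    params = [key]
    if value is not None:
        query += " AND value = ?"
        params.append(str(value))
    query += " ORDER BY document, row_index"
    return conn.execute(query, params).fetchall()


# Hypothetical usage with made-up values.
store = create_store()
store_record(store, "report_001.pdf", 0, {"Sample": ("A", None), "Conc.": (1.5, "mg/mL")})
store_record(store, "report_001.pdf", 1, {"Sample": ("B", None), "Conc.": (2.0, "mg/mL")})
concentrations = search(store, "Conc.")
```

Storing values as text keeps the schema uniform across heterogeneous tables; a production design might add typed columns or a dedicated search index.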

Related Insights

Splicing Event Navigator

Exon Dashboard

Genomics Pipeline

Custom LIMS System

Scalable NGS Pipelines for Bioinformatics Groups

Molecular Structure Search Engine

Self-Guided Analytics Pipeline

Biomarker Informatics

Leveraging NLP for Improved Data Extraction

Building a Knowledge Graph from Audio Interview Data

Using NLP to Extract Biologics Data for R&D Support

Providing Laboratory Informatics System Services in a Benchl...

LIMS Software Integration

Arrayo

Your new source for insights at the intersection of data, financial services, and life sciences.
