Case Study

Genomics Pipeline

Our client used an existing data pipeline to connect, clean, validate, annotate, and ingest genomic data from various sources. Their data harmonization efforts were hampered by difficulties with maintenance, updates for new data source versions, troubleshooting, and scalability.

Delivery

We started by reviewing all the processes together and mapping the pipeline end to end. We benchmarked it both as a whole and per step to (1) become familiar with the various processes and (2) understand where performance could be improved. Next, we studied and benchmarked candidate solutions such as KNIME, Airflow, and Spark on Amazon Web Services (AWS) EMR.
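The whole-pipeline and per-step benchmarking described above can be illustrated with a minimal Python sketch. It is not the client's pipeline: the step names (`clean`, `validate`), the toy sequence records, and the `benchmark_pipeline` helper are all hypothetical, chosen only to show how each step and the full run can be timed in a single pass.

```python
import time
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair; each function takes and returns a list of records.
Step = Tuple[str, Callable[[List[str]], List[str]]]


def clean(records: List[str]) -> List[str]:
    """Hypothetical cleaning step: trim whitespace and normalize case."""
    return [r.strip().upper() for r in records]


def validate(records: List[str]) -> List[str]:
    """Hypothetical validation step: keep sequences made only of A, C, G, T."""
    allowed = set("ACGT")
    return [r for r in records if r and set(r) <= allowed]


def benchmark_pipeline(
    steps: List[Step], records: List[str]
) -> Tuple[List[str], Dict[str, float]]:
    """Run the steps in order and record wall-clock seconds per step and in total."""
    timings: Dict[str, float] = {}
    data = records
    total_start = time.perf_counter()
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        timings[name] = time.perf_counter() - start
    timings["total"] = time.perf_counter() - total_start
    return data, timings


# Illustrative step list (assumed, not taken from the case study).
EXAMPLE_STEPS: List[Step] = [("clean", clean), ("validate", validate)]

if __name__ == "__main__":
    output, timings = benchmark_pipeline(EXAMPLE_STEPS, [" acgt ", "ACGN", "ttga"])
    for step_name, seconds in timings.items():
        print(f"{step_name}: {seconds:.6f} s")
```

Comparing the per-step timings against the total shows which steps dominate the run time, which is the kind of information that points to where performance could be improved.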

Value

With Spark, the amount of code shrank considerably, so maintenance became a minor issue, and both version updates and scalability became a matter of configuration via the UI. Performance also improved notably: the first benchmarks showed a significant reduction in processing time.

Finally, the output fed a NoSQL database that could be queried with different tools, such as notebooks for Python, R, or Scala, or Athena, depending on the profile of the end user and how they access the data.

© SteepConsult, Inc. dba Arrayo All Rights Reserved