Case Study

Bioinformatics systems with increased performance

Context

As data are digitized and collected in usable, well-architected data lakes, their true value and potential can be unlocked by tools that support visualization and answer specific business and scientific questions. Many of the use cases that consume data from a data lake are non-visual processes in which tertiary analytics and other imputations are performed. For novel discovery and “-omics” analysis, visual applications that let the user comb through the data in the underlying lake quickly and efficiently make a significant difference to experimental design and to leading research investigations. Our client was a large pharmaceutical company with a mature data lake and a service infrastructure that served its proprietary data in visual applications and integrated public and partner Electronic Medical Records (EMR) data in a safe, regulated manner.


The Project

Data Visualization and Static Views

Exploration of patterns and associations in the data was accomplished by three main methods:

  • R / R-Shiny static figures and interactive elements
  • Spotfire
  • HTML/JS/CSS MVC Web Applications

 

Our client had legacy applications built with frameworks such as R-Shiny, ASP.NET (single-page, form-submission applications), and several other early-2000s frameworks that were no longer supported or maintained. Our team worked alongside their engineers and programmers to determine the number of applications and the specific function of each. We analyzed the types and sources of the data these applications needed to consume, prioritized the applications, and defined the scope of the project from these requirements. Arrayo created microservice APIs that mirrored the data in the data lake and warehouse. To simplify and unify the bioinformatics tool suite already offered by our client, we designed a single-page application whose entry point was a home page that displayed the available tools cataloged by taxonomy (DNA, RNA, protein, expression, gene set enrichment analysis, etc.). The home page linked out to various single-page applications (SPAs) that allowed users to select relevant data in the warehouse and lake to analyze. The home page was also integrated with Jira so that users outside the bioinformatics group could submit requests about particular analyses that were in progress.

We used a modern model-view-controller (MVC) approach to engineer the client applications so that they worked with the RESTful APIs built as part of the microservices. A common SPA wrapper was created for the various visualization frameworks. R-Shiny was embedded directly within HTML, and data was served to the application from the RESTful APIs. In essence, the wrapper constructed the arguments and configuration needed to instantiate each application with the data selected by the user.
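As a rough illustration of what such a wrapper does, the sketch below turns a user's data selection into the URL and embed settings for one visualization. It is written in Python for brevity and is not the client's implementation: the function name `build_embed_config`, the base URL, and the parameter names (`study`, `genes`, `tissue`) are all hypothetical.

```python
from urllib.parse import urlencode


def build_embed_config(app_base_url, selection, height_px=800):
    """Build the settings a wrapper page needs to embed one visualization.

    `selection` maps hypothetical parameter names to the values the user
    picked; list values are expanded into repeated query parameters.
    """
    # Drop empty selections so the embedded app only sees real choices.
    params = {k: v for k, v in selection.items() if v not in (None, "", [])}
    # Sort by parameter name so the same selection always yields the same URL.
    query = urlencode(sorted(params.items()), doseq=True)
    src = f"{app_base_url}?{query}" if query else app_base_url
    return {"src": src, "height": height_px}


config = build_embed_config(
    "https://tools.example.org/shiny/expression",  # placeholder URL
    {"study": "STUDY-001", "genes": ["TP53", "BRCA1"], "tissue": None},
)
print(config["src"])
```

Keeping this logic in one shared function is what lets a single wrapper serve several visualization frameworks: each embedded application only has to read its parameters from the URL it is given.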

Spotfire visualizations were added and maintained by the Arrayo team and served to users within another SPA wrapper that integrated them into the home page / gateway.

Some of the SPAs we created were rewrites of the antiquated ASP.NET applications already in use at the client site. These applications were written in a common framework and served as part of our gateway. Their server infrastructure was likewise modified and moved to a RESTful API / microservice environment.

Many of the tools needed to export figure-quality images for publication, as well as the underlying data in the proper format for publication submission. The export automatically obfuscated proprietary fields that were not required for publication. Another API was created to support exporting data in the formats required by many peer-reviewed journals.
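The obfuscation step can be sketched as follows. This is a minimal Python example under stated assumptions, not the export API that was built: the function name `export_for_publication`, the field names, the sample values, and the choice to mask values with a fixed `REDACTED` placeholder are all illustrative.

```python
import csv
import io

MASK = "REDACTED"  # assumed placeholder; the actual obfuscation rule is not described


def export_for_publication(rows, proprietary_fields, required_fields=()):
    """Return CSV text in which non-required proprietary fields are masked."""
    # Only proprietary fields the journal does not need are obfuscated.
    to_mask = set(proprietary_fields) - set(required_fields)
    if not rows:
        return ""
    fieldnames = list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {name: MASK if name in to_mask else row[name] for name in fieldnames}
        )
    return buffer.getvalue()


# Hypothetical records; "compound_code" stands in for a proprietary field.
rows = [
    {"sample_id": "S1", "compound_code": "ABC-123", "expression": 2.5},
    {"sample_id": "S2", "compound_code": "ABC-456", "expression": 1.1},
]
print(export_for_publication(rows, proprietary_fields=["compound_code"]))
```

Passing a field in `required_fields` leaves it untouched even if it is proprietary, which mirrors the rule that only fields not required for publication are obfuscated.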

As with many of our clients, the RESTful APIs that served data to the client-side MVC applications were hosted in AWS ECS. The microservice APIs were dockerized, stored in the Elastic Container Registry, and automatically deployed to ECS via a continuous integration and deployment pipeline implemented by Arrayo. Graphical user interface applications were not bundled with the RESTful APIs (doing so would have produced a monolithic application); instead, they were served over HTTP directly from AWS S3. We developed a build pipeline that minified the JavaScript, CSS, and HTML files, which were then served from their respective AWS S3 buckets. Permissions and VPCs were configured to allow these JavaScript applications to communicate directly with the RESTful APIs, and OAuth 2.0 was handled by implementing the organizational single-sign-on services provided by SiteMinder.


Outcome

    • With a design thinking approach, created intuitive graphical user interfaces that streamline scientists’ interaction with web applications.
    • Effectively modernized a variety of disparate systems and increased performance.
    • Using Solr web services, created a Google-like interface to search across studies, analysis tools, and projects.
    • Created a data and data-visualization-as-a-service environment with these applications, reducing ad hoc requests to the translational bioinformatics team.
    • Integrated various data assets and tools across groups, allowing permission-based access as well as tracking of tool utilization and asset use.
    • Created intuitive graphical user interfaces (GUIs) that reduce training time and streamline laboratory work.