As data are digitized and collected in usable, well-architected data lakes, their true value can be unlocked by tools that support visualization and answer specific business and scientific questions. Many of the use cases that consume data from a data lake are non-visual processes in which tertiary analyses and imputations are performed. For novel discovery and “-omics” analyses, however, visual applications that let users comb through the underlying lake quickly and efficiently make a significant difference to experimental design and to leading research investigations. Our client, a large pharmaceutical company, had a mature data lake and service infrastructure for serving its proprietary data in visual applications and for integrating public and partner Electronic Medical Record (EMR) data in a safe, regulated manner.
Data Visualization and Static Views
Exploratory analysis of patterns and associations in the data was supported by three main methods:
- R and R Shiny static figures and interactive elements
- HTML/JS/CSS MVC web applications
- Spotfire interactive visualizations
Our client had legacy applications built with frameworks such as R Shiny, ASP.NET (single-page applications driven by form submissions), and several other early-2000s frameworks that were no longer supported or maintained. Our team worked alongside the client's engineers and programmers to inventory these applications and their specific functions, analyze the types and sources of data each one consumed, and prioritize them. These requirements defined the scope of the project.
Arrayo built microservice APIs that mirrored the relevant data in the data lake and warehouse. To simplify and unify the client's existing bioinformatics tool suite, we designed a homepage, itself a single-page application (SPA), that cataloged the available tools by taxonomy (DNA, RNA, protein, expression, gene set enrichment analysis, etc.). The homepage linked out to the individual SPAs, in which users selected the relevant warehouse and lake data to analyze. It was also integrated with Jira so that users outside the bioinformatics group could submit requests about specific analyses, including those already in progress.
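To make the taxonomy-based homepage concrete, here is a minimal sketch of how a gateway might group registered tools by category for display. The `Tool` record, its fields, and the example tool names and URLs are hypothetical; the original project's catalog schema is not described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One entry on the gateway homepage (hypothetical schema)."""
    name: str
    taxonomy: str  # e.g. "DNA", "RNA", "Protein", "Expression", "GSEA"
    url: str       # entry point of the tool's single-page application


def build_catalog(tools: list[Tool]) -> dict[str, list[Tool]]:
    """Group tools by taxonomy, sorting categories and tools by name for display."""
    groups: dict[str, list[Tool]] = defaultdict(list)
    for tool in tools:
        groups[tool.taxonomy].append(tool)
    return {
        taxonomy: sorted(groups[taxonomy], key=lambda t: t.name)
        for taxonomy in sorted(groups)
    }


if __name__ == "__main__":
    tools = [
        Tool("Variant Browser", "DNA", "/apps/variant-browser"),
        Tool("Differential Expression", "RNA", "/apps/diff-expr"),
        Tool("Coverage Viewer", "DNA", "/apps/coverage"),
        Tool("Pathway Enrichment", "GSEA", "/apps/gsea"),
    ]
    # Prints one line per category, e.g. DNA ['Coverage Viewer', 'Variant Browser']
    for taxonomy, entries in build_catalog(tools).items():
        print(taxonomy, [t.name for t in entries])
```

Keeping the catalog as data rather than hard-coded markup means a new tool only needs a registry entry to appear under the right category.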
We used a modern model-view-controller (MVC) approach to engineer the client applications around RESTful APIs built as part of the microservices. A common SPA wrapper was created for the various visualization frameworks: R Shiny applications were embedded directly within HTML, and data were served to them from the RESTful APIs. The wrapper constructed the arguments and configuration needed to instantiate each application with the data a user had selected.
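The wrapper's job of turning a user's selection into launch arguments can be sketched as follows. This assumes, hypothetically, that each embedded app reads its parameters from the URL query string (a Shiny app can access it via `session$clientData$url_search`) and then fetches its data from the REST endpoint it is handed; the parameter names, `Selection` fields, and endpoint path are illustrative, not the project's actual contract.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Selection:
    """Data a user picked in the gateway before launching a tool (hypothetical)."""
    study_id: str
    sample_ids: list[str] = field(default_factory=list)
    genes: list[str] = field(default_factory=list)


def build_launch_config(app_url: str, api_base: str, sel: Selection) -> dict:
    """Build the arguments an embedded app needs to load the user's selection.

    Assumes the app parses these query-string parameters and requests its data
    from `data_url` rather than querying the lake or warehouse directly.
    """
    params = {
        "study": sel.study_id,
        "samples": ",".join(sel.sample_ids),
        "genes": ",".join(sel.genes),
        # REST endpoint the app should call for data (hypothetical path).
        "data_url": f"{api_base.rstrip('/')}/studies/{sel.study_id}/data",
    }
    # Drop empty values so the app can fall back to its own defaults.
    query = urlencode({k: v for k, v in params.items() if v})
    return {"iframe_src": f"{app_url}?{query}", "params": params}
```

The returned `iframe_src` can be assigned to the frame that hosts the Shiny app, so the same wrapper works for any framework that accepts URL parameters.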
Our Arrayo team added and maintained Spotfire visualizations, which were served to users within a separate SPA wrapper that integrated them into the homepage/gateway.
Some of the SPAs we created were rewrites of the legacy ASP.NET applications already in use at the client site. These applications were written in a common framework and served as part of the gateway, and, as with the other tools, their server infrastructure was migrated to a RESTful API/microservice environment.
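As an illustration of the RESTful style the rewritten applications moved to, here is a minimal read-only resource written as a plain WSGI app using only the standard library. The in-memory `STUDIES` dictionary is a hypothetical stand-in for a warehouse table, and the `/studies` routes are assumptions; the project's actual server stack is not specified.

```python
import json

# Hypothetical in-memory stand-in for a warehouse table.
STUDIES = {
    "ST1": {"id": "ST1", "title": "Example expression study", "n_samples": 24},
}


def app(environ, start_response):
    """Minimal read-only REST resource: GET /studies and GET /studies/<id>."""
    method = environ["REQUEST_METHOD"]
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    headers = [("Content-Type", "application/json")]

    if method != "GET":
        status, body = "405 Method Not Allowed", {"error": "method not allowed"}
        headers.append(("Allow", "GET"))
    elif parts == ["studies"]:
        # Collection resource: list the available study IDs.
        status, body = "200 OK", sorted(STUDIES)
    elif len(parts) == 2 and parts[0] == "studies" and parts[1] in STUDIES:
        # Item resource: one study's metadata.
        status, body = "200 OK", STUDIES[parts[1]]
    else:
        status, body = "404 Not Found", {"error": "not found"}

    payload = json.dumps(body).encode("utf-8")
    headers.append(("Content-Length", str(len(payload))))
    start_response(status, headers)
    return [payload]
```

For local experimentation it can be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`; a production microservice would more likely sit behind a full web framework and a proper WSGI server.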
Many of the tools needed to export publication-quality figures and to export data in the formats required for journal submission. A dedicated export API was created for this purpose: it formatted data as required by many peer-reviewed journals and automatically obfuscated proprietary fields not needed for publication.
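One way the field obfuscation step could work is sketched below: an allowlist of publishable columns, with internal identifiers replaced by a salted hash so records remain linkable within the export. The function name, column names, and the choice of salted SHA-256 pseudonymization are assumptions for illustration; the source does not say which obfuscation technique was used.

```python
import csv
import hashlib
import io


def export_for_publication(rows, public_fields, pseudonymize_fields, salt):
    """Write rows as CSV, keeping only publishable columns.

    - Columns not listed in public_fields are dropped entirely.
    - Columns in pseudonymize_fields (e.g. internal compound or subject IDs)
      are replaced with a 12-character prefix of a salted SHA-256 digest, so
      rows stay linkable within the export without exposing the original ID.
    The salt should be kept secret; otherwise short IDs could be re-derived.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=public_fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        clean = dict(row)
        for name in pseudonymize_fields:
            if name in clean:
                digest = hashlib.sha256((salt + str(clean[name])).encode("utf-8"))
                clean[name] = digest.hexdigest()[:12]
        # extrasaction="ignore" silently drops any non-public keys.
        writer.writerow(clean)
    return out.getvalue()
```

Using an allowlist rather than a blocklist means a newly added proprietary column is excluded by default instead of leaking into a submission.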
- Applied a design-thinking approach to create intuitive graphical user interfaces (GUIs) that streamline scientists' interaction with web applications, reduce training time, and simplify laboratory work.
- Modernized a variety of disparate legacy systems and improved their performance.
- Used Apache Solr web services to build a Google-like interface for searching across studies, analysis tools, and projects.
- Created a data and data-visualization-as-a-service environment, reducing ad hoc requests to the translational bioinformatics team.
- Integrated data assets and tools across groups, with permission-based access and tracking of tool and asset utilization.
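The Solr-based search across studies, analysis tools, and projects can be illustrated by how a single free-text query might be sent to Solr's standard `/select` handler. The core name `portal` and the fields `title`, `description`, and `doc_type` are hypothetical; the request parameters (`q`, `defType=edismax`, `qf`, `fq`, `fl`, `rows`, `wt`) are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_search_url(base_url, core, text, doc_types=None, rows=10):
    """Build a Solr /select URL for a free-text search across record types.

    Core and field names (title, description, doc_type) are assumptions.
    """
    params = [
        ("q", text),
        ("defType", "edismax"),          # forgiving parser for user-typed queries
        ("qf", "title^3 description"),   # weight title matches above description
        ("fl", "id,title,doc_type,score"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if doc_types:
        # A filter query narrows the result set without affecting relevance scores.
        params.append(("fq", "doc_type:(" + " OR ".join(doc_types) + ")"))
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"
```

The resulting URL can be fetched with `urllib.request.urlopen`; in Solr's JSON response the matching records appear under `response.docs`, which a search page can render as a single ranked list spanning studies, tools, and projects.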