Case Study

Developing a Data Processing Pipeline

The client required a data processing pipeline for instrument data, using an adaptive Design of Experiments (DoE) solution to incorporate process monitoring data. The merged data then needed to be transformed with additional calculations and exported in a standardized format suitable for downstream analysis with custom analytics and machine learning tools.


The solution was built as a modular data processing pipeline comprising a data ingestion module for reading the input files, a prototype parser and data mapping module for extracting standardized data items, a computational module for data transformations, and an output/export module with a configurable data format specification. Arrayo chose to build the pipeline in KNIME because scalability and flexibility were critical. As part of the solution infrastructure, Arrayo used an industry-standard set of tools for project management, source code versioning, CI/CD, and documentation.
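The four-module structure above can be sketched as a chain of small functions. This is only an illustrative outline in Python, not the client's KNIME workflow: the module names, field mapping, and the `yield_pct` calculation are all hypothetical stand-ins for the actual data items and transformations.

```python
import csv
import io

# Hypothetical sketch of the four pipeline stages; field names and the
# derived calculation are illustrative, not the client's actual schema.

def ingest(raw_text):
    """Data ingestion module: read an input file into rows of raw strings."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def map_fields(rows, mapping):
    """Parser/data mapping module: extract standardized data items
    according to a configurable {standard_name: source_column} map."""
    return [{std: row[src] for std, src in mapping.items()} for row in rows]

def transform(rows):
    """Computational module: enrich rows with additional calculations."""
    for row in rows:
        row["yield_pct"] = 100.0 * float(row["output"]) / float(row["input"])
    return rows

def export(rows, columns):
    """Output/export module: emit a standardized format with a
    configurable column specification."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

raw = "in_mass,out_mass\n10,8\n20,15\n"
mapping = {"input": "in_mass", "output": "out_mass"}
rows = transform(map_fields(ingest(raw), mapping))
result = export(rows, ["input", "output", "yield_pct"])
```

Keeping each stage behind a narrow interface like this is what makes the pipeline modular: an ingestion or export module can be swapped (for a new file format or output specification) without touching the computational stage.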


The initial project was designed to quickly demonstrate the concept and to initiate discussions on a larger follow-up effort in which a more flexible, production-grade solution would be built. After these discussions, Arrayo built and scaled up a quality KNIME pipeline. The new pipeline could process different files as they arrived and included a parser component that recognized several spreadsheet formats.
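A parser that recognizes several spreadsheet formats typically dispatches on cues such as the file extension or the delimiter in the header row. The following is a minimal sketch under that assumption; the detection rules and format labels are hypothetical, not the logic of the delivered pipeline.

```python
def detect_format(filename, first_line):
    """Illustrative format dispatch: classify an incoming spreadsheet
    by extension, then by the delimiter seen in its first line.
    The rules here are hypothetical examples."""
    if filename.lower().endswith((".xlsx", ".xls")):
        return "excel"
    if "\t" in first_line:
        return "tsv"
    return "csv"

# Each recognized format would then be routed to the matching reader.
fmt = detect_format("run_042.xlsx", "")
```

In the production pipeline such a dispatcher lets new instrument file formats be added by registering one more rule and reader, without changing downstream modules.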
