Case Study

Generative Therapeutics: Dry-Wet Lab Connectivity

For a leader in generative therapeutics, we brought biological engineering and machine learning together to change how medicines are designed, developed, and made. The client's computational biologists were overwhelmed with requests from wet lab users who needed results run, read, interpreted, and routed along to other teams, and there was no consistent approach to pipeline development. Arrayo and the client's scientists worked together to build a self-service pipeline system and a web platform that let computational biologists and research scientists work faster.

Challenge

The client had been quickly ramping up development ahead of groundbreaking work that enabled programmable generative protein modeling and synthesis, and it was clear that support needed to be built around their workflow processes.
Previously, a protein scientist would reach a step in an experiment that required a machine learning engineer or a computational biologist to execute. This was a blocking step: it relied on someone being available to compose and run a Python script against results or values specific to that experiment's parameters and return the output to the protein scientist. Often the requester did not have all the inputs the engineer needed to run the step, so there was considerable back-and-forth before a step could complete. The process was unsustainable, unreliable, and brittle, and it caused delays; we needed to make it more adaptable and cut processing time.

Delivery

We worked closely with all stakeholders to create a tailored process, leverage the existing architecture and integrations, and implement significant automation.
We created a wrapper to Docker-ize these Python scripts so they could be built as a container and executed on demand. The wrapper could pipe any defined parameter into the pipeline operation via a configuration file, which also gave pipeline developers control over their pipeline's inputs, operation, and error handling.

We built a web form in the client's existing ecosystem where protein scientists could define the parameter values for a pipeline. Submitting the form invoked a Lambda function, which ran the container with the hydrated values to complete the operation. When the operation finished, the requester was notified as soon as their results were ready. The Docker-ized wrapper also served as a generalizable, flexible template for all future containerized operations.

In the pipeline's CI/CD, we created a way to register new versions of a pipeline with new inputs, giving requesters and developers further flexibility. We also integrated with the client's existing architecture so pipelines could access AWS resources such as S3 and the ELN without each user needing to be an AWS user, keeping AWS access limited where appropriate.
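To make the flow concrete, the two sketches below are illustrative only: the file names, parameter names, environment variables, and the choice of ECS Fargate as the container runtime are assumptions for the example, not the client's actual implementation. The first is a minimal Python entrypoint for the containerized wrapper, reading scientist-supplied parameters from a configuration file, validating required inputs, and running the wrapped analysis step.

    # wrapper_entrypoint.py -- hypothetical sketch of the containerized wrapper.
    # Parameter names, paths, and the analysis function are illustrative.
    import json
    import logging
    import os
    import sys

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline-wrapper")

    REQUIRED_PARAMS = ["input_uri", "output_uri"]  # assumed required inputs


    def run_analysis(params: dict) -> dict:
        """Placeholder for the wrapped Python script's real work."""
        return {"status": "ok", "echo": params}


    def main() -> int:
        # The orchestrator mounts or injects a config file holding the values
        # the protein scientist supplied in the web form.
        config_path = os.environ.get("PIPELINE_CONFIG", "/config/params.json")
        try:
            with open(config_path) as fh:
                params = json.load(fh)
        except (OSError, json.JSONDecodeError) as exc:
            log.error("Could not read pipeline config %s: %s", config_path, exc)
            return 1

        missing = [p for p in REQUIRED_PARAMS if p not in params]
        if missing:
            log.error("Missing required parameters: %s", ", ".join(missing))
            return 1

        result = run_analysis(params)
        print(json.dumps(result))  # results collected from stdout or an output URI
        return 0


    if __name__ == "__main__":
        sys.exit(main())

The second sketch shows, under the same assumptions, how a Lambda function triggered by the web form submission might launch that container with the hydrated parameter values, here passed through the container environment of an ECS Fargate task.

    # submit_handler.py -- hypothetical AWS Lambda handler sketch.
    # Cluster, subnets, task definition, and field names are placeholders.
    import json
    import os

    import boto3

    ecs = boto3.client("ecs")


    def handler(event, context):
        # The web form posts the scientist-supplied parameter values as JSON.
        params = json.loads(event.get("body", "{}"))

        response = ecs.run_task(
            cluster=os.environ["PIPELINE_CLUSTER"],
            # Task definition family (or family:revision) for the requested
            # pipeline version; hypothetical default shown.
            taskDefinition=params.get("pipeline_version", "protein-pipeline"),
            launchType="FARGATE",
            overrides={
                "containerOverrides": [
                    {
                        "name": "pipeline",
                        # Hydrate the container with the submitted values; the
                        # wrapper reads them back out of its config/environment.
                        "environment": [
                            {"name": k.upper(), "value": str(v)}
                            for k, v in params.items()
                        ],
                    }
                ]
            },
            networkConfiguration={
                "awsvpcConfiguration": {"subnets": os.environ["SUBNETS"].split(",")}
            },
        )
        task_arn = response["tasks"][0]["taskArn"]
        return {"statusCode": 202, "body": json.dumps({"task": task_arn})}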

Value

We created a single, reliable platform for protein scientists to execute a crucial analysis step in an experiment. Containerizing these scripts meant they could run in the background without blocking either party, that inputs were standardized and validated, and that results arrived as quickly as possible. The platform was extended to use multiple container orchestration tools depending on a pipeline's resource needs. Shortly after launch, a dozen sequencing, QA/QC, and other pipelines had been registered. This freed up countless hours of engineering and development time, while still giving the computational engineering team control over their pipelines.
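As an illustration of how registration can drive that choice of orchestration tool, a registry entry might declare each pipeline version's inputs and resource needs, with lightweight jobs routed to one backend and heavier jobs to another. Every name and field below is a hypothetical example, not the client's schema.

    # pipeline_registry.py -- hypothetical example of pipeline registration
    # entries; field names, images, and backends are illustrative.
    PIPELINES = {
        "ngs-qc": {
            "version": "2.1.0",
            "image": "registry.example.com/ngs-qc:2.1.0",
            "inputs": ["run_id", "sample_sheet_uri", "output_uri"],
            # Lightweight jobs can run as Lambda-triggered Fargate tasks...
            "resources": {"vcpu": 1, "memory_gb": 4, "backend": "fargate"},
        },
        "assembly": {
            "version": "0.9.3",
            "image": "registry.example.com/assembly:0.9.3",
            "inputs": ["reads_uri", "reference_uri", "output_uri"],
            # ...while heavier pipelines are routed to a batch-oriented backend.
            "resources": {"vcpu": 16, "memory_gb": 64, "backend": "batch"},
        },
    }


    def pick_backend(name: str) -> str:
        """Return the orchestration backend registered for a pipeline."""
        return PIPELINES[name]["resources"]["backend"]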
