Case Study

Building a Knowledge Graph from Audio Interview Data

Our client needed data processing tools to aggregate and index data from audio interviews of patients. The data would feed a knowledge graph that includes temporal markers of events preceding the onset of disease.


Arrayo provided engineering services to create a data processing solution that extracts a knowledge graph from audio interview transcripts. The solution was built as a cloud-based platform comprising the following components:

  • Repository for collecting and managing patient interview data files and associated metadata (medical history, demographics, etc.).
  • Audio file transcription service for extracting interview data into text format.
  • Data processing pipeline.
  • Knowledge graph repository.
  • Indexed content storage solution and a search engine infrastructure.
  • A custom RESTful application programming interface (API) for integration capabilities.

Arrayo developed the software components using Python and SQL to ensure seamless integration with standard libraries, document APIs, and custom code. To allow custom parsers to be incorporated, a standard RESTful API was delivered, accompanied by a JSON schema specification and validation.
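To illustrate what schema validation of a custom parser's output might look like, here is a minimal sketch using only the Python standard library. The schema, its field names (patient_id, event, months_before_onset), and the validate helper are all hypothetical and are not the delivered specification. The helper covers only a small subset of JSON Schema (type, required, properties); a production service would more likely rely on a full JSON Schema validator.

```python
import json

# Hypothetical schema for one event reported by a custom parser.
# Field names are illustrative assumptions, not the delivered schema.
EVENT_SCHEMA = {
    "type": "object",
    "required": ["patient_id", "event", "months_before_onset"],
    "properties": {
        "patient_id": {"type": "string"},
        "event": {"type": "string"},
        "months_before_onset": {"type": "number"},
    },
}

# Map JSON Schema type names to Python types (small subset only).
_TYPES = {"object": dict, "string": str, "number": (int, float), "array": list}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance is valid.

    Supports only the `type`, `required`, and `properties` keywords.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, but JSON booleans are not numbers.
        if expected == "number" and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}.{key}: missing required field")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


good = json.loads(
    '{"patient_id": "p-001", "event": "persistent cough", "months_before_onset": 6}'
)
bad = json.loads('{"patient_id": 17, "event": "persistent cough"}')

good_errors = validate(good, EVENT_SCHEMA)  # no errors expected
bad_errors = validate(bad, EVENT_SCHEMA)    # wrong type and a missing field
```

Returning a list of errors instead of raising on the first one lets an API report every problem in a rejected payload in a single response.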

The system was delivered as a containerized application hosted in an AWS cloud environment. As part of the solution, services to support continuous integration and continuous deployment (CI/CD) were delivered as well.


To summarize, we delivered a data management system that enables consistent, reproducible data processing and knowledge graph extraction; audio file processing to convert recordings into text transcripts; transcript processing with medical NLP approaches to extract the knowledge graph; and a component-based architecture built on data processing pipelines.
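The flow from transcript to knowledge graph can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the delivered system: the Edge and KnowledgeGraph classes, the "reported" relation, and the function names are hypothetical, and a simple regular expression stands in for the medical NLP step. It shows how each stage can be a small, separately testable component and how a temporal marker (months before onset) can be attached to each graph edge.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One graph edge: subject -[relation]-> obj, tagged with a temporal marker."""
    subject: str
    relation: str
    obj: str
    months_before_onset: int


@dataclass
class KnowledgeGraph:
    edges: list = field(default_factory=list)

    def add(self, edge):
        self.edges.append(edge)

    def timeline(self, patient_id):
        """Edges for one patient, furthest from disease onset first."""
        own = [e for e in self.edges if e.subject == patient_id]
        return sorted(own, key=lambda e: -e.months_before_onset)


# Toy pattern standing in for medical NLP (an assumption for illustration):
# matches phrases such as "noticed a persistent cough 6 months before".
PATTERN = re.compile(r"(?:had|noticed) (?:a |an )?([a-z ]+?) (\d+) months before")


def normalize(transcript):
    """Stage 1: lowercase the text and collapse runs of whitespace."""
    return " ".join(transcript.lower().split())


def extract_events(text):
    """Stage 2: pull (symptom, months_before_onset) pairs out of the text."""
    return [(m.group(1), int(m.group(2))) for m in PATTERN.finditer(text)]


def run_pipeline(patient_id, transcript, graph):
    """Stage 3: load the extracted events into the graph as edges."""
    for symptom, months in extract_events(normalize(transcript)):
        graph.add(Edge(patient_id, "reported", symptom, months))
    return graph


graph = run_pipeline(
    "p-001",
    "I noticed a persistent cough 6 months before the diagnosis, "
    "and I had  fatigue 12 months before it.",
    KnowledgeGraph(),
)
timeline = [(e.obj, e.months_before_onset) for e in graph.timeline("p-001")]
```

Keeping the temporal marker on the edge itself means the graph can be queried for the order of events leading up to onset without re-reading the transcripts.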
