Enhance Operational Efficiency and Key System SLA Compliance by Improving Workflow Orchestration

Background

Business approach

The client is a large, US-based healthcare analytics provider. The company’s solution gives customers insights into patient satisfaction, health risks, and more. The product processes a large amount of data, which must be collected, analyzed, and transformed automatically.

The platform’s data ingestion flow was originally managed by a custom workflow orchestration tool. That tool was a bottleneck in the data pipeline and caused regular data delays. The engineering team spent a lot of time fixing issues rather than focusing on business tasks.

After a thorough market review of the available options, we decided to replace the existing tool with Apache Airflow, which gave us a stable and scalable data flow. An important factor in the choice was retaining the ability to roll back any of our changes, because any decision made might need to be revisited.

The result was a system that could handle a large number of data pipeline tasks. The migration preserved the ability to roll back to the previous version. Delays and issues were reduced to a minimum, and as a result the client’s team had more time for business tasks.

Tech approach

The project was a data pipeline that transferred data from OLTP data sources to the Business Intelligence platform for further analysis and reporting. When the project started, the pipeline consisted of 6 sub-pipelines, all handled by a custom workflow orchestration tool.

Each sub-pipeline served a separate BI module and was a unit of work, which consisted of:

List of tasks – SQL queries, executable files, jars, etc.
List of steps – logical units of work declared via YAML files with custom structures.
Config files (JSON, XML, etc.) – additional configuration files required for ETL.

The main challenge was to replace the orchestration tool while keeping the existing data pipeline job configuration that developers were accustomed to working with. Major changes to how sub-pipelines are defined would have forced the client’s team to learn completely new processes. We took this into account in the solution we proposed and eventually implemented.

With Airflow, we migrated the entire system with only minor changes and achieved the performance required of the data pipeline. Furthermore, Airflow allowed us to increase the number of automated data pipeline tasks and simplify monitoring and maintenance. This led to fewer new issues and less time spent by the client’s specialists on the data pipeline.

Challenges

As mentioned above, the custom workflow orchestration tool had numerous problems, namely:

Scalability – the tool’s performance degraded as the number of sub-pipelines increased.
Visibility – the tool’s UI was poor and didn’t retain execution history.
Monitoring – the existing logging toolkit was insufficient, and it was almost impossible to set up any integration with third-party monitoring tools.
Automation – after failures, the system didn’t restart automatically, and everything had to be done manually.

All these problems seriously slowed down the entire data pipeline system. Airflow solved all of the above problems while preserving the existing sub-pipelines, which was critical. However, implementing this solution brought several challenges of its own.

We needed to expand Airflow functionality in order to support the client’s data pipeline. 
It was also obligatory to minimize the set of components that needed replacing during the migration process.
Of course, we couldn’t simply replace one system with the other. Candidate solutions needed to be tested and evaluated first. Therefore, our team implemented a Proof of Concept to evaluate the pros and cons of each candidate framework, including Airflow. The result was satisfactory.

Value delivery

Working on the project, our team achieved the following results:

Implementing the plugin to support configurations of existing data tasks in Airflow

It was critical to maintain the configuration of the existing data tasks in the new Airflow system. Simply deploying Airflow wasn’t enough; we had to extend the framework’s functionality. Our team implemented custom Airflow operators that read the existing data pipeline steps and translate them into Airflow tasks. As a result, this custom Airflow plugin allowed us not only to migrate existing sub-pipelines with minimal changes but also to increase the overall readability of the system.
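To illustrate the idea of translating existing step definitions, here is a minimal, framework-free sketch. It is not the client's plugin: the step fields (`name`, `type`, `target`, `depends_on`), the `TaskSpec` class, and the command templates are all hypothetical, since the case study does not describe the real YAML structure. The sketch only shows how legacy steps could be turned into task descriptions in a valid execution order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical legacy steps, as they might look after parsing a YAML step file.
LEGACY_STEPS = [
    {"name": "extract_orders", "type": "sql", "target": "extract_orders.sql"},
    {"name": "transform", "type": "jar", "target": "transform.jar",
     "depends_on": ["extract_orders"]},
    {"name": "load_bi", "type": "sql", "target": "load_bi.sql",
     "depends_on": ["transform"]},
]

# Assumed mapping from a legacy task type to a command template.
COMMAND_TEMPLATES = {
    "sql": "run_sql {target}",
    "jar": "java -jar {target}",
}


@dataclass
class TaskSpec:
    """Orchestrator-neutral description of one task."""
    task_id: str
    command: str
    upstream: list = field(default_factory=list)


def translate(steps):
    """Convert legacy step dicts into TaskSpec objects in a valid run order."""
    specs = {}
    for step in steps:
        template = COMMAND_TEMPLATES[step["type"]]
        specs[step["name"]] = TaskSpec(
            task_id=step["name"],
            command=template.format(target=step["target"]),
            upstream=list(step.get("depends_on", [])),
        )
    # Reject dependencies that point at steps which were never declared.
    missing = {dep for s in specs.values() for dep in s.upstream} - set(specs)
    if missing:
        raise ValueError(f"unknown upstream steps: {sorted(missing)}")
    # TopologicalSorter takes {node: predecessors}; it raises CycleError on cycles.
    sorter = TopologicalSorter({s.task_id: s.upstream for s in specs.values()})
    return [specs[task_id] for task_id in sorter.static_order()]


if __name__ == "__main__":
    for spec in translate(LEGACY_STEPS):
        print(spec.task_id, "<-", spec.upstream, ":", spec.command)
```

In an Airflow setting, each such description would typically be handed to an operator instance, with the `upstream` names used to wire task dependencies inside a DAG. How the actual plugin does this is not stated in the case study.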

Automating the manual restart operation

The existing workflow orchestration system had no automated restart in the event of a failure or shutdown. Everything had to be restarted manually, which cost engineers a lot of time. Our team developed a new feature for the custom Airflow operators: an auto-restart mechanism with a configurable number of retries. Now, in the event of unforeseen issues or delays, the system automatically continues the execution of the data pipeline.
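The retry behavior can be sketched independently of any orchestrator. The function below is an illustrative assumption, not the team's implementation: `run_with_retries` and its parameters are hypothetical names. For reference, Airflow operators themselves accept `retries` and `retry_delay` arguments; the case study does not say how the custom feature relates to them.

```python
import time


def run_with_retries(task, retries=3, delay_seconds=0.0, sleep=time.sleep):
    """Run task(); on failure, retry up to `retries` more times.

    Returns (result, attempts). Re-raises the last exception if every
    attempt fails. `sleep` is injectable so callers can skip real waiting.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            # The first run plus `retries` retries are allowed in total.
            if attempts > retries:
                raise
            sleep(delay_seconds)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_step():
        # Simulated transient failure: fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("transient failure")
        return "done"

    print(run_with_retries(flaky_step, retries=3))
```

A fixed delay keeps the sketch short; a production setup would more likely use a growing delay between attempts and retry only on errors known to be transient.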

Improving maintainability and troubleshooting

Our team added structured logs streamed to AWS, which allowed us to configure alerts, subscriptions, and statistics dashboards based on the logs, and to improve overall monitoring.
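As a rough illustration of what "structured logs" can mean in practice, the sketch below emits one JSON object per log line using only Python's standard `logging` module. The field names (`pipeline`, `step`) and the `build_logger` helper are assumptions made for this example; the case study does not describe the actual log schema or which AWS service receives the logs.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical context fields, supplied by callers via `extra=`.
        for key in ("pipeline", "step"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(stream, name="pipeline"):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    log.info("step finished", extra={"pipeline": "bi_module_1", "step": "load"})
    print(buffer.getvalue(), end="")
```

Because every line is machine-parseable, a log service can filter and aggregate on fields such as `level` or `step`, which is what makes log-based alerts and dashboards practical.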

Improving audit logging

By adding user information to the logs, our team improved the auditability and overall security of the system.
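One possible way to attach user information to every log record is a logging filter that reads the acting user from context. This is a sketch under assumptions: the `current_user` context variable, the `UserContextFilter` class, and the log format are hypothetical and not taken from the project.

```python
import contextvars
import logging

# Hypothetical context variable holding the user who triggered the action.
current_user = contextvars.ContextVar("current_user", default="system")


class UserContextFilter(logging.Filter):
    """Attach the acting user to every record that passes through."""

    def filter(self, record):
        record.user = current_user.get()
        return True  # never drop records, only enrich them


class ListHandler(logging.Handler):
    """Collect formatted lines in memory so the example is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def build_audit_logger(handler, name="audit"):
    """Create a logger whose records always carry a `user` field."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler.addFilter(UserContextFilter())
    handler.setFormatter(logging.Formatter("user=%(user)s action=%(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    handler = ListHandler()
    audit = build_audit_logger(handler)
    token = current_user.set("jdoe")
    audit.info("trigger run")
    current_user.reset(token)
    print(handler.lines)
```

Setting the user once per request or run, rather than passing it to every log call, keeps call sites unchanged while still making each entry attributable.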

Technical Info

Our team consisted of 5 engineers: 3 data developers and 2 data QA engineers. 

The technology stack included Apache Airflow, AWS EMR, AWS ECS, Vertica, and AWS RDS Postgres.

Before we joined, problems with the custom workflow orchestration tool arose roughly 1-2 times per week. Within 3 months, our team completely replaced the existing workflow orchestration tool with the new solution. This allowed us to solve all the issues with the data pipeline without significant changes to the client’s data tasks, and to automate and optimize processes within the project.

About the case study author

My name is Leonid Sokolov. For the last 5 years I’ve been leading architecture and implementation of enterprise-level BI and DWH solutions in the healthcare domain. For the last 10 years I’ve been managing Data Systems. If your data system is diagnosed with one of the common illnesses of today and looks like silos of DBs, or has a lack of scalability and agility, please fill in the contact form and I’d be glad to share recommendations of a good cure for your case over a virtual cup of coffee.

Technology Stack

Workflow Management

Apache Airflow

Data Engineering

AWS EMR

AWS Serverless

AWS ECS

Analytics and Databases

Vertica

AWS Serverless

AWS RDS Postgres
