Success Story

Batch and streaming data pipeline for autonomous vehicle systems

  • Automotive
  • Data engineering

Our data engineering team built a scalable data pipeline that performs batch and stream processing of autonomous vehicle (AV) data with high speed and accuracy. To enable seamless analytics, AV performance monitoring, and operational metrics processing, we integrated an analytics platform that allows: 

  • Data analysts to do ad-hoc analysis and reporting, monitor and react to events in real-time, derive insights, and make data-driven decisions; 
  • Data engineers to ingest and transform new data channels for further data analysis, reporting, machine learning, and exports; 
  • Autonomy engineers & robotics engineers to receive data input faster, facilitating quicker iterations and improvements to hardware. 

Project numbers

15 minutes in batch processing

We reduced the time between an AV uploading its data to the system at the end of a shift and that data being ingested into the platform from 24 hours to an average of 15 minutes.

1 minute in stream processing

We met the acceptance criterion for stream processing: data travels from a vehicle to an expert within 1 minute, which expedites AV system performance optimization cycles.

Project details


Location:

United States

Industry:

Autonomous vehicles

Expertise used:

Data engineering

Service provided:

  • Software and data architecture
  • Data pipeline development
  • Business intelligence, platform administration, and maintenance

Project background

The client is a supplier of self-driving car systems that builds autonomous vehicles for cities. Its service is organized into shifts on defined routes in several cities. After an AV finishes a shift, it returns to the garage and uploads logs from its gauges, devices, sensors, and hardware to the system. The uploaded information is then used by data analysts, data scientists, ML engineers, autonomy engineers, robotics engineers, and others to continuously improve the vehicles. 


Although the company had an in-house ETL pipeline to make the data available to its end users, it had significant flaws that forced the company to invest in a new data pipeline. The main problem with the existing data infrastructure was its sequential operation: it handled one log at a time, so it took at least one day to make the data available for further usage, e.g. ad-hoc analysis, reporting, ML, and simulation. The company's use cases also include real-time event detection, which was previously done manually. 
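The sequential bottleneck described above can be contrasted with a distributed approach in a minimal sketch. Everything here is a hypothetical stand-in (the `ingest_log` function and log names are invented for illustration, not the client's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_log(log_name: str) -> str:
    # Hypothetical stand-in for parsing, cleaning, and loading one log.
    return log_name.upper()

logs = ["shift1.log", "shift2.log", "shift3.log"]

# Legacy pipeline: strictly one log at a time.
sequential_results = [ingest_log(log) for log in logs]

# New pipeline (sketch): fan logs out to concurrent workers,
# so wall-clock time no longer grows linearly with the number of logs.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_results = list(pool.map(ingest_log, logs))
```

`pool.map` preserves input order, so both approaches yield identical results; only the elapsed time differs.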

Technical challenges

  • Difficulty and costs in deploying new branches 
  • Manual effort in detecting real-time events 
  • Lack of a centralized analysis playground 
  • Drawbacks in data governance

Tech stack


Business challenges

Reaching ~100% vehicle autonomy 

Ensuring prompt data ingestion

Cutting high infrastructure costs

Solution delivered

The new data pipeline reduces the time needed to process and prepare AV data for further work. The data infrastructure has been modernized, distributed data processing has been introduced, and scalability issues and processing delays have been eliminated. The following capabilities were gained:

  • Data ingestion: Logs from vehicle gauges, devices, sensors, and hardware are ingested comprehensively, which ensures the capture of critical operational data. 
  • Data storage: The collected data is stored in a centralized data lake infrastructure, which provides easy access, retrieval, and a scalable repository for further processing. 
  • Batch & stream processing: Data is processed both in batch and streaming modes, which enables timely insights and accommodates varying data volumes. 
  • Thorough data cleaning: Advanced processing techniques, including filtering, cleansing, augmentation, and schema determination, are applied to prepare raw data for analysis. 
  • Data transformation: The cleaned data is transformed into business-ready formats, generating statistics, insights, and facts tailored to the requirements. 
  • Data accessibility: Business-ready data is available to data analysts, data scientists, ML engineers, autonomy engineers, robotics engineers, and others through a user-friendly platform, ensuring strong data governance. 
  • Ad-hoc analysis, reporting, and ML: A dedicated platform for in-depth analysis is integrated, offering an advanced analytics and ML toolkit. 
  • Legacy pipeline maintenance: While transitioning to the new pipeline, we ensured the continued maintenance of the legacy system to prevent disruptions in data processing.
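The ingestion, cleaning, and transformation stages above can be sketched as one shared processing path that serves both batch and streaming inputs. All names, fields, and values here are illustrative assumptions, not the client's actual schema or code:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

@dataclass
class LogRecord:
    vehicle_id: str
    channel: str        # e.g. "gauges", "lidar", "geo"
    timestamp: float
    payload: Dict[str, float]

def clean(record: LogRecord) -> Optional[LogRecord]:
    """Filtering/cleansing stage: drop records with empty payloads."""
    return record if record.payload else None

def transform(record: LogRecord) -> Dict[str, object]:
    """Transformation stage: raw log record -> business-ready row."""
    return {"vehicle": record.vehicle_id, "channel": record.channel,
            "ts": record.timestamp, **record.payload}

def process(records: Iterable[LogRecord]) -> Iterator[Dict[str, object]]:
    """The same clean/transform path handles a finished shift upload
    (batch) or a live feed of records (stream)."""
    for rec in records:
        cleaned = clean(rec)
        if cleaned is not None:
            yield transform(cleaned)

# Batch mode: process a whole shift upload at once.
shift_upload = [
    LogRecord("av-17", "gauges", 1700000000.0, {"speed_kmh": 32.5}),
    LogRecord("av-17", "gauges", 1700000001.0, {}),  # dropped by clean()
]
rows = list(process(shift_upload))
```

Because `process` is a generator, the same function can consume an unbounded stream without loading it into memory, which is what lets one code path serve both modes.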

Data analyzed

The company is interested in analyzing the data coming from its AVs. This data provides valuable insights into the vehicles' performance, safety, and efficiency, which autonomy engineers and robotics engineers need for their work. A non-exhaustive list of analyzed data sets: 

  • Gauges (speed, distance, time, etc.); 
  • Run levels (autonomy, manual, etc.); 
  • Geo-location data (global positioning system and spatial position data); 
  • Lidar & object determination (obstacles, pedestrians, vehicles, etc.); 
  • Movement trajectory (intersection driving, route schedule, simulation map, etc.); 
  • Unexpected maneuvers (high lateral acceleration, sudden lane changes, etc.); 
  • Interventions (driving mode switching or autonomy mode interruption reasons); 
  • Other (traffic accidents, vehicle equipment failures, etc.).


Reduced processing time

The developed data pipeline has significantly sped up data ingestion and availability for end users, which enhanced the ability to do prompt analysis, reporting, and system optimizations.

Scalable architecture

The new architecture allows the system to easily accommodate increasing data volumes without compromising performance or increasing infrastructure costs and maintenance needs. 

No more manual work

Real-time event detection no longer requires manually maintained Excel spreadsheets, since the pipeline automates the calculations during streaming.
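As an illustration of the kind of check such streaming automation can replace, here is a minimal sliding-window detector over lateral-acceleration readings. The threshold, window size, and sample data are invented for the example, not the client's actual rules:

```python
from collections import deque

LATERAL_ACCEL_LIMIT = 3.0  # m/s^2 -- illustrative threshold only

def detect_events(stream, window=3):
    """Flag an unexpected maneuver when the average lateral
    acceleration over a sliding window exceeds the limit."""
    recent = deque(maxlen=window)
    events = []
    for ts, lateral_accel in stream:
        recent.append(lateral_accel)
        if len(recent) == window and sum(recent) / window > LATERAL_ACCEL_LIMIT:
            events.append(ts)  # record the timestamp of the event
    return events

# Simulated stream of (timestamp, lateral_accel) readings.
readings = [(0, 1.0), (1, 3.5), (2, 4.0), (3, 4.2), (4, 0.5)]
events = detect_events(readings)  # -> [3]
```

Averaging over a window rather than alerting on a single reading is a common way to suppress sensor noise in this kind of rule.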

Actionable data insights

The availability of timely and accurate business-ready data and modern dashboarding capabilities enabled informed decision-making, strategic planning, and operational responsiveness.

Streamlined workflows

The analytics platform we provided has streamlined data operations, leading to smoother and faster workflows, an excellent level of cooperation, solid security, and convenient data governance.

Learn about our clients' experience

More case studies

Online property bidding platform 

We created a property auction product from scratch, starting with nothing but the client's idea. Its goal was to streamline the property buying and selling process and make it intuitive, scalable, and transparent to all users.
  • MVP development
  • Real estate
  • 200 req/sec can be handled by the new bidding website
  • 4 months to launch the MVP and enroll the first agents
  • Up to 1,000 users can use the site concurrently

AI-powered feedback analysis tool

We crafted TreviseAI, a customer feedback analysis tool, which helps to transform reviews into actions, get customized responses, monitor critical sentiment shifts, and discover actionable insights.
  • AdTech
  • AI transformation
  • eCommerce
  • Healthcare
  • 2 months to release the MVP version
  • 5 teammates were involved in the process
  • 3,000 characters can be processed at a time

Tableau performance optimization audit

We conducted a Tableau implementation audit for an AdTech company to identify and resolve bottlenecks in their tool's performance. The goal was to enhance the speed and efficiency of their Tableau dashboards and reports.
  • AdTech
  • BI & data analytics
  • Tech consulting
  • 80 hours to issue the audit report
  • 3 data experts were assigned to audit Tableau
  • Tool performance improved by 50% after our suggestions were implemented

Copyright © 2024 GreenM, Inc. All rights reserved.