Success Story

Batch and streaming data pipeline for autonomous vehicle systems

  • Automotive
  • Data engineering

Our data engineering team built a scalable data pipeline that performs batch and stream processing of autonomous vehicle (AV) data with high speed and accuracy. To enable seamless analytics, AV performance monitoring, and operational metrics processing, we integrated an analytics platform that allows: 

  • Data analysts to do ad-hoc analysis and reporting, monitor and react to events in real-time, derive insights, and make data-driven decisions; 
  • Data engineers to ingest and transform new data channels for further data analysis, reporting, machine learning, and exports; 
  • Autonomy engineers & robotics engineers to receive data input faster, facilitating quicker iterations and improvements to hardware. 

Project numbers

15 minutes in batch processing

We reduced the time between an AV uploading its data to the system at the end of a shift and that data being ingested into the platform from 24 hours to an average of 15 minutes.

1 minute in stream processing

We met the acceptance criterion for stream processing: data travels from a vehicle to an expert within 1 minute, which expedites AV system performance optimization cycles.

Project details


Location:

United States

Industry:

Autonomous vehicles

Expertise used:

Data engineering

Service provided:

  • Software and data architecture
  • Data pipeline development
  • Business intelligence, platform administration, and maintenance

Project background

The client is a supplier of self-driving car systems that builds autonomous vehicles for cities. Its service is organized into shifts on defined routes in several cities. After an AV finishes a shift, it returns to the garage and uploads logs from its gauges, devices, sensors, and hardware to the system. The uploaded information is then used by data analysts, data scientists, ML engineers, autonomy engineers, robotics engineers, and others to continuously improve the vehicles. 


Although the company had an in-house ETL pipeline to make the data available to its end users, it had significant flaws that forced the company to invest in a new data pipeline. The main problem with the existing data infrastructure was its sequential operation: it handled one log at a time, so it took at least one day to make the data available for further usage, e.g. ad-hoc analysis, reporting, ML, and simulation. The company's use cases also include real-time event detection, which was previously done manually. 
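The sequential bottleneck described above can be contrasted with a distributed approach in a minimal sketch. Everything here is a hypothetical stand-in (the `ingest_log` function and log names are invented for illustration, not the client's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_log(log_name: str) -> str:
    # Hypothetical stand-in for parsing, cleaning, and loading one log.
    return log_name.upper()

logs = ["shift1.log", "shift2.log", "shift3.log"]

# Legacy pipeline: strictly one log at a time.
sequential_results = [ingest_log(log) for log in logs]

# New pipeline (sketch): fan logs out to concurrent workers,
# so wall-clock time no longer grows linearly with the number of logs.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_results = list(pool.map(ingest_log, logs))
```

`pool.map` preserves input order, so both approaches yield identical results; only the elapsed time differs.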

Technical challenges

  • Difficulty and costs in deploying new branches 
  • Manual effort in detecting real-time events 
  • Lack of a centralized analysis playground 
  • Drawbacks in data governance

Tech stack


Business challenges

Reaching ~100% vehicle autonomy 

Ensuring prompt data ingestion

Cutting high infrastructure costs

Solution delivered

The new data pipeline reduces the time needed to process and prepare AV data for further work. The data infrastructure has been modernized, distributed data processing has been introduced, and scalability issues and processing delays have been eliminated. The following capabilities were gained:

  • Data ingestion: Logs from vehicle gauges, devices, sensors, and hardware are ingested comprehensively, which ensures the capture of critical operational data. 
  • Data storage: The collected data is stored in a centralized data lake infrastructure, which provides easy access, retrieval, and a scalable repository for further processing. 
  • Batch & stream processing: Data is processed both in batch and streaming modes, which enables timely insights and accommodates varying data volumes. 
  • Thorough data cleaning: Advanced processing techniques, including filtering, cleansing, augmentation, and schema determination, are applied to prepare raw data for analysis. 
  • Data transformation: The cleaned data is transformed into business-ready formats, generating statistics, insights, and facts tailored to the requirements. 
  • Data accessibility: Business-ready data is available to data analysts, data scientists, ML engineers, autonomy engineers, robotics engineers, and others through a user-friendly platform, ensuring strong data governance. 
  • Ad-hoc analysis, reporting, and ML: A dedicated platform for in-depth analysis is integrated, offering an advanced analytics and ML toolkit. 
  • Legacy pipeline maintenance: While transitioning to the new pipeline, we ensured the continued maintenance of the legacy system to prevent disruptions in data processing.
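The ingestion, cleaning, and transformation stages above can be sketched as one shared processing path that serves both batch and streaming inputs. All names, fields, and values here are illustrative assumptions, not the client's actual schema or code:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

@dataclass
class LogRecord:
    vehicle_id: str
    channel: str        # e.g. "gauges", "lidar", "geo"
    timestamp: float
    payload: Dict[str, float]

def clean(record: LogRecord) -> Optional[LogRecord]:
    """Filtering/cleansing stage: drop records with empty payloads."""
    return record if record.payload else None

def transform(record: LogRecord) -> Dict[str, object]:
    """Transformation stage: raw log record -> business-ready row."""
    return {"vehicle": record.vehicle_id, "channel": record.channel,
            "ts": record.timestamp, **record.payload}

def process(records: Iterable[LogRecord]) -> Iterator[Dict[str, object]]:
    """The same clean/transform path handles a finished shift upload
    (batch) or a live feed of records (stream)."""
    for rec in records:
        cleaned = clean(rec)
        if cleaned is not None:
            yield transform(cleaned)

# Batch mode: process a whole shift upload at once.
shift_upload = [
    LogRecord("av-17", "gauges", 1700000000.0, {"speed_kmh": 32.5}),
    LogRecord("av-17", "gauges", 1700000001.0, {}),  # dropped by clean()
]
rows = list(process(shift_upload))
```

Because `process` is a generator, the same function can consume an unbounded stream without loading it into memory, which is what lets one code path serve both modes.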

Data analyzed

The company is interested in analyzing the data coming from its AVs. This data provides valuable insights into the vehicles' performance, safety, and efficiency, which autonomy engineers and robotics engineers need for their work. A non-exhaustive list of analyzed data sets: 

  • Gauges (speed, distance, time, etc.); 
  • Run levels (autonomy, manual, etc.); 
  • Geo-location data (global positioning system and spatial position data); 
  • Lidar & object determination (obstacles, pedestrians, vehicles, etc.); 
  • Movement trajectory (intersection driving, route schedule, simulation map, etc.); 
  • Unexpected maneuvers (high lateral acceleration, sudden lane changes, etc.); 
  • Interventions (driving mode switching or autonomy mode interruption reasons); 
  • Other (traffic accidents, vehicle equipment failures, etc.).


Reduced processing time

The developed data pipeline has significantly sped up data ingestion and availability for end users, which enhanced the ability to do prompt analysis, reporting, and system optimizations.

Scalable architecture

The new architecture allows the system to easily accommodate increasing data volumes without compromising performance or increasing infrastructure costs and maintenance needs. 

No more manual work

Real-time event detection no longer requires manually maintained Excel spreadsheets, since the pipeline automates the calculations during streaming.
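As an illustration of the kind of check such streaming automation can replace, here is a minimal sliding-window detector over lateral-acceleration readings. The threshold, window size, and sample data are invented for the example, not the client's actual rules:

```python
from collections import deque

LATERAL_ACCEL_LIMIT = 3.0  # m/s^2 -- illustrative threshold only

def detect_events(stream, window=3):
    """Flag an unexpected maneuver when the average lateral
    acceleration over a sliding window exceeds the limit."""
    recent = deque(maxlen=window)
    events = []
    for ts, lateral_accel in stream:
        recent.append(lateral_accel)
        if len(recent) == window and sum(recent) / window > LATERAL_ACCEL_LIMIT:
            events.append(ts)  # record the timestamp of the event
    return events

# Simulated stream of (timestamp, lateral_accel) readings.
readings = [(0, 1.0), (1, 3.5), (2, 4.0), (3, 4.2), (4, 0.5)]
events = detect_events(readings)  # -> [3]
```

Averaging over a window rather than alerting on a single reading is a common way to suppress sensor noise in this kind of rule.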

Actionable data insights

The availability of timely and accurate business-ready data and modern dashboarding capabilities enabled informed decision-making, strategic planning, and operational responsiveness.

Streamlined workflows

The analytics platform we provided has streamlined data operations, leading to smoother and faster workflows, an excellent level of cooperation, solid security, and convenient data governance.

Learn about our clients' experience

More case studies

Online property bidding platform 

We created a property auction product from scratch, starting with nothing but the client's idea. Its goal was to streamline the property buying and selling process and make it intuitive, scalable, and transparent to all users.
  • MVP development
  • Real estate
  • 200 req/sec can be handled by the new bidding website
  • 4 months to launch the MVP and enroll the first agents
  • Up to 1,000 users can use the site concurrently

AI-powered feedback analysis tool

We crafted TreviseAI, a customer feedback analysis tool, which helps to transform reviews into actions, get customized responses, monitor critical sentiment shifts, and discover actionable insights.
  • AdTech
  • AI transformation
  • eCommerce
  • Healthcare
  • 2 months to release the MVP version
  • 5 teammates were involved in the process
  • 3,000 characters can be processed at a time

Tableau performance optimization audit

We conducted a Tableau implementation audit for an AdTech company to identify and resolve bottlenecks in their tool's performance. The goal was to enhance the speed and efficiency of their Tableau dashboards and reports.
  • AdTech
  • BI & data analytics
  • Tech consulting
  • 80 hours to issue the audit report
  • 3 data experts were assigned to audit Tableau
  • Tool performance improved by 50% after our suggestions were implemented

Copyright © 2024 GreenM, Inc. All rights reserved.