Success story

Deploying a local LLM, a ChatGPT alternative, for an enhanced workflow

  • AdTech
  • AI transformation
  • eCommerce
  • Healthcare

No generative AI product delivers perfect results all the time. There is always room for improvement and for more client-specific responses.

So, we locally deployed the Llama-3 LLM for our AI sentiment analysis tool to boost output quality and eliminate data privacy concerns. The goals were to:

  • boost the tool's capabilities by making its communication style more relevant, based on the answers customers prefer,
  • and reduce the cost of the solution.

Project numbers

$0.005
price for processing 1,000 tokens

8B
parameters, enabling more complex responses

382
score points for the response quality of the fine-tuned Llama-3:8b

Business challenges

  • Data sensitivity and privacy concerns
  • High costs of cloud LLMs
  • Lack of quality in responses

Project details

Expertise used:

AI and NLP integration

Duration:

3 weeks

Team composition:

  • Full Stack Developer
  • Data Engineer
  • QA Engineer 

Services provided:

  • Data Engineering
  • AI transformation

Why a local Llama 3 LLM?

Many businesses want to use a tool like ChatGPT but are concerned about data privacy. With that in mind, our R&D team figured out a way out, which lies in deploying a private LLM. The point is, if you launch your own GPT-style (LLM) model locally in your infrastructure, data never goes outside and stays safe within the company.
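To illustrate what launching a model locally can look like, here is a minimal sketch based on the Hugging Face transformers library; the model ID and generation settings are illustrative assumptions, not the exact project setup.

```python
# Minimal local inference sketch (assumed setup, not the exact project code).
# The weights are downloaded once; after that, every prompt and response
# stays inside your own infrastructure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # 8B-parameter instruct model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",           # spread layers across available GPUs
)

messages = [
    {"role": "system", "content": "You are a sentiment analysis assistant."},
    {"role": "user", "content": "Summarize the sentiment of: 'Support was slow but helpful.'"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```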

 

After comparing different LLMs, including ChatGPT-4o and 3.5, we picked Llama-3 as the best price-to-quality option: it is much cheaper, while its output quality is nearly the same as that of GPT-4o. Since we already had our AI sentiment analytics tool, we wanted to make its output as personalized as possible by fine-tuning Llama-3 launched in our own environment.

So, the task was to launch a private large language model and fine-tune it so that our sentiment tool provides client-specific responses, summaries, and alerts, while the concern about potential data leakage or breach is gone.

Technology challenges:

  • Lack of documentation or guides for the new Llama 3 model explaining how to launch it in your own environment.
  • The solution requiring more computing resources than expected.
  • Differentiating the responses coming from the sentiment analysis tool to build the dataset.
  • Preparing the fine-tuning dataset (samples of good and bad responses) in a particular format, as sketched below.
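Since the exact format is project-specific and not documented here, the snippet below is only a hypothetical sketch of how good and bad response samples could be collected as JSONL records; the field names are our assumptions.

```python
# Hypothetical JSONL format for the fine-tuning dataset: each record pairs a
# prompt with a customer-preferred ("good") and a rejected ("bad") response.
import json

samples = [
    {
        "prompt": "Customer review: 'Delivery took two weeks.' Summarize the sentiment.",
        "good": "Negative sentiment: the customer is unhappy with the slow delivery.",
        "bad": "The review mentions delivery.",
    },
]

with open("finetune_samples.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```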

Tech stack

Python
AWS
PyTorch
NVIDIA CUDA
Llama-3

Solution delivered

We internally delivered a cost-effective MVP, which formed the basis for continuous improvement of our AI sentiment analytics tool's output. Now a higher percentage of the output is liked by customers, simply because we took into account responses related to the particular domain. This gave rise to the following capabilities:

Better response generation

By deploying Llama-3 and fine-tuning it with a dataset of the most suitable answers, as ranked by customers, we achieved more accurate and contextually relevant responses, enhancing the overall user experience. Now our tool's features, such as auto-responses, category breakdowns, alert reasons, suggestions, and summary generation, are more powerful than ever.
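As a rough illustration of such a fine-tuning step, here is a sketch using parameter-efficient LoRA adapters from the peft library; the target modules and hyperparameters are assumptions for illustration, not the exact values used in the project.

```python
# Sketch of parameter-efficient fine-tuning with LoRA adapters (assumed approach).
# Only a small set of adapter weights is trained, so an 8B model can be tuned
# on modest hardware; all hyperparameters below are illustrative.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
# ...train on the good/bad response dataset, then load the adapters at inference time.
```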

Private data storage

Ensuring data privacy was a critical aspect of the solution. By hosting the LLM within our own infrastructure, all data processing happens internally, eliminating the risks associated with data operations such as entry and export.

Controlled performance

Deploying the LLM locally allows us to allocate computing resources flexibly. We can scale the model's performance up or down based on demand, ensuring optimal efficiency and cost-effectiveness: the more computing resources we allocate, the faster the model works.
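For illustration, one common way to scale resource usage down in a self-hosted setup is quantized loading; this sketch assumes the bitsandbytes integration in transformers and is not necessarily how the project was configured.

```python
# Sketch: trading some speed/precision for a smaller hardware footprint by
# loading the same weights in 4-bit (bitsandbytes). With a local deployment,
# this knob stays entirely in our hands.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant,
    device_map="auto",  # place layers on whatever GPUs are available
)
```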

Continuous improvement of responses

The system is designed to learn and improve continuously. Through ongoing fine-tuning and updates, we can adapt the model to changing customer needs and improve the quality of the output over time. By incorporating feedback loops and regular updates based on user interactions, we ensure that the responses become increasingly accurate and appropriate. 

Fine-tuning essentially means post-training our model, which can be continuously updated with the desired dataset to achieve the expected results.

How does it work?

Our solution prioritizes security by leveraging a private AI model, Llama 3, deployed locally within our own infrastructure. This approach offers significant advantages over using public AI models hosted in the cloud. First, sensitive information never leaves our secure environment. Second, we maintain complete control over the large language model and its data processing activities, which contributes to compliance assurance.

Speaking of data processing, fine-tuning significantly increased the tool's performance thanks to the integration of good and bad response samples. This customization makes the LLM better at understanding the nuances and context of specific client needs, resulting in more contextually relevant and higher-quality outputs.

Local LLM advantages

Cost savings

By launching a local LLM and using it repeatedly, expenses are reduced considerably compared to cloud-based solutions.
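As a back-of-the-envelope illustration: the local rate of $0.005 per 1,000 tokens comes from the project numbers above, while the cloud rate and monthly volume below are purely hypothetical.

```python
# Back-of-the-envelope cost comparison. The local rate comes from this case
# study; the cloud rate and monthly volume are illustrative assumptions.
LOCAL_PER_1K = 0.005           # USD per 1,000 tokens (from the project numbers)
CLOUD_PER_1K = 0.03            # USD per 1,000 tokens (hypothetical cloud pricing)
TOKENS_PER_MONTH = 50_000_000  # hypothetical workload

local = TOKENS_PER_MONTH / 1_000 * LOCAL_PER_1K
cloud = TOKENS_PER_MONTH / 1_000 * CLOUD_PER_1K
print(f"local: ${local:,.0f}/mo, cloud: ${cloud:,.0f}/mo, saved: ${cloud - local:,.0f}/mo")
# local: $250/mo, cloud: $1,500/mo, saved: $1,250/mo
```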

Security

A local LLM setup minimizes the risk of data breaches and provides greater control over data security.

Usability

Even when disconnected from the Internet, the system continues to provide results, ensuring uninterrupted service and reliability.

Compliance

By keeping all data processing activities internal, you can be sure you are fully compliant with data protection regulations.

It might be a private ChatGPT for healthcare or your own domain

The future of this tool is promising. It can help you deal with data loads by composing emails, answering clients, generating summaries, and providing quick responses, freeing you from routine work-related and administrative tasks.

It might be a booster in healthcare for patient experience, in e-commerce for the sales process, or in your own use case. Security concerns? No big deal. We have a secure solution!

Alexey Litvin

CEO

 

 
