Success story

Deploying a local LLM, a private alternative to ChatGPT, for an enhanced workflow

  • AdTech
  • AI transformation
  • eCommerce
  • Healthcare

No generative AI product gives perfect results every time. There is always room for improvement and for more client-specific responses.

So we deployed the Llama-3 LLM locally for our AI sentiment analysis tool to boost output quality and remove data privacy concerns. The goal was to:

  • boost the tool's capabilities by making its communication style match the answers customers prefer,
  • and reduce the cost of the solution.

Project numbers

0.005
US dollars

price for processing 1,000 tokens

8bn
parameters

to set up more complex responses

382
score points

for response quality of Llama-3:8b tuned

Business challenges

Data sensitivity and privacy concerns

High cost of cloud LLMs

Lack of quality in responses

Project details

Expertise used:

AI and NLP integration

Duration:

3 weeks

Team composition:

  • Full Stack Developer
  • Data Engineer
  • QA Engineer 

Service provided:

  • Data Engineering
  • AI transformation

Why a local Llama-3 LLM?

Many people want to use a tool like ChatGPT in their business but worry about data privacy, so our R&D team looked for a way out and found it in deploying a private LLM. If you run your own LLM locally in your infrastructure, data never goes outside and stays safe within the company.

After comparing several LLMs, including GPT-4o and GPT-3.5, we picked Llama-3 as the best option in terms of price and quality. It is much cheaper, and its output quality is nearly the same as that of GPT-4o. Since we have our own AI sentiment analytics tool, we wanted to make its output as personalized as possible by fine-tuning Llama-3 running in our own environment.

So the task was to launch a private large language model and fine-tune it so that our sentiment tool provides client-specific responses, summaries, and alerts, while the concern about potential data leakage or breaches is removed.

Technology challenges:

  • Lack of documentation for the new Llama-3 model and of guides on how to launch it in your own environment.
  • The solution required more computing resources than expected.
  • Differentiating the sentiment analysis tool's responses for the dataset.
  • Preparing the dataset for fine-tuning (samples of good and bad responses) collected in a particular format.
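The last challenge in the list above, collecting good and bad response samples in a fixed format, could look roughly like the sketch below. The record layout (one JSON object per line with `prompt`, `chosen`, and `rejected` fields) is an assumption borrowed from common preference-dataset conventions, not the format the team actually used. The function names and the sample texts are hypothetical.

```python
import json


def make_record(prompt: str, good: str, bad: str) -> dict:
    """Pair one good and one bad response to the same prompt.

    The field names are an assumed layout, not the project's real schema.
    """
    for name, value in (("prompt", prompt), ("good", good), ("bad", bad)):
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return {"prompt": prompt, "chosen": good, "rejected": bad}


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical sample; real samples would come from customer-ranked responses.
samples = [
    make_record(
        prompt="Summarize this review: 'Waited two hours, but the staff was kind.'",
        good="Long wait time (negative); friendly staff (positive).",
        bad="The review is about a visit.",
    ),
]
jsonl_text = to_jsonl(samples)
```

Keeping the good and the bad answer in the same record ties both to one prompt, which is what makes the contrast usable as a training signal.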

Tech stack

Python
AWS
PyTorch
NVIDIA CUDA
Llama-3

Solution delivered

We delivered a cost-effective MVP internally, which became the basis for continuous improvement of our AI sentiment analytics tool's output. Customers now like a higher share of the tool's output. We took into account only responses related to the particular domain. This gave rise to the following capabilities:

Better response generation

By deploying Llama-3 and fine-tuning it on a dataset of the answers that customers ranked as most suitable, we achieved more accurate and contextually relevant responses, enhancing the overall user experience. The tool's features for auto-responses, category breakdown, alert reasons, suggestions, and summary generation are now more powerful than before.

Private data storage

Ensuring data privacy was a critical aspect of the solution. Because the LLM is hosted within our infrastructure, all data processing occurs internally, which avoids the risks associated with moving data in and out of external services.

Controlled performance

Deploying the LLM locally allows us to allocate computing resources flexibly. We can scale the model's performance up or down based on demand to balance efficiency and cost. The more computing resources we allocate, the faster the model responds.

Continuous improvement of responses

The system is designed to learn and improve continuously. Through ongoing fine-tuning and updates, we can adapt the model to changing customer needs and improve the quality of the output over time. By incorporating feedback loops and regular updates based on user interactions, we ensure that the responses become increasingly accurate and appropriate. 

Fine-tuning here means further training of the model after its initial training. It can be repeated with an updated dataset to move the output closer to the expected results.
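The feedback loop described above can be pictured as a filtering step that runs before each fine-tuning round. The sketch below is illustrative only: the `Feedback` structure, the 1 to 5 rating scale, and the thresholds are assumptions, not details of the actual tool.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    prompt: str
    response: str
    rating: int  # assumed scale: 1 (worst) to 5 (best)


def split_by_rating(items, good_min=4, bad_max=2):
    """Split rated responses into good and bad samples.

    Responses rated between the two thresholds are left out, since they
    give no clear signal either way. Thresholds are hypothetical.
    """
    good = [f for f in items if f.rating >= good_min]
    bad = [f for f in items if f.rating <= bad_max]
    return good, bad


# Hypothetical feedback collected from user interactions.
feedback = [
    Feedback("Why was this alert raised?", "Three reviews mention billing errors.", 5),
    Feedback("Why was this alert raised?", "An alert was raised.", 1),
    Feedback("Suggest a reply.", "Thanks for your feedback.", 3),
]
good, bad = split_by_rating(feedback)
```

Each round, the newly collected good and bad samples would extend the fine-tuning dataset, so the model keeps tracking what customers actually prefer.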

How it works

Our solution prioritizes security by using a private AI model, Llama-3, deployed locally within our own infrastructure. This approach offers significant advantages over using public AI models hosted in the cloud. First, sensitive information never leaves our secure environment. Second, we keep complete control over the large language model and its data processing activities, which helps with compliance.

As for data processing, fine-tuning on samples of good and bad responses significantly improved the tool's performance. This customization helps the LLM better understand the nuances and context of specific client needs, resulting in more contextually relevant and higher-quality outputs.

Local LLM advantages

Cost savings

By launching a local LLM and using it repeatedly, expenses are considerably lower than with cloud-based solutions.
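To make the cost figure concrete, the arithmetic below uses the price of 0.005 US dollars per 1,000 tokens from the project numbers. The monthly token volume is hypothetical and chosen only to illustrate the calculation.

```python
from decimal import Decimal

# Price taken from the project numbers: 0.005 USD per 1,000 tokens.
PRICE_PER_1K_TOKENS_USD = Decimal("0.005")


def processing_cost_usd(tokens: int) -> Decimal:
    """Return the cost in US dollars of processing the given number of tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_1K_TOKENS_USD * Decimal(tokens) / Decimal(1000)


monthly_tokens = 10_000_000  # hypothetical workload, not a project figure
monthly_cost = processing_cost_usd(monthly_tokens)
print(f"{monthly_tokens:,} tokens cost about ${monthly_cost:.2f}")
# prints "10,000,000 tokens cost about $50.00"
```

`Decimal` is used instead of floats so that small per-token prices do not pick up rounding noise.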

Security

A local setup of the LLM keeps data inside the company, which reduces the risk of data breaches and provides greater control over data security.

Usability

Even when disconnected from the Internet, the system can continue to provide results, so the service stays uninterrupted and reliable.

Compliance

Keeping all data processing activities internal makes it easier to comply with data protection regulations.

It could be a private ChatGPT for healthcare or for your domain

The future of this tool is promising. It can help you handle large volumes of data by composing emails, answering clients, producing summaries, and providing quick responses, freeing you from routine and administrative tasks.

It could improve patient experience in healthcare, the sales process in e-commerce, or your own use case. Security concerns? No big deal. We have a secure solution!

Alexey Litvin

CEO

Copyright © 2025 GreenM, Inc. All rights reserved.