Success story

Communicate with your data using natural language

  • AI transformation
  • Data engineering
  • eCommerce

In today’s data-driven business environment, organizations deal with loads of data stored, handled, and processed in complex databases. However, accessing this valuable data typically requires technical expertise in writing SQL queries.

So, we created an intuitive NLP-to-SQL system that simplifies database interaction by:  

  • converting natural language inputs into SQL queries, and 
  • making data access easy and effortless for non-tech-savvy users with the help of AI. 

Project numbers

0.10
US dollars

price for processing one query

90
percent

in saving time and resources

5-7
seconds

took the system to process a query

Business challenges

Data access obstacles and delays  

Data retrieval bottlenecks 

Dependency on technical staff 

Project details

Expertise used:

NLP, SQL, AI

Duration:

4 weeks

Team composition:

  • AI Engineer
  • Back-end Developer
  • QA Engineer 

Service provided:

  • Data Engineering
  • Data Analysis
  • AI transformation

Project background 

Every business manages vast amounts of data within their database, but not everyone can extract this valuable data using SQL queries. This presents a barrier for non-technical users who need to retrieve specific information quickly and efficiently. Also, this gap between technical capabilities and business needs often leads to inefficiencies and delays in decision-making processes. 

 

The idea for the NLP-to-SQL system emerged from this need to streamline data access. By enabling users to input queries in everyday language and receive accurate SQL queries and results in return, the system aims to empower non-technical users to retrieve valuable insights from their data seamlessly. This project represents a significant step towards making advanced data tools more accessible to a broader audience within an organization, ultimately enhancing productivity and decision-making capabilities.

Technology challenges:

  • Integrating vector embeddings for database schema representation; 
  • Managing issues with tokenization and embedding model; and 
  • Optimizing PostgreSQL for efficient vector similarity searches. 

Tech Stack

FastAPI
PostgreSQL
ChatGPT-4o
Docker

Solution delivered

Leveraging the latest advancements in natural language processing and machine learning, particularly GPT-4о, we developed a system that not only generates accurate SQL queries but also provides a user-friendly interface through Swagger UI. Key system features and capabilities include: 

  • Natural language query processing: Users can input questions or queries in plain English through an intuitive Swagger UI, eliminating the need for any prior SQL knowledge. 
  • NLP-to-SQL conversion: GPT-4о generates SQL queries based on the user’s question and the selected tables, returning both the query and its result in JSON format. 
  • Intelligent SQL generation: Utilizing GPT-4о, the system converts the user’s natural language input into precise SQL queries. The process includes adapting the query based on the top N relevant tables identified during the similarity search, ensuring accurate and contextually appropriate SQL output. 

 

  • Efficient similarity search: By leveraging the pgvector extension within PostgreSQL, the system performs fast and accurate similarity searches using cosine similarity. This feature ensures that the most relevant tables are identified and used for query generation. 

When a query is made, the system performs a cosine similarity search within this vector database to identify the most relevant tables. These tables, along with the user’s query, are then used to generate a prompt for the NLP-to-SQL conversion. The prompt is processed by GPT-4о, which generates the corresponding SQL query in JSON format. Finally, this SQL query is returned to the user, ready for execution against the database.

Let’s take one example in practice:

 

How it works: 

The system begins with users inputting natural language queries, which are then processed to retrieve the relevant database schema via a direct connection. This schema is embedded using the text-embedding-ada-002 model, and the resulting vector representations are stored in a PostgreSQL Vector DB

 

 

“I need the first names, last names, and email addresses of all contacts who have the role Project Manager.”

Result:

{ “query”: “/* Retrieve first names, last names, and email addresses of all contacts who have the role Project Manager */\nSELECT first_name, last_name, true__e_mail__phantombuster \nFROM contacts \nWHERE role = ‘Project Manager’;”}

NLP-to-SQL system advantages

Increased data accessibility

The system allows non-technical users to access data using natural language without knowing SQL.

Accelerated data workflow

Queries are processed within 5-7 seconds, significantly speeding up data retrieval and analysis. 

Enhanced decision-making

Rapid access to accurate data empowers users to make more informed business decisions.

Saved resources and efforts

The tool frees up technical and other staff to focus on higher-value tasks, improving productivity.

Learn our clients’ experience

Copyright © 2024 GreenM, Inc. All rights reserved.