Chat with your SQL Database using Llama 3

Chat with your SQL database using llama3 using Vanna.ai

Arslan Shahid
4 min readApr 22, 2024
Image by Author

On 18th April Meta released their open-source Large Language Model called Llama 3. This post is about how using Ollama and Vanna.ai you can build a SQL chat-bot powered by Llama 3.

Image by the Author — Preview of Vanna Flask App
Image by Author

Looking for someone to solve your problem related to Large Language Models & data science? Click here:

https://form.jotform.com/240744327173051

Getting Started

#importing Ollama, which allows users to connect with any open-source LLM
from vanna.ollama import Ollama
from vanna.vannadb import VannaDB_VectorStore

class MyVanna(VannaDB_VectorStore, Ollama):
def __init__(self, config=None):
MY_VANNA_MODEL = # Your model name from https://vanna.ai/account/profile
VannaDB_VectorStore.__init__(self, vanna_model=MY_VANNA_MODEL, vanna_api_key=MY_VANNA_API_KEY, config=config)
Ollama.__init__(self, config=config)

# use llama3:70B for the 70B model
vn = MyVanna(config={'model': 'llama3'})

Connecting to a Database

Vanna has built-in connectors for these 8 databases (you can connect to other database with a few additional lines of code):

  1. Postgres SQL
  2. Oracle
  3. DuckDB
  4. MySQL
  5. SQLite
  6. Big Query
  7. Snowflake
  8. Microsoft SQL

By looking at the documentation, you can figure out how to connect your specific database. For the purposes of this post, I would be connecting to DuckDB StackOverFlow Database. The database can found here!

#This is how you can connect to a DuckDB database
vn.connect_to_duckdb(url='motherduck:[<database_name>]?motherduck_token=<token>&saas_mode=true')

Training

Image by Author — StackOverFlow DB schema

Training on Plan (Information Schema)

# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")

# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
plan

# If you like the plan, then uncomment this and run it to train
vn.train(plan=plan)

Training on DDL

# In duckDB the describe statement can fetch the DDL for any table
vn.train(ddl="DESCRIBE SELECT * FROM Stackoverflow.users;")

Training on SQL statements

# here is an example of training on SQL statements
vn.train(
question="What are the top 10 users with highest amount of Badges?"
,sql="""SELECT UserId, COUNT(*) AS badge_count
FROM stackoverflow.main.badges
GROUP BY UserId
ORDER BY badge_count DESC
LIMIT 10
""")
# Another example
vn.train(
question="What is the difference in total answers for the user who answered the most answers and the user who answered the least questions?",
,sql="SELECT MAX(answer_count) - MIN(answer_count) AS difference
FROM (
SELECT OwnerUserId, COUNT(*) AS answer_count
FROM stackoverflow.main.posts
WHERE PostTypeId = 2
GROUP BY OwnerUserId
) AS answer_counts;
")

Training on Documentation

# You can feed in contextual information using documentation
vn.train(documentation="We call the user with the highest answers in a year the Grand master")

You can view your training data using vn.get_training_data()

# vn.ask is runs these following functions in sequence, which can be run individually
# 1. vn.generate_ql
# 2. vn.run_sql
# 3. vn.generate_plotly_code
# 4. vn.get_plotly_figure

# this is how you can ask Vanna question's post training
vn.ask('Find the top 10 users with the highest amount of Badges?')
Image by Author — Result of the Query
Image by Author — Visualization created by the SQL query

Using the Flask App

Vanna comes with a built-in UI Flask app. Which can be launched inside a jupyter notebook or Python script.

from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()
GIF Showcasing the FlaskApp

This is how you can use LLaMa 3 for text-to-SQL.

--

--

No responses yet