Chat with your SQL Database using Llama 3
Chat with your SQL database using llama3 using Vanna.ai
On 18th April Meta released their open-source Large Language Model called Llama 3. This post is about how using Ollama and Vanna.ai you can build a SQL chat-bot powered by Llama 3.
Looking for someone to solve your problem related to Large Language Models & data science? Click here:
https://form.jotform.com/240744327173051
Getting Started
#importing Ollama, which allows users to connect with any open-source LLM
from vanna.ollama import Ollama
from vanna.vannadb import VannaDB_VectorStore
class MyVanna(VannaDB_VectorStore, Ollama):
def __init__(self, config=None):
MY_VANNA_MODEL = # Your model name from https://vanna.ai/account/profile
VannaDB_VectorStore.__init__(self, vanna_model=MY_VANNA_MODEL, vanna_api_key=MY_VANNA_API_KEY, config=config)
Ollama.__init__(self, config=config)
# use llama3:70B for the 70B model
vn = MyVanna(config={'model': 'llama3'})
Connecting to a Database
Vanna has built-in connectors for these 8 databases (you can connect to other database with a few additional lines of code):
- Postgres SQL
- Oracle
- DuckDB
- MySQL
- SQLite
- Big Query
- Snowflake
- Microsoft SQL
By looking at the documentation, you can figure out how to connect your specific database. For the purposes of this post, I would be connecting to DuckDB StackOverFlow Database. The database can found here!
#This is how you can connect to a DuckDB database
vn.connect_to_duckdb(url='motherduck:[<database_name>]?motherduck_token=<token>&saas_mode=true')
Training
Training on Plan (Information Schema)
# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")
# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
plan
# If you like the plan, then uncomment this and run it to train
vn.train(plan=plan)
Training on DDL
# In duckDB the describe statement can fetch the DDL for any table
vn.train(ddl="DESCRIBE SELECT * FROM Stackoverflow.users;")
Training on SQL statements
# here is an example of training on SQL statements
vn.train(
question="What are the top 10 users with highest amount of Badges?"
,sql="""SELECT UserId, COUNT(*) AS badge_count
FROM stackoverflow.main.badges
GROUP BY UserId
ORDER BY badge_count DESC
LIMIT 10
""")
# Another example
vn.train(
question="What is the difference in total answers for the user who answered the most answers and the user who answered the least questions?",
,sql="SELECT MAX(answer_count) - MIN(answer_count) AS difference
FROM (
SELECT OwnerUserId, COUNT(*) AS answer_count
FROM stackoverflow.main.posts
WHERE PostTypeId = 2
GROUP BY OwnerUserId
) AS answer_counts;
")
Training on Documentation
# You can feed in contextual information using documentation
vn.train(documentation="We call the user with the highest answers in a year the Grand master")
You can view your training data using vn.get_training_data()
# vn.ask is runs these following functions in sequence, which can be run individually
# 1. vn.generate_ql
# 2. vn.run_sql
# 3. vn.generate_plotly_code
# 4. vn.get_plotly_figure
# this is how you can ask Vanna question's post training
vn.ask('Find the top 10 users with the highest amount of Badges?')
Using the Flask App
Vanna comes with a built-in UI Flask app. Which can be launched inside a jupyter notebook or Python script.
from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()
This is how you can use LLaMa 3 for text-to-SQL.