Overview
Background
Text-to-SQL is a common use-case for LLMs, especially useful for chatbots that work with structured data, such as CSV files or databases like Postgres, Snowflake, and Redshift.
This method works by having the LLM convert a user’s question into an SQL query. For example:
- A user queries: “How many customers are there in each US state?”
- The LLM generates an SQL statement:
SELECT state, COUNT(*) FROM customers GROUP BY state
- The SQL command is executed on the database.
- Results from the database are then displayed to the user.
An additional step is possible where the LLM can interpret the SQL results and provide a summary in plain English.
Text-to-SQL Risk
While Text-to-SQL is highly useful, its biggest risk is that attackers can misuse it to modify SQL queries, potentially leading to unauthorized access or data manipulation.
The potential threats in Text-to-SQL systems include:
- Database Manipulation: Attackers can craft prompts leading to SQL commands like INSERT, UPDATE, DELETE, DROP, or other forms of db manipulation. This might result in data corruption or loss.
- Data Leakage: Attackers can form prompts that result in unauthorized access to sensitive, restricted data.
- Sandbox Escaping: By crafting specific prompts, attackers might be able to run code on the host machine, sidestepping security protocols.
- Denial of Service (DoS): Through specially designed prompts, attackers can generate SQL queries that overburden the system, causing severe slowdowns or crashes.
It’s important to note that long-running queries could also occur accidentally by legitimate users, which can significantly impact the user experience.
Mitigation
The policies in this category are designed to automatically inspect and review SQL code generated by LLMs, ensuring security and preventing risks. This includes:
- Database Manipulation Prevention: Block any SQL command that could result in unauthorized data modification, including INSERT, UPDATE, DELETE, CREATE, DROP, and others.
- Restrict Data Access: Access is limited to certain tables and columns using an allowlist or blocklist. This secures sensitive data within the database.
- Prevent Script Execution: Block the execution of any non-SQL code, for example, scripts executed via the PL/Python extension. This step is crucial in preventing the running of harmful scripts.
- DoS Prevention: Block SQL elements that could lead to long-running or resource-intensive queries, including excessive joins, recursive CTEs, making sure there’s a LIMIT clause, and so on.
Policies
Allowed Tables
Detects SQL operations on tables that are not within the limits set in the policy.
Restrcted Tables
Detects the generation of SQL statements with access to specific tables that are considered sensitive.
Load Limit
Detects SQL statements that are likely to cause significant system load and affect performance.
Read-Only Access
Detects any attempt to use SQL operations that require more than read-only access.