Build a SQL agent

Overview

In this tutorial, you will learn how to build an agent that can answer questions about a SQL database using LangChain agents. At a high level, the agent will:

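Fetch the available tables and schemas from the database

Decide which tables are relevant to the question

Fetch the schemas for the relevant tables

Generate a query based on the question and the schema information

Double-check the query for common mistakes using an LLM

Execute the query and return the results

Correct errors surfaced by the database engine until the query succeeds

Formulate a response based on the results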

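Building a Q&A system over a SQL database requires executing model-generated SQL queries. There are inherent risks in doing this. Always scope your database connection permissions as narrowly as possible for your agent's needs. This mitigates, but does not eliminate, the risks of building a model-driven system.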

Concepts

We will cover the following concepts:

Tools for reading from SQL databases
LangChain agents
Human-in-the-loop processes

Setup

Installation

pip install langchain langgraph langchain-community

LangSmith

Set up LangSmith to inspect what is happening inside your chain or agent. Then set the following environment variables:

export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

1. Select an LLM

Select a model that supports tool calling:

OpenAI
Anthropic
Azure
Google Gemini
AWS Bedrock

👉 Read the OpenAI chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("openai:gpt-4.1")

👉 Read the Anthropic chat model integration docs

pip install -U "langchain[anthropic]"

import os
from langchain.chat_models import init_chat_model

os.environ["ANTHROPIC_API_KEY"] = "sk-..."

model = init_chat_model("anthropic:claude-sonnet-4-5")

👉 Read the Azure chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2025-03-01-preview"

model = init_chat_model(
    "azure_openai:gpt-4.1",
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)

👉 Read the Google GenAI chat model integration docs

pip install -U "langchain[google-genai]"

import os
from langchain.chat_models import init_chat_model

os.environ["GOOGLE_API_KEY"] = "..."

model = init_chat_model("google_genai:gemini-2.5-flash-lite")

👉 Read the AWS Bedrock chat model integration docs

pip install -U "langchain[aws]"

from langchain.chat_models import init_chat_model

# Follow the steps here to configure your credentials:
# https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html

model = init_chat_model(
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_provider="bedrock_converse",
)

The output shown in the examples below was generated with OpenAI.

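2. Configure the database

You will create a SQLite database for this tutorial. SQLite is a lightweight database that is easy to set up and use. We will load the chinook database, a sample database that represents a digital media store. For convenience, we have hosted the database (Chinook.db) on a public GCS bucket.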

import requests, pathlib

url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db"
local_path = pathlib.Path("Chinook.db")

if local_path.exists():
    print(f"{local_path} already exists, skipping download.")
else:
    response = requests.get(url)
    if response.status_code == 200:
        local_path.write_bytes(response.content)
        print(f"File downloaded and saved as {local_path}")
    else:
        print(f"Failed to download the file. Status code: {response.status_code}")
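The download code above trusts that a 200 response contains a valid database. A light sanity check, sketched here with the standard library only, is to confirm that the file starts with the 16-byte header string that SQLite 3 database files begin with. looks_like_sqlite is a hypothetical helper written for this illustration, and the temporary files exist only to exercise it.

```python
import os
import sqlite3
import tempfile

# SQLite 3 database files start with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Return True if the file at `path` begins with the SQLite 3 header."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


workdir = tempfile.mkdtemp()

# A real (tiny) database, written by sqlite3 itself.
good_path = os.path.join(workdir, "good.db")
conn = sqlite3.connect(good_path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()
conn.close()

# A file that is not a database, such as an error page saved by mistake.
bad_path = os.path.join(workdir, "bad.db")
with open(bad_path, "wb") as f:
    f.write(b"<html>Not Found</html>")

print(looks_like_sqlite(good_path), looks_like_sqlite(bad_path))
```

In the tutorial's flow, a check like this could run on local_path after the download and before the database is opened.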

To interact with the database, we will use the SQL database wrapper available in the langchain_community package. The wrapper provides a simple interface for executing SQL queries and fetching results:

from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")

print(f"Dialect: {db.dialect}")
print(f"Available tables: {db.get_usable_table_names()}")
print(f'Sample output: {db.run("SELECT * FROM Artist LIMIT 5;")}')

Dialect: sqlite
Available tables: ['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']
Sample output: [(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains')]

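3. Add tools for database interactions

The langchain_community package also provides SQLDatabaseToolkit, which wraps the SQLDatabase connection in a set of tools for the agent: listing tables, fetching schemas, checking queries, and executing them. Create the toolkit and print each tool's name and description: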

from langchain_community.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=model)

tools = toolkit.get_tools()

for tool in tools:
    print(f"{tool.name}: {tool.description}\n")

sql_db_query: Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.

sql_db_schema: Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3

sql_db_list_tables: Input is an empty string, output is a comma-separated list of tables in the database.

sql_db_query_checker: Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!
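To make the tool descriptions above more concrete, here is a rough sketch of the kind of lookups that listing tables and fetching schemas involve on SQLite. It uses the standard sqlite3 module and SQLite's sqlite_master catalog. list_tables and table_schema are hypothetical helpers written for this illustration, not the toolkit's actual implementation, and the two-table schema is a cut-down stand-in for Chinook.

```python
import sqlite3

# A small in-memory stand-in for the Chinook schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Track (
        TrackId INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        GenreId INTEGER REFERENCES Genre (GenreId),
        Milliseconds INTEGER NOT NULL
    );
    """
)


def list_tables(conn):
    """Return a comma-separated list of table names, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return ", ".join(name for (name,) in rows)


def table_schema(conn, table_names):
    """Return the CREATE TABLE statements for a comma-separated list of tables."""
    wanted = [name.strip() for name in table_names.split(",")]
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        f"WHERE type = 'table' AND name IN ({placeholders}) ORDER BY name",
        wanted,
    )
    return "\n\n".join(sql for (sql,) in rows)


print(list_tables(conn))
print(table_schema(conn, "Track"))
```

The split between a cheap "what tables exist" call and a more detailed "show me these schemas" call mirrors the order the system prompt later asks the agent to follow.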

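4. Use `create_agent`

Use create_agent to build a ReAct agent with minimal code. The agent interprets the request and generates a SQL command, which the tools then execute. If the command contains an error, the error message is returned to the model. The model can then review the original request together with the new error message and generate a new command. This can continue until the LLM produces a command that succeeds or an iteration limit is reached. This pattern of giving the model feedback, here in the form of error messages, is very powerful. To customize the agent's behavior, initialize it with a descriptive system prompt: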

system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
""".format(
    dialect=db.dialect,
    top_k=5,
)

Now, create an agent with the model, tools, and prompt:

from langchain.agents import create_agent


agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
)
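The error-feedback loop described in this section can be illustrated without an LLM. In the sketch below, fake_model is a hypothetical stand-in that returns a misspelled column on its first attempt and a corrected query once it is shown the error. The loop, the attempt limit, and the tiny Genre table are assumptions for illustration, not a description of how create_agent works internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Genre (Name) VALUES ('Rock'), ('Jazz');
    """
)


def fake_model(question, last_error):
    """Hypothetical stand-in for an LLM: the first try has a typo, the retry fixes it."""
    if last_error is None:
        return "SELECT Nme FROM Genre ORDER BY GenreId"
    return "SELECT Name FROM Genre ORDER BY GenreId"


def answer(question, max_attempts=3):
    """Generate a query, run it, and feed any database error back for another try."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        query = fake_model(question, last_error)
        try:
            return conn.execute(query).fetchall(), attempt
        except sqlite3.Error as exc:
            # The error text becomes part of the next "prompt".
            last_error = str(exc)
    raise RuntimeError(f"No working query after {max_attempts} attempts: {last_error}")


rows, attempts = answer("Which genres exist?")
print(rows, attempts)
```

The attempt limit matters: without it, a model that keeps producing the same broken query would loop indefinitely.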

5. Run the agent

Run the agent on a sample query and observe its behavior:

question = "Which genre on average has the longest tracks?"

for step in agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

Which genre on average has the longest tracks?
================================== Ai Message ==================================
Tool Calls:
  sql_db_list_tables (call_BQsWg8P65apHc8BTJ1NPDvnM)
 Call ID: call_BQsWg8P65apHc8BTJ1NPDvnM
  Args:
================================= Tool Message =================================
Name: sql_db_list_tables

Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
================================== Ai Message ==================================
Tool Calls:
  sql_db_schema (call_i89tjKECFSeERbuACYm4w0cU)
 Call ID: call_i89tjKECFSeERbuACYm4w0cU
  Args:
    table_names: Track, Genre
================================= Tool Message =================================
Name: sql_db_schema


CREATE TABLE "Genre" (
	"GenreId" INTEGER NOT NULL,
	"Name" NVARCHAR(120),
	PRIMARY KEY ("GenreId")
)

/*
3 rows from Genre table:
GenreId	Name
1	Rock
2	Jazz
3	Metal
*/


CREATE TABLE "Track" (
	"TrackId" INTEGER NOT NULL,
	"Name" NVARCHAR(200) NOT NULL,
	"AlbumId" INTEGER,
	"MediaTypeId" INTEGER NOT NULL,
	"GenreId" INTEGER,
	"Composer" NVARCHAR(220),
	"Milliseconds" INTEGER NOT NULL,
	"Bytes" INTEGER,
	"UnitPrice" NUMERIC(10, 2) NOT NULL,
	PRIMARY KEY ("TrackId"),
	FOREIGN KEY("MediaTypeId") REFERENCES "MediaType" ("MediaTypeId"),
	FOREIGN KEY("GenreId") REFERENCES "Genre" ("GenreId"),
	FOREIGN KEY("AlbumId") REFERENCES "Album" ("AlbumId")
)

/*
3 rows from Track table:
TrackId	Name	AlbumId	MediaTypeId	GenreId	Composer	Milliseconds	Bytes	UnitPrice
1	For Those About To Rock (We Salute You)	1	1	1	Angus Young, Malcolm Young, Brian Johnson	343719	11170334	0.99
2	Balls to the Wall	2	2	1	U. Dirkschneider, W. Hoffmann, H. Frank, P. Baltes, S. Kaufmann, G. Hoffmann	342562	5510424	0.99
3	Fast As a Shark	3	2	1	F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman	230619	3990994	0.99
*/
================================== Ai Message ==================================
Tool Calls:
  sql_db_query_checker (call_G64yYm6R6UauiVPCXJZMA49b)
 Call ID: call_G64yYm6R6UauiVPCXJZMA49b
  Args:
    query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AverageLength FROM Track INNER JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AverageLength DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query_checker

SELECT Genre.Name, AVG(Track.Milliseconds) AS AverageLength FROM Track INNER JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AverageLength DESC LIMIT 5;
================================== Ai Message ==================================
Tool Calls:
  sql_db_query (call_AnO3SrhD0ODJBxh6dHMwvHwZ)
 Call ID: call_AnO3SrhD0ODJBxh6dHMwvHwZ
  Args:
    query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AverageLength FROM Track INNER JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AverageLength DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query

[('Sci Fi & Fantasy', 2911783.0384615385), ('Science Fiction', 2625549.076923077), ('Drama', 2575283.78125), ('TV Shows', 2145041.0215053763), ('Comedy', 1585263.705882353)]
================================== Ai Message ==================================

On average, the genre with the longest tracks is "Sci Fi & Fantasy" with an average track length of approximately 2,911,783 milliseconds. This is followed by "Science Fiction," "Drama," "TV Shows," and "Comedy."

The agent correctly wrote a query, checked it, and ran it to inform its final response.
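The average comes back in milliseconds, which is hard to read at a glance. As a small worked conversion (plain Python arithmetic on the value from the tool output above), the top result is roughly 48 and a half minutes:

```python
# Average track length for "Sci Fi & Fantasy", copied from the query result above.
avg_ms = 2911783.0384615385

# Round to whole seconds, then split into minutes and seconds.
minutes, seconds = divmod(round(avg_ms / 1000), 60)
print(f"{minutes}:{seconds:02d}")  # prints 48:32
```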

You can inspect every aspect of the run above, including the steps taken, the tools invoked, and the prompts seen by the LLM, in the LangSmith trace.

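(Optional) Use Studio

Studio provides a "client side" loop as well as memory, so you can run the agent as a chat interface and query the database. You can ask questions such as "Tell me the schema of the database" or "Show me the invoices for the top 5 customers". You will see the generated SQL command and the resulting output. Details on how to get started are below.

Run your agent in Studio

In addition to the packages mentioned earlier, you will need the following: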

pip install -U "langgraph-cli[inmem]>=0.4.0"

In the directory you will run from, you need a langgraph.json file with the following contents:

{
  "dependencies": ["."],
  "graphs": {
      "agent": "./sql_agent.py:agent",
      "graph": "./sql_agent_langgraph.py:graph"
  },
  "env": ".env"
}

Create a file named sql_agent.py and insert the following:

#sql_agent.py for studio
import pathlib

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
import requests


# Initialize an LLM
model = init_chat_model("openai:gpt-4.1")

# Get the database, store it locally
url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db"
local_path = pathlib.Path("Chinook.db")

if local_path.exists():
    print(f"{local_path} already exists, skipping download.")
else:
    response = requests.get(url)
    if response.status_code == 200:
        local_path.write_bytes(response.content)
        print(f"File downloaded and saved as {local_path}")
    else:
        print(f"Failed to download the file. Status code: {response.status_code}")

db = SQLDatabase.from_uri("sqlite:///Chinook.db")

# Create the tools
toolkit = SQLDatabaseToolkit(db=db, llm=model)

tools = toolkit.get_tools()

for tool in tools:
    print(f"{tool.name}: {tool.description}\n")

# Use create_agent
system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
""".format(
    dialect=db.dialect,
    top_k=5,
)

agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
)

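6. Implement human-in-the-loop review

It can be prudent to check the agent's SQL queries for unintended actions or inefficiencies before they are executed. LangChain agents support built-in human-in-the-loop middleware for adding oversight to agent tool calls. Let's configure the agent to pause for human review whenever it calls the sql_db_query tool: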

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware 
from langgraph.checkpoint.memory import InMemorySaver 


agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
    middleware=[ 
        HumanInTheLoopMiddleware( 
            interrupt_on={"sql_db_query": True}, 
            description_prefix="Tool execution pending approval", 
        ), 
    ], 
    checkpointer=InMemorySaver(), 
)

We added a checkpointer to the agent so that execution can be paused and resumed. See the human-in-the-loop guide for details on this as well as the available middleware configurations.
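To see why pausing requires saved state, consider the following plain-Python sketch of an approval gate. It is not how the middleware or InMemorySaver is implemented; pending, propose, and resume are hypothetical names, and the dictionary keyed by thread ID merely plays the role of a checkpoint store for one pending query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Genre (Name) VALUES ('Rock'), ('Jazz');
    """
)

# Hypothetical in-memory "checkpoint" store: thread_id -> query awaiting review.
pending = {}


def propose(thread_id, query):
    """Pause instead of executing: save the query and report what needs review."""
    pending[thread_id] = query
    return {"status": "interrupted", "description": f"Tool: sql_db_query\nArgs: {query}"}


def resume(thread_id, decision):
    """Continue the paused thread; only an 'accept' decision runs the saved query."""
    query = pending.pop(thread_id)
    if decision != "accept":
        return {"status": "rejected", "rows": None}
    return {"status": "done", "rows": conn.execute(query).fetchall()}


first = propose("1", "SELECT Name FROM Genre ORDER BY GenreId")
print(first["description"])

second = resume("1", "accept")
print(second["rows"])
```

Because the pending query is stored under the thread ID, the review can happen in a separate call from the one that proposed the query, which is the same reason the agent examples in this section pass a thread_id in their config.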

When you run the agent, it now pauses for review before executing the sql_db_query tool:

question = "Which genre on average has the longest tracks?"
config = {"configurable": {"thread_id": "1"}} 

for step in agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    config, 
    stream_mode="values",
):
    if "messages" in step:
        step["messages"][-1].pretty_print()
    elif "__interrupt__" in step: 
        print("INTERRUPTED:") 
        interrupt = step["__interrupt__"][0] 
        for request in interrupt.value: 
            print(request["description"]) 
    else:
        pass

...

INTERRUPTED:
Tool execution pending approval

Tool: sql_db_query
Args: {'query': 'SELECT g.Name AS Genre, AVG(t.Milliseconds) AS AvgTrackLength FROM Track t JOIN Genre g ON t.GenreId = g.GenreId GROUP BY g.Name ORDER BY AvgTrackLength DESC LIMIT 1;'}

We can resume execution, in this case accepting the query, using Command:

from langgraph.types import Command 

for step in agent.stream(
    Command(resume=[{"type": "accept"}]), 
    config,
    stream_mode="values",
):
    if "messages" in step:
        step["messages"][-1].pretty_print()
    elif "__interrupt__" in step:
        print("INTERRUPTED:")
        interrupt = step["__interrupt__"][0]
        for request in interrupt.value:
            print(request["description"])
    else:
        pass

================================== Ai Message ==================================
Tool Calls:
  sql_db_query (call_7oz86Epg7lYRqi9rQHbZPS1U)
 Call ID: call_7oz86Epg7lYRqi9rQHbZPS1U
  Args:
    query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AvgDuration FROM Track JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AvgDuration DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query

[('Sci Fi & Fantasy', 2911783.0384615385), ('Science Fiction', 2625549.076923077), ('Drama', 2575283.78125), ('TV Shows', 2145041.0215053763), ('Comedy', 1585263.705882353)]
================================== Ai Message ==================================

The genre with the longest average track length is "Sci Fi & Fantasy" with an average duration of about 2,911,783 milliseconds, followed by "Science Fiction" and "Drama."

Refer to the human-in-the-loop guide for details.

Next steps

For deeper customization, check out this tutorial on implementing a SQL agent directly with LangGraph primitives.
