Needle makes it easy to create RAG pipelines with minimal effort. For more details, see the API documentation.

Overview

The Needle Document Loader is a utility for integrating Needle collections with LangChain. It enables seamless storage, retrieval, and use of documents in Retrieval-Augmented Generation (RAG) workflows. This example demonstrates how to:
  • Store documents in a Needle collection
  • Set up a retriever to fetch documents
  • Build a Retrieval-Augmented Generation (RAG) pipeline

Setup

Before getting started, make sure the following environment variables are set:
  • NEEDLE_API_KEY: Your API key for authenticating with Needle
  • OPENAI_API_KEY: Your OpenAI API key for language model operations
import os
os.environ["NEEDLE_API_KEY"] = ""
os.environ["OPENAI_API_KEY"] = ""

Initialization

To initialize the NeedleLoader, you need the following parameters:
  • needle_api_key: Your Needle API key (it can also be set as an environment variable)
  • collection_id: The ID of the Needle collection to work with

Instantiation

from langchain_community.document_loaders.needle import NeedleLoader

collection_id = "clt_01J87M9T6B71DHZTHNXYZQRG5H"

# Initialize NeedleLoader to store documents in the collection
document_loader = NeedleLoader(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id=collection_id,
)

Load

To add files to the Needle collection:
files = {
    "tech-radar-30.pdf": "https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2024/04/tr_technology_radar_vol_30_en.pdf"
}

document_loader.add_files(files=files)
# Show the documents in the collection
# collections_documents = document_loader.load()
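
Once the uploaded files have been indexed, load() returns the collection's contents as LangChain Document objects. A minimal sketch of inspecting them (the metadata keys depend on the collection and are illustrative here):

collections_documents = document_loader.load()
for doc in collections_documents:
    # Each item is a Document with page_content and metadata.
    print(doc.metadata, len(doc.page_content))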

Lazy Load

The lazy_load method lets you load documents from the Needle collection iteratively, yielding each document as it is fetched:
# Show the documents in the collection
# collections_documents = document_loader.lazy_load()
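
Because lazy_load yields documents as they are fetched, each one can be processed without materializing the whole collection in memory; a minimal sketch:

for doc in document_loader.lazy_load():
    # Handle one Document at a time, e.g. print a short preview of its content.
    print(doc.page_content[:200])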

Usage

Use within a chain

Below is a complete example of using Needle within a chain to set up a RAG pipeline:
import os

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.retrievers.needle import NeedleRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# Initialize the Needle retriever (make sure your Needle API key is set as an environment variable)
retriever = NeedleRetriever(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id="clt_01J87M9T6B71DHZTHNXYZQRG5H",
)

# Define the system prompt for the assistant
system_prompt = """
    You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question.
    If you don't know, say so concisely.\n\n{context}
    """

prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("human", "{input}")]
)

# Define the question-answering chain using a stuff documents chain and the retriever
question_answer_chain = create_stuff_documents_chain(llm, prompt)

# Create the RAG (Retrieval-Augmented Generation) chain by combining the retriever and the question-answering chain
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# Define the input query
query = {"input": "Did RAG move to accepted?"}

response = rag_chain.invoke(query)

response
{'input': 'Did RAG move to accepted?',
 'context': [Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.')],
 'answer': 'Yes, RAG has been adopted as the preferred pattern for improving the quality of responses generated by a large language model.'}
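
As the output above shows, the chain returns a dictionary with 'input', 'context', and 'answer' keys, so the generated answer can be read directly:

# Print only the generated answer, ignoring the retrieved context.
print(response["answer"])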

API reference

For detailed documentation of all Needle features and configurations, see the API reference: docs.needle-ai.com