RunPod Chat Model

Get started with RunPod chat models.

Overview

This guide covers how to use the LangChain ChatRunPod class to interact with chat models hosted on RunPod Serverless.

Setup

Install the package:
```
pip install -qU langchain-runpod
```
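Deploy a chat model endpoint: Follow the setup steps in the RunPod Provider Guide to deploy a compatible chat model endpoint on RunPod Serverless and obtain its Endpoint ID.

Set environment variables: Make sure RUNPOD_API_KEY and RUNPOD_ENDPOINT_ID are set. If RUNPOD_CHAT_ENDPOINT_ID is also set, it takes precedence for chat models.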

```python
import getpass
import os

# Make sure environment variables are set (or pass them directly to ChatRunPod)
if "RUNPOD_API_KEY" not in os.environ:
    os.environ["RUNPOD_API_KEY"] = getpass.getpass("Enter your RunPod API Key: ")

if "RUNPOD_ENDPOINT_ID" not in os.environ:
    os.environ["RUNPOD_ENDPOINT_ID"] = input(
        "Enter your RunPod Endpoint ID (used if RUNPOD_CHAT_ENDPOINT_ID is not set): "
    )

# Optionally use a different endpoint ID specifically for chat models
# if "RUNPOD_CHAT_ENDPOINT_ID" not in os.environ:
#     os.environ["RUNPOD_CHAT_ENDPOINT_ID"] = input("Enter your RunPod Chat Endpoint ID (Optional): ")

# Prefer the chat-specific endpoint ID and fall back to the general one
chat_endpoint_id = os.environ.get(
    "RUNPOD_CHAT_ENDPOINT_ID", os.environ.get("RUNPOD_ENDPOINT_ID")
)
if not chat_endpoint_id:
    raise ValueError(
        "No RunPod Endpoint ID found. Please set RUNPOD_ENDPOINT_ID or RUNPOD_CHAT_ENDPOINT_ID."
    )
```

Instantiation

Initialize the ChatRunPod class. You can pass model-specific parameters through model_kwargs and configure the polling behavior.

```python
from langchain_runpod import ChatRunPod

chat = ChatRunPod(
    runpod_endpoint_id=chat_endpoint_id,  # Specify the correct endpoint ID
    model_kwargs={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        # Add other parameters supported by your endpoint handler
    },
    # Optional: Adjust polling
    # poll_interval=0.2,
    # max_polling_attempts=150
)
```
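
To make the polling settings more concrete, here is a minimal, standard-library-only sketch of a poll-until-done loop. It is an illustration, not the actual ChatRunPod implementation: the helper `poll_until_done`, the injected `fetch_status` callable, and the set of terminal status strings are assumptions made for this example, and the parameter names simply mirror `poll_interval` and `max_polling_attempts` from the snippet above.

```python
import time
from typing import Any, Callable, Dict

# Status strings assumed here to mark a finished job (an assumption of this sketch).
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 0.2,
    max_polling_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the job reaches a terminal status or attempts run out."""
    for _ in range(max_polling_attempts):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        sleep(poll_interval)  # wait before asking again
    raise TimeoutError(
        f"Job not finished after {max_polling_attempts} polling attempts"
    )


# Hypothetical stand-in for a status request: the job finishes on the third poll.
_responses = iter(
    [
        {"status": "IN_QUEUE"},
        {"status": "IN_PROGRESS"},
        {"status": "COMPLETED", "output": "hello"},
    ]
)
result = poll_until_done(lambda: next(_responses), sleep=lambda _: None)
print(result["output"])
```

Under this reading of the two parameters, the values in the commented-out lines (0.2 seconds and 150 attempts) would bound the waiting time at roughly 30 seconds plus request overhead. Check the integration's source before relying on that figure.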

Invocation

Call the model with the standard LangChain .invoke() and .ainvoke() methods. Streaming through .stream() and .astream() is also supported; it is simulated by polling the RunPod /stream endpoint.

```python
# langchain_core is installed as a dependency of langchain-runpod
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What is the RunPod Serverless API flow?"),
]

# Invoke (Sync)
try:
    response = chat.invoke(messages)
    print("--- Sync Invoke Response ---")
    print(response.content)
except Exception as e:
    print(
        f"Error invoking Chat Model: {e}. Ensure endpoint ID/API key are correct and endpoint is active/compatible."
    )

# Stream (Sync, simulated via polling /stream)
print("\n--- Sync Stream Response ---")
try:
    for chunk in chat.stream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model: {e}. Ensure endpoint handler supports streaming output format."
    )
```
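
The feature table later in this guide notes that the handler is expected to fill a `stream` list with token chunks such as `[{"output": "token"}]`. The sketch below shows one way such poll responses could be flattened into text. It is not the integration's own parsing code: the function `iter_stream_tokens`, the sample responses, and the `status` field values are hypothetical and only follow the chunk shape quoted in the table.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_stream_tokens(poll_responses: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield token text from successive /stream poll responses."""
    for response in poll_responses:
        for chunk in response.get("stream", []):
            output = chunk.get("output")
            if isinstance(output, str) and output:
                yield output
        # Stop once the job reports completion (status value assumed).
        if response.get("status") == "COMPLETED":
            break


# Hypothetical poll results; a real handler decides the exact payload.
polls = [
    {"status": "IN_PROGRESS", "stream": [{"output": "Server"}, {"output": "less"}]},
    {"status": "IN_PROGRESS", "stream": []},  # nothing new on this poll
    {"status": "COMPLETED", "stream": [{"output": " API"}]},
]
text = "".join(iter_stream_tokens(polls))
print(text)  # Serverless API
```

Because tokens only arrive when a poll returns, the perceived latency depends on the polling interval, which is why the guide describes this streaming as simulated.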

### Async Usage

```python
# Top-level `await` works in Jupyter/IPython. In a plain script, put these
# calls inside an `async def` function and run it with `asyncio.run()`.

# AInvoke (Async)
try:
    async_response = await chat.ainvoke(messages)
    print("--- Async Invoke Response ---")
    print(async_response.content)
except Exception as e:
    print(f"Error invoking Chat Model asynchronously: {e}.")

# AStream (Async)
print("\n--- Async Stream Response ---")
try:
    async for chunk in chat.astream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model asynchronously: {e}. Ensure endpoint handler supports streaming output format.\n"
    )
```
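
Outside a notebook, `await` cannot be used at module level, so async calls need an event loop. The following sketch shows the usual `asyncio.run()` pattern and runs two requests concurrently. `EchoChat` is a hypothetical stand-in so that the example runs without an endpoint; a real ChatRunPod instance takes a list of messages and returns a message object with a `.content` attribute rather than a plain string.

```python
import asyncio
from typing import List


class EchoChat:
    """Hypothetical stand-in exposing ainvoke(); swap in a ChatRunPod instance."""

    async def ainvoke(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"echo: {prompt}"


async def main(chat: EchoChat, prompts: List[str]) -> List[str]:
    # gather() runs the calls concurrently and returns results in input order
    return await asyncio.gather(*(chat.ainvoke(p) for p in prompts))


answers = asyncio.run(main(EchoChat(), ["first", "second"]))
print(answers)
```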

Chaining

The chat model integrates seamlessly with LangChain Expression Language (LCEL) chains.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
    ]
)

parser = StrOutputParser()

# Pipe the prompt into the chat model, then parse the reply into a plain string
chain = prompt | chat | parser

try:
    chain_response = chain.invoke(
        {"input": "Explain the concept of serverless computing in simple terms."}
    )
    print("--- Chain Response ---")
    print(chain_response)
except Exception as e:
    print(f"Error running chain: {e}")


# Async chain (top-level `await` requires a notebook or an async function)
try:
    async_chain_response = await chain.ainvoke(
        {"input": "What are the benefits of using RunPod for AI/ML workloads?"}
    )
    print("--- Async Chain Response ---")
    print(async_chain_response)
except Exception as e:
    print(f"Error running async chain: {e}")
```

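Model Features (Endpoint Dependent)

The availability of advanced features depends heavily on the specific implementation of your RunPod endpoint handler. The ChatRunPod integration provides the basic framework, but the handler itself must support the underlying functionality.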

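| Feature | Integration Support | Endpoint Dependent? | Notes |
| --- | --- | --- | --- |
| Tool calling | ❌ | ✅ | Requires the handler to process tool definitions and return tool calls (e.g., OpenAI format). The integration needs parsing logic. |
| Structured output | ❌ | ✅ | Requires the handler to support forcing structured output (JSON mode, function calling). The integration needs parsing logic. |
| JSON mode | ❌ | ✅ | Requires the handler to accept a `json_mode` parameter (or similar) and guarantee JSON output. |
| Image input | ❌ | ✅ | Requires a multimodal handler that accepts image data (e.g., base64). The integration does not support multimodal messages. |
| Audio input | ❌ | ✅ | Requires a handler that accepts audio data. The integration does not support audio messages. |
| Video input | ❌ | ✅ | Requires a handler that accepts video data. The integration does not support video messages. |
| Token-level streaming | ✅ (simulated) | ✅ | Polls `/stream`. Requires the handler to populate the `stream` list in the status response with token chunks (e.g., `[{"output": "token"}]`). True low-latency streaming is not built in. |
| Native async | ✅ | ✅ | Core `ainvoke`/`astream` are implemented. Depends on endpoint handler performance. |
| Token usage | ❌ | ✅ | Requires the handler to return `prompt_tokens` and `completion_tokens` in the final response. The integration does not currently parse them. |
| Logprobs | ❌ | ✅ | Requires the handler to return log probabilities. The integration does not currently parse them. |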

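Key takeaway: Standard chat invocation and simulated streaming work if the endpoint follows basic RunPod API conventions. Advanced features require a specific handler implementation and may require extending or customizing this integration package.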
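
As an example of the kind of customization this can involve, the sketch below parses JSON out of a plain-text model reply on the client side, since the table lists structured output as something the integration does not parse. The helper `parse_json_reply` is hypothetical and deliberately simple: it assumes the reply contains at most one JSON object, and it does not make the model produce valid JSON in the first place.

```python
import json
from typing import Any, Optional


def parse_json_reply(text: str) -> Optional[Any]:
    """Return the JSON value found in a model reply, or None if there is none."""
    try:
        return json.loads(text)  # the whole reply is already valid JSON
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, ignoring surrounding chatter.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start : end + 1])
        except json.JSONDecodeError:
            return None
    return None


reply = 'Sure! {"city": "Seoul", "temp_c": 21} Hope that helps.'
print(parse_json_reply(reply))
```

In a chain, a function like this could be applied to `response.content` after `chat.invoke(...)`, with a `None` result treated as a signal to retry or report an error.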

API Reference

For detailed documentation of the ChatRunPod class, its parameters, and its methods, see the source code or the generated API reference (if available). Source code: https://github.com/runpod/langchain-runpod/blob/main/langchain_runpod/chat_models.py
