LlamaEdge

LlamaEdge를 사용하면 GGUF 형식의 LLM과 로컬 및 채팅 서비스를 통해 채팅할 수 있습니다.

LlamaEdgeChatService는 개발자에게 HTTP 요청을 통해 LLM과 채팅할 수 있는 OpenAI API 호환 서비스를 제공합니다.
LlamaEdgeChatLocal은 개발자가 로컬에서 LLM과 채팅할 수 있도록 합니다 (곧 제공 예정).

LlamaEdgeChatService와 LlamaEdgeChatLocal 모두 WasmEdge Runtime이 구동하는 인프라에서 실행되며, LLM 추론 작업을 위한 경량 및 휴대용 WebAssembly 컨테이너 환경을 제공합니다.

API Service를 통한 채팅

LlamaEdgeChatService는 llama-api-server에서 작동합니다. llama-api-server quick-start의 단계를 따라 자체 API 서비스를 호스팅하면 인터넷만 사용 가능하면 언제 어디서나 원하는 모델과 채팅할 수 있습니다.

from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain.messages import HumanMessage, SystemMessage

비스트리밍 모드에서 LLM과 채팅

# service url
service_url = "https://b008-54-186-154-209.ngrok-free.app"

# create wasm-chat service instance
chat = LlamaEdgeChatService(service_url=service_url)

# create message sequence
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of France?")
messages = [system_message, user_message]

# chat with wasm-chat service
response = chat.invoke(messages)

print(f"[Bot] {response.content}")

[Bot] Hello! The capital of France is Paris.

스트리밍 모드에서 LLM과 채팅

# service url
service_url = "https://b008-54-186-154-209.ngrok-free.app"

# create wasm-chat service instance
chat = LlamaEdgeChatService(service_url=service_url, streaming=True)

# create message sequence
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of Norway?")
messages = [
    system_message,
    user_message,
]

output = ""
for chunk in chat.stream(messages):
    # print(chunk.content, end="", flush=True)
    output += chunk.content

print(f"[Bot] {output}")

[Bot]   Hello! I'm happy to help you with your question. The capital of Norway is Oslo.

Edit the source of this page on GitHub.

Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.

Popular Providers

Integrations by component

API Service를 통한 채팅

비스트리밍 모드에서 LLM과 채팅

스트리밍 모드에서 LLM과 채팅

Popular Providers

Integrations by component

​API Service를 통한 채팅

​비스트리밍 모드에서 LLM과 채팅

​스트리밍 모드에서 LLM과 채팅

API Service를 통한 채팅

비스트리밍 모드에서 LLM과 채팅

스트리밍 모드에서 LLM과 채팅