Xinference is a powerful and versatile library designed to serve LLMs,
speech recognition models, and multimodal models, even on your laptop.
With Xorbits Inference (Xinference), you can deploy and serve your own models or
state-of-the-art built-in models with a single command.
Installation and Setup
Xinference can be installed via pip from PyPI, for example with pip install "xinference[all]".
LLM
Xinference supports various models compatible with GGML, including chatglm, baichuan, whisper, vicuna, and orca. To view the built-in models, use the xinference command-line tool.
Wrapper for Xinference
You can start a local instance of Xinference from the command line.
Usage
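As a minimal sketch of the wrapper in use, the snippet below builds a LangChain LLM bound to a running Xinference server and sends it one prompt. It assumes the langchain-community and xinference client packages are installed, that the server is reachable at http://0.0.0.0:9997 (an assumed endpoint, adjust to your deployment), and that model_uid is the UID returned when the model was launched. The helper names build_llm and ask, and the generation settings, are hypothetical and used only for illustration.

```python
from typing import Any, Dict

# Assumed endpoint of a locally running Xinference server; adjust to your setup.
SERVER_URL = "http://0.0.0.0:9997"

# Generation options forwarded to the served model (illustrative values).
GENERATE_CONFIG: Dict[str, Any] = {"max_tokens": 1024}


def build_llm(model_uid: str, server_url: str = SERVER_URL):
    """Return a LangChain LLM backed by a model served by Xinference."""
    # Imported lazily so this module still loads where LangChain is absent.
    from langchain_community.llms import Xinference

    return Xinference(server_url=server_url, model_uid=model_uid)


def ask(model_uid: str, prompt: str) -> str:
    """Send one prompt to the served model and return the completion text."""
    llm = build_llm(model_uid)
    # Extra keyword arguments are passed through to the wrapper's call.
    return llm.invoke(prompt, generate_config=GENERATE_CONFIG)
```

Calling ask with the UID of a launched model and a prompt such as "Q: Where can we visit in the capital of France? A:" requires the server to be running; nothing is contacted until then.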
For more information and detailed examples, refer to the example for Xinference LLMs.
Embeddings
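A minimal sketch of embedding a query and a list of documents through the LangChain wrapper follows. It assumes langchain-community is installed, that an embedding model has already been launched on the server, and that the server listens on http://0.0.0.0:9997 (an assumed endpoint). The helper names build_embeddings and embed_corpus are hypothetical.

```python
from typing import List, Tuple


def build_embeddings(model_uid: str, server_url: str = "http://0.0.0.0:9997"):
    """Return a LangChain embeddings object backed by Xinference."""
    # Imported lazily so this module still loads where LangChain is absent.
    from langchain_community.embeddings import XinferenceEmbeddings

    return XinferenceEmbeddings(server_url=server_url, model_uid=model_uid)


def embed_corpus(
    model_uid: str, query: str, documents: List[str]
) -> Tuple[List[float], List[List[float]]]:
    """Embed one query and a list of documents with the same model."""
    embeddings = build_embeddings(model_uid)
    query_vector = embeddings.embed_query(query)              # one vector
    document_vectors = embeddings.embed_documents(documents)  # one per document
    return query_vector, document_vectors
```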
Xinference also supports embedding queries and documents. See example for Xinference embeddings for a more detailed demo.
Xinference LangChain partner package install
Install the integration package with pip; it is published on PyPI as langchain-xinference.
Chat Models
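The sketch below shows one way a chat model from the partner package might be constructed. The import path langchain_xinference.chat_models.ChatXinference and the server_url and model_uid constructor arguments are assumptions based on the partner package, so check them against the version you install. The endpoint and the helper names build_chat_model and chat_once are hypothetical.

```python
def build_chat_model(model_uid: str, server_url: str = "http://0.0.0.0:9997"):
    """Return a chat model backed by a model served by Xinference."""
    # Assumed import path from the langchain-xinference partner package.
    from langchain_xinference.chat_models import ChatXinference

    return ChatXinference(server_url=server_url, model_uid=model_uid)


def chat_once(model_uid: str, question: str) -> str:
    """Ask a single question and return the text of the reply message."""
    chat = build_chat_model(model_uid)
    return chat.invoke(question).content
```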
LLM
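For the partner package's LLM class, a comparable sketch is shown here. The import path langchain_xinference.llms.Xinference and its constructor arguments are assumptions to verify against the installed package; the endpoint and the helper name build_partner_llm are hypothetical.

```python
def build_partner_llm(model_uid: str, server_url: str = "http://0.0.0.0:9997"):
    """Return the partner package's LLM wrapper for a served model."""
    # Assumed import path from the langchain-xinference partner package.
    from langchain_xinference.llms import Xinference

    return Xinference(server_url=server_url, model_uid=model_uid)
```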