TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform.
Our inference server, Titan Takeoff, enables deployment of LLMs locally on your hardware in a single command. Most embedding models are supported out of the box; if you experience trouble with a specific model, please let us know at [email protected].
Example usage
Here are some helpful examples to get started using Titan Takeoff Server. Make sure Takeoff Server has been started in the background before running these commands. For more information, see the docs page for launching Takeoff.
Example 1
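A minimal sketch of the basic case, assuming a Takeoff server is already running locally on its default port and that `TitanTakeoffEmbed` is importable from `langchain_community.embeddings`. The helper name `embed_question` and the `"embed"` consumer group are illustrative assumptions, not part of the wrapper; use whichever consumer group your reader was actually started with.

```python
from typing import Any, Optional


def embed_question(text: str, embedder: Optional[Any] = None,
                   consumer_group: str = "embed") -> Any:
    """Embed a single string with an already-running Takeoff server.

    `embed_question` is a hypothetical helper; only `embed_query` and its
    `consumer_group` argument are assumed to come from the wrapper.
    """
    if embedder is None:
        # Imported lazily so this module still loads without langchain_community.
        from langchain_community.embeddings import TitanTakeoffEmbed

        # No arguments: assumes the default local address (localhost:3000).
        embedder = TitanTakeoffEmbed()
    # The request is only served by a reader in the same consumer group.
    return embedder.embed_query(text, consumer_group=consumer_group)
```

Calling `embed_question("What is the weather in London in August?")` with no `embedder` argument connects to the local server; passing your own object makes the helper easy to exercise without one.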
Basic use, assuming Takeoff is running on your machine using its default ports (i.e. localhost:3000).
Example 2
Starting readers using the TitanTakeoffEmbed Python wrapper. If you haven't created any readers when first launching Takeoff, or you want to add another, you can do so when you initialize the TitanTakeoffEmbed object. Just pass a list of the models you want to start as the models parameter.
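The following sketch shows one way to pass a reader through the models parameter. The dictionary keys (`model_name`, `device`, `consumer_group`), the example model `BAAI/bge-large-en-v1.5`, and the 60-second warm-up are assumptions for illustration; check the Takeoff documentation for the fields your version accepts. `reader_config` and `start_embedder` are hypothetical helper names.

```python
import time
from typing import Any, Dict, List


def reader_config(model_name: str, device: str = "cpu",
                  consumer_group: str = "embed") -> Dict[str, str]:
    """Build one reader description (keys are assumed, see the note above)."""
    return {
        "model_name": model_name,
        "device": device,
        "consumer_group": consumer_group,
    }


def start_embedder(models: List[Dict[str, str]],
                   warmup_seconds: float = 60.0) -> Any:
    """Ask Takeoff to start the given readers, then wait for them to load."""
    # Imported lazily so this module still loads without langchain_community.
    from langchain_community.embeddings import TitanTakeoffEmbed

    embed = TitanTakeoffEmbed(models=models)
    # A new reader needs time to download and load its model; how long
    # depends on model size and network speed, so this wait is a guess.
    time.sleep(warmup_seconds)
    return embed
```

For example, `start_embedder([reader_config("BAAI/bge-large-en-v1.5")])` would request a CPU reader in the `"embed"` consumer group; later queries should name that same group so they reach this reader rather than another one.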
You can use embed.embed_documents to embed multiple documents at once. The expected input is a list of strings, rather than the single string expected by the embed_query method.
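A short sketch of batch embedding, assuming the wrapper follows LangChain's standard embeddings interface, in which the batch method is named `embed_documents` and accepts a `consumer_group` argument. The helper `embed_many` and its guard against a bare string are illustrative additions, not part of the wrapper.

```python
from typing import Any, List, Optional


def embed_many(documents: List[str], embedder: Optional[Any] = None,
               consumer_group: str = "embed") -> Any:
    """Embed several strings in one call (hypothetical helper)."""
    # A bare string is a common mistake here: it belongs with embed_query.
    if isinstance(documents, str):
        raise TypeError("expected a list of strings; use embed_query for one string")
    if embedder is None:
        # Imported lazily so this module still loads without langchain_community.
        from langchain_community.embeddings import TitanTakeoffEmbed

        # No arguments: assumes the default local address (localhost:3000).
        embedder = TitanTakeoffEmbed()
    return embedder.embed_documents(list(documents), consumer_group=consumer_group)
```

Rejecting a plain string up front avoids silently sending it to the wrong method when a single query was intended.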