UnstructuredExcelLoader is used to load Microsoft Excel files. The loader works with both .xlsx and .xls files. The page content will be the raw text of the Excel file. If you use the loader in "elements" mode, an HTML representation of the Excel file will be available in the document metadata under the text_as_html key.
Please see this guide for more instructions on setting up Unstructured locally, including setting up required system dependencies.
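As a rough illustration of the behavior described above, the following sketch loads a workbook in "elements" mode and collects the HTML rendering from the document metadata. It assumes the langchain-community and unstructured packages are installed; the helper names load_excel and html_tables and the file name example.xlsx are hypothetical, chosen only for this example.

```python
def load_excel(path: str, mode: str = "elements"):
    """Load an Excel file into LangChain documents (sketch, not the only way)."""
    # Imported lazily; assumes langchain-community and unstructured are installed.
    from langchain_community.document_loaders import UnstructuredExcelLoader

    loader = UnstructuredExcelLoader(path, mode=mode)
    return loader.load()


def html_tables(docs):
    """Collect the HTML representation from each document that carries one."""
    return [d.metadata["text_as_html"] for d in docs if "text_as_html" in d.metadata]


# Hypothetical usage, with a placeholder file name:
# docs = load_excel("example.xlsx")
# tables = html_tables(docs)
```

Documents without a text_as_html key are skipped rather than treated as errors, since the key is only described as available in "elements" mode.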
Using Azure AI Document Intelligence
Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is a machine-learning based service that extracts text (including handwriting), tables, document structures (e.g., titles and section headings), and key-value pairs from digital or scanned PDFs, images, Office, and HTML files. Document Intelligence supports JPEG/JPG, PNG, BMP, TIFF, HEIF, DOCX, XLSX, PPTX, and HTML.
This current implementation of a loader using Document Intelligence can incorporate content page-wise and turn it into LangChain documents. The default output format is markdown, which can be easily chained with MarkdownHeaderTextSplitter for semantic document chunking. You can also use mode="single" or mode="page" to return plain text, either as a single document or as documents split by page.
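The markdown-then-split flow described above might look like the sketch below. It assumes the langchain-community, langchain-text-splitters, and azure-ai-documentintelligence packages are installed; the helper names check_mode and load_and_chunk, the header labels, and the choice of three header levels are illustrative assumptions rather than requirements of the loader.

```python
SUPPORTED_MODES = ("markdown", "single", "page")  # the modes named in the text above

# Header levels to split on; the second item in each pair is an arbitrary metadata label.
HEADERS_TO_SPLIT_ON = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]


def check_mode(mode: str) -> str:
    """Return mode unchanged if it is one of the modes described above."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {SUPPORTED_MODES}")
    return mode


def load_and_chunk(file_path: str, endpoint: str, key: str):
    """Load a file as markdown, then split it on markdown headers."""
    # Imported lazily so the helpers above work without the packages installed.
    from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
    from langchain_text_splitters import MarkdownHeaderTextSplitter

    loader = AzureAIDocumentIntelligenceLoader(
        api_endpoint=endpoint,
        api_key=key,
        file_path=file_path,
        api_model="prebuilt-layout",
        mode=check_mode("markdown"),
    )
    docs = loader.load()

    splitter = MarkdownHeaderTextSplitter(headers_to_split_on=HEADERS_TO_SPLIT_ON)
    chunks = []
    for doc in docs:
        # split_text returns one document per header-delimited section
        chunks.extend(splitter.split_text(doc.page_content))
    return chunks
```

Passing mode="single" or mode="page" instead would skip the markdown step, in which case header-based splitting is unlikely to be useful.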
Prerequisite
An Azure AI Document Intelligence resource in one of the 3 preview regions: East US, West US2, West Europe. Follow this document to create one if you don’t have one. You will pass <endpoint> and <key> as parameters to the loader.
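One common way to supply <endpoint> and <key> without hard-coding them is to read them from environment variables, as in the sketch below. The variable names AZURE_DI_ENDPOINT and AZURE_DI_KEY and the helper document_intelligence_credentials are assumptions made for this example; the returned keys match the api_endpoint and api_key parameters of AzureAIDocumentIntelligenceLoader.

```python
import os

# Hypothetical environment variable names; use whatever your deployment defines.
ENDPOINT_VAR = "AZURE_DI_ENDPOINT"
KEY_VAR = "AZURE_DI_KEY"


def document_intelligence_credentials(env=None):
    """Return loader keyword arguments built from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in (ENDPOINT_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {"api_endpoint": env[ENDPOINT_VAR], "api_key": env[KEY_VAR]}


# Hypothetical usage, with a placeholder file name:
# from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
# loader = AzureAIDocumentIntelligenceLoader(
#     file_path="example.xlsx", **document_intelligence_credentials()
# )
```

Failing early on a missing variable keeps a configuration mistake from surfacing later as an authentication error from the service.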