Docx files

The DocxLoader allows you to extract text data from Microsoft Word documents. It supports both the modern .docx format and the legacy .doc format. Depending on the file type, additional dependencies are required.

Setup

To use DocxLoader, you'll need the @langchain/community integration package along with either the mammoth or the word-extractor package:

mammoth: For processing .docx files.
word-extractor: For handling .doc files.

Installation

For `.docx` Files

npm

npm install @langchain/community @langchain/core mammoth

For `.doc` Files

npm

npm install @langchain/community @langchain/core word-extractor

Usage

Loading `.docx` Files

For .docx files, there is no need to explicitly specify any parameters when initializing the loader:

import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

const loader = new DocxLoader(
  "src/document_loaders/tests/example_data/attention.docx"
);

const docs = await loader.load();
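
To check what was extracted, it can help to print a one-line summary per loaded document. The following is a minimal sketch, not part of the library: `LoadedDoc` is a local stand-in type that assumes each loaded item exposes a `pageContent` string and a `metadata` object, and `previewDocs` is a hypothetical helper introduced here for illustration.

```typescript
// Local stand-in for the shape of a loaded document (assumption: each item
// returned by the loader has a `pageContent` string and a `metadata` object).
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: builds one summary line per document with a short preview.
function previewDocs(docs: LoadedDoc[], maxChars = 80): string[] {
  return docs.map((doc, i) => {
    // Collapse runs of whitespace so the preview fits on a single line.
    const text = doc.pageContent.replace(/\s+/g, " ").trim();
    const preview =
      text.length > maxChars ? `${text.slice(0, maxChars)}...` : text;
    return `[${i}] ${text.length} chars: ${preview}`;
  });
}
```

With the helper in scope, `previewDocs(docs).forEach((line) => console.log(line));` would print the summaries for the documents loaded above.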

Loading `.doc` Files

For .doc files, you must explicitly set the type option to "doc" when initializing the loader:

import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

const loader = new DocxLoader(
  "src/document_loaders/tests/example_data/attention.doc",
  {
    type: "doc",
  }
);

const docs = await loader.load();
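
When a program handles both formats, the choice of options can be derived from the file extension. The sketch below is one possible approach, not something the library provides: `loaderOptionsFor` and `DocLoaderOptions` are hypothetical names, and the helper returns `undefined` for .docx files on the assumption that omitting the options is equivalent to the default behavior described above.

```typescript
// Hypothetical helper (not part of @langchain/community): chooses the second
// constructor argument for DocxLoader from the file extension.
type DocLoaderOptions = { type: "doc" };

function loaderOptionsFor(filePath: string): DocLoaderOptions | undefined {
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".doc")) {
    // Legacy Word files need the explicit type.
    return { type: "doc" };
  }
  if (lower.endsWith(".docx")) {
    // Modern Word files need no options.
    return undefined;
  }
  // Anything else is rejected early rather than passed to the loader.
  throw new Error(`Unsupported file extension: ${filePath}`);
}
```

With this in scope, `new DocxLoader(path, loaderOptionsFor(path))` would cover both cases, provided the matching parser package (mammoth or word-extractor) is installed.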
