Interface
Each document loader may define its own parameters, but they share a common API:.load(): Loads all documents at once..loadAndSplit(): Loads all documents at once and splits them into smaller documents.
By category
LangChain.js categorizes document loaders in two different ways:- File loaders, which load data into LangChain formats from your local filesystem.
- Web loaders, which load data from remote sources.
File loaders
If you’d like to contribute an integration, see Contributing integrations.
PDFs
| Document Loader | Description | Package/API |
|---|---|---|
| PDFLoader | Load and parse PDF files using pdf-parse | Package |
Common File Types
| Document Loader | Description | Package/API |
|---|---|---|
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
Specialized File Loaders
| Document Loader | Description | Package/API |
|---|---|---|
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
Web loaders
Webpages
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | ✅ | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | ❌ | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | ❌ | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | ✅ | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | ✅ | API |
| RecursiveUrlLoader | Recursively load webpages following links | ❌ | Package |
| Sitemap | Load all pages from a sitemap.xml | ✅ | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | ✅ | API |
| WebPDFLoader | Load PDF files in web environments | ✅ | Package |
Cloud Providers
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| S3 | Load files from AWS S3 buckets | ❌ | Package |
| Azure Blob Storage Container | Load all files from Azure Blob Storage container | ❌ | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | ❌ | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | ❌ | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | ✅ | Package |
Productivity Tools
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Notion API | Load Notion pages and databases via API | ✅ | API |
| Figma | Load Figma file data | ✅ | API |
| Confluence | Load pages from Confluence spaces | ❌ | API |
| GitHub | Load files from GitHub repositories | ✅ | API |
| GitBook | Load GitBook documentation pages | ✅ | Package |
| Jira | Load issues from Jira projects | ❌ | API |
| Airtable | Load records from Airtable bases | ✅ | API |
| Taskade | Load Taskade project data | ✅ | API |
Search & Data APIs
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | ✅ | API |
| SerpAPI | Load web search results from SerpAPI | ✅ | API |
| Apify Dataset | Load scraped data from Apify platform | ✅ | API |
Audio & Video
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| YouTube | Load YouTube video transcripts | ✅ | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | ✅ | API |
| Sonix | Transcribe audio files using Sonix API | ❌ | API |
Other
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Couchbase | Load documents from Couchbase database using SQL++ queries | ✅ | Package |
| LangSmith | Load datasets and traces from LangSmith | ✅ | API |
| Hacker News | Load Hacker News threads and comments | ✅ | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | ✅ | Package |
| College Confidential | Load college information from College Confidential | ✅ | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | ✅ | API |
All document loaders
Airtable
Apify Dataset
AssemblyAI Audio Transcript
Azure Blob Storage Container
Azure Blob Storage File
Blockchain Data
Browserbase
ChatGPT
Cheerio
College Confidential
Confluence
Couchbase
CSV
DirectoryLoader
DOCX
EPUB
Figma
FireCrawl
GitHub
GitBook
Google Cloud SQL for PostgreSQL
Google Cloud Storage
Hacker News
IMSDB
Jira
JSON
JSONLines
LangSmith
MultiFileLoader
Notion API
Notion Markdown
OpenAI Whisper Audio
PDFLoader
Playwright
PPTX
Puppeteer
RecursiveUrlLoader
S3
SearchAPI
SerpAPI
Sitemap
Sonix Audio
Spider
Subtitles
Taskade
Text
UnstructuredLoader
WebPDFLoader
YouTube
Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.