Apify Dataset is a scalable, append-only storage with sequential access, built for storing structured web scraping results, such as a list of products or Google SERPs, and then exporting them to various formats like JSON, CSV, or Excel. Datasets are mainly used to save the results of Apify Actors, which are serverless cloud programs for various web scraping, crawling, and data extraction use cases.
This notebook shows how to load Apify datasets into LangChain.
Prerequisites
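You need to have an existing dataset on the Apify platform. This example shows how to load a dataset produced by the Website Content Crawler.
First, import ApifyDatasetLoader into your source code.
Then provide a function that maps the fields of each Apify dataset item to the LangChain Document format.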
For example, if your dataset items are structured like this:
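A hypothetical item, with `url` and `text` fields assumed here purely for illustration, could look like this:

```python
# Hypothetical dataset item; the field names "url" and "text" are assumptions,
# so adjust them to match what your Actor actually stores.
dataset_item = {
    "url": "https://example.com/page",
    "text": "Example page text extracted by the crawler.",
}
```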
The mapping function converts such items to the LangChain Document format, so that you can use them further with any LLM (e.g. for question answering).
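A minimal sketch of such a mapping function and the loader setup follows. It assumes hypothetical `text` and `url` item fields and the `langchain-community` and `apify-client` packages; the helper names `item_to_document` and `build_loader` are illustrative, and the fallback `Document` class exists only so the snippet runs without LangChain installed.

```python
try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in so the sketch runs without LangChain; not the real class.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def item_to_document(dataset_item: dict) -> Document:
    """Map one dataset item to a Document (field names are assumptions)."""
    return Document(
        page_content=dataset_item["text"],
        metadata={"source": dataset_item["url"]},
    )


def build_loader(dataset_id: str):
    """Create a loader for an existing Apify dataset.

    Imported lazily because it needs langchain-community and apify-client.
    """
    from langchain_community.document_loaders import ApifyDatasetLoader

    return ApifyDatasetLoader(
        dataset_id=dataset_id,
        dataset_mapping_function=item_to_document,
    )


# Try the mapping on a hypothetical item without touching the network.
sample_item = {
    "url": "https://example.com/page",
    "text": "Example page text extracted by the crawler.",
}
sample_doc = item_to_document(sample_item)
```

With a real dataset ID in place of the placeholder, `build_loader("your-dataset-id").load()` should return one Document per dataset item.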
An example with question answering
In this example, we use data from a dataset to answer a question.
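One simple way to do this, sketched below, is to place the loaded documents directly into a prompt. This is an assumption about the approach rather than the notebook's own code: `build_qa_prompt` is a hypothetical helper, the sample document is made up, and the call to the LLM itself is left out. It also assumes the documents fit within the model's context window.

```python
try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in so the sketch runs without LangChain; not the real class.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def build_qa_prompt(question: str, docs: list) -> str:
    """Combine loaded documents and a question into one prompt string."""
    context = "\n\n".join(
        f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
        for doc in docs
    )
    return (
        "Answer the question using only the context below, "
        "and cite the source.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical document standing in for the loader's output.
docs = [
    Document(
        page_content="Example page text extracted by the crawler.",
        metadata={"source": "https://example.com/page"},
    )
]
prompt = build_qa_prompt("What does the page say?", docs)
```

The resulting `prompt` string can then be sent to whichever chat model you use. For larger datasets, a vector store index that retrieves only the relevant documents is usually a better fit than passing everything in one prompt.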