Apify Actors are cloud programs designed for a wide range of web scraping, crawling, and data extraction tasks. Actors automate data gathering from the web, letting users extract, process, and store information efficiently. Typical uses include scraping e-commerce sites for product details, monitoring price changes, and gathering search engine results. Actors integrate with Apify Datasets, so the structured data they collect can be stored, managed, and exported in formats like JSON, CSV, or Excel for further analysis or use.
Overview
This notebook walks you through using Apify Actors with LangChain to automate web scraping and data extraction. The langchain-apify package integrates Apify’s cloud-based tools with LangChain agents, enabling efficient data collection and processing for AI applications.
Setup
This integration lives in the langchain-apify package. The package can be installed using pip.
Prerequisites
- Apify account: Register your free Apify account here.
- Apify API token: Learn how to get your API token in the Apify documentation.
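The setup steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's original cell: the helper `ensure_env` is a hypothetical name introduced here for illustration, and it assumes the integration reads the token from an `APIFY_API_TOKEN` environment variable, which is worth confirming against the package documentation.

```python
# Install the package first (shell command): pip install langchain-apify
import os
from getpass import getpass


def ensure_env(name: str, prompt: bool = False) -> bool:
    """Return True if the environment variable `name` is set.

    When it is missing and `prompt` is True, ask for the value
    interactively (hidden input) and store it for this process only.
    """
    if os.environ.get(name):
        return True
    if prompt:
        os.environ[name] = getpass(f"Enter {name}: ")
        return bool(os.environ[name])
    return False


# Assumption: the integration looks up the token under this variable name.
token_ready = ensure_env("APIFY_API_TOKEN")
print("Apify token configured:", token_ready)
```

In an interactive notebook, calling `ensure_env("APIFY_API_TOKEN", prompt=True)` asks for the token instead of hard-coding it in a cell.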
Instantiation
Here we instantiate the ApifyActorsTool so that it can call the RAG Web Browser Apify Actor. This Actor provides web browsing functionality for AI and LLM applications, similar to the web browsing feature in ChatGPT. Any Actor from the Apify Store can be used in this way.
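A minimal sketch of the instantiation described above. It assumes the RAG Web Browser Actor is published under the ID `apify/rag-web-browser` and that the tool constructor accepts an Actor ID as its first argument; the wrapper `make_tool` is a hypothetical helper that skips creation when no token is configured, so the snippet also runs without credentials.

```python
import os

# Assumption: Actor ID of the RAG Web Browser Actor in the Apify Store.
ACTOR_ID = "apify/rag-web-browser"


def make_tool(actor_id: str = ACTOR_ID):
    """Create the Actor tool, or return None when no API token is set."""
    if not os.environ.get("APIFY_API_TOKEN"):
        return None
    # Imported lazily so the module loads even without the package installed.
    from langchain_apify import ApifyActorsTool

    return ApifyActorsTool(actor_id)


tool = make_tool()
print("Tool created:", tool is not None)
```

Passing a different Actor ID to `make_tool` would target another Actor from the Apify Store in the same way.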
Invocation
The ApifyActorsTool takes a single argument, run_input, a dictionary that is passed as the run input to the Actor. The run input schema is documented in the input section of the Actor's details page; see the RAG Web Browser input schema.
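The call shape described above can be sketched like this. The field names `query` and `maxResults` are assumptions about the RAG Web Browser input schema and should be checked against the Actor's details page; `build_tool_input` is a hypothetical helper, and the Actor is only invoked when an `APIFY_API_TOKEN` is present.

```python
import os


def build_tool_input(query: str, max_results: int = 2) -> dict:
    """Wrap Actor input fields in the single `run_input` argument."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # Assumed schema fields for the RAG Web Browser Actor.
    return {"run_input": {"query": query, "maxResults": max_results}}


tool_input = build_tool_input("what is apify?")
print(tool_input)

# Only call the Actor when credentials are available.
if os.environ.get("APIFY_API_TOKEN"):
    from langchain_apify import ApifyActorsTool

    tool = ApifyActorsTool("apify/rag-web-browser")
    print(tool.invoke(tool_input))
```

Keeping the payload construction in a small function makes it easy to validate inputs before paying for an Actor run.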
Chaining
We can provide the created tool to an agent. When asked to search for information, the agent will call the Apify Actor, which will search the web and then retrieve the search results.
API reference
For more information on how to use this integration, see the git repository or the Apify integration documentation.