Azure Blob Storage is Microsoft’s object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data.
Azure Blob Storage is designed for:
- Serving images or documents directly to a browser.
- Storing files for distributed access.
- Streaming video and audio.
- Writing to log files.
- Storing data for backup and restore, disaster recovery, and archiving.
- Storing data for analysis by an on-premises or Azure-hosted service.
This guide covers loading documents from Azure Blob Storage. For more detailed documentation on the document loader, see the Azure Blob Storage Loader API Reference.
It is recommended to use this new loader over the previous AzureBlobStorageFileLoader and AzureBlobStorageContainerLoader from langchain_community. For detailed instructions on migrating to the new loader, refer to the migration guide.
Setup
Load from container
The AzureBlobStorageLoader loads all blobs from a given container in Azure Blob Storage and requires an account URL and a container name. The loader returns Document objects containing the blob content (decoded as UTF-8 by default) and metadata that includes the blob URL.
No explicit credential configuration is needed, because the loader uses DefaultAzureCredential, which automatically retrieves Microsoft Entra ID tokens based on your current environment.
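A minimal sketch of loading a whole container might look like the following. It assumes the loader is importable from `langchain_azure_storage.document_loaders`, that it takes the account URL and container name as its first two arguments, and that the account lives in the public Azure cloud (hence the `blob.core.windows.net` endpoint). The helper names `account_url_for` and `iter_container_documents` are hypothetical and exist only for illustration.

```python
from typing import Iterator


def account_url_for(storage_account: str) -> str:
    """Build the public-cloud Blob endpoint URL for a storage account name."""
    return f"https://{storage_account}.blob.core.windows.net"


def iter_container_documents(storage_account: str, container_name: str) -> Iterator:
    """Yield one Document per blob in the container (hypothetical helper)."""
    # Imported lazily so this module can be imported without the package installed.
    # Assumption: this is the import path of the new loader.
    from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

    loader = AzureBlobStorageLoader(account_url_for(storage_account), container_name)
    # lazy_load() yields documents one at a time instead of building a full list.
    yield from loader.lazy_load()


# Example call (needs network access and an identity DefaultAzureCredential can find):
# for doc in iter_container_documents("mystorageaccount", "my-container"):
#     print(doc.metadata, doc.page_content[:80])
```

Iterating lazily keeps memory use flat for large containers; call `list(...)` on the generator if you need every document at once.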
Load from container by blob name
You can also load documents from an explicit list of blob names. The loader then reads only the blobs you provide and skips the API call that lists the container's blobs.
Override default credentials
By default, the document loader uses DefaultAzureCredential. To override this, pass a different credential to the loader.
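One way to supply a different credential is sketched below, once with a SAS token and once with a managed identity. It assumes the loader accepts a `credential` keyword argument and, for the list of blob names described in the previous section, a `blob_names` keyword argument; both parameter names are assumptions to check against the API reference. `AzureSasCredential` comes from `azure-core` and `ManagedIdentityCredential` from `azure-identity`. The helper functions are hypothetical.

```python
from typing import Any, Optional, Sequence


def loader_kwargs(credential: Any, blob_names: Optional[Sequence[str]] = None) -> dict:
    """Collect optional loader arguments, leaving out blob_names when unset."""
    kwargs: dict = {"credential": credential}
    if blob_names is not None:
        kwargs["blob_names"] = list(blob_names)
    return kwargs


def loader_with_sas(account_url: str, container_name: str, sas_token: str,
                    blob_names: Optional[Sequence[str]] = None):
    """Build a loader that authenticates with a shared access signature."""
    # Lazy imports: the Azure packages are only needed when this is called.
    from azure.core.credentials import AzureSasCredential
    from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

    # Assumption: `credential` and `blob_names` are accepted keyword arguments.
    return AzureBlobStorageLoader(
        account_url,
        container_name,
        **loader_kwargs(AzureSasCredential(sas_token), blob_names),
    )


def loader_with_managed_identity(account_url: str, container_name: str):
    """Build a loader that authenticates with the host's managed identity."""
    from azure.identity import ManagedIdentityCredential
    from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

    return AzureBlobStorageLoader(
        account_url,
        container_name,
        **loader_kwargs(ManagedIdentityCredential()),
    )
```

A SAS token suits short-lived, scoped access, while a managed identity avoids handling secrets when the code runs on Azure compute.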
Customize blob content parsing
Currently, the loader parses each blob by returning its content as a single Document object decoded as UTF-8, regardless of the file type. For file types that require specific parsing (e.g., PDFs or CSVs), or when you want to control the document content format, you can pass the loader_factory argument. It accepts an existing document loader (e.g., PyPDFLoader or CSVLoader) or a custom loader.
This works by downloading the blob content to a temporary file. The loader_factory then gets called with the filepath to use the specified document loader to load/parse the file and return the Document object(s).
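The flow described above can be illustrated with the standard library alone. This is a sketch of the idea, not the library's implementation: `UpperCaseTextLoader` and `parse_blob_bytes` are hypothetical, and the toy loader returns plain strings rather than Document objects.

```python
import os
import tempfile
from typing import Callable, List


class UpperCaseTextLoader:
    """Hypothetical stand-in for a document loader constructed from a file path."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def load(self) -> List[str]:
        # Read the temporary file and return its parsed content.
        with open(self.file_path, encoding="utf-8") as handle:
            return [handle.read().upper()]


def parse_blob_bytes(content: bytes,
                     loader_factory: Callable[[str], UpperCaseTextLoader]) -> List[str]:
    """Write blob bytes to a temporary file, then parse it via loader_factory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        file_path = os.path.join(tmp_dir, "blob")
        with open(file_path, "wb") as handle:
            handle.write(content)
        # The factory receives only the file path and returns a loader for it.
        return loader_factory(file_path).load()
```

Because the factory is called with nothing but a file path, any callable that takes a path and returns a loader fits, which is why an existing loader class can be passed directly.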
For example, to parse blobs as PDFs, pass PyPDFLoader as the loader_factory.
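A sketch of the PDF case follows. It assumes `PyPDFLoader` is importable from `langchain_community.document_loaders` (it also needs the `pypdf` package) and that the loader accepts `loader_factory` and `blob_names` keyword arguments. The helpers `pdf_blob_names` and `pdf_loader` are hypothetical. Filtering names up front is a design choice for this sketch, so that the PDF parser is never handed a non-PDF blob.

```python
from typing import List, Sequence


def pdf_blob_names(names: Sequence[str]) -> List[str]:
    """Keep only blob names that end in .pdf, ignoring case."""
    return [name for name in names if name.lower().endswith(".pdf")]


def pdf_loader(account_url: str, container_name: str, names: Sequence[str]):
    """Build a loader that parses the named PDF blobs with PyPDFLoader."""
    # Lazy imports so this module can be imported without the optional packages.
    from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
    from langchain_community.document_loaders import PyPDFLoader

    return AzureBlobStorageLoader(
        account_url,
        container_name,
        blob_names=pdf_blob_names(names),  # assumption: keyword is `blob_names`
        loader_factory=PyPDFLoader,        # called with each temporary file path
    )


# Example call (needs Azure access; the account and container are placeholders):
# loader = pdf_loader(
#     "https://<storage-account-name>.blob.core.windows.net",
#     "<container-name>",
#     ["report.pdf", "notes.txt"],
# )
# for doc in loader.lazy_load():
#     print(doc.metadata)
```

With PyPDFLoader as the factory, a single blob can produce several Document objects rather than one.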