Overview
Integration details
| Class | Package | Local | Serializable | JS support |
|---|---|---|---|---|
| PowerScaleDocumentLoader | powerscale-rag-connector | ✅ | ❌ | ❌ |
| PowerScaleUnstructuredLoader | powerscale-rag-connector | ✅ | ❌ | ❌ |
Loader features
| Source | Document Lazy Loading | Native Async Support |
|---|---|---|
| PowerScaleDocumentLoader | ✅ | ✅ |
| PowerScaleUnstructuredLoader | ✅ | ✅ |
Setup
This document loader requires the use of a Dell PowerScale system with MetadataIQ enabled. Additional information can be found on our GitHub page: https://github.com/dell/powerscale-rag-connector
Installation
The document loader lives in an external pip package, powerscale-rag-connector, and can be installed using standard tooling such as pip.
Initialization
Now we can instantiate the document loader.
Generic Document Loader
Our generic document loader, PowerScaleDocumentLoader, can be used to incrementally load all files from PowerScale.
UnstructuredLoader Loader
Optionally, the PowerScaleUnstructuredLoader can be used to locate the changed files and automatically process them, producing elements of each source file. This is done using LangChain’s UnstructuredLoader class.
The loader parameters are:
- es_host_url is the endpoint of the MetadataIQ Elasticsearch database
- es_index_index is the name of the index where PowerScale writes its file system metadata
- es_api_key is the encoded version of your Elasticsearch API key
- folder_path is the path on PowerScale to be queried for changes
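One way to keep these four values out of source code is to read them from environment variables and hand the result to the loader as keyword arguments. The sketch below is illustrative only: the POWERSCALE_* variable names and the loader_settings_from_env helper are hypothetical, and the keyword names are copied from the parameter list above, so they should be checked against the installed package before use.

```python
import os

# Hypothetical environment variable names mapped to the parameter names
# as they are written in the list above (verify them against the package).
ENV_TO_PARAM = {
    "POWERSCALE_ES_HOST_URL": "es_host_url",
    "POWERSCALE_ES_INDEX": "es_index_index",
    "POWERSCALE_ES_API_KEY": "es_api_key",
    "POWERSCALE_FOLDER_PATH": "folder_path",
}


def loader_settings_from_env(env=None):
    """Collect loader settings from the environment, failing early on gaps."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in ENV_TO_PARAM if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(sorted(missing))}")
    return {param: env[name] for name, param in ENV_TO_PARAM.items()}
```

The returned dictionary could then be unpacked into either loader class, assuming its constructor accepts these keyword names.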
Load
Internally, all communication with PowerScale and MetadataIQ is asynchronous, and the load and lazy load methods return a Python generator. We recommend using the lazy load function.
Returned Object
Both document loaders keep track of which files were previously returned to your application. When called again, the document loader returns only the files that are new or modified since your previous run.
- The metadata fields in the returned Document contain the path on PowerScale of the modified file. Use this path to read the data via NFS (or S3) and process it in your application (e.g., create chunks and embeddings).
- The source field is the path on PowerScale and not necessarily on your local system (depending on your mount strategy); OneFS expresses the entire storage system as a single tree rooted at /ifs.
- The change_types property tells you what change occurred since the last run, e.g., new, modified, or delete.
Use change_types to add, update, or delete entries in your chunk and vector store.
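As a rough illustration of that dispatch, the sketch below applies one returned document to an in-memory dictionary that stands in for a chunk or vector store. It rests on several assumptions: change_types is read from the document metadata, its values are the strings new, modified, and delete mentioned above, and the Document class here is a minimal stand-in rather than LangChain's own. The apply_change helper is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for a LangChain Document (content plus metadata)."""
    page_content: str = ""
    metadata: dict = field(default_factory=dict)


def apply_change(store, doc):
    """Update a dict keyed by source path and return the action taken."""
    source = doc.metadata["source"]
    changes = doc.metadata.get("change_types", [])
    if isinstance(changes, str):  # tolerate a single value as well as a list
        changes = [changes]
    if "delete" in changes:
        store.pop(source, None)  # removing an unknown path is a no-op
        return "deleted"
    if "new" in changes or "modified" in changes:
        action = "updated" if source in store else "added"
        store[source] = doc.page_content
        return action
    return "skipped"
```

In a real pipeline the dictionary writes would be replaced by calls that re-chunk the file and upsert or delete the matching vectors.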
When using PowerScaleUnstructuredLoader, the page_content field is filled with data from the Unstructured Loader.
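Because the source field holds a OneFS path rooted at /ifs, an application that mounts PowerScale over NFS usually has to translate it into a local path before reading the file. The following is a minimal sketch of that translation; the to_local_path helper and the /mnt/powerscale mount point are hypothetical and depend on your mount strategy.

```python
from pathlib import PurePosixPath


def to_local_path(source, mount_point, export_root="/ifs"):
    """Map a OneFS path under export_root onto a local mount point.

    Raises ValueError if source does not live under export_root.
    """
    relative = PurePosixPath(source).relative_to(export_root)
    return str(PurePosixPath(mount_point) / relative)


# The whole /ifs tree mounted at /mnt/powerscale:
print(to_local_path("/ifs/data/report.pdf", "/mnt/powerscale"))
# Only /ifs/data exported and mounted at /mnt/data:
print(to_local_path("/ifs/data/report.pdf", "/mnt/data", export_root="/ifs/data"))
```

The first call prints /mnt/powerscale/data/report.pdf and the second prints /mnt/data/report.pdf.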
Lazy Load
The lazy load function returns the same Document as the load function, with all the properties described above. Because the documents are produced by a Python generator, they can be processed as they arrive instead of being collected into memory first.
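One common way to consume a generator like this is in fixed-size batches, so that embedding or indexing calls can be grouped without materializing every document. The sketch below is an illustration under stated assumptions: batched is a hypothetical helper, and fake_lazy_load is a stand-in generator used only so the example runs without a PowerScale system.

```python
from itertools import islice


def batched(documents, size):
    """Yield lists of up to `size` items from any iterator or generator."""
    iterator = iter(documents)
    # islice pulls at most `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


def fake_lazy_load(count):
    """Stand-in generator; a real loader's lazy load call would go here."""
    for i in range(count):
        yield f"/ifs/data/file{i}.txt"


for batch in batched(fake_lazy_load(5), 2):
    print(len(batch), batch[0])
```

With five items and a batch size of two, the loop runs three times, and the last batch holds a single item.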
Additional Examples
Additional code, including full working examples, can be found on our public GitHub page: https://github.com/dell/powerscale-rag-connector/tree/main/examples
- PowerScale LangChain Document Loader - Working example of our standard document loader
- PowerScale LangChain Unstructured Loader - Working example of our standard document loader using unstructured loader for chunking and embedding
- PowerScale NVIDIA Retriever Microservice Loader - Working example of our document loader with NVIDIA NeMo Retriever microservices for chunking and embedding
API reference
For detailed documentation of all PowerScale Document Loader features and configurations, head to the GitHub page: https://github.com/dell/powerscale-rag-connector/