Set the GITHUB_ACCESS_TOKEN environment variable to a GitHub access token to increase the rate limit and access private repositories.
Setup
The GitHub loader requires the ignore npm package as a peer dependency. Install it with npm, for example by running npm install ignore.
Usage
Using .gitignore Syntax
To ignore specific files, pass an ignorePaths array of .gitignore-style patterns to the constructor.
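As a minimal sketch of what such an options object might look like: the interface below is a local stand-in written for this example, not the loader's own type, and only the ignorePaths and recursive names come from this page. The patterns themselves are illustrative.

```typescript
// Local stand-in for the loader's options type (an assumption for this sketch).
interface IgnoreExampleOptions {
  recursive?: boolean; // parameter named in the submodules section
  ignorePaths?: string[]; // .gitignore-style patterns
}

const ignoreOptions: IgnoreExampleOptions = {
  recursive: true,
  // Skip every Markdown file and everything under a dist/ directory.
  ignorePaths: ["*.md", "dist/"],
};

// The object would then be passed to the loader's constructor alongside
// the repository URL; check the loader's API reference for the exact signature.
```

Because the patterns follow .gitignore syntax, a trailing slash such as dist/ matches a directory, while *.md matches Markdown files at any depth.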
Using a Different GitHub Instance
You may want to target a different GitHub instance than github.com, e.g. if you have a GitHub Enterprise instance for your company.
For this you need two additional parameters:
baseUrl - the base URL of your GitHub instance, so that the githubUrl matches <baseUrl>/<owner>/<repo>/...
apiUrl - the URL of the API endpoint of your GitHub instance
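The relationship between baseUrl and githubUrl can be sketched as follows. The host github.example.com is hypothetical, the /api/v3 path is the usual GitHub Enterprise Server convention and should be checked against your own instance, and parseRepoUrl is a helper written for this example rather than part of the loader.

```typescript
// Hypothetical enterprise instance, used purely for illustration.
const enterpriseBaseUrl = "https://github.example.com";
// Assumed API endpoint; GitHub Enterprise Server typically serves it under /api/v3.
const enterpriseApiUrl = "https://github.example.com/api/v3";

// Hypothetical helper: checks that githubUrl has the shape
// <baseUrl>/<owner>/<repo>/... and returns the owner and repo parts.
function parseRepoUrl(
  githubUrl: string,
  baseUrl: string
): { owner: string; repo: string } {
  const prefix = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  if (!githubUrl.startsWith(prefix)) {
    throw new Error(`githubUrl must start with ${prefix}`);
  }
  const [owner, repo] = githubUrl.slice(prefix.length).split("/");
  if (!owner || !repo) {
    throw new Error("githubUrl must contain both an owner and a repository");
  }
  return { owner, repo };
}

// The two additional parameters, ready to be merged into the loader options.
const enterpriseOptions = {
  baseUrl: enterpriseBaseUrl,
  apiUrl: enterpriseApiUrl,
};

const parsedRepo = parseRepoUrl(
  "https://github.example.com/acme/widgets/tree/main",
  enterpriseOptions.baseUrl
);
```

A URL that still points at github.com would fail the prefix check here, which mirrors the requirement that the githubUrl match the configured baseUrl.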
Dealing with Submodules
If your repository has submodules, you have to decide whether the loader should follow them. You can control this with the boolean processSubmodules parameter. By default, submodules are not processed.
Note that processing submodules works only in conjunction with setting the recursive parameter to true.
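The dependency between the two parameters can be made explicit with a small guard. This is a sketch under assumptions: the interface and the checkSubmoduleOptions helper are hypothetical names introduced here, and the loader itself may react differently to an inconsistent combination.

```typescript
// Local stand-in for the two relevant loader options.
interface SubmoduleExampleOptions {
  recursive?: boolean;
  processSubmodules?: boolean;
}

// Hypothetical helper: returns a warning message when submodule processing
// is requested without recursion, and null when the options are consistent.
function checkSubmoduleOptions(opts: SubmoduleExampleOptions): string | null {
  if (opts.processSubmodules === true && opts.recursive !== true) {
    return "processSubmodules only takes effect when recursive is true";
  }
  return null;
}

// Consistent configuration: recurse into directories and follow submodules.
const submoduleOptions: SubmoduleExampleOptions = {
  recursive: true,
  processSubmodules: true,
};
const submoduleWarning = checkSubmoduleOptions(submoduleOptions);
```

Running a check like this before constructing the loader surfaces the misconfiguration early instead of leaving submodules silently unprocessed.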
Streaming Large Repositories
When you need to process large repositories in a memory-efficient manner, you can use the loadAsStream method to asynchronously stream documents from the entire GitHub repository.
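The consumption pattern can be sketched without the loader itself, assuming loadAsStream returns an async iterable of documents. Here fakeLoadAsStream is a stand-in generator written for this example, and the ExampleDoc shape (pageContent plus a metadata.source field) is an assumption about what a loaded document looks like.

```typescript
// Assumed document shape for this sketch.
interface ExampleDoc {
  pageContent: string;
  metadata: { source: string };
}

// Stand-in for loader.loadAsStream(): yields documents one at a time.
async function* fakeLoadAsStream(): AsyncGenerator<ExampleDoc> {
  yield { pageContent: "# Readme", metadata: { source: "README.md" } };
  yield { pageContent: "export {}", metadata: { source: "src/index.ts" } };
}

// Processes each document as it arrives, so only one document needs to be
// held in memory at a time rather than the whole repository.
async function countCharacters(
  stream: AsyncIterable<ExampleDoc>
): Promise<number> {
  let total = 0;
  for await (const doc of stream) {
    total += doc.pageContent.length;
  }
  return total;
}

const totalCharacters: Promise<number> = countCharacters(fakeLoadAsStream());
```

With the real loader, the same for await loop would iterate over the result of loadAsStream, and each document could be indexed or written out before the next one is fetched.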