YouTube is an online video sharing and social media platform owned by Google. This notebook covers how to load documents from
YouTube transcripts.
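A minimal sketch of the basic call, assuming the langchain_community package and the youtube-transcript-api dependency are installed. The helper name load_transcript is hypothetical, and the video URL is supplied by the caller.

```python
def load_transcript(url: str):
    """Return the transcript of one YouTube video as LangChain Document objects."""
    # Imported inside the function so that defining the helper needs neither
    # the package nor network access; calling it needs both.
    from langchain_community.document_loaders import YoutubeLoader

    loader = YoutubeLoader.from_youtube_url(url, add_video_info=False)
    return loader.load()
```

Calling load_transcript with a video URL returns a list of Document objects whose page_content holds the transcript text.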
Add video info
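The sketch below shows one way to request video details alongside the transcript. It assumes the optional pytube package is installed in addition to langchain_community, since the loader relies on it when add_video_info is enabled. The helper name load_transcript_with_info is hypothetical.

```python
def load_transcript_with_info(url: str):
    """Load a transcript and ask the loader to attach video details as metadata."""
    # Lazy import: defining the helper does not require the package,
    # but calling it requires langchain_community, pytube and network access.
    from langchain_community.document_loaders import YoutubeLoader

    loader = YoutubeLoader.from_youtube_url(url, add_video_info=True)
    return loader.load()
```

With add_video_info=True, each returned Document is expected to carry extra metadata about the video, such as its title and author.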
Add language preferences
language param: A list of language codes in descending priority; defaults to en.
translation param: A translation preference; an available transcript can be translated into your preferred language.
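The two parameters above can be sketched as follows. The pick_language function is purely illustrative (it is not the library's implementation) and only shows what "descending priority" means. The helper names and the default language codes are assumptions made for this example.

```python
from typing import Optional, Sequence


def pick_language(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Illustration only: return the first preferred code that is available."""
    for code in preferred:
        if code in available:
            return code
    return None


def load_with_language_preferences(
    url: str,
    languages: Sequence[str] = ("en", "id"),  # assumed example priorities
    translation: Optional[str] = "en",  # assumed example target language
):
    """Load a transcript, preferring `languages` in order and translating the result."""
    # Lazy import: only needed when the helper is actually called.
    from langchain_community.document_loaders import YoutubeLoader

    loader = YoutubeLoader.from_youtube_url(
        url,
        add_video_info=False,
        language=list(languages),
        translation=translation,
    )
    return loader.load()
```

Here a video that offers only an Indonesian transcript would still be loaded, because id appears later in the priority list, and the result would then be translated to English.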
Get transcripts as timestamped chunks
Get one or more Document objects, each containing a chunk of the video transcript. The length of the chunks, in seconds, may be specified. Each chunk’s metadata includes a URL of the video on YouTube, which will start the video at the beginning of the specific chunk.
transcript_format param: One of the langchain_community.document_loaders.youtube.TranscriptFormat values. In this case, TranscriptFormat.CHUNKS.
chunk_size_seconds param: An integer number of video seconds to be represented by each chunk of transcript data. Default is 120 seconds.
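The sketch below has two parts. chunk_snippets is a simplified, self-contained illustration of grouping timed transcript snippets into fixed-length windows and building a URL that starts at each window; it is not the library's own algorithm, and the snippet dictionary layout is an assumption. load_chunked shows the loader call with the two parameters described above. Both helper names are hypothetical.

```python
from typing import Dict, List


def chunk_snippets(
    snippets: List[Dict], video_id: str, chunk_size_seconds: int = 120
) -> List[Dict]:
    """Simplified illustration: group timed snippets into fixed-length windows."""
    chunks: Dict[int, List[str]] = {}
    for snippet in snippets:
        # Window start = snippet start rounded down to a multiple of the chunk size.
        start = int(snippet["start"] // chunk_size_seconds) * chunk_size_seconds
        chunks.setdefault(start, []).append(snippet["text"])
    return [
        {
            "text": " ".join(texts),
            "start_seconds": start,
            # The t parameter makes YouTube start playback at this offset.
            "source": f"https://www.youtube.com/watch?v={video_id}&t={start}s",
        }
        for start, texts in sorted(chunks.items())
    ]


def load_chunked(url: str, chunk_size_seconds: int = 30):
    """Load a transcript as timestamped chunks via the loader itself."""
    # Lazy imports: only needed when the helper is actually called.
    from langchain_community.document_loaders import YoutubeLoader
    from langchain_community.document_loaders.youtube import TranscriptFormat

    loader = YoutubeLoader.from_youtube_url(
        url,
        add_video_info=False,
        transcript_format=TranscriptFormat.CHUNKS,
        chunk_size_seconds=chunk_size_seconds,
    )
    return loader.load()
```

A smaller chunk_size_seconds yields more, shorter Document objects, which can make the per-chunk URLs more precise when linking back to the video.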
YouTube loader from Google Cloud
Prerequisites
- Create a Google Cloud project or use an existing project
- Enable the YouTube API
- Authorize credentials for desktop app
pip install -U google-api-python-client google-auth-httplib2 google-auth-oauthlib youtube-transcript-api
🧑 Instructions for ingesting your Google Docs data
By default, the GoogleDriveLoader expects the credentials.json file to be ~/.credentials/credentials.json, but this is configurable using the credentials_file keyword argument. Same thing with token.json. Note that token.json will be created automatically the first time you use the loader.
GoogleApiYoutubeLoader can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL:
Note: depending on your setup, service_account_path may need to be set. See here for more details.
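A minimal sketch of constructing the Google Cloud based loader, assuming the packages from the pip command above plus langchain_community are installed and that an OAuth credentials file exists at the path you pass in. The helper name build_youtube_loader and its argument handling are hypothetical; GoogleApiClient and GoogleApiYoutubeLoader are used here with a channel name or a list of video IDs.

```python
from pathlib import Path
from typing import List, Optional


def build_youtube_loader(
    credentials_path: Path,
    channel_name: Optional[str] = None,
    video_ids: Optional[List[str]] = None,
):
    """Build a GoogleApiYoutubeLoader for a channel or for specific video IDs."""
    # Validate first, so a bad call fails before any import or network access.
    if channel_name is None and not video_ids:
        raise ValueError("Provide either channel_name or video_ids.")

    # Lazy import: only needed once the arguments are known to be usable.
    from langchain_community.document_loaders import (
        GoogleApiClient,
        GoogleApiYoutubeLoader,
    )

    # Creating the client may start the OAuth flow the first time it runs.
    client = GoogleApiClient(credentials_path=credentials_path)

    if channel_name is not None:
        return GoogleApiYoutubeLoader(
            google_api_client=client,
            channel_name=channel_name,
            captions_language="en",
        )
    return GoogleApiYoutubeLoader(
        google_api_client=client,
        video_ids=video_ids,
        add_video_info=True,
    )
```

Calling load() on the returned loader fetches the captions for the selected channel or videos.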