Dedoc is an open-source library/service that extracts text, tables, attached files and document structure (e.g., titles, list items, etc.) from files of various formats.
Dedoc supports DOCX, XLSX, PPTX, EML, HTML, PDF, images and more.
The full list of supported formats is given in the Dedoc documentation.
Installation and Setup
Dedoc library
You can install Dedoc using pip.
In this case, you will also need to install its dependencies;
see the Dedoc installation documentation for more information.
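The package is published on PyPI as dedoc, so the install command is pip install dedoc. As a minimal sketch, the snippet below checks from Python whether the package can be imported in the current environment. The helper name is_installed is hypothetical and is not part of Dedoc or LangChain; it only tells you that the Python package is present, not that Dedoc's system-level dependencies are.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment.

    Hypothetical helper for illustration; it does not import the package,
    it only asks the import system whether a spec for it exists.
    """
    return importlib.util.find_spec(package) is not None


# After `pip install dedoc`, is_installed("dedoc") should be True.
# A missing package simply yields False rather than raising.
print("dedoc installed:", is_installed("dedoc"))
```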
Dedoc API
If you are going to use the Dedoc API, you don't need to install the dedoc library.
In this case, you should run the Dedoc service, for example as a Docker container (see the Dedoc documentation for more details).
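One way to start the service is sketched below. It assumes the Docker image is named dedocproject/dedoc and that the service listens on port 1231; both values are assumptions taken from common Dedoc setups, so check them against the Dedoc documentation. The function names are hypothetical helpers that only build and run the equivalent docker pull and docker run commands.

```python
import subprocess
from typing import List


def dedoc_docker_commands(
    image: str = "dedocproject/dedoc",  # assumed image name
    port: int = 1231,                   # assumed service port
) -> List[List[str]]:
    """Build the docker commands that pull the image and run the service."""
    return [
        ["docker", "pull", image],
        ["docker", "run", "-p", f"{port}:{port}", image],
    ]


def start_dedoc_service() -> None:
    """Run the commands in order; `docker run` stays in the foreground."""
    for command in dedoc_docker_commands():
        subprocess.run(command, check=True)
```

Calling start_dedoc_service() requires a working Docker installation and blocks until the container stops, so it is usually run in a separate terminal or process.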
Document Loader
- For handling files of any format (supported by Dedoc), you can use DedocFileLoader.
- For handling PDF files (with or without a textual layer), you can use DedocPDFLoader.
- For handling files of any format without installing the library, you can use the Dedoc API with DedocAPIFileLoader.
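The three loaders can be used roughly as follows. This is a sketch that assumes the langchain-community package is installed and exports DedocFileLoader, DedocPDFLoader and DedocAPIFileLoader from langchain_community.document_loaders. The wrapper function names are hypothetical, and the default URL assumes a Dedoc service listening on port 1231. Imports are done inside each function so that the API-based path can be used without the dedoc library installed.

```python
from typing import Any, List


def load_any_file(path: str, **kwargs: Any) -> List[Any]:
    """Parse a file of any Dedoc-supported format with the local dedoc library."""
    from langchain_community.document_loaders import DedocFileLoader

    return DedocFileLoader(path, **kwargs).load()


def load_pdf(path: str, **kwargs: Any) -> List[Any]:
    """Parse a PDF, with or without a textual layer, using the local dedoc library."""
    from langchain_community.document_loaders import DedocPDFLoader

    return DedocPDFLoader(path, **kwargs).load()


def load_via_api(path: str, url: str = "http://0.0.0.0:1231", **kwargs: Any) -> List[Any]:
    """Parse a file by sending it to a running Dedoc service (no local dedoc needed).

    The default URL is an assumption; point it at wherever your service runs.
    """
    from langchain_community.document_loaders import DedocAPIFileLoader

    return DedocAPIFileLoader(path, url=url, **kwargs).load()


# Example call (the file path is hypothetical):
#   docs = load_any_file("./example_data/sample.docx")
#   print(docs[0].page_content[:100])
```

Each function returns a list of LangChain Document objects; extra keyword arguments are passed through to the loader unchanged, so loader options can be set without changing the wrappers.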