Polaris AI DataInsight is a document parser that extracts document elements (text, images, complex tables, charts, etc.) from various file formats into structured JSON, making them easy to integrate into RAG systems.

Installation

Install the langchain-polaris-ai-datainsight package:
pip install langchain-polaris-ai-datainsight

Environment Setup

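Make sure the POLARIS_AI_DATA_INSIGHT_API_KEY environment variable is set before you create the loader. The Usage example below prompts for the key with getpass and stores it in os.environ for the current process.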
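If you would rather fail fast than be prompted (for example in a batch job), you can check for the key up front. This is a minimal sketch; require_api_key and API_KEY_VAR are hypothetical names that are not part of langchain-polaris-ai-datainsight, and the helper only checks that the variable is set, not that the key is valid:

```python
import os
from typing import Mapping, Optional

API_KEY_VAR = "POLARIS_AI_DATA_INSIGHT_API_KEY"


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error if it is missing.

    Hypothetical helper: it only checks that the variable is set and non-empty;
    it does not validate the key against the Polaris AI DataInsight service.
    """
    env = os.environ if env is None else env
    key = env.get(API_KEY_VAR, "").strip()  # ignore stray whitespace from copy-paste
    if not key:
        raise RuntimeError(f"{API_KEY_VAR} is not set; export it before creating the loader.")
    return key
```

Calling require_api_key() at the top of a script surfaces a missing key with a readable message before any loader code runs.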

Usage

import getpass
import os

os.environ["POLARIS_AI_DATA_INSIGHT_API_KEY"] = getpass.getpass(
    "Enter your PolarisAIDataInsight API key: "
)

from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

loader = PolarisAIDataInsightLoader(
    file_path="example_data/polaris_ai_example.docx",
    resources_dir="example_data/tmp",
    mode="page",  # one of "element", "page", or "single" (default: "single")
)

docs = loader.load()  # returns a list; loader.lazy_load() returns an iterator instead

for doc in docs[:3]:
    print(" --------- < Page Content > --------- ")
    print(doc.page_content)
    print(" --------- < Metadata > --------- ")
    print(doc.metadata)
    print("\n")
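The comment on loader.load() mentions lazy_load() as an alternative. In LangChain document loaders, lazy_load() returns an iterator rather than a list, so the docs[:3] slice only works with load(). A minimal sketch of taking the first few documents lazily with itertools.islice follows; take and fake_lazy_load are hypothetical helpers (fake_lazy_load stands in for the loader so the snippet runs without an API key or the sample file), and with the real loader you would pass loader.lazy_load() to take() instead:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(items: Iterable[T], n: int) -> List[T]:
    """Return at most the first n items of any iterable, such as loader.lazy_load()."""
    return list(islice(items, n))


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(): yields one string per page."""
    for page_number in range(1, 6):
        yield f"page {page_number}"


for doc in take(fake_lazy_load(), 3):
    print(doc)  # page 1, page 2, page 3
```

The generator stays suspended after the third item, so the remaining documents are never requested from the iterator.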
Running the loader example prints the extracted content and metadata of the first three documents (one per page in "page" mode):
--------- < Page Content > ---------
2025 Seed Program Application

I. Funding Information by Track

1. Beginning and Advanced Track Comparison Overview

<table><tbody><tr><td>Category</td><td>Beginning Track*</td><td>Advanced Track*</td></tr><tr><td>Funding target</td><td>A university located outside Korea that has a Central Grant Management Department, an existing Korean Studies infrastructure, and plans to establish an education foundation.</td><td>A non-Korean university with a Central Grant Management Department, at least one full-time Korean Studies faculty member, an undergraduate Korean Studies major or department, and commitment to supporting Korean Studies.</td></tr><tr><td>Funding period</td><td>3 years</td><td>5 years<3+2years></td></tr><tr><td>Funding size</td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 200 million</td></tr><tr><td>B</td><td>Up to KRW 50 million</td></tr></tbody></table></td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 150 million</td></tr><tr><td>B</td><td>Up to KRW 90 million</td></tr></tbody></table></td></tr><tr><td>Required project content</td><td>·	Fund 2 or more scholarship students<br>·	Offer 1 or more regular Korean Studies lecture courses (Excluding Korean language courses)<br>·	Hold 1 or more workshops per year in which that students may participate</td><td>·	Hire 1 or more Korean Studies full-time faculty<br>·	Fund 1 or more scholarship student for Korean Studies<br>·	Offer 2 or more regular graduate-level Korean Studies lecture courses (Excluding Korean language courses)<br>·	Hold 1 or more international Korean Studies conference<br>·	Establish and manage a website, blog, or social media relating to the program </td></tr><tr><td>Recommended content</td><td>·	Foster talent (education)<br>·	Establish a Korean Studies research institute/center<br>·	Establish Korean Studies undergraduate department/major & 
program<br>·	Develop Korean Studies textbooks<br>·	Hold academic activities</td><td>·	Foster talent (education)<br>·	Establish a Korean Studies research institute/center<br>·	Establish Korean Studies M.A/Ph.D. department/major & program<br>·	Develop Korean Studies textbooks<br>·	Hold academic activities</td></tr></tbody></table>

<img id="di.image.im12" data-category="image"/>

 2 / 3


 --------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te0': {'id': 'di.text.te0', 'type': 'text'}, 'di.text.te2': {'id': 'di.text.te2', 'type': 'text'}, 'di.table.ta9': {'id': 'di.table.ta9', 'type': 'table'}, 'di.image.im12': {'id': 'di.image.im12', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image12.png'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}


 --------- < Page Content > ---------
2025 Seed Program Application

II. Review and Selection

1. Review Process

<img id="di.image.im13" data-category="image"/>





Review of whether the basic requirements for application have been met







Review of the Project Proposal

Admistered by the Expert Review Team







Final review and decision

Admistered by the Comprehensive Review Committee



1. Preliminary Review



2. Content Review (80 pts)



3. Comprehensive Review (20 pts)

2. Review Stages and Content

Stage 1: Preliminary Review

Conducted by Main Department

●	Verifies document submission, eligibility, and overlapping support.

●	Applications missing required documents, signatures, or failing to meet eligibility do not proceed.

●	Applications with Indirect Expenses over 10% of Direct Expenses (including Labor Expenses) are rejected.

Stage 2: Content Review

Conducted by Expert Review Team

●	Online review: Points given individually

●	Panel review: Points determined by consensus

●	Assesses leadership potential, capacity, and project plans.

●	Items and scores assigned for evaluation.

<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>

 2 / 3


 --------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te10': {'id': 'di.text.te10', 'type': 'text'}, 'di.text.te12': {'id': 'di.text.te12', 'type': 'text'}, 'di.image.im13': {'id': 'di.image.im13', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image13.png'}, 'di.text.sh15': {'id': 'di.text.sh15', 'type': 'text'}, 'di.text.sh16': {'id': 'di.text.sh16', 'type': 'text'}, 'di.text.sh16te0': {'id': 'di.text.sh16te0', 'type': 'text'}, 'di.text.sh17': {'id': 'di.text.sh17', 'type': 'text'}, 'di.text.sh18': {'id': 'di.text.sh18', 'type': 'text'}, 'di.text.sh19': {'id': 'di.text.sh19', 'type': 'text'}, 'di.text.sh19te0': {'id': 'di.text.sh19te0', 'type': 'text'}, 'di.text.sh19te1': {'id': 'di.text.sh19te1', 'type': 'text'}, 'di.text.sh20': {'id': 'di.text.sh20', 'type': 'text'}, 'di.text.sh21': {'id': 'di.text.sh21', 'type': 'text'}, 'di.text.sh22': {'id': 'di.text.sh22', 'type': 'text'}, 'di.text.sh22te0': {'id': 'di.text.sh22te0', 'type': 'text'}, 'di.text.sh22te1': {'id': 'di.text.sh22te1', 'type': 'text'}, 'di.text.sh23': {'id': 'di.text.sh23', 'type': 'text'}, 'di.text.sh23te0': {'id': 'di.text.sh23te0', 'type': 'text'}, 'di.text.sh24': {'id': 'di.text.sh24', 'type': 'text'}, 'di.text.sh24te0': {'id': 'di.text.sh24te0', 'type': 'text'}, 'di.text.sh25': {'id': 'di.text.sh25', 'type': 'text'}, 'di.text.sh25te0': {'id': 'di.text.sh25te0', 'type': 'text'}, 'di.text.te15': {'id': 'di.text.te15', 'type': 'text'}, 'di.text.te16': {'id': 'di.text.te16', 'type': 'text'}, 'di.text.te17': {'id': 'di.text.te17', 'type': 'text'}, 'di.text.te18': {'id': 'di.text.te18', 'type': 'text'}, 'di.text.te19': {'id': 'di.text.te19', 'type': 'text'}, 'di.text.te20': {'id': 'di.text.te20', 'type': 'text'}, 'di.text.te21': {'id': 'di.text.te21', 'type': 'text'}, 'di.text.te22': {'id': 'di.text.te22', 'type': 'text'}, 'di.text.te23': {'id': 'di.text.te23', 'type': 'text'}, 'di.text.te24': {'id': 'di.text.te24', 'type': 'text'}, 'di.text.te25': {'id': 'di.text.te25', 'type': 'text'}, 'di.text.te26': {'id': 'di.text.te26', 'type': 'text'}, 'di.table.ta26': {'id': 'di.table.ta26', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}


 --------- < Page Content > ---------
2025 Seed Program Application

<table><tbody><tr><td rowspan="3">Evaluation of the Basis for the Project (40)</td><td>Potential to lead Korean Studies (20)</td><td>- Assess whether the university has a distinguished reputation in terms of history and academic disciplines.<br>- Evaluate the strength of the network between the Project Director and local researchers.</td></tr><tr><td>Performance capacity (20)<br>Eligibility criteria (10)</td><td>- Determine if the project director possesses the skills and commitment to execute the project (e.g., Korean language proficiency, influence within the institution, management skills).<br>- Review the achievements of collaborative researchers in Korean Studies.<br>- Confirm whether personnel (Beginning/Advanced) or coursework (Advanced) meet eligibility criteria.</td></tr><tr><td>University support (10)</td><td>- Measure the institution's willingness to support Korean Studies (financial, spatial, and human resources, appropriate indirect expense ratio).<br>- Assess the competency of the Central Grant Management Department.</td></tr><tr><td rowspan="2">Evaluation of the Project Content (40)</td><td>Project plans (30)</td><td>- Ensure that the project objectives are realistic and well-defined.<br>- Verify that the plan aligns with local conditions.<br>- Review the suitability of the Project Team’s structure.<br>- Assess whether the budget plan reflects local price levels.</td></tr></tbody></table>

 2 / 3


 --------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.table.ta29': {'id': 'di.table.ta29', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
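In "page" mode, each document's metadata, as printed above, maps element IDs such as di.image.im12 to a dict with an id, a type, and (for images) a src path under resources_dir. Meanwhile, page_content refers to images with placeholders like <img id="di.image.im12" data-category="image"/>. The minimal sketch below is based only on the shape of this sample output. It counts a page's elements by type and swaps each image placeholder for a Markdown image link to its saved file. The helper names are hypothetical, and the exact placeholder pattern is an assumption drawn from the sample:

```python
import re
from collections import Counter
from typing import Dict

# Assumption: image placeholders look exactly like those in the sample output,
# e.g. <img id="di.image.im12" data-category="image"/>.
IMG_TAG = re.compile(r'<img id="([^"]+)" data-category="image"\s*/>')


def count_element_types(metadata: Dict[str, dict]) -> Counter:
    """Count the elements of one page by their 'type' field (text, table, image, ...)."""
    return Counter(
        element.get("type", "unknown")
        for element in metadata.values()
        if isinstance(element, dict)  # skip any non-element metadata entries
    )


def resolve_images(page_content: str, metadata: Dict[str, dict]) -> str:
    """Replace each image placeholder with a Markdown image link to its 'src' path.

    Placeholders whose id has no 'src' entry in metadata are left unchanged.
    """

    def substitute(match):
        element_id = match.group(1)
        element = metadata.get(element_id)
        src = element.get("src") if isinstance(element, dict) else None
        return f"![{element_id}]({src})" if src else match.group(0)

    return IMG_TAG.sub(substitute, page_content)


# Excerpt of the first page's metadata from the output above.
sample_metadata = {
    "di.text.te0": {"id": "di.text.te0", "type": "text"},
    "di.table.ta9": {"id": "di.table.ta9", "type": "table"},
    "di.image.im12": {
        "id": "di.image.im12",
        "type": "image",
        "src": (
            "/home/jenkins_agent/Project/langchain/docs/docs/integrations/"
            "document_loaders/example_data/tmp/tmpaynkptxx/"
            "polaris_ai_example.docx_image12.png"
        ),
    },
}
sample_content = 'I. Funding Information by Track\n<img id="di.image.im12" data-category="image"/>'

print(count_element_types(sample_metadata))
print(resolve_images(sample_content, sample_metadata))
```

With the real loader, the same calls would take doc.page_content and doc.metadata for each document.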

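Tables in page_content come back as inline HTML: <table>/<tr>/<td> markup, <br> for line breaks inside a cell, rowspan for merged cells, and occasionally a table nested inside a cell (the Funding size row on the first page). If a downstream step needs rows of plain text instead of markup, Python's standard html.parser can flatten them. This is a minimal sketch; TableRows and table_rows are hypothetical helpers. It ignores rowspan and colspan, and it folds a nested table's text into the outer cell that contains it:

```python
from html.parser import HTMLParser
from typing import List


class TableRows(HTMLParser):
    """Collect cell text per row for top-level <table> elements.

    Sketch only: rowspan/colspan are ignored, and text inside a nested
    table is folded into the outer cell that contains it.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self.depth = 0  # current <table> nesting level
        self.in_cell = False
        self.cell: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.rows.append([])
        elif self.depth == 1 and tag == "td":
            self.in_cell, self.cell = True, []
        elif self.in_cell and tag in ("br", "tr"):
            self.cell.append("\n")  # line break, or a new row of a nested table
        elif self.in_cell and tag == "td":
            self.cell.append(" ")  # separate cells of a nested table

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif self.depth == 1 and tag == "td" and self.in_cell:
            if not self.rows:  # tolerate a <td> without an enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self.cell).strip())
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)


def table_rows(html: str) -> List[List[str]]:
    """Return the rows of every top-level table in html as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


header = "<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>"
print(table_rows(header))  # [['Areas', 'Items (Points)', 'Content']]
```

table_rows(doc.page_content) returns one list of cell strings per row of every top-level table on the page, in document order. This sketch does not mark where one table ends and the next begins.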