Polaris AI DataInsight is a document parser that extracts document elements (text, images, complex tables, charts, etc.) from various file formats into structured JSON, making them easy to integrate into RAG systems.
Installation
Install the langchain-polaris-ai-datainsight package.
pip install langchain-polaris-ai-datainsight
Environment Setup
Make sure to set the following environment variable:
POLARIS_AI_DATA_INSIGHT_API_KEY: your Polaris AI DataInsight API key. See the Polaris AI DataInsight documentation for how to obtain one.
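To fail fast when the key is missing, before the loader ever contacts the service, you can check the variable yourself. This is a minimal sketch: the helper name require_polaris_api_key is hypothetical, and only the environment variable name comes from this page.

```python
import os


def require_polaris_api_key() -> str:
    """Return the Polaris AI DataInsight API key or raise if it is not set.

    Hypothetical helper: only the variable name is taken from the docs.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    key = os.environ.get("POLARIS_AI_DATA_INSIGHT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "POLARIS_AI_DATA_INSIGHT_API_KEY is not set; "
            "export it before creating the loader."
        )
    return key
```

Call it once at startup; if you use the interactive getpass prompt shown under Usage instead, the check is unnecessary.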
Usage
import getpass
import os

# Prompt for the key without echoing it, then expose it through the environment.
os.environ["POLARIS_AI_DATA_INSIGHT_API_KEY"] = getpass.getpass(
    "Enter your PolarisAIDataInsight API key: "
)
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

loader = PolarisAIDataInsightLoader(
    file_path="example_data/polaris_ai_example.docx",
    resources_dir="example_data/tmp",  # extracted images are saved under this directory
    mode="page",  # "element", "page", or "single" (default: "single")
)
docs = loader.load()  # or loader.lazy_load() to get an iterator instead of a list

# Print the content and metadata of the first three documents
for doc in docs[:3]:
    print(" --------- < Page Content > --------- ")
    print(doc.page_content)
    print(" --------- < Metadata > --------- ")
    print(doc.metadata)
    print("\n")
Output:
--------- < Page Content > ---------
2025 Seed Program Application
I. Funding Information by Track
1. Beginning and Advanced Track Comparison Overview
<table><tbody><tr><td>Category</td><td>Beginning Track*</td><td>Advanced Track*</td></tr><tr><td>Funding target</td><td>A university located outside Korea that has a Central Grant Management Department, an existing Korean Studies infrastructure, and plans to establish an education foundation.</td><td>A non-Korean university with a Central Grant Management Department, at least one full-time Korean Studies faculty member, an undergraduate Korean Studies major or department, and commitment to supporting Korean Studies.</td></tr><tr><td>Funding period</td><td>3 years</td><td>5 years<3+2years></td></tr><tr><td>Funding size</td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 200 million</td></tr><tr><td>B</td><td>Up to KRW 50 million</td></tr></tbody></table></td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 150 million</td></tr><tr><td>B</td><td>Up to KRW 90 million</td></tr></tbody></table></td></tr><tr><td>Required project content</td><td>· Fund 2 or more scholarship students<br>· Offer 1 or more regular Korean Studies lecture courses (Excluding Korean language courses)<br>· Hold 1 or more workshops per year in which that students may participate</td><td>· Hire 1 or more Korean Studies full-time faculty<br>· Fund 1 or more scholarship student for Korean Studies<br>· Offer 2 or more regular graduate-level Korean Studies lecture courses (Excluding Korean language courses)<br>· Hold 1 or more international Korean Studies conference<br>· Establish and manage a website, blog, or social media relating to the program </td></tr><tr><td>Recommended content</td><td>· Foster talent (education)<br>· Establish a Korean Studies research institute/center<br>· Establish Korean Studies undergraduate department/major & program<br>· Develop Korean Studies textbooks<br>· Hold academic activities</td><td>· Foster talent (education)<br>· Establish a Korean Studies research institute/center<br>· Establish Korean Studies M.A/Ph.D. department/major & program<br>· Develop Korean Studies textbooks<br>· Hold academic activities</td></tr></tbody></table>
<img id="di.image.im12" data-category="image"/>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te0': {'id': 'di.text.te0', 'type': 'text'}, 'di.text.te2': {'id': 'di.text.te2', 'type': 'text'}, 'di.table.ta9': {'id': 'di.table.ta9', 'type': 'table'}, 'di.image.im12': {'id': 'di.image.im12', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image12.png'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
--------- < Page Content > ---------
2025 Seed Program Application
II. Review and Selection
1. Review Process
<img id="di.image.im13" data-category="image"/>
Review of whether the basic requirements for application have been met
Review of the Project Proposal
Admistered by the Expert Review Team
Final review and decision
Admistered by the Comprehensive Review Committee
1. Preliminary Review
2. Content Review (80 pts)
3. Comprehensive Review (20 pts)
2. Review Stages and Content
Stage 1: Preliminary Review
Conducted by Main Department
● Verifies document submission, eligibility, and overlapping support.
● Applications missing required documents, signatures, or failing to meet eligibility do not proceed.
● Applications with Indirect Expenses over 10% of Direct Expenses (including Labor Expenses) are rejected.
Stage 2: Content Review
Conducted by Expert Review Team
● Online review: Points given individually
● Panel review: Points determined by consensus
● Assesses leadership potential, capacity, and project plans.
● Items and scores assigned for evaluation.
<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te10': {'id': 'di.text.te10', 'type': 'text'}, 'di.text.te12': {'id': 'di.text.te12', 'type': 'text'}, 'di.image.im13': {'id': 'di.image.im13', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image13.png'}, 'di.text.sh15': {'id': 'di.text.sh15', 'type': 'text'}, 'di.text.sh16': {'id': 'di.text.sh16', 'type': 'text'}, 'di.text.sh16te0': {'id': 'di.text.sh16te0', 'type': 'text'}, 'di.text.sh17': {'id': 'di.text.sh17', 'type': 'text'}, 'di.text.sh18': {'id': 'di.text.sh18', 'type': 'text'}, 'di.text.sh19': {'id': 'di.text.sh19', 'type': 'text'}, 'di.text.sh19te0': {'id': 'di.text.sh19te0', 'type': 'text'}, 'di.text.sh19te1': {'id': 'di.text.sh19te1', 'type': 'text'}, 'di.text.sh20': {'id': 'di.text.sh20', 'type': 'text'}, 'di.text.sh21': {'id': 'di.text.sh21', 'type': 'text'}, 'di.text.sh22': {'id': 'di.text.sh22', 'type': 'text'}, 'di.text.sh22te0': {'id': 'di.text.sh22te0', 'type': 'text'}, 'di.text.sh22te1': {'id': 'di.text.sh22te1', 'type': 'text'}, 'di.text.sh23': {'id': 'di.text.sh23', 'type': 'text'}, 'di.text.sh23te0': {'id': 'di.text.sh23te0', 'type': 'text'}, 'di.text.sh24': {'id': 'di.text.sh24', 'type': 'text'}, 'di.text.sh24te0': {'id': 'di.text.sh24te0', 'type': 'text'}, 'di.text.sh25': {'id': 'di.text.sh25', 'type': 'text'}, 'di.text.sh25te0': {'id': 'di.text.sh25te0', 'type': 'text'}, 'di.text.te15': {'id': 'di.text.te15', 'type': 'text'}, 'di.text.te16': {'id': 'di.text.te16', 'type': 'text'}, 'di.text.te17': {'id': 'di.text.te17', 'type': 'text'}, 'di.text.te18': {'id': 'di.text.te18', 'type': 'text'}, 'di.text.te19': {'id': 'di.text.te19', 'type': 'text'}, 'di.text.te20': {'id': 'di.text.te20', 'type': 'text'}, 'di.text.te21': {'id': 'di.text.te21', 'type': 'text'}, 'di.text.te22': {'id': 'di.text.te22', 'type': 'text'}, 'di.text.te23': {'id': 'di.text.te23', 'type': 'text'}, 'di.text.te24': {'id': 'di.text.te24', 'type': 'text'}, 'di.text.te25': {'id': 'di.text.te25', 'type': 'text'}, 'di.text.te26': {'id': 'di.text.te26', 'type': 'text'}, 'di.table.ta26': {'id': 'di.table.ta26', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
--------- < Page Content > ---------
2025 Seed Program Application
<table><tbody><tr><td rowspan="3">Evaluation of the Basis for the Project (40)</td><td>Potential to lead Korean Studies (20)</td><td>- Assess whether the university has a distinguished reputation in terms of history and academic disciplines.<br>- Evaluate the strength of the network between the Project Director and local researchers.</td></tr><tr><td>Performance capacity (20)<br>Eligibility criteria (10)</td><td>- Determine if the project director possesses the skills and commitment to execute the project (e.g., Korean language proficiency, influence within the institution, management skills).<br>- Review the achievements of collaborative researchers in Korean Studies.<br>- Confirm whether personnel (Beginning/Advanced) or coursework (Advanced) meet eligibility criteria.</td></tr><tr><td>University support (10)</td><td>- Measure the institution's willingness to support Korean Studies (financial, spatial, and human resources, appropriate indirect expense ratio).<br>- Assess the competency of the Central Grant Management Department.</td></tr><tr><td rowspan="2">Evaluation of the Project Content (40)</td><td>Project plans (30)</td><td>- Ensure that the project objectives are realistic and well-defined.<br>- Verify that the plan aligns with local conditions.<br>- Review the suitability of the Project Team’s structure.<br>- Assess whether the budget plan reflects local price levels.</td></tr></tbody></table>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.table.ta29': {'id': 'di.table.ta29', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
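In the output above, each page-level document pairs its page_content with a metadata dict keyed by element id. Image elements carry a src path to a file written under a temporary subdirectory of resources_dir, and the page text refers to them through placeholders such as <img id="di.image.im12" .../>. The sketch below resolves those placeholders to image files and counts element types per page. It assumes this metadata shape, which is inferred from the sample output rather than from a documented schema; the helper names image_paths and element_type_counts are hypothetical.

```python
import re
from collections import Counter

# Matches placeholders like <img id="di.image.im12" data-category="image"/>
# (assumes the id attribute comes first, as in the sample output).
IMG_PLACEHOLDER = re.compile(r'<img\s+id="([^"]+)"[^>]*>')


def image_paths(page_content: str, metadata: dict) -> dict:
    """Hypothetical helper: map each image placeholder id to its extracted file path."""
    paths = {}
    for element_id in IMG_PLACEHOLDER.findall(page_content):
        element = metadata.get(element_id, {})
        # In the sample output, only image elements carry a 'src' entry.
        if element.get("type") == "image" and "src" in element:
            paths[element_id] = element["src"]
    return paths


def element_type_counts(metadata: dict) -> Counter:
    """Hypothetical helper: count elements per type (text, table, image, ...)."""
    return Counter(entry["type"] for entry in metadata.values())
```

With the loader example, you would call these per document, for example image_paths(doc.page_content, doc.metadata), to attach image files to a page before indexing it.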