Tools: Docling CLI to parse PDFs and export them to multiple formats (2026)

2026-03-28 admin

What is Docling?

Docling is an open-source document processing library that converts various document formats into structured outputs. It plays an important part in RAG (retrieval-augmented generation) pipelines.

I'll be taking you through the process of parsing PDFs into structured formats.

Step 1: Set up

- Create the project structure in your terminal:

    mkdir docling_cli
    cd docling_cli

- Create your virtual environment and activate it (Fedora).

Step 2: Installing Docling

    pip install docling

Check Docling's version:

    docling --version

Step 3: Creating input and output folders

- Create a folder called data where you will store your PDFs.
- Create a new folder named outputs, then inside it create folders called markdown_outputs, html_outputs and json_outputs.

Step 4: Converting the PDFs into HTML format

    docling --to html *.pdf --output ~/Documents/docling_cli/outputs/html_outputs

Step 5: Converting the PDFs into other formats

1. Markdown
2. JSON
3. Plain text
4. YAML
5. html_split_page
6. DocTags
7. VTT

Step 6: Analyzing the results

I used three types of PDFs: one with tables, one with text and images, and one with tables and paragraphs. Here are my key findings:

1. PDF with tables

- In HTML, the rows and columns came out better than they were in the original PDF.
- Markdown outputs were good too, as the tables were written in Markdown format without losing anything.
- JSON broke everything down into nested objects.
- Plain text was good too, but not as good as Markdown.

2. PDF with text and images

- HTML lost the color of the images.

3. PDF with tables and paragraphs

- Paragraphs in all formats came out nicely as text.
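Steps 3 to 5 can also be scripted in one pass. The sketch below is a dry run, not the exact commands from this walkthrough: it creates the folders and prints one docling command per format instead of running it. A few things in it are assumptions. The values given to --to (md, json, text, yaml, html_split_page, doctags, vtt) are guessed from the format names listed in Step 5, so confirm them with docling --help for your installed version. The per-format folder naming (for example md_outputs) is a hypothetical convention, and the project root is taken to be a docling_cli folder in the current directory.

```shell
#!/bin/sh
set -eu

# Project root, relative to the current directory (assumption).
root="docling_cli"

# Input folder for the PDFs, as in Step 3.
mkdir -p "$root/data"

# Values for --to are assumed from the format names in Step 5;
# verify them with `docling --help` before relying on this list.
for fmt in html md json text yaml html_split_page doctags vtt; do
  # Hypothetical naming convention: one output folder per format.
  out="$root/outputs/${fmt}_outputs"
  mkdir -p "$out"

  # Dry run: print the command only. To convert for real, run instead:
  #   docling --to "$fmt" "$root"/data/*.pdf --output "$out"
  echo "docling --to $fmt $root/data/*.pdf --output $out"
done
```

Once the printed commands look right, swap the echo line for the command shown in the comment. Keep *.pdf outside the quotes there, because the shell only expands the wildcard when it is unquoted.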

InfinitSec - Latest Cybersecurity, Technology & Gaming News
