Roundup

Best PDF to Markdown converters (2026)

There is no single best tool – there is a best tool for your situation. Below is an honest, criteria-based roundup of the hosted, open-source and enterprise options, a side-by-side table, and who each one is for. Full disclosure: pdf2md.dev is one of them, so we kept the criteria explicit.

Short answer

Pick by what you need

For free, instant conversion with no setup, use pdf2md.dev. To self-host, look at Marker, Docling or MinerU. For RAG inside LlamaIndex, LlamaParse. For enterprise procurement, Adobe PDF Extract. For OCR with bounding boxes and confidence scores, Mistral OCR. For a tiny local multi-format library, Microsoft MarkItDown. The table and the ranked notes below explain the trade-offs.

At a glance

The converters, side by side

The fastest way to narrow the field: deployment shape, the free path, built-in OCR and how you automate it.

| Tool | Shape | Free path | Built-in OCR | API / agent |
|---|---|---|---|---|
| pdf2md.dev | Hosted | Free anonymous | Yes | REST + hosted MCP |
| Marker | Self-host library | Free (self-host) | Yes (Surya) | Datalab API (paid) |
| Docling | Self-host library | Free (MIT) | Yes | Library |
| MinerU | Self-host library | Free (open-source) | Yes | Library |
| LlamaParse | Cloud API | 10k credits/mo | Yes | REST + SDK |
| Adobe PDF Extract | Enterprise SaaS | 500 tx/mo (≤2,500 pages) | Yes | REST API |
| Mistral OCR | Cloud API | Pay per page | Yes | REST API |
| MarkItDown | Self-host library | Free (MIT) | No (plugin) | Library |

Competitor details are taken from each project's public documentation; pdf2md.dev values reflect its current free-tier limits. All of these change over time – check each source for the latest.
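If you are shortlisting inside an evaluation script, the table can be encoded as plain data and filtered by the constraints you care about. A minimal sketch in Python; the `Converter` record and `shortlist` helper are hypothetical names for illustration, and the rows are transcribed by hand, so they go stale as the tools change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Converter:
    name: str
    shape: str
    free_path: str
    builtin_ocr: bool
    api: str


# Rows transcribed by hand from the comparison table; re-check each source before relying on them.
CONVERTERS = [
    Converter("pdf2md.dev", "Hosted", "Free anonymous", True, "REST + hosted MCP"),
    Converter("Marker", "Self-host library", "Free (self-host)", True, "Datalab API (paid)"),
    Converter("Docling", "Self-host library", "Free (MIT)", True, "Library"),
    Converter("MinerU", "Self-host library", "Free (open-source)", True, "Library"),
    Converter("LlamaParse", "Cloud API", "10k credits/mo", True, "REST + SDK"),
    Converter("Adobe PDF Extract", "Enterprise SaaS", "500 tx/mo (≤2,500 pages)", True, "REST API"),
    Converter("Mistral OCR", "Cloud API", "Pay per page", True, "REST API"),
    Converter("MarkItDown", "Self-host library", "Free (MIT)", False, "Library"),
]


def shortlist(rows, *, need_ocr=False, self_host=False, need_rest=False):
    """Return the names of tools that satisfy every requested constraint, in table order."""
    picks = []
    for row in rows:
        if need_ocr and not row.builtin_ocr:
            continue
        if self_host and not row.shape.startswith("Self-host"):
            continue
        if need_rest and "REST" not in row.api:
            continue
        picks.append(row.name)
    return picks


# Self-hostable tools that read scanned PDFs without an extra plugin.
print(shortlist(CONVERTERS, need_ocr=True, self_host=True))  # ['Marker', 'Docling', 'MinerU']
```

Keeping the constraints as keyword flags makes each filter opt-in, so calling `shortlist(CONVERTERS)` with no flags simply returns every row.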

In detail

Who each one is best for

Ranked by how broadly they fit, with a detailed head-to-head comparison linked where one exists.

1

pdf2md.dev – best for free, instant conversion

Hosted, so there is nothing to install and no GPU to provision. OCR, real Markdown tables and formulas are built in; it is free to use anonymously in the browser; and the same conversion is available as a REST API and a hosted MCP (Model Context Protocol) server for agents. Convert a PDF now.

2

Marker – best self-hosted quality

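An open-source library with excellent output, broad format support and an optional --use_llm flag that uses an LLM to improve accuracy on messy pages. It runs best on a GPU. The code is GPL-licensed, and the model weights are free for research, personal use and organisations under $2M in revenue or funding. pdf2md.dev vs Marker.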

3

Docling – best open-source for clean docs and tables

IBM's MIT-licensed library with a strong table-structure model. Fast and reliable on clean documents. pdf2md.dev runs Docling as one of its engines, so you can get it hosted. See extracting tables to Markdown.

4

MinerU – best open-source for dense, complex layouts

Robust on dense, multi-column and formula-rich pages. Open-source and self-hosted. pdf2md.dev also runs MinerU as an engine, so dense documents convert well without you operating it. See scanned PDF to Markdown.

5

LlamaParse – best for RAG inside LlamaIndex

A GenAI-native cloud parser that plugs straight into LlamaIndex pipelines, with agentic parse modes for complex documents. Credit-metered; needs an account and an API key. pdf2md.dev vs LlamaParse.

6

Adobe PDF Extract – best enterprise SaaS

A polished, vendor-backed service that preserves reading order, links, images and tables in Markdown. Paid pricing is quote-only and documents are processed in Adobe's cloud. pdf2md.dev vs Adobe.

7

Mistral OCR – best for OCR with structural metadata

A pay-per-page API returning Markdown plus bounding boxes, confidence scores and block labels, with broad multilingual coverage. Needs an account and an API key. pdf2md.dev vs Mistral OCR.

8

Microsoft MarkItDown – best tiny local library

An MIT-licensed Python library that converts PDF, DOCX, XLSX, PPTX and more to Markdown. Lightweight, but no built-in OCR and limited on complex tables. pdf2md.dev vs MarkItDown.

Note on engines: Docling and MinerU are the open-source engines pdf2md.dev runs under the hood, so picking pdf2md.dev gets you both hosted, with a free anonymous tier on top.

How we judged

The criteria

Every tool here was judged against the same criteria, the ones that actually matter for turning a PDF into usable Markdown.

Setup & access

Can you convert without installing anything, provisioning a GPU, or signing up?

Built-in OCR

Does it read scanned and image-only PDFs out of the box, across languages?

Tables & formulas

Are real Markdown tables and mathematical notation preserved, not flattened?

API & agents

Is there a REST API or a hosted MCP so code and agents can call it?

Cost model

Is it free, a flat-rate tier, per-page or credit-metered, or quote-only enterprise pricing?

Self-hosting

Can you run it entirely on your own machines if you need to?
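The "Tables & formulas" criterion can be spot-checked mechanically on a converter's output. A rough sketch in Python, assuming the converter emits GitHub-Flavored Markdown pipe tables: it counts header rows followed by a matching delimiter row, so a table flattened into plain lines scores zero. `count_pipe_tables` is a hypothetical helper; it ignores escaped pipes and does not check formulas.

```python
import re

# One delimiter cell: one or more hyphens, optionally wrapped in alignment colons (GFM syntax).
_DELIM_CELL = re.compile(r"^\s*:?-+:?\s*$")


def _cells(line: str) -> list[str]:
    """Split a pipe-table row into cells, dropping optional leading/trailing pipes."""
    stripped = line.strip()
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return stripped.split("|")


def count_pipe_tables(markdown: str) -> int:
    """Count tables: a row containing pipes followed by a delimiter row with the same cell count."""
    lines = markdown.splitlines()
    tables = 0
    for header, delim in zip(lines, lines[1:]):
        if "|" not in header or "|" not in delim:
            continue  # plain text or a thematic break ("---"), not a table
        delim_cells = _cells(delim)
        if all(_DELIM_CELL.match(cell) for cell in delim_cells) and len(_cells(header)) == len(delim_cells):
            tables += 1
    return tables


real = "| Tool | OCR |\n|---|:---:|\n| Docling | Yes |\n"
flattened = "Tool OCR\nDocling Yes\n"
print(count_pipe_tables(real), count_pipe_tables(flattened))  # 1 0
```

Running the same check over each tool's output for a table-heavy sample PDF gives a quick, if crude, signal of which converters preserve structure.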

Want to try the hosted option?

pdf2md.dev converts in the browser for free, and exposes the same conversion as a REST API and a hosted MCP. No install, no GPU, no account to start. See the guides and the Python tutorial.

FAQ

Common questions

What is the best PDF to Markdown converter?

It depends on what you need. For free, instant conversion with no setup, pdf2md.dev. For self-hosting, Marker, Docling or MinerU. For RAG inside LlamaIndex, LlamaParse. For enterprise procurement, Adobe PDF Extract. For OCR with structural metadata, Mistral OCR. For a tiny local multi-format library, Microsoft MarkItDown.

What is the best free PDF to Markdown tool?

pdf2md.dev is free to use anonymously in the browser with no account. The open-source libraries (Marker, Docling, MinerU and MarkItDown) are free to self-host, though you supply the compute, and Marker's licence is only free under $2M in revenue.

Which converts scanned PDFs best?

Tools with built-in OCR handle scanned PDFs well: pdf2md.dev, Marker (via Surya OCR), Docling, MinerU and Mistral OCR. MarkItDown needs a separate OCR plugin that calls an LLM vision API. See converting scanned PDFs.
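Before choosing, it can help to know whether a file needs OCR at all. One common heuristic: run a plain (non-OCR) text extractor and count how many pages come back nearly empty, since image-only pages have no text layer. A minimal sketch in Python, assuming you already have per-page text from some extractor; `likely_needs_ocr` and its thresholds are hypothetical illustrations, not tuned values.

```python
def likely_needs_ocr(page_texts, min_chars_per_page=25, max_empty_ratio=0.5):
    """Guess whether a PDF is scanned/image-only from per-page extracted text.

    page_texts: one string per page from any non-OCR text extractor.
    A page counts as "empty" if it has fewer than min_chars_per_page
    non-whitespace-trimmed characters; the PDF is flagged when the share
    of empty pages exceeds max_empty_ratio. Both thresholds are guesses.
    """
    if not page_texts:
        raise ValueError("no pages to inspect")
    empty = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return empty / len(page_texts) > max_empty_ratio


print(likely_needs_ocr(["", "  ", "\n"]))  # True: no page has a text layer
```

Mixed documents (a born-digital report with a few scanned appendices) sit near the threshold, which is one reason to prefer a tool that applies OCR per page rather than relying on a single yes/no decision.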

Which is best for RAG pipelines?

pdf2md.dev (clean, chunk-friendly Markdown plus a REST API and a hosted MCP any framework can call) and LlamaParse (native to LlamaIndex) are both strong. Choose based on whether you want a no-account hosted MCP or tight LlamaIndex integration.
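"Chunk-friendly" here means the Markdown keeps its headings, so a pipeline can split on document structure rather than at arbitrary character counts. A minimal heading-based splitter in Python, as a sketch of that idea rather than how pdf2md.dev or LlamaParse chunk internally; `chunk_by_headings` is a hypothetical helper that handles ATX (`#`) headings only and ignores headings inside fenced code blocks.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then non-empty heading text.
_HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_headings(markdown: str, max_level: int = 2) -> list[dict]:
    """Split Markdown into chunks at headings of level <= max_level.

    Each chunk records its heading path (e.g. ["Guide", "Install"]) as metadata
    that a retrieval store can keep alongside the text. Deeper headings stay
    inside the chunk text.
    """
    chunks: list[dict] = []
    path: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        buffer.clear()

    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code are not headings
        match = None if in_fence else _HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop siblings and deeper levels
            path.append(match.group(2))
        else:
            buffer.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun it.\n## Use\nCall it.\n### Detail\nMore.\n"
for chunk in chunk_by_headings(doc):
    print(chunk["headings"], repr(chunk["text"]))
```

Storing the heading path with each chunk lets a retriever show where an answer came from, and splitting only at the top levels keeps small subsections together with their parent context.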

How were these converters ranked?

Against six criteria: setup and access, built-in OCR, tables and formulas, API and agent access, cost model, and self-hosting. pdf2md.dev is one of the tools listed, so the criteria are stated explicitly and each competitor's strengths are noted.