Scanned & image-only PDFs

Convert scanned PDF to Markdown with OCR

Drop an image-only or scanned PDF and get clean, selectable Markdown. Built-in OCR reads Cyrillic and English, rebuilds real tables and keeps formulas – no account, no separate OCR step.

Short answer

Yes – a scan becomes selectable Markdown

A scanned PDF is just images of pages, so plain copy-paste returns nothing or garbled characters. PDF to Markdown detects image-only pages and runs OCR (optical character recognition) automatically, turning the pictures of text into real, selectable Markdown – headings, lists, tables and all. It works on documents scanned in Cyrillic, English or a mix of both, and you can convert in the browser without signing up.
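To make "detects image-only pages" concrete, here is a minimal sketch of one way such a check could work. It is an illustrative heuristic, not the converter's actual detection logic: the function name `needs_ocr` and both thresholds are assumptions, and the input is whatever text a PDF library extracted from a single page.

```python
import unicodedata


def needs_ocr(extracted_text: str, min_chars: int = 20, max_bad_ratio: float = 0.2) -> bool:
    """Guess whether a page's embedded text layer is missing or unusable.

    Hypothetical heuristic: both thresholds are illustrative assumptions.
    """
    # Ignore whitespace so blank lines and padding do not count as text.
    visible = [ch for ch in extracted_text if not ch.isspace()]
    if len(visible) < min_chars:
        return True  # almost no text: probably an image-only page

    # Count characters that usually signal a broken text layer: the Unicode
    # replacement character, control codes, private-use and unassigned points.
    bad = sum(
        1
        for ch in visible
        if ch == "\ufffd" or unicodedata.category(ch) in ("Cc", "Co", "Cn")
    )
    return bad / len(visible) > max_bad_ratio
```

A page with no extractable text maps to the automatic OCR case, and a page full of replacement characters maps to the case where you would force OCR by hand.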

How to

Convert a scanned PDF in 4 steps

No account needed. OCR runs automatically, and you can force it when a PDF has a bad text layer.

1. Open the converter

Install the Chrome extension or open the web app. Both work anonymously.

2. Add the scanned PDF

Drag in the file, pick it from disk, or paste a direct PDF URL. OCR runs automatically on image-only pages; turn on force OCR when the existing text layer is wrong.

3. Wait for the job

The job status moves from queued to processing to ready. OCR is heavier than reading digital text, so scans take longer than native PDFs.

4. Copy or download

Preview the rendered Markdown and the raw source, then copy it to your clipboard or download a .md file.

Tip: automating bulk scans? Skip the UI and call the REST API or hosted MCP – same OCR, driven from your own code or agent.
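If you drive the conversion from code, the queued, processing, ready lifecycle suggests a simple polling loop. The sketch below assumes nothing about the real API surface: `get_status` is a hypothetical callable that you would implement around whatever status request the REST API exposes, and the default timeout mirrors the free tier's 15-minute budget.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a job until it reports "ready".

    get_status is a hypothetical wrapper around the real status call;
    only the status names come from the page itself.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status not in ("queued", "processing"):
            # Anything else (an error state, a typo) should not be polled forever.
            raise RuntimeError(f"unexpected job status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.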

What OCR preserves

More than just plain text

Recognizing characters is the easy part. The converter rebuilds the document structure a scan loses, so the Markdown is usable by people and models alike.

Cyrillic & English

Reads Russian and English scans, including mixed-language pages, into selectable text.

Real tables

Scanned columns become genuine Markdown tables instead of a jumble of misaligned lines.

Formulas kept

Mathematical notation is preserved rather than flattened into garbled characters.

Force OCR

Override a bad or partial text layer and re-read the page images when the embedded text is wrong.

Links & footnotes

Where present, hyperlinks and footnotes carry over as Markdown links instead of being dropped.

Engine choice

Convert with MinerU or Docling, depending on the document and the result you want.

What to expect

Free tier limits & long scans

Free tier limits

Active slots (queue depth): 3
Max PDF size: 10 MB
Time budget per document: 15 min
Ready result retention: 1 hour
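For scripted use, the limits above can be checked before uploading. This is a rough client-side sketch, not part of the service: the names `TierLimits` and `preflight` are made up for illustration, and treating 10 MB as 10 × 1024 × 1024 bytes is an assumption, since the page does not say which definition applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    """Free tier values from the table above (byte conversion is an assumption)."""
    active_slots: int = 3
    max_pdf_bytes: int = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    time_budget_seconds: int = 15 * 60
    retention_seconds: int = 60 * 60


FREE_TIER = TierLimits()


def preflight(size_bytes: int, active_jobs: int, limits: TierLimits = FREE_TIER) -> list[str]:
    """Return a list of reasons the upload would likely be rejected (empty if none)."""
    problems = []
    if size_bytes > limits.max_pdf_bytes:
        problems.append(
            f"file is {size_bytes} bytes, over the {limits.max_pdf_bytes}-byte limit"
        )
    if active_jobs >= limits.active_slots:
        problems.append(
            f"{active_jobs} jobs already active, limit is {limits.active_slots}"
        )
    return problems
```

An empty list means the file fits the free tier as far as size and queue depth go; the time budget can only be judged once the job runs.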

Paid tiers raise every limit and add a longer time budget for heavy scans.

Long or low-quality scans

Partial results are flagged. If a long scan hits the time budget, you get what was processed, marked truncated, instead of an error. Split the file or use a longer paid budget.
Legibility matters. OCR accuracy follows the scan: a clean, straight, reasonably high-resolution page reads far better than a faint or skewed one.
Private by default. Files are auto-deleted after the retention window and are never used for advertising or to train models.
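When a long scan comes back truncated, splitting it starts with choosing page ranges. The helper below is a small sketch that only computes the ranges; the function name and the part size are illustrative assumptions, and the actual splitting would be done with a PDF library of your choice.

```python
def page_ranges(total_pages: int, pages_per_part: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into consecutive inclusive (first, last) ranges."""
    if total_pages < 0 or pages_per_part < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_part >= 1")
    return [
        (start, min(start + pages_per_part - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_part)
    ]


# A 250-page scan cut into parts of at most 100 pages:
# page_ranges(250, 100) -> [(1, 100), (101, 200), (201, 250)]
```

How many pages fit in the time budget depends on scan quality and resolution, so the part size is something to tune by trial.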

Converting scans at scale?

The same OCR pipeline is a REST API and a hosted MCP endpoint, with machine-readable discovery so scripts and agents can drive it directly.
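For bulk runs, it helps to keep the number of in-flight jobs at or below your tier's active slots. The sketch below caps concurrency with a thread pool; `convert_one` is a hypothetical function you would write around the real API (submit, wait, fetch the Markdown), and the default of 3 matches the free tier's queue depth. Whether slots are counted per client or per account is not stated on the page, so treat that as an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def convert_all(
    paths: Iterable[str],
    convert_one: Callable[[str], str],
    max_active: int = 3,
) -> dict[str, str]:
    """Convert many PDFs while keeping at most max_active jobs in flight.

    convert_one is a hypothetical callable: path in, Markdown text out.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        # pool.map yields results in input order and re-raises any
        # exception from convert_one when that result is consumed.
        results = pool.map(convert_one, paths)
        return dict(zip(paths, results))
```

Threads suit this workload because each job spends most of its time waiting on the network rather than using the CPU.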

FAQ

Common questions

Can it convert a scanned PDF to Markdown?

Yes. Image-only and scanned PDFs are OCR'd automatically into selectable Markdown – no separate OCR step and no setup. Just drop the file in the extension or web app.

Does the OCR support Cyrillic and Russian?

Yes. It reads Cyrillic and English, including mixed-language documents, and turns the recognized text into Markdown.

The PDF has a bad text layer – can I force OCR?

Yes. Turn on force OCR so the converter re-reads the page images instead of trusting the embedded text, which fixes garbled or missing characters.

Are tables and formulas kept when converting a scan?

Yes. Scanned columns are rebuilt as real Markdown tables instead of jumbled lines, and mathematical notation is preserved rather than flattened.

Why is my result marked truncated?

OCR is slow, so a very long scan can hit the per-document time budget. The converter returns what it processed, flagged as a partial (truncated) result. A paid tier has a longer budget, or you can split the file.

Is it free and private?

Yes. The free tier gives 3 active slots, a 10 MB file size limit, a 15-minute time budget per document and 1-hour result retention – anonymous in the browser, no card. Files are auto-deleted after the retention window and are never used for advertising or to train models.