Convert Scanned PDF to Markdown with OCR (Cyrillic & English)

Short answer

Yes – a scan becomes selectable Markdown

A scanned PDF is just images of pages, so plain copy-paste returns nothing or garbled characters. PDF to Markdown detects image-only pages and runs OCR (optical character recognition) automatically, turning the pictures of text into real, selectable Markdown – headings, lists, tables and all. It works on documents scanned in Cyrillic, English or a mix of both, and you can convert in the browser without signing up.

How to

Convert a scanned PDF in 4 steps

No account needed. OCR runs automatically; you can also force it when a PDF has a bad text layer.

1. Open the converter

Install the Chrome extension or open the web app. Both work anonymously.

2. Add the scanned PDF

Drag in the file, pick it from disk, or paste a direct PDF URL. OCR runs automatically on image-only pages; turn on force OCR when the existing text layer is wrong.

3. Wait for the job

The job status moves from queued to processing to ready. OCR is heavier than reading digital text, so scans take longer than native PDFs.

4. Copy or download

Preview the rendered Markdown and the raw source, then copy it to your clipboard or download a .md file.

Tip: automating bulk scans? Skip the UI and call the REST API or hosted MCP – same OCR, driven from your own code or agent.
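When a script drives the conversion, the "wait for the job" step becomes a polling loop. The sketch below is a minimal illustration, assuming the status values named above (queued, processing, ready); `fetch_status` is a hypothetical callable you would write around whatever status call the API actually exposes, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Status names come from the steps above; treating "ready" as the only
# terminal state is an assumption of this sketch.
TERMINAL = {"ready"}
IN_PROGRESS = {"queued", "processing"}


def wait_until_ready(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns "ready" or the timeout elapses.

    fetch_status is a hypothetical callable supplied by the caller.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status not in IN_PROGRESS:
            raise RuntimeError(f"unexpected job status: {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.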

What OCR preserves

More than just plain text

Recognizing characters is the easy part. The converter rebuilds the document structure a scan loses, so the Markdown is usable by people and models alike.

Cyrillic & English

Reads Russian and English scans, including mixed-language pages, into selectable text.

Real tables

Scanned columns become genuine Markdown tables instead of a jumble of misaligned lines.

Formulas kept

Mathematical notation is preserved rather than flattened into garbled characters.

Force OCR

Override a bad or partial text layer and re-read the page images when the embedded text is wrong.

Links & footnotes

Where present, hyperlinks and footnotes carry over as Markdown links instead of being dropped.

Engine choice

Convert with MinerU or Docling, depending on the document and the result you want.
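For the force OCR option above, a script needs some way to decide that an embedded text layer is not worth trusting. The following is a rough heuristic, not how the converter itself decides: it assumes you have already extracted the text layer with a tool of your own, and it flags text that is empty or dominated by replacement, private-use or control characters. The function name and the 10% threshold are illustrative choices.

```python
import unicodedata


def looks_garbled(text: str, max_bad_ratio: float = 0.1) -> bool:
    """Return True when an extracted text layer looks empty or corrupted.

    Counts the Unicode replacement character plus private-use, control and
    unassigned code points as "bad". Ordinary line breaks and tabs are
    ignored. The default threshold is an arbitrary starting point.
    """
    stripped = text.strip()
    if not stripped:
        return True  # no usable text layer at all
    bad = 0
    for ch in stripped:
        if ch in "\n\r\t":
            continue
        category = unicodedata.category(ch)
        if ch == "\ufffd" or category in {"Co", "Cc", "Cn"}:
            bad += 1
    return bad / len(stripped) > max_bad_ratio
```

A page for which `looks_garbled` returns True is a reasonable candidate for a second pass with force OCR turned on.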

What to expect

Free tier limits & long scans

Free tier limits

Active slots (queue depth): 3

Max PDF size: 10 MB

Time budget per document: 15 min

Ready result retention: 1 hour

Paid tiers raise every limit, including the time budget for heavy scans.
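A batch script can check these limits before uploading anything. The sketch below encodes the free tier numbers listed above; the class and function names are made up for illustration, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FreeTier:
    max_bytes: int = 10 * 1024 * 1024  # "10 MB"; binary megabytes are an assumption
    max_active_jobs: int = 3           # active slots (queue depth)
    time_budget_s: int = 15 * 60       # per-document time budget
    retention_s: int = 60 * 60         # how long a ready result is kept


def preflight(size_bytes: int, active_jobs: int,
              tier: FreeTier = FreeTier()) -> List[str]:
    """Return the reasons an upload would not fit; an empty list means OK."""
    problems = []
    if size_bytes > tier.max_bytes:
        problems.append(f"file is {size_bytes} bytes, limit is {tier.max_bytes}")
    if active_jobs >= tier.max_active_jobs:
        problems.append(
            f"{active_jobs} jobs already active, limit is {tier.max_active_jobs}"
        )
    return problems
```

Returning a list of reasons rather than a boolean lets the caller report every problem at once.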

Long or low-quality scans

Partial results are flagged. If a long scan hits the time budget, you get what was processed, marked truncated, instead of an error. Split the file or use a longer paid budget.

Legibility matters. OCR accuracy follows scan quality: a clean, straight, reasonably high-resolution page reads far better than a faint or skewed one.

Private by default. Files are auto-deleted after the retention window and are never used for advertising or to train models.
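If a long scan comes back truncated, splitting it into smaller parts is mostly a matter of choosing page ranges. The helper below is a small sketch of that step only; the function name is illustrative, the right part size depends on how dense the scan is, and the actual cutting is left to whichever PDF tool you already use.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, pages_per_part: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into consecutive inclusive (first, last) ranges."""
    if total_pages < 0 or pages_per_part < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_part >= 1")
    return [
        (first, min(first + pages_per_part - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_part)
    ]
```

For example, `page_ranges(250, 100)` yields three parts: pages 1 to 100, 101 to 200 and 201 to 250.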

Converting scans at scale?

The same OCR pipeline is a REST API and a hosted MCP endpoint, with machine-readable discovery so scripts and agents can drive it directly.
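To give a sense of what driving the pipeline from code might look like, here is a sketch that builds, but does not send, a conversion request for a PDF URL with force OCR switched on. Everything specific in it is a placeholder: the base URL, the path, the `url` and `force_ocr` field names and the bearer-token header are assumptions for illustration, and the real values should be taken from the service's OpenAPI description.

```python
import json
import urllib.request
from typing import Optional

# Placeholder endpoint: the real base URL and path are not given on this page.
API_URL = "https://api.example.com/v1/convert"


def build_convert_request(pdf_url: str, force_ocr: bool = False,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a JSON POST asking for a PDF URL to be converted.

    The payload field names are hypothetical.
    """
    payload = {"url": pdf_url, "force_ocr": force_ocr}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before any network call is made.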


FAQ

Common questions

Can it convert a scanned PDF to Markdown?

Yes. Image-only and scanned PDFs are OCR'd automatically into selectable Markdown – no separate OCR step and no setup. Just drop the file in the extension or web app.

Does the OCR support Cyrillic and Russian?

Yes. It reads Cyrillic and English, including mixed-language documents, and turns the recognized text into Markdown.

The PDF has a bad text layer – can I force OCR?

Yes. Turn on force OCR so the converter re-reads the page images instead of trusting the embedded text, which fixes garbled or missing characters.

Are tables and formulas kept when converting a scan?

Yes. Scanned columns are rebuilt as real Markdown tables instead of jumbled lines, and mathematical notation is preserved rather than flattened.

Why is my result marked truncated?

OCR is slow, so a very long scan can hit the per-document time budget. The converter returns what it processed, flagged as a partial (truncated) result. A paid tier has a longer budget, or you can split the file.

Is it free and private?

Yes. The free tier gives 3 slots, 10 MB files, a 15-minute time budget and 1-hour retention – anonymous in the browser, no card. Files are auto-deleted after the retention window and are never used to train models.
