Extract Tables from PDF to Markdown

Short answer

Yes – real Markdown tables, not pictures

Copying a table out of a PDF usually collapses it into misaligned lines, because a PDF stores characters by position, not as a table. PDF to Markdown reconstructs the rows and columns and writes them as a genuine Markdown table – pipes, a header row and aligned cells – so the numbers stay editable, diffable and searchable. It handles multi-column pages, tables that run across several pages, formulas, and tables on scanned pages, with nothing to set up.
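To see why position-only storage loses the table, here is a toy sketch. It is not the converter's actual algorithm. It assumes each word arrives as a hypothetical `(x, y, text)` tuple, with `y` measured from the top of the page, and rebuilds rows by grouping on `y` and ordering on `x`.

```python
def rows_from_positions(words, y_tolerance=2.0):
    """Group positioned words into rows (toy illustration only).

    `words` is a list of hypothetical (x, y, text) tuples. A word joins the
    current row when its y is within `y_tolerance` of that row's first word.
    """
    rows = []  # each entry: [anchor_y, [(x, text), ...]]
    for x, y, text in sorted(words, key=lambda w: w[1]):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Within a row, left-to-right order is recovered from x.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Words in the arbitrary order a PDF might store them.
words = [
    (200, 10.5, "Units"), (50, 10, "Region"),
    (50, 30, "North"), (200, 29.6, "1,240"),
]
print(rows_from_positions(words))
```

Plain copy-paste skips this kind of grouping, which is why the cells come out as misaligned lines. Real tables also need column detection, wrapped cells and header handling, which this sketch leaves out.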

How to

Convert a PDF table in 4 steps

No account needed. The whole document is converted, tables included.

1. Open the converter

Install the Chrome extension or open the web app. Both work anonymously.

2. Add the PDF

Drag in the file, pick it from disk, or paste a direct PDF URL.

3. Wait for the job

The job status moves from queued to processing to ready. Columns are rebuilt into aligned Markdown tables.

4. Copy or download

Preview the rendered Markdown and the raw source, then copy the table or download a .md file.
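When the same job is driven from code, the status sequence in step 3 can be followed with a small polling loop. This is a minimal sketch, not the service's client: `get_status` is a hypothetical callable that you would implement against the real API, and the interval and timeout defaults are illustrative assumptions.

```python
import time


def wait_until_ready(get_status, poll_seconds=2.0, timeout_seconds=900.0,
                     sleep=time.sleep):
    """Poll `get_status()` until it returns "ready".

    `get_status` is a hypothetical callable supplied by the caller. The
    status names mirror the ones listed above: queued, processing, ready.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status not in ("queued", "processing"):
            raise RuntimeError(f"unexpected job status: {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError("job did not become ready in time")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with canned statuses and a no-op sleep, so nothing is contacted.
fake_statuses = iter(["queued", "processing", "ready"])
print(wait_until_ready(lambda: next(fake_statuses), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.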

Tip: need just the tables from many files? Drive the same conversion from the REST API or hosted MCP and parse the Markdown tables in your own code.
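Parsing the tables out of the returned Markdown can look roughly like this. The sketch assumes ordinary pipe tables with a dashed separator row under the header. It does not handle escaped pipes inside cells and does not skip tables that sit inside fenced code blocks. The function names are illustrative.

```python
import re

# A separator row such as "| --- | :---: |" (pipes, dashes, optional colons).
SEPARATOR = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def split_row(line):
    """Split one pipe-delimited row into stripped cell strings."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def extract_tables(markdown):
    """Return every pipe table in `markdown` as a list of rows, header first."""
    lines = markdown.splitlines()
    tables, i = [], 0
    while i < len(lines) - 1:
        nxt = lines[i + 1]
        if "|" in lines[i] and "|" in nxt and SEPARATOR.match(nxt):
            table = [split_row(lines[i])]
            i += 2  # skip the header and the separator row
            while i < len(lines) and "|" in lines[i]:
                table.append(split_row(lines[i]))
                i += 1
            tables.append(table)
        else:
            i += 1
    return tables


doc = """Quarterly summary

| Region | Units |
| ------ | ----- |
| North  | 1,240 |
| South  |   980 |

Closing note.
"""
print(extract_tables(doc))
```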

What survives

Structure kept, not screenshots

The converter rebuilds the table a PDF only draws visually, so the result is data you can use.

Aligned rows & headers

Columns become genuine Markdown table cells with a header row, instead of misaligned text.

Multi-column order

Two- and three-column page layouts are read in the correct reading order, so cells land in the right place.

Multi-page tables

A table that continues across several pages is joined into one Markdown table instead of being split.

Formulas inside cells

Mathematical notation in or beside a table is preserved rather than flattened into garbled characters.

Scanned tables

Image-only and scanned tables are OCR'd into selectable Markdown tables.

Engine choice

MinerU is robust on dense, complex tables; Docling is fast on clean, simple ones.

A table from a page comes back as plain Markdown that you can paste into a doc, a spreadsheet importer, or an LLM prompt:

| Region | Units | Revenue |
| ------ | ----- | ------- |
| North  | 1,240 | $312K   |
| South  |   980 | $244K   |
| EMEA   | 1,610 | $402K   |

Because it is plain text, the table goes anywhere: paste it into Google Sheets or Excel, drop it into a Markdown doc, diff it in Git, or hand it to an LLM as clean context. No manual realignment, and the numbers stay editable instead of locked in a picture.
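For spreadsheet import, the pipes can be turned into CSV with the standard library. This is a minimal sketch, assuming a well-formed table whose second line is the separator row. `markdown_table_to_csv` is an illustrative name, not part of the product.

```python
import csv
import io


def markdown_table_to_csv(table_text):
    """Convert one pipe table to CSV text, dropping the separator row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, line in enumerate(table_text.strip().splitlines()):
        line = line.strip()
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        cells = [cell.strip() for cell in line.split("|")]
        # Line 2 of a pipe table is the dashed separator; it carries no data.
        if index == 1 and all(c and set(c) <= set("-:") for c in cells):
            continue
        writer.writerow(cells)
    return out.getvalue()


table = """
| Region | Units | Revenue |
| ------ | ----- | ------- |
| North  | 1,240 | $312K   |
| South  |   980 | $244K   |
| EMEA   | 1,610 | $402K   |
"""
print(markdown_table_to_csv(table))
```

The `csv` module quotes `1,240` automatically because the cell contains a comma, so the value stays in one column on import.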

Complex tables: dense tables with merged or nested header cells, or tables rotated sideways on the page, also convert but may need a quick visual check. Straightforward grids are ready to use as-is.
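One cheap way to decide which tables deserve that visual check is to compare each row's cell count with the header's. The sketch below is a heuristic under that assumption, not a feature of the converter, and the broken sample table is made up for illustration.

```python
def find_ragged_rows(table_text):
    """Return (line_number, cell_count) for rows that disagree with the header.

    Line numbers are 1-based within `table_text`. A mismatch is only a hint
    that a merged or nested cell may have shifted the columns.
    """
    def count_cells(line):
        line = line.strip()
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        return len(line.split("|"))

    lines = table_text.strip().splitlines()
    if not lines:
        return []
    expected = count_cells(lines[0])
    return [
        (number, count_cells(line))
        for number, line in enumerate(lines, start=1)
        if count_cells(line) != expected
    ]


# Deliberately broken example: the last row has lost a column boundary.
suspect = """
| Region | Units | Revenue |
| ------ | ----- | ------- |
| North  | 1,240 | $312K   |
| South  | 980 $244K |
"""
print(find_ragged_rows(suspect))
```

An empty result does not prove the table is right; it only means every row has as many cells as the header.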

Math & formulas

Formulas survive the conversion

Most PDF extractors turn equations into nonsense. Here mathematical notation is kept, so technical and scientific documents stay usable.

Why it matters

A flattened formula is worse than useless in an LLM prompt or a knowledge base: the symbols scramble and the meaning is lost. Keeping the notation means equations next to your tables, and the values they produce, stay readable for both people and models.

Good to know

Inline and block math in and around tables is preserved during conversion.

Scanned equations go through OCR like the rest of the page.

Very dense math can need a quick visual check, as with any converter.

Extracting tables at scale?

The same converter is a REST API and a hosted MCP endpoint. Convert a PDF and parse the Markdown tables in your own code or from an agent – see the Python tutorial on the developer hub.

FAQ

Common questions

Can it convert PDF tables to Markdown?

Yes. Columns are rebuilt into real, aligned Markdown tables (pipes and rows) rather than a screenshot or a jumble of lines, so the data stays editable and searchable.

Do multi-page and multi-column tables stay intact?

Yes. Multi-column layouts are read in the right order, and a table that continues across pages is joined into one Markdown table. Very complex tables with merged cells may need light cleanup.

Are formulas kept?

Yes. Mathematical notation is preserved rather than flattened into garbled characters, so formulas in and around tables survive the conversion.

What about tables in scanned PDFs?

Scanned and image-only tables are OCR'd into selectable Markdown tables. See converting scanned PDFs for the full OCR walkthrough.

Can I get the tables through an API?

Yes. The REST API and hosted MCP return the full Markdown, tables included, so you can extract tables programmatically or from an agent.

Can I open the extracted tables in Excel or Google Sheets?

Yes. A Markdown table pastes cleanly into a sheet, or you can turn the pipes into CSV in a couple of lines. Because the cells are real text, not an image, the data is immediately editable.

Why do columns break when I copy-paste a table from a PDF?

A PDF stores characters by position, not as a table, so copy-paste loses the column structure and everything collapses into misaligned lines. Converting to Markdown reconstructs the rows and columns instead.

Is it free?

Yes. Convert anonymously in the browser with no account on the free tier (3 slots, 10 MB files, a 15-minute time budget, 1-hour retention). Paid tiers raise every limit.
