Convert PDF to Markdown in Python

Short answer

One key, three calls

To convert a PDF to Markdown in Python you call a small REST API with a bearer key: POST /api/v2/jobs to create a job, GET /api/v2/jobs/{id} to poll until it is ready, then GET /api/v2/jobs/{id}/download for the Markdown. There is nothing to host and no heavy library to install – just requests. The same lifecycle is exposed as a hosted MCP for agents, and the output is the same Markdown the extension and web app produce, so you can prototype in the browser and then automate with confidence.

How to

Set up in four steps

1. Get an API key

Sign in with a free Google account and create an API key in your account; it is shown only once, so store it right away. Send it as Authorization: Bearer p2m_your_key.

2. Create a job

POST the PDF URL (or uploaded bytes) to /api/v2/jobs. You get back a job id and a status.

3. Poll until ready

GET /api/v2/jobs/{id} until status is ready or error.

4. Download the Markdown

GET /api/v2/jobs/{id}/download for the Markdown text. Check truncated and pages on the job before treating the result as complete.
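The polling step can loop forever if a job never reaches a terminal status. A minimal sketch of a bounded poll follows. The helper name wait_for_job, its parameters, and the 900-second client-side deadline (chosen to mirror the free tier's 15-minute budget) are assumptions for illustration, not part of the API.

```python
import time


def wait_for_job(get_job, interval=3.0, max_wait=900.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call get_job() until status is 'ready' or 'error'.

    get_job is any zero-argument callable returning the job as a dict.
    Raises TimeoutError if no terminal status arrives within max_wait seconds.
    """
    deadline = clock() + max_wait
    while True:
        job = get_job()
        if job["status"] in ("ready", "error"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {max_wait:.0f}s")
        sleep(interval)
```

With requests, get_job could be lambda: requests.get(f"{API}/jobs/{jid}", headers=H, timeout=30).json(). Passing the fetch in as a callable keeps the loop easy to test without touching the network.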

Full example

The complete Python script

Standard library plus requests. Create from a PDF URL, poll, then save the Markdown.

# pip install requests
import time

import requests

API = "https://pdf2md.dev/api/v2"
H = {"Authorization": "Bearer p2m_your_key"}
TIMEOUT = 30  # seconds per HTTP call, so a stalled connection cannot hang the script

# 1) create a job from a PDF URL
r = requests.post(f"{API}/jobs", headers={**H, "Idempotency-Key": "report-2026-01"},
                  json={"url": "https://example.com/report.pdf"}, timeout=TIMEOUT)
r.raise_for_status()
jid = r.json()["job_id"]

# 2) poll until ready or error
while True:
    r = requests.get(f"{API}/jobs/{jid}", headers=H, timeout=TIMEOUT)
    r.raise_for_status()  # surface 401/429/5xx instead of a KeyError on "status"
    job = r.json()
    if job["status"] in ("ready", "error"):
        break
    time.sleep(3)

if job["status"] == "error":
    raise SystemExit(f"conversion failed: {job.get('error_code')} {job.get('error_message')}")

# 3) download the Markdown
r = requests.get(f"{API}/jobs/{jid}/download", headers=H, timeout=TIMEOUT)
r.raise_for_status()
if job.get("truncated"):
    print("note: partial result (hit the time budget)")
with open("report.md", "w", encoding="utf-8") as f:
    f.write(r.text)

Uploading a local file instead of a URL? POST the bytes as multipart/form-data with a file field to the same endpoint. The full request and response shapes are in the OpenAPI spec.
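As a rough sketch of the upload variant, the helper below sends a local PDF as multipart/form-data using the files argument of requests. The function name create_job_from_file and the injectable post parameter are illustrative, and it assumes the upload response carries the same job_id field as the URL flow; check the OpenAPI spec for the exact shape.

```python
import os

API = "https://pdf2md.dev/api/v2"


def create_job_from_file(path, api_key, post=None):
    """Upload a local PDF as multipart/form-data and return the new job id."""
    if post is None:
        import requests  # imported lazily so the helper can be exercised with a stub
        post = requests.post
    with open(path, "rb") as fh:
        resp = post(
            f"{API}/jobs",
            headers={"Authorization": f"Bearer {api_key}"},
            # requests builds the multipart body and its boundary header itself,
            # so do not set Content-Type by hand
            files={"file": (os.path.basename(path), fh, "application/pdf")},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["job_id"]  # assumed to match the URL-based create response
```

Calling create_job_from_file("report.pdf", "p2m_your_key") returns an id you can feed to the same poll and download steps.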

To convert many PDFs, create one job per file and poll them in parallel up to your slot limit (3 on the free tier, more on paid). Keep a distinct Idempotency-Key per file so retries never duplicate work, and back off on 429 using the Retry-After header.
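One way to stay within the slot limit is a thread pool whose size equals your slot count. The sketch below assumes a convert_one(url) function of your own that runs the create, poll and download steps for a single file; convert_many and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_many(urls, convert_one, slots=3):
    """Run convert_one(url) for every URL, at most `slots` at a time.

    Returns a dict mapping each URL to its result, or to the exception
    it raised, so one bad file does not abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=slots) as pool:
        futures = {url: pool.submit(convert_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep the failure next to its URL
                results[url] = exc
    return results
```

Threads suit this workload because each worker spends nearly all its time waiting on HTTP responses. The default of 3 matches the free tier; raise it only as far as your plan allows.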

Make it robust

Errors, retries and webhooks

Read the error code

A failed job returns status: error with a machine-readable error_code (processing_timeout or conversion_failed) and a safe error_message – branch on the code, not the text.
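Branching on the code can be as simple as mapping each error_code to its own exception type. This is a sketch: the exception classes and raise_for_job_error are hypothetical names, and treating processing_timeout as the case worth handling separately is an assumption about how you might want to react.

```python
class ConversionError(Exception):
    """A job finished with status 'error'."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class ConversionTimeout(ConversionError):
    """The job ran out of time; splitting the PDF may help."""


def raise_for_job_error(job):
    """Raise a typed exception for a failed job; do nothing otherwise."""
    if job.get("status") != "error":
        return
    code = job.get("error_code", "unknown")
    message = job.get("error_message", "")
    if code == "processing_timeout":
        raise ConversionTimeout(code, message)
    # conversion_failed and any code added later land here
    raise ConversionError(code, message)
```

Callers can then catch ConversionTimeout to split the document and try again, and let every other ConversionError propagate.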

Handle truncated

A long document can come back ready with truncated=true. Check the flag, then either split the file into smaller PDFs or move to a paid tier with a longer time budget.
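A small sketch of one way to act on the flag: save truncated output under a different name so downstream steps never mistake it for a complete conversion. The save_markdown helper and the .partial naming are illustrative choices, not something the service prescribes.

```python
from pathlib import Path


def save_markdown(md, job, path):
    """Write the Markdown to disk and return the path actually used.

    A truncated result is saved as <name>.partial<ext> instead of <name><ext>.
    """
    target = Path(path)
    if job.get("truncated"):
        target = target.with_name(f"{target.stem}.partial{target.suffix}")
    target.write_text(md, encoding="utf-8")
    return target
```

A batch script can then glob for *.partial.md afterwards to find the documents that need splitting.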

Idempotency-Key

Send an Idempotency-Key header so a retried create returns the same job instead of duplicating work.
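For retries to reuse the same job, the key must be stable for a given input. One possible approach, sketched below, derives it from the source URL or file path with a hash. The idempotency_key helper, the prefix and the 32-character length are assumptions; the service may impose its own limits on key format.

```python
import hashlib


def idempotency_key(source, prefix="pdf2md"):
    """Derive a deterministic key from a URL or file path.

    The same source always yields the same key, so a retried create
    maps to the original job rather than a new one.
    """
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}-{digest}"
```

Send it as headers={**H, "Idempotency-Key": idempotency_key(url)}. If you want to reconvert a file that has changed at the same URL, include a version or date in the source string.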

Webhooks over polling

On paid tiers, register a webhook or pass callback_url and get a POST on ready/error instead of polling.
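A rough sketch of a receiver for those POSTs, using only the standard library. The payload field names (job_id and status) are assumed to mirror the polling response, and nothing here verifies that the request really came from the service; consult the OpenAPI spec for the actual callback contract.

```python
import json
from http.server import BaseHTTPRequestHandler


def parse_callback(body):
    """Return (job_id, status) from a callback body.

    Field names are an assumption based on the job JSON used when polling.
    """
    event = json.loads(body)
    status = event["status"]
    if status not in ("ready", "error"):
        raise ValueError(f"unexpected status {status!r}")
    return event["job_id"], status


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            job_id, status = parse_callback(self.rfile.read(length))
        except (KeyError, TypeError, ValueError):
            self.send_response(400)  # malformed or unexpected payload
            self.end_headers()
            return
        print(f"job {job_id} finished with status {status}")
        self.send_response(204)  # acknowledge quickly, do heavy work elsewhere
        self.end_headers()
```

To try it locally, run http.server.HTTPServer(("", 8000), CallbackHandler).serve_forever() and expose the port over HTTPS before passing its address as callback_url.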

Respect 429

On 429, wait the number of seconds given in the Retry-After header before retrying; do not hammer the queue.
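A minimal sketch of that back-off, written around any zero-argument send callable that returns a response with status_code and headers (as requests responses do). The helper names, the five-attempt cap and the 5-second fallback are assumptions.

```python
import time


def retry_after_seconds(headers, default=5.0):
    """Read Retry-After as a number of seconds, falling back to `default`."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # header missing, or not a plain number of seconds


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() and retry on HTTP 429, waiting as the server asks."""
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status_code != 429 or attempt == max_attempts:
            return resp  # success, a different error, or out of attempts
        sleep(retry_after_seconds(resp.headers))
```

For example, send_with_retry(lambda: requests.post(f"{API}/jobs", headers=H, json={"url": url}, timeout=30)) returns the final response, on which you still call raise_for_status().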

Keep keys server-side

The key is a secret: store it server-side, send over TLS, and rotate or revoke it any time.
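In practice that usually means reading the key from the environment rather than hard-coding it. A short sketch follows; the variable name PDF2MD_API_KEY and the auth_headers helper are hypothetical.

```python
import os


def auth_headers(env=os.environ, var="PDF2MD_API_KEY"):
    """Build the Authorization header from an environment variable."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"set {var} before calling the API")
    return {"Authorization": f"Bearer {key}"}
```

Use H = auth_headers() in place of the literal key in the script, and keep the value itself in your deployment's secret store rather than in source control.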

Other languages

Not on Python? Same three calls

It is a plain HTTPS API, so any language works. The same create / poll / download in Node:

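// Node 18+ (global fetch); run as an ES module (.mjs) so top-level await works
const API = "https://pdf2md.dev/api/v2";
const H = { Authorization: "Bearer p2m_your_key" };

// 1) create a job from a PDF URL
const r = await fetch(`${API}/jobs`, { method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com/report.pdf" }) });
if (!r.ok) throw new Error(`create failed: HTTP ${r.status}`);
const { job_id } = await r.json();

// 2) poll every 3 seconds until ready or error
let job;
do { await new Promise(s => setTimeout(s, 3000));
     job = await (await fetch(`${API}/jobs/${job_id}`, { headers: H })).json();
} while (!["ready", "error"].includes(job.status));
if (job.status === "error") throw new Error(`${job.error_code}: ${job.error_message}`);

// 3) download the Markdown
const md = await (await fetch(`${API}/jobs/${job_id}/download`, { headers: H })).text();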

Prefer agent tools or RAG?

The same lifecycle is a hosted MCP for ChatGPT, Claude and agent frameworks. For ingestion and chunking, see the RAG guide.


FAQ

Common questions

How do I convert a PDF to Markdown in Python?

Call the REST API with a bearer key: POST the PDF to /api/v2/jobs, poll GET /api/v2/jobs/{id} until ready, then GET /api/v2/jobs/{id}/download for the Markdown. The full requests example is above.

Do I need an API key?

Yes for the API. A free Google account lets you create a key and use the hosted MCP. The browser extension and web app stay anonymous and need no key.

How do I handle errors and timeouts?

A failed job returns status: error with a machine-readable error_code (processing_timeout or conversion_failed) and a safe error_message. A long document can come back ready with truncated=true; check that flag.

Can I avoid polling?

On paid tiers, register a webhook or pass callback_url when creating the job, and the service POSTs you on ready or error so you do not poll.

Does it work from Node or other languages?

Yes. It is a plain HTTPS API, so any language works. A short Node example is above, and the full contract is in the OpenAPI spec.

How do I convert a local file instead of a URL?

POST the file bytes as multipart/form-data with a file field to /api/v2/jobs, instead of a JSON body with a url. The poll and download steps are identical.

Can I convert many PDFs at once?

Create a job per file and poll them in parallel up to your slot limit (3 on the free tier, more on paid). Paid tiers add a batch-create endpoint and webhooks so you do not poll at all.

Is the output the same as the extension and web app?

Yes. Every surface uses the same conversion engine, so the API returns the same Markdown you would get in the browser.

Is it free?

The free tier gives 3 slots, 10 MB files, a 15-minute time budget and 1-hour retention. Paid tiers raise every limit and add webhooks and batch create.