Convert PDF to Markdown in Python
A step-by-step tutorial using the REST API: get a key, create a job, poll, and download clean Markdown, with a complete, copyable example and proper error handling.
One key, three calls
To convert a PDF to Markdown in Python you call a small REST API with a bearer key: POST /api/v2/jobs to create a job, GET /api/v2/jobs/{id} to poll until it is ready, then GET /api/v2/jobs/{id}/download for the Markdown. There is nothing to host and no heavy library to install, just requests. The same lifecycle is also exposed as a hosted MCP for agents, and the output is the same Markdown the browser extension and web app produce, so you can prototype in the browser and then automate with confidence.
Set up in four steps
Get an API key
Sign in with a free Google account and create an API key in your account. The key is shown only once, so store it somewhere safe. Send it as Authorization: Bearer p2m_your_key.
Create a job
POST the PDF URL (or uploaded bytes) to /api/v2/jobs. You get back a job id and a status.
Poll until ready
GET /api/v2/jobs/{id} until status is ready or error.
Download the Markdown
GET /api/v2/jobs/{id}/download for the Markdown text. Before using it, check the job's truncated and pages fields.
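A bare "poll until ready" loop never gives up if a job stalls. Below is a minimal sketch of a bounded poll, assuming `session` is a `requests.Session` (any object with a compatible `get` works) and capping the wait at 900 seconds to match the free tier's 15-minute time budget. The helper name `wait_for_job` and the 1.5× backoff schedule are illustrative choices, not part of the API.

```python
import time


def wait_for_job(session, api, job_id, headers, timeout=900, interval=3.0,
                 max_interval=15.0, sleep=time.sleep, clock=time.monotonic):
    """Poll GET {api}/jobs/{job_id} until status is 'ready' or 'error'.

    Raises TimeoutError if the job has not settled within `timeout` seconds.
    The 900 s default mirrors the free tier's 15-minute budget; it is a
    client-side choice, not an API rule.
    """
    deadline = clock() + timeout
    while True:
        r = session.get(f"{api}/jobs/{job_id}", headers=headers, timeout=30)
        r.raise_for_status()
        job = r.json()
        if job["status"] in ("ready", "error"):
            return job
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(interval)
        # Back off gently so long-running jobs generate fewer requests.
        interval = min(interval * 1.5, max_interval)
```

Call it with the base URL, job id and auth headers you already use, then branch on `job["status"]` exactly as before. The injectable `sleep` and `clock` exist only to make the function easy to test.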
The complete Python script
Standard library plus requests. Create from a PDF URL, poll, then save the Markdown.
```python
# pip install requests
import time

import requests

API = "https://pdf2md.dev/api/v2"
H = {"Authorization": "Bearer p2m_your_key"}  # keep the real key out of source control

# 1) create a job from a PDF URL; the Idempotency-Key makes retries safe
r = requests.post(
    f"{API}/jobs",
    headers={**H, "Idempotency-Key": "report-2026-01"},
    json={"url": "https://example.com/report.pdf"},
    timeout=30,
)
r.raise_for_status()
jid = r.json()["job_id"]

# 2) poll until the job is ready or has failed
while True:
    r = requests.get(f"{API}/jobs/{jid}", headers=H, timeout=30)
    r.raise_for_status()
    job = r.json()
    if job["status"] in ("ready", "error"):
        break
    time.sleep(3)

if job["status"] == "error":
    raise SystemExit(f"conversion failed: {job.get('error_code')} {job.get('error_message')}")

# 3) download the Markdown
r = requests.get(f"{API}/jobs/{jid}/download", headers=H, timeout=60)
r.raise_for_status()
if job.get("truncated"):
    print("note: partial result (hit the time budget)")
with open("report.md", "wb") as f:
    f.write(r.content)  # write raw bytes so no text encoding has to be guessed
```
Uploading a local file instead of a URL? POST the bytes as multipart/form-data with a file field to the same endpoint. The full request and response shapes are in the OpenAPI spec.
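A sketch of the multipart upload with `requests`, assuming the `file` field named above. `upload_pdf` is a hypothetical helper, and sending `application/pdf` as the part's content type is a sensible default rather than a documented requirement. Passing the file through `files=` lets `requests` build the multipart body and its boundary.

```python
from pathlib import Path


def upload_pdf(session, api, headers, path, idempotency_key=None):
    """Create a job from a local PDF by POSTing it as multipart/form-data."""
    path = Path(path)
    extra = {"Idempotency-Key": idempotency_key} if idempotency_key else {}
    with path.open("rb") as fh:
        # Do not set Content-Type yourself: requests adds it with the boundary.
        r = session.post(
            f"{api}/jobs",
            headers={**headers, **extra},
            files={"file": (path.name, fh, "application/pdf")},
            timeout=120,
        )
    r.raise_for_status()
    return r.json()["job_id"]
```

With a `requests.Session()` and the `API` and `H` values from the script, `upload_pdf(session, API, H, "report.pdf")` returns a job id that you poll and download exactly as before.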
To convert many PDFs, create one job per file and poll them in parallel up to your slot limit (3 on the free tier, more on paid). Keep a distinct Idempotency-Key per file so retries never duplicate work, and back off on 429 using the Retry-After header.
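One way to fan out over many URLs, assuming a "slot" means one job in flight at a time: a thread pool capped at the slot limit, with an Idempotency-Key derived from each URL so a re-run of the batch maps every file back to its existing job. `convert_one` stands for your own create/poll/download function taking `(url, key)`; it and the other helper names here are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def idempotency_key_for(url):
    """Stable per-file key: the same URL always yields the same key."""
    return "pdf-" + hashlib.sha256(url.encode("utf-8")).hexdigest()[:32]


def convert_many(urls, convert_one, slots=3):
    """Run convert_one(url, key) for each URL, at most `slots` at once.

    slots=3 matches the free tier. Returns {url: result or exception}, so one
    failed file does not stop the rest of the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=slots) as pool:
        futures = {pool.submit(convert_one, u, idempotency_key_for(u)): u for u in urls}
        for fut, url in futures.items():
            try:
                results[url] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

If the PDF behind a URL can change between runs, mix a version or date into the key so the new file is not answered with the old job. Combine this with the 429 handling below rather than raising the worker count past your slot limit.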
Errors, retries and webhooks
Read the error code
A failed job returns status: error with a machine-readable error_code (processing_timeout or conversion_failed) and a safe error_message – branch on the code, not the text.
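Branching on `error_code` maps naturally onto typed exceptions. A minimal sketch: the exception class names are hypothetical, and only the two codes named above are mapped, with any other code falling back to the base class so it is never silently ignored.

```python
class ConversionError(Exception):
    """Base class: catch this to handle every failed job in one place."""


class ProcessingTimeout(ConversionError):
    pass


class ConversionFailed(ConversionError):
    pass


_BY_CODE = {"processing_timeout": ProcessingTimeout, "conversion_failed": ConversionFailed}


def raise_for_job(job):
    """Raise a typed exception if the job failed; otherwise return it unchanged."""
    if job.get("status") != "error":
        return job
    code = job.get("error_code")
    exc_type = _BY_CODE.get(code, ConversionError)  # unknown codes -> base class
    raise exc_type(f"{code}: {job.get('error_message', '')}")
```

A caller can then, for example, split and resubmit a PDF on `ProcessingTimeout` while logging and skipping files that raise `ConversionFailed`.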
Handle truncated
A long document can come back ready with truncated=true. Check the flag; if it is set, split the PDF and convert the parts, or use the longer time budget of a paid tier.
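To keep a partial result from slipping through unnoticed, the save step can refuse truncated jobs unless the caller opts in. A sketch: `save_markdown`, `PartialResult` and the `allow_partial` switch are hypothetical names, and only the `truncated` flag comes from the API.

```python
class PartialResult(Exception):
    """Raised when a ready job came back with truncated=true."""


def save_markdown(job, markdown, path, allow_partial=False):
    """Write Markdown to `path`, refusing truncated results unless allowed.

    Returns True if what was written is a partial (truncated) result.
    """
    truncated = bool(job.get("truncated"))
    if truncated and not allow_partial:
        raise PartialResult(f"{path}: result is truncated; split the PDF or use a longer budget")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(markdown)
    return truncated
```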
Idempotency-Key
Send an Idempotency-Key header so a retried create returns the same job instead of duplicating work.
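The key only helps if a retry actually reuses it. A sketch of a create call that retries network failures and 5xx responses with the same Idempotency-Key, assuming `session` is a `requests.Session`; it relies on `requests`' `ConnectionError`, `Timeout` and `HTTPError` all deriving from `OSError`. The helper name, attempt count and backoff are illustrative. 4xx responses, including 429, are re-raised here rather than retried.

```python
import time


def create_job(session, api, headers, pdf_url, key, attempts=3, sleep=time.sleep):
    """POST /jobs, retrying transient failures with the SAME Idempotency-Key.

    Reusing the key means a retry after a lost response gets back the job the
    first attempt created instead of starting a duplicate conversion.
    """
    for attempt in range(attempts):
        try:
            r = session.post(
                f"{api}/jobs",
                headers={**headers, "Idempotency-Key": key},
                json={"url": pdf_url},
                timeout=30,
            )
            r.raise_for_status()
            return r.json()["job_id"]
        except OSError as exc:
            # requests' ConnectionError, Timeout and HTTPError all subclass OSError.
            status = getattr(getattr(exc, "response", None), "status_code", None)
            if (status is not None and status < 500) or attempt == attempts - 1:
                raise  # 4xx (incl. 429) or out of attempts: give up here
            sleep(2 ** attempt)  # wait 1 s, 2 s, ... between attempts
```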
Webhooks over polling
On paid tiers, register a webhook or pass callback_url and get a POST on ready/error instead of polling.
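A minimal callback receiver using only the standard library. The payload shape is an assumption: this sketch expects a JSON body with `job_id` and `status` fields, which this page does not specify, so confirm the real fields in the OpenAPI spec. It also does not authenticate the sender. `parse_event`, `CallbackHandler` and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_event(body):
    """Extract (job_id, status) from a callback body (assumed JSON fields)."""
    event = json.loads(body or b"{}")
    return event.get("job_id"), event.get("status")


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length") or 0)
        job_id, status = parse_event(self.rfile.read(length))
        if status == "ready":
            print(f"{job_id} ready: fetch /api/v2/jobs/{job_id}/download")
        elif status == "error":
            print(f"{job_id} failed: GET /api/v2/jobs/{job_id} for error_code")
        # Acknowledge with an empty 2xx reply; do slow work outside the handler.
        self.send_response(204)
        self.end_headers()


def serve(port=8000):
    """Listen for callbacks until interrupted."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Run `serve()` behind a TLS-terminating proxy and pass that public URL as `callback_url` when creating the job, assuming it goes in the JSON body alongside `url`, for example `json={"url": pdf_url, "callback_url": "https://example.com/pdf2md-hook"}`.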
Respect 429
On 429, wait for the Retry-After seconds before retrying; do not hammer the queue.
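A sketch of honoring `Retry-After`. The HTTP specification allows the header to carry either a number of seconds or an HTTP date, so the parser handles both and falls back to a default. `send` is any zero-argument callable that performs one request and returns a response with `status_code` and `headers`; the helper names, the 5-second default and the 5-try limit are assumptions.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=5.0):
    """Parse Retry-After: delay in seconds or an HTTP date; never negative."""
    if not value:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad dates
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def send_with_backoff(send, max_tries=5, sleep=time.sleep):
    """Call send(), sleeping for Retry-After on each 429 response."""
    for _ in range(max_tries - 1):
        r = send()
        if r.status_code != 429:
            return r
        sleep(retry_after_seconds(r.headers.get("Retry-After")))
    return send()  # last try: hand back whatever comes back
```

For example, `r = send_with_backoff(lambda: session.post(f"{API}/jobs", headers=H, json=body, timeout=30))`. `requests` response headers are case-insensitive, so the lookup works regardless of how the server spells the header.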
Keep keys server-side
The key is a secret: store it server-side, send over TLS, and rotate or revoke it any time.
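Keeping the key server-side usually means loading it from the environment or a secret manager rather than from source code. A sketch, where `PDF2MD_API_KEY` and `auth_headers` are hypothetical names:

```python
import os


def auth_headers(env_var="PDF2MD_API_KEY"):
    """Build the bearer Authorization header from an environment variable."""
    key = os.environ.get(env_var)
    if not key:
        # Fail loudly rather than sending an empty or placeholder key.
        raise RuntimeError(f"set {env_var} to your API key")
    return {"Authorization": f"Bearer {key}"}
```

Rotating or revoking a key then only means updating the variable, with no code change.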
Not on Python? Same three calls
It is a plain HTTPS API, so any language works. The same create / poll / download in Node:
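```javascript
// Node 18+ (global fetch); save as convert.mjs so top-level await works
const API = "https://pdf2md.dev/api/v2";
const H = { Authorization: "Bearer p2m_your_key" };

// 1) create the job
const r = await fetch(`${API}/jobs`, {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com/report.pdf" }),
});
if (!r.ok) throw new Error(`create failed: HTTP ${r.status}`);
const { job_id } = await r.json();

// 2) poll every 3 s until ready or error
let job;
do {
  await new Promise((resolve) => setTimeout(resolve, 3000));
  job = await (await fetch(`${API}/jobs/${job_id}`, { headers: H })).json();
} while (!["ready", "error"].includes(job.status));
if (job.status === "error") throw new Error(`${job.error_code}: ${job.error_message}`);

// 3) download the Markdown
const md = await (await fetch(`${API}/jobs/${job_id}/download`, { headers: H })).text();
```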
Prefer agent tools or RAG?
The same lifecycle is a hosted MCP for ChatGPT, Claude and agent frameworks. For ingestion and chunking, see the RAG guide.
Common questions
How do I convert a PDF to Markdown in Python?
Call the REST API with a bearer key: POST the PDF to /api/v2/jobs, poll GET /api/v2/jobs/{id} until ready, then GET /api/v2/jobs/{id}/download for the Markdown. The full requests example is above.
Do I need an API key?
Yes for the API. A free Google account lets you create a key and use the hosted MCP. The browser extension and web app stay anonymous and need no key.
How do I handle errors and timeouts?
A failed job returns status: error with a machine-readable error_code (processing_timeout or conversion_failed) and a safe error_message. A long document can come back ready with truncated=true; check that flag.
Can I avoid polling?
On paid tiers, register a webhook or pass callback_url when creating the job, and the service POSTs you on ready or error so you do not poll.
Does it work from Node or other languages?
Yes. It is a plain HTTPS API, so any language works. A short Node example is above, and the full contract is in the OpenAPI spec.
How do I convert a local file instead of a URL?
POST the file bytes as multipart/form-data with a file field to /api/v2/jobs, instead of a JSON body with a url. The poll and download steps are identical.
Can I convert many PDFs at once?
Create a job per file and poll them in parallel up to your slot limit (3 on the free tier, more on paid). Paid tiers add a batch-create endpoint and webhooks so you do not poll at all.
Is the output the same as the extension and web app?
Yes. Every surface uses the same conversion engine, so the API returns the same Markdown you would get in the browser.
Is it free?
The free tier gives 3 slots, files up to 10 MB, a 15-minute time budget and 1-hour retention. Paid tiers raise every limit and add webhooks and batch create.