Convert PDF to Markdown in Go
A typed, dependency-free way to turn PDFs into Markdown from Go. The example uses only the standard library (chiefly net/http and encoding/json), decodes the job into a struct, and compiles to a single static binary you can drop into a service.
One struct, three requests
Model the job as a Go struct with json tags, then make three net/http calls: POST the PDF (here, as a URL) to /api/v2/jobs to get a job_id, poll /api/v2/jobs/{job_id} until status is ready or error, and GET /api/v2/jobs/{job_id}/download for the Markdown. No SDK, no cgo, no GPU – the conversion, OCR and table work all happen server-side.
The full program
Save as convert.go and run go run convert.go. Standard library only.
// Go 1.21+, standard library only. go run convert.go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

const api = "https://pdf2md.dev/api/v2"

// job models the fields of the job response that this program reads.
type job struct {
	JobID        string `json:"job_id"`
	Status       string `json:"status"`
	ErrorCode    string `json:"error_code"`
	ErrorMessage string `json:"error_message"`
	Truncated    bool   `json:"truncated"`
}

// authed builds a request that carries the API key.
func authed(ctx context.Context, method, url string, body io.Reader) *http.Request {
	req, err := http.NewRequestWithContext(ctx, method, url, body)
	if err != nil {
		panic(err) // only a malformed method or URL ends up here
	}
	req.Header.Set("Authorization", "Bearer p2m_your_key")
	return req
}

// send performs the request and stops on transport errors and HTTP 4xx/5xx.
func send(req *http.Request) *http.Response {
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	if res.StatusCode >= 400 {
		res.Body.Close()
		panic(fmt.Sprintf("%s %s: %s", req.Method, req.URL, res.Status))
	}
	return res
}

// decode reads a job from the response body and closes it.
func decode(res *http.Response, j *job) {
	defer res.Body.Close()
	if err := json.NewDecoder(res.Body).Decode(j); err != nil {
		panic(err)
	}
}

func main() {
	ctx := context.Background()

	// 1) create a job from a PDF URL
	payload, _ := json.Marshal(map[string]string{"url": "https://example.com/report.pdf"})
	req := authed(ctx, "POST", api+"/jobs", bytes.NewReader(payload))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Idempotency-Key", "report-2026-01")
	var j job
	decode(send(req), &j)

	// 2) poll until ready or error
	for j.Status != "ready" && j.Status != "error" {
		time.Sleep(3 * time.Second)
		decode(send(authed(ctx, "GET", api+"/jobs/"+j.JobID, nil)), &j)
	}
	if j.Status == "error" {
		fmt.Fprintf(os.Stderr, "conversion failed: %s %s\n", j.ErrorCode, j.ErrorMessage)
		os.Exit(1)
	}
	if j.Truncated {
		fmt.Fprintln(os.Stderr, "warning: the result is truncated")
	}

	// 3) download the Markdown
	res := send(authed(ctx, "GET", api+"/jobs/"+j.JobID+"/download", nil))
	md, err := io.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("report.md", md, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("saved report.md")
}
Why a struct: decoding into a typed job gives you compile-time safety on Status, ErrorCode and Truncated, and the decoder simply ignores any response fields you do not model.
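The point about unmodelled fields can be checked in isolation. The sketch below decodes a hand-written JSON document rather than a real API response; the `pages` key is a made-up extra field, included only to show that encoding/json skips keys the struct does not declare.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// job models only the fields this program reads.
type job struct {
	JobID     string `json:"job_id"`
	Status    string `json:"status"`
	Truncated bool   `json:"truncated"`
}

// decodeJob parses one job document. Keys without a matching struct
// field are skipped by default, and missing keys keep their zero value.
func decodeJob(body string) (job, error) {
	var j job
	err := json.NewDecoder(strings.NewReader(body)).Decode(&j)
	return j, err
}

func main() {
	// "pages" is a hypothetical extra field, not a documented one.
	j, err := decodeJob(`{"job_id":"abc","status":"ready","pages":12}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(j.JobID, j.Status, j.Truncated) // prints "abc ready false"
}
```

If you would rather fail on unexpected keys, call DisallowUnknownFields on the decoder before Decode.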
Convert many PDFs at once
Where Go shines is fan-out. To convert a batch, run each file's create-poll-download sequence in a goroutine and bound them with a semaphore so you stay friendly to the API.
Bounded worker pool
Use a buffered channel as a semaphore: sem := make(chan struct{}, 5). Each goroutine takes a slot before it starts and releases it when done, so at most five conversions run concurrently. Collect results with a sync.WaitGroup.
Context and timeouts
Pass a context with a deadline to NewRequestWithContext and a stuck request is cancelled cleanly.
processing_timeout or conversion_failed tells you why a job failed; truncated flags a partial result.
Prefer another language? See the Node.js tutorial, the Python tutorial, or the cURL recipe.
Wiring it into a service?
The conversion is also a hosted MCP endpoint, so a Go-based agent can call it as a tool. The full reference and the OpenAPI spec are on the developer hub.
A bounded worker pool
Wrap the single-file flow from above in convertOne, then fan out over a list of PDFs with a semaphore that caps how many run at once.
// convertAll fans out over urls, running at most five conversions at a time.
// convertOne (not shown) wraps the create -> poll -> download flow for one URL.
// Add "log" and "sync" to the import block.
func convertAll(ctx context.Context, urls []string) {
	sem := make(chan struct{}, 5) // at most 5 conversions in flight
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			sem <- struct{}{}        // take a slot (blocks if full)
			defer func() { <-sem }() // release it
			if err := convertOne(ctx, url); err != nil {
				log.Printf("skip %s: %v", url, err)
			}
		}(u)
	}
	wg.Wait()
}
The buffered channel is the entire concurrency limiter: a goroutine cannot start real work until it has sent a token into sem, and the channel holds at most five. Increase the buffer for more throughput on a higher tier; decrease it to be gentle on the free tier. Pair this with a per-file Idempotency-Key so a rerun after a crash does not reconvert what already finished. The same pool then handles a folder of ten PDFs or ten thousand.
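One way to get a per-file Idempotency-Key is to derive it from the source URL, so that a rerun sends the same key for the same file. The helper below is a hypothetical sketch: the name idempotencyKey, the "pdf-" prefix and the truncated SHA-256 are arbitrary choices, and which key formats the API accepts, or how long it remembers them, is an assumption to check against the reference.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// idempotencyKey returns a stable key for a source URL: the same URL
// always maps to the same key, and different URLs map to different keys.
// The prefix and the 16-byte truncation are illustrative choices.
func idempotencyKey(pdfURL string) string {
	sum := sha256.Sum256([]byte(pdfURL))
	return "pdf-" + hex.EncodeToString(sum[:16])
}

func main() {
	urls := []string{
		"https://example.com/a.pdf",
		"https://example.com/b.pdf",
	}
	for _, u := range urls {
		fmt.Println(idempotencyKey(u), u)
	}
}
```

Inside convertOne, the key would go on the create request with req.Header.Set("Idempotency-Key", idempotencyKey(url)). Hashing the URL treats a changed file at the same address as the same job; hash the file contents instead if that matters.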
Common questions
Does the Go example need any dependencies?
No. It uses only the standard library (net/http and encoding/json), so go run convert.go works with no modules to add and compiles to a single static binary.
How do I map the JSON response in Go?
Define a struct with json tags for job_id, status, error_code, error_message and truncated, then decode with json.NewDecoder. Fields you do not list are simply ignored.
How do I convert many PDFs concurrently in Go?
Run the create, poll and download sequence inside goroutines and bound them with a buffered channel used as a semaphore, so you convert several files at once without flooding the API.
How do I add a timeout or cancellation?
Use context.WithTimeout with http.NewRequestWithContext so a stuck request is cancelled. The server also enforces its own time budget and returns error_code processing_timeout when a document is too large.
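A minimal sketch of that pattern, run against a local httptest server that stands in for a stuck endpoint, so nothing here touches the real API. The helper names getWithTimeout and slowRequestTimesOut are illustrative, and the 100 ms budget is only there to keep the demo fast.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// getWithTimeout issues a GET that is abandoned once timeout elapses.
func getWithTimeout(url string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer on every path

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return res.Body.Close()
}

// slowRequestTimesOut calls a local server that takes a full second to
// reply and reports whether the client-side deadline ended the call.
func slowRequestTimesOut() bool {
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(time.Second):
		case <-r.Context().Done(): // client went away
		}
	}))
	defer slow.Close()

	err := getWithTimeout(slow.URL, 100*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", slowRequestTimesOut())
}
```

In the polling loop, one deadline on the shared context bounds the whole create, poll and download sequence, while a fresh context per request bounds each call separately.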
How do I upload a local PDF file in Go?
Build a multipart/form-data body with mime/multipart, write the PDF into a file field, set the Content-Type to the multipart boundary, and POST it to /api/v2/jobs instead of the JSON url body.
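The steps above can be sketched as follows. The form field name "file" is an assumption (the answer only says "a file field"), newUploadRequest and roundTrip are illustrative names, and the demo posts to a local httptest server instead of the real endpoint so that it runs without an API key.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// newUploadRequest builds a multipart POST that carries the PDF bytes.
// The field name "file" is an assumption, not confirmed by the API docs.
func newUploadRequest(url, filename string, pdf []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(pdf); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is invalid without it.
	if err := mw.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType()) // includes the boundary
	req.Header.Set("Authorization", "Bearer p2m_your_key")
	return req, nil
}

// roundTrip posts pdf to a local test server that parses the form and
// replies with the number of bytes it found in the "file" field.
func roundTrip(pdf []byte) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f, _, err := r.FormFile("file")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		defer f.Close()
		n, _ := io.Copy(io.Discard, f)
		fmt.Fprint(w, n)
	}))
	defer srv.Close()

	req, err := newUploadRequest(srv.URL, "report.pdf", pdf)
	if err != nil {
		return 0, err
	}
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()
	var n int
	_, err = fmt.Fscan(res.Body, &n)
	return n, err
}

func main() {
	n, err := roundTrip([]byte("%PDF-1.7 placeholder bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Println("server received", n, "bytes")
}
```

Against the real API, pass api+"/jobs" as the URL and read the file with os.ReadFile. For large PDFs, streaming through an io.Pipe avoids buffering the whole body in memory.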