Go tutorial

Convert PDF to Markdown in Go

A typed, dependency-free way to turn PDFs into Markdown from Go. The example uses only the standard library (net/http and encoding/json), decodes the job into a struct, and compiles to a single binary you can drop into a service (fully static if you build with CGO_ENABLED=0).

Short answer

One struct, three requests

Model the job as a Go struct with json tags, then make three net/http calls. First, POST the PDF's URL to /api/v2/jobs to get a job_id. Next, poll /api/v2/jobs/{job_id} until status is ready (or error). Finally, GET /api/v2/jobs/{job_id}/download for the Markdown. No SDK, no cgo, no GPU: the conversion, OCR and table work all happen server-side.

How to

The full program

Replace p2m_your_key with your API key, save as convert.go and run go run convert.go. Standard library only.

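// Go 1.21+, standard library only. go run convert.go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

const api = "https://pdf2md.dev/api/v2"

type job struct {
	JobID        string `json:"job_id"`
	Status       string `json:"status"`
	ErrorCode    string `json:"error_code"`
	ErrorMessage string `json:"error_message"`
	Truncated    bool   `json:"truncated"`
}

// authed builds a request that carries the API key.
func authed(ctx context.Context, method, url string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, method, url, body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer p2m_your_key")
	return req, nil
}

// send performs req and returns the response body; the caller closes it.
// Transport failures and non-2xx statuses both come back as errors.
func send(req *http.Request) (io.ReadCloser, error) {
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	if res.StatusCode < 200 || res.StatusCode > 299 {
		defer res.Body.Close()
		msg, _ := io.ReadAll(io.LimitReader(res.Body, 512)) // keep the error short
		return nil, fmt.Errorf("%s %s: %s: %s", req.Method, req.URL, res.Status, msg)
	}
	return res.Body, nil
}

// sendJSON performs req and decodes the JSON response into j.
func sendJSON(req *http.Request, j *job) error {
	body, err := send(req)
	if err != nil {
		return err
	}
	defer body.Close()
	return json.NewDecoder(body).Decode(j)
}

func run(ctx context.Context) error {
	// 1) create a job from a PDF URL
	payload, err := json.Marshal(map[string]string{"url": "https://example.com/report.pdf"})
	if err != nil {
		return err
	}
	req, err := authed(ctx, http.MethodPost, api+"/jobs", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Idempotency-Key", "report-2026-01")
	var j job
	if err := sendJSON(req, &j); err != nil {
		return fmt.Errorf("create job: %w", err)
	}
	if j.JobID == "" {
		return fmt.Errorf("create job: response has no job_id")
	}

	// 2) poll every 3 seconds until ready or error; ctx bounds the total wait
	for j.Status != "ready" && j.Status != "error" {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(3 * time.Second):
		}
		req, err := authed(ctx, http.MethodGet, api+"/jobs/"+j.JobID, nil)
		if err != nil {
			return err
		}
		if err := sendJSON(req, &j); err != nil {
			return fmt.Errorf("poll job: %w", err)
		}
	}
	if j.Status == "error" {
		return fmt.Errorf("conversion failed: %s %s", j.ErrorCode, j.ErrorMessage)
	}
	if j.Truncated {
		fmt.Fprintln(os.Stderr, "warning: truncated is true, the Markdown is a partial result")
	}

	// 3) download the Markdown
	req, err = authed(ctx, http.MethodGet, api+"/jobs/"+j.JobID+"/download", nil)
	if err != nil {
		return err
	}
	body, err := send(req)
	if err != nil {
		return fmt.Errorf("download: %w", err)
	}
	defer body.Close()
	md, err := io.ReadAll(body)
	if err != nil {
		return err
	}
	return os.WriteFile("report.md", md, 0o644)
}

func main() {
	// One deadline for the whole create -> poll -> download flow (10 minutes is an arbitrary choice).
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	if err := run(ctx); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("saved report.md")
}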

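Why a struct: when you decode into a typed job, the compiler checks every field name you use. A typo such as j.Stauts will not build. Truncated also arrives as a real bool rather than an interface{} you must assert. The decoder simply ignores any response fields you do not model.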
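The decoding behaviour can be seen in isolation with this small sketch. It decodes a sample body that carries one key the struct does not model. The pages key and the sample values are made up for illustration, and decodeJob is a hypothetical helper, not part of the API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// job mirrors the struct in the tutorial program.
type job struct {
	JobID        string `json:"job_id"`
	Status       string `json:"status"`
	ErrorCode    string `json:"error_code"`
	ErrorMessage string `json:"error_message"`
	Truncated    bool   `json:"truncated"`
}

// decodeJob parses a job response body. Keys with no matching json tag
// (here the illustrative "pages") are skipped by encoding/json, not rejected.
func decodeJob(data []byte) (job, error) {
	var j job
	err := json.Unmarshal(data, &j)
	return j, err
}

func main() {
	// Hypothetical response body; "pages" is an extra, unmodelled field.
	body := []byte(`{"job_id":"j_123","status":"ready","truncated":true,"pages":12}`)
	j, err := decodeJob(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(j.JobID, j.Status, j.Truncated) // j_123 ready true
}
```

If you ever want unknown keys to be an error instead, json.Decoder has a DisallowUnknownFields method. The tutorial relies on the default lenient behaviour so that new server fields do not break old clients.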

Go's strength

Convert many PDFs at once

Where Go shines is fan-out. To convert a batch, run each file's create-poll-download sequence in a goroutine and bound them with a semaphore so you stay friendly to the API.

Bounded worker pool

Use a buffered channel as a semaphore: sem := make(chan struct{}, 5). Each goroutine takes a slot before it starts and releases it when done, so at most five conversions run concurrently. A sync.WaitGroup lets the caller wait until every goroutine has finished.

Context and timeouts

context.WithTimeout plus http.NewRequestWithContext cancels a stuck request cleanly.
An error_code of processing_timeout or conversion_failed tells you why a job failed; truncated: true flags a partial result.

Prefer another language? See the Node.js tutorial, the Python tutorial, or the cURL recipe.

Wiring it into a service?

The conversion is also a hosted MCP endpoint, so a Go-based agent can call it as a tool. The full reference and the OpenAPI spec are on the developer hub.

Concurrency in practice

A bounded worker pool

Wrap the single-file flow from above in convertOne, then fan out over a list of PDFs with a semaphore that caps how many run at once.

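// convertAll converts every URL with at most five conversions in flight.
// It assumes convertOne(ctx, url) error wraps the single-file flow above,
// and that "log" and "sync" are added to the import block.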
func convertAll(ctx context.Context, urls []string) {
	sem := make(chan struct{}, 5) // at most 5 conversions in flight
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			sem <- struct{}{}        // take a slot (blocks if full)
			defer func() { <-sem }() // release it
			if err := convertOne(ctx, url); err != nil {
				log.Printf("skip %s: %v", url, err)
			}
		}(u)
	}
	wg.Wait()
}

The buffered channel is the entire concurrency limit. A goroutine cannot start real work until it has pushed a token into sem, and only five tokens exist. This caps how many conversions run at once rather than how many requests are sent per second. Raise the buffer for more throughput on a higher tier, or lower it to be gentle on the free tier. Pair it with a per-file Idempotency-Key so a rerun after a crash does not reconvert what already finished. With that in place, the same pool handles a list of ten PDFs or ten thousand.

FAQ

Common questions

Does the Go example need any dependencies?

No. It uses only the standard library (net/http and encoding/json), so go run convert.go works with no modules to add. go build produces a single binary, which is fully static when you build with CGO_ENABLED=0.

How do I map the JSON response in Go?

Define a struct with json tags for job_id, status, error_code, error_message and truncated, then decode with json.NewDecoder. Fields you do not list are simply ignored.

How do I convert many PDFs concurrently in Go?

Run the create, poll and download sequence inside goroutines and bound them with a buffered channel used as a semaphore, so you convert several files at once without flooding the API.

How do I add a timeout or cancellation?

Use context.WithTimeout with http.NewRequestWithContext so a stuck request is cancelled. The server also enforces its own time budget and returns error_code processing_timeout when a document is too large to convert within it.

How do I upload a local PDF file in Go?

Build a multipart/form-data body with mime/multipart and write the PDF into a file field. Set the request's Content-Type to the writer's FormDataContentType(), which carries the boundary, and POST it to /api/v2/jobs instead of the JSON url body.