How to use PDFs with a local AI model without uploading anything
Running a local LLM but still sending PDFs to a cloud API for extraction? Here is a fully local pipeline that keeps your documents on your machine.
Running a language model on your own machine has become genuinely approachable. Tools like Ollama, LM Studio, and llama.cpp let you run capable models on your own hardware without sending anything to an API. But there is a gap that does not get talked about enough: even when the model is local, your documents often still leave your machine during the preparation step.
This post covers a simple workflow that keeps the whole pipeline on your device, from the original PDF to the model's answer.
The gap: local model, cloud document prep
Most people who run local language models still reach for a web-based tool or a cloud API when they need to extract text from a PDF before feeding it to the model. That extraction step is where the file quietly moves off your device.
For a public research paper it probably does not matter. For a contract, an internal report, or any document you wouldn't hand to a stranger, that step is the leak in an otherwise private setup.
What a local model actually needs from a PDF
Raw PDF text is messy. Extraction often pulls in repeated headers and footers, page numbers, line breaks in the middle of sentences, and characters that don't map cleanly. Feed that directly into a model and the quality of the answers drops noticeably.
What actually helps the model is clean, structured text:
- Repeated headers and footers removed.
- Lines broken mid-sentence joined back together.
- Section structure preserved so the model knows where topics change.
- One coherent block per topic instead of one fragment per page.
Markdown works well as the intermediate format here. It keeps enough structure to stay useful, without adding noise that fights the model.
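To make the cleanup above concrete, here is a minimal Python sketch of two of those steps: dropping lines that repeat across pages and rejoining sentences split by line breaks. It assumes you already have the extracted text as one string per page. The function names and the `min_share` threshold are illustrative choices, not part of any particular tool.

```python
import re
from collections import Counter


def _key(line):
    # Normalize digits so "Page 3 of 40" and "Page 4 of 40" count as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_lines(pages, min_share=0.6):
    """Drop lines that recur on most pages (likely headers, footers, page numbers)."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({_key(l) for l in page.splitlines() if l.strip()})

    # A line is treated as a wrapper if it appears on at least this many pages.
    threshold = max(2, int(len(pages) * min_share))
    cleaned = []
    for page in pages:
        kept = [
            l for l in page.splitlines()
            if not l.strip() or counts[_key(l)] < threshold
        ]
        cleaned.append("\n".join(kept))
    return cleaned


def join_broken_lines(text):
    """Rejoin lines split mid-sentence; blank lines stay as paragraph breaks."""
    joined = []
    for para in re.split(r"\n\s*\n", text):
        lines = [l.strip() for l in para.splitlines() if l.strip()]
        if not lines:
            continue
        out = lines[0]
        for line in lines[1:]:
            if out.endswith("-") and line[:1].islower():
                out = out[:-1] + line  # re-join a word hyphenated across lines
            else:
                out += " " + line
        joined.append(out)
    return "\n\n".join(joined)
```

This is a heuristic, not a full converter. It can wrongly drop body lines that differ only by a number (for example numbered headings that appear on most pages), and it treats every blank-line-separated block as a single paragraph, so lists and headings need separate handling.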
A fully local pipeline
Here is a workflow where nothing leaves your machine:
- Open a browser-based PDF tool that processes locally. The extraction runs inside your browser tab. Nothing is uploaded.
- Convert the PDF to Markdown. This runs in the browser, not on a server.
- Save the .md file, or copy the Markdown text directly.
- Paste it into your local model's context window, or load it as a file if your LLM tool supports file input.
- Run inference locally as you normally would.
The PDF never reaches a server. The extraction, the cleanup, and the inference all happen on your hardware.
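If you would rather script the last two steps than paste by hand, something like the following could work. It is a sketch that assumes Ollama is running on its default local port and that the model tag shown has already been pulled. The model name, the prompt wording, and the function names are placeholders of mine, not something the workflow prescribes.

```python
import json
import urllib.request
from pathlib import Path

# Ollama's default local endpoint; the request goes to localhost only.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(markdown, question, model="llama3.1:8b"):
    """Wrap the extracted Markdown and a question into one request body."""
    prompt = (
        "Answer the question using only the document below.\n\n"
        f"<document>\n{markdown}\n</document>\n\n"
        f"Question: {question}"
    )
    # stream=False asks for a single JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_model(md_path, question, model="llama3.1:8b"):
    """Read a saved .md file and send it to the local model."""
    markdown = Path(md_path).read_text(encoding="utf-8")
    payload = build_payload(markdown, question, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A call would look like `ask_local_model("contract.md", "What is the termination notice period?")`, where the file name is hypothetical. LM Studio and llama.cpp expose their own local servers with different request formats, so the payload would need adjusting for those.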
Why the extraction step matters more than most people think
The biggest variable in how well a local model handles a document is not the model size. It is the quality of what you feed in. A model that produces mediocre output on a raw PDF dump can produce noticeably better answers on the same content after a clean extraction.
This compounds on longer documents. A 40-page report where every page starts with a repeated header and ends with a page number carries 40 copies of that wrapper. The model spends attention on it instead of the content you actually care about.
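One rough way to see how much wrapper text a document carries is to compare word counts before and after cleanup. The helper below is a hypothetical sketch. It uses whitespace-separated words as a stand-in for tokens, which is only an approximation of what a model's tokenizer would produce.

```python
def noise_share(raw_pages, cleaned_pages):
    """Fraction of words in the raw extraction that the cleanup removed."""
    raw = sum(len(page.split()) for page in raw_pages)
    clean = sum(len(page.split()) for page in cleaned_pages)
    return (raw - clean) / raw if raw else 0.0
```

Word count understates the problem somewhat, because headers and footers also land in the middle of sentences that run across a page break.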
Scanned documents work too
Scanned PDFs are images, not text. Getting readable text from them requires OCR, optical character recognition. Historically that meant uploading the file somewhere.
That is no longer necessary. OCR can now run in the browser using WebAssembly, which means even a scanned document does not have to leave your device. Set OCR mode to auto and the extraction handles both text-layer and image-layer pages in one pass, locally.
Which model size works for documents
Local models vary in how well they handle longer documents. For question answering or summarization, a larger context window matters more than raw model size. Models in the 7B to 14B range, running on a machine with at least 16 GB of RAM, handle most single-document tasks without trouble.
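Before loading a document, it can help to check whether it is likely to fit in the model's context window at all. The sketch below uses the common rule of thumb of roughly four characters per token for English text. That ratio, the reserve for the answer, and the function names are assumptions, and real token counts depend on the model's tokenizer.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; actual counts vary by tokenizer and language."""
    return int(len(text) / chars_per_token) + 1


def fits_context(text, context_window, reserve_for_answer=1024):
    """True if the document plus room for the answer fits in the window."""
    return estimate_tokens(text) + reserve_for_answer <= context_window
```

If `fits_context(markdown, 8192)` comes back false, the options are a model or setting with a larger window, or splitting the document by section and asking about one section at a time.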
A clean extraction usually matters more than a bigger model. Test it: take a document that gives your model trouble, run a clean extraction, and feed it again. The improvement is often larger than what you get from moving up a model size.
PDFShore handles the extraction step locally, whether the source is a text PDF or a scanned document. The output is clean Markdown ready to drop into whatever local AI setup you use.