What your PDF reveals before anyone reads it
Author name, software, edit timestamps: the fields hiding in a PDF's info dictionary, and why they matter before you share files with clients or AI tools.
You export a contract as a PDF and send it to a client. The pages look exactly as you intended. What the client also receives, without either of you noticing, is the account name of the person who drafted it, the software your company uses internally, and the exact date and time the file was last edited.
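That information lives in the document's metadata fields, stored in what the PDF format calls the document information dictionary (the info dictionary for short). It does not appear on any page, but it travels in the document structure every time the file is shared.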
The fields most people have never looked at
The PDF format stores a standard set of fields alongside the visible content. Most applications fill these in automatically when they export:
- Author: usually the user name configured in the authoring application, which is often the account name on the machine that saved the file.
- Creator: the application that produced the document (Word, Canva, InDesign, a note-taking app).
- Producer: the PDF export library or engine that wrote the bytes.
- Creation date: when the file was first generated.
- Modification date: the last time it was saved.
- Title, Subject, Keywords: sometimes filled with internal project names or auto-populated labels from the authoring tool.
Open the PDF in a browser and none of this appears on the page. But it is there, and any tool that reads the file structure can access it instantly.
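To make that structure concrete, here is a minimal Python sketch of how an exporter might write these fields into the file. The key names (Author, Creator, Producer, CreationDate, ModDate, Title) are the ones the PDF specification defines; everything else is an assumption for illustration: the object number 7 and all the values are made up, and the sketch handles ASCII text only (non-ASCII values are normally stored as UTF-16BE with a byte-order mark, which it skips). In a real file, the trailer points at this object with an entry such as /Info 7 0 R.

```python
def pdf_literal(text: str) -> str:
    """Wrap text as a PDF literal string; backslashes and parentheses must be escaped."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def info_object(obj_num: int, fields: dict) -> str:
    """Render an info dictionary as an indirect object, one /Key (value) entry per line."""
    entries = "\n".join(f"  /{key} {pdf_literal(value)}" for key, value in fields.items())
    return f"{obj_num} 0 obj\n<<\n{entries}\n>>\nendobj"


# Hypothetical values of the kind an export might fill in automatically.
example = info_object(7, {
    "Author": "jdoe",
    "Creator": "Microsoft Word",
    "Producer": "Example PDF Library 2.1",
    "CreationDate": "D:20240312091500+01'00'",
    "ModDate": "D:20240314163005+01'00'",
    "Title": "Project Falcon (draft)",
})
print(example)
```

Nothing in this object is drawn on a page, which is why a viewer shows the pages unchanged while the values ride along in the file.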
Why this surfaces more when you use AI tools
Uploading a PDF to a chatbot or AI analysis tool has become a routine step for a lot of people. Some tools send the full file to an API. Others extract the text layer. A few parse the complete document structure, which includes the info dictionary.
Even when the AI does not use the metadata directly, the file was uploaded with those fields intact. If the Author field contains a real name and you were submitting something anonymously, that name has already left your machine. The same applies if the Creator field reveals internal software you would rather not disclose to a third-party service.
There is also a subtler situation. Plenty of people use AI tools to prepare documents for external audiences: proposals, client reports, public summaries. They proofread the visible content carefully and never think about what the info dictionary is carrying along.
The part that has nothing to do with AI: the file history follows it
This problem predates AI tools entirely. When a PDF arrives at a client and the Creator field says it was made with an internal tool you did not intend to mention, or when the Author field contains the name of someone from your legal team on a document meant to look independent, that information is already out. You cannot un-send it.
The modification date can also expose things without your realizing it. If the page carries one date but the metadata shows the file was last edited an hour before it went out, anyone with basic tools can read that gap.
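PDF dates are stored as plain strings of the form D:YYYYMMDDHHmmSS followed by a time-zone offset such as +01'00', which is why the gap is so easy to read. The Python sketch below parses that format and measures the gap; it assumes a reasonably well-formed string (real files sometimes deviate), and the send time and the name parse_pdf_date are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] then an optional Z, + or - offset like +01'00'.
PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    m = PDF_DATE.match(value.strip())
    if not m:
        raise ValueError(f"not a PDF date string: {value!r}")
    year = int(m.group(1))
    month, day = int(m.group(2) or 1), int(m.group(3) or 1)  # missing fields default to the start
    hour, minute, second = (int(m.group(i) or 0) for i in (4, 5, 6))
    sign, off_h, off_m = m.group(7), int(m.group(8) or 0), int(m.group(9) or 0)
    if sign is None:
        tz = None  # no offset given: relation to UTC is unknown, so return a naive datetime
    else:
        delta = timedelta(hours=off_h, minutes=off_m)  # Z normally means a zero offset
        tz = timezone(-delta if sign == "-" else delta)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


modified = parse_pdf_date("D:20240314163005+01'00'")  # hypothetical ModDate value
sent = datetime(2024, 3, 14, 17, 30, tzinfo=timezone(timedelta(hours=1)))  # hypothetical send time
print(sent - modified)
```

Comparing against a naive result (a date with no offset) would raise a TypeError, which is a reasonable reminder that such a timestamp cannot be placed precisely.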
How to check what your file is storing
In Adobe Acrobat, go to File, then Properties, then the Description tab. On macOS you can select the file in Finder and press Command+I for a quick summary. The exiftool command-line utility gives a more complete read, including XMP metadata that some applications write on top of the standard fields.
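If you want a quick look without installing anything, a rough check with only the Python standard library is possible for simple files. This is a heuristic sketch, not a PDF parser, and exiftool remains the more reliable read. It assumes the info object is stored uncompressed with literal (parenthesized) string values; it finds nothing if the object sits inside a compressed object stream, skips hex-encoded strings, ignores XMP, and decodes non-Unicode text as Latin-1 as a stand-in for the PDF's own encoding. The function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Standard keys of the document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate")

# The trailer (or cross-reference stream) names the info object, e.g. "/Info 7 0 R".
INFO_REF = re.compile(rb"/Info\s+(\d+)\s+(\d+)\s+R")

# "/Key (value)" where value is a literal string without unescaped nested parentheses.
FIELD = re.compile(rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)

# Common escapes; a backslash before a newline is a line continuation and is dropped.
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f", b"\n": b""}


def unescape(raw: bytes) -> str:
    def repl(m):
        token = m.group(1)
        if token[0] in b"01234567":
            return bytes([int(token, 8) & 0xFF])  # octal escape such as \050
        return ESCAPES.get(token, token)  # \( \) \\ keep the escaped character
    out = re.sub(rb"\\([0-7]{1,3}|.)", repl, raw, flags=re.DOTALL)
    if out.startswith(b"\xfe\xff"):
        return out[2:].decode("utf-16-be", errors="replace")  # Unicode text with a BOM
    return out.decode("latin-1")  # rough stand-in for PDFDocEncoding


def read_info_fields(data: bytes) -> dict:
    refs = INFO_REF.findall(data)
    if not refs:
        return {}
    num, gen = refs[-1]  # incremental updates append newer trailers at the end
    header = rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\b(.*?)endobj"
    objects = list(re.finditer(header, data, re.DOTALL))
    if not objects:
        return {}  # e.g. the object is inside a compressed object stream
    body = objects[-1].group(1)  # the last definition of an object number wins
    return {m.group(1).decode("ascii"): unescape(m.group(2)) for m in FIELD.finditer(body)}


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in read_info_fields(Path(sys.argv[1]).read_bytes()).items():
        print(f"{key}: {value}")
```

Run it as a script with a file path as the only argument. Looking up the object named by /Info, rather than grepping the whole file, keeps bookmark titles (which also use /Title) out of the results.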
Removing the fields before you share
PDFShore's metadata remover strips all standard fields and saves a clean copy. The pages, fonts, images, and layout stay untouched. Only the info dictionary is wiped.
It runs in your browser, so the original file does not leave your machine in the process. For something designed to stop information from leaking out, that seems like the right architecture.
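PDFShore's own implementation is not described here. Purely to show what wiping can involve at the byte level, the Python sketch below uses one crude technique: it empties every literal string in each object the file names with /Info, padding with spaces so that every later byte offset, and therefore the existing cross-reference table, stays valid. It assumes uncompressed info objects and no unescaped nested parentheses, leaves empty values rather than deleting the keys, and ignores hex strings and XMP packets. The name blank_info_strings is hypothetical.

```python
import re

INFO_REF = re.compile(rb"/Info\s+(\d+)\s+(\d+)\s+R")
# A literal string value without unescaped nested parentheses.
LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)


def _blank(match):
    # "(jdoe)" becomes "()    ": an empty string plus whitespace, same byte length.
    return b"()" + b" " * (len(match.group(0)) - 2)


def blank_info_strings(data: bytes) -> bytes:
    out = bytearray(data)
    # Every /Info reference counts, including older trailers left by incremental updates.
    for num, gen in set(INFO_REF.findall(data)):
        header = rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\b(.*?)endobj"
        for obj in re.finditer(header, data, re.DOTALL):
            start, end = obj.span(1)
            # Length is preserved, so spans taken from the original bytes stay aligned.
            out[start:end] = LITERAL.sub(_blank, data[start:end])
    return bytes(out)
```

Because only bytes inside the info objects change, the pages render as before; running exiftool on the result is a reasonable way to confirm what remains, including any XMP packet this sketch does not touch.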