How to remove tracking metadata from my PDF

SpookyBanana@reddthat.com · 2 years ago

How to remove tracking metadata from my PDF

Lunch@lemmy.world · 2 years ago

Could look into using exiftool, qpdf or pdftk, if you are comfortable with the terminal ✨

0x4E4F@vlemmy.net · 2 years ago

qpdf is very powerful. If OP is comfortable with the terminal, I’d recommend qpdf.
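For anyone who wants to script it, here is a minimal Python sketch of one common way to combine the two tools. It assumes exiftool and qpdf are installed and on your PATH. The function names `build_commands` and `strip_metadata` are made up for illustration. exiftool itself warns that its PDF edits are reversible, because the old metadata objects stay in the file, so the sketch rewrites the file with qpdf afterwards.

```python
import shutil
import subprocess


def build_commands(src, dst):
    """Return the two command lines as argv lists, without running them."""
    return [
        # exiftool: blank every writable tag. This edits src in place and
        # keeps a backup of the original next to it as "<src>_original".
        ["exiftool", "-all=", src],
        # qpdf: rewrite the file so the old, now unreferenced metadata
        # objects are not carried over into dst.
        ["qpdf", "--linearize", src, dst],
    ]


def strip_metadata(src, dst):
    """Run both tools in order; raises if a tool is missing or fails."""
    for tool in ("exiftool", "qpdf"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for argv in build_commands(src, dst):
        subprocess.run(argv, check=True)
```

Calling `strip_metadata("book.pdf", "book-clean.pdf")` would run both steps. This only touches document metadata; anything drawn on the pages themselves is left alone.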

ruination@discuss.tchncs.de · 2 years ago

There’s Dangerzone by the Freedom of the Press Foundation.

Shizu@lemmy.world · 2 years ago

I would try reprinting the PDFs and comparing the hashes afterwards. That should remove any metadata in the headers as new headers are created.
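The hash comparison can be sketched in a few lines of Python using only the standard library. The helper names `sha256_of` and `same_bytes` are illustrative, not from any tool mentioned in this thread.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True only if both files are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A matching hash means the two files are identical. A differing hash only says that something differs, not what. A print-to-PDF step usually writes a fresh creation date, so two reprints of the same document will often hash differently even when nothing identifying is left.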

bionicjoey@lemmy.ca · 2 years ago

That wouldn’t work for something like Pathfinder PDFs from the Paizo website. They add a text watermark with the name and email associated with your account on their site to each page of the document. It’s not metadata; it’s part of the actual page content.

Shizu@lemmy.world · 2 years ago

Why would the checksum differ between downloads if there was a watermark with user-identifiable data?

bionicjoey@lemmy.ca · edit-2 2 years ago

Just checked one of my Paizo PDFs: in addition to my account name and email address, the watermark also contains the date and time that I downloaded the PDF, presumably because they append the file creation time when the PDF is signed.

Shizu@lemmy.world · 2 years ago

Fair, then reprinting won’t help. I’d go ahead and write a Python script that exports every page as a PNG, edits that specific portion of each image, and recompiles the images into a PDF. I’m not sure if there is a tool that can already do that out of the box.

bionicjoey@lemmy.ca · 2 years ago

Unfortunately, then you lose things like text and links. I think the only real solution for my specific example (which, to be clear, might not be OP’s dilemma) is to crack and directly edit the binary data of the PDF file.
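To make "the binary data of the PDF file" a bit more concrete, here is a rough, standard-library-only Python sketch that checks whether a known string (say, your own email address) shows up in a PDF, either in the raw bytes or inside zlib-compressed streams. The name `find_needle` and the regex are assumptions made for illustration. This is not a real PDF parser, and it does not handle encrypted files or other stream filters.

```python
import re
import zlib

# Naive match for "stream ... endstream" bodies; a real parser would use
# the /Length entry of each stream dictionary instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_needle(pdf_bytes, needle):
    """Return labels for every place where needle (bytes) was found."""
    hits = []
    if needle in pdf_bytes:
        hits.append("raw")
    for index, match in enumerate(STREAM_RE.finditer(pdf_bytes)):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not a Flate stream, or damaged; skip it
        if needle in data:
            hits.append(f"stream {index}")
    return hits
```

You would call it as `find_needle(open("book.pdf", "rb").read(), b"me@example.com")`. An empty result does not prove the string is absent: PDF text is often hex-encoded or drawn with subset fonts, so it may not appear as plain bytes at all.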

SpookyBanana@reddthat.com · 2 years ago

What do you mean by crack and directly edit?

daranto@feddit.de · 2 years ago

Maybe print the book via print to pdf and check again.

gh0stkey@lemmy.world · 2 years ago

Wow… The amount of information already being shared here is outstanding! Keep on rowing/patching mates

thumbman@lemm.ee · 2 years ago

Okay, hear me out… physically print the documents, then, using a high-resolution scanner, make a digital copy and finally use a raster-to-vector converter.

I know this is probably dumb, but I just wanted to throw this out there.

0x4E4F@vlemmy.net · edit-2 2 years ago

Why not just print it to PDF? It doesn’t lose any data, plus it doesn’t take ages to scan the books.

bbbhltz@beehaw.org · 2 years ago

Exiftool can remove metadata. There might even be websites that can handle this.

tubbadu@lemmy.kde.social · 1 year ago

Perhaps printing to PDF would work.