Show HN: HTML visualization of a PDF file’s internal structure by desgeeko

By HackTech, February 10, 2025

Inspecting the internal structure of a PDF file involves several steps (decompression, parsing, xref indexing, etc.) to make sense of the raw bytes.
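To make those steps concrete, here is a minimal sketch in plain Python that is not taken from PDFSyntax. It assumes a classic (uncompressed) xref table with a single subsection and a direct `/Length` value, and every function name in it is hypothetical. It builds a tiny PDF-like sample in memory, finds the xref table through `startxref`, indexes the object offsets, and decompresses one Flate stream.

```python
import re
import zlib


def build_sample_pdf() -> bytes:
    """Assemble a tiny PDF-like sample: one object, a classic xref table, a trailer.

    Illustration only: it has no /Root catalog, so it is not a viewable document.
    """
    payload = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
    header = b"%PDF-1.4\n"
    obj = (
        b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
        + payload
        + b"\nendstream\nendobj\n"
    )
    xref_pos = len(header) + len(obj)
    # Each xref entry is exactly 20 bytes: offset, generation, flag, end of line.
    xref = (
        b"xref\n0 2\n"
        b"0000000000 65535 f \n"
        + b"%010d 00000 n \n" % len(header)
    )
    trailer = b"trailer\n<< /Size 2 >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    return header + obj + xref + trailer


def read_xref(data: bytes) -> dict[int, int]:
    """Return {object number: byte offset} for the in-use entries of a classic xref table."""
    # The byte offset of the xref section is written near the end of the file.
    start = int(re.search(rb"startxref\s+(\d+)", data[-1024:]).group(1))
    lines = data[start:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("classic xref table expected (xref streams are not handled)")
    first, count = map(int, lines[1].split())
    offsets = {}
    for i, line in enumerate(lines[2:2 + count]):
        offset, _generation, kind = line.split()
        if kind == b"n":  # 'n' marks an in-use object, 'f' a free one
            offsets[first + i] = int(offset)
    return offsets


def read_stream(data: bytes, offset: int) -> bytes:
    """Decompress the Flate-encoded stream of the object that starts at offset."""
    length = int(re.search(rb"/Length\s+(\d+)", data[offset:]).group(1))
    body_start = data.index(b"stream\n", offset) + len(b"stream\n")
    return zlib.decompress(data[body_start:body_start + length])


if __name__ == "__main__":
    pdf = build_sample_pdf()
    for number, offset in read_xref(pdf).items():
        print(number, offset, read_stream(pdf, offset))
```

Real files add complications that this sketch skips, such as xref streams, object streams, multiple subsections, and incremental updates chained through `/Prev`.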

PDFSyntax takes care of the processing and proposes a visualization approach: it adds information and hyperlinks on top of text that is mostly a pretty-print of the uncompressed PDF data. It respects the physical flow of the file while offering logical navigation between revisions (incremental updates) and between objects.

PDFSyntax is a self-contained Python package with no dependencies, and is primarily a low-level PDF library.
The browse command is its highest-level and most visible part. It produces static HTML that stays sufficiently interactive even when JavaScript is disabled.
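As a rough illustration of how static HTML can offer navigation without JavaScript, the sketch below turns each `N G obj` header into an anchor target and each `N G R` indirect reference into a link to it, with a CSS `:target` rule highlighting the destination. This is an assumed approach for illustration, not PDFSyntax's actual implementation; `to_html` and the `obj-N-G` id scheme are hypothetical names.

```python
import html
import re

# "N G obj" at the start of a line opens an indirect object.
OBJ_HEADER = re.compile(r"^(\d+) (\d+) obj\b", re.MULTILINE)
# "N G R" is an indirect reference to object N, generation G.
INDIRECT_REF = re.compile(r"\b(\d+) (\d+) R\b")

# :target styles whichever element matches the URL fragment, so no script is needed.
CSS = "<style>:target { background: #ffe08a; } pre { font-family: monospace; }</style>"


def to_html(pdf_text: str) -> str:
    """Render pretty-printed PDF text as a static page with clickable references."""
    escaped = html.escape(pdf_text)
    # Make each object header a link destination.
    escaped = OBJ_HEADER.sub(r'<span id="obj-\1-\2">\1 \2 obj</span>', escaped)
    # Make each reference a link to its destination.
    escaped = INDIRECT_REF.sub(r'<a href="#obj-\1-\2">\1 \2 R</a>', escaped)
    return (
        "<!DOCTYPE html><html><head>" + CSS + "</head>"
        "<body><pre>" + escaped + "</pre></body></html>"
    )


if __name__ == "__main__":
    sample = (
        "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        "2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    )
    print(to_html(sample))
```

Clicking `2 0 R` in the rendered page jumps to object 2 and highlights its header, using only the browser's built-in fragment navigation.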

Please try the LIVE DEMO of a full static HTML

11 Comments

Muromec

Posted February 10, 2025 at 2:07 pm

That's pretty cool! I would have used it a lot at my previous job if it existed back then. In my ideal world it should work somewhat like https://lapo.it/asn1js/ — you drop a file and it does all the stuff locally.

SSLy

Posted February 10, 2025 at 2:13 pm

Damn, this is also convenient for forensics and finding watermarks.

xeon06

Posted February 10, 2025 at 2:18 pm

Wow, I've been doing some PDF parsing at work and this is going to come in SO handy.

est

Posted February 10, 2025 at 2:19 pm

I remember there was a similar project on GitHub that lets you visualize any type of binary data from a given schema. There was a TCP/IP example, IIRC.

nonrandomstring

Posted February 10, 2025 at 2:21 pm

Well done. This is a very useful security previewing tool. PDFs are a menace.

swsieber

Posted February 10, 2025 at 2:23 pm

I've used the iText RUPS (free) for a while for debugging PDFs (as I have the "privilege" to work on code that extracts data from PDFs…). It looks like your introspection stuff might be a bit stronger, which would be great. I'll take it for a whirl.

tyilo

Posted February 10, 2025 at 2:41 pm

Looks nice.

Would be better if all of the PDF's bytes were shown. Seems like `endobj` and `xref` are not shown.

escapecharacter

Posted February 10, 2025 at 2:57 pm

I’ve been shopping for something that does a per-byte description of the content of visual media formats (jpeg, png, avi, mp4, etc). Anyone know of one?

tekkk

Posted February 10, 2025 at 3:08 pm

This would be really nice as a browser library. You could just drag and drop a file and see its insides. But impressive nonetheless.

kevmo314

Posted February 10, 2025 at 3:14 pm

Is the UI tooling that does the visualization a library? I really like the UI format, would love to use this for breaking down and debugging video byte streams too.

EDIT: Oh it's actually reasonably simple, great use of CSS! https://github.com/desgeeko/pdfsyntax/blob/main/docs/simple_…

LegionMammal978

Posted February 10, 2025 at 3:33 pm

If you're interested in manipulating PDFs, I've found QPDF [0] to be a useful tool. Its "QDF mode" lays out the objects in a form where you can directly edit them, and it can automatically fix up the xref table afterwards. It can also convert to and from a JSON format that you can manipulate with your own scripts.

[0] https://github.com/qpdf/qpdf, https://qpdf.readthedocs.io/en/stable/
