

PDF is not portable in the digital world

(rz.scale-it.pl)

27 points by robert-zaremba · 8y ago · 47 comments (32 shown, 13 top-level)

massar · 8y ago · 4 in thread

Rendering a Microsoft Word/PowerPoint document as a PDF is a good thing: one no longer needs a DOC/DOCX/PPT/PPTX viewer, since most devices come with a PDF viewer built in (e.g. Chrome :), and the formatting stays intact (with some minor color changes, depending on how one exports it). As a bonus, it kills the animations if there are any.

I tend to keep a whole bunch of things I want to 'read later' in my iBooks collection: I just save them as PDF and transfer them to my phone, or, if they are already PDFs, download them directly; they zoom great too. I've got all kinds of device manuals, but also PADI and other diving reference books; it's always good to quickly check up on them when in doubt and then reinforce that information with the knowledge of your dive buddy.

Indeed, for content that does not really need a layout beyond some headers (<h1>) and paragraphs (<p>), HTML is perfectly fine.

Quite a few text portions of conference papers (read: TeX :) can be rendered as Markdown and then easily converted to HTML, but the result won't feel quite as good, so PDF is an easier format that also reflects the original intent and layout.

IETF RFCs can typically be rendered in a myriad of ways thanks to xml2rfc; then again, one mostly ends up reading them on tools.ietf.org or, to keep a local copy, rendering them as PDF and loading them into iBooks.

discreditable · 8y ago

On the topic of MS Word, I really wish their export-to-HTML function were simpler. Most of the time I just want to export the document structure (headings, tables, bold, italic, etc.) and not include all of the styles and extra markup.

ferdterguson · 8y ago

Pandoc?

robert-zaremba (OP) · 8y ago

Reading PDF on mobile is possible, but far less comfortable than EPUB / MOBI...

noir_lord · 8y ago

I would have agreed until I got an iPad mini 2 (for Safari testing); the screen resolution is high enough that it's close to a print experience.

guidoism · 8y ago · 3 in thread

A few weeks ago I would have made the same statement, but I started reading the PDF implementation docs and now I really like the format.

The main issue I think we all have with the format is that people make docs that are almost impossible to read on a small screen.

There are ways around this: 1. Tagged PDFs expose the underlying content and semantics so the text can be reflowed for accessibility purposes, though right now very few people seem to use this feature; and 2. maybe it wouldn't be a bad thing to make PDF pages closer to a paperback book than to an A4 page, with the resulting shorter line length and reduced margins.

PDF is indeed more complex than plain HTML with some cribbed CSS, but in many ways it's a lot better: 1. It truly is portable in the sense that every computer will render it in exactly the same way, 2. It packages up all assets in an efficient manner (only the glyphs that are needed are included, not the entire font with all glyphs and positioning hints like web fonts), 3. The expensive layout computation is done once, on a computer in a galaxy far, far away from my battery-limited phone, and 4. PDFs are (by convention) free from all of the cacophony of crap like share buttons and navigation chrome and ads and articles-you-may-enjoy fluff.

The format itself is actually not that bad; it's a text format in the sense that it's relatively easy to open up a text editor and bang one out. The only inconveniences are the places where you need to state exactly how long streams are (which your text editor can help with) and the creation of the cross-reference index at the end (which I've been cheating on by just running my hand-created PDFs through a PDF lint-like utility).

The reason most PDFs look crazy when opened in a text editor is that the streams are almost always compressed. You can uncompress them with "qpdf --stream-data=uncompress in.pdf out.pdf".
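To make the two fiddly parts of hand-writing a PDF concrete, here is a minimal Python sketch that assembles a one-page PDF and lets the code compute each stream's /Length and the cross-reference (xref) byte offsets instead of counting them by hand. The helper name build_minimal_pdf, the Helvetica font and the page geometry are illustrative assumptions; it also assumes plain ASCII text with no parentheses or backslashes, since it does no string escaping, and it leaves the stream uncompressed.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Hypothetical helper: hand-assemble a one-page, uncompressed PDF.

    Assumes `text` is ASCII without parentheses or backslashes
    (no PDF string escaping is done here).
    """
    # Page content: draw `text` at (72, 720) in 24pt Helvetica.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        # /Length must equal the exact byte count between the end-of-line
        # after "stream" and the end-of-line before "endstream".
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start  # "%%%%" renders as "%%"
    return bytes(out)
```

Writing the returned bytes to a file and running "qpdf --check" on it is one way to confirm that the offsets line up.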

I think the format itself could be OK, but not in its current form.

The PDF header doesn't have to be at the very start of the file; readers commonly tolerate leading junk before it. This makes parsing for bad content harder. (Also, you can have a valid PDF that is also a valid ZIP, with the contents being the original file. How do you virus-scan that?)

Too many old image formats are allowed, with readers supporting only a subset. TIFF alone has a bunch of options that are often broken. What happens when you put a multipage TIFF in a PDF? (I think you just see the first page in Reader, but some other readers might let you browse them.)

Lots of features in later versions are not well supported (forms, document libraries, scripting?).

It has been a while since I left the document area, but while I liked simple PDFs, once you say you support them, you have to support all of them, which is almost impossible to do correctly. The later specs just have too many features that are almost unused but add a lot of complexity that really isn't needed in a portable document format. A stripped-down, cleaned-up version of the spec would be nice.
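The header-offset and PDF/ZIP polyglot concerns in this comment can be illustrated with a small Python sketch. The function names are hypothetical, and the 1024-byte search window is an assumption that mirrors common reader leniency rather than a rule taken from the spec. It relies on zipfile.is_zipfile looking for a ZIP end-of-central-directory record near the end of the data, which is why bytes prepended by a PDF do not hide the archive from it.

```python
import io
import zipfile


def pdf_header_offset(data: bytes, window: int = 1024) -> int:
    """Offset of the first b"%PDF-" marker inside the first `window` bytes, or -1.

    The 1024-byte window is an assumed leniency, not a spec requirement.
    """
    return data.find(b"%PDF-", 0, window)


def describe(data: bytes) -> dict:
    """Hypothetical triage helper: flag offset headers and PDF/ZIP polyglots."""
    offset = pdf_header_offset(data)
    return {
        "pdf_header_offset": offset,
        # A scanner that only checks byte 0 would miss a header at offset > 0.
        "header_not_at_start": offset > 0,
        # is_zipfile searches for the end-of-central-directory record near the
        # end of the data, so a leading PDF does not stop it from matching.
        "also_zip": zipfile.is_zipfile(io.BytesIO(data)),
    }
```

Anything flagged here would still need a real parser to decide whether the file is malicious; the sketch only shows why a naive "starts with %PDF-" check is not enough.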

The share buttons and navigation chrome certainly can be put in a PDF, they're just incredibly uncommon. I've even played video games that were distributed as PDF files.

mercer · 8y ago

PDF games? Do you have an example you could send me, perhaps? I'm really curious.

emeraldd · 8y ago · 3 in thread

My first thought on this is that EPUB is not fully portable either; it's just non-portable in a different set of circumstances. If you want to publish on the internet for general consumption, just use plain HTML. That's about as portable as you can get without moving to raw text.

ldjb · 8y ago

Plain HTML isn't so good if you want to include images in your document or have multiple pages. You end up making the user download a whole bunch of files if they want a copy of the document.

The great thing about EPUB is that all the files are bundled together in a single .epub file. You can copy the file and move it around without worrying about keeping the structure intact.

I think that perhaps the main issue currently present with EPUB is that EPUB readers aren't really part of the standard installation on devices. Pretty much every PC or smartphone you come across in the wild will have software installed to view PDFs, but the same isn't the case for EPUB. I think Apple have done a good thing by including an EPUB reader (iBooks) as part of macOS and iOS, but that's not the case for other operating systems.

It might be nice if web browsers could natively act as EPUB viewers, in the same way a number of them natively act as PDF viewers. That way, the user already has an EPUB viewer installed, and they don't have to go and find and install one.

mercer · 8y ago

Now that I think about it, I'm kind of surprised that browsers don't natively show EPUB files.

Spivak · 8y ago

> Plain HTML isn't so good if you want to include images in your document or have multiple pages.

Am I missing something? This is practically the only thing that vanilla HTML is good at?

It seems like you could get all the benefits of EPUB with an archive of HTML files.

rubidium · 8y ago · 2 in thread

PDF is beautiful. Long live PDF.

I read a lot. I don't always want to read from a screen.

Making PDFs available on the internet just saves me from having to search through a journal stack.

PDFs have the advantage that the formatting looks good, and the author/publisher gets to choose how it looks, usually with input from a professional. This is much better than the "styling" many websites and EPUBs provide.

robert-zaremba (OP) · 8y ago

That's the thing: 1) if you use simple HTML / EPUB, the majority of publications are good for printing; 2) think about the relief of storing these documents on an ebook reader.

> the majority of publications are good for printing

This is a problem. PDF supports "all". EPUB would have to have equivalent support, as well as native support in most/all vanilla OS distributions, to be a real competitor for PDF.

> think about the relief of storing these documents on an ebook reader

For many people this is their phone or tablet. Both of which support PDFs as well (as do many standalone ebook readers). You also don't have to get an extra app to view the PDFs.

psion · 8y ago · 2 in thread

I find PDF to be way more portable than most other document formats in terms of saving a document or printing a document. Saving an HTML page has its own set of problems, and if I share it I have to make sure to gather all the images as well. Word processor documents depend on system fonts, etc., and I cannot be sure that what my document depends on is installed on the other computer. With a PDF, I can be sure to get the necessary elements, be they fonts, images, etc.

robert-zaremba (OP) · 8y ago

That's what EPUB / MOBI are for.

logfromblammo · 8y ago

EPUB is essentially an entire self-contained HTML web site in a ZIP file container.

Whatever you can do on a website, you can theoretically do in an EPUB. It might not display as expected when rendered by an e-reader or printed, however, which is why most EPUB files stay rather safe and unambitious in their CSS and JS.

I'm a bit disappointed that web browsers don't generally function as EPUB readers or include a "Save As... EPUB" option, but I can't seem to muster the motivation to write a Firefox add-on to do that. It wouldn't even be that difficult, as I have created EPUBs from filesystem directories using nothing more than 7Zip and a shell script.
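As a rough illustration of the "7Zip and a shell script" approach, here is a Python sketch using the standard zipfile module. The one packaging detail that naive zipping tends to get wrong is that the EPUB container format expects the mimetype file to be the first archive entry, stored uncompressed; META-INF/container.xml then points readers at the package document. The function name pack_epub and the OEBPS/content.opf path are assumptions for illustration, and the package document and XHTML chapters are expected to be supplied already written.

```python
import io
import zipfile

# Assumed layout: the package document lives at OEBPS/content.opf.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def pack_epub(files: dict[str, bytes]) -> bytes:
    """Hypothetical helper: zip prepared EPUB content into .epub bytes.

    `files` maps archive paths (e.g. "OEBPS/ch1.xhtml") to their contents.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" goes first and uncompressed, so its bytes sit at a fixed
        # offset where readers can sniff the file type.
        zf.writestr(zipfile.ZipInfo("mimetype"), b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for name, data in files.items():
            zf.writestr(name, data)  # deflated, per the ZipFile default above
    return buf.getvalue()
```

Saving the result is then just a matter of writing the returned bytes to a file ending in .epub.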

There is a possibility that law firms would pay for a premium version that included a crawler and some form of cryptographic validation that could prove that the EPUB file was created at a certain time, from a certain IP address, and hasn't been altered since then. The idea is that a trademark owner's lawyer takes a snapshot of a website selling knockoffs when sending the C&D letter, then another when filing the lawsuit, and the evidence for the complaint is preserved without having to rely on static snapshot images of the rendered website or on third-party archive sites.

icebraining · 8y ago · 2 in thread

Most devices include a PDF reader, but not an EPUB reader. Yes, you can download one, but as a publisher, you can't expect your readers to jump through that hoop.

robert-zaremba (OP) · 8y ago

EPUB / MOBI readers are not a problem these days.

That really doesn't speak to his statement. The majority of devices read PDF by default, whereas for EPUB/MOBI the user needs to go and get an app.

robert-zaremba (OP) · 8y ago · 2 in thread

Do you publish on the Internet? Do you read a lot of publications on your digital devices?

How about we stop using PDF for internet publications and use EPUB instead (or some other screen-independent format)? Please share your comments.

throwaway2016a · 8y ago

Trying to be helpful...

I think you may be getting downvoted because it is generally considered bad form to submit your own blog post unless it is a "Show HN", but if the content is good it can outweigh that and an article can be upvoted anyway. But if you do submit your own content, it is probably best to let the content speak for itself rather than trying to solicit HN as a discussion forum.

robert-zaremba (OP) · 8y ago

Thanks for the comment. What's the reason for not posting one's own article to content discovery services / aggregators? It's a common thing.

thinkMOAR · 8y ago · 1 in thread

Instead of calling for a 'ban' on a format for very subjective reasons, how about calling for publication in multiple formats, so people have a choice? It is certainly not much more work, and it looks, in my humble opinion, professional.

robert-zaremba (OP) · 8y ago

Good point! Sorry if my call sounds off-putting. Your idea of creating publications in multiple formats works; I will update my post accordingly. Still, in my post I want to highlight that simple solutions usually work fine. Most of these publications are not complex in terms of typesetting. If complex typesetting is genuinely needed, then fair enough.

omgtehlion · 8y ago

Finally, in 2017, when PDF has become ubiquitous, requires no additional drivers, has a somewhat usable spec, and has a lot of third-party and open-source software to work with it, do we really need to get rid of it?

Bashing PDF is so 2000...

The point of the story is as follows:

> PDF is not portable on digital screens. It doesn’t scale. It’s not comfortable to read PDF files on a mobile or ebook readers

Arguing that PDFs are good for printing out, or better than some other format, doesn't actually address the issue ;-)

accordionclown · 8y ago

1a. .pdf is a fine format.

1b. unless -- as is increasingly the case these days -- you're reading it on a screen smaller than the one for which the .pdf was designed, in which case .pdf is an awful format.

2a. .epub is a fine format.

2b. unless you're reading it in a viewer-app which is wonky, which most of them are. (the inconsistencies of rendering with this so-called "standard" are unbelievably bad, and seem to be getting worse rather than better as time goes on.)

3a. when you try to re-use text by copying it out of a .pdf, you often get some really bad stuff that loses a lot of important styling.

3b. when you try to re-use text by copying it out of an .epub, it's not much better.

4a. the standard line is that an .epub is just "a website packaged into a .zip file", implying that anything you can do on a website can be done in an .epub.

4b. the standard line is a lie. an .epub requires .xhtml rather than .html, and a complex mess of associated files, and most .epub viewer-apps have trouble supporting the full gamut of .css, and also do not allow you to use javascript at all.

conclusion: the state of sharing documents on the web in a way that allows offline use while enabling the convenient re-use of text is a sad state indeed.

dragonwriter · 8y ago

PDF is perfectly portable in the digital world (the only world in which it has ever existed).

It's not perfectly optimized for every display (or print page, the two being equivalent) size, resolution, etc., but then neither is any other format that can handle the same range of content, nor will any format ever be until we have AI layout that does as well as professional layout from a single source file for all media sizes and properties.

I find professionally laid-out PDFs that are designed for letter/A4-size pages to be superior in practical use to any reflowable format I've yet seen, at pretty much every size, for most content more complex than plain linear text like you'd find in a novel. (Smartphones and smaller devices aren't great for it, but then they aren't great for reading content more complex than linear text regardless of format.)

unsignedint · 8y ago

I'm having a hard time understanding some of the points this article makes, particularly the claim that you need to think more about typesetting and design. Most word processors these days have some type of style system akin to HTML.

PDF is also one of the few readily available formats that has a well-defined archival spec (PDF/A), which makes it even more compatible across readers, since it essentially requires documents to follow certain rules (for example, embedding all fonts).
