I tend to keep a whole bunch of things I want to 'read later' in my iBooks collection, just save as PDF and transfer to phone, or if already a PDF just download it directly; zooms great too. I got all kinds of device manuals, but also PADI and other diving reference books; always good to quickly check up on it when in doubt and then to reinforce that information with the knowledge of your dive buddy.
Indeed, for content that does not really need a layout outside of some headers (<h1>) and paragraphs (<p>) HTML is perfectly fine.
Quite a few text portions of conference papers (read: TeX :) can be rendered as markdown and then easily converted to HTML, but the result won't feel as polished, so PDF is an easier format that also reflects the original intent and layout.
IETF RFCs can typically be rendered in a myriad of ways thanks to xml2rfc; then again, one will mostly end up reading them on tools.ietf.org or, to keep a local copy, rendering them as PDF and loading that into iBooks.
The main issue I think we all have with the format is that people make docs that are almost impossible to read on a small screen.
There are ways around this: 1. Tagged PDFs present the underlying content and semantics in order to reflow for accessibility purposes, though right now very few people seem to use this feature, and 2. Maybe it wouldn't be a bad thing to make PDF pages closer to a paperback book rather than an A4 page, with the resulting shorter line length and reduced margins.
PDF is indeed more complex than plain HTML with some cribbed CSS, but in many ways it's a lot better: 1. It truly is portable in the sense that every computer will render it in exactly the same way, 2. It packages up all assets in an efficient manner (only the glyphs that are needed are included, not the entire font with all glyphs and position hints like web fonts), 3. The expensive layout computation is done once, on a computer in a galaxy far far away from my battery-limited phone, and 4. PDFs are (by convention) free from all of the cacophony of crap like share buttons and navigation chrome and ads and articles-you-may-enjoy fluff.
The format itself is actually not that bad. It's a text format, in that it's relatively easy to open up a text editor and bang one out. The only inconveniences are the places where you need to state exactly how long strings are (which your text editor can help with) and the creation of the index at the end (which I've been cheating on by just running my hand-created PDFs through a PDF lint-like utility).
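For what it's worth, the two chores mentioned above (stating lengths and building the index at the end) are easy to hand off to a short script. Here is a minimal Python sketch, assuming a single page, the built-in Helvetica font, and text without parentheses or backslashes (those would need escaping). `build_minimal_pdf` is a made-up name for illustration, not part of any library.

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page PDF, computing the stream length and xref offsets."""
    # Content stream: draw `text` in 24pt Helvetica near the top of the page.
    # Assumption: `text` contains no "(", ")" or "\" characters.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        # The stream dictionary has to state the exact byte length.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # The index at the end: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

The point is only that the bookkeeping is mechanical: every offset is just the length of the output written so far.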
The reason why most PDFs look crazy when opened up in a text editor is that the streams are almost always compressed. You can uncompress them with "qpdf --stream-data=uncompress in.pdf out.pdf".
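If qpdf isn't at hand, the same idea can be roughly approximated with Python's standard zlib module, since the usual stream compression is Flate. This is a naive sketch, assuming streams are delimited by a plain "stream" / "endstream" pair and ignoring /Length, predictors, other filters, and encryption. `inflate_streams` is a hypothetical helper name.

```python
import re
import zlib

# Naive delimiter match; a real parser would honour the /Length entry instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the decompressed bodies of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes such as a stray "\r".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate data (uncompressed, or some other filter); skip it.
            continue
    return results


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for body in inflate_streams(handle.read()):
            print(body[:200])
```

For anything beyond poking around, qpdf is the better tool, as it handles the cases this sketch skips.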
The PDF header can be anywhere in the document. This makes parsing for bad content harder. (Also, you can have a valid pdf that is also a valid zip with the contents being the original file. How do you virus scan this?)
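To make the scanning problem concrete, here is a small Python sketch that reports where the "%PDF-" marker sits and whether the same bytes also open as a ZIP archive. `inspect_file` and its result keys are invented for illustration; it makes no claim about how any real reader or scanner decides what to accept.

```python
import io
import zipfile


def inspect_file(data: bytes) -> dict:
    """Report the PDF header position and whether the bytes also look like a ZIP."""
    header_offset = data.find(b"%PDF-")  # -1 when there is no header at all
    return {
        "pdf_header_offset": header_offset,
        "header_at_start": header_offset == 0,
        # ZIP readers locate the archive from the end of the file,
        # so leading bytes in front of the archive do not stop them.
        "also_zip": zipfile.is_zipfile(io.BytesIO(data)),
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(inspect_file(handle.read()))
```

A file for which both `pdf_header_offset >= 0` and `also_zip` hold is exactly the kind of polyglot that a scanner keyed on a single file type can miss.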
Too many old image formats are allowed, with readers only supporting a subset. TIFF alone has a bunch of options that are often broken. What happens when you put a multipage TIFF in a PDF? (I think you just see the first page in Reader, but some other reader might allow you to browse them.)
Lots of features in later versions that are not well supported. (forms, document libraries, scripting?)
It has been a while since I left the document area, but while I liked simple PDFs, once you say you support them, you have to support all of them, which is almost impossible to do correctly. The later specs just have too many features that are almost unused but add a lot of complexity that really isn't needed in a portable document format. A stripped down/cleaned up version of the spec would be nice.
The great thing about EPUB is that all the files are bundled together in a single .epub file. You can copy the file and move it around without worrying about keeping the structure intact.
I think that perhaps the main issue currently present with EPUB is that EPUB readers aren't really part of the standard installation on devices. Pretty much every PC or smartphone you come across in the wild will have software installed to view PDFs, but the same isn't the case for EPUB. I think Apple have done a good thing by including an EPUB reader (iBooks) as part of macOS and iOS, but that's not the case for other operating systems.
It might be nice if web browsers could natively act as EPUB viewers, in the same way a number of them natively act as PDF viewers. That way, the user already has an EPUB viewer installed, and they don't have to go and find and install one.
Am I missing something? This is practically the only thing that vanilla HTML is good at?
It seems like you could get all the benefits of EPUB with an archive of HTML files.
I read a lot. I don't always want to read from a screen.
Making PDFs available on the internet just saves me from having to search through a journal stack.
PDFs have the advantage that the formatting looks good, and the author/publisher gets to choose how it looks, usually with input from a professional. This is much better than the "styling" many websites and EPUBs provide.
This is a problem. PDF supports "all". Epub would have to have equivalent support, as well as native support in most/all vanilla OS distributions to be a real competitor for PDF.
> Think about a relief of storing this documents on an ebook reader
For many people this is their phone or tablet. Both of which support PDFs as well (as do many standalone ebook readers). You also don't have to get an extra app to view the PDFs.
Whatever you can do on a website, you can theoretically do in an EPUB. It might not display as expected when rendered by an e-reader or printed, however, which is why most EPUB files stay rather safe and unambitious in their CSS and JS.
I'm a bit disappointed that web browsers don't generally function as EPUB readers or include a "Save As... EPUB" option, but I can't seem to muster the motivation to write a Firefox add-on to do that. It wouldn't even be that difficult, as I have created EPUBs from filesystem directories using nothing more than 7Zip and a shell script.
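The directory-to-EPUB step is about as simple as it sounds. Here is a Python sketch using only the standard library, assuming the directory already holds a valid EPUB layout (META-INF/container.xml, the package file, the content documents). The one wrinkle is that the container format expects the `mimetype` entry to come first and be stored uncompressed. `pack_epub` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def pack_epub(source_dir: str, output_path: str) -> None:
    """Zip an unpacked EPUB directory into a single .epub file."""
    root = Path(source_dir)
    with zipfile.ZipFile(output_path, "w") as archive:
        # First entry: the media type, stored without compression.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            name = path.relative_to(root).as_posix()
            if name == "mimetype":
                continue  # already written above
            archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    import sys

    # Usage: python pack_epub.py unpacked_book/ book.epub
    pack_epub(sys.argv[1], sys.argv[2])
```

Write the output file somewhere outside the source directory, otherwise the half-written archive may get picked up by the directory walk.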
There is a possibility that law firms would pay for a premium version that included a crawler and some form of cryptographic validation that could prove that the EPUB file was created at a certain time, from a certain IP address, and hasn't been altered since then. The idea being that trademark owner's lawyer takes a snapshot of a website selling knockoffs when sending the C&D letter, then another when filing the lawsuit, and the evidence for the complaint is preserved without having to rely on static snapshot images of the rendered website or third-party archive sites.
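The "hasn't been altered since then" half of that idea is the easy part. Here is a rough Python sketch, assuming a plain SHA-256 digest is acceptable. The function names and record fields are invented for illustration. Note that the timestamp here is self-asserted, so on its own it proves nothing about when the snapshot was taken; that part would need an independent trusted timestamping service, which this sketch does not attempt.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot_record(data: bytes, source_url: str) -> str:
    """Describe a snapshot file by its digest, origin, and capture time."""
    record = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "source_url": source_url,
        # Self-asserted clock reading; not evidence by itself.
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


def matches(data: bytes, record_json: str) -> bool:
    """Check that `data` still has the digest stored in the record."""
    return hashlib.sha256(data).hexdigest() == json.loads(record_json)["sha256"]
```

Any change to the EPUB, even a single byte, makes `matches` return False, which is the property the evidence-preservation use case depends on.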
How about stopping using PDF for internet publications and using EPUB instead (or some other screen independent format)? Please, share your comments.
I think you may be getting downvoted because it is generally considered bad form to submit your own blog unless it is a "Show HN", but if the content is good it can outweigh that and an article can be upvoted anyway. But if you do submit your own content, it is probably best to let the content speak for itself vs trying to solicit HN as a discussion forum.
Bashing PDF is so 2000...
> PDF is not portable on digital screens. It doesn’t scale. It’s not comfortable to read PDF files on a mobile or ebook readers
Arguing that PDFs are good for printing out, or better than some other format, doesn't actually address the issue ;-)
1b. unless -- as is increasingly the case these days -- you're reading it on a screen smaller than the one for which the .pdf was designed, in which case .pdf is an awful format.
2a. .epub is a fine format.
2b. unless you're reading it in a viewer-app which is wonky, which most of them are. (the inconsistencies of rendering with this so-called "standard" are unbelievably bad, and seem to be getting worse rather than better as time goes on.)
3a. when you try to re-use text by copying it out of a .pdf, you often get some really bad stuff that loses a lot of important styling.
3b. when you try to re-use text by copying it out of an .epub, it's not much better.
4a. the standard line is that an .epub is just "a website packaged into a .zip file", implying that anything you can do on a website can be done in an .epub.
4b. the standard line is a lie. an .epub requires .xhtml rather than .html, and a complex mess of associated files, and most .epub viewer-apps have trouble supporting the full gamut of .css, and also do not allow you to use javascript at all.
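The .xhtml point in 4b is easy to demonstrate: markup that browsers happily accept as .html can fail a strict XML parse. Here is a small Python sketch using the standard library parser. It checks well-formedness only, not validity against any EPUB schema, and `is_well_formed` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as XML, which XHTML content must do."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Typical HTML habit: a void element left unclosed.
    print(is_well_formed("<html><body><p>Hi<br></p></body></html>"))
    # The XHTML spelling of the same thing.
    print(is_well_formed("<html><body><p>Hi<br/></p></body></html>"))
```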
conclusion: the state of sharing documents on the web in a way that allows offline use while enabling the convenient re-use of text is a sad state indeed.
It's not perfectly optimized for every display (or print page, the two being equivalent) size, resolution, etc., but then neither is any other format that can handle the same range of content, nor will any format ever be until we have AI layout that does as well as professional layout from a single source file for all media sizes and properties.
I find professionally laid out PDFs that are designed for letter/A4 size pages to be superior in practical use to any reflowable format I've yet seen, at pretty much every size, for most content more complex than plain linear text like you'd find in a novel. (Smartphone and smaller devices aren't great for it, but then they aren't great for reading content more complex than linear text regardless of format.)
PDF is also one of the few readily available formats that has a well-defined archival spec (PDF/A), which further makes it more compatible across readers. (Essentially, it requires documents to follow certain specs.)