There are other subtle defects that make these PDFs pretty good, but not high quality.
Here is a brief discussion of some of the shortcomings of web typography, and why we still need to use TeX if we want the most beautiful and easiest to read results:
https://lwn.net/Articles/662053/
All that aside, this is impressive and should be useful to many people.
Just printing <p> elements, with their text-layout constraints at every level (word, line, paragraph, page), already involves a lot of details you need to get right for naturally readable text flow, before you add all the other complexities of HTML. For instance, if a single line of a paragraph creeps onto the next page, it is often preferable to move the entire paragraph to the next page and subtly adjust the spacing on the first page, so that each paragraph sits entirely on one page. That is obviously not always possible or desirable, so it turns into a search problem with many variables that can be adjusted dynamically as the text flows.
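The paragraph-placement search described above can be sketched as a small dynamic program. This is a toy model, not TeX's page builder or any browser's algorithm: it assumes every paragraph has a fixed number of lines, each page holds `page_height` lines, and the penalty constants are arbitrary illustrative values. Leftover space at the bottom of a page is charged quadratically, standing in for the cost of stretching that page's spacing. "Orphan" and "widow" follow the CSS `orphans`/`widows` meaning (a lone first line at the bottom of a page, a lone last line at the top).

```python
from functools import lru_cache

# Illustrative penalty values (assumptions, not taken from TeX or CSS).
SPLIT_PENALTY = 10    # any paragraph split across a page break
ORPHAN_PENALTY = 100  # paragraph's first line left alone at the bottom of a page
WIDOW_PENALTY = 100   # paragraph's last line left alone at the top of a page


def paginate(paragraphs, page_height):
    """Lay out paragraphs (given as line counts) on pages, minimizing total cost.

    Returns (cost, pages); each page is a list of (paragraph_index, lines_placed).
    """
    n = len(paragraphs)

    @lru_cache(maxsize=None)
    def best(i, offset, used):
        # i: paragraph being placed, offset: its lines already placed,
        # used: lines already filled on the current page.
        if i == n:
            return 0, ()  # space left on the last page costs nothing
        remaining = paragraphs[i] - offset
        space = page_height - used
        options = []
        if remaining <= space:
            # The rest of the paragraph fits on this page.
            cost, plan = best(i + 1, 0, used + remaining)
            options.append((cost, (("place", i, remaining),) + plan))
        if used > 0:
            # Start a fresh page and pay for the unused lines on this one.
            cost, plan = best(i, offset, 0)
            options.append((cost + space ** 2, (("break",),) + plan))
        for k in range(1, min(remaining - 1, space) + 1):
            # Put k lines here and continue the paragraph on the next page.
            penalty = SPLIT_PENALTY + (space - k) ** 2
            if offset == 0 and k == 1:
                penalty += ORPHAN_PENALTY
            if remaining - k == 1:
                penalty += WIDOW_PENALTY
            cost, plan = best(i, offset + k, 0)
            options.append((cost + penalty, (("place", i, k), ("break",)) + plan))
        return min(options)

    cost, plan = best(0, 0, 0)
    pages, current = [], []
    for step in plan:
        if step[0] == "break":
            pages.append(current)
            current = []
        else:
            current.append(step[1:])  # (paragraph_index, lines_placed)
    pages.append(current)
    return cost, pages


print(paginate([4, 3], 6))  # (4, [[(0, 4)], [(1, 3)]])
```

With a 6-line page, a first-fit layout would put two lines of the second paragraph on page 1 and leave its last line as a widow; the search instead moves the whole paragraph to page 2. For a long paragraph such as `paginate([2, 8], 6)`, splitting wins, because pushing it whole would waste most of a page.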
My understanding of modern CSS engines is that a) CSS itself lacks the natural primitives even to express the constraints you'd find in TeX, and b) solving page layout to this degree is exactly the kind of search problem browsers tend to avoid when rendering.
Of course, there's an argument to be made that if people don't realize it's missing, maybe it wasn't terribly valuable to begin with. I'd imagine it's not very useful for most home uses, but the fact that you can typeset decades-old documents at a de facto professional level for free, OR use heavily modified engines that allow more modern practices, is really quite amazing. I hope the effort that went into formalizing "readable text" doesn't get lost as people move on from TeX; it'd be great to get some of this capability into browsers with competing implementations. TeX is a lot to learn for most people, and it's also Turing-complete, which is IMHO mostly a bad sign for accessibility.
There are also projects that attempt to convert HTML to TeX, but frankly they were mostly terrible the last time I looked. I honestly wonder whether it's easier these days to have JavaScript render the DOM to TeX and leverage the browser as much as possible, but I'm not familiar enough with the DOM to speculate on how well this would work on unaltered pages. My guess is you only get so much for free before you have to design specifically for that output scenario, just like other kinds of responsive layout.
I wonder how hard it would be to just compile TeX to WebAssembly or something... TeX is _ridiculously_ fast on a modern PC, hundreds of pages per second. All the libraries/macros might be a problem, but I bet you could prepare a useful-but-minimal set that would fit in a MB or two gzipped.
https://www.w3schools.com/cssref/css3_pr_word-break.asp
From what I remember, LaTeX has better algorithms, both for distributing words between lines and for knowing where in a word it is OK to break.
As a fun bit of trivia, think about where to hyphenate the word "record", in all its forms.
https://github.com/ytiurin/hyphen: Franklin M. Liang's hyphenation algorithm, implemented in JavaScript.
It could be integrated.
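Liang's method, which the linked library implements and TeX itself uses, fits in a few lines. Each pattern assigns a number to the gaps between letters; after taking the highest number at each gap over all matching patterns, odd values allow a hyphen and even values forbid one. This is a standalone Python sketch, not the hyphen library's API. The pattern set is a tiny hand-picked sample modeled on the classic "hyphenation" example (real pattern files contain thousands of entries), and `left_min=2`/`right_min=3` mirror plain TeX's default `\lefthyphenmin`/`\righthyphenmin`.

```python
def parse_pattern(pattern):
    """Turn a TeX-style pattern like 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # a digit gives the value of the gap at this point
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Split `word` at gaps whose highest matching pattern value is odd (Liang's rule)."""
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    # points[j] holds the highest value seen for the gap before padded[j].
    points = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = patterns.get(padded[start:end])
            if values is not None:
                for offset, value in enumerate(values):
                    points[start + offset] = max(points[start + offset], value)
    pieces, last = [], 0
    # Word letter i sits at padded[i + 1]; respect the minimum fragment lengths.
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


# A tiny hand-picked pattern set (illustrative only, not a real language file).
PATTERNS = dict(parse_pattern(p) for p in
                ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"])

print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Because the patterns only look at spelling, a pattern-based hyphenator gives "record" a single answer, whether it is the noun (rec-ord) or the verb (re-cord).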
Where Prince wins is its support for the CSS @page extensions (pages with different margins, etc.); it looks much better suited to professional publishing. There are surely many more typographic advantages, but I don't know them.
Link to Prince:
The real issue is that Prince is the only renderer that supports full print CSS; none of the major browsers seem to care about better print output anymore.
Interesting random factoid: Prince is (was?) written in a language called Mercury [1], which is kind of a statically typed Prolog. Research into Prince turned me on to state-of-the-art logic programming, so I'm thankful for that as well. :)
Also, I feel like the biggest gripes with generating (long) PDFs from HTML are things such as page numbering, orphans and widows, semantically correct word wrapping, page margins, etc.
Chrome does a decent job but is nowhere close to what LaTeX can do.
https://github.com/RelaxedJS/ReLaXed/wiki/ReLaXed-vs-other-s...
It is open to contributions, so any thoughts are welcome. In a nutshell, all your points are valid. Chrome is one of the best browsers, but it is still behind LaTeX in some aspects. But which will evolve faster in the future?
Are you comparing Chrome and LaTeX? Chrome is certainly evolving faster overall, but its PDF-printing features have changed little, if at all.
https://developer.mozilla.org/en-US/docs/Web/CSS/Paged_Media
A blog about this issue: http://www.pagedmedia.org
The coolest project I've seen with it is OMA (Rem Koolhaas' architecture firm), which uses it to print internal, very professional-looking booklets automatically generated from data, text and photos stored in Sanity [2]. (The Sanity team also built the system to make the booklets.)
[2] https://www.sanity.io/docs/introduction/what-the-headless
It is probably a no-brainer if you are generating PDFs all the time, but I would have to use it on multiple projects to make it financially viable. Funnily enough, right now I am working on an archive for an architecture company. But that's only about 100 PDFs.
https://i.imgur.com/tMkMjNV.png
In the image, ConTeXt generates PDFs. The EA box represents HTML documentation exported from Enterprise Architect, but could be any structured document that pandoc can parse. The source repository contains various themes for the final PDF.
Using ConTeXt offers several compelling features, such as citations, cross-references, and the ability to produce EPUBs.
This uses a full browser rendering engine that supports modern HTML5/CSS3/JS, since it ultimately runs a headless browser.
I suspect pandoc is still a great approach for a lot of cases. Running a headless browser isn't cheap, especially at scale. If your output is a simple book or an invoice, pandoc is probably the way to go. If you want to turn websites into PDFs, or dump an HTML file with charts into a PDF, use this.
https://github.com/GoogleChrome/puppeteer/blob/v1.3.0/docs/a...
Also, ReLaXed supports markdown-it, which in turn has plugins for footnotes and citations, for instance. Not sure what you mean by auto-reference, but that should be possible like in any other HTML page, shouldn't it?
https://github.com/GoogleChrome/puppeteer
(I work on the Chrome DevTools team, creators of Puppeteer.)
Been pretty interesting seeing webtech handle these kinds of problems
What really upsets me... the typography still looks shit compared to LaTeX... MS Word / LibreOffice can do better. Would rather stick with plaintext again.
Five or six years ago I used ReportLab (in Python) to generate some PDF reports (using the flowables API); it does kinda work, but layout is more complicated than in TeX and the output quality is several notches down.
As far as appearance goes, you can make TeX look like almost anything, even with fairly low effort.
They may know that, but it is irrelevant. English lacks the phoneme /x/ and the usual substitution in assimilating foreign words (or the letter-play that Donald Knuth started with TeX) is the closest unvoiced velar that English has: /k/. See how most English speakers pronounce the name of J. S. Bach as [bak], with only a small number of pedants saying [bax]. Or, outside of Scotland, Loch Ness is usually [lɔk], not [lɔx].
Currently I deliver ~2 PDF reports per week using Ulysses or MacDown for content creation (distraction-free writing), and then typesetting everything into InDesign.
Thank you for creating this tool, I will try it next week.
The ability to render Markdown to Pug as an "Import Markdown" feature would be key for many people to adopt this.
I am also a big markdown user and I have found that for writing reports all day long markdown clearly wins over Pug, in particular with tools like
https://atom.io/packages/markdown-preview-enhanced
But the day you need to produce a super-nice report with a bit of custom layout, Pug/SCSS is awesome.
This would be useful for some presentations I think.
Maybe you would be interested in this project for making slideshows with Pug/SCSS/Vue.js. There you can make plenty of animations:
[1]: https://github.com/mynane/PDF/blob/master/Docker%20——%20从入门到...
That being said, the primary goal of this library is to make documents with complex or fancy layouts possible. EPUBs generally have a simple structure (chapter/section/paragraph) and can be written in Markdown, for instance:
See some basic use in ReLaXed in the "paper" or "slideshow" examples, or here for a basic documentation:
https://github.com/RelaxedJS/ReLaXed/wiki/Features#equations
So the beginnings of an alternative looks great!
This is a JavaScript implementation of the line-breaking algorithm used in TeX. It would be nice to add it to get better typographic results with justified text.
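The difference between filling lines one at a time and choosing all of a paragraph's breaks at once (the idea behind TeX's line breaker) shows up even in a toy example. This sketch is a simplified "minimum raggedness" dynamic program over monospaced words, not the Knuth-Plass algorithm itself, which works with boxes, stretchable glue, and penalties; scoring each non-final line by its squared leftover space is an assumption chosen for brevity.

```python
def line_cost(line, width):
    """Squared leftover space: a simple 'badness' for a non-final line."""
    return (width - len(line)) ** 2


def greedy_lines(words, width):
    """First-fit: put as many words as possible on each line."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def total_fit_lines(words, width):
    """Choose all breaks together to minimize the summed cost of non-final lines."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest way to set words[i:]
    best[n] = 0
    next_break = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            line = " ".join(words[i:j])
            if len(line) > width and j > i + 1:
                break  # adding more words only makes the line longer
            cost = 0 if j == n else line_cost(line, width)  # last line is free
            if cost + best[j] < best[i]:
                best[i], next_break[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:next_break[i]]))
        i = next_break[i]
    return lines


def raggedness(lines, width):
    """Total cost of all lines except the last."""
    return sum(line_cost(line, width) for line in lines[:-1])


words = "aaa bb cc ddddd".split()
print(greedy_lines(words, 6), raggedness(greedy_lines(words, 6), 6))
# ['aaa bb', 'cc', 'ddddd'] 16
print(total_fit_lines(words, 6), raggedness(total_fit_lines(words, 6), 6))
# ['aaa', 'bb cc', 'ddddd'] 10
```

The greedy version fills the first line completely and then strands "cc" on a nearly empty line; the total-fit version accepts a slightly short first line to even out the paragraph as a whole, which is what makes justified text look more uniform.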
This Pug language seems to be a good alternative to intermixed Markdown+HTML.
I'm in the process of launching BreezyPDF.com which can generate equally as wonderful PDFs from the HTML/JS/CSS you're already using.
Here's a demo of turning a complex dashboard into a PDF: https://ruby.demo.breezypdf.com