It kept surprising me how much of a hassle it is to generate a PDF with decent HTML5 rendering from my SaaS apps. I tried several free libs and APIs but often ended up with botched rendering. So I set out to simplify this chore by building an AWS-hosted, Chrome-based HTML-to-PDF conversion API. It lets a dev simply send HTML to the API and get a PDF back, without having to worry about running and managing Chrome somewhere in their own infra.
I just finished the first version of my landing page and hosted PDF generation API (https://thePdfApi.com) and would love to have some feedback on the following points:
- is it clear upon viewing the landing page what the product is about?
- are there any questions that you have that are not answered on the page? I'm thinking about adding an FAQ once the first questions pop up.
- would this product provide value to you if your startup needed to generate PDFs? If not, what would you use instead?
Thanks for helping a fellow hacker out.
re: "is it clear upon viewing the landing page what the product is about?" Yes. I think your landing page copy reads well but could use a bit of polish. Emphasize how well PdfApi solves common margin and background rendering issues compared to the alternatives.
re: "are there any questions that you have that are not answered on the page" I'm not sure you need to answer all of a user's questions, but considering you are targeting startups and developers, I'd focus on building out your API documentation.
re: "would this product provide value to you if your startup needed to generate PDFs" Potentially, but considering your audience (developers/startups), I suspect most of us would tackle the issue locally with a headless Chrome setup.
re: "what would you use instead" I've used the following to generate PDFs from HTML:
* Chrome Headless
* wkhtmltopdf
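For reference, both local options can be scripted in a few lines. A minimal Python sketch, assuming the binaries are installed and on PATH; the Chrome binary name ("google-chrome" here) is an assumption and varies by platform (e.g. "chromium"):

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the binary name differs per platform ("google-chrome",
# "chromium", "chromium-browser", ...); adjust to your install.
CHROME_BIN = "google-chrome"


def chrome_pdf_cmd(source: str, out: Path, chrome_bin: str = CHROME_BIN) -> list[str]:
    """Headless Chrome: print a URL (or file:// path) to a PDF file."""
    return [chrome_bin, "--headless", f"--print-to-pdf={out}", source]


def wkhtmltopdf_cmd(source: str, out: Path) -> list[str]:
    """wkhtmltopdf takes the input URL/file followed by the output path."""
    return ["wkhtmltopdf", source, str(out)]


def render(cmd: list[str]) -> None:
    """Run a converter command, failing early if its binary is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    print(chrome_pdf_cmd("https://example.com", Path("example.pdf")))
```

Usage: render(chrome_pdf_cmd("https://example.com", Path("example.pdf"))). The catch, as noted in the thread, is that you then own the Chrome install, its fonts and its updates.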
This is good work. I like what you've done. Build out strong API documentation with multiple code snippets and examples to improve your developer marketing.
How is it different from or better than using Puppeteer? If it's better, side-by-side comparisons of generated PDFs could be a good selling point.
Puppeteer would indeed come close in rendering quality. The advantages of my solution over Puppeteer are:
1) I tweaked headless Chrome to have the necessary fonts available so that typography renders as it should. Even emojis work!
2) You don't need to worry about installing and maintaining Puppeteer and headless Chrome in your own infra.
3) I haven't made this very clear on my landing page yet, but I'll provide support to clients who have trouble getting a particular document to render exactly the way they want.
4) Not really a benefit yet, since I wanted to launch with an MVP, but soon I'll offer several options in the API that Puppeteer itself doesn't, such as multi-document PDFs and automatically generated, clickable tables of contents for longer documents.
Browsers are good at laying things out on the screen. On paper, not so much.
I've looked into JS libraries that generate printable PDFs directly, but each one seems to come with a lot of caveats.
I didn't bother to check whether the same browser version on different operating systems would produce the same results.
This workflow is a common one, and really frustrating: "Print -> Save as PDF -> choose location on disk/Google Drive/Dropbox -> Save -> switch to email -> compose email -> enter email address -> enter subject -> add attachment -> navigate to saved location (if I can remember it) -> Send".
You could even add a premium feature that hits a URL on a schedule to automate sending reports to managers (e.g. for Yahoo Ads or any other platform with similarly terrible reporting).
My manager and I at my old workplace used to spend 1-3 hours a month on this, times however many people had access to his credit card for their subscriptions.
Combining the PDF functionality together with an email function sure is interesting, gonna think this over a bit more. Thanks!
About the PDF results: I often get the mobile version of a website, but I guess in normal use cases you won't even request those.
You're right, the most common use case would be a client sending HTML instead of a URL to the endpoint. This way a PDF can be created from data that's not publicly exposed on the internet (think invoices, contracts, etc.).
I also don't store any of the data you send to the API, so as not to add to your GDPR nightmares.
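A minimal sketch of the HTML-instead-of-URL call, assuming a JSON body and bearer-token auth. The endpoint, the "html" field and the auth header are hypothetical placeholders, not the documented API:

```python
import json
import urllib.request

# Hypothetical endpoint and field name: placeholders, not the real API.
ENDPOINT = "https://api.example.com/v1/pdf"


def build_pdf_request(html: str, api_key: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST that carries raw HTML, so private data such as an
    invoice never has to be reachable at a public URL."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def fetch_pdf(req: urllib.request.Request) -> bytes:
    """Send the request and return the response body (expected to be PDF bytes)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Usage: pdf_bytes = fetch_pdf(build_pdf_request("<h1>Invoice #1</h1>", "YOUR_KEY")), then write the bytes to a .pdf file or attach them to an email.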
This way your dev team would not have to invest any time at all in the creation of PDF documents. Shoot me a mail at the email address in my profile if you want to know more.
Edit: Holy hell, I was just reading some more comments mentioning the price and took a second look. 79 bucks a month for 10 PDFs. Yeah, I think I'm gonna go with spending 10 minutes writing a WebAPI to access my PDF generation API. For reference, it took me two days to land on Puppeteer and a day to configure it how I wanted, and it costs me $5 a month to do about 50 PDFs per day (not an upper limit, just how many we need on a typical business day; I don't know what the box is capable of).
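Taking the quoted figures at face value, the per-PDF difference works out as below. The 21 business days per month is an assumption, and the commenter's three days of setup time are not included:

```python
# Figures as quoted in the comment; BUSINESS_DAYS is an assumption.
HOSTED_PRICE_USD = 79.0   # per month
HOSTED_PDFS = 10          # PDFs per month at that price, as quoted
SELF_HOSTED_USD = 5.0     # per month for the Puppeteer box
PDFS_PER_DAY = 50
BUSINESS_DAYS = 21        # assumed business days per month


def cost_per_pdf(monthly_usd: float, pdfs_per_month: int) -> float:
    """Monthly cost divided by monthly volume."""
    return monthly_usd / pdfs_per_month


hosted = cost_per_pdf(HOSTED_PRICE_USD, HOSTED_PDFS)
self_hosted = cost_per_pdf(SELF_HOSTED_USD, PDFS_PER_DAY * BUSINESS_DAYS)
print(f"hosted: ${hosted:.2f}/PDF, self-hosted: ${self_hosted:.4f}/PDF")
# prints "hosted: $7.90/PDF, self-hosted: $0.0048/PDF"
```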
It's very fast! Are you caching at all?
Since people probably want to use this with private data, would they usually be sending HTML strings to you, vs URLs?
Why's the "i" in API lower case?
I'm not caching at all since I don't want to store any potentially confidential data on my servers. The main reason it's fast is that several instances of headless Chrome are running behind a load balancer.
The main use case would indeed be to send HTML to the API instead of a URL. I just didn't add this use case to the API tester on the landing page, but it's definitely supported.
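The comment doesn't say which balancing strategy is used. As an illustration only, a round-robin dispatcher over several headless Chrome instances could look like this; the backend addresses are hypothetical, and nothing is cached, since each request is simply handed to the next instance:

```python
import itertools
import threading


class RoundRobinPool:
    """Hand out backend addresses in rotation: a simple stand-in for a
    load balancer in front of several headless Chrome instances."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()  # guard the shared iterator across threads

    def next_backend(self) -> str:
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical instance addresses.
    pool = RoundRobinPool(["chrome-1:9222", "chrome-2:9222", "chrome-3:9222"])
    for _ in range(4):
        print(pool.next_backend())
```

Round-robin ignores how long each render takes; a real balancer might instead route to the least busy instance.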
The i in APi is lowercased because I thought it looked cute :)
I ended up using wkhtmltopdf on AWS Lambda. It isn't nearly as nice as PrinceXML or headless Chrome, but for the most part it gets the job done.
What did you end up using for your stack?
My stack is a tweaked Chrome headless on Linux in a docker container, exposed by a Node API.
You are right that generating PDFs from HTML and web technologies is nothing new. Sadly, most existing PHP and Node libs don't render well once a document uses modern CSS and HTML or image formats such as SVG.
Puppeteer would indeed provide the same rendering quality, but then you're responsible for maintaining the running instances. I'm hoping to make my users' lives easier by taking this task off their hands.
I have a similar API product for generating PDFs. We found that the generation part is easy; it gets crazy when your customers start asking for customizations to their labels, invoices, packing slips, contacts, etc.
The stack is Linux on Docker, running a tweaked version of headless Chrome with an API created in Node.
Page breaks are controlled by the CSS properties page-break-after, page-break-before and page-break-inside.
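A small sketch of using those properties when assembling a multi-page document; the helper and class names are hypothetical. The CSS Fragmentation spec names break-before / break-after / break-inside as the modern equivalents and keeps the page-break-* properties as legacy shorthands:

```python
from html import escape

# Legacy page-break-* names, as in the comment above.
PRINT_CSS = """
.page { page-break-after: always; }
.page:last-child { page-break-after: auto; }
tr, img { page-break-inside: avoid; }
"""


def paginate(sections: list[str], title: str = "Document") -> str:
    """Wrap each (trusted) HTML fragment in a .page div so every fragment
    starts on a new printed page, without a trailing blank page."""
    pages = "\n".join(f'<div class="page">{s}</div>' for s in sections)
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title><style>{PRINT_CSS}</style></head>"
        f"<body>{pages}</body></html>"
    )
```

The resulting string is what you would send to a Chrome-based renderer as HTML; the fragments are inserted unescaped, so they must come from your own templates.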
Edit: Emailed your contact@brainhashed address.
In short: arbitrary local file inclusion.
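One reading of the terse comment above: if the renderer accepts arbitrary URLs, a file:// URL (or an internal address such as the AWS metadata endpoint 169.254.169.254) could make the server render its own files or internal data into the PDF. A minimal Python sketch of a URL check, assuming only public http(s) URLs should be rendered. It does not resolve DNS names and does not cover file:// references embedded in submitted HTML, which would have to be blocked inside the browser itself (e.g. by intercepting requests):

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url: str) -> bool:
    """Reject URLs that would let a server-side renderer read local resources.

    Blocks non-HTTP schemes (file://, javascript:, ...) and literal
    loopback, private, link-local or reserved IPs. Hostnames are NOT
    resolved, so a DNS name pointing at an internal address still needs
    a separate check after resolution."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Not a literal IP: a DNS name, only "localhost" is rejected here.
        return parts.hostname.lower() != "localhost"
    return not (ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_reserved)
```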
This could have a few causes; the most common one is that the site you want to generate a PDF of blocks connections from AWS IP ranges.