Billions of PDFs are generated daily: invoices, contracts, receipts, reports, you name it. Developer time gets wasted producing these basic documents because there are no good-enough tools to design and generate PDFs.
We previously worked at giant firms, where documents (especially PDFs) were central to most workflows. We got asked to generate automated trade confirmations for our customer’s counterparties. We could not find any tool other than outdated libraries offering poor control over layout and the generation process. In the end, we just created our own—basically bringing web technologies to PDFs. That was the genesis of Onedoc.
PDF creation has two phases: design (specifying content and layout) and generation (producing the actual PDF file). Onedoc lets you do both simply and automatically.
Design: we have an open-source library called "react-print-pdf" (https://github.com/OnedocLabs/react-print-pdf) that lets you design a document the same way you would design a website. It supports Tailwind CSS and Chakra UI components, and we recently added LaTeX and Markdown components. The latter let you write text in Markdown and include formulas in LaTeX syntax, directly within a React component.
Generation: we have an API (https://docs.onedoclabs.com/api-reference/introduction) and a Node.js SDK (https://docs.onedoclabs.com/quickstart/nodejs) that render your designs into PDFs.
The choice of renderer significantly affects the accuracy of the resulting PDF. For example, exporting a webpage to PDF often produces a layout that differs from the original page. We ensure that what you design is what you get, so you have 100% control over the entire layout of your document, including margins, styles, etc. We can do that because we built the react-print-pdf library to match our own HTML/CSS-to-PDF renderer.
Once you have generated your document, you can either store it on your local system or, if you want, use our platform (https://app.onedoclabs.com/) to host your document online. If you do, you’ll also get analytics on your documents.
Our main product is an API, but you can try it directly on our website (https://www.onedoclabs.com/) using our playground, without any installation or sign-up. Our pricing is usage-based, per document generated, and degressive: the more documents you generate, the less you pay per document. If you don’t want to pay for PDF generation, you can still generate as many documents as you want, but with a watermark in the margin.
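To make the degressive model concrete, here is a minimal TypeScript sketch of volume-tiered pricing. The tier boundaries and rates in `HYPOTHETICAL_TIERS` are placeholders, not Onedoc's actual prices, and the sketch assumes graduated tiers, where each document is billed at the rate of the tier it falls into.

```typescript
// Hypothetical graduated (degressive) price schedule: each tier covers
// documents up to `upTo` (cumulative, inclusive) and bills them at `ratePerDoc`.
// These numbers are illustrative placeholders, not real prices.
interface Tier {
  upTo: number; // cumulative document count where this tier ends
  ratePerDoc: number; // USD per document within this tier
}

const HYPOTHETICAL_TIERS: Tier[] = [
  { upTo: 1000, ratePerDoc: 0.05 },
  { upTo: 10000, ratePerDoc: 0.02 },
  { upTo: Infinity, ratePerDoc: 0.005 },
];

// Total cost when each document is billed at the rate of the tier it falls in.
function graduatedCost(docs: number, tiers: Tier[]): number {
  let total = 0;
  let billed = 0; // documents already priced by lower tiers
  for (const tier of tiers) {
    if (billed >= docs) break;
    const inTier = Math.min(docs, tier.upTo) - billed;
    total += inTier * tier.ratePerDoc;
    billed += inTier;
  }
  return total;
}

// Average cost per document, which falls as volume grows.
function averageCostPerDoc(docs: number, tiers: Tier[]): number {
  return docs === 0 ? 0 : graduatedCost(docs, tiers) / docs;
}
```

With these made-up numbers, 1,000 documents cost $50 ($0.05 each) while 20,000 cost $280 ($0.014 each). Under a plain volume scheme, every document would instead be billed at the single rate of the tier the total lands in; which variant applies is something to check on the real pricing page.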
It’s been fun to see what our users are building with our open-source library (components, templates, etc.) and our API. We have a website (https://react-print.onedoclabs.com/) dedicated to the open-source library where we post the templates submitted by the community. Some early power users built simple web apps (CV/Resume generator, NDA and Invoice generator). We are excited to show our product to the HN community and look forward to your feedback!
That forms a solid foundation that I find it hard to imagine paying for. The things where you might still command a premium are basically safety mechanisms/CI checks/library components that ensure the PDF renders correctly in the presence of variable-length content, etc. as well as maybe PDF-specific features like metadata and fillable forms. Naive ways to format headers, footers, tables/grids/flexboxes etc. often fail in PDFs because of unexpected layout complications. So having a methodology, process, and validation system for ensuring that a mission critical piece of information appears on a PDF in the presence of these constraints could be attractive.
In fact their open source library, https://github.com/OnedocLabs/react-print-pdf, seems like a higher-level library that sits above react-pdf. Reminds me a lot of the set of react-pdf based components I built for a corporate job where letting users create PDFs was a huge part of the value proposition.
They're solving a really cool problem, actually, because building out into certain difficult use cases like SVG support was a huge pain.
Your second point is very interesting, seems like some kind of .assert('text').isVisible() API. We may want to dig into that further!
Cool project btw, congrats on the launch!
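As a rough illustration of the `.assert('text').isVisible()` idea, here is a TypeScript sketch of a layout check. It assumes a hypothetical layout model in which the renderer reports each text run with its page index and bounding box in PDF points; `TextRun`, `PageArea` and `assertVisible` are invented for this example and are not part of Onedoc or react-print-pdf.

```typescript
// Hypothetical output of a layout pass: where each run of text ended up.
interface TextRun {
  text: string;
  page: number; // 0-based page index
  x: number; // left edge, in PDF points
  y: number; // top edge, in points measured down from the top of the page
  width: number;
  height: number;
}

// Hypothetical page geometry with a uniform margin, all in points.
interface PageArea {
  width: number;
  height: number;
  margin: number;
}

// Throws unless at least one run containing `needle` lies fully inside the
// content area of an existing page, i.e. it was neither dropped, placed on
// a page that does not exist, nor allowed to overflow into a margin.
function assertVisible(needle: string, runs: TextRun[], pages: PageArea[]): TextRun {
  const candidates = runs.filter((r) => r.text.includes(needle));
  if (candidates.length === 0) {
    throw new Error(`"${needle}" was not rendered at all`);
  }
  for (const run of candidates) {
    const page = pages[run.page];
    if (page === undefined) continue;
    const inside =
      run.x >= page.margin &&
      run.y >= page.margin &&
      run.x + run.width <= page.width - page.margin &&
      run.y + run.height <= page.height - page.margin;
    if (inside) return run;
  }
  throw new Error(`"${needle}" was rendered but falls outside the content area`);
}
```

In CI, a check like this would run against each template with both short and worst-case long content, failing the build when a critical field (a total, a signature line) overflows.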
Overall you're right that color correction is another area where you could probably command a premium.
If you'd rather do it for free, WeasyPrint[2] is the best open-source alternative.
Another more affordable option you might want to consider is Urlbox[3]. (Disclosure: I work on this)
Urlbox's rendering engine is based on Chrome. It's been refined over the last 11 years to render pages as images or PDFs[4] that look great. I was a customer for 5 years before I joined the team. Everything we'd tried before Urlbox was a disappointment.
Urlbox probably can't match the power of either Onedoc or DocRaptor, but pricing starts at less than $0.01 per document and drops significantly with scale. If your PDF looks great when you save it as PDF in Chrome, it should look just as good with Urlbox.
[1]: https://docraptor.com
[2]: https://weasyprint.org
[3]: https://urlbox.com
[4]: https://urlbox.com/html-to-pdf
Edits and corrections to generated PDFs are not supported, as the PDFs are signed as-is; however, you can attach the metadata to the PDF and re-render it with the modifications.
Their PDF conversion is pretty good (I use it for PPT/Word -> PDF conversion), though your product is obviously different and has different/better capabilities for programmatic PDF creation. Still, a reference point.
Pricing page: https://www.convertapi.com/prices
Don't get me wrong. A Stripe for generating PDFs could be great, but for a small team, $0.50/PDF is way more than I can afford (after all, you can create a small number of PDFs without too much fuss). Maybe you are targeting large companies?
But isn't that 100x what they're actually charging, at least for an enterprise account? Their pricing page says "from $0.005/doc." (Though I'm not sure how much work "from" is doing there.) The Pro tier is, admittedly, more like $0.12 per document (assuming you use your full quota). But that's still much less than $0.50/PDF.
I'm generally very confused by the various assertions in this thread about their pricing. What am I missing?
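Much of the confusion comes down to quota utilization on a flat-fee tier: the effective price per document is the monthly fee divided by the documents you actually generate. A minimal sketch of that arithmetic; `FEE` and `QUOTA` are hypothetical values picked so the full-quota figure matches the ~$0.12 mentioned above, not real plan numbers.

```typescript
// Effective per-document cost on a flat monthly plan that includes a quota.
function effectiveCostPerDoc(monthlyFee: number, docsGenerated: number): number {
  if (docsGenerated <= 0) return Infinity; // paying for a plan you don't use
  return monthlyFee / docsGenerated;
}

const FEE = 60; // hypothetical USD per month
const QUOTA = 500; // hypothetical documents included per month

const atFullQuota = effectiveCostPerDoc(FEE, QUOTA); // 0.12 USD per document
const atTenPercent = effectiveCostPerDoc(FEE, QUOTA / 10); // 1.20 USD per document
```

Using only a tenth of the quota raises the effective price tenfold, which is one way a small team could end up at or above the $0.50/PDF figure quoted in the thread.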
You can choose which API to use: Headless Chrome, Wkhtmltopdf, Libreoffice, etc.
Using HTML-to-PDF solutions lets you do the templating in HTML, where it is pretty much a solved problem.
And as many have said, headless Chrome is a robust HTML-to-PDF solution, even though it feels like a hack.
But, yeah, there seems to be a lack of awareness about these options within corporations. So, kudos to you for addressing a genuine problem!
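For the HTML-templating half of such a pipeline, a minimal TypeScript sketch: data is escaped and interpolated into an HTML string, which you would then hand to headless Chrome or another HTML-to-PDF engine. The `Invoice` shape and `renderInvoiceHtml` are hypothetical; a real project would more likely use an established template engine or JSX, which typically escape for you.

```typescript
// Hypothetical invoice data shape for the example.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number; // in the invoice currency
}

interface Invoice {
  number: string;
  customer: string;
  items: LineItem[];
}

// Escape the characters that matter in HTML text and attribute contexts, so
// customer-supplied strings cannot inject markup into the document.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderInvoiceHtml(inv: Invoice): string {
  const rows = inv.items
    .map(
      (it) =>
        `<tr><td>${escapeHtml(it.description)}</td>` +
        `<td>${it.quantity}</td><td>${(it.quantity * it.unitPrice).toFixed(2)}</td></tr>`
    )
    .join("");
  const total = inv.items.reduce((sum, it) => sum + it.quantity * it.unitPrice, 0);
  return `<!doctype html>
<html><head><style>
  @page { size: A4; margin: 20mm; }
  tr { break-inside: avoid; } /* ask the engine not to split a row across pages */
</style></head><body>
<h1>Invoice ${escapeHtml(inv.number)}</h1>
<p>Billed to: ${escapeHtml(inv.customer)}</p>
<table><thead><tr><th>Item</th><th>Qty</th><th>Amount</th></tr></thead>
<tbody>${rows}</tbody></table>
<p>Total: ${total.toFixed(2)}</p>
</body></html>`;
}
```

The `@page` rule and `break-inside: avoid` are standard CSS paged-media hints; how faithfully they are honored depends on the rendering engine.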
In C# I'd look to use the Playwright library, or perhaps even embed Chromium via CefSharp if I were trying to avoid extra processes.
I've also heard of one paid API whose name I forget but which seemed to work well, and this related service https://www.jotform.com/, and I also considered porting some server-side libraries to WASM. One day I'll collect all the libraries and findings in a blog post.
Are you looking to programmatically fill any PDF form by detecting the fields? Or are you filling one known PDF template?
It is a paid commercial product however.
The way we look at it, PDFs allow embedding other files and metadata. That makes it easy to provide a platform that enriches PDFs to display content beyond what is in the PDF itself. If this gets interesting enough, we could then phase out the PDF altogether. But that is a long way off.
EDI is the only place where people still regularly pay for messages by the kilobyte, where unsecured FTP over the open internet is still the norm, and where entire cottage industries exist to help people AVOID using EDI.
Source: I work in EDI. It's a pain in the rump.
Also, EDI is really only good for things like POs, shipping notices, invoices, sales orders, etc.
Given that the whole world has been running on PDFs for decades, it makes more sense to leverage the existing infrastructure and move it toward something more functional over time. Introducing a new format will just lead to another format that achieves 0.5% market share and is then abandoned after a few years. Even Microsoft, basically forcing people to use XPS in Windows (>70% of the computing market), wasn't able to achieve meaningful usage or change.
I expect that PDFs will not go away for at least 20 years, but who knows.
There was a time when not every piece of software had "export to PDF". So having "print to PDF" meant installing (often pirated) Adobe Acrobat or a sketchy free(ware) print driver downloaded from SourceForge.
MS adding the XPS print driver to Windows made it possible to share documents consistently (within the Windows ecosystem) without resorting to hacks.
I don't know why it didn't catch on. Maybe it was the general mistrust of anything MS, maybe it arrived too late, or maybe it was something else.
1. HTML-to-PDF: The web has a great layout system that works well for dynamic content, so using it seems like a good idea. BUT it is not very efficient, as many of these libraries simply spin up a headless browser or deal with virtual DOMs.
2. PDF libraries (like jsPDF): They mostly just offer methods like ".text(string, x, y)", which are an absolute pain to work with when building dynamic content or complex layouts.
This was such a pain point in various projects I worked on that I built my own library with a component system for building dynamic layouts (like tables spanning multiple pages), which then computes them down to simple jsPDF commands, giving you the best of both worlds.
Hope this makes somebody's life a bit easier: https://github.com/DevLeoko/painless-pdf
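The core of such a component system is a layout pass that turns a high-level table into absolute-position draw commands, inserting page breaks and repeating the header row. A much-simplified TypeScript sketch of that idea, assuming fixed row heights; `DrawCommand` is a hypothetical stand-in for calls like jsPDF's `text()` and `addPage()`, and this is not painless-pdf's actual API.

```typescript
// Hypothetical low-level command list, analogous to what a jsPDF backend
// would replay as doc.text(...) / doc.addPage() calls.
type DrawCommand =
  | { kind: "text"; text: string; x: number; y: number }
  | { kind: "addPage" };

interface TableLayout {
  pageHeight: number; // usable height in points, top margin to bottom margin
  top: number; // y of the first row on each page
  rowHeight: number; // fixed height per row (a simplification)
  columnX: number[]; // x position of each column
}

// Lay out `rows` under a repeated `header`, breaking to a new page whenever
// the next row would cross the bottom of the usable area.
function layoutTable(header: string[], rows: string[][], layout: TableLayout): DrawCommand[] {
  const cmds: DrawCommand[] = [];
  const bottom = layout.top + layout.pageHeight;
  let y = layout.top;

  const drawRow = (cells: string[]): void => {
    cells.forEach((cell, i) => {
      cmds.push({ kind: "text", text: cell, x: layout.columnX[i] ?? 0, y });
    });
    y += layout.rowHeight;
  };

  drawRow(header);
  for (const row of rows) {
    if (y + layout.rowHeight > bottom) {
      cmds.push({ kind: "addPage" });
      y = layout.top;
      drawRow(header); // repeat the header on every new page
    }
    drawRow(row);
  }
  return cmds;
}
```

Real implementations also have to measure variable-height cells (wrapped text) and decide what to do with a single row taller than a page, which is where most of the complexity lives.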
Going all the way down to raw HTML is a bit verbose, but with almost anything I've thrown at it (CVs, business cards, you name it) it hasn't let me down yet.
We ended up writing a similar wrapper around the https://github.com/jung-kurt/gofpdf library. We haven't open-sourced it yet, but it has made it a lot easier to deal with rendering a PDF, especially across page breaks, etc.
If you want granular control over how your PDF will look with content that is more than one page long, you will have a hard time using HTML.
a. If this is of strategic value for my pipeline (and it is), we are going to code it ourselves, simply because we can host it inside our fences. It's critical customer data.
b. The pricing is way off and doesn't reflect the cost or value (for us). Even at 1/10th of your prices, it would still be a no-go. At our volumes, it makes sense to build this ourselves.
c. SOC2 / ISO27001: you might want to obtain them ASAP if you are looking to sell to outsourcing companies or FSG.
"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --print-to-pdf=C:\temp\foo.pdf --no-margins --print-to-pdf-no-header C:\temp\test.mhtml
And yes, as a big fan of LaTeX myself (I used to write all my research reports on Overleaf), we wanted to make it very simple to integrate formulas, code and more into your documents. Glad you like it!
It might seem odd, given our long history with PDFs, but I believe there's still much to be done with these documents. They're everywhere—invoices, tickets, reports, etc.—yet the technology for generating and managing them hasn't evolved much in years. Our approach is to apply the same modern technologies used for web design to document design.
If you can nail accessible PDFs then you'd open up a very big government market.
Of course, you can't guarantee that the resulting document is 100% compliant because you can't enforce that the input is valid, but are you at least outputting a complete tag tree with as much semantics as possible given the input?
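For context on what a "complete tag tree" involves: a tagged PDF carries a structure tree whose elements use the standard structure types defined in ISO 32000-1 (P, H1 to H6, L, LI, Table, TR, TH, TD, Figure, Link, and so on), and much of it can be derived from semantic HTML. A minimal TypeScript sketch of such a mapping; the specific mapping choices and the `HtmlNode` shape are assumptions for illustration, not how Onedoc or Chromium actually do it.

```typescript
// A subset of the standard structure types defined in ISO 32000-1 (PDF 1.7).
type StructType =
  | "Document" | "Art" | "Sect" | "Div" | "NonStruct" | "P"
  | "H1" | "H2" | "H3" | "H4" | "H5" | "H6"
  | "L" | "LI" | "Table" | "THead" | "TBody" | "TFoot" | "TR" | "TH" | "TD"
  | "Figure" | "Caption" | "Link" | "Span" | "BlockQuote" | "Code";

// One plausible HTML-to-structure-type mapping (an illustrative assumption).
// A complete tagger would also split list items into Lbl/LBody, etc.
const HTML_TO_STRUCT: Record<string, StructType> = {
  body: "Document", article: "Art", section: "Sect", div: "Div", p: "P",
  h1: "H1", h2: "H2", h3: "H3", h4: "H4", h5: "H5", h6: "H6",
  ul: "L", ol: "L", li: "LI",
  table: "Table", thead: "THead", tbody: "TBody", tfoot: "TFoot",
  tr: "TR", th: "TH", td: "TD", caption: "Caption",
  img: "Figure", figure: "Figure", figcaption: "Caption",
  a: "Link", span: "Span", blockquote: "BlockQuote", code: "Code",
};

// Simplified parsed-HTML node (hypothetical shape).
interface HtmlNode {
  tag: string; // lowercase element name
  alt?: string; // alt attribute, for <img>
  children: HtmlNode[];
}

interface StructNode {
  type: StructType;
  alt?: string; // becomes the /Alt entry of the structure element
  children: StructNode[];
}

// Map every element; unknown elements become NonStruct so their content stays
// in the tree without claiming a meaning they do not have.
function toStructTree(node: HtmlNode): StructNode {
  const type = HTML_TO_STRUCT[node.tag] ?? "NonStruct";
  const out: StructNode = { type, children: node.children.map(toStructTree) };
  if (type === "Figure" && node.alt !== undefined) out.alt = node.alt;
  return out;
}

// Count figures without alternate text (PDF/UA requires it on figures).
function figuresMissingAlt(node: StructNode): number {
  const self = node.type === "Figure" && !node.alt ? 1 : 0;
  return node.children.reduce((n, child) => n + figuresMissingAlt(child), self);
}
```

A renderer can use a pass like `figuresMissingAlt` to warn when a document cannot satisfy PDF/UA, even if it cannot guarantee full compliance from arbitrary input.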
I agree that HTML -> PDF can be a really powerful tool. I worked on the UK government's tool to generate energy efficiency labels for consumer goods [0], and we ended up doing PDF generation with SVG templates, using Open HTML to PDF for the conversion. That worked very well, though, as you allude to, there can be some gotchas (e.g. unsupported CSS features) that you need to work around.
A few questions:
- Do the rendered documents support PDF's various accessibility features?
- How suitable is this for print PDF generation? For example, what version of the PDF spec do you target? What's your colour profile support like? Do you support the different PDF page boxes (MediaBox, CropBox, BleedBox, TrimBox, ArtBox)?
[0] https://github.com/UKGovernmentBEIS/energy-label-service
- We do not force PDF/* profiles on the user, but for most of them PDF/UA-1 seems like a sensible default. We can derive most of the tags from the HTML semantics alone, which makes this much easier.
- We target the PDF 1.7 spec. Color profiles can be changed, and you can use a custom .icc profile, with the corresponding embedding restrictions based on the document format. MediaBox is supported through the @page size property. Bleed, trim and marks can be added using vendor-specific CSS properties. We don't support ArtBox yet, but this is something we can look into! So far none of our customers has really needed to take this to a real print shop, but we would be glad to help people go down this route :)
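The relationship between CSS page size and the PDF page boxes is simple arithmetic: PDF user-space units default to points (1/72 inch), so a CSS `@page { size: 210mm 297mm }` (A4) corresponds to a MediaBox of roughly 595.28 x 841.89 pt. With bleed, the finished (trim) size stays the same and the media grows by the bleed on every side. A sketch of that calculation, assuming a uniform bleed and no extra area reserved for printer's marks (so MediaBox and BleedBox coincide); it is not a description of Onedoc's internals.

```typescript
// PDF boxes are [llx, lly, urx, ury] rectangles in points (1 pt = 1/72 in).
type Box = [number, number, number, number];

const MM_PER_INCH = 25.4;
const PT_PER_INCH = 72;

function mmToPt(mm: number): number {
  return (mm / MM_PER_INCH) * PT_PER_INCH;
}

interface PageBoxes {
  mediaBox: Box;
  bleedBox: Box;
  trimBox: Box;
}

// Page boxes for a finished (trimmed) page of widthMm x heightMm with
// `bleedMm` of bleed on every side. No area is reserved for printer's
// marks, so the MediaBox and BleedBox are identical here.
function pageBoxes(widthMm: number, heightMm: number, bleedMm = 0): PageBoxes {
  const w = mmToPt(widthMm);
  const h = mmToPt(heightMm);
  const b = mmToPt(bleedMm);
  return {
    mediaBox: [0, 0, w + 2 * b, h + 2 * b],
    bleedBox: [0, 0, w + 2 * b, h + 2 * b],
    trimBox: [b, b, b + w, b + h], // the finished page, inset by the bleed
  };
}
```

For example, `pageBoxes(210, 297, 3)` describes an A4 page with 3 mm bleed: the TrimBox is still about 595.28 pt wide, while the MediaBox grows to about 612.28 pt.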
For those who don't know, if you use Chromium's print-to-pdf feature you get a tagged PDF. And it's scriptable from the command-line too.
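A minimal TypeScript sketch of scripting that from Node: a helper that builds the argument list for a Chrome/Chromium binary whose path you supply. The flags are the ones in the command quoted earlier in this thread (`--headless`, `--disable-gpu`, `--print-to-pdf=`, `--print-to-pdf-no-header`); flag names have changed across Chrome versions (newer builds use `--no-pdf-header-footer`), so check them against your installed version. `ChromePrintOptions` and `chromePrintArgs` are hypothetical names.

```typescript
interface ChromePrintOptions {
  input: string; // URL or local file to print (e.g. an .html or .mhtml file)
  output: string; // destination PDF path
  legacyHeaderFlag?: boolean; // use the older --print-to-pdf-no-header spelling
}

// Build the command-line arguments for headless print-to-PDF.
function chromePrintArgs(opts: ChromePrintOptions): string[] {
  const args = ["--headless", "--disable-gpu", `--print-to-pdf=${opts.output}`];
  // Suppress the date/title/URL header and footer Chrome adds by default.
  args.push(opts.legacyHeaderFlag ? "--print-to-pdf-no-header" : "--no-pdf-header-footer");
  args.push(opts.input); // the page to print, as a positional argument
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile(chromePath, args, callback)`, which avoids shell-quoting problems with paths that contain spaces.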
Most HTML-to-PDF tools are deeply insecure, and I am more than happy to pay someone else to deal with isolation and security. Report generators are often exploited to leak cloud secrets via the instance metadata API.
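To make the metadata-API risk concrete: if the renderer fetches every `<img>`, `<link>` or `<iframe>` URL in user-supplied HTML, a template can point one at the cloud instance-metadata endpoint (169.254.169.254 on AWS, GCP and Azure), and the response may end up embedded in the PDF. One mitigation is to vet resource URLs before the renderer fetches them. A minimal TypeScript sketch that checks only literal IPv4 hosts and a few well-known names; it assumes the hosting layer separately handles DNS names that resolve to private ranges and redirects, which this code does not.

```typescript
// Parse a dotted-quad IPv4 literal into four octets, or return null.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// 0.0.0.0/8, loopback, RFC 1918 private ranges, and link-local 169.254.0.0/16
// (which contains the metadata endpoint).
function isPrivateIPv4([a, b]: number[]): boolean {
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

const BLOCKED_HOSTNAMES = new Set(["localhost", "metadata.google.internal"]);

// Returns true if the renderer should refuse to fetch this resource URL.
function isBlockedResource(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl); // WHATWG parsing normalizes forms like http://2852039166/
  } catch {
    return true; // unparseable: refuse
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return true; // e.g. file:
  const host = url.hostname.toLowerCase();
  if (BLOCKED_HOSTNAMES.has(host)) return true;
  if (host.startsWith("[")) return true; // IPv6 literal: blocked outright in this sketch
  const ip = parseIPv4(host);
  return ip !== null && isPrivateIPv4(ip);
}
```

A stronger setup runs the renderer in a sandbox with no network route to link-local addresses at all; an allowlist of asset hosts is also easier to reason about than a denylist.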
To give you an idea, this is the kind of PDF files we generate that way: https://assets.walterliving.com/documents/walter-charlotte-d...
How long does it take to render using your API? :)
This doesn't necessarily mean success, but it does mean the game is not over in the documents field :)
P.S. As someone with very minimal PDF needs, personally and at work, I'd say the beautiful templates are what caught my attention the most.
Personally, though, I wish tools like ConTeXt were more popular and approachable. To my humble knowledge, its Lua backend has huge potential; I do my invoices with ConTeXt/Lua.
We like LaTeX, but laying things out can be difficult even for advanced users. Given that documents are a frontend, we wanted to bring over the same tools frontend developers already use.
Where things differ is that we don't actually use a browser under the hood. This gives us much better control over typesetting and layout, and you can run it on the server. We also have more control over the output PDF and can use more advanced features, such as form fields or embedding other files and metadata in the PDF.
One of the features I wish I had with htmldocs was the ability to automatically store generated documents in my own S3. I'd rather not introduce another cloud to my data stack just to host PDFs.
In the end, the decisive factor was support for the PrintCSS and Paged Media specifications, which have been completely discarded by major vendors and are only implemented by specific engines.
Second reaction: the pricing is way over the top and the model is unusual. In your own pitch you talk about the volume of documents created every day. How does that square with per-document pricing?
Most of the time when I'm writing HTML, I want a set of default styles for the most common elements. It's tedious and error-prone to have to specify a class every single time.
But indeed, calling an API means sending document contents to Onedoc one way or another. We aim to provide a self-hosted solution in the future to solve this issue.
Am I understanding the docs correctly that you don't have a local library available (i.e., the SDKs just call the API)? Would you mind explaining why you chose a remote API?
This may come at a later stage, once we have built our own rendering engine, though.
I think SOC2 is a must to start engaging with companies. Most PDFs contain sensitive data, and not many companies will feel comfortable sending customer data to a third-party platform, so you need security measures and certifications.
Good luck!
It seems that these conversion engines are massive pieces of work that require a lot of upkeep, partly because CSS is a living spec but also because of the sheer number of edge cases.
We are already working on SOC2 as this has been a recurring ask, and indeed documents almost always contain PII.
It seems that a better format should exist, but the fact that PDF is the de facto standard for portable documents makes it unlikely that things will change overnight.
Thanks!
We also hope to keep the focus on PDF generation rather than expanding horizontally to offer every imaginable PDF tool, at the risk of none of them being really good.
Back in the day, I used XSL-FO [0] and it was okay. It was not very precise, but it rarely if ever broke, and it was perfectly integrated with an XML/XSLT solution. Yeah, this was a long time ago.
Last month I used html-to-pdfmake [1] and it's also not very precise and more fragile, but very efficient and fast.
Yet another approach would be to programmatically generate .rtf files (for example) and use Pandoc [2] to produce PDFs (I have not tried this in production, but I don't see why it wouldn't work).
[0] https://en.wikipedia.org/wiki/XSL_Formatting_Objects
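A minimal sketch of the programmatic-RTF route in TypeScript: build a small RTF document as a string, escaping the characters RTF treats specially and encoding non-ASCII text as `\uN?` escapes. The `escapeRtf` and `toRtf` helpers are hypothetical, and the Pandoc step is left as the commenter describes it (check that your Pandoc version can read RTF input).

```typescript
// Escape text for an RTF body: backslash and braces are control characters,
// newlines become paragraph breaks, and anything outside 7-bit ASCII becomes
// a \uN? escape (N is a signed 16-bit decimal; "?" is the fallback character
// for readers without Unicode support). Works per UTF-16 code unit, so a
// character outside the BMP becomes two escapes (its surrogate pair).
function escapeRtf(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i);
    if (ch === "\\" || ch === "{" || ch === "}") out += "\\" + ch;
    else if (ch === "\n") out += "\\par\n";
    else if (code > 127) out += `\\u${code > 32767 ? code - 65536 : code}?`;
    else out += ch;
  }
  return out;
}

// Wrap a bold title and plain paragraphs in a minimal RTF document that
// declares a single font. \fs sizes are in half-points (\fs32 = 16 pt).
function toRtf(title: string, paragraphs: string[]): string {
  const body = paragraphs.map((p) => `${escapeRtf(p)}\\par\n`).join("");
  return (
    "{\\rtf1\\ansi\\deff0\n" +
    "{\\fonttbl{\\f0 Times New Roman;}}\n" +
    `\\f0\\fs32\\b ${escapeRtf(title)}\\b0\\fs24\\par\n` +
    body +
    "}"
  );
}
```

The resulting string could be written to a file and converted with something like `pandoc invoice.rtf -o invoice.pdf` (hypothetical file names), assuming your Pandoc build reads RTF and has a PDF engine such as a LaTeX distribution installed.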
vim filename.txt -c "hardcopy > filename.ps | q" && ps2pdf filename.ps # convert PS to PDF