Launch HN: Onedoc (YC W24) – A better way to create PDFs

(github.com)

293 points · AugusteLef · 2y ago · 185 comments

Hey HN, we’re the co-founders of Onedoc (https://www.onedoclabs.com/ ), and the original contributors to the open-source library react-print-pdf (https://github.com/OnedocLabs/react-print-pdf ) which lets developers design and generate PDF documents automatically. Here’s a demo video: https://www.youtube.com/watch?v=MgfCyOyckQU&t=3s

Billions of PDFs are generated daily: invoices, contracts, receipts, reports, you name it. Developer time gets wasted producing these basic documents because there are no good-enough tools to design and generate PDFs.

We previously worked at giant firms, where documents (especially PDFs) were central to most workflows. We got asked to generate automated trade confirmations for our customer’s counterparties. We could not find any tool other than outdated libraries offering poor control over layout and the generation process. In the end, we just created our own—basically bringing web technologies to PDFs. That was the genesis of Onedoc.

PDF creation has two phases: design (specifying content and layout) and generation (producing the actual PDF file). Onedoc lets you do both simply and automatically.

Design: we have an open-source library called "react-print-pdf" (https://github.com/OnedocLabs/react-print-pdf ) that allows you to design a document the same way you would design a website. It supports Tailwind CSS components and Chakra UI components, and we recently also built LaTeX and Markdown components. The latter let you write text in Markdown style, and include formulas using LaTeX syntax, directly within a React component.

Generation: we have an API (https://docs.onedoclabs.com/api-reference/introduction ) and Node.js SDK (https://docs.onedoclabs.com/quickstart/nodejs ) that render your designs into PDFs.

The choice of renderer significantly affects the accuracy of the resulting PDF. For example, exporting a webpage into PDF will often result in a layout that differs from the original webpage. We ensure that what you designed is what you get, and therefore you have 100% control over the entire layout of your document including margin, style, etc. We can do that because we built the react-print-pdf library to match the HTML/CSS to PDF rendering tool we have.

Once you have generated your document, you can either store it on your local system or, if you want, use our platform (https://app.onedoclabs.com/ ) to host your document online. If you use us, you’ll also get analytics over your documents.

Our main product is an API, but you can try it on our website directly (https://www.onedoclabs.com/) using our playground without any installation or sign-up. Our pricing is usage-based: per document generated. The pricing is degressive: the more documents you generate, the less you pay per document. If you don’t want to pay for PDF generation, you can still generate as many documents as you want, but with a watermark on the margin.

It’s been fun to see what our users are building with our open-source library (components, templates, etc.) and our API. We have a website (https://react-print.onedoclabs.com/) dedicated to the open-source library where we post the templates submitted by the community. Some early power users built simple web apps (CV/Resume generator, NDA and Invoice generator). We are excited to show our product to the HN community and look forward to your feedback!

150 comments · 48 top-level

ak217 · 2y ago · 8 in thread

FYI: the open source state of the art in this area is Playwright (the successor to Puppeteer) with Paged.js (https://pagedjs.org/). I highly recommend that everyone check out and donate to paged.js, it's a fantastic project with lots to like. It certainly blows commercial alternatives like Prince XML out of the water.

That forms a solid foundation that I find it hard to imagine paying for. The things where you might still command a premium are basically safety mechanisms/CI checks/library components that ensure the PDF renders correctly in the presence of variable-length content, etc. as well as maybe PDF-specific features like metadata and fillable forms. Naive ways to format headers, footers, tables/grids/flexboxes etc. often fail in PDFs because of unexpected layout complications. So having a methodology, process, and validation system for ensuring that a mission critical piece of information appears on a PDF in the presence of these constraints could be attractive.

caesil · 2y ago

I think https://github.com/diegomura/react-pdf is closer to what this company is doing.

In fact their open source library, https://github.com/OnedocLabs/react-print-pdf, seems like a higher-level library that sits above react-pdf. Reminds me a lot of the set of react-pdf based components I built for a corporate job where letting users create PDFs was a huge part of the value proposition.

They're solving a really cool problem, actually, because building out into certain difficult use cases like SVG support was a huge pain.

AugusteLef (OP) · 2y ago

Exactly. We aim to offer a solution for building complex PDF designs, which means having 100% control over the layout (margin, header, footer), the style, and the content. That's why we integrated Tailwind, Chakra UI, Markdown, and LaTeX, and also wanted to support SVG, etc.

Titou325 · 2y ago

We are currently experimenting with this approach. A good thing about paged.js is that we would be able to provide hot-reload and live preview of files without actually converting to PDF.

Your second point is very interesting, seems like some kind of .assert('text').isVisible() API. We may want to dig into that further!
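
A check of that kind could look roughly like the sketch below. It assumes the text of each rendered page has already been extracted by some other tool, and `PageText`, `assertText`, and `isVisible` are hypothetical names, not part of Onedoc, Paged.js, or Playwright.

```typescript
// Hypothetical sketch: verify that critical strings survive rendering.
// Assumes page text was already extracted from the PDF by another tool.
interface PageText {
  page: number; // 1-based page number
  text: string; // plain text extracted from that page
}

interface TextAssertion {
  isVisible(): number[]; // pages containing the text; throws if there are none
}

function assertText(pages: PageText[], needle: string): TextAssertion {
  return {
    isVisible(): number[] {
      const hits = pages
        .filter((p) => p.text.includes(needle))
        .map((p) => p.page);
      if (hits.length === 0) {
        throw new Error(`Expected "${needle}" to appear in the rendered PDF`);
      }
      return hits;
    },
  };
}

// Usage with made-up extracted text.
const extracted: PageText[] = [
  { page: 1, text: "Invoice #1042 for ACME Corp" },
  { page: 2, text: "Total due: $1,250.00" },
];
const pagesWithTotal = assertText(extracted, "Total due").isVisible();
console.log(pagesWithTotal);
```

A text-presence check like this only catches content that was dropped entirely; clipped or overlapping content would still need a visual comparison.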

rudasn · 2y ago

Or maybe some visual diffing based on expected output, based on the template/layout/theme used, since you'd want to perform this check on every PDF generated in prod (that has real, sensitive data), not just in CI or testing mode, if you're aiming for critical docs.

Cool project btw, congrats for the launch!

timvdalen · 2y ago

(How) does it handle CMYK and print PDFs? I see images of printed books created by Paged.js, were these post-processed, or printed using a printer that does a best-effort RGB conversion?

ak217 · 2y ago

I'm not sure - we don't do color correction on our PDFs because we don't have photos in them and color rendering is not mission critical - but paged.js is focused on the concern of layout for print media. I would imagine color rendering can be solved orthogonally to what paged.js does for you, as long as you specify the color data in CSS. I'm pretty sure paged.js will pass it through without messing with it, so you're good if the browser that Playwright/puppeteer is driving supports the correct color profile when emitting the PDF. I honestly don't know if browsers have sufficient support for that when emitting a PDF, though.

Overall you're right that color correction is another area where you could probably command a premium.

Mick-Jogger · 2y ago

Isn't Playwright a testing framework? I am not sure how this solves the use case that Onedoc is aiming for. I would be highly interested in some more background as we are evaluating alternative solutions to princeXML right now.

ak217 · 2y ago

Playwright at its core is a headless browser driver. In this case, we are using it to tell the browser to generate a PDF.

Brajeshwar · 2y ago · 8 in thread

Maybe this is just me, but this looks extremely costly to me! It will cost $2,500 to generate 50,000 PDFs. Are edits/corrections an additional cost?

jot · 2y ago

It sounds like this is as advanced as DocRaptor[1]. They have what I consider to be the best PDF generation API, giving complete control over the documents you need to create. The pricing is similar.

If you'd rather do it for free, weasyprint[2] is the best open source alternative.

Another more affordable option you might want to consider is Urlbox[3]. (Disclosure: I work on this)

Urlbox's rendering engine is based on Chrome. It's been refined over the last 11 years to render pages as images or PDFs[4] that look great. I was a customer for 5 years before I joined the team. Everything we'd tried before Urlbox was a disappointment.

Urlbox probably can't match the power of either Onedoc or DocRaptor, but pricing starts at less than $0.01 per document and drops significantly with scale. If your PDF looks great when saving as PDF in Chrome it should look identically brilliant with Urlbox.

[1]: https://docraptor.com [2]: https://weasyprint.org [3]: https://urlbox.com [4]: https://urlbox.com/html-to-pdf

Titou325 · 2y ago

This is a good point, and we are still trying to figure out how to price things fairly. Depending on the type of PDF, whether it is a simple receipt or a large multi-page report, the associated costs are very different on our side. At this time, we rely on other proprietary software that we are aiming to replace but that incurs high costs on our side as well.

Editing or correcting generated PDFs is not supported, as the PDFs are signed as-is; however, you can attach the metadata to the PDF and re-render with the modifications.

mediaman · 2y ago

As a point of reference on pricing, convertAPI charges $0.05 per document conversion at their most expensive tier, and with any level of fixed commitment ($80 - $300 per month) it goes down to $0.016-0.006 per document.

Their PDF conversion is pretty good (I use it for PPT/Word -> PDF conversion), though your product is obviously different and has different/better capabilities for programmatic PDF creation. Still, a reference point.

Pricing page: https://www.convertapi.com/prices

passion__desire · 2y ago

Edits would be limited to certain pages but may spill over (e.g. tables) so the whole PDF need not be generated. Only edited pages can be inserted back to previously generated PDFs. Could be an optimization to reduce cost.

snadal · 2y ago

I second this. Maybe I'm missing something in the value proposition, but we already generate PDFs from .docx/.html templates using open source libraries and Docker microservices.

Do not misunderstand. A Stripe for generating PDFs can be great, but for a small team, $0.50/PDF is way more than I can afford (after all, you can create a small number of PDFs without too much fuss). Maybe you are oriented towards large companies?

AugusteLef (OP) · 2y ago

Indeed, and as you mentioned, open-source libraries are always an option. It's worth noting that our open-source library assists in document design, allowing freedom in renderer choice. While the open-source library is aimed at individuals, our API targets businesses of any size. Our pricing can be as low as $0.05 per PDF for high-volume or annual commitments. Additionally, we offer cloud hosting for your documents for up to 90 days, and our pricing includes analytics.

pdabbadabba · 2y ago

> $0.50/PDF is way more than I can afford

But isn't that 100x what they're actually charging--at least for an enterprise account? Their pricing page says "from $0.005/doc." (Though I'm not sure how much work "from" is doing there.) Pro tier is, admittedly, more like $0.12 per document (assuming you use your full quota). But still much less than $0.50.

I'm generally very confused by the various assertions in this thread about their pricing. What am I missing?
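
To make the comparison concrete, here is a small sketch of the arithmetic behind the figures quoted in this thread. The numbers are the commenters' claims, not official pricing, and `pricePerDocument` is a hypothetical helper.

```typescript
// Effective per-document price from a total bill and a document count.
function pricePerDocument(totalUsd: number, documents: number): number {
  if (documents <= 0) throw new RangeError("documents must be positive");
  return totalUsd / documents;
}

// Figures quoted by commenters in this thread (USD), not official pricing.
const quoted = {
  fiftyThousandDocs: pricePerDocument(2500, 50000), // "$2,500 to generate 50,000 PDFs"
  smallTeamClaim: 0.5, // "$0.50/PDF"
  enterpriseFrom: 0.005, // "from $0.005/doc"
};

// Per-document price implied by the $2,500 / 50,000 figure.
console.log(quoted.fiftyThousandDocs.toFixed(3));
// How many times larger the $0.50 claim is than the "from" price.
console.log(Math.round(quoted.smallTeamClaim / quoted.enterpriseFrom));
```

The three figures differ by one and two orders of magnitude, which is consistent with commenters quoting different tiers.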

adnans · 2y ago

We use https://www.api2pdf.com/pricing/ and it's priced per bandwidth and usage ($0.001 per MB of bandwidth and $0.00019551 per second of computation).

You can choose which API to use: Headless Chrome, Wkhtmltopdf, Libreoffice, etc.
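
As a rough illustration of how that usage-based formula works out, here is a minimal sketch. The rates are the ones quoted in this comment, and the workload (a 2 MB PDF taking 3 seconds to render) is a made-up assumption; `estimateCostUsd` is a hypothetical helper.

```typescript
// Rates as quoted in the comment above (USD); check the pricing page for current values.
const RATE_PER_MB = 0.001;
const RATE_PER_SECOND = 0.00019551;

// Cost of one conversion: bandwidth charge plus compute-time charge.
function estimateCostUsd(megabytes: number, computeSeconds: number): number {
  return megabytes * RATE_PER_MB + computeSeconds * RATE_PER_SECOND;
}

// Hypothetical workload: a 2 MB PDF that takes 3 seconds to render.
const perDoc = estimateCostUsd(2, 3);
console.log(`per document: $${perDoc.toFixed(6)}`);
console.log(`per 50,000 documents: $${(perDoc * 50000).toFixed(2)}`);
```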

egnehots · 2y ago · 7 in thread

The main issue is conflating templating and pdf generation.

Using HTML-to-PDF solutions lets you do the templating in HTML, where it is pretty much a solved problem.

And as many have said, headless Chrome is a robust HTML-to-PDF solution, even though it feels like a hack.

But, yeah, there seems to be a lack of awareness about these options within corporations. So, kudos to you for addressing a genuine problem!

pedro120 · 2y ago

Indeed, we aim at bundling this in a way that makes it easy and obvious for enterprises to build their PDFs that way.

yencabulator · 2y ago

Typst is a typesetting language that makes programmatic layout and processing JSON input pretty darn simple. I make invoices by having a Typst template read line items from a JSON file.

https://github.com/typst/typst

adfaure · 2y ago

Just spent my Sunday creating my invoice template in typst as well. I enjoyed it, and I could do what I wanted quickly!

plopz · 2y ago

The problem with Chrome is the performance: it is very slow and uses a bunch of memory. There was a neat post here a while ago about generating PDFs faster: https://news.ycombinator.com/item?id=39379690

AugusteLef (OP) · 2y ago

Indeed, speed is an issue (and it's hard to tackle). Additionally, when using Chrome, what you see is not always what you get. The layout often doesn't match expectations, especially with complex elements. It's ok for simple use cases, but for professional and scalable solutions, you usually need to switch to something else!!

gzapp · 2y ago

There are also a few good options in a lot of languages for streamlining chromium use.

In C# I'd look to use the Playwright library or perhaps even embed Chromium via CefSharp if I were trying to avoid extra processes.

AugusteLef (OP) · 2y ago

It seems there isn't a solution that satisfies everyone so far, indeed. With concerns about languages supported, functionalities, security, etc., there's certainly a lot of room for improvement in this space to offer a better solution!

dazh · 2y ago · 7 in thread

Glad to see people building in the PDF space, which as a format is unfortunately both awful and ubiquitous. Are you planning to build any support for programmatically filling out existing PDF forms? That's a huge pain point our product is facing that doesn't seem easy to solve.

wonger_ · 2y ago

I'm facing that same pain point of programmatic PDF filling. I noodled around in the PDF format and learned it's a bit difficult to deal with fonts and formatting. But I think this client-side library works well enough, as a start: https://pdf-lib.js.org/#fill-form

I've also heard of one paid API that I forgot but seemed to work well, and this related service https://www.jotform.com/, and I also considered porting some server-side libraries to WASM. One day I'll collect all the libraries and findings in a blog post.

Are you looking to programmatically fill any PDF form by detecting the fields? Or are you filling one known PDF template?

kodt · 2y ago

Years ago I needed to programmatically fill PDFs and used this library to achieve it. Funny it has the same name as what you linked: https://www.pdflib.com/

It is a paid commercial product however.

steveneo · 2y ago

For programmatic filling, check out https://platoforms.com. It even provides an API playground, https://www.platoforms.com/docs/advanced/api-playground/, which makes it super easy to test the API on your actual PDF form.

nip · 2y ago

For programmatic filling of PDFs, have a look at DocSpring: https://docspring.com

pedro120 · 2y ago

Yes, our focus is on programmatic interactions with PDFs, form filling is on our roadmap, alongside programmatic digital signature and many more.

dazh · 2y ago

Amazing, is there anywhere I can follow along to find out when form filling will be available?

azmodeus · 2y ago

What are you looking for in programmatic pdf filling?

ramon156 · 2y ago · 7 in thread

Can we not have an alternative to PDFs? I get that they're more standardized, but why would everyone let Adobe have the hammer for a file type that's so important?

Titou325 · 2y ago

We quite agree on this - but getting a new alternative out will require a significant critical mass before it can be of any interest. While PDF has its challenges, it remains a light portable format and its security features make it a good fit for binding documents. The ecosystem, although it is dominated by Adobe, also includes other major players and existing integrations.

The way we look at it is that PDFs allow embedding of other files and metadata. It is easy to provide a platform where we can enrich PDFs to display different contents than the one in the PDF itself. If this gets interesting enough, we can then phase out the PDF in the first place. But this is a long way ahead.

breadwinner · 2y ago

PDF is an open format in the sense that you don't need to pay Adobe a license fee for generating PDFs, or for reading and rendering PDFs. The format is fully documented, although the specification is controlled by Adobe.

nradov · 2y ago

For supply chain workflows the ASC X12 Electronic Data Interchange (EDI) industry standard works much better than PDFs. Unfortunately, despite being around for decades it has only been adopted by forward-thinking organizations such as Walmart. Most smaller companies and their vendors still haven't implemented EDI.

https://developer.walmart.com/home/us-edi/

calvinmorrison · 2y ago

Insanity.

EDI is the only place where people are regularly still paying for message by the kilobyte, where unsecured FTP over the open internet is still a norm, and where entire cottage industries exist to support AVOIDING using EDI.

Source: I work in EDI. it's a pain in the rump.

Also, EDI is really only good for things like PO's, shipping notices, invoices, sales orders, etc.

rapatel0 · 2y ago

PDF is an incredibly (stupidly) extensible format. There are tons of government forms that (sadly) bake in complex workflows into PDF forms.

Given that the whole world has been running on PDFs for decades, it makes more sense to leverage the existing infrastructure and move it towards something more functional over time. Introducing a new format will just lead to another format that achieves 0.5% market share and then is abandoned after a few years. Microsoft basically forcing people to use XPS in Windows (>70% market share of computing) still wasn't able to achieve meaningful usage or change.

I expect that PDFs will not go away for 20 years at least, but who knows

nvr219 · 2y ago

Yeah let's give XPS another go.

devsda · 2y ago

Giving credit where it's due, I can appreciate Microsoft for introducing XPS as an alternative to pdf.

There was a time, when not every software had "export to pdf". So, having a "print to pdf" meant installing (often pirated) Adobe Acrobat or installing a sketchy free(ware) printdriver software downloaded from sourceforge.

MS adding xps print driver to windows enabled sharing docs consistently (within windows ecosystem) without resorting to hacks.

I don't know why it didn't catch on. Maybe it was the general mistrust of anything MS, maybe it arrived too late, or maybe it was something else.

Leoko · 2y ago · 5 in thread

I had to deal a lot with PDF generation over the past few years and I was very unhappy with the eco-system that was available:

1. HTML-to-PDF: The web has a great layout system that works well for dynamic content. So using that seems like a good idea. BUT it is not very efficient as a lot of these libraries simply spin up a headless browser or deal with virtual doms.

2. PDF Libraries (like jsPDF): They mostly just have methods like ".text(x, y, string)", which is an absolute pain to work with when building dynamic content or creating complex layouts.

This was such a pain point in various projects I worked on that I built my own library that has a component system to build dynamic layouts (like tables over multiple pages) and then computes that down to simple jsPDF commands. Giving you the best of both worlds.
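
To illustrate the general idea of computing a flowing layout down to absolutely positioned text commands, here is a minimal sketch. It is not how painless-pdf is implemented; `layoutLines`, `PageSpec`, and `TextCommand` are hypothetical names, and the page dimensions are made up to keep the trace short.

```typescript
// Sketch: turn a list of lines into absolutely positioned draw commands,
// starting a new page when the next line would cross the bottom margin.
interface TextCommand {
  page: number; // 1-based page index
  x: number;
  y: number; // top of the line, measured from the top of the page
  text: string;
}

interface PageSpec {
  height: number; // page height in points
  margin: number; // top, bottom, and left margin in points
  lineHeight: number; // vertical advance per line in points
}

function layoutLines(lines: string[], spec: PageSpec): TextCommand[] {
  const commands: TextCommand[] = [];
  let page = 1;
  let y = spec.margin;
  for (const text of lines) {
    // Break to a new page if this line would cross the bottom margin.
    if (y + spec.lineHeight > spec.height - spec.margin) {
      page += 1;
      y = spec.margin;
    }
    commands.push({ page, x: spec.margin, y, text });
    y += spec.lineHeight;
  }
  return commands;
}

// Five rows on a tiny page that fits three lines between its margins.
const rows = Array.from({ length: 5 }, (_, i) => `Row ${i + 1}`);
const cmds = layoutLines(rows, { height: 100, margin: 20, lineHeight: 20 });
for (const c of cmds) {
  console.log(`page ${c.page}: text(${c.x}, ${c.y}, "${c.text}")`);
}
```

Each emitted command maps onto one low-level text call; the page-break bookkeeping is the part that gets painful to write by hand for tables and nested components.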

Hope this makes somebody's life a bit easier: https://github.com/DevLeoko/painless-pdf

chrisfinazzo · 2y ago

Is there a reason you didn't consider something like Weasyprint?

https://weasyprint.org

Going all the way down to raw HTML is a bit verbose, but with almost anything I've thrown at it - CV's, business cards, you name it - it hasn't let me down yet.

epgui · 2y ago

I just considered weasyprint and couldn't figure out where to put my credit card or where to go to get started or to see some docs, so that was a very short-lived consideration.

Crowberry · 2y ago

I'm with you..

We ended up writing a similar wrapper around the https://github.com/jung-kurt/gofpdf library. We haven't open sourced it yet. But it's made it a lot easier to deal with rendering a PDF, especially over page breaks etc.

aforwardslash · 2y ago

A while ago I created a pdf report generation engine for Python, supporting Jinja2 template syntax, and server and client-side generation of content. Page formatting is handled by https://pagedjs.org/, and PDF generation is performed via a separate api daemon based on chrome-headless: https://zipreport.github.io/zipreport/ It is not fast, but it works quite well.

Leoko · 2y ago

Yes, page breaks are probably the most significant difference between the layout of a web page and a PDF document, and thereby a major drawback when using HTML-to-PDF. There is little to no tooling for this in the web.

If you want granular control over how your PDF will look with content that is more than one page long, you will have a hard time using html.

somberi · 2y ago · 4 in thread

Useful service and a large problem space. Congrats and all the best. As someone who is a target customer, my 2 cents:

a. If this is a strategic value for my pipeline (and it is), we are going to code it ourselves, only because we can host it inside our fences. Critical customer data and hence.

b. The pricing is way off and is not reflective of the cost or value (for us). Even if it was 1/10th of the prices you charge, it will still be a no-go. At the volumes we have, it makes sense to build this ourselves.

c. SOC2 / ISO27001 - You might want to obtain them asap if you are looking to sell to outsourcing companies or FSG.

AugusteLef (OP) · 2y ago

certifications (SOC2 / ISO27001) and offer an on-premise solution! I see there's already a discussion about pricing, so I'll leave that be. However, would an unlimited volume at a fixed cost (and self-host) be an attractive solution? It could be interesting for very high volumes.

somberi · 2y ago

I can tell you that the world I operate in will want something like what you are proposing (fixed rate + OnPrem) and the pricing is going to have a ceiling because building this in-house is a real and viable alternative. Our problem is not so much lack of talent but other product-roadmap priorities. What is the ceiling? I do not know, but can hazard a guess. 1/4th of the yearly cost of a good developer.

HatchedLake721 · 2y ago

Curious, with ~$0.005 per document, what volumes do you do that pricing becomes a no-go for you?

somberi · 2y ago

In the long term, ~$0.005 per page (as opposed to document, which I assume hatchedlake meant) say on a mortgage document (~300 pages per) it adds up. The other alternative, which is to build this in-house (say 3 months and custom build, edge cases, such goodies), is more desirable (for us).

breadwinner · 2y ago · 4 in thread

How is this better than writing out an HTML file, then using headless chrome to export to PDF, like this:

    "C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --print-to-pdf=C:\temp\foo.pdf --no-margins --print-to-pdf-no-header C:\temp\test.mhtml

Titou325 · 2y ago

This brings its own set of challenges. Headers and footers are strictly limited in terms of features, you cannot add footnotes, the notion of page spreads is harder to implement. Then you need to combine that with having a Chrome instance at hand + exposing the needed assets for URL resolution. Definitely not difficult let alone impossible, but not the easiest way to get started :)

aforwardslash · 2y ago

Some/most of these problems can be solved by using pagedjs and something like https://github.com/zipreport/zipreport-server

breadwinner · 2y ago

The easier way costs $0.05 per page. Imagine sending an invoice to your customer and the invoice itself costs 5 cents per page! That's prohibitively expensive for many applications. I wouldn't consider any solution that costs more than 1 cent per page.

_puk · 2y ago

"you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem."

https://news.ycombinator.com/item?id=8863

rahhulk7 · 2y ago · 3 in thread

This looks interesting! Especially the Markdown and LaTeX components in react-print-pdf. Could be a great way to streamline technical documentation generation in codebases. Would love to see some examples of those in action.

AugusteLef (OP) · 2y ago

Indeed it could be a very interesting use case. While we are more "Selling Shovels" it could be interesting to explore this use case and maybe build a simple demo out of it!

And yes, as a big fan of LaTeX myself (I used to do all my research reports on Overleaf), we wanted to be able to integrate formulas, code and more into your document very simply. Glad you like it!

ska · 2y ago

FWIW I’ve had some good results for technical documentation in RST markup with Sphinx for generation. You can develop LaTeX header details for detailed templating for PDF output, etc. while keeping the HTML simpler if you want.

AugusteLef (OP) · 2y ago

Thanks for the tip, I will take a look at it asap

admissionsguy · 2y ago · 3 in thread

You cannot make this up, generating PDFs is now an enterprise product.

AugusteLef (OP) · 2y ago

Editors such as Overleaf, and those offered by MS and Adobe, have been around for a long time. Recently, companies like Pandadoc and Docusign have started offering services around PDFs (generation or other aspects of their lifecycle).

It might seem odd, given our long history with PDFs, but I believe there's still much to be done with these documents. They're everywhere—invoices, tickets, reports, etc.—yet the technology for generating and managing them hasn't evolved much in years. Our approach is to apply the same modern technologies used for web design to document design.

dmazzoni · 2y ago

What do you mean "now"? It has been for years. It's a huge business.

admissionsguy · 2y ago

When I was first hired 15 years ago my first task was to create a PDF report. It was easy back then in PHP+fPDF. Two years ago I was hired to work on a Heroku-hosted NodeJS app. I was surprised to find that generating a PDF turned out to be substantially more difficult task, requiring running a browser emulator or connecting to an external service. And now, seeing PDF generation as a premium pay-as-you-go product is just too much.

midenginedcoupe · 2y ago · 2 in thread

I've also spent much longer than I'd like on this same problem. Having a lightweight-enough service to convert html->pdf on the fly, with good fidelity, and that can create an accessible pdf seems to be impossible.

If you can nail accessible PDFs then you'd open up a very big government market.

AugusteLef (OP) · 2y ago

We felt the same, and that's precisely why we built this tool! The key, as you mentioned, is fidelity, especially for designing complex layouts. We hope to bring something new and valuable to the table. And yes, documents are central to many industries including government, legal, banking etc.

dmazzoni · 2y ago

Can you directly answer whether your tool generates tagged PDFs?

Of course, you can't guarantee that the resulting document is 100% compliant because you can't enforce that the input is valid, but are you at least outputting a complete tag tree with as much semantics as possible given the input?

matteason · 2y ago · 2 in thread

Really interesting product. I do agree that the pricing seems steep ($0.25/document on Pro on the most generous tier) but I don't know enough about pricing B2B products to know if that would be a blocker.

I agree that HTML -> PDF can be a really powerful tool. I worked on the UK government's tool to generate energy efficiency labels for consumer goods [0] and we ended up doing PDF generation with SVG templates, using Open HTML to PDF [1] for the conversion. That ended up working very well, though as you allude to there can be some gotchas (e.g. unsupported CSS features) that you need to work around.

A few questions:

- Do the rendered documents support PDF's various accessibility features?

- How suitable is this for print PDF generation? For example, what version of the PDF spec do you target? What's your colour profile support like? Do you support the different PDF page boxes (MediaBox, CropBox, BleedBox, TrimBox, ArtBox)?

[0] https://github.com/UKGovernmentBEIS/energy-label-service

[1] https://github.com/danfickle/openhtmltopdf

Titou325 · 2y ago

The pricing does go down for larger volumes and is something we still have to narrow down to the exact place that makes sense to companies and is also viable.

- We do not force PDF/* profiles down to the user, but it seems that for most of them PDF/UA-1 would be a sensible default. We can extract most of the tags from the HTML semantics by themselves which makes it much easier.

- We target the PDF 1.7 spec. Color profiles can be changed and you can use a custom .icc profile, with the corresponding embedding restrictions based on the document format. MediaBox is supported through the @page size property. Bleed, trim and marks can be added using vendor specific css properties. We don't support ArtBox yet but this is something we can look into! So far none of our customers really wanted to take this out to a real print shop, but we would be glad to help people go down this route :)

dmazzoni · 2y ago

So are you saying that you don't output tagged PDFs now?

For those who don't know, if you use Chromium's print-to-pdf feature you get a tagged PDF. And it's scriptable from the command-line too.

Sytten · 2y ago · 2 in thread

Definitely a problem I experienced. Big fan of browserless.io. Though I didn't see any comment on the biggest problem in this space: SSRF.

Most HTML-to-PDF are deeply insecure and I am more than happy to pay someone else to deal with isolation and security. Report generators are often used to leak cloud secrets via the metadata API.
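
As a minimal sketch of one mitigation, a renderer could refuse to fetch asset URLs that point at loopback, private, or link-local addresses, which includes the 169.254.169.254 endpoint commonly used for cloud instance metadata. The function names are hypothetical, and this only inspects the URL string: it does not resolve DNS or follow redirects, so on its own it is not a complete SSRF defence.

```typescript
// Sketch: reject asset URLs whose host is a loopback, private, or
// link-local IPv4 literal, or an obviously local hostname.
function isPrivateIpv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, incl. cloud metadata endpoint
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isAllowedAssetUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only plain web schemes; this rejects file:, data:, ftp:, and so on.
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literal: refused in this sketch
  return !isPrivateIpv4(host);
}

console.log(isAllowedAssetUrl("https://example.com/logo.png"));
console.log(isAllowedAssetUrl("http://169.254.169.254/latest/meta-data/"));
```

A production setup would typically combine a check like this with network-level isolation of the rendering process, since a public hostname can still resolve to an internal address.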

AugusteLef (OP) · 2y ago

True. Security is a significant concern, and in our discussions with businesses, we realised that most of them do not want any kind of data leaving their own systems. This is especially true in the biotech/healthcare industry, but also in legal and banking. That's why we're considering an on-premises solution for the future (as we're focusing on B2B). However, I assume most people were talking about personal use cases or non-sensitive documents, hence the fact that no one mentioned SSRF (yet ;)).

throw03172019 · 2y ago

Also big fan of browserless. Do you run it yourself? We run the browserless docker containers on prem.

marceldegraaf · 2y ago · 2 in thread

We're using Gotenberg[1] to convert a rendered web page (with Elixir/Phoenix, in our case) to PDF. Works like a charm and we can use our existing frontend code/styling (including SVG graph generators) which is a huge bonus.

1: https://gotenberg.dev/

Titou325 · 2y ago

We actually experimented with Gotenberg! Ultimately it is a layer on top of Chromium for conversion and we were dissatisfied with the results. I am curious as to how you are handling assets and other static media / attachments: do you embed everything in a single HTML file or do you use some kind of bucketing system to resolve URLs?

marceldegraaf2y ago

Great question! We actually just use the static assets (stylesheets, images) from our public asset CDN. The generated HTML points to the latest version of those assets, which means we can always use all the latest styling/assets in our generated PDF files.

To give you an idea, this is the kind of PDF files we generate that way: https://assets.walterliving.com/documents/walter-charlotte-d...

ffpip2y ago· 2 in thread

Love the demo on the homepage with the render button. Really helps explain the product!

AugusteLefOP2y ago

Thanks! We try to make our product as accessible as possible for anyone to use (or at least to test). It's good to hear that our efforts have been worthwhile!

Nathanba2y ago

The text says "Instantly generate dynamic documents based on real-time data." but when I changed the react code to give the QR code a red color and clicked render it took far longer, maybe 5-6sec to render again.

staffors2y ago· 2 in thread

I see that you support page breaks and headers and footers and stuff which is very cool. Is there some form of widow/orphan control when text wraps from one page to the next? How do you handle things like a large table that is longer than the length of a page?

staffors2y ago

Also, do you support different paper sizes (A4 and Letter)?

Titou3252y ago

We support the size[1] property and the widows and orphans[2] spec for both your needs :)

[1]: https://developer.mozilla.org/en-US/docs/Web/CSS/@page/size [2]: https://developer.mozilla.org/en-US/docs/Web/CSS/orphans
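
As an illustration of the properties referenced above, here is a small sketch that wraps an HTML fragment in a document carrying a print stylesheet. The CSS properties are standard; the specific values, the `PRINT_CSS` constant, and the `with_print_styles` helper are assumptions for the example, not Onedoc defaults, and how faithfully they are honored depends on the renderer.

```python
# Illustrative print stylesheet; values are assumptions, not Onedoc defaults.
PRINT_CSS = """
@page { size: A4; margin: 20mm; }       /* use `size: letter` for US Letter */
p { widows: 2; orphans: 2; }            /* keep at least 2 lines together at a page break */
thead { display: table-header-group; }  /* intended to repeat the header when a table spans pages */
tr { break-inside: avoid; }             /* ask the renderer not to split a row across pages */
"""


def with_print_styles(body_html: str, css: str = PRINT_CSS) -> str:
    """Wrap an HTML fragment in a full document that embeds the print stylesheet."""
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<style>{css}</style></head><body>{body_html}</body></html>"
    )
```

The resulting string can be handed to whichever HTML-to-PDF converter you use; a long table then breaks between rows rather than through them, where the engine supports these properties.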

winter-day2y ago· 2 in thread

Congrats! My career has also revolved around PDF generation (once for federal compliance at large companies, and a second time for scrubbing data from PDFs for HIPAA compliance and then generating a new PDF based on the scrubbed data). I think I've seen your tool around. For the first, I ended up creating a workflow that generated LaTeX scripts and then converted them to PDFs, and for the second, a Python library. The most difficult aspect for our tools was formatting - the PDFs were generally 60-100 pages and tables could show up anywhere and break the page/formatting. Quite curious to see how your company will grow, good luck!

DutchHugo2y ago

Curious, which python library did you use to convert to PDFs? currently looking into a couple options myself

stormfather2y ago

weasyprint isn't terrible

cxr2y ago· 2 in thread

Why does your dev-local repo[1] README have a link that's described as being the Adobe PDF viewer extension for VS Code but actually links to an extension that uses pdf.js by a company called Mathematic[2]?

1. <https://github.com/OnedocLabs/dev-local>

2. <https://marketplace.visualstudio.com/items?itemName=mathemat...>

johnsonjo2y ago

Well, I'm not entirely sure why they did that. Adobe is the original creator of the PDF format [1], as given by this Wikipedia article on PDFs, which might mean they meant something more like a viewer for *Adobe's PDF format* rather than *Adobe's viewer* for PDFs.

[1]: https://en.wikipedia.org/wiki/PDF

AugusteLefOP2y ago

Sorry for the misunderstanding, we indeed "meant something more like a viewer for Adobe's PDF format rather than Adobe's viewer for PDFs.". We will make sure to change the wording. Thank you!

Crowberry2y ago· 2 in thread

This looks really interesting! One of the main reasons we've opted to write more complex rendering code is speed. We're getting around 500ms for a single document, which is (last I tested) quicker than any headless chrome setup.

How long does it take to render using your API? :)

pedro1202y ago

Rendering time scales with the length / complexity of the document. At the moment, our self-serve API renders slower than a headless chrome setup. We are working on speeding this up as it is currently in the order of seconds.

Crowberry2y ago

Alright, thanks!

baggy_trough2y ago· 2 in thread

This is definitely a somewhat painful process. I have done it with puppeteer / chromium on Debian, and it works very well after the headache of figuring it out. Having to pay 50 cents per PDF and deal with a 3rd party vendor would not provide value for our needs.

AugusteLefOP2y ago

We've updated our pricing, and it can go as low as $0.005 per document. True, you'll still need to work with a third-party vendor, but isn't it worth considering if the features are competitive and the interface is user-friendly? It would be interesting to know what might convince you to switch from Puppeteer to another solution—or if you're completely satisfied and wouldn't switch regardless of the offerings, which is perfectly fine.
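
For a rough sense of scale, per-document pricing is just volume times unit price. The sketch below compares the two price points mentioned in this thread ($0.50 and $0.005 per document); the 50,000 documents/month volume and the `monthly_cost` helper are hypothetical, chosen only for illustration.

```python
from decimal import Decimal


def monthly_cost(documents_per_month: int, price_per_document: str) -> Decimal:
    """Cost of a month of rendering at a flat per-document price (illustrative only)."""
    # Decimal avoids binary floating-point rounding on currency amounts.
    return Decimal(price_per_document) * documents_per_month


# Hypothetical volume of 50,000 documents per month:
old_price = monthly_cost(50_000, "0.50")   # 25000.00
new_price = monthly_cost(50_000, "0.005")  # 250.000
```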

baggy_trough2y ago

For me, it's probably not going to make sense since I already did the work. I think you should pitch it at people that don't want to bother to figure it out (could be several days of engineering time).

cpr2y ago· 2 in thread

So are you using PrinceXML for your "completely separate engine where typography is a first-class citizen"?

Titou3252y ago

Yes, we use an API layer on top of PrinceXML with additional polyfills to support modern features. This is a meh solution but it allowed us to iterate quickly and get to work with customers without building a full blown PDF engine firsthand. However building this engine ourselves is the key to reduced latency and overall better feature support. But we need to engage with our users first and see exactly where we should head first :)

cpr2y ago

Isn't PrinceXML pretty much up to date? What's missing?

gtirloni2y ago· 2 in thread

I wonder what YC expects from such investments (considering the multitude of FOSS solutions in this area).

Titou3252y ago

While this may sound a bit counterintuitive (maybe?) we actually pivoted to this field based on YC input and discussions they have had with their previous companies. The multitude of FOSS solutions in this area indicates this is a real problem people are willing to spend time on, and yet there is no go-to solution and every team we have talked to selected different tools based on a very specific requirement.

This may not mean success, but it does mean the game is not over in the documents field :)

gtirloni2y ago

Thanks for the perspective. Indeed, this is an area with real demand. I haven't evaluated YC's recent startups but I trust they do know a bit about what has a better chance in the market. Best of luck :)

ps.: As someone with very minimal PDF needs personally and at work, I'd say the beautiful templates are what caught my attention the most.

mstijak2y ago· 2 in thread

Congratulations on the launch, it looks fantastic! My company is also developing a similar product. We've chosen to create a visual report designer that enables end-users (non-developers) to create and tweak PDF reports, and integrate with the existing IT infrastructure via the API. Our experience is that users want changes in reports very often and that it's best to allow them to do it on their own.

https://www.cx-reports.com

Titou3252y ago

Really like your approach! We tried to keep things tied to code as much as possible rather than dealing with complex interfaces between changing inputs and outputs. Most legal and tech teams we talked to pointed to the fact that CI/CD would quickly become unbearable when decoupling documents and code implementation. What is your approach on that?

mstijak2y ago

We offer comprehensive import/export functionality, ensuring seamless transfer of reports between environments. Moreover, workspaces allow you to segregate test and production environments or create unique environments for each client, allowing easy report customization. While reports are simply JSON files, which could theoretically be stored on the file system and checked in, doing so would hurt the flexibility we're trying to achieve.

axhl2y ago· 2 in thread

neat. are there similar services / libs for generating word docs? this is a recurring problem for many

AugusteLefOP2y ago

We do not work with docx, and I have never done it myself, but: "DOCX Template API is a tool that allows you to dynamically generate MS Word documents by replacing custom properties using a JSON object that contains your data." I assume this is more or less similar to what you are looking for (?)

axhl2y ago

Yes - sadly the pricing and docs aren’t the most competitive or intuitive. Hoping someone in the thread has experience with alternatives. Very best of luck with your launch

Gualdrapo2y ago· 1 in thread

It seems TeX/LaTeX is a major inspiration in this, though there is still some room for improvement in details like hyphenation, expansion/protrusion and microtypography. Not sure if/how a web engine can reach those points, but it still seems this has a potential niche and market, so congrats.

Though personally I wish stuff like ConTeXt was more popular and approachable - to my humble knowledge their Lua backend seems to have huge potential, I am doing my invoices with ConTeXt/Lua.

Titou3252y ago

It definitely is! Typesetting quality was the main reason we chose not to go down the Puppeteer/headless browser route but rather use a completely separate engine where typography is a first-class citizen.

We like LaTeX, but even for advanced users laying things out can be a difficult thing. Given that documents are a frontend, we wanted to bring the same tools frontend developers already use.

kornhucker2y ago· 1 in thread

Super interesting and potentially a fit for a project I'm working on right now. What are the benefits of going this route vs styling your page for print (ex. tailwind print modifier) and relying on the browser's print dialogue?

Titou3252y ago

There are both commonalities and differences! Both approaches rely on web technology to provide the layout and are flexible in terms of frameworks and integrations.

Where things differ is that we don't actually use a browser under the hood. This allows much better control over typesetting and layout - and you can do it on the server. We also have more control over the output PDF and the ability to use more advanced features such as form fields or embedding other files and metadata in the PDF.

petern812y ago· 1 in thread

This is a good problem to tackle. The hours I've sunk...

AugusteLefOP2y ago

We spent many hours designing and generating PDFs at our previous venture.. terrible experience. Which is why we're now focused on solving this issue!

bbryanj232y ago· 1 in thread

Congrats on the launch. I was a user of htmldocs back in the day, good to see more products in the space.

One of the features I wish I had with htmldocs was the ability to automatically store generated documents in my own S3. I'd rather not introduce another cloud to my data stack just to host PDFs.

AugusteLefOP2y ago

Thanks! We are looking to extend our set of features and integrations; offering self-storing on S3 could definitely be one of them! Good call

fasteddie310032y ago· 1 in thread

Is this just a wrapper around Puppeteer that renders a pdf? I do this currently with an AWS lambda that has a chrome-aws-lambda layer.

Titou3252y ago

We use a dedicated HTML to PDF engine (such as PrinceXML) rather than building on top of a browser. Main issue with browser-backed implementations is that PDFs are often of subpar quality. However, the main good thing is you can rely on the latest CSS features.

In the end, the decisive factor was support for the PrintCSS and PagedMedia specifications, which have been completely discarded by major vendors and only implemented by specific engines.

travelinmyblood2y ago· 1 in thread

First reaction - congrats guys, this is a problem I have in my own business.

Second reaction - the pricing is way over the top and the model is unusual. In your own pitch you talk about the volume of documents created every day. How does that square with per document pricing?

AugusteLefOP2y ago

Thanks! We're fine-tuning our pricing model and realize we have some work to do in this area hahaha! Indeed, at a certain scale, per-document pricing becomes almost impossible (we're talking about millions of documents generated daily). As noted in another comment, costs vary significantly depending on the PDF type, from simple receipts to large multi-page reports, especially since we currently rely on other proprietary software that incurs high costs. In the future, we aim to offer more than just document generation (like e-signature, analytics, hosting, editor, etc.) and hope to move away from "per document" pricing for high volumes. That said, our open-source library allows anyone to design a document and use their preferred renderer for PDF conversion, with all the pros and cons each solution provides. There are more comments about pricing providing additional information; feel free to dive in if you have any comments or questions

roastedpillows2y ago· 1 in thread

The pricing is a little expensive. Have you heard of https://htmldocs.com/ I've been using them for a year now and it's free

AugusteLefOP2y ago

We've adjusted our pricing based on your comments and advice. It's not 100% free, but it now seems to make more sense to most of you. What do you think?

cratermoon2y ago· 1 in thread

The problem with using Tailwind is that I can't just say <h1>Some Heading</h1>. As noted in the Tailwind documents "All heading elements are completely unstyled by default, and have the same font-size and font-weight as normal text."[1]

Most of the time when I'm writing HTML I want a set of default styles for the most common elements. It's tedious and error-prone to have to specify a class every single time.

1 https://tailwindcss.com/docs/preflight

Titou3252y ago

Makes total sense. There is no real requirement to use Tailwind to create the PDFs, we just have grown accustomed to Tailwind :) If you don't use the <Tailwind> tag, the browser defaults are used to generate the PDF.

jjslocum32y ago· 1 in thread

Does Onedoc retain any visibility into, or in any way use or reserve the right to use any content created using its API in any way? Obviously, calling an API means sending document contents to Onedoc.

AugusteLefOP2y ago

We do not. We are also working on getting SOC2 compliant as soon as possible. More about security here: https://docs.onedoclabs.com/ressources/security (especially how we use temporary buckets). Also, you can choose whether to host your generated documents on our platform or to store them on your local system.

But indeed, calling an API means sending document contents to Onedoc one way or another. We aim to provide a self-hosted solution in the future to solve this issue

kwhinnery2y ago· 1 in thread

Looks awesome, will keep this in mind - every so often you need to create complex documents in code, and it's always a pain. Doing it with a familiar modern programming interface would be nice.

AugusteLefOP2y ago

Exactly, that's one of the main reasons we began working on this. We aim to bring the modern web technologies used for website design into the document world. This includes enabling the use of React and, of course, Tailwind, Chakra UI, etc.

anonymouse0082y ago· 1 in thread

Hmm interesting... I just went through this user experience on iOS generating PDF invoices locally. I attempted the HTML > PDF route, but Webkit is thorny wrt to layouts (as you mentioned). I did settle in with drawing everything from the ground up > which with LLMs wasn't as hairy as it used to be, even got a little Swift framework out of the deal.

Am I understanding the docs correctly that you don't have a local library available (the SDKs are just calling the APIs right?)? Mind going through why you chose a remote API?

Titou3252y ago

You are right in the sense that we do not provide a local library. We considered the option, but it would have brought a lot of challenges to accommodate the various runtimes and device capabilities.

This may come at a later stage once we have built our own rendering engine though

Oras2y ago· 1 in thread

This is definitely a huge market. Are you targeting React developers only? I've successfully used html2pdf in the past, but looking again at their Github, it seems there has been no update in the last three years.

I think SOC2 is a must to start engaging with companies. Most PDFs will have sensitive data, and not many companies will feel comfortable sending customer data to a 3rd party platform, so you need security measures and certifications.

Good luck!

Titou3252y ago

We actually take HTML as an input to our API converter. The React tooling is mostly to ease the barrier with most frontend codebases, as well as leverage the existing ecosystem of components.

It seems that these conversion engines are massive pieces of work that require a lot of upkeep, partly because CSS is a living spec but also because of the sheer number of edge cases.

We are already working on SOC2 as this has been a recurring ask, and indeed documents almost always contain PII.

Perz1val2y ago· 1 in thread

If I wanted to put an image in the footer on every page, would it reuse the same resource? How do you do shadows: spam rectangles, attach an image + mask, or maybe bake that into the image itself? Many PDF tools are so bad at this that the result can be more than 10x the size, and I don't even mean saving entire pages as JPEGs

Titou3252y ago

Elements that are placed in page regions are shared between pages with the exception of CSS generated content such as running headers. Shadows are attached as an XObject image with a SMask indeed :)

canterburry2y ago· 1 in thread

May I ask, why do we still need PDFs? I know they are still popular, I just don't understand why.

Titou3252y ago

There are many reasons behind it, to name a few: files are self-contained(*) and easily portable, can guarantee some security features, the format is easily extended, and the ecosystem is very large.

It seems that a better format should exist, but the fact that PDF is the de facto standard for portable documents makes it unlikely things can change overnight.

kvakkefly2y ago· 1 in thread

Funny name! The reason I find it funny is I know some people who made Doconce: https://github.com/doconce/doconce :D

esafak2y ago

Are they still developing it after the founder's passing?

jjmaestro2y ago· 1 in thread

Just out of curiosity, as I've seen a few comments also mentioning PrinceXML. Is OneDoc an API, wrapper, etc, on top of PrinceXML? Or is it a completely new rendering engine?

Thanks!

AugusteLefOP2y ago

As of today we are building our solution on top of PrinceXML/DocRaptor which is considered "to be the best PDF generation API, giving complete control over the documents you need to create" (cf. another comment). As we started working on this solution less than 2 months ago, building our own renderer was not an option. But once we have validated the idea, we are definitely going to work on our own renderer to have 100% control over the workflow, and also to be able to offer a better pricing model!

BrandiATMuhkuh2y ago· 1 in thread

Congrats on the launch! What's the main advantage over pspdfkit?

Titou3252y ago

It is similar to pspdfkit. We add an abstraction layer over the HTML and assets hosting to make it easier to use without having to think too hard about security and serving assets.

We also hope to keep the focus on the PDF generation part rather than expanding super-horizontal style to provide all imaginable PDF tools at the expense that none is really good.

airbreather2y ago· 1 in thread

are you doing this with pdfmarks?

AugusteLefOP2y ago

No, we don't currently do that. However, we are considering adding metadata to PDFs, and using pdfmark could be very helpful!

debarshri2y ago

What would be interesting: if users generate PDFs via your library, and embedding a machine-readable format that makes the PDF easy to parse via your library is simple, that could become a game changer.

acoyfellow2y ago

Reminds me of BrewPDF.com

bambax2y ago

Congrats on the launch, I guess, but there are so many free options that I can't think of a situation where paying $0.25 per document would be justified...? Just to name a few:

Back in the days, I used to use XSL-FO [0] and it was okay. It was not very precise but it rarely if ever broke, and was perfectly integrated with an XML/XSLT solution. Yeah, this was a long time ago.

Last month I used html-to-pdfmake [1] and it's also not very precise and more fragile, but very efficient and fast.

Yet another approach would be to programmatically generate .rtf files (for example) and use Pandoc [2] to produce PDFs (I have not tried this in production but don't see why it wouldn't work).

[0] https://en.wikipedia.org/wiki/XSL_Formatting_Objects

[1] https://www.npmjs.com/package/html-to-pdfmake

[2] https://pandoc.org/

ketanmaheshwari2y ago

I just wanted to add that if you want to convert plaintext files to pdf, vim has a builtin feature to do so:

  vim filename.txt -c "hardcopy > filename.ps | q" && ps2pdf filename.ps #convert ps to pdf

patrick4urcloud2y ago

very nice !

nbittich2y ago

Another way to create pdfs would be a better title imho.


dmazzoni2y ago

So are you saying that you don't output tagged PDFs now?

For those who don't know, if you use Chromium's print-to-pdf feature you get a tagged PDF. And it's scriptable from the command-line too.

1 more reply

Sytten2y ago· 2 in thread

Definitely a problem I experienced. Big fan of browserless.io. Though I didnt see any comment on the biggest problem in this space: SSRF.

Most HTML-to-PDF are deeply insecure and I am more than happy to pay someone else to deal with isolation and security. Report generators are often used to leak cloud secrets via the metadata API.

AugusteLefOP2y ago

throw031720192y ago

Also big fan of browserless. Do you run it yourself? We run the browserless docker containers on prem.

marceldegraaf2y ago· 2 in thread

1: https://gotenberg.dev/

Titou3252y ago

marceldegraaf2y ago

To give you an idea, this is the kind of PDF files we generate that way: https://assets.walterliving.com/documents/walter-charlotte-d...

ffpip2y ago· 2 in thread

Love the demo on the homepage with the render button. Really helps explain the product!

AugusteLefOP2y ago

Thanks! We try to make our product as accessible as possible for anyone to use (or at least to test). It's good to hear that our efforts have been worthwhile!

Nathanba2y ago

staffors2y ago· 2 in thread

staffors2y ago

Also, do you support different paper sizes (A4 and Letter)?

Titou3252y ago

We support the size[1] property and the widows and orphans[2] spec for both your needs :)

[1]: https://developer.mozilla.org/en-US/docs/Web/CSS/@page/size

[2]: https://developer.mozilla.org/en-US/docs/Web/CSS/orphans
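As a small illustration of those CSS properties, the sketch below generates a print stylesheet for either paper size. The helper `print_stylesheet` and its defaults are assumptions for the example, not part of Onedoc's API; the emitted `@page { size }`, `widows`, and `orphans` declarations are standard CSS.

```python
def print_stylesheet(size: str = "A4", widows: int = 2, orphans: int = 2) -> str:
    """Return CSS that sets the page size and paragraph break limits."""
    # Only the two sizes asked about in the thread are accepted here.
    if size not in ("A4", "letter"):
        raise ValueError(f"unsupported page size: {size}")
    return (
        f"@page {{ size: {size}; }}\n"
        # widows: minimum lines left at the top of the next page;
        # orphans: minimum lines kept at the bottom of the current page.
        f"p {{ widows: {widows}; orphans: {orphans}; }}\n"
    )
```

The returned string can be placed in a `<style>` element of the HTML that is sent for rendering.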

winter-day2y ago· 2 in thread

DutchHugo2y ago

Curious, which Python library did you use to convert to PDFs? Currently looking into a couple of options myself.

stormfather2y ago

weasyprint isn't terrible

cxr2y ago· 2 in thread

1. <https://github.com/OnedocLabs/dev-local>

2. <https://marketplace.visualstudio.com/items?itemName=mathemat...>

johnsonjo2y ago

[1]: https://en.wikipedia.org/wiki/PDF

AugusteLefOP2y ago

Sorry for the misunderstanding, we indeed "meant something more like a viewer for Adobe's PDF format rather than Adobe's viewer for PDFs." We will make sure to change the wording. Thank you!

Crowberry2y ago· 2 in thread

How long does it take to render using your API? :)

pedro1202y ago

Crowberry2y ago

Alright, thanks!

baggy_trough2y ago· 2 in thread

AugusteLefOP2y ago

baggy_trough2y ago

cpr2y ago· 2 in thread

So are you using PrinceXML for your "completely separate engine where typography is a first-class citizen"?

Titou3252y ago

cpr2y ago

Isn't PrinceXML pretty much up to date? What's missing?

gtirloni2y ago· 2 in thread

I wonder what YC expects from such investments (considering the multitude of FOSS solutions in this area).

Titou3252y ago

This may not mean success, but it means that the game is not over in the documents field :)

gtirloni2y ago

ps.: As someone with very minimal PDF needs personally and at work, I'd say the beautiful templates are what caught my attention the most.

mstijak2y ago· 2 in thread

https://www.cx-reports.com

Titou3252y ago

mstijak2y ago

axhl2y ago· 2 in thread

neat. are there similar services / libs for generating word docs? this is a recurring problem for many

AugusteLefOP2y ago

axhl2y ago

Yes - sadly the pricing and docs aren’t the most competitive or intuitive. Hoping someone in the thread has experience with alternatives. Very best of luck with your launch

Gualdrapo2y ago· 1 in thread

Though personally I wish stuff like ConTeXt was more popular and approachable. To my humble knowledge their Lua backend seems to have huge potential; I am doing my invoices with ConTeXt/Lua.

Titou3252y ago

We like LaTeX, but even for advanced users laying things out can be a difficult thing. Given that documents are a frontend, we wanted to bring the same tools frontend developers already use.

kornhucker2y ago· 1 in thread

Titou3252y ago

There are both commonalities and differences! Both approaches rely on web technology to provide the layout and are flexible in terms of frameworks and integrations.

petern812y ago· 1 in thread

This is a good problem to tackle. The hours I've sunk...

AugusteLefOP2y ago

We spent many hours designing and generating PDFs at our previous venture. It was a terrible experience, which is why we're now focused on solving this issue!

bbryanj232y ago· 1 in thread

Congrats on the launch. I was a user of htmldocs back in the day, good to see more products in the space.

One of the features I wish I had with htmldocs was the ability to automatically store generated documents in my own S3. I'd rather not introduce another cloud to my data stack just to host PDFs.

AugusteLefOP2y ago

Thanks! We are looking to extend our set of features and integrations, and offering self-storing on S3 could definitely be one of them! Good call.

fasteddie310032y ago· 1 in thread

Is this just a wrapper around Puppeteer that renders a pdf? I do this currently with an AWS lambda that has a chrome-aws-lambda layer.

Titou3252y ago

travelinmyblood2y ago· 1 in thread

First reaction - congrats guys, this is a problem I have in my own business.

Second reaction - the pricing is way over the top and the model is unusual. In your own pitch you talk about the volume of documents created every day. How does that square with per document pricing?

AugusteLefOP2y ago

roastedpillows2y ago· 1 in thread

The pricing is a little expensive. Have you heard of https://htmldocs.com/? I've been using them for a year now and it's free.

AugusteLefOP2y ago

We've adjusted our pricing based on your comments and advice. It's not 100% free, but it now seems to make more sense to most of you. What do you think?

cratermoon2y ago· 1 in thread

Most of the time when I'm writing HTML I want a set of default styles for the most common elements. It's tedious and error-prone to have to specify a class every single time.

1 https://tailwindcss.com/docs/preflight

Titou3252y ago

jjslocum32y ago· 1 in thread

AugusteLefOP2y ago

But indeed, calling an API means sending document contents to Onedoc one way or another. We aim to provide a self-hosted solution in the future to solve this issue.

kwhinnery2y ago· 1 in thread

Looks awesome, will keep this in mind - every so often you need to create complex documents in code, and it's always a pain. Doing it with a familiar modern programming interface would be nice.

AugusteLefOP2y ago

anonymouse0082y ago· 1 in thread

Am I understanding the docs correctly that you don't have a local library available (the SDKs are just calling the APIs right?)? Mind going through why you chose a remote API?

Titou3252y ago

You are right in the sense that we do not provide a local library. We considered the option, but it would have brought a lot of challenges to accommodate the various runtimes and device capabilities.

This may come at a later stage once we have built our own rendering engine though

Oras2y ago· 1 in thread

Good luck!

Titou3252y ago

We actually take HTML as an input to our API converter. The React tooling is mostly there to lower the barrier for most frontend codebases, as well as to leverage the existing ecosystem of components.

It seems that these conversion engines are massive pieces of work that require a lot of upkeep, partly because CSS is a living spec but also because of the sheer number of edge cases.

We are already working on SOC2 as this has been a recurring ask, and indeed documents almost always contain PII.

Perz1val2y ago· 1 in thread

Titou3252y ago

Elements that are placed in page regions are shared between pages with the exception of CSS generated content such as running headers. Shadows are attached as an XObject image with a SMask indeed :)

canterburry2y ago· 1 in thread

May I ask, why do we still need PDFs? I know they are still popular, I just don't understand why.

Titou3252y ago

There are many reasons behind it, to name a few: files are self-contained(*) and easily portable, can guarantee some security features, the format is easily extended, and the ecosystem is very large.

It seems that a better format should exist, but the fact that PDF is the de facto standard for portable documents makes it unlikely things can change overnight.

kvakkefly2y ago· 1 in thread

Funny name! The reason I find it funny is I know some people who made Doconce: https://github.com/doconce/doconce :D

esafak2y ago

Are they still developing it after the founder's passing?

jjmaestro2y ago· 1 in thread

Just out of curiosity, as I've seen a few comments also mentioning PrinceXML. Is OneDoc an API, wrapper, etc, on top of PrinceXML? Or is it a completely new rendering engine?

Thanks!

AugusteLefOP2y ago

BrandiATMuhkuh2y ago· 1 in thread

Congrats on the launch! What's the main advantage over pspdfkit?

Titou3252y ago

It is similar to pspdfkit. We add an abstraction layer over the HTML and assets hosting to make it easier to use without having to think too hard about security and serving assets.

We also hope to keep the focus on the PDF generation part rather than expanding horizontally to provide every imaginable PDF tool, at the risk that none of them is really good.

airbreather2y ago· 1 in thread

are you doing this with pdfmarks?

AugusteLefOP2y ago

No, we don't currently do that. However, we are considering adding metadata to PDFs, and using pdfmark could be very helpful!

debarshri2y ago

It would be interesting if PDFs generated via your library embedded a machine-readable format, so that parsing them back via your library is made easy. That could become a game changer.

acoyfellow2y ago

Reminds me of BrewPDF.com

bambax2y ago

Congrats on the launch, I guess, but there are so many free options that I can't think of a situation where paying $0.25 per document would be justified...? Just to name a few:

Last month I used html-to-pdfmake [1] and it's also not very precise and more fragile, but very efficient and fast.

Yet another approach would be to programmatically generate .rtf files (for example) and use Pandoc [2] to produce PDFs (I have not tried this in production but don't see why it wouldn't work).

[0] https://en.wikipedia.org/wiki/XSL_Formatting_Objects

[1] https://www.npmjs.com/package/html-to-pdfmake

[2] https://pandoc.org/

ketanmaheshwari2y ago

I just wanted to add that if you want to convert plaintext files to pdf, vim has a builtin feature to do so:

  vim filename.txt -c "hardcopy > filename.ps | q" && ps2pdf filename.ps #convert ps to pdf

patrick4urcloud2y ago

very nice !

nbittich2y ago

Another way to create pdfs would be a better title imho.
