I would be willing to pay money for a reliable tool that didn't need much manual editing after processing.
Unfortunately, the pdftohtml project (http://pdftohtml.sourceforge.net/) has been inactive, and the current version has trouble with even moderately complex layouts.
The fundamental problem is that PDF stores the document's presentation, while HTML defines the document and leaves the presentation to the browser. And obviously, restoring a document definition from its presentation is hard because a lot of information is missing.
Yes, that's true.
I only bring it up because if your goal is to turn pdfcrowd into an app that people would pay money for (and I would be one of them), solving that problem would go a long way towards achieving it.
It works really well. The only quirk is that it needs a fake X server (for font loading), but Xvfb works just fine for that.
Try it:
http://www.tokyomuslim.com/2010/04/php-class-to-run-pdfcrowd...
I don't blame anyone for not wanting to use PHP's poorly documented CURL classes.
One of the ways of doing this is to host it on a simple shared server (it's not a heavily used app).
The downside is that it's unlikely we'll be able to use any of the PDF tools I've used in the past (since they need to be installed). This should work fine for our purposes.
Thanks, I was wondering how I'd get around this.
To all those who were dissing this because they couldn't immediately see a use for it, try to have a more open mind.
One other thing: the ability to render Flash would be awesome as well. The main function of PDF, as I understand it, is to create a document that PRINTS identically on every setup, so people are frequently going to want to print Flash, which is already a huge pain in the ass. Unfortunately, it looks like the output blanks out completely if there is Flash on the page (2advanced.net).
If you could solve that, I would start paying tomorrow.
http://sleep.dashnine.org/manual/ - original docs
http://sleep.dashnine.org/download/sleep21manual.pdf - result
If PDFCrowd can effectively handle images, I'll brand their logo into my bicep.
I've just spent weeks working on HTML -> PDF conversion code, so I know it's not just my viewer. I've put all kinds of crazy stuff through there.
Also, you don't support the CSS3 styling of the header text.
The fonts look super aliased.
Finally, you don't snap the rendered HTML to the nearest page, leading to a page containing only the footer.
I think Pdfcrowd's selling points could be 1) wide availability - only HTTP is needed, so it can theoretically be used on any platform; 2) no need to install any third-party software, which makes applications more portable; 3) API bindings.
Given the focus on APIs I guess you're aiming it at those wanting to programmatically generate PDFs using a familiar markup, rather than conversion of existing (static) content into PDF? If so, maybe investigate the ability to overlay rendering onto an existing PDF template at some point - in my experience it's been a common requirement (think form letters, account statements, etc).
Interesting that it appears to execute JavaScript; guess it's a sign of the times that you need to in order to render many sites correctly nowadays. I haven't poked it too hard, but suspect there might be one or two security challenges there...
Here is an idea for an extra feature: make a print bookmarklet -- clicking on it gets you a nice PDF version of the page you are viewing right now. I can't stand Firefox's print renditions of some pages... terrible...
(Also, you might want to set the page size to Letter or A4 depending on the geolocation of your visitor's IP address.)
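A rough sketch of what that page-size default could look like, assuming the IP address has already been resolved to a two-letter country code by some GeoIP lookup (not shown here). The function name and the list of Letter countries are my own illustrative assumptions, not anything Pdfcrowd provides, and the list is approximate rather than authoritative.

```python
# Countries assumed to default to US Letter paper. This is an approximate,
# illustrative list (an assumption), not an authoritative one.
LETTER_COUNTRIES = frozenset({"US", "CA", "MX", "PH"})


def default_page_size(country_code):
    """Return "Letter" or "A4" for an ISO 3166-1 alpha-2 country code.

    Falls back to "A4" when the country is unknown or missing.
    """
    if not country_code:
        return "A4"
    if country_code.strip().upper() in LETTER_COUNTRIES:
        return "Letter"
    return "A4"
```

You'd probably still want to let the user override it, since GeoIP guesses can be wrong (VPNs, travellers, and so on).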
I notice there are some questions about how to make money. One may be to position yourself as a way to get PDF reports generated from phone apps, in which case you may want to do per app licensing and provide facilities for email delivery of PDFs.
I could see this being useful porting apps from iPhone (can easily generate PDFs) to Android (which does not appear to support PDF output).
I fed it my homepage, and it nailed it. I'm impressed.