PDFcrowd is a web service that I came across recently. It allows users to convert HTML content to PDF. This can be done both via the PDFcrowd site - by entering either the content or the URL of an HTML page to be converted to PDF - or via the PDFcrowd API, which has support for multiple programming languages, including for Python. I tried multiple approaches, and all worked fairly well.
A slightly modified version of a simple PDFcrowd API example from their site, is shown below.
# Demo program to show how to use the PDFcrowd API # to convert HTML content to PDF. # Author: Vasudev Ram - www.dancingbison.com import pdfcrowd try: # create an API client instance # Dummy credentials used; to actually run the program, enter your own. client = pdfcrowd.Client("user_name", "api_key") client.setAuthor('author_name') # Dummy credentials used; to actually run the program, enter your own. client.setUserPassword('user_password') # Convert a web page and store the generated PDF in a file. pdf = client.convertURI('http://www.dancingbison.com') with open('dancingbison.pdf', 'wb') as output_file: output_file.write(pdf) # Convert a web page and store the generated PDF in a file. pdf = client.convertURI('http://jugad2.blogspot.in/p/about-vasudev-ram.html') with open('jugad2-about-vasudevram.pdf', 'wb') as output_file: output_file.write(pdf) # convert an HTML string and save the result to a file output_file = open('html.pdf', 'wb') html = "My Small HTML File" client.convertHtml(html, output_file) output_file.close() except pdfcrowd.Error, why: print 'Failed:', whyI used three calls to the API. For the first two calls, the inputs were: 1) my web site, 2) the about page of my blog.
Screenshots of the results of those two calls are below. You can see that they correspond closely to the originals.
Screenshot of generated PDF of dancingbison.com site
Screenshot of generated PDF of About Vasudev Ram page on jugad2.blogspot.com blog
- Vasudev Ram - Online Python training and programming Dancing Bison EnterprisesSignup to hear about new Python or PDF related products created by me. Posts about Python Posts about xtopdf Contact Page
2 comments:
You might also check out the <a href="https://docraptor.com'>DocRaptor HTML-to-PDF API</a> (Note: I work there). You'll find it a bit more accurate (useful for invoices, brochures, etc) than PDFCrowd. Example page from Dancing Bison: https://www.dropbox.com/s/9bdsmi0kb6dp2x3/Try%20It%20Out%20Doc.pdf?dl=0
So from the bottom graphic I take it that this also creates workable hyperlinks in PDF, i.e. not only from when the protocol/domain/URi is visible, but also when there is a Title tag? Because in e.g. WORD or OpenOffice this works only if you buy the expensive original Adobe Acrobat package - all other software will create a PDF alright, but the embedded hyperlinks are "dead" then.
Post a Comment