Saturday, March 7, 2015

PDFcrowd and its HTML to PDF API (for Python and other languages)

By Vasudev Ram


PDFcrowd is a web service that I came across recently. It converts HTML content to PDF. You can do this either on the PDFcrowd site, by entering the HTML content or the URL of the page to be converted, or via the PDFcrowd API, which supports multiple programming languages, including Python. I tried multiple approaches, and all of them worked fairly well.

A slightly modified version of a simple PDFcrowd API example from their site is shown below.

# Demo program to show how to use the PDFcrowd API
# to convert HTML content to PDF.
# Author: Vasudev Ram - www.dancingbison.com

import pdfcrowd

try:
    # Create an API client instance.
    # Dummy credentials used; to actually run the program, enter your own.
    client = pdfcrowd.Client("user_name", "api_key")
    client.setAuthor('author_name')
    # Dummy password used; to actually run the program, enter your own.
    client.setUserPassword('user_password')

    # Convert a web page and store the generated PDF in a file.
    pdf = client.convertURI('http://www.dancingbison.com')
    with open('dancingbison.pdf', 'wb') as output_file:
        output_file.write(pdf)

    # Convert a second web page and store the generated PDF in a file.
    pdf = client.convertURI('http://jugad2.blogspot.in/p/about-vasudev-ram.html')
    with open('jugad2-about-vasudevram.pdf', 'wb') as output_file:
        output_file.write(pdf)

    # Convert an HTML string and save the result to a file.
    html = "My Small HTML File"
    with open('html.pdf', 'wb') as output_file:
        client.convertHtml(html, output_file)

except pdfcrowd.Error as why:
    # The "as" form and print() work on both Python 2.6+ and Python 3.
    print('Failed: {}'.format(why))

I used three calls to the API. For the first two calls, the inputs were: 1) my web site, 2) the about page of my blog. The third call converted a short HTML string.

Screenshots of the results of those two calls are below. You can see that they correspond closely to the originals.

Screenshot of generated PDF of dancingbison.com site



Screenshot of generated PDF of About Vasudev Ram page on jugad2.blogspot.com blog
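
The two URL conversions in the program above follow the same pattern: call convertURI, then write the returned bytes to a file. A minimal sketch of how that pattern could be wrapped in a helper is below. The names save_urls_as_pdf and FakeClient are hypothetical and not part of PDFcrowd; FakeClient only stands in for pdfcrowd.Client so the sketch runs without an account or network access. The only API behaviour assumed is the one used above, namely that convertURI returns the PDF content as bytes.

```python
import os
import tempfile


def save_urls_as_pdf(client, url_to_filename, out_dir="."):
    """Convert each URL with client.convertURI() and write the PDF to a file.

    Returns the list of file paths written, in iteration order.
    """
    written = []
    for url, filename in url_to_filename.items():
        # Same call as in the program above; assumed to return PDF bytes.
        pdf_bytes = client.convertURI(url)
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as output_file:
            output_file.write(pdf_bytes)
        written.append(path)
    return written


class FakeClient(object):
    """Hypothetical stand-in for pdfcrowd.Client, for offline runs only."""

    def convertURI(self, url):
        # Not a real PDF, just placeholder bytes that record the URL.
        return b"%PDF-1.4 placeholder for " + url.encode("ascii")


if __name__ == "__main__":
    pages = {
        "http://www.dancingbison.com": "dancingbison.pdf",
        "http://jugad2.blogspot.in/p/about-vasudev-ram.html":
            "jugad2-about-vasudevram.pdf",
    }
    out_dir = tempfile.mkdtemp()
    for path in save_urls_as_pdf(FakeClient(), pages, out_dir):
        print(path)
```

To run it against the real service, you would pass a pdfcrowd.Client created with your own credentials instead of FakeClient, and wrap the call in the same try/except pdfcrowd.Error block as in the program above.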




2 comments:

James said...

You might also check out the DocRaptor HTML-to-PDF API (https://docraptor.com) (Note: I work there). You'll find it a bit more accurate (useful for invoices, brochures, etc) than PDFCrowd. Example page from Dancing Bison: https://www.dropbox.com/s/9bdsmi0kb6dp2x3/Try%20It%20Out%20Doc.pdf?dl=0

CrisisMaven said...

So from the bottom graphic I take it that this also creates workable hyperlinks in the PDF, i.e. not only when the protocol/domain/URI is visible, but also when there is a Title tag? Because in e.g. Word or OpenOffice this works only if you buy the expensive original Adobe Acrobat package - all other software will create a PDF alright, but the embedded hyperlinks are "dead" then.