jugad2 - Vasudev Ram on software innovation: programmatic-PDF-creation


Wednesday, May 1, 2013

PDF in a Bottle - creating PDF using xtopdf, ReportLab, Bottle and Python

pdf_bottle.py is a program I wrote that allows you to create a PDF file from text, over the web, by entering your text into a form and submitting it.

Here is the program:

# pdf_bottle.py

# Description: Program to generate PDF from text, over the web,
# using xtopdf, ReportLab and the Bottle web framework.
# It can be used to create short, simple PDF e-books.
# Author: Vasudev Ram - http://dancingbison.com
# Copyright 2013 Vasudev Ram 
# Tested with Python 2.7.

# Version: 0.1

# Dependencies:
# xtopdf - https://bitbucket.org/vasudevram/xtopdf
# bottle - http://bottlepy.org
# ReportLab - http://www.reportlab.com/ftp/reportlab-1.21.zip
# Python - http://python.org

from PDFWriter import PDFWriter

from bottle import route, request, run

@route('/edit_book')
def edit_book():
    return '''
    <form action="/save_book" method="post">
    PDF file name: <input type="text" name="pdf_file_name" />

    Header: <input type="text" name="header" />

    Footer: <input type="text" name="footer" />

    Content:
    <textarea name="content" rows="15"   cols="50"></textarea>

    <input type="submit" value="Submit" />

    </form>
'''

@route('/save_book', method='POST')
def save_book():
    try:
        pdf_file_name = request.forms.get('pdf_file_name')
        header = request.forms.get('header')
        footer = request.forms.get('footer')
        content = request.forms.get('content')

        pw = PDFWriter(pdf_file_name)
        pw.setFont("Courier", 12)
        pw.setHeader(header)
        pw.setFooter(footer)

        lines = content.split('\n')
        for line in lines:
            pw.writeLine(line)

        pw.savePage()
        pw.close()
        return "Done"
    except Exception:
        return "Not done"

def main():
    run(host='localhost', port=9999)

if __name__ == "__main__":
    main()

To run it, you need to have Python, the open source version of ReportLab, my xtopdf toolkit and the Bottle Python web framework installed.

Here is a guide to installing and using xtopdf.

For help with installing the other products, consult their respective sites, linked above.

Then run the program with this command:

python pdf_bottle.py

Then, in a browser window, go to localhost:9999/edit_book (the form is served by the /edit_book route).

Enter the details - PDF file name, header, footer and text content - in the form, then click Submit.

The PDF file will be generated in the same directory from where you ran the Python program.
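One detail about save_book() above: browsers normally submit textarea content with CRLF ("\r\n") line endings, so splitting on "\n" alone can leave a trailing "\r" on each line written to the PDF. Here is a minimal sketch of a more tolerant split, using only the standard library. The helper name content_to_lines is hypothetical; it is not part of xtopdf or Bottle.

```python
def content_to_lines(content):
    """Split form text into lines, accepting LF, CRLF or CR line endings."""
    # str.splitlines() recognises the common line endings and drops them,
    # while keeping empty lines (blank lines in the text) as empty strings.
    return content.splitlines()


# Demo: text as a browser would typically submit it from a textarea.
demo_lines = content_to_lines("Chapter 1\r\n\r\nOnce upon a time\r\n")
print(demo_lines)
```

In save_book(), the loop would then iterate over content_to_lines(content) instead of content.split('\n').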

This is the first version, and has been tested only a bit. If you find any issues, please mention them in the comments.

Various improvements are possible, including sending the generated PDF to the user's browser, or providing a link to download it (better), and I'll work on some of them over time.
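As a rough illustration of the "send the generated PDF to the user's browser" idea, the sketch below builds the HTTP response headers that ask a browser to download a file as a PDF. It is framework-neutral and uses only the standard library. The function name pdf_download_headers is hypothetical, and wiring it into a Bottle handler is deliberately left out.

```python
import os


def pdf_download_headers(pdf_path):
    """Return (name, value) header pairs for serving pdf_path as a download."""
    # Use only the final path component so server directories are not exposed,
    # and drop double quotes so the quoted filename stays well-formed.
    filename = os.path.basename(pdf_path).replace('"', "")
    return [
        ("Content-Type", "application/pdf"),
        ("Content-Disposition", 'attachment; filename="%s"' % filename),
    ]


demo_headers = dict(pdf_download_headers("/tmp/books/my_book.pdf"))
print(demo_headers)
```

The response body would be the bytes of the generated PDF file, read in binary mode.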

P.S. Excerpt from the Bottle framework site:

[
Bottle is a fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the Python Standard Library.

Routing: Requests to function-call mapping with support for clean and dynamic URLs.
Templates: Fast and pythonic built-in template engine and support for mako, jinja2 and cheetah templates.
Utilities: Convenient access to form data, file uploads, cookies, headers and other HTTP-related metadata.
Server: Built-in HTTP development server and support for paste, fapws3, bjoern, Google App Engine, cherrypy or any other WSGI capable HTTP server.
]

- Vasudev Ram - Dancing Bison Enterprises


Wednesday, April 10, 2013

Using xtopdf and pypyodbc to publish MS Access database data to PDF

By Vasudev Ram

I had blogged about pypyodbc, a pure-Python ODBC library, recently.

Using pypyodbc with my xtopdf toolkit for PDF creation, you can publish your MS Access database data to PDF.

Here is some example code to publish MS Access data to PDF:

First, the program create_ppo_mdb.py, shown below, creates an MS Access database called fruits.mdb, then creates a table called fruits in it, and inserts 3 records into the table:

# create_ppo_mdb.py

import pypyodbc 
             
pypyodbc.win_create_mdb('.\\fruits.mdb')
connection_string = 'Driver={Microsoft Access Driver (*.mdb)};DBQ=.\\fruits.mdb'
connection = pypyodbc.connect(connection_string)

SQL = 'CREATE TABLE fruits (id COUNTER PRIMARY KEY, fruit_name VARCHAR(25));'
connection.cursor().execute(SQL).commit()

SQL = "INSERT INTO fruits values (1, 'apple');"
connection.cursor().execute(SQL).commit()

SQL = "INSERT INTO fruits values (2, 'banana');"
connection.cursor().execute(SQL).commit()

SQL = "INSERT INTO fruits values (3, 'orange');"
connection.cursor().execute(SQL).commit()

# Uncomment the 7 lines below to make the program also display the data after creating it.

#SQL = 'SELECT * FROM fruits;'
#cursor = connection.cursor().execute(SQL)
#for row in cursor:
#    for col in row:
#        print col,
#    print
#cursor.close()

connection.close()

Next, the program MDBtoPDF.py, shown below, reads the data from the fruits table in the MDB database just created above, and publishes the selected records to PDF:

#-------------------------------------------------------------------

# MDBtoPDF.py
# Description: A program to convert MS Access .MDB data to PDF format.
# Author: Vasudev Ram - http://www.dancingbison.com

#-------------------------------------------------------------------

# imports

import sys 
import os
import time
import string
import pypyodbc 
from PDFWriter import PDFWriter
             
#-------------------------------------------------------------------

# globals

##------------------------ usage ---------------------------------------

def usage():

 sys.stderr.write("Usage: python " + sys.argv[0] + " MDB_DSN table_name pdf_file\n")
 sys.stderr.write("where MDB_DSN is the ODBC DSN (Data Source Name) for the\n")
 sys.stderr.write("MDB file, table_name is the name of the table in that MDB,\n")
 sys.stderr.write("whose data you want to convert to PDF, and pdf_file is the\n")
 sys.stderr.write("output PDF filename.\n")
 sys.stderr.write(sys.argv[0] + " reads the table data from the MDB and\n")
 sys.stderr.write("writes it to pdf_file.\n")

##------------------------ main ------------------------------------------

def main():

 '''Main program to convert MDB data to PDF.
 '''

 # check for right num. of args
 if (len(sys.argv) != 4):
  usage()
  sys.exit(1)

 # extract MDB DSN, table name and pdf filename from args
 mdb_dsn = sys.argv[1]
 table_name = sys.argv[2]
 pdf_fn = sys.argv[3]

 print "mdb_dsn =", mdb_dsn
 print "table_name =", table_name
 print "pdf_fn =", pdf_fn

 # build connection string
 connection_string_prefix = 'Driver={Microsoft Access Driver (*.mdb)};DBQ='
 connection_string = connection_string_prefix + mdb_dsn
 print "connection_string =", connection_string
 connection = pypyodbc.connect(connection_string)
 print "connection =", connection

 # create the PDFWriter instance
 pw = PDFWriter(pdf_fn)

 # and set some of its fields

 # set the font
 pw.setFont("Courier", 10)

 # set the page header
 gen_datetime = time.asctime()
 pw.setHeader("Generated by MDBtoPDF: Input: " + mdb_dsn + \
 " At: " + gen_datetime)

 # set the page footer
 pw.setFooter("Generated by MDBtoPDF: Input: " + mdb_dsn + \
 " At: " + gen_datetime)

 # create the separator for logical grouping of output
 sep = "=" * 60

 # print the data records section title
 pw.writeLine("MDB Data Records from MDB: %s, table: %s" % (mdb_dsn, 
  table_name))

 # print a separator line
 pw.writeLine(sep)

 # read the input MDB data and write it to the PDF file

 # select all rows from the table named on the command line
 SQL = 'SELECT * FROM ' + table_name + ';'

 cursor = connection.cursor().execute(SQL)
 for row in cursor:
  str_row = ""
  for col in row:
   str_row = str_row + str(col) + " "
  pw.writeLine(str_row)

 # close the cursor and connection
 cursor.close()
 connection.close()

 # print a separator line
 pw.writeLine(sep)

 # save current page
 pw.savePage()

 # close the PDFWriter
 pw.close()

##------------------------ Global code -----------------------------------

# invoke main

if __name__ == '__main__':
 main()

##------------------------ EOF - MDBtoPDF.py ---------------

To make the above programs work, you need to have the ReportLab toolkit v1.21 and the xtopdf toolkit installed, in addition to pypyodbc and Python 2.7. (Click on the "Branches" tab on the xtopdf page linked in the previous sentence to download xtopdf.)
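The row-to-text step in MDBtoPDF.py can be tried without Windows or the Access ODBC driver. The sketch below is an illustration under stated assumptions, not the program above: Python's built-in sqlite3 module stands in for pypyodbc, and the helper name rows_as_lines is hypothetical. It also checks the table name before building the query, since table names cannot be passed as SQL parameters.

```python
import re
import sqlite3


def rows_as_lines(connection, table_name):
    """Return each row of table_name as one space-separated string."""
    # Validate the name before putting it into the query text.
    if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*\Z", table_name):
        raise ValueError("invalid table name: %r" % table_name)
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM " + table_name)
    lines = [" ".join(str(col) for col in row) for row in cursor]
    cursor.close()
    return lines


# Demo: an in-memory SQLite database standing in for fruits.mdb.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE fruits (id INTEGER PRIMARY KEY, fruit_name VARCHAR(25))")
connection.executemany(
    "INSERT INTO fruits VALUES (?, ?)",
    [(1, "apple"), (2, "banana"), (3, "orange")])
demo_lines = rows_as_lines(connection, "fruits")
connection.close()
print(demo_lines)
```

Each returned string could then be passed to pw.writeLine(). Unlike the loop in MDBtoPDF.py, " ".join() does not leave a trailing space at the end of each line.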

I've had an interest in ODBC ever since I first worked, as team leader, on a middleware software product that used ODBC. The middleware was developed at Infosys Technologies, where I worked at the time.

Though ODBC itself had a good architecture, many driver implementations of the time (this was some years ago) were rather slow, so one of the main goals of the product was to improve the performance of client-server or desktop applications (written in Visual Basic or C) that used ODBC for database access.

I remember learning ODBC as part of the project (and teaching it to the team), and reading most of the book "Inside ODBC" by Kyle Geiger, one of the architects of ODBC. It was a fascinating book that gave a detailed look inside the architecture of ODBC, the reasons for certain design decisions, and so on.

We succeeded in meeting all the goals of the project, and that middleware product was used in many large client-server applications (using VB and Oracle / Sybase) that were developed by Infosys for its clients. I really had a lot of fun working on that project.

Related links:

ODBC entry on Wikipedia

Inside ODBC - the book, on Amazon

eGenix mxODBC Connect, from eGenix, a German Python products company.

eGenix mxODBC

unixODBC

DataDirect ODBC

iODBC

The Microsoft SQL Server ODBC Driver for Linux - it provides native connectivity from Linux to Microsoft SQL Server. (Seems to be 64-bit only).

- Vasudev Ram - Dancing Bison Enterprises


Monday, November 5, 2012

PDFBuilder can now take multiple input files from command line

By Vasudev Ram

PDFBuilder, which I blogged about recently, can now build a composite PDF from an arbitrary number [1] of input files (CSV and TDV) [2] specified on the command line. (I've removed the hard-coding in the first version.)

I've also cleaned up and refactored the PDFBuilder code some, though I still need to do some more.

UPDATE: I've pasted a few code snippets from PDFBuilder.py at the end of this post.

This version of PDFBuilder can be downloaded here, as a part of xtopdf v1.4, from the Bitbucket repository.

[1] Arbitrary number, that is, subject to the limitations of the length of the command line supported by your OS, of course - whether Unix / Linux, Mac OS X or Windows. However, there is a solution for that.

[2] The design of PDFBuilder allows for easily adding support for other input file formats that are row-oriented. See the method next_row() in the file CSVReader.py in the source package, for an example of how to add support for other compatible input formats. You just have to write a reader class (analogous to CSVReader) for that other format, called, say, FooReader, and provide an open() method and a next_row() method as in the CSVReader class, but adapted to handle Foo data.
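To make footnote [2] concrete, here is a minimal sketch of such a reader, for a made-up pipe-separated format ("a|b|c"). The class name PSVReader and the format are hypothetical. The method names (open, next_row, close, get_description) and the use of StopIteration at end of input follow what the build_pdf method shown later in this post calls; the real CSVReader may differ in its details.

```python
import os
import tempfile


class PSVReader:
    """Hypothetical reader for pipe-separated values, one row per line."""

    def __init__(self, filename):
        self._filename = filename
        self._file = None

    def get_description(self):
        return "PSVReader: " + self._filename

    def open(self):
        # Inside a method, open() here is still the built-in function.
        self._file = open(self._filename, "r")

    def next_row(self):
        """Return the next row as a list of strings.

        Raises StopIteration at end of file, which the calling loop
        catches to move on to the next input.
        """
        line = self._file.readline()
        if not line:
            raise StopIteration
        return line.rstrip("\r\n").split("|")

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


# Demo: write a small file and read it back with a build_pdf-style loop.
fd, demo_path = tempfile.mkstemp(suffix=".psv")
with os.fdopen(fd, "w") as f:
    f.write("1|apple\n2|banana\n")

reader = PSVReader(demo_path)
reader.open()
demo_rows = []
try:
    while True:
        demo_rows.append(reader.next_row())
except StopIteration:
    reader.close()
os.remove(demo_path)
print(demo_rows)
```

Supporting the new format in PDFBuilder would also need one more extension check in build_pdf, next to the ".csv" and ".tdv" ones.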

Some code snippets from PDFBuilder.py:

The PDFBuilder class:

class PDFBuilder:
 """
 Class to build a composite PDF out of multiple input sources.
 """

 def __init__(self, pdf_filename, font, font_size, 
    header, footer, input_filenames):
  """
  PDFBuilder __init__ method.
  """
  self._pdf_filename = pdf_filename
  self._input_filenames = input_filenames

  # Create a PDFWriter instance.
  self._pw = PDFWriter(pdf_filename)

  # Set its font.
  self._pw.setFont(font, font_size)

  # Set its header and footer.
  self._pw.setHeader(header)
  self._pw.setFooter(footer)
  
 def build_pdf(self, input_filenames):
  """
  PDFBuilder.build_pdf method.
  Builds the PDF using contents of the given input_filenames.
  """

  # Loop over all names in input_filenames.
  # Instantiate the appropriate reader for each filename, 
  # based on the filename extension.

  # For each reader, get each row, and for each row,
  # combine all the columns into a string separated by a space,
  # and write that string to the PDF file.

  # Start a new PDF page after each reader's content is written
  # to the PDF file.

  for input_filename in input_filenames:
   # Check if name ends in ".csv", ignoring upper/lower case
   if input_filename[-4:].lower() == ".csv":
    reader = CSVReader(input_filename)
   # Check if name ends in ".tdv", ignoring upper/lower case
   elif input_filename[-4:].lower() == ".tdv":
    reader = TDVReader(input_filename)
   else:
    sys.stderr.write("Error: Invalid input file. Exiting\n")
    sys.exit(1)

   hdr_str = "Data from reader: " + \
    reader.get_description()
   self._pw.writeLine(hdr_str)
   self._pw.writeLine('-' * len(hdr_str))

   reader.open()
   try:
    while True:
     row = reader.next_row()
     s = ""
     for item in row:
      s = s + item + " "
     self._pw.writeLine(s)
   except StopIteration:
    # Close this reader, save this PDF page, and 
    # start a new one for next reader.
    reader.close()
    self._pw.savePage()
    #continue

 def close(self):
  self._pw.close()

The main() function that uses the PDFBuilder class to create a composite PDF:

def main():

 # global variables

 # program name for error messages
 global prog_name
 # debug flag - if true, print debug messages, else don't
 global DEBUGGING
 
 # Set the debug flag based on environment variable DEBUG, 
 # if it exists.
 debug_env_var = os.getenv("DEBUG")
 if debug_env_var == "1":
  DEBUGGING = True

 # Save program filename for error messages
 prog_name = sys.argv[0]

 # check for right args: output PDF name plus at least one input file
 if len(sys.argv) < 3:
  usage()
  sys.exit(1)

 # Get output PDF filename from the command line.
 pdf_filename = sys.argv[1]

 # Get the input filenames from the command line.
 input_filenames = sys.argv[2:]

 # Create a PDFBuilder instance.
 pdf_builder = PDFBuilder(pdf_filename, "Courier", 10, 
       "Composite PDF", "Composite PDF", 
       input_filenames)

 # Build the PDF using the inputs.
 pdf_builder.build_pdf(input_filenames)

 pdf_builder.close()

 sys.exit(0)

And a batch file, run.bat, calls the program with input filename arguments:

@echo off
python PDFBuilder.py %1 file1.csv file1.tdv file2.csv file2.tdv file1-repeats5.csv

Run the batch file like this:

C:> run composite.pdf

which will create a PDF file, composite.pdf, from the input CSV and TDV files given as command-line arguments.
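On the command-line length limit mentioned in footnote [1]: one possible workaround (not necessarily the solution referred to there, and not something PDFBuilder currently does) is to keep the input filenames in a text file and let the standard argparse module expand it. The sketch below uses argparse's fromfile_prefix_chars option; the list file name inputs.txt and the function build_parser are illustrative.

```python
import argparse
import os
import tempfile


def build_parser():
    # A leading '@' on an argument makes argparse read further
    # arguments from that file, one argument per line.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("pdf_filename")
    parser.add_argument("input_filenames", nargs="+")
    return parser


# Demo: write the input names to a list file, then pass '@<file>'.
list_dir = tempfile.mkdtemp()
list_path = os.path.join(list_dir, "inputs.txt")
with open(list_path, "w") as f:
    f.write("file1.csv\nfile1.tdv\nfile2.csv\n")

args = build_parser().parse_args(["composite.pdf", "@" + list_path])
demo_inputs = args.input_filenames

# Clean up the temporary list file and directory.
os.remove(list_path)
os.rmdir(list_dir)
print(args.pdf_filename, demo_inputs)
```

With a parser like this, the command line stays short no matter how many input files are listed in the text file.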

Enjoy.

- Vasudev Ram - Dancing Bison Enterprises


Saturday, November 3, 2012

PDFBuilder can create composite PDFs

By Vasudev Ram

PDFBuilder is a tool to create composite PDFs, i.e. PDFs comprising data from multiple different input data formats. It is a new component of my xtopdf toolkit for PDF generation.

At present, for input formats, PDFBuilder supports only CSV (Comma Separated Values, which can be exported from / imported to spreadsheets, among other things) and TDV / TSV (Tab Delimited Values / Tab Separated Values, which many UNIX / Linux tools like sed, grep and awk can create or process).

But support for more input formats can be added fairly easily, due to the design.
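CSV and TDV differ mainly in the field delimiter, which is part of why row-oriented formats are easy to add. The sketch below shows the idea with Python's standard csv module. It is an illustration written in Python 3 syntax, not how the CSVReader and TDVReader classes in xtopdf are implemented; the names DELIMITERS, delimiter_for and rows_to_lines are made up for this example.

```python
import csv
import io
import os

# Field delimiter for each supported extension (illustrative table).
DELIMITERS = {".csv": ",", ".tdv": "\t", ".tsv": "\t"}


def delimiter_for(filename):
    """Pick the delimiter from the filename extension, ignoring case."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError("unsupported input format: %r" % filename)
    return DELIMITERS[ext]


def rows_to_lines(text_file, delimiter):
    """Yield each row as a single string with columns joined by a space."""
    for row in csv.reader(text_file, delimiter=delimiter):
        yield " ".join(row)


# Demo with in-memory data instead of real files.
csv_lines = list(rows_to_lines(io.StringIO("1,apple\n2,banana\n"),
                               delimiter_for("fruits.CSV")))
tdv_lines = list(rows_to_lines(io.StringIO("3\torange\n4\tmango\n"),
                               delimiter_for("fruits.tdv")))
print(csv_lines, tdv_lines)
```

Each resulting line is the kind of string that gets written to the PDF, one per input row.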

PDFBuilder is included in xtopdf v1.4 (just released on Bitbucket).

To try PDFBuilder:

- Download xtopdf v1.4, then follow the steps in the file README.txt; the steps include installing Python (>= v2.2), if you don't have it already, and ReportLab v1.21. (The steps for installing ReportLab are here.)

Then run this command:

python PDFBuilder.py output.pdf

This will create a composite PDF file, output.pdf, from two CSV files and two TDV files (interleaved). This is hard-coded as of now, but will be changed to take a list of input files from the command-line.

The download includes the 4 input files and the corresponding output PDF file.

Note: The xtopdf links on SourceForge and my site dancingbison.com have not yet been updated for xtopdf v1.4, so don't try to get v1.4 from there, for now.

You can read more about the ReportLab toolkit here.

- Vasudev Ram - Dancing Bison Enterprises


Friday, September 2, 2011

Started CreatingPDF, a list of PDF creation libraries on Wikia

By Vasudev Ram - dancingbison.com | @vasudevram | jugad2.blogspot.com

Hi readers,

I started a list of libraries that help you create PDF programmatically, here on Wikia:

http://creatingpdf.wikia.com

The list will be across languages, i.e., will not be restricted to just one or a few programming languages.

Will update it over time with more PDF creation libs that I know of.

These are to be added: ReportLab (Python), FPDF (PHP), Haru / libharu (C), POCO PDF (C++, thin wrapper over Haru), PyFPDF (Python port of FPDF), iText (Java), iTextSharp (C#), PDF::Writer and Prawn (both Ruby), xtopdf (Python, mine, uses ReportLab, easier interface for plain text to PDF), and some Perl PDF libraries. There are lots more. If you have suggestions for libraries to add, feel free to email me; see my Contact page at:

http://www.dancingbison.com/contact.html

Posted via email.
