Thursday, August 23, 2012
Fulltext is a simple Python library for converting document and media files to text. Its main purpose is for use with full-text indexing systems.
http://pypi.python.org/pypi/fulltext/0.1-1 (the site was returning an error at the time of writing)
For example, to easily extract text from a PDF file:
> import fulltext
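To give a rough idea of what a wrapper like this does for a PDF, here is a minimal Python 3 sketch in the same spirit. It is not fulltext's own code: `compact`, `run_tool` and `pdf_to_text` are hypothetical names, it assumes the `pdftotext` utility (from poppler or xpdf) is installed, and "packing the text into little space" is approximated here by collapsing runs of whitespace.

```python
import re
import subprocess


def compact(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def run_tool(cmd):
    """Run a command-line converter and return its stdout, compacted.

    Raises subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return compact(result.stdout)


def pdf_to_text(path):
    # Hypothetical helper: assumes pdftotext is on the PATH.
    # Passing "-" as the output file makes pdftotext write to stdout.
    return run_tool(["pdftotext", path, "-"])
```

Usage would be something like `print(pdf_to_text("report.pdf"))`, where `report.pdf` is just a placeholder file name.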
Excerpt from the GitHub site for fulltext:
[ Fulltext is a library that makes converting various file formats to plain text simple. Mostly it is a wrapper around shell tools. It will execute the shell program, scrape its results and then post-process the results to pack as much text into as little space as possible.
The following formats are supported using the command line apps listed.
application/x-tar, gzip: tar & gunzip
application/x-tar, bzip2: tar & bunzip2
application/octet-stream: strings ]
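The "application/x-tar, gzip" notation in that list looks like the (type, encoding) pair that Python's standard `mimetypes.guess_type()` returns for a file name such as `backup.tar.gz`. The sketch below, again not the library's code, picks a tool label from that pair and falls back to a rough pure-Python stand-in for `strings` (printable ASCII runs of at least four bytes, the Unix tool's default minimum length). `TOOLS`, `pick_tool` and `strings_like` are hypothetical names, and sending every unrecognized file to the `strings` fallback is an assumption.

```python
import mimetypes
import re

# Hypothetical table: (MIME type, encoding) pairs from mimetypes.guess_type()
# mapped to the tool names in the excerpt. These are labels only; nothing is run.
TOOLS = {
    ("application/x-tar", "gzip"): "tar & gunzip",
    ("application/x-tar", "bzip2"): "tar & bunzip2",
}


def pick_tool(path):
    """Choose a tool from the guessed type; assume binary (`strings`) otherwise."""
    return TOOLS.get(mimetypes.guess_type(path), "strings")


def strings_like(data, min_len=4):
    """Approximate `strings`: printable ASCII runs of at least min_len bytes, one per line."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return "\n".join(m.group().decode("ascii") for m in pattern.finditer(data))
```

For example, `pick_tool("backup.tar.bz2")` gives `"tar & bunzip2"`, while `strings_like(b"\x00hello\x01ab\x02world!")` keeps `hello` and `world!` and drops the two-byte run `ab`.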
Inspired by nature.
- dancingbison.com | @vasudevram | jugad2.blogspot.com