Monday, August 9, 2010

Docsplit command-line utility and Ruby library - convert / extract between / from text, images, PDF, HTML, DOC, DOCX, PPT, XLS, ODF, RTF, SWF, SVG, WPD

By Vasudev Ram

Saw this via @gnat on O'Reilly Radar.

From the Docsplit site:

Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)
Docsplit is currently at version 0.3.1.
Docsplit is an open-source component of DocumentCloud.
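
To give a feel for the "component parts" idea, here is a minimal Ruby sketch that only builds the command lines for the docsplit utility and does not run them. The subcommand names (text, images, pages, pdf) and the --output option are my reading of the Docsplit documentation and may differ in version 0.3.1; the helper name docsplit_command, its parameters, and the sample file name are hypothetical and are not part of Docsplit itself.

```ruby
require "shellwords"

# Subcommands matching the component parts named in the description above.
# Assumption: these names come from the Docsplit docs, not from this post.
DOCSPLIT_PARTS = %w[text images pages pdf].freeze

# Build (but do not run) the argument vector for one docsplit call.
# `part`, `document` and `output_dir` are hypothetical parameter names.
def docsplit_command(part, document, output_dir: nil)
  unless DOCSPLIT_PARTS.include?(part.to_s)
    raise ArgumentError, "unknown part: #{part}"
  end
  argv = ["docsplit", part.to_s, document]
  # Assumption: --output selects the directory for the extracted files.
  argv += ["--output", output_dir] if output_dir
  argv
end

# When run as a script, print one shell-escaped command line for inspection.
if $PROGRAM_NAME == __FILE__
  cmd = docsplit_command(:text, "annual report.pdf", output_dir: "out")
  puts Shellwords.join(cmd)
end
```

If the docsplit gem and its dependencies are installed, the array returned by docsplit_command could be passed to Ruby's system(*argv) to actually run the extraction. Building the array first keeps file names with spaces safe from shell quoting problems.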
- Vasudev Ram - Dancing Bison Enterprises

