Wednesday, October 1, 2014

Cloudera acquires DataPad; has Python client for Impala (SQL on Hadoop)

By Vasudev Ram


Saw this via a tweet by GigaOm.

Cloudera has acquired DataPad.

GigaOm article about it:

Cloudera bought DataPad because data scientists need tooling, too

Summary of the GigaOm article about the acquisition:

[ Cloudera has acquired a data-visualization startup called DataPad, the founding team of which specializes in data analysis using the Python programming language [1]. As Hadoop competition heats up, Cloudera might be ramping up its Python tooling in order to attract more data scientists and developers. ]

[1] The founders of DataPad (Wes McKinney and Chang She) are also the creators of the pandas Python library for data analysis.

Here are a few other interesting links related to Cloudera buying DataPad:

A New Python Client for Impala

I had blogged earlier about Cloudera's Impala engine that allows SQL querying of Hadoop data:

Cloudera's Impala engine - SQL querying of Hadoop data

SQL coming to Hadoop

Cloudera: Impala’s it for interactive SQL on Hadoop; everything else will move to Spark

Apache Spark - "Lightning-fast cluster computing"

Apache Spark page on Wikipedia

WIRED magazine article about Apache Spark:

Open Source Superstar Rewrites Future of Big Data

- Vasudev Ram - Dancing Bison Enterprises


Contact Page

Monday, September 29, 2014

CommonMark, a pure Python Markdown parser and renderer


By Vasudev Ram

I got to know about CommonMark.org via this post on the Python Reddit:

CommonMark.py - pure Python Markdown parser and renderer

From what I could gather, CommonMark is, or aims to be, two things:

1. "A standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests".

2. A Python parser and renderer for the CommonMark Markdown spec.

CommonMark on PyPI, the Python Package Index.

Excerpts from the CommonMark.org site:

[ We propose a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification. We believe this is necessary, even essential, for the future of Markdown. ]

[ Who are you?
We're a group of Markdown fans who either work at companies with industrial scale deployments of Markdown, have written Markdown parsers, have extensive experience supporting Markdown with end users – or all of the above.

John MacFarlane
David Greenspan
Vicent Marti
Neil Williams
Benjamin Dumke-von der Ehe
Jeff Atwood ]

So I installed the Python library for it with:

pip install commonmark

Then I modified this snippet of example code from the CommonMark page on PyPI:

import CommonMark
parser = CommonMark.DocParser()
renderer = CommonMark.HTMLRenderer()
print(renderer.render(parser.parse("Hello *World*")))

on my local machine, to add a few more types of Markdown syntax:

import CommonMark
parser = CommonMark.DocParser()
renderer = CommonMark.HTMLRenderer()
markdown_string = \
"""
Heading
=======
 
Sub-heading
-----------
 
# Atx-style H1 heading.
## Atx-style H2 heading.
### Atx-style H3 heading.
#### Atx-style H4 heading.
##### Atx-style H5 heading.
###### Atx-style H6 heading.
 
Paragraphs are separated
by a blank line.
 
Leave 2 spaces at the end of a line to do a  
line break
 
Text attributes *italic*, **bold**, `monospace`.
 
A [link](http://example.com).
 
Shopping list:
 
  * apples
  * oranges
  * pears
 
Numbered list:
 
  1. apples
  2. oranges
  3. pears
 
"""
print(renderer.render(parser.parse(markdown_string)))

Here is a screenshot of the output HTML generated by CommonMark, loaded in Google Chrome:
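
A note for anyone trying this with a newer release of the package: as far as I know, the module was later renamed to lowercase commonmark, with Parser and HtmlRenderer taking the place of DocParser and HTMLRenderer. Here is a minimal sketch under that assumption. The render_markdown helper is my own name for illustration, not part of the library, and it returns None if the package is not installed.

```python
# Assumes a newer release of the commonmark package (lowercase module name,
# Parser and HtmlRenderer classes). The 2014 release used the older names
# CommonMark, DocParser and HTMLRenderer.
try:
    import commonmark
except ImportError:  # package not installed
    commonmark = None


def render_markdown(text):
    """Hypothetical helper: return HTML for a Markdown string,
    or None if the commonmark package is unavailable."""
    if commonmark is None:
        return None
    parser = commonmark.Parser()
    renderer = commonmark.HtmlRenderer()
    return renderer.render(parser.parse(text))


html = render_markdown("Hello *World*")
print(html)
```

If the names differ in the version you have installed, check the package's page on PyPI for the current usage.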


Reddit user bracewel, who seems to be a CommonMark team member, said on the Python Reddit thread:

[ eventually we'd like to add a few more renderers, PDF/RTF being the first.... ]

So CommonMark looks interesting and worth keeping an eye on, IMO.

- Vasudev Ram - Dancing Bison Enterprises - Python training and consulting


Sunday, September 28, 2014

My IBM developerWorks article: Developing a Linux command-line utility

By Vasudev Ram

Some years ago, I wrote an article, Developing a Linux command-line utility, for IBM developerWorks (IBM dW). It was a tutorial on how to write Linux command-line utilities in C. It used a real-life Linux utility that I had written earlier [1] to show some of the techniques involved in writing such utilities for general-purpose use.

[1] I had originally written the utility for production use for one of the largest motorcycle manufacturers in the world.

The article was fairly well received while it was on the site (for a long time) and received multiple four-star ratings (out of a possible five stars). It was viewed over 35,000 times. Since it was recently archived from the IBM dW site, I have put up the article as a PDF file [2], with the accompanying source code, in a project on my Bitbucket account, for the benefit of those interested in learning how to write Linux command-line utilities in C. The name of the utility is selpg (for select pages), so I named the project selpg on Bitbucket too.

[2] I got to know that the article had been archived from the IBM dW site, and wrote to them asking for a copy of the PDF of the article, which they kindly sent me.
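
For readers unfamiliar with the idea, here is a minimal Python sketch of what "selecting pages" from text can look like. It is only an illustration, not the C code from the article. The function name select_pages and the lines_per_page parameter are hypothetical, the default of 72 is just a placeholder, and it assumes that pages are fixed-length blocks of lines.

```python
def select_pages(lines, start_page, end_page, lines_per_page=72):
    """Return the lines on pages start_page..end_page (1-based, inclusive).

    Hypothetical helper for illustration only; pages are assumed to be
    fixed-length blocks of lines_per_page lines.
    """
    if start_page < 1 or end_page < start_page:
        raise ValueError("need 1 <= start_page <= end_page")
    first = (start_page - 1) * lines_per_page  # index of first line wanted
    last = end_page * lines_per_page           # index just past the last line
    return lines[first:last]


# Ten numbered lines, three lines per page: page 2 holds lines 4 to 6.
sample = ["line %d" % n for n in range(1, 11)]
page_two = select_pages(sample, 2, 2, lines_per_page=3)
print(page_two)
```

A short last page is handled naturally by the slice: asking for page 4 of the sample above returns only the one remaining line.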

Here is the selpg project on Bitbucket:

Developing a Linux command-line utility (selpg)

And you can get the article and all the source files here:

selpg source

In an upcoming post, I'll show a few practical uses of the selpg utility.

Enjoy.

- Vasudev Ram - Dancing Bison Enterprises. Python, C, Linux and open source consulting and training.


Wednesday, September 24, 2014

How Guido nearly dropped out - and then dropped in

By Vasudev Ram

Interview with Guido van Rossum, creator of the Python language.

He says in the interview that he nearly dropped out of school - but was dissuaded from doing so by his manager and professor.

Excerpt:

[ I was actually close to dropping out.

Oh my gosh! Why?

The job was so fun, and studying for exams wasn’t. Fortunately, my manager at the data center, as well as one of my professors, cared enough about me to give me small nudges in the direction of, “Well, maybe it would be smart to graduate, and then you can do this full-time!” (laughing) ]

Later, of course, he dropped in ... to the box.

- Vasudev Ram - Dancing Bison Enterprises


Tuesday, September 23, 2014

Yahoo buys Indian startup Bookpad

By Vasudev Ram



Just saw this on GigaOm via Twitter: Yahoo is buying a one-year-old Bangalore-based Indian startup called Bookpad. Bookpad has a product called Docspad, which is used for viewing and editing documents.

Links about the Yahoo acquisition of Bookpad on some news sites:

On GigaOm:

Yahoo buys Indian startup Bookpad for document viewing and editing capabilities

On the Economic Times:

Yahoo buys Bangalore-based tech startup Bookpad for Rs 50 crore

On TechCrunch:

Yahoo Acquires Bangalore-Based Bookpad, Makers Of Online File Editing And Collaboration Software Docspad




- Vasudev Ram - Dancing Bison Enterprises
