Saturday, March 8, 2014

The html5lib Python library (and Animatron :-)

By Vasudev Ram



I came across the html5lib Python library recently. The site describes it thusly:

"html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers."

So it doesn't say explicitly that it is for parsing HTML5, though the library's name does include a "5". But I tried it out on a simple HTML5 document, and it seems to be able to parse HTML5, at least the few HTML5 elements I tried it on.

Here's the code I used to try out html5lib:
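# test_html5lib.py
# A program to try out the html5lib Python library.
# Author: Vasudev Ram - www.dancingbison.com
# The print_function import lets this run under both Python 2 and Python 3.
from __future__ import print_function

import html5lib

# Open the file in binary mode so that html5lib can detect the encoding
# itself, and let the with statement close the file when done.
with open("html5doc.html", "rb") as f:
    # With the default "etree" tree builder, parse() returns an
    # xml.etree element for the document's root (html) element.
    tree = html5lib.parse(f)

print("tree:")
print(repr(tree))
print()
print("items in tree:")

# Go three levels down: the root's children (head, body),
# their children, and their grandchildren.
for item in tree:
    print(item)
    for item2 in item:
        print("-" * 4, item2)
        for item3 in item2:
            print("-" * 8, item3)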
And here is the relevant part of the output of running python test_html5lib.py, listing the items in the tree:

<Element u'{http://www.w3.org/1999/xhtml}head' at 0x02B663C8>
<Element u'{http://www.w3.org/1999/xhtml}body' at 0x02B66488>
---- <Element u'{http://www.w3.org/1999/xhtml}header' at 0x02B664B8>
-------- <Element u'{http://www.w3.org/1999/xhtml}h1' at 0x02B66530>
-------- <Element u'{http://www.w3.org/1999/xhtml}h2' at 0x02B664E8>
-------- <Element u'{http://www.w3.org/1999/xhtml}h3' at 0x02B665F0>
---- <Element u'{http://www.w3.org/1999/xhtml}p' at 0x02B66650>
---- <Element u'{http://www.w3.org/2000/svg}svg' at 0x02B66BC0>
-------- <Element u'{http://www.w3.org/2000/svg}defs' at 0x02B66B60>
-------- <Element u'{http://www.w3.org/2000/svg}rect' at 0x02B66B30>
-------- <Element u'{http://www.w3.org/2000/svg}text' at 0x02B66BD8>
---- <Element u'{http://www.w3.org/1999/xhtml}footer' at 0x02B66BF0>
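Two things stand out in that output: the nested loops stop at three levels, and html5lib puts the HTML elements in the XHTML namespace but the inline svg element and its children in the SVG namespace. The sketch below is Python 3 and walks the tree recursively to any depth, splitting each "{namespace}name" tag. It assumes, as the html5lib docs say, that the default "etree" tree builder returns xml.etree elements. So that it runs without html5lib installed, it builds a small hand-written stand-in document with xml.etree.ElementTree.fromstring. The names split_tag, walk, tree_lines and SAMPLE are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs seen in the html5lib output above.
XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def split_tag(tag):
    """Split a Clark-notation tag such as '{ns}p' into (ns, 'p')."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def walk(elem, depth=0):
    """Yield (depth, namespace, local_name) for elem and every descendant,
    in document order, however deeply nested."""
    if not isinstance(elem.tag, str):
        # Comments and processing instructions: ElementTree stores a
        # factory function, not a string, as their tag, so skip them.
        return
    ns, local = split_tag(elem.tag)
    yield depth, ns, local
    for child in elem:
        yield from walk(child, depth + 1)


def tree_lines(root):
    """Describe root's descendants (not root itself), with 4 dashes per
    level like test_html5lib.py, and name any non-XHTML namespace."""
    lines = []
    for child in root:
        for depth, ns, local in walk(child):
            prefix = "-" * (4 * depth)
            label = local if ns == XHTML_NS else f"{local} ({ns})"
            lines.append(f"{prefix} {label}" if prefix else label)
    return lines


# Hypothetical stand-in for html5lib.parse() output: HTML elements in the
# XHTML namespace, inline SVG in the SVG namespace.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    "<body><header><h1>A</h1></header>"
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect width="10" height="10"/></svg>'
    "</body></html>"
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for line in tree_lines(root):
        print(line)
    # Namespace-aware search: every SVG <rect>, at any depth.
    rects = root.findall(".//svg:rect", {"svg": SVG_NS})
    print(len(rects), "SVG rect element(s)")
```

With html5lib installed, passing the tree returned by html5lib.parse() to tree_lines should work the same way, assuming the default etree tree builder is used.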

Here is the documentation for html5lib.

And speaking of HTML5, I coincidentally came across Animatron via Hacker News today:



Animatron is "a simple and powerful online tool that allows you to create stunning HTML5 animations and interactive content." Animatron is not really related to html5lib, except that both involve HTML5, but it looks cool. Check it out.

Hacker News thread about Animatron.

Enjoy.


- Vasudev Ram - Dancing Bison Enterprises

Contact Page
