pattern is a web mining and NLP (Natural Language Processing) library for Python.
It is from CLiPS (Computational Linguistics & Psycholinguistics), "a research center associated with the Linguistics department of the faculty of Arts of the University of Antwerp."
From the site:
[ It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics), clustering and classification (k-means, KNN, SVM), and data visualization (graph networks). ]
Example usage and output - from the site:
>>> from pattern.web import Twitter, plaintext
>>> for tweet in Twitter().search('"more important than"', cached=False):
>>>     print plaintext(tweet.description)

'HINT: The mobile web is more important than mobile apps.'
'Start slowly, direction is more important than speed.'
'Imagination is more important than knowledge. - Albert Einstein'
...

I installed it (downloaded the zip file, extracted it, and ran "python setup.py install"), then tried it out with the above test program and a few variations on it. It partially works: it is able to fetch some tweets, but in some cases it gives errors that seem to be related to Unicode.
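I have not pinned down the exact cause of those errors. One common cause of this kind of failure is printing text that contains non-ASCII characters to a console whose encoding cannot represent them. The sketch below assumes that is what is happening. The helper name safe_text is my own (hypothetical, not part of pattern), and the sample strings are made up for illustration. It replaces any character the target encoding cannot represent with "?", so that printing does not raise an error:

```python
import sys


def safe_text(text, encoding=None):
    """Return text with characters the target encoding cannot
    represent replaced by '?'.

    Hypothetical helper for illustration; not part of pattern.
    """
    # Fall back to the console encoding, then to ASCII, if none is given.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encoding with "replace" substitutes '?' for unencodable characters;
    # decoding again gives back a text string that is safe to print.
    return text.encode(encoding, "replace").decode(encoding)


# Made-up sample strings standing in for fetched tweet text.
samples = [
    u"Imagination is more important than knowledge.",
    u"caf\u00e9 \u2013 na\u00efve",
]

for s in samples:
    print(safe_text(s, "ascii"))
```

In the test program above, this would mean printing safe_text(plaintext(tweet.description)) instead of printing the text directly. The trade-off is that unrepresentable characters are lost, so this is only a workaround for display, not a fix for the underlying encoding mismatch.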
It also has an NLP module for English and a few other languages, among other features.
UPDATE:
It is now working. Got it to fetch these recent tweets of mine (from my @vasudevram Twitter profile):
IGNORE THIS (testing a Twitter tool). test===444
IGNORE THIS (testing a Twitter tool). test===333
IGNORE THIS (testing a Twitter tool). test===222
IGNORE THIS (testing a Twitter tool). test===111
- Vasudev Ram - Dancing Bison Enterprises