Wednesday, April 18, 2018

Support the PSF during the 2018 Fundraising Drive

By Vasudev Ram

Support the PSF during the 2018 Fundraising Drive

Excerpt from the above post by the PSF (Python Software Foundation):

[ The PSF is launching an exciting fundraising drive with a goal of raising $20,000.00 USD by May 12th. The drive begins April 16, 2018 and concludes at PyCon on May 12th.

Your donations help the Python community worldwide by supporting sprints, meetups, community events and projects, the Python Ambassador Program, fiscal sponsorships, and of course, software development and open source projects. All of these initiatives help improve the Python community and Python tools that you use daily. The work cannot be done without the generous financial support that individuals and organizations provide us.

It is easy to donate - simply click on the amount you would like to give, and enter your email address. Confirm your contribution and you will be able to pay with your PayPal account or a credit or debit card. Contributions are tax deductible for individuals and organizations in the United States. ]

Note: The original PSF post about the donation drive, linked above, did not have a clickable link. There is a link for the donation in that post, but it was plain text, not clickable, at the time when I wrote this post. (That might change by the time you read my post, if they notice and fix it.) If not, you will have to copy-and-paste that link manually into a browser tab and then it will take you to another page where you can donate.

@PSF: Please fix that link.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers



Sunday, April 15, 2018

compilerbook.org - Introduction to Compilers and Language Design by Prof. Douglas Thain

By Vasudev Ram

Came across this book today:

Introduction to Compilers and Language Design (compilerbook.org)

(a free online textbook by Douglas Thain)

Prof. Douglas Thain is Associate Professor, Computer Science and Engineering, University of Notre Dame, USA.

Excerpts from the page:

[ This online textbook is being released chapter-by-chapter during 2017. The complete book will be available for purchase in the spring 2018 semester.

This textbook is suitable for a one semester undergraduate course in compilers. Guided by this book, students can undertake construction of a compiler which accepts a C-like language and produces working X86 code. The textbook and materials have been developed by Prof. Douglas Thain as part of the CSE 40243 compilers class at the University of Notre Dame.

You are free to download, use, and print these PDFs for personal and academic use. Commercial printing or distribution is prohibited. Instead of copying PDFs, please point students to this page (compilerbook.org) so that they can access the latest version. If you enjoy holding a physical book (like I do!) you will be able to order an inexpensive hardcover edition in 2018. ]

I just read a bit of the book so far, but it seems quite good.

- Vasudev Ram - Online Python training and consulting




Quick-and-dirty disk free space checker for Windows

By Vasudev Ram


'I mean, if 10 years from now, when you are doing something quick and dirty, you suddenly visualize that I am looking over your shoulders and say to yourself "Dijkstra would not have liked this", well, that would be enough immortality for me.'

Dijkstra quote attribution

Hi readers,

[ This is the follow-up post that I said I would do after this previous post: Quick-and-clean disk usage utility in Python. This follow-up post describes the quick-and-dirty version of the disk space utility, which is the one I wrote first, before the quick-and-clean version linked above. Note that the two utilities do not give the exact same output - the clean one gives more information. Compare the outputs to see the difference. ]

I had a need to periodically check the free space on my disks in Windows. So I thought of semi-automating the process and came up with this quick-and-dirty utility for it. It used the DOS DIR command, a grep utility for Windows, and a simple Python script, all together in a pipeline, with the Python script processing the results provided by the previous two.

I will first show the Python script and then show its usage in a command pipeline together with the DIR command and a grep command. Then I will briefly discuss other possible ways of doing this same task.

Here is the Python script, disk_free_space.py:

from __future__ import print_function
import sys

# Author: Vasudev Ram
# Copyright 2018 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram
# Software mentoring: https://www.codementor.io/vasudevram

#for line in sys.stdin:

# The first readline (below) is to read and throw away the line with 
# just "STDIN" in it. We do this because the grep tool that is used 
# before this program in the pipeline (see dfs.bat below), adds a line 
# with "STDIN" before the real grep output.
# Another alternative is to use another grep which does not do that; 
# in that case, delete the first readline statement.
line = sys.stdin.readline()

# The second readline (below) gets the line we want, with the free space in bytes.
line = sys.stdin.readline()

if line.endswith("bytes free\n"):
    words = line.split()
    bytes_free_with_commas = words[2]
    try:
        free_space_mb = int(bytes_free_with_commas.replace(
            ",", "")) / 1024.0 / 1024.0
        free_space_gb = free_space_mb / 1024.0 
        print("{:.1f} MiB = {:.2f} GiB".format(
            free_space_mb, free_space_gb))
    except ValueError as ve:
        sys.stdout.write("{}: Caught ValueError: {}\n".format(
            sys.argv[0], str(ve)))
    #break

An alternative method is to remove the first readline call above, and un-comment the for loop line at the top, and the break statement at the bottom. In that approach, the program will loop over all the lines of stdin, but skip processing all of them except for the single line we want, the one that has the pattern "bytes free". This is actually an extra level of checking that mostly will not be needed, since the grep preceding this program in the pipeline, should filter out all lines except for the one we want.
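Here is a rough sketch of that loop-based alternative, written as a function so it can be tried without a pipeline. The function name free_space_from_lines is my own for illustration, and it picks the byte count as the third word from the end (rather than words[2] as in the script above), on the assumption that the line always ends with "bytes free":

```python
from __future__ import print_function


def free_space_from_lines(lines):
    """Scan lines for the DIR summary line; return (MiB, GiB) or None."""
    for line in lines:
        # Skip every line (for example a leading "STDIN" line)
        # except the one that ends with "bytes free".
        if not line.rstrip().endswith("bytes free"):
            continue
        words = line.split()
        # "bytes" and "free" are the last two words, so the
        # comma-separated byte count is the word just before them.
        free_bytes = int(words[-3].replace(",", ""))
        free_mib = free_bytes / 1024.0 / 1024.0
        # Stop at the first matching line, like the break statement.
        return free_mib, free_mib / 1024.0
    return None
```

In the pipeline you would call it as free_space_from_lines(sys.stdin) and print the result. A ValueError from int() would still need to be caught by the caller, as in the original script.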

For why I used MiB and GiB units instead of MB and GB, refer to this Wikipedia article: Mebibyte

Once we have the above program, we call it from the pipeline, which I have wrapped in this batch file, dfs.bat, for convenience, to get the end result we want:

@echo off
echo Disk free space on %1
dir %1 | grep "bytes free" | python c:\util\disk_free_space.py

Here is a run of dfs.bat to get disk free space information for drive D:\ :

$ dfs d:\
Disk free space on d:\
40103.0 MiB = 39.16 GiB

You can run dfs for both C: and D: in one single command like this:

$ dfs c:\ & dfs d:\

(It uses the Windows CMD operator & which means run the command to the left of the ampersand, then run the command to the right.)

Another way of doing the same task as this utility, is to use the Python psutil library. That way is shown in the quick-and-clean utility post linked near the top of this post. That way would be cross-platform, at least between Windows and Linux, as shown in that post. The only small drawback is that you have to install psutil for it to work, whereas this utility does not need it. This one does need a grep, of course.
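A further option, which I did not use in either utility, is shutil.disk_usage from the Python standard library (available in Python 3.3 and later, so not in Python 2). This is only a minimal sketch; the function name free_space_mib_gib is made up for this example:

```python
import shutil

BYTES_PER_MIB = 1024.0 * 1024.0


def free_space_mib_gib(path):
    """Return the free space for path as a (MiB, GiB) tuple."""
    # shutil.disk_usage returns a named tuple (total, used, free), in bytes.
    usage = shutil.disk_usage(path)
    free_mib = usage.free / BYTES_PER_MIB
    return free_mib, free_mib / 1024.0
```

It needs neither grep nor psutil, at the cost of requiring Python 3.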

Yet another way could be to use lower-level Windows file system APIs directly, to get the needed information. In fact, that is probably how psutil does it. I have not looked into that approach yet, but it might be interesting to do so. It would probably need a technique for calling C or C++ code from Python, such as ctypes, SWIG or cffi, since those Windows APIs are exposed as C interfaces. Check out this post for a very simple example on those lines:

Calling C from Python with ctypes
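As a rough idea of what the ctypes route could look like, here is a sketch that calls the Win32 function GetDiskFreeSpaceExW from kernel32. I have not checked whether psutil does it this way. The wrapper name free_bytes_windows is hypothetical, and returning None on non-Windows systems is just a choice made for this example:

```python
from __future__ import print_function
import ctypes
import os


def free_bytes_windows(path):
    """Return free bytes available to the caller, or None off Windows."""
    if os.name != "nt":
        # ctypes.windll only exists on Windows.
        return None
    free_to_caller = ctypes.c_ulonglong(0)
    total = ctypes.c_ulonglong(0)
    total_free = ctypes.c_ulonglong(0)
    # GetDiskFreeSpaceExW fills in the three 64-bit counters and
    # returns zero on failure.
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceExW(
        ctypes.c_wchar_p(path),
        ctypes.byref(free_to_caller),
        ctypes.byref(total),
        ctypes.byref(total_free))
    if not ok:
        raise ctypes.WinError()
    return free_to_caller.value
```

On Windows, free_bytes_windows("d:\\") should return the same kind of byte count that DIR prints before "bytes free".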

Enjoy.

- Vasudev Ram - Online Python training and consulting




Sunday, April 8, 2018

Quick-and-clean disk usage utility in Python

By Vasudev Ram


Hard disk image

Hard disk image attribution

Hi readers,

Recently, I thought that I should check the disk space on my PC more often, possibly because of having installed a lot of software on it over a period. As you know, these days, many software apps take up a lot of disk space, sometimes in the range of a gigabyte or more for one app. So I wanted a way to check more frequently whether my disks are close to getting full.

I thought of creating a quick-and-dirty disk free space checker tool in Python, to partially automate this task. Worked out how to do it, and wrote it - initially for Windows only. I called it disk_free_space.py. Ran it to check the disk free space on a few of my disk partitions, and it worked as intended.

Then I slapped my forehead as I realized that I could do it in a cleaner as well as more cross-platform way, using the psutil library, which I knew and had used earlier.

So I wrote another version of the tool using psutil, that I called disk_usage.py.

Here is the code for disk_usage.py:

#----------------------------------------------------------------------
#
# disk_usage.py
#
# Author: Vasudev Ram
# Copyright 2018 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram
# Software mentoring: https://www.codementor.io/vasudevram
#
# Description: A Python app to show disk usage.
# Usage: python disk_usage.py path
#
# For the path given as command-line argument, it shows 
# the percentage of space used, and the total, used and 
# free space, in both MiB and GiB. For definitions of
# MiB vs. MB and GiB vs. GB, see:
# https://en.wikipedia.org/wiki/Mebibyte
#
# Requires: The psutil module, see:
# https://psutil.readthedocs.io/
#
#----------------------------------------------------------------------

from __future__ import print_function
import sys
import psutil

BYTES_PER_MIB = 1024.0 * 1024.0

def disk_usage_in_mib(path):
    """ Return disk usage data in MiB. """
    # Here percent means percent used, not percent free.
    total, used, free, percent = psutil.disk_usage(path)
    # psutil returns usage data in bytes, so convert to MiB.
    return (total/BYTES_PER_MIB, used/BYTES_PER_MIB,
            free/BYTES_PER_MIB, percent)

def main():
    if len(sys.argv) == 1:
        print("Usage: python {} path".format(sys.argv[0]))
        print("Shows the disk usage for the given path (file system).")
        sys.exit(0)
    path = sys.argv[1]
    try:
        # Get disk usage data.
        total_mib, used_mib, free_mib, percent = disk_usage_in_mib(path)
        # Print disk usage data.
        print("Disk Usage for {} - {:.1f} percent used. ".format(
            path, percent))
        print("In MiB: {:.0f} total; {:.0f} used; {:.0f} free.".format(
            total_mib, used_mib, free_mib))
        print("In GiB: {:.3f} total; {:.3f} used; {:.3f} free.".format(
            total_mib/1024.0, used_mib/1024.0, free_mib/1024.0))
    except OSError as ose:
        sys.stdout.write("{}: Caught OSError: {}\n".format(
            sys.argv[0], str(ose)))
    except Exception as e:
        sys.stdout.write("{}: Caught Exception: {}\n".format(
            sys.argv[0], str(e)))

if __name__ == '__main__':
    main()

Here is the output from running it a few times:

On Linux:
$ df -BM -h /
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/precise32-root   79G  5.2G   70G   7% /

$ python disk_usage.py /
Disk Usage for / - 6.8 percent used.
In MiB: 80773 total; 5256 used; 71472 free.
In GiB: 78.880 total; 5.132 used; 69.797 free.

$ df -BM -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       228M   24M  192M  12% /boot

$ python disk_usage.py /boot
Disk Usage for /boot - 11.1 percent used.
In MiB: 228 total; 24 used; 192 free.
In GiB: 0.222 total; 0.023 used; 0.187 free.

On Windows:
$ python disk_usage.py d:\
Disk Usage for d:\ - 59.7 percent used.
In MiB: 100000 total; 59667 used; 40333 free.
In GiB: 97.656 total; 58.268 used; 39.388 free.

$ python disk_usage.py h:\
Disk Usage for h:\ - 28.4 percent used.
In MiB: 100 total; 28 used; 72 free.
In GiB: 0.098 total; 0.028 used; 0.070 free.

I had to tweak the df command invocation to be as you see it above, to make the results of df match those of my program. This is because of the difference in calculating MB vs. MiB and GB vs. GiB - see the Wikipedia link in the header comment of my program above, if you do not know the differences.
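To make the MB vs. MiB difference concrete, here is a small sketch of the arithmetic. The helper names are mine, not part of disk_usage.py:

```python
from __future__ import print_function

BYTES_PER_MB = 1000.0 ** 2    # decimal megabyte: 10^6 bytes
BYTES_PER_MIB = 1024.0 ** 2   # binary mebibyte: 2^20 bytes


def mib_to_mb(mib):
    """Convert mebibytes to decimal megabytes."""
    return mib * BYTES_PER_MIB / BYTES_PER_MB


def mib_to_gib(mib):
    """Convert mebibytes to gibibytes (divide by 1024)."""
    return mib / 1024.0


def mib_to_gb(mib):
    """Convert mebibytes to decimal gigabytes (10^9 bytes)."""
    return mib_to_mb(mib) / 1000.0
```

For the d:\ drive above, reported as 100000 MiB total, this gives 104857.6 MB, and 97.656 GiB versus about 104.858 GB, so the two unit systems differ by nearly 5 percent at this size.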

So this program using psutil is both cleaner and more cross-platform than my original quick-and-dirty one which was only for Windows, but which did not need psutil installed. Pros and cons for both. I will show the latter program in a following post.

The image at the top of the post is of "a newer 2.5-inch (63.5 mm) 6,495 MB HDD compared to an older 5.25-inch full-height 110 MB HDD".

Some years ago I worked in system engineer roles where I encountered such older models of hard disks. I learned a lot from solving problems related to them, mainly on Unix machines, sometimes using Unix commands and tricks of the trade that I learned or discovered to recover data from systems where the machine or the hard disk had crashed, and of course, often without backups available. Here is one such anecdote, which I later wrote up and published as an article for Linux For You magazine (now called Open Source For You):

How Knoppix saved the day.

Talk of Murphy's Law ...

Enjoy.

- Vasudev Ram - Online Python training and consulting




Saturday, March 31, 2018

Checking if web sites are online with Python

By Vasudev Ram

Hi readers,

Recently, I thought of writing a small program to check if one or more web sites are online or not. I used the requests Python library with the HTTP HEAD method. I also checked out PycURL for this. It is a thin wrapper over libcurl, the library that powers the well-known and widely used curl command line tool. While PycURL looks powerful and fast (since it is a thin wrapper that exposes most or all of the functionality of libcurl), I decided to use requests for this version of the program. The code for the program is straightforward, but I found a few interesting things while running it with a few different sites as arguments. I mention those points below.

Here is the tool: I named it is_site_online.py:

"""
is_site_online.py
Purpose: A Python program to check if a site is online or not.
Uses the requests library and the HTTP HEAD method.
Tries both with and without HTTP redirects.
Author: Vasudev Ram
Copyright 2018 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""

from __future__ import print_function
import sys
import requests
import time

if len(sys.argv) < 2:
    # End each message with a newline so the two do not run together.
    sys.stderr.write("Usage: {} site ...\n".format(sys.argv[0]))
    sys.stderr.write("Checks if the given site(s) are online or not.\n")
    sys.exit(0)

print("Checking if these sites are online or not:")
print("   ".join(sys.argv[1:]))

print("-" * 60)
try:
    for site in sys.argv[1:]:
        for allow_redirects in (False, True):
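            # time.clock() was removed in Python 3.8 (and measured CPU
            # time on Unix), so use wall-clock time.time() instead.
            tc1 = time.time()
            r = requests.head(site, allow_redirects=allow_redirects)
            tc2 = time.time()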
            print("Site:", site)
            print("Check with allow_redirects =", allow_redirects)
            print("Results:")
            print("r.ok:", r.ok)
            print("r.status_code:", r.status_code)
            print("request time:", round(tc2 - tc1, 3), "secs")
            print("-" * 60)
except requests.ConnectionError as ce:
    print("Error: ConnectionError: {}".format(ce))
    sys.exit(1)
except requests.exceptions.MissingSchema as ms:
    print("Error: MissingSchema: {}".format(ms))
    sys.exit(1)
except Exception as e:
    print("Error: Exception: {}".format(e))
    sys.exit(1)

The results of some runs of the program:

Check for Google and Yahoo!:

$ python is_site_online.py http://google.com http://yahoo.com
Checking if these sites are online or not:
http://google.com   http://yahoo.com
-----------------------------------------------------------
Site: http://google.com
Check with allow_redirects = False
Results:
r.ok: True
r.status_code: 302
request time: 0.217 secs
------------------------------------------------------------
Site: http://google.com
Check with allow_redirects = True
Results:
r.ok: True
r.status_code: 200
request time: 0.36 secs
------------------------------------------------------------
Site: http://yahoo.com
Check with allow_redirects = False
Results:
r.ok: True
r.status_code: 301
request time: 2.837 secs
------------------------------------------------------------
Site: http://yahoo.com
Check with allow_redirects = True
Results:
r.ok: True
r.status_code: 200
request time: 1.852 secs
------------------------------------------------------------

In the cases where allow_redirects is False, google.com gives a status code of 302 and yahoo.com gives a status code of 301. The 3xx series of codes are related to HTTP redirection.
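Note that r.ok is True for the 301 and 302 responses as well as for 200, because requests treats any status code below 400 as ok. A small sketch for looking up what a code means, using http.HTTPStatus from the standard library (Python 3.5+ only, unlike the program above); describe_status is a made-up helper name:

```python
from http import HTTPStatus


def describe_status(code):
    """Return (reason phrase, is it a redirect?, would r.ok be True?)."""
    phrase = HTTPStatus(code).phrase
    is_redirect = 300 <= code < 400
    # requests reports r.ok as True when the status code is below 400.
    ok = code < 400
    return phrase, is_redirect, ok
```

For example, describe_status(302) gives the phrase "Found" and marks it as a redirect, while describe_status(301) gives "Moved Permanently".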

After seeing this, I looked up HTTP status code information in a few sites such as Wikipedia and the official site www.w3.org (the World Wide Web Consortium), and found a point worth noting. See the part in the Related links section at the end of this post about "302 Found", where it says: "This is an example of industry practice contradicting the standard.".

Now let's check for some error cases:

One error case: the user does not give an http:// prefix (imagine a novice who is mixed up about schemes and paths) and types a garbled site name, say http.om:

$ python is_site_online.py http.om
Checking if these sites are online or not:
http.om
------------------------------------------------------------
Traceback (most recent call last):
  File "is_site_online.py", line 32, in <module>
    r = requests.head(site, allow_redirects=allow_redirects)
[snip long traceback]
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'http.om':
No schema supplied. Perhaps you meant http://http.om?

This traceback tells us that when no HTTP 'scheme' [1][2] is given, requests raises a MissingSchema exception. So we now know that we need to catch that exception in our code, by adding another except clause to the try statement, which I later did, in the program you see in this post. In general, this technique can be useful when using a new Python library for the first time: just don't handle any exceptions in the beginning, use it a few times with variations in input or modes of use, and see what sorts of exceptions it throws. Then add code to handle them.

[1] The components of a URL

[2] Parts of URL
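Besides catching MissingSchema, the scheme could also be checked up front, before any request is made. This is only a sketch of that idea, not something the program in this post does. It uses urlparse from the Python 3 standard library, and has_http_scheme is a hypothetical helper name:

```python
from urllib.parse import urlparse


def has_http_scheme(url):
    """Return True if url starts with an http or https scheme."""
    # urlparse gives an empty scheme for input like "http.om",
    # since there is no "scheme:" prefix to split off.
    return urlparse(url).scheme in ("http", "https")
```

A caller could print a friendlier message for arguments like http.om when has_http_scheme returns False, instead of relying on the exception.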

Another error case - a made-up site name that does not exist:

$ python is_site_online.py http://abcd.efg
Checking if these sites are online or not:
http://abcd.efg
------------------------------------------------------------
Caught ConnectionError: HTTPConnectionPool(host='abcd.efg',
port=80): Max retries exceeded with url: / (Caused
by NewConnectionError(': Failed
to establish a new connection: [Errno 11004] getaddrinfo
failed',))

From the above error we can see or figure out a few things:

- the requests library defines a ConnectionError exception. I first ran the above command without catching ConnectionError in the program; it gave that error, then I added the handler for it.

- requests uses an HTTP connection pool

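- the "Max retries exceeded" wording comes from the retry logic that sits under get() and head(); note that requests does not retry failed connections by default (the max_retries setting of its HTTPAdapter defaults to 0), so this message can appear after a single attempt

- requests uses urllib3 under the hood (urllib3 is a third-party library, not part of the Python standard library)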

I had discovered that last point earlier too; see this post:

urllib3, the library used by the Python requests library

And as I mentioned in that post, urllib3 itself uses httplib (named http.client in Python 3).

Now let's check for some sites that are misspellings of the site google.com:

$ python is_site_online.py http://gogle.com
Checking ...
------------------------------------------------------------
Site: http://gogle.com
With allow_redirects: False
Results:
r.ok: True
r.status_code: 301
request time: 3.377
------------------------------------------------------------
Site: http://gogle.com
With allow_redirects: True
Results:
r.ok: True
r.status_code: 200
request time: 1.982
------------------------------------------------------------

$ python is_site_online.py http://gooogle.com
Checking ...
------------------------------------------------------------
Site: http://gooogle.com
With allow_redirects: False
Results:
r.ok: True
r.status_code: 301
request time: 0.425
------------------------------------------------------------
Site: http://gooogle.com
With allow_redirects: True
Results:
r.ok: True
r.status_code: 200
request time: 1.216
------------------------------------------------------------

Interestingly, the results show that both those misspellings of google.com exist as sites.

It is known that some people register domains that are similar in spelling to well-known / popular / famous domain names, maybe hoping to capture some of the traffic resulting from users mistyping the famous ones. Although I did not plan it that way, I realized, from the above two results for gogle.com and gooogle.com, that this tool can be used to detect the existence of such sites (if they are online when you check, of course).
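If you wanted to use the tool that way, you would first need a list of candidate misspellings to pass to it. Here is a minimal sketch that generates two simple kinds of typos, a dropped letter and a doubled letter. The function name typo_variants and the choice of typo kinds are my own assumptions; real typo-domain lists use many more patterns:

```python
def typo_variants(domain):
    """Return sorted misspellings of domain: one letter dropped or doubled."""
    # Split "google.com" into "google", ".", "com"; only the name is varied.
    name, dot, suffix = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])          # drop letter i
        variants.add(name[:i] + name[i] + name[i:])    # double letter i
    variants.discard(name)   # never report the correct spelling
    variants.discard("")     # a one-letter name has no dropped form
    return sorted(v + dot + suffix for v in variants)
```

For google.com, the list includes both gogle.com and gooogle.com. Each variant would still need an http:// prefix before being passed to is_site_online.py.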

Related links:

Wikipedia: List_of_HTTP_status_codes

This excerpt from the above Wikipedia page is interesting:

[ 302 Found This is an example of industry practice contradicting the standard. The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was "Moved Temporarily"),[22] but popular browsers implemented 302 with the functionality of a 303 See Other. Therefore, HTTP/1.1 added status codes 303 and 307 to distinguish between the two behaviours.[23] However, some Web applications and frameworks use the 302 status code as if it were the 303.[24] ]

3xx Redirection

W3C: Status Codes

URL redirection

requests docs: redirection section

IBM Knowledge Center: HTTP Status codes and reason phrases

Enjoy.

- Vasudev Ram - Online Python training and consulting
