By Vasudev Ram
String image attribution
Recently, something I read made me think of writing a simple program to generate all substrings of a given string.
(To be precise, excluding the null string.)
Here is an initial version I came up with, all_substrings.py:
""" all_substrings.py Function and program to find all the substrings of a given string. Author: Vasudev Ram Copyright 2018 Vasudev Ram Web site: https://vasudevram.github.io Blog: https://jugad2.blogspot.com Twitter: https://mobile.twitter.com/vasudevram Product Store: https://gumroad.com/vasudevram """ from __future__ import print_function import sys from error_exit import error_exit from debug1 import debug1 def usage(): message_lines = [\ "Usage: python {} a_string".format(sa[0]), "Print all substrings of a_string.", ] sys.stderr.write("\n".join(message_lines)) def all_substrings(s): """ Generator function that yields all the substrings of a given string. """ ls = len(s) if ls == 0: usage() error_exit("\nError: String argument must be non-empty.") start = 0 while start < ls: end = start + 1 while end <= ls: debug1("s[{}:{}] = {}".format(start, end, s[start:end])) yield s[start:end] end += 1 start += 1 def main(): if lsa != 2: usage() error_exit("\nError: Exactly one argument must be given.") for substring in all_substrings(sa[1]): print(substring) sa = sys.argv lsa = len(sa) if __name__ == "__main__": main()Some runs and output of the program:
With no command-line arguments:
$ python all_substrings.py Usage: python all_substrings.py a_string Print all substrings of a_string. Error: Exactly one argument must be given.With one command-line argument, an empty string:
$ python all_substrings.py "" Usage: python all_substrings.py a_string Print all substrings of a_string. Error: String argument must be non-empty.Now with a 3-character string, with debugging enabled, via the use of my debug1 debugging function [1] (and Python's __debug__ built-in variable, which is set to True by default):
$ python all_substrings.py abc s[0:1] = a a s[0:2] = ab ab s[0:3] = abc abc s[1:2] = b b s[1:3] = bc bc s[2:3] = c c[1] You can read about and get the code for that debugging function here:
Improved simple Python debugging function
The remaining runs are with debugging turned off via Python's -O flag:
With a 4-character string:
$ python -O all_substrings.py abcd a ab abc abcd b bc bcd c cd dWith a 4-character string, not all characters unique:
$ python -O all_substrings.py FEED F FE FEE FEED E EE EED E ED DNote that when there are duplicated characters in the input, we can get duplicate substrings in the output; in this case, E appears twice.
With a string of length 6, again with some characters repeated (E and D):
$ python -O all_substrings.py FEEDED F FE FEE FEED FEEDE FEEDED E EE EED EEDE EEDED E ED EDE EDED D DE DED E ED DAgain, we get duplicate substrings in the output.
With a 6-character string, no duplicate characters:
$ python -O all_substrings.py 123456 1 12 123 1234 12345 123456 2 23 234 2345 23456 3 34 345 3456 4 45 456 5 56 6Is there any other way of doing it?
Any interesting enhancements possible?
Yes to both questions.
I will cover some of those points in a follow-up post.
Actually, I already did one thing in the current version, which is of interest: I used a generator to yield the substrings lazily, instead of creating them all upfront, and then returning them all in a list. I'll show and discuss a few pros and cons of some other approaches later.
Meanwhile, want to have a bit of fun with visual effects?
Try some variations of runs of the program like these:
python -O all_substrings.py +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
python -O all_substrings.py /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
python -O all_substrings.py "%% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $"
$ python -O all_substrings.py 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010
python -O all_substrings.py ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"
You can change the characters used in the string argument to any combination of any punctuation characters, or even letters or digits - anything you like. You can also vary the number of characters used in the string. Longer ones (though not too long) tend to give better visual effects and the display also lasts for longer. Note that all the characters of the string you use, should come on the same single line, the same line as the python command you use. Also, if using a pipe character (|) (or any other characters that are special to your OS shell), enclose the whole string in quotes as I have done in an example above. I ran this on Windows and so used double quotes for such cases. Single quotes give errors. On Unix-like systems, either may work, but some characters may get interpreted inside double quotes. Experiment :)
You can also add an import time statement in the imports section of the program, and then use a time.sleep(number) inside the for loop, say, just above the print(substring) statement. I used values like:
time.sleep(0.002)which works well for my display. You can tweak that number for your hardware.
- Have fun.
Did you know that there are a large number of meanings and contexts for the word string? Here are some of them:
String (Wikipedia).
This Wikipedia article about strings in computer science is interesting, and has a lot more points than one might imagine at first:
(computer) strings
- Vasudev Ram - Online Python training and consultingHit the ground running with my vi quickstart tutorial, vetted by two Windows system administrator friends.Jump to posts: Python * DLang * xtopdfInterested in a Python, SQL or Linux course?Get WP Engine, powerful managed WordPress hosting.Subscribe to my blog (jugad2.blogspot.com) by emailMy ActiveState Code recipes
Follow me on:Gumroad * LinkedIn * TwitterDo you create online products? Get Convertkit:Email marketing for digital product creators
Just use itertools:
ReplyDeletelist((x[i:j] for i, j in itertools.combinations(range(len(x) + 1), 2)))
returns all substrings of the string x
:-)
ReplyDeleteJust saw the comment now, some Google issue maybe.
Clever. But that list()-based approach may crash for a very large string, where the list of all its substrings cannot fit into memory together, while the generator-based approach may work for the same sized string, because it yields the substrings one at a time.
It's a generator expression wrapped in a list() call. If you want a bare iterator, just remove the list() call:
ReplyDelete(x[i:j] for i, j in itertools.combinations(range(len(x) + 1), 2))
Oh, yes, missed that on first look, though I knew about genexps. Good one then.
ReplyDeleteNeed to up my FP-fu, thanks.