Monday, July 27, 2009

Python Generators - examples and applications

Python generators are very handy - albeit a little hard to understand if you have never worked with them before. It took me a while to find out what it is they actually do, and even longer to figure out a use for one.

If you don't know what a generator is, or what one can do, this is the definition I came up with while learning about them: A generator is like an extended function that remembers the state it was in the last time it was called, and will continue from there using that same state. Generators look a lot like regular functions, but they have the characteristic "yield" keyword that sets them apart from conventional functions.

Here is a simple example:

# Generator example
def printName(name):
    for section in name.split(' '):
        yield section

for section in printName("Guido van Rossum"):
    print section


All this generator does is take a name and split it into the sections that are separated by spaces. Its output will look like this:

Guido
van
Rossum

In this example, we are treating the generator like an iterator - which is a very common usage of generators, but definitely not the only usage.
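To make the "like an iterator" part a bit more concrete, here is a small sketch. It is written with print() and the built-in next()-style calls so that it also runs on Python 3, and the name split_name is my own stand-in for printName, not something from the example above. It shows that a generator object is its own iterator, so anything that consumes an iterator (list() here) will accept it.

```python
def split_name(name):
    # Hypothetical variant of printName: yields one section at a time.
    for section in name.split(' '):
        yield section

gen = split_name("Guido van Rossum")

# A generator object is its own iterator.
is_own_iterator = iter(gen) is gen

# list() pulls every remaining value out of the generator.
sections = list(gen)
print(sections)

# Once exhausted, the same generator object has nothing left to give.
leftover = list(gen)
print(leftover)
```

Note that the second list() call comes back empty: the generator does not rewind on its own.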

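Another way we can use a generator is by assigning the generator object to a variable - then, every time we want the next value "yielded" by the generator, we call "<variablename>.next()" (on Python 2.6 and later, including Python 3, the built-in "next(<variablename>)" does the same job).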


def getNextWordGenerator():
    yield "Hello"
    yield "this"
    yield "is"
    yield "an"
    yield "example"

generator = getNextWordGenerator()

print generator.next()
print generator.next()
print generator.next()
print generator.next()
print generator.next()


Running this will give us:

Hello
this
is
an
example

But if we call "generator.next()" a sixth time in this example, we will get an error (a StopIteration exception). This is because there is nothing left in the generator to yield. This may be what you want to happen, but maybe not. Sometimes you would rather have it start all over again. In that case, you can just put all of your yield statements in a "while True:" loop, like so:


def getNextWordGenerator():
    while True:
        yield "Hello"
        yield "this"
        yield "is"
        yield "an"
        yield "example"

generator = getNextWordGenerator()

print generator.next()
print generator.next()
print generator.next()
print generator.next()
print generator.next()
print
print 'Second time around, but only a little'
print generator.next()
print generator.next()



The output this time around will look like this:

Hello
this
is
an
example

Second time around, but only a little
Hello
this

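Going back to the first version of getNextWordGenerator for a moment (the one without the loop): the error you get when the generator runs dry is a StopIteration exception. Here is a rough sketch of two ways to deal with it, assuming a shortened two-word generator that I renamed get_next_word_generator for this illustration. It uses the built-in next() so it runs on Python 3 as well.

```python
def get_next_word_generator():
    # Shortened, non-looping version: only two values to yield.
    yield "Hello"
    yield "this"

generator = get_next_word_generator()
words = [next(generator), next(generator)]

try:
    next(generator)  # third call: nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

# next() also accepts a default to return instead of raising.
fallback = next(generator, "nothing left")
print(words)
print(exhausted)
print(fallback)
```

A for loop catches StopIteration for you, which is why the very first example never showed an error.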
This is just the beginning, however - you can do more complicated and useful things (of course, that depends on your definition of useful).

Let's say, for example, you are a member of Project Euler and you need a function that will spit out Fibonacci numbers. There are a few ways of calculating Fibonacci numbers, but I think a Python generator is the easiest and fastest way to write one. Here is an example:


def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

for num in fib():
    print num

Output:

1
1
2
3
5
8
13
21
34
55
89
144
...

Running this will print out Fibonacci numbers at an alarming rate. You will have to hit CTRL-C to kill the script, or it will keep on going forever - and fast. I felt...odd...putting a "while True:" loop in my code for the first time when doing this. That seemed like something I should avoid. Don't worry though - it is not going to peg your processor or anything - the loop only advances one step each time the next value is asked for. It will not run in the background without you knowing about it.
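If you would rather not reach for CTRL-C, one option (a sketch, not the only way) is to let "islice" from the "itertools" module cut the stream off after a fixed number of values. The fib generator is repeated here so the snippet stands alone, and it uses print() so it also runs on Python 3.

```python
from itertools import islice

def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

# islice stops after 10 values, so the infinite loop inside fib()
# is never asked for an eleventh.
first_ten = list(islice(fib(), 10))
print(first_ten)
```

This is also a nice demonstration that the "while True:" loop really does sit idle until somebody asks for another number.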

This example also acts as a reminder that a generator can remember its state. In this case, the generator keeps track of the fact that it is in a loop, and it remembers the values in "x" and "y". So, the next time the for loop asks the generator for a value (it calls the generator's next() method automatically), it doesn't start over at the top of the generator with "x,y = 1,1", but in the loop where it last left off. Very handy.
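One thing that tripped me up is that the state lives in the generator object, not in the function. A quick sketch to show the difference (the variable names a and b are just for illustration, and it uses the built-in next() so it runs on Python 3 too): calling fib() again gives you a brand new generator that starts from the top, while the old one carries on from where it was.

```python
def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

a = fib()
first_three = [next(a), next(a), next(a)]  # advance the first generator

b = fib()          # a second, completely independent generator
b_first = next(b)  # starts from the top of fib()

a_fourth = next(a)  # the first generator resumes where it left off
print(first_three)
print(b_first)
print(a_fourth)
```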

So, how can this help in everyday scripting situations? Well, I am not too sure. I have found them very useful in solutions to Project Euler problems, and I used a generator in a school project to return all valid adjacent points to a point that I passed into the generator - but other than that, I haven't found a great reason to use them frequently in everyday situations. But there are some libraries and built-in functions in Python that work the same way. For example, "xrange" hands out its numbers one at a time, generator-style, rather than creating a list first in memory like "range" does. "count" in the "itertools" module behaves like a generator too, giving you the next number in sequence for as long as you want.
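I don't have the school project code handy, so treat the following as a hypothetical reconstruction of the adjacent-points idea rather than what I actually wrote. It assumes a rectangular grid, points as (x, y) tuples, and that "adjacent" means up, down, left and right only; the name adjacent_points and its parameters are made up for this sketch.

```python
def adjacent_points(point, width, height):
    # Hypothetical helper: yields the up/right/down/left neighbours of
    # `point` that fall inside a width x height grid.
    x, y = point
    for dx, dy in ((0, -1), (1, 0), (0, 1), (-1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

# A corner only has two valid neighbours; the centre of a 3x3 grid has four.
corner = list(adjacent_points((0, 0), 3, 3))
centre = list(adjacent_points((1, 1), 3, 3))
print(corner)
print(centre)
```

The nice part is that the caller can stop looping as soon as it finds the neighbour it wants, and the remaining points are never computed.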

You can use generators to simulate something like continuation programming by chaining them together, each one consuming the values of the one before it - which also takes some time getting used to, but is pretty neat when you see it. Here is an example of that:


def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

# Pass along only the odd numbers from seq
def odd(seq):
    for number in seq:
        if number % 2:
            yield number

# Stop as soon as seq reaches 4 million
def underFourMillion(seq):
    for number in seq:
        if number >= 4000000:
            break
        yield number

print sum(odd(underFourMillion(fib())))


This program sums all of the odd Fibonacci numbers that are under 4 million, but at no point does it store the numbers in a data structure. The "sum" built-in function adds up the numbers coming from the "odd" generator, which yields any odd numbers coming from the "underFourMillion" generator, which yields every number under 4 million coming from the "fib" generator. Neat. It is easy to change any part of this program, or even add another "filter" generator to the mix.
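As a rough illustration of adding another filter, here is the same chain with one more stage bolted on. The divisible_by generator is something I made up for this sketch, and I renamed underFourMillion to under_four_million and used print() so the snippet runs on Python 3 as well.

```python
def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

def odd(seq):
    for number in seq:
        if number % 2:
            yield number

def under_four_million(seq):
    for number in seq:
        if number >= 4000000:
            break
        yield number

def divisible_by(seq, divisor):
    # Hypothetical extra filter: pass along only multiples of divisor.
    for number in seq:
        if number % divisor == 0:
            yield number

# Same chain as before, with one more stage wrapped around it.
total = sum(divisible_by(odd(under_four_million(fib())), 3))
print(total)
```

Nothing else in the chain had to change, which is the part I like most about this style.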

Do you have any other uses for generators? Share them in the comments.