Tuesday, July 28, 2009

Python - Opening URLs in a webbrowser

The other day at work I needed to test a web page that I had created. The page was a form that would allow certain logged-in users to suggest changes to profile information that we have on a group of people. The form data was pulled from a database, and the page knew whose information to pull based on a URL parameter called "mem" - which contained the numeric primary key of the directory member's record in the database. There were several places where the page could crash, based on what was coming out of the database, as the original creator didn't conform to a standard way of representing a "NULL" value. Some fields held NULL, some 0, some an empty string - and some columns could hold any of those three, but all meant the same thing. To test the page, I would simply change the "mem" parameter to pull up someone else's data - this would help me find any other bugs that I may have overlooked. The page would crash, but in our testing environment the error message (detailing the error and the line number where it occurred) is embedded within the page. I wanted to be completely thorough with this testing, but there are several hundred members of this directory, and trying every single possibility by hand would have taken a VERY long time.

Setting up a proper testing environment with unit tests and everything would be ideal - but where I work that sadly isn't standard practice, and we don't have a good way of doing that. So, I improvised by writing a quick script in Python to test every single possibility for me. First, I grabbed a list of all the primary keys of the members in the database, and then set up a URL opener using an HTTPCookieProcessor, so I could get into the protected page. I looped through the list, assigning the "mem" parameter to the primary keys in the list, and checked for an HTTPError, which in this case meant the page had crashed (urllib2 raises HTTPError when the server responds with an error status, such as 500 Internal Server Error).

Since there was a server error, the URL opener wouldn't let me download the page to process the error message - so I used the Python webbrowser module to simply open the page up in Firefox for me - that way I could look at the page myself and examine the error. Since the page I was testing is in a protected area of the site, I needed to first open up Firefox and log in - Python can't pass cookie data to Firefox. After that, all I needed to do was run the script. All the pages that had problems would open as new tabs in my currently running Firefox instance.

Here is the script (obviously changed since I can't be giving out login credentials):


import urllib2, urllib
import webbrowser

# Fill this in with the primary keys pulled from the database
testMemberIds = []

# Opener that keeps cookie data between requests, so the login sticks
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

# Log in first (URL and credentials removed)
loginParams = urllib.urlencode( {
    'login' : '',
    'password' : '',
} )
opener.open( 'http://', loginParams)

for memNum in testMemberIds:
    print "Trying {0}...".format(memNum),
    try:
        opener.open("?mem={0}".format(memNum))
    except urllib2.HTTPError, e:
        # The page crashed - open it in the browser to read the error
        print "Server error! - Member: {0}".format(memNum)
        webbrowser.open("?mem={0}".format(memNum))


This is a pretty hackish way of testing a site, but it got the job done for me. It would have taken hours to test every possible primary key in the list by hand, but this script only took a few minutes to throw together, and I found all the errors that could possibly come up (with the data currently in the database, that is) and fixed them - all in less than 20 minutes.
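A variation on the same loop, as a rough sketch: collect the failing ids (with their status codes) in a list instead of opening tabs, so the run can be repeated after each fix. The names find_failing_ids and fake_fetch and the example.invalid URL are made up for illustration; in a real run, fetch would wrap opener.open. The import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
import io

try:
    from urllib.error import HTTPError  # Python 3
except ImportError:
    from urllib2 import HTTPError       # Python 2

def find_failing_ids(member_ids, fetch):
    """Call fetch(member_id) for each id; return (id, status code) per HTTPError."""
    failing = []
    for mem_num in member_ids:
        try:
            fetch(mem_num)
        except HTTPError as e:
            failing.append((mem_num, e.code))
    return failing

# Hypothetical stand-in for opener.open(): pretends member 2 crashes the page
def fake_fetch(mem_num):
    if mem_num == 2:
        raise HTTPError("http://example.invalid/?mem=2", 500,
                        "Internal Server Error", None, io.BytesIO(b""))
    return "page for member {0}".format(mem_num)

print(find_failing_ids([1, 2, 3], fake_fetch))
```

Passing the fetch function in as an argument keeps the loop itself free of login details, which also makes it easy to try out without touching the real site.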

This is the first time I used the webbrowser module to do anything truly helpful to me, but it got me thinking about some other uses for it - possibly opening up separate tabs for items that match certain criteria on eBay or CraigsList. Just a thought - might be something to play with later on.

Monday, July 27, 2009

Python Generators - examples and applications

Python generators are very handy - albeit a little hard to understand if you have never worked with them before. It took me a while to find out what it is they actually do, and even longer to figure out a use for one.

If you don't know what a generator is, or what one can do, this is the definition I came up with while learning about them: A generator is like an extended function that remembers the state it was in the last time it was called, and will continue from there using that same state. Generators look a lot like regular functions, but they have that characteristic "yield" keyword that sets them apart from conventional functions.

Here is a simple example:

# Generator example
def printName(name):
    for section in name.split(' '):
        yield section

for section in printName("Guido van Rossum"):
    print section


All this generator does is take a name and split it into the sections that are separated by spaces. Its output will look like this:

Guido
van
Rossum

In this example, we are treating the generator like an iterator - which is a very common usage of generators, but definitely not the only usage.
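To make the iterator point concrete, here is a small sketch (the function name name_sections is just an illustrative rename of the example above). It shows that a generator object is its own iterator, and that once it has been consumed there is nothing left in it. The print() calls are written so the snippet runs on Python 3 as well.

```python
def name_sections(name):
    for section in name.split(' '):
        yield section

gen = name_sections("Guido van Rossum")

print(iter(gen) is gen)  # a generator is its own iterator
print(list(gen))         # consume everything it has to yield
print(list(gen))         # already exhausted, so this is an empty list
```

This is why a generator can only be looped over once; to go through the values again, call the generator function again to get a fresh generator.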

Another way we can use a generator is by assigning it to another variable - then, every time we want the next value "yielded" by the generator, we call "<variablename>.next()".


def getNextWordGenerator():
    yield "Hello"
    yield "this"
    yield "is"
    yield "an"
    yield "example"

generator = getNextWordGenerator()

print generator.next()
print generator.next()
print generator.next()
print generator.next()
print generator.next()


Running this will give us:

Hello
this
is
an
example

But, if we try to call "generator.next()" one more time in this example, we will get an error (a StopIteration exception). This is because there is nothing left in the generator to yield. This may be what you want to happen, but maybe not. Sometimes you would rather have it start all over again. In that case, you can just put all of your yield statements in a "while True:" loop, like so:


def getNextWordGenerator():
    while True:
        yield "Hello"
        yield "this"
        yield "is"
        yield "an"
        yield "example"

generator = getNextWordGenerator()

print generator.next()
print generator.next()
print generator.next()
print generator.next()
print generator.next()
print
print 'Second time around, but only a little'
print generator.next()
print generator.next()



The output this time around will look like this:

Hello
this
is
an
example

Second time around, but only a little
Hello
this

This is just the beginning, however - you can do more complicated and useful things (of course, that depends on your definition of useful).

Let's say, for example, you are a member of Project Euler and you need a function that will spit out Fibonacci numbers. There are a few ways of calculating Fibonacci numbers, but I think that Python generators are the easiest and fastest way. Here is an example:


def fib():
    x,y = 1,1
    while True:
        yield x
        x,y = y, x+y

for num in fib():
    print num

Output:

1
1
2
3
5
8
13
21
34
55
89
144
...

Running this will print out Fibonacci numbers at an alarming rate. You will have to hit CTRL-C to kill the script, or it will keep on going forever - and fast. I felt...odd...putting a "while True:" loop in my code for the first time when doing this. That seemed like something I should avoid. Don't worry though - it is not going to peg your processor or anything - the loop body only runs when the next value is requested. It will not run in the background without you knowing about it.

This example also acts as a reminder that a generator remembers its state. In this case, the generator keeps track of the fact that it is in a loop, and it remembers the values in "x" and "y". So, the next time the for loop asks for a value (it calls next() on the same generator object automatically), the generator doesn't start over at the top with "x,y = 1,1", but resumes in the loop where it last left off. Very handy.
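If you would rather not reach for CTRL-C, one option is to take only a fixed number of values from the infinite generator. This is a small sketch using itertools.islice from the standard library; the variable names are just for illustration, and the built-in next() is used so that it runs on Python 2.6+ and Python 3 alike.

```python
from itertools import islice

def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

# Take just the first ten values from the endless generator
first_ten = list(islice(fib(), 10))
print(first_ten)

# The same generator object resumes where it left off...
gen = fib()
print([next(gen), next(gen), next(gen)])
print(next(gen))

# ...while a fresh call to fib() starts over from the top
print(next(fib()))
```

Each call to fib() creates a separate generator with its own "x" and "y", which is why the last line starts again at 1.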

So, how can this help in everyday scripting situations? Well, I am not too sure. I have found them very useful in solutions to Project Euler problems, and I used a generator in a school project to return all valid adjacent points to a point that I passed into the generator - but other than that, I haven't found a great reason to use them frequently in everyday situations. But, there are some libraries and built-in functions of Python that rely heavily on the same idea. For example, xrange behaves like a "generator", producing its numbers one at a time rather than creating a list first in memory like "range" does. "count" in the "itertools" module gives you the next number in sequence for as long as you want.
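Here is a quick sketch of itertools.count in action, paired with a generator expression (a one-line way of writing a simple generator). Both only produce values when asked, so the endless sequences cost nothing until something pulls from them. The variable names are made up for the example.

```python
from itertools import count, islice

counter = count(10)   # 10, 11, 12, ... with no end
print(next(counter))
print(next(counter))

# A generator expression: squares are computed lazily, one at a time
squares = (n * n for n in count(1))
print(list(islice(squares, 5)))
```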

You can use generators to simulate continuation programming - which also takes some time getting used to, but is pretty neat when you see it. Here is an example of that:


def fib():
    x,y = 1,1
    while True:
        yield x
        x,y = y, x+y

def odd(seq):
    for number in seq:
        if number % 2:
            yield number

def underFourMillion(seq):
    for number in seq:
        if number > 4000000:
            break
        yield number

print sum(odd(underFourMillion(fib())))


This program sums all of the odd Fibonacci numbers that are under 4 million, but at no point does it store anything in a data structure. The "sum" built-in function will add numbers coming from the "odd" generator, which yields any odd numbers coming from the "underFourMillion" generator, which yields any number that is under 4 million coming from the "fib" generator. Neat. It is easy to change any part of this program, or even add another "filter" generator to the mix.
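As a sketch of what adding another filter might look like, the snippet below repeats the pipeline and bolts on one more stage. The multiples_of generator is a hypothetical addition for illustration (it is not part of the original program), and the function names are written in lowercase with underscores only as a matter of taste.

```python
def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

def odd(seq):
    for number in seq:
        if number % 2:
            yield number

def under_four_million(seq):
    for number in seq:
        if number > 4000000:
            break
        yield number

# Hypothetical extra stage: keep only multiples of the given divisor
def multiples_of(seq, divisor):
    for number in seq:
        if number % divisor == 0:
            yield number

print(sum(odd(under_four_million(fib()))))
print(list(multiples_of(odd(under_four_million(fib())), 3)))
```

Each stage only knows about the sequence handed to it, so stages can be added, removed or reordered without touching the others. The one thing to watch is that the stage that stops the sequence (under_four_million here) has to sit directly around the infinite generator, otherwise an inner filter may never hand control back.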

Do you have any other uses for generators? Share them in the comments.

Tuesday, July 21, 2009

Redirect Python output

EDIT: Updated scripts to use the new login system described here.

Today at work I was working on a script to create some reports for one of my superiors. During my testing I was outputting everything to the screen so I could view it quickly and easily. My boss needed me to send him a file containing the data I was generating, and I didn't look forward to changing all of my print statements (my script had many, many print statements) to something like "file.write(whateverwasbeingprintedbefore)". I could have done this, of course (and it could have been done in one easy step with the help of RegexBuddy) but that still seemed too inflexible a solution. Python has a very simple solution for this problem, and it just takes an extra two lines of code.

My work example would have been boring here, so I thought up something else useful. Using my Google Voice Login class, I created another script to print out whatever is in my SMS inbox (up to the first 10) in a semi decent format. I used a couple of regular expressions, which I could not have created without RegexBuddy, to parse the page for the information I wanted. Here is that script (which requires the gvoice.py module I wrote):


# Print SMS inbox
from gvoice import GoogleVoiceLogin
import urllib
import re
import getpass

# Create an instance of a GoogleVoiceLogin object
# This will prompt you for your Google Account credentials
email = raw_input('Enter your Google Username: ')
password = getpass.getpass('Enter your password: ')

gv_login = GoogleVoiceLogin(email, password)

# Create our regular expressions (Created in RegexBuddy)
# Regular Expression to gather information on each SMS Conversation
sms_conversation_regex = re.compile(
r"""<span\sclass="gc-message-name">\s* # Get the message conversation info div
<span\sclass="">([^<]*?)</span>\s* # Get who originated the conversation
<span.*?</span>.*? # Eat this div - we don't need it
.*?>([^<]*).*? # Get who the conversation was started with
<div\sclass="gc-message-message-display">\s*
(<div\sclass="gc-message-sms-row">.*? # Get all the messages of the conversation
</div>\s*)* # Close up our divs
</div>""",
re.DOTALL | re.VERBOSE)

# Regular expression to gather information on individual messages within an SMS conversation
sms_details_regex = re.compile(
r"""<div\sclass="gc-message-sms-row">.*? # Get up the message div
<span\sclass="gc-message-sms-from">\s*(.*?)\s* # Get who the message was from
</span>.*?
<span\sclass="gc-message-sms-text">\s*([^<]*)\s*.*?</div> # Get the message data""",
re.DOTALL | re.VERBOSE)

# Get the opener (Cookie data still intact)
opener = gv_login.opener

sms_inbox_content = opener.open("https://www.google.com/voice/inbox/recent/sms/").read()

# Nested for-loops of regular expressions - not the best way, but the easiest for now.
for sms_conversation_match in sms_conversation_regex.finditer(sms_inbox_content):
    print
    print "---{0} to {1} {2}".format(sms_conversation_match.group(1), sms_conversation_match.group(2), '-' * 50)
    for message_details_match in sms_details_regex.finditer(sms_conversation_match.group()):
        print "{0}: {1}".format(message_details_match.group(1), message_details_match.group(2))


Running this script will show you the first 10 SMS conversations in your inbox, if there are any. But, it is limited right now to displaying the information on the screen. Using these next two lines, you can put all of this information in a file of your choice:


import sys

sys.stdout = open("SMS Conversations.txt", "w")



To get this to work in the SMS inbox script, I insert the "import sys" command right below the other imports. The "sys.stdout" line goes right below the line where I instantiate the GoogleVoiceLogin object. I deliberately do that, otherwise I won't be able to see the prompt, because it will be written to the file!
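One thing to keep in mind is that after the reassignment, everything printed for the rest of the program goes to the file. If you only want part of the output redirected, a possible approach is to save the original sys.stdout and put it back afterwards. The sketch below does that with a helper called run_with_stdout_to_file and a stand-in report function, both made up for this example; it writes to a temporary file and reads it back just to show the result. It uses print() so that it also runs on Python 3.

```python
import os
import sys
import tempfile

def run_with_stdout_to_file(func, path):
    """Run func() with print output going to path, then restore sys.stdout."""
    original_stdout = sys.stdout
    out_file = open(path, "w")
    sys.stdout = out_file
    try:
        func()
    finally:
        # Always put the real stdout back, even if func() raises
        sys.stdout = original_stdout
        out_file.close()

# Stand-in for a script full of print statements
def report():
    print("line one")
    print("line two")

fd, report_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
run_with_stdout_to_file(report, report_path)

with open(report_path) as f:
    captured = f.read()
os.remove(report_path)

# Back on the screen again
print(captured)
```

Closing the file in the finally block also makes sure everything is flushed to disk before the file is read.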

This is a very easy way to change the output of your program - I hope you find it as useful as I did.

Here is the final script:

# Print SMS inbox
from gvoice import GoogleVoiceLogin
import urllib
import re
import sys
import getpass

# Create an instance of a GoogleVoiceLogin object
# This will prompt you for your Google Account credentials
email = raw_input('Enter your Google Username: ')
password = getpass.getpass('Enter your password: ')

gv_login = GoogleVoiceLogin(email, password)

sys.stdout = open("SMS Conversations.txt", "w")

# Create our regular expressions (Created in RegexBuddy)
# Regular Expression to gather information on each SMS Conversation
sms_conversation_regex = re.compile(
r"""<span\sclass="gc-message-name">\s* # Get the message conversation info div
<span\sclass="">([^<]*?)</span>\s* # Get who originated the conversation
<span.*?</span>.*? # Eat this div - we don't need it
.*?>([^<]*).*? # Get who the conversation was started with
<div\sclass="gc-message-message-display">\s*
(<div\sclass="gc-message-sms-row">.*? # Get all the messages of the conversation
</div>\s*)* # Close up our divs
</div>""",
re.DOTALL | re.VERBOSE)

# Regular expression to gather information on individual messages within an SMS conversation
sms_details_regex = re.compile(
r"""<div\sclass="gc-message-sms-row">.*? # Get up the message div
<span\sclass="gc-message-sms-from">\s*(.*?)\s* # Get who the message was from
</span>.*?
<span\sclass="gc-message-sms-text">\s*([^<]*)\s*.*?</div> # Get the message data""",
re.DOTALL | re.VERBOSE)

# Get the opener (Cookie data still intact)
opener = gv_login.opener

sms_inbox_content = opener.open("https://www.google.com/voice/inbox/recent/sms/").read()

# Nested for-loops of regular expressions - not the best way, but the easiest for now.
for sms_conversation_match in sms_conversation_regex.finditer(sms_inbox_content):
    print
    print "---{0} to {1} {2}".format(sms_conversation_match.group(1), sms_conversation_match.group(2), '-' * 50)
    for message_details_match in sms_details_regex.finditer(sms_conversation_match.group()):
        print "{0}: {1}".format(message_details_match.group(1), message_details_match.group(2))

Python - Google Voice part 2

EDIT: Google recently made some behind the scenes changes to the login page, which broke this script. Please see the new and improved script here.

My last post was also about accessing your Google Voice account through Python, but there is more that you can do with it than just send SMS messages. You can make calls, cancel calls, view your call/SMS/voicemail history and a few other things, as pointed out by this blogger here while talking about his Firefox plugin.

Since all of these options require you to be logged in, I thought that it would be easier to create a separate class that would:
1) Log you in and let you know whether or not your attempt was successful
2) Provide a method to get the "opener" (which is what keeps cookie data in order during multiple requests),
3) Provide a method to get your "_rnr_se" value, which is required when sending SMS messages, making calls, and canceling calls.

Instead of hardcoding in Google Account credentials like in my last post, the new class will prompt you for your Google Account user name and password. Since I want to use this script in public, I used the getpass module to hide the input as I type it. Here is the script:


# Get URL handling support
import urllib2, urllib
# Get regular expression and password prompt support
import re, getpass

class GoogleVoiceLogin:
    def __init__(self):
        print "Please enter your Google Account credentials"
        self.email = raw_input("User name: ")
        self.password = getpass.getpass("Password: ")

        # Set up an opener with HTTPCookieProcessor
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
        urllib2.install_opener(self.opener)

        # Set up login credentials for Google Accounts
        # The 'continue' param redirects us to the Google Voice
        # homepage, and gives us necessary cookie info
        loginParams = urllib.urlencode( {
            'Email' : self.email,
            'Passwd' : self.password,
            'continue' : 'https://www.google.com/voice/account/signin'
        } )

        # Perform the login. Cookie info sent back will be saved, so we remain logged in
        # for future requests when using the opener
        self.opener.open( 'https://www.google.com/accounts/ServiceLoginAuth', loginParams)

        # Need to load the homepage to find user specific data
        googleVoiceHomeData = self.opener.open('https://www.google.com/voice/#inbox').read()

        # Go through the home page and grab the value for the hidden
        # form field "_rnr_se", which must be included when sending texts and dealing with calls
        match = re.search('name="_rnr_se".*?value="(.*?)"', googleVoiceHomeData)
        if not match:
            print "Login Unsuccessful!"
            exit()
        else:
            print "Login Successful!"
            self._rnr_se = match.group(1)

    def getOpener(self):
        return self.opener

    def getRnrSe(self):
        return self._rnr_se


The script, for the most part, should be self-explanatory. It pretty much follows the same pattern as the script in my previous post except it is more reusable. For example, here is the new script I use to send text messages from the command line:


# Send a text Message
from GoogleVoiceLogin import GoogleVoiceLogin
import urllib

# Create an instance of a GoogleVoiceLogin object
# This will prompt you for your Google Account credentials
gvLogin = GoogleVoiceLogin()

# Get the opener (Cookie data still intact)
opener = gvLogin.getOpener()

# Get the _rnr_se value
_rnr_se = gvLogin.getRnrSe()

# Prompt for Text Message details
phoneNumber = raw_input("Destination number: ")
text = raw_input("Message to send: ")

# Insert blank line
print

# Set up parameters for sending text
sendTextParams = urllib.urlencode({
    '_rnr_se': _rnr_se,
    'phoneNumber': phoneNumber,
    'text': text
})

# Send the text, display status message
response = opener.open('https://www.google.com/voice/sms/send/', sendTextParams)
if "true" in response.read():
    print "Message successfully sent!"
else:
    print "Message failed!"
response.close()

Similar scripts could easily be created to call people, or perform other tasks on your account.

Monday, July 13, 2009

Google Voice - Python - SMS

EDIT: Google recently made some behind the scenes changes to the login page, which broke this script. Please see the new and improved script here.

After a long wait, I finally received my Google Voice invitation last weekend. It offers a lot of features, but one that I was really excited about was the free SMS service. I don't have a cell phone, but nearly all my family members, friends and co-workers do. The majority of them are more prone to check their phone than their email, so sending a text is the best way to contact them. For a while, I used Gmail's SMS Lab to get in contact with the leaders and the Scouts of a Boy Scout troop that I participate in, but Gmail shut that down recently.

After I got my account, I began thinking about how neat it would be to be able to set up a script to send reminder text messages for me (preferably scheduled) using my account. I searched all over the internet to find out if there was an API provided by Google to do this - like their excellent gdata API - but no luck. I didn't give up there; there had to be a way.

A while ago, I learned that you can use HTTPCookieProcessor() from the urllib2 module to create an opener that will keep track of Cookie data sent back and forth from a server. So, with the help of Firebug (the best friend of many web-developers) I looked at the headers and POST data being sent when logging in to Google Voice, and when sending a text message. Turns out, it isn't very hard to log into your account, grab some necessary information from the Google Voice Home page (the inbox) and send a text.

This script will do just that one thing - log in to your account, load the homepage to get a hidden form field's value, and then send one text message to one number. It will still show up in your outbox, so you aren't losing anything by using this method. This ability opens up a lot of possibilities, especially when coupled with the Gdata Contacts API. I have a few little scripts that I will be creating to send out weekly reminders to other people - and might use Google App Engine to make it even more useful and automated.


# Get URL handling support
import urllib2, urllib
# Get regular expression support
import re

# Google Account login credentials
email = 'YOUR_EMAIL'
password = 'YOUR_PASSWORD'

# Text message details
sendToNumber = 'NUMBER_TO_SEND_A_TEXT_TO'
messageToSend = 'I used Python to send this!'

# Set up an opener with HTTPCookieProcessor
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

# Set up login credentials for Google Accounts
# The 'continue' param redirects us to the Google Voice
# homepage, and gives us necessary cookie info
loginParams = urllib.urlencode( {
'Email' : email,
'Passwd' : password,
'continue' : 'https://www.google.com/voice/account/signin',
} )

# Perform the login. Cookie info sent back will be saved, so we remain logged in
# for future requests when using the opener
opener.open( 'https://www.google.com/accounts/ServiceLoginAuth', loginParams)

# Need to load the homepage to find user specific data
googleVoiceHomeData = opener.open('https://www.google.com/voice/#inbox').read()

# Go through the home page and grab the value for the hidden
# form field "_rnr_se", which must be included when sending texts
match = re.search('name="_rnr_se".*?value="(.*?)"', googleVoiceHomeData)
_rnr_se = match.group(1)

# Set up parameters for sending text
sendTextParams = urllib.urlencode({
'_rnr_se': _rnr_se,
'phoneNumber': sendToNumber,
'text': messageToSend
})

# Send the text, store the return value
f = opener.open('https://www.google.com/voice/sms/send/', sendTextParams)
data = f.read()
print data
f.close()


Right now, if it succeeds nothing spectacular happens - it simply prints out the response: {"ok":true,"data":{"code":0}}. If it fails, {"ok":false,"data":{"code":20}}. Just look at the "ok" value to see if it worked.
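Since the response is JSON, the "ok" value can be checked with the standard json module (included from Python 2.6 on) rather than by eye. A minimal sketch, where sms_was_sent is a made-up helper name and the two strings are the responses quoted above:

```python
import json

def sms_was_sent(response_text):
    """Return True when the response body reports "ok": true."""
    return json.loads(response_text).get("ok") is True

print(sms_was_sent('{"ok":true,"data":{"code":0}}'))
print(sms_was_sent('{"ok":false,"data":{"code":20}}'))
```

In the script above, the text returned by f.read() would be what gets passed in.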

You may notice that I don't have much (any) error checking in the script, nor do I usually when putting together scripts quickly just to test an idea out. When I do create something actually useful using the above script, I will put in the appropriate "checker" code. Just be aware that a text message can be up to 160 characters long; if you go over, the remaining characters will be sent in a separate message.
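As a rough idea of what that "checker" code could look like, here is a sketch of two small helpers: one that fails loudly when the "_rnr_se" field cannot be found (which most likely means the login did not work), and one that checks a message against the 160 character limit. The helper names and the sample markup are made up for illustration; the real page will look different.

```python
import re

def extract_rnr_se(page_html):
    """Return the hidden _rnr_se value, or raise if the page doesn't contain it."""
    match = re.search(r'name="_rnr_se".*?value="(.*?)"', page_html)
    if match is None:
        raise ValueError("_rnr_se field not found - the login probably failed")
    return match.group(1)

def fits_in_one_sms(text, limit=160):
    """True if the message is short enough to go out as a single text."""
    return len(text) <= limit

# Made-up sample markup, only to exercise the helpers
sample_page = '<input name="_rnr_se" type="hidden" value="abc123"/>'

print(extract_rnr_se(sample_page))
print(fits_in_one_sms("I used Python to send this!"))
```

Raising an exception instead of calling match.group(1) on None gives a much clearer message when something goes wrong.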

Have fun, and don't be evil.

Friday, July 10, 2009

Syntax Highlighting - quick and easy

Edit: I recently made my life easier by switching over to http://code.google.com/p/google-code-prettify/ for syntax highlighting.

Since this blog is mainly about Python examples, I needed a way to post syntax highlighted Python code. There were a few solutions, including a hosted javascript option. I didn't like that one too much for a few reasons. After a short struggle, I was able to install Pygments on my computer - a Python module that can apply syntax highlighting to source code in many different programming languages. I wanted a quick and easy way of writing Python code in my favorite text editor, Notepad++, and getting it up on this blog. I thought it would be neat to create a syntax_highlighter.py script, and use the NppExec plug-in to run the syntax highlighter on the current document, and then copy the results to the clipboard. To do that last part, I had to download and install the pywin32 library (you can find that on sourceforge). Here is my final script:


from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
import win32clipboard
import sys,re

# Replace tab characters with 4 spaces for better look on webpage
def replaceTabs(matchobj):
    return "    "

# Open up and store the source file contents
rawCode = open(sys.argv[1], "r").read()

# Apply syntax highlighting
highlightedCode = highlight(rawCode, PythonLexer(), HtmlFormatter())

# Replace tabs with spaces
finalCode = re.sub("\t", replaceTabs, highlightedCode)

win32clipboard.OpenClipboard()
win32clipboard.EmptyClipboard()
win32clipboard.SetClipboardText(finalCode)
win32clipboard.CloseClipboard()


This will put on my clipboard code that has been surrounded with the appropriate html tags to make it appear nicely formatted and highlighted. I created another function to replace all tabs with 4 spaces to make it look better (I found it easier to do that with the regular expression module).
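For what it's worth, plain string methods can also handle the tab replacement without the regular expression module. The sketch below compares str.replace with str.expandtabs on made-up sample strings; they agree for leading indentation, but expandtabs pads to the next tab stop, so they differ when a tab appears in the middle of a line.

```python
code_line = "\tif x:\n\t\treturn x"

# Every tab becomes exactly four spaces
with_replace = code_line.replace("\t", " " * 4)

# Tabs are padded out to the next multiple of four columns
with_expandtabs = code_line.expandtabs(4)

print(with_replace == with_expandtabs)

# A tab in the middle of a line shows the difference
print(repr("ab\tc".replace("\t", " " * 4)))
print(repr("ab\tc".expandtabs(4)))
```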

If you use Notepad++, I used this to run the script on the current open file:
python "FULL_PATH_TO_YOUR_HIGHLIGHTING_SCRIPT" "$(FULL_CURRENT_PATH)"
The first argument should be the full path to the syntax_highlighter.py script - the next, "$(FULL_CURRENT_PATH)", is a standard variable that the NppExec plug-in uses to define the current open file. Once run, the result will be on your clipboard ready to be pasted. Once set up, having it ready to paste on this blog is a CTRL-F6 keystroke away. This could work for any language that Pygments supports.

The clipboard portion of the script was a lot easier than I thought that it would be. The pywin32 library has many useful features in it - I know that similar solutions exist for Linux.

NOTE: I ran

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

print HtmlFormatter().get_style_defs('.highlight')


to get my CSS rules, which I was able to paste in my template.

What is this?

Many of my co-workers and class-mates have been asking me what Python can actually be used for, so I decided to start this blog demonstrating some of the ways that I have been employing its power and simplicity.

I am by no means a Python expert, but I like it quite a bit. I have been using it to perform little tasks at work and school for a while now and hope that by seeing some of my examples, you can see what it might be able to do for you. Each post here will contain a code snippet that I have created to help perform some task. I won't be really talking much about events happening in the Python community - there are plenty of other blogs for that. Here is a blog of simple, practical examples that might come in handy.