Thursday, August 20, 2009

Google App Engine - Cookie Handling with URL Fetch

I started working on a web-based solution (on Google App Engine, using the Python API of course) for sending mass SMS messages through your Google Voice account this week, but ran into problems right off the bat during some initial testing. The main one was that I could not log into my Google Account: the response always included a message saying my "browser" didn't have cookies enabled. I was confused by this, since I was using the exact method and code that I use in my Google Voice Command Line Script, which works perfectly:

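import urllib2

# Build an opener whose HTTPCookieProcessor stores cookies from each
# response and sends them back on later requests, then make it the default.
self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(self.opener)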


Doing this gives you an "opener" object that you can use to open any URL: any cookie data the server sends is saved and resubmitted in the headers of each subsequent request. Because install_opener makes it the default, plain urllib2.urlopen calls use it too. So if you are visiting a site that requires authorization, you can send your credentials to the login page, and every later request made with that opener will carry the cookie info. This lets you access the protected areas of the site (such as sending SMS messages or making calls with your GV account) with very little effort; it really is a nice feature of Python's standard library.
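For reference, on Python 3 urllib2 became urllib.request and the cookie classes live in http.cookiejar, so the same setup might look like the sketch below. Passing an explicit CookieJar is an addition (the snippet above lets HTTPCookieProcessor create its own); it simply lets you inspect what was stored. This only shows the standard-library pattern and does nothing about the URL Fetch behaviour on App Engine.

```python
import http.cookiejar
import urllib.request

# Explicit jar so the stored cookies can be inspected after requests.
jar = http.cookiejar.CookieJar()

# HTTPCookieProcessor saves Set-Cookie values from responses into `jar`
# and adds a matching Cookie header to later requests.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Make this opener the default for plain urllib.request.urlopen() calls.
urllib.request.install_opener(opener)
```

After a call such as opener.open(login_url, data), iterating over jar lists the cookies the server set.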

Well, I set this up in my app, but no luck. So I looked around for a bit and found out that on App Engine the urllib2, urllib and httplib libraries perform their requests through Google's URL Fetch service (read more here). That isn't terrible in itself, but the important thing is that the urlfetch service does NOT handle cookies, even if you use an HTTPCookieProcessor or a CookieJar (read more about that here).

This entry isn't a lesson on what cookies really are and how they work, but you should know that cookies travel in the headers: the server sends them in Set-Cookie response headers, and the client sends them back in the Cookie request header. Although the urlfetch service does not handle cookies automatically, it does give you full access to the headers received from a server, and it lets you choose which headers to send when making a request. So I built a separate class for opening URLs that handles all the cookie information for me, as well as any redirects. The Google Account login system relies on redirects: when you log in, it forwards you through a bunch of different sites to make sure you are actually logged in and that you have the right cookie info.
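To make that header exchange concrete, here is a small sketch using http.cookies, the Python 3 name of the Cookie module the class relies on. The helper name and the cookie names and values are made up for illustration: it merges Set-Cookie values into a jar and produces the Cookie header you would send back.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Hypothetical helper: merge Set-Cookie values, return a Cookie header."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Attributes like Path and Domain are kept on the morsel, not sent back.
        jar.load(value)
    # A later cookie with the same name replaces the earlier value.
    return "; ".join("%s=%s" % (m.key, m.value) for m in jar.values())


header = cookie_header_from_set_cookie([
    "SID=abc123; Path=/; Domain=.google.com",
    "HSID=xyz789; Path=/",
])
print(header)
```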

Here is the class:

import urllib, urllib2, Cookie, urlparse
from google.appengine.api import urlfetch

class URLOpener:
    def __init__(self):
        # Cookie jar shared by every request made through this opener.
        self.cookie = Cookie.SimpleCookie()

    def open(self, url, data = None):
        # POST if there is a payload, otherwise GET.
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST

        while url is not None:
            response = urlfetch.fetch(url=url,
                                      payload=data,
                                      method=method,
                                      headers=self._getHeaders(self.cookie),
                                      allow_truncated=False,
                                      follow_redirects=False, # Handle redirects here so cookies survive each hop.
                                      deadline=10
                                      )
            data = None # Next request will be a GET, so no need to send the data again.
            method = urlfetch.GET
            self.cookie.load(response.headers.get('set-cookie', '')) # Load the cookies from the response
            location = response.headers.get('location')
            if location:
                # Resolve relative redirects against the URL just fetched.
                url = urlparse.urljoin(url, location)
            else:
                url = None # No redirect: this is the final page.

        return response

    def _getHeaders(self, cookie):
        # Present a desktop browser User-Agent and attach the stored cookies.
        headers = {
            'Host' : 'www.google.com',
            'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie' : self._makeCookieHeader(cookie)
        }
        return headers

    def _makeCookieHeader(self, cookie):
        # Serialize every stored cookie as "name=value; name2=value2; ..."
        cookieHeader = ""
        for value in cookie.values():
            cookieHeader += "%s=%s; " % (value.key, value.value)
        return cookieHeader



The class is simple, but it works for my purposes. It really has only one method that you need to worry about: "open". Let's walk through it. First, it checks whether you are posting any data; if not, it makes a GET request. Then it enters its main loop. The first iteration sends the data (if there is any) along with some basic header information, saves the cookie info it received in the response, and switches the request method to GET. It then checks the response headers for a "location" value to see whether it is being redirected; if it is, it keeps going, saving and resending the received cookie information along the way. Once there are no more redirects, it returns the response from the final location, and you can read the page body from the ".content" attribute of the returned value.
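Because the class depends on the App Engine SDK, it can't be tried outside it. The loop itself can be sketched in plain Python 3 with the HTTP call swapped for a callable you pass in. Everything here is an assumption for illustration: fetch and its (status, headers, body) return shape are hypothetical stand-ins for urlfetch.fetch(..., follow_redirects=False), max_hops is a safety limit the original doesn't have, and relative Location values are resolved with urljoin.

```python
from http.cookies import SimpleCookie
from urllib.parse import urljoin


def follow_redirects(fetch, url, data=None, max_hops=10):
    """Follow a redirect chain by hand, carrying cookies across every hop.

    `fetch(url, method, payload, headers)` is a hypothetical callable that
    returns (status, headers, body), with lowercase header names.
    """
    jar = SimpleCookie()
    method = "GET" if data is None else "POST"
    for _ in range(max_hops):
        # Send back every cookie collected so far.
        cookie_header = "; ".join(
            "%s=%s" % (m.key, m.value) for m in jar.values())
        _status, headers, body = fetch(url, method, data,
                                       {"Cookie": cookie_header})
        # Remember any cookies set by this response.
        jar.load(headers.get("set-cookie", ""))
        location = headers.get("location")
        if not location:
            return body, jar  # no redirect: this is the final page
        # Later hops are plain GETs; relative Locations resolve against url.
        url, method, data = urljoin(url, location), "GET", None
    raise RuntimeError("too many redirects")


# A fake two-step login used to exercise the loop without a network.
def fake_fetch(url, method, payload, headers):
    if url == "https://example.com/login":
        assert method == "POST" and payload == "Email=me"
        return 302, {"set-cookie": "SID=abc", "location": "/home"}, ""
    if url == "https://example.com/home":
        assert method == "GET" and headers["Cookie"] == "SID=abc"
        return 200, {}, "welcome"
    raise AssertionError("unexpected URL: " + url)


body, jar = follow_redirects(fake_fetch, "https://example.com/login",
                             data="Email=me")
print(body, jar["SID"].value)  # welcome abc
```

Injecting fetch keeps the cookie and redirect logic testable on its own; on App Engine the callable would wrap urlfetch.fetch.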

Currently, it doesn't support GET requests with data, mainly because I didn't need that for this project. It would be trivial to implement, however.
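One way it could be done (an assumption, not something the class does today) is to encode the data into the URL's query string and send the request with no payload; open() would apply this to url before the loop whenever a GET with data is wanted. The helper below is a hypothetical Python 3 sketch; inside the class you would use urllib.urlencode and urlparse instead.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_query(url, params):
    """Hypothetical helper: append urlencoded params to url's query string."""
    parts = urlsplit(url)
    extra = urlencode(params)
    # Keep any existing query string and join the new params with "&".
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       query, parts.fragment))


print(add_query("https://example.com/search?q=1", {"page": 2}))
# https://example.com/search?q=1&page=2
```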

Here is an example on how to log in to your Google Voice account, then parse out the ever-so-important "_rnr_se" value using the URLOpener class:


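import re, urllib

opener = URLOpener()

# email and password hold your Google Account credentials.
loginParams = urllib.urlencode({
    'Email' : email,
    'Passwd' : password,
    'continue' : 'https://www.google.com/voice/account/signin'
})

# Log in; the opener follows the redirect chain and keeps every cookie it receives.
opener.open('https://www.google.com/accounts/ServiceLoginAuth', loginParams)

googleVoiceHomePage = opener.open('https://www.google.com/voice/#inbox').content

# Pull the hidden "_rnr_se" form value out of the page HTML.
match = re.search('name="_rnr_se".*?value="(.*?)"', googleVoiceHomePage)

_rnr_se = match.group(1)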
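The regular expression assumes the hidden input's name attribute appears before its value on the same line. If the login silently fails, match is None and match.group(1) raises an AttributeError. A slightly more defensive Python 3 version might look like this; the function name and the sample HTML are hypothetical.

```python
import re


def extract_rnr_se(html):
    """Hypothetical helper: return the _rnr_se token or fail loudly."""
    match = re.search(r'name="_rnr_se".*?value="(.*?)"', html)
    if match is None:
        # Usually means the login did not succeed and a sign-in page came back.
        raise ValueError("_rnr_se not found; the login probably failed")
    return match.group(1)


sample = '<input type="hidden" name="_rnr_se" value="AbC123xyz"/>'
print(extract_rnr_se(sample))  # AbC123xyz
```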

Hope this helps!