Adventures with Google App Engine
A crazy idea
On Saturday night one of my housemates thought it would be a good idea to apply the concept of Mark Zuckerberg’s now-infamous ‘FaceMash.com’ website to professional StarCraft II players. SC2 has a huge community that would probably find something like this very funny, and we were bored and needed something to code – the perfect opportunity, so at 8pm we got to work on StudCraft2.com.
Google App Engine
We weren’t sure how much traffic this site would generate, so a distributed, scalable hosting solution seemed like a good idea; we also wanted to use a framework with a simple deployment mechanism and an easy to use database API; so I decided to write the main site using appengine and Google’s ‘webapp’ framework, while my housemate worked on some Perl scripts to scrape team sites and the sc2wiki for player information.
This was my first real foray into appengine, and we were coding through the night hoping to deploy as soon as possible, so please take these code samples with a grain of salt, the approaches taken here may be suboptimal, non-idiomatic and brilliant examples of poor design.
First I had to design the models, like this one, that would be stored in the appengine datastore:
class Player(db.Model): """Models a StarCraft II player""" handle = db.StringProperty() name = db.StringProperty() team = db.ReferenceProperty(Team) race = db.ReferenceProperty(Race) nation = db.StringProperty() photo = db.StringProperty() score = db.FloatProperty() votecount = db.IntegerProperty()
Next was the class that handles the main page of the website:
class MainPage(webapp.RequestHandler):
def post(self):
if self.processForm():
self.redirect("/")
def get(self):
self.showVotePage()
def showVotePage(self):
...
def processForm(self):
...
def updateScore(self, winner, loser):
...
After processing the form in post(), we redirect to get() so that clicking ‘refresh’ doesn’t cause a browser message asking you to resubmit post data. The processForm method figures out which pair of players you are voting on, who won, and whether it’s a legal vote (more on this later). If all that checks out it calls updateScore.
Updating scores
def updateScore(self, winner, loser):
if winner is not None and loser is not None
and winner.key() != loser.key():
K = 20
winnerQ = math.pow(10, winner.score/400.0)
loserQ = math.pow(10, loser.score/400.0)
winnerE = winnerQ/(winnerQ+loserQ)
winnerS = 1.0
loserE = loserQ/(winnerQ+loserQ)
loserS = 0.0
winner.score = winner.score + K*(winnerS - winnerE)
loser.score = loser.score + K*(loserS - loserE)
winner.votecount = winner.votecount + 1
loser.votecount = loser.votecount + 1
winner.put()
loser.put()
This is a simple implementation of the ELO rating system, as used in chess tournaments. The variables winnerE and loserE are the expected points of the two players in the matchup, based on their current scores. If winner and loser have equal scores, both expected values will be 0.5, the further ahead the winner is, the closer their expected value is to 1, and the loser’s expected value is to 0, in every case winnerE + loserE is equal to 1. The scores are then updated according to the difference between their expected scores and actual scores, multiplied by a fixed learning rate K, in this case 20.
This implementation is not thread-safe, because the score of a player may change between when it is read and when it is put, or ever between when the winner is put and the loser is put. Again, more on this later.
Duck-typing is your friend
A couple of hours into writing this app, we decided that we wanted to rank Starcraft II teams and races, not just individual players. So, I had a look at the above updateScore method to see what needed to change, the answer: nothing at all!
I often hear people (including myself) complaining about Python’s duck-typing being responsible for all sorts of hard-to-find bugs, but this was a situation where it really shines. All that is needed for an object to work with the updateScore method, is for it to have key() and put() methods, and score and votecount properties. So I designed the other models to match:
class Team(db.Model): """Models a StarCraft II team/clan""" name = db.StringProperty() score = db.FloatProperty() votecount = db.IntegerProperty() class Race(db.Model): """Models a StarCraft II race (ZPTR)""" name = db.StringProperty() score = db.FloatProperty() votecount = db.IntegerProperty()
In other object-oriented languages like Java, to achieve the same effect of having a generic updateScore function that works with several object types, I would have to create a ScoredObject interface or abstract class with the required parameters; then have each of Player, Team and Race subclass or implement ScoredObject.
While you may argue that this more explicit type hierarchy makes things easier to reason about, the sheer number of lines of code that are required (in contrast to the duck-typing approach), makes it a poor choice for a rapidly developed application like this one.
Beware the script-kiddie
The StarCraft II community is home to a number of programmers and other tech-savvy users, who may attempt to sway the results of the voting site in favour of their personal preferences by cheating the system. To combat this, I wanted to design the site in such a way that it would detect repeated requests from the same user, or forms that have been modified to vote on a particular pair of users rather than the ones randomly allocated for that request. This is what I came up with:
class SpamHash(db.Model): player1 = db.StringProperty() player2 = db.StringProperty() secret = db.StringProperty()
Each time we serve up a page with a voting form, we create a new SpamHash object:
spamhash = SpamHash(player1=unicode(player1.key()),
player2=unicode(player2.key()),
secret=base64.b64encode(os.urandom(8), "*!"))
The form contains the players string-encoded keys, and a random 8-byte string, in hidden fields. The JavaScript on the page itself sets a ‘winner’ field to 0 or 1 when a user clicks a photo or presses the left or right arrow keys.
When we receive a post() request, we look up the matching SpamHash and delete it:
matchingSpamHash = SpamHash.gql("WHERE player1 = :1 AND player2 = :2
AND secret = :3", player1ID, player2ID, spamhash).get()
if matchingSpamHash is not None:
matchingSpamHash.delete()
else:
... # redirect to error page
This way each form we serve can only be used once, and only for the pair of players we randomly assigned. The system accumulates unused SpamHash objects over time, when people visit the home page but do not vote. If that ever gets out of control I’ll add timestamps to the SpamHash and create a CRON job that deletes old ones from time to time.
Going live
After a few finishing touches, a dodgy script to load up all the data, and a page for viewing the top-scoring Players, Teams and Races, it was time to go live. We started by just posting a link on the TeamLiquid.net forum, a popular hang-out for the SC2 community in the US, Australia and Europe, and sat back to watch the vote count grow.
We hit 1000 votes in the first 20 minutes, 5000 in the first hour and 8000 in the first two hours. Things have died off since then, but at our peak we were serving 24 page requests per second.
Transactions, Atomicity and Future Work
Earlier I mentioned that the score update function is not thread safe, in fact neither are the counter increments.
AppEngine provides a way to create transactions that are guaranteed to succeed or fail in their entirety, maintaining data integrity. To ensure that the ELO updates have a zero-sum net effect on the total number of points in the system, all the Player objects must be put in the same entity-group. When two simultaneous transactions that operate on the same entity-group are in progress, the one that tries to commit second will fail, effectively causing a vote not to be counted. Ideally we would like to add all votes to a queue of jobs to be processed atomically when the server has time (but I’m not sure how to do this with AppEngine).
As for the counters, there’s a simple pattern called a ‘sharded counter’ that gets around these sort of contention problems. A counter is split into N ‘shards’, each of which is a root entity, and when you want to increment the counter, you choose a random shard to increment in a transaction, then there’s only a 1/N chance of a collision, so setting N to a number a few times larger than your expected number of increments/second should allow almost all the transactions to succeed. When you want to print the counter, you just add up all the shards. Perhaps it’s possible to come up with a sharded version of the ELO update algorithm, but in general ELO updates are not commutative.
So, if we have time, we might update the StudCraft2 code to be thread safe so that we stop ‘leaking points’ as we are now, and perhaps make the system generic enough to be used for other types of pairwise-voting-comparison websites. Either way building this website was a lot of fun, and I learnt a lot about using Google AppEngine!
All the code is available on my GitHub account.



Leave a Comment