Technical Blog

Golf results notifier

Golf Result Notifier is a tool I developed to help follow professional golf tournaments. The main golf tours (PGA Tour, European Tour, LPGA Tour, etc.) provide live leaderboards, updated every time a player holes out. This is a great way to follow the tournaments, but it means checking the rankings regularly. Therefore, I decided to develop a program that notifies me every time a player’s score is updated. It pops up a notification for each update.

The program is written in Python.

Accessing the data

The crucial step is getting the data. My program handles two tours: the PGA Tour and the European Tour.
For a user, accessing the data is easy: go to the tour’s website and click on the leaderboard.

For a program, the problem is different.
First, the two tours use different URL schemes. The PGA Tour uses one URL for the whole season: http://www.pgatour.com/leaderboard, but the European Tour uses a different URL for each tournament. The format is:

http://www.europeantour.com/europeantour/season=[Year]/tournamentid=[TournamentId]/leaderboard/index.html

For example: http://www.europeantour.com/europeantour/season=2011/tournamentid=2011013/leaderboard/index.html. Note that as long as the tournament is not over, we use the Leaderboard tab and not the Results tab.

To handle the European Tour, the program simply downloads the tour’s main page and finds the URL with the following regexes:

file = get_file("http://www.europeantour.com/index.html")
# Getting the box containing the URL
containerPattern = r'id="ETContainer_thisWeek".*?>(.*?)id="STContainer_thisWeek'
data = re.findall(containerPattern, file, re.M|re.I|re.S)[0]
# Getting the current tournament season and id
leaderboardPattern = r'href="/europeantour/season=(.*?)/tournamentid=(.*?)/'
season, tournament_id = re.findall(leaderboardPattern, data, re.M|re.I|re.S)[0]
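
The `[0]` indexing above raises an IndexError as soon as one of the two patterns finds nothing, for instance if the home page layout changes. Here is a minimal sketch of a more defensive version, assuming the same two patterns. The function name `find_current_tournament` and the sample markup are hypothetical; the markup is only shaped to match the patterns and is not copied from the real site.

```python
import re

CONTAINER_PATTERN = r'id="ETContainer_thisWeek".*?>(.*?)id="STContainer_thisWeek'
LEADERBOARD_PATTERN = r'href="/europeantour/season=(.*?)/tournamentid=(.*?)/'
FLAGS = re.I | re.S


def find_current_tournament(html):
    """Return (season, tournament_id), or None if the page does not match."""
    container = re.search(CONTAINER_PATTERN, html, FLAGS)
    if container is None:
        return None
    link = re.search(LEADERBOARD_PATTERN, container.group(1), FLAGS)
    if link is None:
        return None
    return link.group(1), link.group(2)


# Hypothetical markup, only shaped to match the two patterns.
SAMPLE = (
    '<div id="ETContainer_thisWeek" class="box">'
    '<a href="/europeantour/season=2011/tournamentid=2011013/'
    'leaderboard/index.html">Leaderboard</a>'
    '</div><div id="STContainer_thisWeek"></div>'
)
```

With this version the caller can test for None and retry later instead of crashing in the middle of a tournament.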

Now that we have those URLs, we might think that we only need to download the page and parse it with a regex. Unfortunately, it does not work this way. When we download those pages, we only get the page layout, with no information about the rankings and the results. Therefore, we need to inspect these pages in the browser to find out where the data is.

To do that, I use Firebug, a Firefox plugin. I opened http://www.pgatour.com/leaderboard with Firebug open on the Net panel, checked the list of requested files, and spotted a JSON file that contains all the results. The pattern is:

http://www.pgatour.com/15s/.element/ssi/auto/3.0/sdms/leaderboards/r[TournamentId]/data/current/leaderboard-1-all.json

The only changing part is the tournament id. This number can be found in the HTML of http://www.pgatour.com/leaderboard, and I get it with a simple regex.

The European Tour works almost the same way. I found that the results are stored in an HTML file. The URL pattern is:

http://www.europeantour.com/europeantour/season=[Year]/tournamentid=[TournamentID]/library/leaderboard/_leaderboard_v2.html
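
Both data URLs can be built with plain string formatting. This is only a sketch: the function names are hypothetical, and it assumes the tournament id is substituted into the PGA Tour pattern exactly as it appears in the data (for example "020").

```python
PGA_DATA_URL = (
    "http://www.pgatour.com/15s/.element/ssi/auto/3.0/sdms/leaderboards/"
    "r%s/data/current/leaderboard-1-all.json"
)
EUROPEAN_DATA_URL = (
    "http://www.europeantour.com/europeantour/season=%s/tournamentid=%s/"
    "library/leaderboard/_leaderboard_v2.html"
)


def pga_data_url(tournament_id):
    """URL of the JSON leaderboard for one PGA Tour tournament."""
    return PGA_DATA_URL % tournament_id


def european_data_url(season, tournament_id):
    """URL of the HTML leaderboard fragment for one European Tour tournament."""
    return EUROPEAN_DATA_URL % (season, tournament_id)
```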

Extracting the player’s data

The data sources are very different in the two cases:

  • A JSON file (PGA Tour)
  • An HTML file (European Tour)

The first one is very easy to parse: I used Python’s json module directly. Its loads function turns the string stored in the file into a Python structure (nested dictionaries and lists). To understand the structure of the file, I ran it through a JavaScript beautifier to display it in a readable way.
For example, let’s take this leaderboard. The JSON file looks like this:

{
    "lb": {
        "lt": "April 3, 2011 - 5:57 p.m.", /* local time */
        "tid": "020", /* tournament id */
        "c": { /* course information */
            "c": {
                "sl": "1",
                "h": [{ /* hole one distance and par */
                    "d": "397",
                    "p": "4",
                    "n": "1"
                },
                /*...*/],
                "n": "Redstone GC Tournament Course" /* Course name */
            }
        },
        "sd": "Thursday Mar. 31 - Sunday Apr. 3, 2011", /* tournament date */
        "tn": "Shell Houston Open", /* tournament name */
        "pds": { /* leaderboard */
            "p": [{ /* First player is the leader */
                "fcr": "19", /* Current Fedex Cup Rank */
                "sh": "1", /* Current ranking */
                "pfcr": "2", /* Projected Fedex Cup Rank */
                "id": "01810", /* PGATour.com player ID */
                "ln": "Mickelson", /* last name */
                "stat": { /* player statistics: average drive distance, driving accuracy... */
                    "s": [{
                        "id": "101",
                        "v": "323.5"
                    }, /*...*/
                    ]
                },
                "fcp": "548", /* Fedex Cup Points */
                "t": "F", /* "Thru" information, here F: player finished his round */
                "pfct": "500", /* Projected Event Points */
                "crs": "-7", /* Current round's score */
                "fn": "Phil", /* First name */
                "f": "USA", /* Country */
                "tp": "-20", /* Tournament score */
                "rn": "4", /* Current round */
                "pfcp": "1048", /* Projected Fedex Cup points */
                "rs": { /* Round scores */
                    "r": [{
                        "sc": "70", /* Score */
                        "t": "12:40pm", /* Tee Time */
                        "rn": "1" /* Round number */
                    }, /*...*/
                    ]
                }
            }, {
                /* next player ... */

In my Python code, I can access the results like this:

file_content = get_file(filepath)
obj = json.loads(file_content)
tournament_name = obj["lb"]["tn"]
local_time = obj["lb"]["lt"]
results = {}
for p in obj["lb"]["pds"]["p"]:
    data = {"hole": p["t"], "score": p["tp"],
            "pos": p["p"], "today": p["crs"],
            "id": p["id"]}
    results[p["fn"] + " " + p["ln"]] = data
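
To try this without the network, here is a self-contained sketch that runs on a trimmed, hand-written sample. Two assumptions: the `/* ... */` comments in the excerpt above are annotations, so the sample omits them (`json.loads` rejects comments), and the `"p"` position key comes from my code rather than from the excerpt, so the sketch reads it with `.get` and yields None when it is absent. The function name is hypothetical.

```python
import json

# Hand-written sample limited to the keys the parser reads.
SAMPLE = """
{"lb": {"tn": "Shell Houston Open",
        "lt": "April 3, 2011 - 5:57 p.m.",
        "pds": {"p": [{"fn": "Phil", "ln": "Mickelson", "id": "01810",
                       "t": "F", "tp": "-20", "crs": "-7", "p": "1"}]}}}
"""


def parse_pga_leaderboard(text):
    """Return (tournament name, local time, results keyed by player name)."""
    board = json.loads(text)["lb"]
    results = {}
    for p in board["pds"]["p"]:
        results[p["fn"] + " " + p["ln"]] = {
            "hole": p["t"],
            "score": p["tp"],
            "pos": p.get("p"),  # assumed key, None if the feed lacks it
            "today": p["crs"],
            "id": p["id"],
        }
    return board["tn"], board["lt"], results
```

All values stay strings, exactly as they come from the feed; scores such as "-20" or "F" are only compared for equality later, so no conversion is needed.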

In the HTML case, I had to parse the file myself. I used several regular expressions to extract the data and build the data structure.
The first step is to extract the leaderboard content, with the following regex:

leaderboardPattern = r'id="lbl">.*?<tbody>(.*?)</tbody>'
leaderboard = re.findall(leaderboardPattern, file, re.M|re.I|re.S)[0]

Then I want to get all the rows (one per player):

playerPattern = r'<tr\s* id="(.*?)"[^>]+>(.*?)</tr>'
players = re.findall(playerPattern, leaderboard, re.M|re.I|re.S)

Finally, for each player I can extract the data and build the data structure.
For example, one leaderboard row has the following HTML:

<tr id="798" class="o 798 sel">
  <td class="l"></td>
  <td class="mk">1</td>
  <td class="o60">77</td>
  <td>118</td>
  <td><img src="/imgml/flags/23x17/flag_fra.gif" tag="FRA" alt="FRA" title="France" width="23" height="17"></td>
  <td class="">-</td>
  <td class="b">1</td>
  <td class="nm">JACQUELIN, Raphaël</td>
  <td>-12</td><td>18</td><td>-3</td>
  <td class="rnd ">66</td><td class="rnd ">69</td>
  <td class="rnd ">69</td><td class="rnd ">68</td>
  <td>272</td>
  <td class="r"></td>
</tr>

I process it this way:

results = {}
# Captures, in order: position, name, score, hole, today's score
pattern = r'class="b">([^<]+)<.*?"nm">([^<]+)<.*?<td>(.*?)</td>.*?<td>(.*?)</td>.*?<td>(.*?)<'
for p in players:
    player_id = p[0]  # renamed from "id" to avoid shadowing the builtin
    infos = re.findall(pattern, p[1], re.M|re.I|re.S)
    if len(infos) == 0:
        continue  # not a player row
    infos = infos[0]
    data = {"hole": infos[3], "score": infos[2],
            "pos": infos[0], "today": infos[4],
            "id": player_id}
    results[infos[1]] = data
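
The three regex steps can be chained and run on the row quoted above. In this sketch the sample is a shortened copy of that row, wrapped in a hypothetical `<table id="lbl">` so that the first pattern has something to match. The function name is also hypothetical.

```python
import re

FLAGS = re.I | re.S
LEADERBOARD_PATTERN = r'id="lbl">.*?<tbody>(.*?)</tbody>'
PLAYER_PATTERN = r'<tr\s* id="(.*?)"[^>]+>(.*?)</tr>'
CELLS_PATTERN = (r'class="b">([^<]+)<.*?"nm">([^<]+)<'
                 r'.*?<td>(.*?)</td>.*?<td>(.*?)</td>.*?<td>(.*?)<')

# Shortened copy of the sample row; the surrounding table is an assumption.
SAMPLE = """<table id="lbl"><thead></thead><tbody>
<tr id="798" class="o 798 sel">
  <td class="l"></td>
  <td class="mk">1</td>
  <td class="o60">77</td>
  <td>118</td>
  <td class="">-</td>
  <td class="b">1</td>
  <td class="nm">JACQUELIN, Raphaël</td>
  <td>-12</td><td>18</td><td>-3</td>
  <td class="rnd ">66</td><td class="rnd ">69</td>
  <td>272</td>
  <td class="r"></td>
</tr>
</tbody></table>"""


def parse_european_leaderboard(html):
    """Return the results keyed by player name, or {} if nothing matches."""
    tables = re.findall(LEADERBOARD_PATTERN, html, FLAGS)
    if not tables:
        return {}
    results = {}
    for player_id, row in re.findall(PLAYER_PATTERN, tables[0], FLAGS):
        cells = re.search(CELLS_PATTERN, row, FLAGS)
        if cells is None:
            continue  # not a player row
        pos, name, score, hole, today = cells.groups()
        results[name] = {"hole": hole, "score": score, "pos": pos,
                         "today": today, "id": player_id}
    return results
```

The cell pattern depends on the column order: it takes the three plain `<td>` cells that follow the name as score, hole and today’s score, so a layout change on the site would silently shift the values.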

Notifying score modifications

Displaying a notification

Now that we have the latest information, we can compare it to the previous leaderboard, player by player. We compare each player’s current score to the stored one and, if it has changed, we display a notification. Then we update the stored leaderboard, wait 30 seconds, and start the process again.
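
The compare-and-wait loop can be sketched as follows. The names `changed_players` and `watch` are hypothetical, and so is the choice to skip players who were not in the previous leaderboard. `fetch_results` and `notify` stand for the parsing and notification code; they are passed in as parameters so that the loop can be tried without the network.

```python
import time


def changed_players(previous, current):
    """Return the names whose tournament score differs from the stored one."""
    changed = []
    for name, data in current.items():
        old = previous.get(name)
        # Players absent from the previous leaderboard are skipped (assumption).
        if old is not None and old["score"] != data["score"]:
            changed.append(name)
    return changed


def watch(fetch_results, notify, delay=30, sleep=time.sleep, rounds=None):
    """Poll the leaderboard and call notify(name, data) on each score change.

    rounds=None loops forever; a number stops after that many polls.
    """
    previous = fetch_results()
    done = 0
    while rounds is None or done < rounds:
        sleep(delay)
        current = fetch_results()
        for name in changed_players(previous, current):
            notify(name, current[name])
        previous = current  # update the stored leaderboard
        done += 1
```

Scores are compared as strings, so a change from "E" to "-1" is detected without converting anything to a number.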

To display the notification, I use Growl, a Mac application. To create a notification, I decided to use growlnotify, a command-line tool that displays Growl notifications. A Python module is also available to perform the same task.
The growlnotify command line works this way:

growlnotify -m "Message" -t "Title" --image "image_path"

In my Python code, I use it this way:

os.system(command_line)
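
Building the command as one string for `os.system` means that a quote in a player’s name or in the message can break it. One alternative, sketched here under the assumption that growlnotify is on the PATH, is to pass the arguments as a list to `subprocess.call`, which starts the program without a shell. The function names are hypothetical; the flags are the three shown above.

```python
import subprocess


def build_growl_command(title, message, image_path=None):
    """Return the growlnotify argument list for one notification."""
    command = ["growlnotify", "-m", message, "-t", title]
    if image_path:
        command += ["--image", image_path]
    return command


def notify(title, message, image_path=None):
    """Display the notification and return growlnotify's exit code."""
    # A list of arguments is passed as-is, without shell quoting issues.
    return subprocess.call(build_growl_command(title, message, image_path))
```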

Getting the picture

When a notification pops up, I like to see the player’s picture. Therefore, I added a feature that downloads a player’s picture from his ID.
In both cases (PGA Tour and European Tour), I analyzed the URLs of the players’ pictures. On the European Tour website, a picture is displayed when clicking on a player’s name on the leaderboard. With Firebug’s HTML tab, I easily figured out that the pattern is:

 http://www.europeantour.com/imgml/players/[PlayerID].jpg

On the PGA Tour website, the pictures can be found in the Personal tab of the player’s profile. The URL can be built from the player’s ID this way:

playerID = int(playerID)
# Integer division (//) gives the same result on Python 2 and Python 3
a = playerID // 10000
b = (playerID // 100) % 100
c = playerID % 100
a,b,c = str(a).zfill(2), str(b).zfill(2), str(c).zfill(2)
url = "http://i.cdn.turner.com/pgatour/players/%s/%s/%s/images/headshot-96x109.jpg" % (a, b, c)
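
As a worked example, take the player ID "01810" from the JSON excerpt: 1810 splits into 0, 18 and 10, which gives the path 00/18/10. The same arithmetic wrapped in a function (the name is hypothetical):

```python
def pga_headshot_url(player_id):
    """Build the headshot URL from a PGA Tour player ID such as "01810"."""
    number = int(player_id)
    parts = (number // 10000, (number // 100) % 100, number % 100)
    a, b, c = (str(part).zfill(2) for part in parts)
    return ("http://i.cdn.turner.com/pgatour/players/%s/%s/%s/"
            "images/headshot-96x109.jpg" % (a, b, c))
```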

To avoid downloading the picture every time, I added a simple cache system:

localFname = "cache/pga-" + str(playerID) + ".png"
# Check if we already have the picture
if os.path.exists(localFname):
    return localFname
# Downloading the picture
os.system("/sw/bin/wget %s -O %s  1>/dev/null 2>/dev/null" % (url, localFname))
return localFname # local path to image, given as parameter to growlnotify
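
The same cache can be written without depending on wget being installed at /sw/bin. This sketch swaps wget for `urllib.request.urlretrieve` from the Python 3 standard library. The function name and the `download` parameter are hypothetical; the parameter is only there so that the cache logic can be exercised without the network.

```python
import os
from urllib.request import urlretrieve


def cached_picture(url, local_path, download=urlretrieve):
    """Return local_path, downloading the picture only if it is not cached."""
    if os.path.exists(local_path):
        return local_path
    folder = os.path.dirname(local_path)
    if folder:
        # Create the cache folder on first use.
        os.makedirs(folder, exist_ok=True)
    download(url, local_path)
    return local_path
```

The returned path can be given directly to growlnotify’s --image option, as before.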