The following is an overview of my first experience with open data, which I used over the weekend to build bark-parks (now offline). It’s a mashup of Toronto’s dog parks and Yelp reviews on a map. The source is on GitHub, and I’d love contributions of any sort.
Problem
My girlfriend and I are new to Toronto. We don’t have a great idea of where things are, but we do have a puppy. So far, we’ve been using Meetup groups and a list on Toronto’s Parks, Forestry & Recreation website to figure out where to take our puppy. The list is good, but requires newcomers to open up a Google map and get directions to each location individually — a definite opportunity to contribute to the community.
Solution
Over the weekend, I spent a bit of time with Toronto’s open data[1]. The mashups and useful tools that could be built with just a day’s work seem endless, and they could go a long way toward improving the experience of living in a city with open data.
Data Quality
Specifically, I was looking at the dataset titled Parks and Recreation Facilities. This particular dataset is not published continuously, and there is no web service. It’s an XML file that was uploaded once. That is concerning, because it likely means the Toronto Parks, Forestry & Recreation website is not backed by this XML file. The filename is the only indication of a date, and it suggests the data is from July 2011.
Perhaps it shouldn’t come as a surprise, but a career in enterprise development is a great leg-up in the world of coding with open government data. It’s wonderful the data is available, and it can’t be overstated how important that is. Of course, there is room for improvement. Among the unexpected things:
- duplicate data — I’ve chosen to enforce uniqueness in the code
- missing data — some parks appear on the website, but not in the XML
- misspellings
- missing elements that we could reasonably expect — other data sets have latitude and longitude coordinates. Here, we just have text addresses
When fixing any of these issues, I try to leave the reference data alone. I don’t want to mess with its format or content. Instead, I want to take the dataset, transform it, and enrich it for my application. By leaving the original reference source alone, we should be good for future updates[2].
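As a rough illustration of that transform-and-enrich idea, here is a minimal Python sketch. It is not the code behind bark-parks. The element names (`Location`, `Name`, `Address`), the sample records, and the `geocode` callback are all assumptions made up for the example; the real XML file may be laid out differently. The source XML is only read, never modified. Duplicates are skipped by comparing normalized name and address, and coordinates are attached to a separate in-memory copy of each record.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple

# Made-up sample data. The element names are hypothetical and may not
# match the real Parks and Recreation Facilities file.
SAMPLE_XML = """
<Locations>
  <Location><Name>Sample Park A</Name><Address>1 Example St</Address></Location>
  <Location><Name>Sample  Park A </Name><Address>1 EXAMPLE ST</Address></Location>
  <Location><Name>Sample Park B</Name><Address>2 Example Ave</Address></Location>
</Locations>
"""


@dataclass(frozen=True)
class Park:
    name: str
    address: str
    lat: Optional[float] = None
    lon: Optional[float] = None


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so near-identical rows compare equal."""
    return " ".join(text.split()).casefold()


def load_parks(xml_text: str) -> List[Park]:
    """Parse the reference XML and keep the first occurrence of each park."""
    root = ET.fromstring(xml_text.strip())
    seen = set()
    parks = []
    for loc in root.iter("Location"):
        name = " ".join((loc.findtext("Name") or "").split())
        address = " ".join((loc.findtext("Address") or "").split())
        key = (normalize(name), normalize(address))
        if key in seen:
            continue  # duplicate row: uniqueness is enforced here, not in the XML
        seen.add(key)
        parks.append(Park(name, address))
    return parks


def enrich(
    parks: List[Park],
    geocode: Callable[[str], Optional[Tuple[float, float]]],
) -> List[Park]:
    """Return new records with coordinates where the geocoder finds a match."""
    enriched = []
    for park in parks:
        coords = geocode(park.address)
        if coords is None:
            enriched.append(park)  # leave the record as-is if lookup fails
        else:
            enriched.append(replace(park, lat=coords[0], lon=coords[1]))
    return enriched


if __name__ == "__main__":
    # Stand-in geocoder with placeholder coordinates; a real one would call
    # whatever geocoding service the application uses.
    fake_lookup = {"1 Example St": (43.0, -79.0)}
    for park in enrich(load_parks(SAMPLE_XML), fake_lookup.get):
        print(park)
```

Because the geocoder is passed in as a function, the same pipeline can be rerun against a newer copy of the file without touching the original data.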
Feedback Loop
I have an email out to the team on the above issues and am hopeful they’ll correct them. Still, I’d prefer something like GitHub Issues for open data. If we want the data open, shouldn’t we work together in the open to improve its quality as well?
[1] http://toronto.ca/open
[2] Assuming updates continue to use the same format.