Route Planning services should not be built by governmental organizations
2017, January 15
Is it a government’s task to build apps? If not, should a government build APIs? Or should they only publish their data?
A question that took me a long time to answer.
A Public Sector Body has the responsibility to perform a task for a certain group of citizens and/or a group of companies and/or other public sector bodies. It is up to the ministers and the people working in this administration to decide how broadly this task should be interpreted. The terms Big Government and Small Government are used to indicate, respectively, that a certain body has been given too many and too broad responsibilities (and funding) for a certain task, or too few. A common argument is that a government should be small where there is no market failure: when the free market is able to meet all the citizens’ demands, a government should not interfere. A government should be allowed to grow big where the market fails to address all needs. In the European railway sector, for instance, this is still a hot topic: the transition from one government-owned public transit agency towards a free railway market receives a lot of criticism. Another classic example is the Open Data scene, where citizens want smart agents or apps, developers want APIs, and data service creators want raw datasets. Let’s ask the Twitterverse first…
Apps
An app’s intention is to fix a pain point in the daily life of a smartphone-savvy citizen. Decision makers who want to make their mark thus provide a one-time amount of money for the creation of an app. The bar is set high, as competition is fierce: plenty of apps already exist which solve these issues to a certain extent. However, some apps stand the test of time and may actually fulfill the needs of their users. Let’s have a look at three examples:
On the one hand, we can argue whether these apps should have existed in the first place, as their overall scores are not that great, while we can imagine the investments that were put into them were large. For example, in 2015, Brecht Van de Vyvere (iRail) published a furious opinion piece in Datanews, arguing that De Lijn disrupts the route planning app market by using taxpayers’ money to heavily promote its own app. Of course, the apps of NMBS/SNCB and De Lijn have their merits today, as they serve as a direct communication channel with a segment of users that only take public transport.
On the other hand, these are not apps for everyone. For example, we cannot plan a route using a folding bike or a wheelchair. Neither can we plan routes that combine different non-public-transport modes, or plan routes across the Belgian border. Should Public Sector Bodies now also build a route planning app for each of these niches? “Of course not: they should provide an API for third parties, so they can create these new user interfaces” is what 33% of the respondents to my Twitter poll would say.
An API for third party user interfaces
With the iRail API, this is exactly what we have been doing since 2010. As an independent non-profit organization, we web-scraped the Belgian railway company’s website and provided everyone with a free route planning interface. This enabled people within Belgium to create their own user interfaces for the Belgian railway system, which overall get higher scores in the app stores while their budgets are almost non-existent:
We may conclude this is a good idea: the apps get higher ratings and there are more apps to choose from. However, this does not solve the initial problem: we still do not have apps that work across the Belgian border, and we still cannot calculate intermodal routes across different APIs. The only thing it solves is that the public transit agency itself no longer has to care about its user interface. If we wanted extra features, however, we would have to ask the API maintainer, and thus the data owner, to provide them for us. It becomes hardly possible for application developers to create a unique selling point, as each of these apps competes on user interface alone for the same features, and thus it becomes impossible for them to build a sustainable business model. For the data owner, this is not a very good idea either…
Scaling data reuse
Let’s create a user agent that provides its end-users with the nearest railway station. A user story would look like this: when you push a button, you should see the nearest station relative to your current location. In a Service-Oriented Architecture (SOA), or how we would naturally design such an interaction in small-scale set-ups, we expose a functionality on the server which requires the application to send its current location to the server. The URL of such a request would look like this: http://{my-service}/nearestStation?longitude=3.14159&latitude=51.315. The server then responds with a concise and precise answer. This minimizes the data that has to be exchanged when only one question is asked, as only one station needs to be transferred. Does this advantage outweigh the disadvantages?
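A minimal sketch of this SOA-style interaction, assuming a placeholder my-service.example host and invented response fields (this is not a real API):

```js
// Hypothetical SOA-style call: the client ships its coordinates to the server
// and receives exactly one answer back. Every distinct location leads to a
// distinct request that the server has to evaluate on the spot.
async function nearestStation(longitude, latitude) {
  const url = `http://my-service.example/nearestStation` +
              `?longitude=${longitude}&latitude=${latitude}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json(); // e.g. { name: "Gent-Sint-Pieters", distanceInMeters: 1250 }
}

// One HTTP round trip per button push; nothing reusable ends up in a cache.
nearestStation(3.7174, 51.0356).then(console.log);
```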
The number of information resources – or documents – that you potentially have to generate on the server is over a trillion. Let’s be precise: in the Coordinate Reference System (CRS) WGS-84, the globe goes from longitude −180° to longitude +180° and from latitude −90° to latitude +90°. For estimating the nearest station, let’s assume a precision of 11 m, or 4 decimal places in both longitude and latitude, is enough. Then we would still have 3,600,000 × 1,800,000 ≈ 6.5 trillion options exposed for one simple feature. As it is unlikely that two people wanting to know the nearest railway station are at exactly the same location, each HTTP request has to be sent to the server for evaluation. Rightfully, SOA practitioners introduce rate limiting for this kind of request to keep the number of requests low. An interesting business model is to sell a higher rate limit to the people who need more requests. Yet, did we not want to maximize the reuse of our data, instead of limiting the number of requests possible?
As there are only 646 stations served by the Belgian railway company, a description of all these stations easily fits into one information resource identified by one URL. When the server does not expose the functionality to filter the stations on the basis of geolocation, all user agents that want to answer any question based on the location of stations have to fetch that same resource. This puts the server at ease, as it only has to prepare the right document once, each time the list of stations is updated. Despite the fact that all 646 stations now have to be transferred to the user agent, thus consuming significantly more bandwidth, the user agent can benefit as well. For example, when a similar question is asked soon after, the dataset will already be present in the client’s cache, and no data at all will need to be transferred.
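A rough sketch of this document-based alternative, assuming a hypothetical /stations resource on the same placeholder host that simply lists every station with its coordinates (the URL and field names are invented for illustration):

```js
// Fetch the one cacheable document describing all 646 stations once,
// then answer the "nearest station" question entirely on the client.
let stationsPromise = null;

function getStations() {
  // Fetched at most once per session; on a revisit, the browser's HTTP cache
  // can even avoid that single request.
  if (!stationsPromise) {
    stationsPromise = fetch('http://my-service.example/stations')
      .then((response) => response.json()); // assumed: [{ name, longitude, latitude }, …]
  }
  return stationsPromise;
}

// Haversine distance in kilometres between two WGS-84 coordinates.
function distanceKm(lon1, lat1, lon2, lat2) {
  const rad = Math.PI / 180;
  const dLat = (lat2 - lat1) * rad;
  const dLon = (lon2 - lon1) * rad;
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a));
}

async function nearestStation(longitude, latitude) {
  const stations = await getStations();
  return stations.reduce((best, station) =>
    distanceKm(longitude, latitude, station.longitude, station.latitude) <
    distanceKm(longitude, latitude, best.longitude, best.latitude) ? station : best);
}
```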
When the number of end-users per second now increases by a factor of a thousand – not uncommon on the Web – it becomes easier for the server to keep delivering the same file to those user agents that do not have it in their cache already. When it is not in the cache of the user agent itself, it might already be in an intermediate cache on the Web, or in the server’s cache, so the server does not have to invest CPU time per user. Caching has the potential to eliminate some network interactions and server load. When exploited, better network efficiency, scalability, and user-perceived performance can be achieved. In this case, it is thus better to do a bit less on the server instead of exposing all possible features.
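On the publishing side, preparing that one document and making it cacheable requires very little code. A sketch, assuming Express and a stations.json file that is regenerated whenever the list of stations changes (both are choices of mine for illustration, not a prescription):

```js
// Publish the full station list as one cacheable document.
const express = require('express');
const stations = require('./stations.json'); // assumed: all 646 stations

const app = express();

app.get('/stations', (req, res) => {
  // Clients and intermediate caches may reuse this document for a day, so the
  // server pays the cost of producing it once per update, not once per user.
  res.set('Cache-Control', 'public, max-age=86400');
  res.json(stations);
});

app.listen(3000);
```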
Raw data sharing
Only one option is left in our questionnaire: raw data sharing. 48% of Twitter respondents (the largest share) say this is the way to go, and I agree. This way of publishing data affords the creation of apps that go beyond what a government itself can create:
In August 2015, iRail generated a first static timetable data dump in GTFS during open Summer of code 2015. This allowed CityMapper to become the first intermodal route planner in Belgium to include the Belgian railway company’s data.
And raw data sharing also allows third parties to create very specific APIs, which they can offer as a service to application developers who do not want to go through the hassle of setting one up themselves:
Nothing in this chapter would have been possible without the raw data being available.
Data is only going to be reused when the benefits for a third party outweigh the costs, and datasets are only going to be published when the costs for data publishers remain reasonable. We can keep data publishing cost-efficient as long as we make sure the documents we publish are cacheable. Today, it is still expensive for a data reuser to download a GTFS time schedule file, extract it, deploy it in a route planner, and then also develop a front-end. It therefore needs to become much cheaper to start reusing this kind of data. As an alternative to GTFS, the same ideas as in the chapter about scaling data reuse can be applied to route planning:
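For instance, instead of one bulk file, the timetable could be split into small, cacheable documents of departing connections, each behind its own stable URL. The sketch below only illustrates that idea; the field names and the ten-minute window are assumptions of mine, not a specification:

```js
// Split a list of connections (a vehicle going from one stop to the next stop)
// into documents of ten minutes each, so that every document can be cached and
// reused by any route planning question that touches that time window.
const WINDOW_MS = 10 * 60 * 1000;

function fragmentConnections(connections) {
  const documents = new Map();
  for (const connection of connections) {
    const windowStart = new Date(
      Math.floor(connection.departureTime.getTime() / WINDOW_MS) * WINDOW_MS);
    const url = '/connections?departureTime=' + windowStart.toISOString();
    if (!documents.has(url)) documents.set(url, []);
    documents.get(url).push(connection);
  }
  return documents; // Map from a stable URL to the connections departing in that window
}
```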
We should not overestimate the potential of this market either: many of the apps are available for free and end-users’ willingness to pay is low. For example, the only reason why Bike Citizens gets a lower score in the app store is that users need to pay to unlock new cities. However, where money is to be found, I believe, is in niches. This approach does not only lower the costs of publishing the data; it also lowers the cost of starting to work with the data and of tweaking route planning algorithms for specific needs. You can take this example and tweak it in JavaScript to calculate an isochrone map for your specific end-user.
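As a rough sketch of what such a tweak could look like (the connection objects and stop identifiers are invented, and a real client would fetch them from documents like the ones sketched above), an isochrone map boils down to computing the earliest arrival time at every stop reachable within a time budget:

```js
// Earliest-arrival scan over connections sorted by departure time (the core of
// a Connection Scan approach), reduced to an isochrone question: which stops
// can be reached from `origin` within `budgetMinutes`?
// Assumed connection fields: departureStop, arrivalStop, departureTime, arrivalTime.
function isochrone(connections, origin, departureTime, budgetMinutes) {
  const arrival = new Map([[origin, departureTime]]);
  const deadline = new Date(departureTime.getTime() + budgetMinutes * 60000);
  for (const c of connections) {
    const reachableAt = arrival.get(c.departureStop);
    if (reachableAt !== undefined && reachableAt <= c.departureTime &&
        c.arrivalTime <= deadline &&
        (!arrival.has(c.arrivalStop) || c.arrivalTime < arrival.get(c.arrivalStop))) {
      arrival.set(c.arrivalStop, c.arrivalTime);
    }
  }
  return arrival; // Map from stop to its earliest arrival time within the budget
}
```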
In conclusion: let’s publish data!
It would be wrong to think of the sequence “data publishing, API, app” as an indication of how good a job a government is doing. Each step fulfills a different need. I would pick data publishing as the most important investment to be made today, as some datasets can only be delivered by governmental organizations or by public transport authorities. It forms the basis for potential APIs, and it is the basis for the final application.