Route Planning services should not be built by governmental organizations

2017, January 15

Is it a government’s task to build apps? If not, should a government build APIs? Or should it only publish its data?
These questions took me a long time to answer.

A possible solution to the mobility problem in Belgium, without changing the financial incentives too much, is to make sure Belgians are better informed about their travel options at any given moment. If an app or a website shows that less time is lost using an alternative intermodal route, we might actually change people’s behaviour. But who has to build this app? In this post, I argue that the government should only focus on making raw data available on the Web. As there is a global market of route planning services for numerous use cases, we can safely assume there is no market failure. Building an extra app would only disrupt that market. Governments should not bother building a route planning API either, as such an API only allows innovation in new, fancy graphical user interfaces. A route planning API cannot be combined with other APIs to generate intermodal routes, and the problem that all data needs to be centralized within one system persists. Instead, publishing the raw data first enables service creators to build APIs for the many.

A Public Sector Body has the responsibility to perform a task for a certain group of citizens, companies, and/or other public sector bodies. It is up to the ministers and the people working in its administration to decide how broadly this task should be interpreted. The terms Big Government and Small Government respectively indicate that a body has been given too many and too broad responsibilities (and too much funding) for a certain task, or too few. A common argument is that a government should stay small where there is no market failure: when the free market is able to meet all the citizens’ demands, a government should not interfere. Where the market fails to address all needs, a government should be allowed to grow big. In the European railway sector, for instance, this is still a hot topic: the transition from a single government-owned public transit agency towards a free railway market receives a lot of criticism. Another classic example is the Open Data scene, where citizens want smart agents or apps, developers want APIs, and data service creators want raw datasets. Let’s ask the Twitterverse first…

Should your government create a route planner for a certain use case? (e.g., for folding bike routes or for wheelchair accessible routes)
— Pieter Colpaert (@pietercolpaert) 2017, January 14

Some want the government to build them an app, more would like the government to give them an API, but what we really need is for the government to publish its data for third-party services.

Apps

An app’s intention is to fix a pain point in the daily life of a smartphone-savvy citizen. Decision makers who want to make their mark thus provide a one-time budget for the creation of an app. The bar is set high, as competition is fierce: plenty of existing apps already solve these issues to a certain extent. Still, some apps stand the test of time and may actually fulfill the needs of their users. Let’s have a look at three examples:

NMBS/SNCB Belgian rail app – An app that plans routes through Belgium with the four public transport modes. The app itself is built on HAFAS, route planning software from the German company HaCon. At the end of 2016, the NMBS/SNCB announced a user experience center where they would be able to test the app, perform eye tracking, and turn the app into a great commuter experience. Its unique selling point is that it lets you buy tickets. Today, the app has a score of 3.8/5 on the Google Play store.
De Lijn app – The native app has a 3.3/5 rating on the Google Play store.
Fietsrouteplanner Gent – A route planner for bicycles in the City of Ghent. When it launched in 2011, the idea was to encourage cyclists to take smarter routes through the city. Today, the route planner is unmaintained and has not been updated with the latest mobility rules in the city.

Three Belgian examples of route planning applications maintained by the data owners themselves.

On the one hand, we can question whether these apps should have existed in the first place, as their overall scores are not that great, while the investments put into them were presumably large. For example, in 2015, Brecht Van de Vyvere (iRail) published a furious opinion piece in Datanews, arguing that De Lijn disrupts the route planning app market by using taxpayers’ money to heavily promote its app. Of course, the apps of NMBS/SNCB and De Lijn have their merits today, as they serve as a direct communication channel with a segment of users that only takes public transport.

On the other hand, they are not apps for everyone. For example, we cannot plan a route using a folding bike or a wheelchair. Nor can we plan routes that combine different non-public-transport modes, or plan routes across the Belgian border. Should Public Sector Bodies now also build a route planning app for each of these niches? “Of course not: they should provide an API for third parties, so they can create these new user interfaces” is what 33% of the respondents to my Twitter poll would say.

An API for third party user interfaces

With the iRail API, this is exactly what we have been doing since 2010. As an independent non-profit organization, we scraped the website of the Belgian railway company and provided everyone with a free route planning interface. This enabled people within Belgium to create their own user interfaces for the Belgian railway system, which overall get higher scores on the app stores while their budgets are almost nonexistent:

Railer (rating of 4.5/5 on the iPhone App Store) and BeTrains (4.3/5 on the Google Play Store) are apps reusing the iRail API to create a GUI on iPhone and Android.
NextTrain is a smartwatch application for planning your route by train in the Netherlands or Belgium.
Transportr (4.6/5 on the Google Play Store) is an open-source app that works on top of the Navitia API.

Examples of apps that need access to an API in order to function. They were created by third parties with the goal of offering a good user interface.

We may conclude this is a good idea: the apps get higher ratings and there are more apps to choose from. However, this does not solve the initial problem: we still do not have apps that work across the borders of Belgium, and we still cannot calculate intermodal routes across different APIs. The only thing it solves is that the public transit agency itself no longer has to care about its user interface. If we would like extra features, however, we have to ask the API maintainer, and thus the data owner, to provide them. It becomes nearly impossible for application developers to create a unique selling point, as each of these apps competes only on user interface for the same features, and thus it becomes impossible for them to build a sustainable business model. For the data owner, this is not a very good idea either…

Scaling data reuse

Let’s create a user agent that provides its end-users with the nearest railway station. A user story would look like this: when you push a button, you should see the nearest station relative to your current location. In a Service-Oriented Architecture (SOA), which is how we would naturally design such an interaction in small-scale set-ups, we expose a functionality on the server that requires the application to send its current location to the server. The URL of such a request would look like this: http://{my-service}/nearestStation?longitude=3.14159&latitude=51.315. The server then responds with a concise and precise answer. This minimizes the data that has to be exchanged when only one question is asked, as only one station needs to be transferred. Does this advantage outweigh the disadvantages?
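To make this concrete, here is a minimal Python sketch of such a server-side nearestStation handler. The station names and coordinates are hypothetical placeholders, the host my-service.example stands in for {my-service}, and the distance uses an equirectangular approximation, which is adequate over the short distances involved. It illustrates the SOA pattern; it is not the code of any existing service.

```python
import math
from urllib.parse import parse_qs, urlparse

# Hypothetical stations (name, longitude, latitude); not real NMBS/SNCB data.
STATIONS = [
    ("Station A", 3.2167, 51.1972),
    ("Station B", 3.7107, 51.0357),
    ("Station C", 4.4211, 51.2172),
]


def approx_distance_m(lon1, lat1, lon2, lat2):
    """Equirectangular approximation of the distance in metres (short distances)."""
    earth_radius_m = 6_371_000
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return earth_radius_m * math.hypot(x, y)


def handle_nearest_station(url):
    """SOA style: the client sends its location, the server computes one answer."""
    query = parse_qs(urlparse(url).query)
    lon = float(query["longitude"][0])
    lat = float(query["latitude"][0])
    name, _, _ = min(STATIONS, key=lambda s: approx_distance_m(lon, lat, s[1], s[2]))
    return {"nearestStation": name}


print(handle_nearest_station(
    "http://my-service.example/nearestStation?longitude=3.14159&latitude=51.315"
))  # {'nearestStation': 'Station A'}
```

Every distinct location produces a distinct URL, and therefore a distinct computation on the server.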

The number of information resources, or documents, that you potentially have to generate on the server is over a trillion. To be precise: in the Coordinate Reference System (CRS) WGS 84, the globe goes from longitude -180° to longitude +180° and from latitude -90° to latitude +90°. For estimating the nearest stations, let’s assume a precision of 11 m, or 4 decimal places in both longitude and latitude, is enough. Then we would still have $180 \times 10^{4} \times 360 \times 10^{4} = 6.48 \times 10^{12}$ options exposed for a simple feature. As it is unlikely that two people wanting to know the nearest railway station are at exactly the same location, each HTTP request has to be sent to the server for evaluation. Rightfully, SOA practitioners introduce rate limiting on this kind of request to keep the number of requests low. An interesting business model is to sell a higher rate limit to people who need more requests. Yet, did we not want to maximize the reuse of our data, instead of limiting the number of possible requests?
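The arithmetic behind that estimate can be checked in a few lines of Python. This sketch ignores the boundary values (±180°, ±90°) and assumes a spherical Earth with a mean radius of 6371 km to convert one step of 0.0001° into metres.

```python
import math

DECIMALS = 4  # 4 decimal places in both longitude and latitude
steps_per_degree = 10 ** DECIMALS

# Longitude spans 360 degrees and latitude 180 degrees in WGS 84.
longitude_values = 360 * steps_per_degree
latitude_values = 180 * steps_per_degree
possible_requests = longitude_values * latitude_values
print(f"{possible_requests:.2e}")  # 6.48e+12

# Length of one 0.0001 degree step along a meridian, on a sphere of radius 6371 km.
step_m = math.radians(1 / steps_per_degree) * 6_371_000
print(round(step_m, 1))  # 11.1
```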

As there are only 646 stations served by the Belgian railway company, the full list of stations easily fits into one information resource identified by one URL. When the server does not expose the functionality to filter the stations by geolocation, all user agents that want to answer any question based on the location of stations have to fetch the same resource. This puts the server at ease, as it only needs to prepare the document once each time the list of stations is updated. Even though all 646 stations now have to be transferred to the user agent, consuming significantly more bandwidth, the user agent can benefit as well. For example, when a similar question is asked soon after, the dataset will already be present in the client’s cache, and no data at all needs to be transferred.
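A minimal sketch of the server side, written as a plain Python function rather than for any particular web framework: the document is serialized once per update, and an ETag lets user agents revalidate their cached copy with If-None-Match and receive a 304 Not Modified without a body. The station data, the handle_get_stations name and the one-day max-age are illustrative assumptions.

```python
import hashlib
import json

# Hypothetical station list; in practice this would hold all 646 stations.
stations = [
    {"name": "Station A", "longitude": 3.2167, "latitude": 51.1972},
    {"name": "Station B", "longitude": 3.7107, "latitude": 51.0357},
]


def build_document(stations):
    """Serialize the full list once (per update) and derive an ETag from its content."""
    body = json.dumps(stations, sort_keys=True).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return body, etag


BODY, ETAG = build_document(stations)


def handle_get_stations(if_none_match=None):
    """Every user agent gets the same resource; nothing is computed per location."""
    headers = {"ETag": ETAG, "Cache-Control": "public, max-age=86400"}
    if if_none_match == ETAG:
        return 304, headers, b""  # the cached copy is still valid: send no body
    headers["Content-Type"] = "application/json"
    return 200, headers, BODY


status, headers, body = handle_get_stations()
print(status, len(json.loads(body)))  # 200 2
status, _, body = handle_get_stations(if_none_match=headers["ETag"])
print(status, len(body))  # 304 0
```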

When the number of end-users increases by a factor of a thousand per second, which is not uncommon on the Web, it remains easy for the server to keep delivering the same file to those user agents that do not have it in cache already. When it is not in the cache of the user agent itself, it might already be in an intermediate cache on the Web or in the server’s cache, so the server does not have to invest CPU time per user. Caching has the potential to eliminate some network interactions and server load. When exploited, it yields better network efficiency, scalability, and user-perceived performance. In this case, it is thus better to do a bit less on the server instead of exposing all possible features.
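The effect of an intermediate cache can be simulated with a toy shared cache that honours max-age. This is an illustrative model, not a real proxy, and the URLs are hypothetical: 1000 user agents asking for the same stations document within one second cause a single request at the origin, whereas 1000 distinct nearestStation URLs each reach it.

```python
class SharedCache:
    """Toy intermediary cache (think proxy or CDN) that honours max-age."""

    def __init__(self, origin, max_age_s):
        self.origin = origin
        self.max_age_s = max_age_s
        self.entries = {}  # url -> (time stored, body)

    def get(self, url, now):
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.max_age_s:
            return entry[1]  # fresh hit: the origin server does no work at all
        body = self.origin(url)  # miss or stale entry: forward to the origin
        self.entries[url] = (now, body)
        return body


origin_calls = []  # one entry per request that actually reaches the server


def origin(url):
    origin_calls.append(url)
    return b"..."  # placeholder response body


cache = SharedCache(origin, max_age_s=60)

# 1000 user agents fetch the same stations document within one second.
for user in range(1000):
    cache.get("https://example.org/stations", now=user / 1000)
print(len(origin_calls))  # 1

# 1000 user agents each ask nearestStation for their own location.
for user in range(1000):
    url = f"https://example.org/nearestStation?longitude=3.{user:04d}&latitude=51.315"
    cache.get(url, now=1.0)
print(len(origin_calls))  # 1001
```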

See the Pen BQrJGv by Pieter Colpaert (@pietercolpaert) on CodePen.

Your browser is smart enough to calculate the closest station itself, instead of having to send your current location to a remote server.
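The CodePen demo runs in JavaScript; the following Python sketch is not taken from that pen, but shows the same idea: once the single stations document is on the client, several different questions can be answered locally from it, and the user’s location never leaves the device. The sample stations are hypothetical, and the haversine formula assumes a spherical Earth.

```python
import json
from math import asin, cos, radians, sin, sqrt

# The single, cacheable document with all stations (hypothetical sample data).
STATIONS_JSON = """[
  {"name": "Station A", "longitude": 3.2167, "latitude": 51.1972},
  {"name": "Station B", "longitude": 3.7107, "latitude": 51.0357},
  {"name": "Station C", "longitude": 4.4211, "latitude": 51.2172}
]"""


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres on a sphere with the mean Earth radius."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))


def nearest_station(stations, lon, lat):
    """Computed on the client from the full list."""
    return min(stations,
               key=lambda s: haversine_m(lon, lat, s["longitude"], s["latitude"]))


def stations_within(stations, lon, lat, radius_m):
    """A different question, answered from the very same document."""
    return [s["name"] for s in stations
            if haversine_m(lon, lat, s["longitude"], s["latitude"]) <= radius_m]


stations = json.loads(STATIONS_JSON)  # fetched once, then served from cache
print(nearest_station(stations, 3.14159, 51.315)["name"])  # Station A
print(stations_within(stations, 3.14159, 51.315, 60_000))  # ['Station A', 'Station B']
```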

Raw data sharing

Only one option is left in our questionnaire: raw data sharing. 48% of Twitter respondents (the largest share) say this is the way to go, and I agree. This way of publishing data affords the creation of apps that go beyond what a government itself can create:

Google Maps (4.3/5) – installed on every Android device and probably the most frequently used app in Belgium at this moment.
Citymapper (4.5/5) – An application that now also provides folding bike routes in its beta version!
Maps.me (4.5/5) – A newcomer on the app market. Provides routes by car, bike, public transport and on foot.
Bike Citizens (3.9/5) – A route planner for cycling in cities.
GO OV – A route assistant for people with a mental disability.
Komoot (4.4/5) – A route planner for people who love hiking.
And many more...

A set of applications that need access to raw data before being able to work in a certain region. The variety of these apps is evidence of the long tail of needs within route planning applications.

In August 2015, during open Summer of code 2015, iRail generated a first static timetable data dump in GTFS. This allowed Citymapper to become the first intermodal route planner in Belgium to include the Belgian railway company’s data.

Walking to Hasselt station with @Citymapper. The first time I have a 100% realtime intermodal exp in Belgium @iRail pic.twitter.com/Lh6Wk1jANI
— Pieter Colpaert (@pietercolpaert) 2015, August 1

Raw data sharing also allows third parties to create very specific APIs, which they can provide as a service to application developers who do not want to go through the hassle of setting up an API themselves:

The iRail API – A free service offered by the non-profit project iRail for hobbyists in Belgium to create new kinds of interfaces. A crowdfunding project called Spitsgids was recently fully funded to predict how crowded your train will be.
Navitia.io – A freemium API which tries to get its service to work with all data published world-wide.
Plannerstack – A platform-as-a-service provider for on-demand route planning services.
Digitransit – A consultancy/development firm for beautiful intermodal route planning user interfaces and back-ends.

Services and tools for other developers to start working with transport data

None of the examples in this section would have been possible without the raw data being available.

Data is only going to be reused when the benefits for a third party outweigh the costs, and datasets are only going to be published when the costs for data publishers remain reasonable. We can keep data publishing cost-efficient, as long as we make sure the documents we publish are cacheable. Today, it is still expensive for a data reuser to download a GTFS timetable file, extract it, deploy it on a route planner, and then also develop a front end. Therefore, starting to reuse this kind of data needs to become much cheaper. As an alternative to GTFS, the same ideas as in the section about scaling data reuse can be applied to route planning:
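As an illustration of only the first step of that pipeline, the sketch below reads stops.txt from a GTFS feed with Python’s standard library. The stop_id, stop_name, stop_lat and stop_lon fields are defined by the GTFS reference; the in-memory feed is hypothetical so that the example runs on its own. The expensive part, deploying a route planner on top of the full timetable, is not shown.

```python
import csv
import io
import zipfile


def read_stops(gtfs_zip):
    """Yield (stop_id, stop_name, lat, lon) from the stops.txt of a GTFS feed."""
    with zipfile.ZipFile(gtfs_zip) as feed:
        with feed.open("stops.txt") as raw:
            # utf-8-sig tolerates a byte order mark; newline="" is what csv expects.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                yield (row["stop_id"], row["stop_name"],
                       float(row["stop_lat"]), float(row["stop_lon"]))


# Build a tiny, hypothetical feed in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as feed:
    feed.writestr("stops.txt",
                  "stop_id,stop_name,stop_lat,stop_lon\nS1,Station A,51.1972,3.2167\n")
buffer.seek(0)

stops = list(read_stops(buffer))
print(stops)  # [('S1', 'Station A', 51.1972, 3.2167)]
```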

Check out this demo on CodePen

The idea of resource-oriented data publishing can also be applied to route planning. With the Linked Connections framework, you can, for example, easily create isochrone maps with very specific parameters by editing the JavaScript of this example. Select a pin to create an isochrone from that location.
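In the Linked Connections approach, the client fetches pages of connections ordered by departure time and runs the route planning algorithm itself. A minimal sketch of that idea is an earliest-arrival Connection Scan over such a list, from which an isochrone (all stops reachable within a time budget) follows directly. The Connection fields, the times in minutes after midnight and the sample data are simplifying assumptions rather than the Linked Connections vocabulary, and transfer times and footpaths are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One vehicle hop between two stops; times in minutes after midnight."""
    departure_stop: str
    departure_time: int
    arrival_stop: str
    arrival_time: int


def earliest_arrivals(connections, source, start_time):
    """Earliest-arrival Connection Scan over connections sorted by departure time."""
    arrival = {source: start_time}
    for c in connections:  # a single pass, in departure-time order
        if arrival.get(c.departure_stop, float("inf")) <= c.departure_time:
            if c.arrival_time < arrival.get(c.arrival_stop, float("inf")):
                arrival[c.arrival_stop] = c.arrival_time
    return arrival


def isochrone(connections, source, start_time, budget_min):
    """Stops reachable within the time budget, mapped to their travel time."""
    return {stop: t - start_time
            for stop, t in earliest_arrivals(connections, source, start_time).items()
            if t - start_time <= budget_min}


# Hypothetical connections, already sorted by departure time.
connections = [
    Connection("A", 480, "B", 495),
    Connection("A", 490, "D", 560),
    Connection("B", 500, "C", 530),
    Connection("C", 535, "E", 600),
]
print(isochrone(connections, "A", 475, budget_min=60))  # {'A': 0, 'B': 20, 'C': 55}
```

Tweaking the algorithm for a specific end-user, for instance by skipping connections from stops that are not wheelchair accessible, only requires changing the condition inside the scan.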

We should not overestimate the potential of this market either: many of the apps are available for free, and the willingness of end-users to pay is low. For example, the only reason why Bike Citizens gets a lower score on the app store is that users need to pay to unlock new cities. However, where money is to be found, I believe, is in niches. Publishing data in this resource-oriented way not only lowers the cost of publishing the data, it also lowers the cost of starting to work with the data and of tweaking route planning algorithms for specific needs. You can take this example and tweak it in JavaScript to calculate an isochrone map for your specific end-users.

In conclusion: let’s publish data!

It would be wrong to think of the sequence “data publishing, API, app” as an indication of how good a job a government is doing. Each step fulfills a different need. I would pick data publishing as the most important investment to be made today, as some datasets can only be delivered by governmental organizations or by public transport authorities. It forms the basis for potential APIs, and ultimately for the final applications.

My PhD at @iMinds @ResearchUGent explained in 3 minutes at @TEDxGhent https://t.co/mqbOW3kyVL #opentransport
— Pieter Colpaert (@pietercolpaert) 2014, September 3

In 2014, a talk at TEDxGhent also advocated for data publishing instead of everyone creating apps.