Is There a Difference Between Open Data and Public Data?

Yep. And it’s a big one.

There is a general consensus that when we talk about open data we are referring to any piece of data or content that is free to access, use, reuse, and redistribute. Due to the way most governments have rolled out their open data portals, however, it would be easy to assume that the data available on these sites is the only data that’s available for public consumption. This isn’t true.

Although data sets that carry a governmental stamp of openness get far more publicity, they represent only a fraction of the public data that exists on the web.

So what’s the difference between “public” data and “open” data?

What is open data?

Generally speaking, “open data” is the information that has been published on government-sanctioned portals. In the best case, this data is structured, machine-readable, open-licensed, and well maintained.

What is public data?

Public data is the data that exists everywhere else. This is information that’s freely available on the web but, in practice, hard to find and use. It is frequently unstructured and unruly, and its usage requirements are often vague.

"Only 10% of government data is published as open data"

What does this mean?

Well, for starters, it means that there’s a discrepancy between the open data that exists in government portals and public data in general. This is an important distinction to make: while there’s a lot of excitement surrounding open data initiatives and their potential to transform modern society, the open data that this premise rests on is only a fraction of what’s needed for that potential to be realized.

The fact is this: the majority of useful government data is either still proprietary or stowed away in a filing cabinet somewhere, and the stuff that is available is being released haphazardly.

Does it matter?

Does it actually matter that there’s a distinction between these two kinds of data? Well… yes.

Open data, because it represents such a small portion of what’s available, hasn’t lived up to its potential. People like me, who have very high hopes for the open data movement, haven’t yet seen the return on investment (economic or social) that we were promised. The reasons are manifold, but this distinction is part of the problem.

In order for open data to be as effective as predicted, the line that demarcates open and public data needs to be erased, and governments need to start making a lot more of their public information open data. After all, we’re the ones paying for it.