Zemblanity in Data Science

So what is the opposite of Serendip, a southern land of spice and warmth, lush greenery and hummingbirds, seawashed, sunbasted? Think of another world in the far north, barren, icebound, cold, a world of flint and stone. Call it Zembla. Ergo: zemblanity, the opposite of serendipity, the faculty of making unhappy, unlucky and expected discoveries by design. – Armadillo, by William Boyd, 1998.

Yes, the modern landscape of data science is all sugar and spice, Serendip, if you believe all those blog posts and the overall internet hype. The allure of the serendipitous discovery is irresistible to too many. But what does it actually mean? Can we call a Netflix recommendation for “Friends” a serendipitous discovery when you were instead looking for something more like “Stranger Things”?

Serendipitous discovery requires sagacity, the wisdom to recognise the unexpected and follow it. People are not always capable of drawing that connection. Anyway, how many unexpected things can one find in a customer database? We will always be looking for the relationships between different variables. They might be unexpected, yes, but they were not found by accident. It’s inference from data, not serendipity. There is a term for this loosening of a word’s meaning: semantic bleaching.

Zemblanity, in contrast, is the unavoidable discovery of what we don’t want to know, for example finding out that your grandma has died. I know, how morbid… The manifestation of zemblanity in our new data-driven reality is much more sinister. This unpleasant discovery is made not by a data scientist but by us, who are both the source of the data and the customers, once it has been analysed. Small data collected from individuals through internet-enabled devices is only one of the exhausts fuelling big data, but its extraction and analysis are barely regulated.

Because there is little or no regulation, the monetisation of free behavioural data is a revenue model for startups and big established companies alike. We, as providers of the data, have no knowledge of how it is stored, who accesses it, or how that access is granted. Often this data carries personally identifiable information (PII), which means that companies own our digital proxies, which help them predict our future behaviour.

For the past few centuries the dominant economic system has been capitalism. The way wealth is accumulated defines the type of capitalism. Thus we had mass-production corporate capitalism, which later morphed into financial capitalism. The Economist announced in the middle of 2017 that the world’s most valuable resource is no longer oil, but data. Surely, capitalism is taking us on another ride, and it is there that we find zemblanity.

This new form of capitalism, which some call surveillance capitalism, is a place where the world is reborn as data. Don’t get me wrong, no technology is inherently evil, but this is socially and legally undefended territory. If unchallenged, surveillance capitalism is a threat to democracy. It thrives on public ignorance, and we as data scientists have a big part to play in this system. To learn more about surveillance capitalism and the Big Other, I encourage you to read this paper by Shoshana Zuboff.

So, which one will it be?