Lessons from history on the dangers of blind trust in data - Financial Times
The Great Hanoi Rat Massacre of 1902 is a classic reminder of why we need to be wary about what data we measure and reward.
The French colonial administrators of the time, alarmed by the spread of rodents through the city's sewers, offered local ratcatchers a bounty for each animal killed: the municipal government paid one cent for every rat tail handed over as proof of elimination.
Initially, the data looked promising, but the plan soon went awry. Crafty Vietnamese entrepreneurs simply chopped the tails off living rats and set up rodent farms to boost their income. Bubonic plague broke out in Hanoi a few years later.
The data we all generate on our smartphones today may seem far removed from colonial statistics on Hanoi’s rat infestation. But the dangers of misinterpreting the data we produce remain the same. Correlations are sometimes spurious. Incentives will invariably be gamed. Stripped of context, data can be, and often is, misleading.
Some of those lessons are being relearned today by the Chinese technology giant Alibaba as it experiments with its Sesame Credit scoring system. By capturing a vast array of data about its hundreds of millions of users, Alibaba had hoped to create a robust measure of consumer trustworthiness. That Sesame score, based on everything from online marketplace purchase records to subway fares, could then be used to extend or deny consumer financing.
But as the Financial Times has reported, there is a significant difference between Big Data and strong data — and Alibaba has not yet used its Sesame scores to make loans. As Dai Xin, professor of law at Ocean University of China, told the FT, it is difficult to build reliable predictive models across different contexts. “Will a plagiarising student also commit fraud? Will a company that fails to pay a debt also renege on a building contract?” he asked.
One American tech executive I spoke with recently explained that algorithms were expressly designed to discriminate, to sort people into different categories. But that meant we should be incredibly careful in understanding exactly what data were being included and excluded in any given model and what inferences could be reasonably drawn. Otherwise, algorithmic discrimination risked becoming the “carbon monoxide of Big Data”, as he put it, colourless, odourless and potentially lethal. It was only when data were appropriately “oxygenated” with context that they became safe.
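His point is easy to demonstrate. The following toy sketch in Python (all data, numbers and feature names are invented for illustration; nothing here comes from the FT article or from Sesame Credit) shows how a scoring model can place a large, confident weight on a feature whose predictive power comes entirely from an unmeasured confounder:

    # Toy illustration: a scoring model latches onto a spurious signal.
    # Every feature and figure here is invented for the example.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000

    # Hidden confounder the model never sees: living near a subway line.
    near_subway = rng.random(n) < 0.5

    # Observed feature: regularly paying subway fares, which tracks the
    # confounder rather than anything about a person's character.
    pays_fares = np.where(near_subway, rng.random(n) < 0.9, rng.random(n) < 0.1)

    # Outcome: in the training city, loan repayment also happens to track
    # subway access (say, via stable employment near transit).
    repays = np.where(near_subway, rng.random(n) < 0.8, rng.random(n) < 0.4)

    X = pays_fares.reshape(-1, 1).astype(float)
    model = LogisticRegression().fit(X, repays.astype(int))
    print("learned weight on fare payment:", model.coef_[0, 0])

    # The weight comes out large and positive. Yet in a city where everyone
    # lives near the subway, fare payment would predict nothing at all:
    # stripped of context, the number merely looks like trustworthiness.

The mechanics are trivial; the inference is the trap. Unless you know which data were included, which were excluded and what confounders lurk behind them, the model's confident output is exactly the colourless, odourless gas the executive described.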
The broader risks of quantifying so many aspects of our lives are ominously sketched out by Steffen Mau in his forthcoming book, The Metric Society. Our obsession with measuring everything, from school grades and personal looks to behavioural habits and popularity, is creating a new social order of worth, a "conform and perform" culture, a world of "credible fictions". Statistics do not just reflect the existing world; they construct a new reality. Data are not just being used to inform society but to form it.
The professor of macrosociology at Berlin’s Humboldt University argues that this obsession with quantitative evaluation risks replacing material inequalities with numerical inequalities. Clashes between classes will be superseded by competition between individuals — think of Uber drivers scrambling for higher ratings. “Numbers describe, create and reproduce status,” he writes. “Numbers maketh the man.”
Who decides which numbers to collect, and who determines what those numbers signify, therefore become questions of power. Yet the methodologies that organisations (be they international agencies, government institutions or global tech companies) use to make those decisions are subject to little, if any, scrutiny. That matters when algorithms increasingly determine what grades students achieve at school, which job offers applicants receive, and whether or not prisoners are granted parole.
One response is to try to subvert tracking technologies, encouraging individuals to create their own data stories to monitor and challenge the power of those in authority. This could lead to a culture of sousveillance rather than surveillance, of undersight rather than oversight. The way in which the data-driven global environmental movement has changed the debate about climate change is one encouraging example.
Alternatively, some institutions may simply stop playing the quantitative game, as Ghent University appears determined to do. Earlier this month, the Belgian university announced that it would downplay the competitive, bureaucratically determined metrics of publications and citations that have driven its funding decisions. Instead, declared the university's rector, Rik Van de Walle, it would foster a more collaborative culture between research groups and faculties, nurtured by the academics themselves.
“Ghent University is deliberately choosing to step out of the rat race between individuals, departments and universities. We no longer wish to participate in the ranking of people,” he wrote. “A university is above all a place where everything can be questioned.”
That seems like a good place to start.
https://on.ft.com/2EXXwPC