Data for dissertations – October 17, 2017

36371 The Attack on America and Civil Liberties Trade-Offs: A Three-Wave National Panel Survey, 2001-2004 http://doi.org/10.3886/ICPSR36371.v1

36622 Johns Hopkins University Prevention Research Center – Risks for Transitions in Drug Use Among Urban Adults, Baltimore City, 2008-2011 http://doi.org/10.3886/ICPSR36622.v1

36652 Afrobarometer Round 6: The Quality of Democracy and Governance in Burkina Faso, 2015 http://doi.org/10.3886/ICPSR36652.v1

36662 Eurobarometer 82.2: Quality of Transport, Cyber Security, Value Added Tax, and Public Health, October 2014 http://doi.org/10.3886/ICPSR36662.v1

36666 Eurobarometer 83.2: Perception of Security, Civil Protection, and Humanitarian Aid, March 2015 http://doi.org/10.3886/ICPSR36666.v1

The speed of information and the instability of public affairs

In 2013, it was estimated that 90% of the world’s data had been created in the previous two years. I can only imagine how much faster information is being created these days. 

Most decision makers, though, process information just as they did 20 or 50 years ago – on paper, in bite-sized pieces. Some might even claim that today’s decision makers, at least in politics, are less capable of processing complex, high-dimensional information than their predecessors were. 

What’s the impact of this imbalance? It’s easy to speculate, but I think there’s an argument to be made that one key outcome will be instability. 

As new data arrive at speed, the volume of existing information grows, and with it the difficulty of making comparisons. The data become more multidimensional, and the aggregation (or dimension reduction) problem gets harder. 
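To make that aggregation problem concrete, here is a minimal sketch in Python (my own toy, using simulated indicators and scikit-learn’s PCA, not any of the datasets listed above): fifty noisy indicators driven by three underlying factors are collapsed into three summary dimensions, and the summary is only as good as the factor structure that happens to hold at the moment.

# Purely illustrative: collapsing many noisy indicators into a few
# summary dimensions, i.e., a toy version of the aggregation problem.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulate 500 observations of 50 noisy indicators that are really
# driven by only 3 underlying factors.
n_obs, n_indicators, n_factors = 500, 50, 3
factors = rng.normal(size=(n_obs, n_factors))
loadings = rng.normal(size=(n_factors, n_indicators))
indicators = factors @ loadings + rng.normal(scale=0.5, size=(n_obs, n_indicators))

# Reduce the 50 indicators to 3 summary dimensions.
pca = PCA(n_components=3)
summary = pca.fit_transform(indicators)

print("share of variance kept by the 3-component summary:",
      round(float(pca.explained_variance_ratio_.sum()), 3))

The point of the sketch is not the method; it is that any low-dimensional summary rests on an assumed structure, and more data arriving faster gives that structure more chances to shift.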

We can hope that machine learning and data-mining technologies, probably fueled by AI, will save us. But I’m skeptical. Instead, I think it’s likely that instability increases. And that the demand for low-dimensional “rules of thumb” increases. And that the probability of failure of those rules also increases – if only because high-speed data means the world is changing quickly. 

Maybe I’m wrong. Comments are closed, but feel free to correct my thinking on Twitter at @abwhitford. 

The problem of ephemeral data

You’ve collected data, analyzed/tortured them, written the paper, chosen a journal, and then submitted the paper for consideration. After a while, you hear back that the reviewers didn’t see enough promise to move forward, so the editor has rejected it. You change some things, rinse, and repeat. Maybe it happens again. That’s to be expected in a world where many journals have acceptance rates below 10 percent. 

Finally, at some point, you receive an “R&R” – an opportunity to revise and resubmit your paper for further consideration. But the reviewers complain that the data are no longer “fresh” and require “updating.” What should you do?

Of course, most of us do whatever it takes and whatever is possible to close the R&R. The goal is to publish, so it’s natural to jump through the hoops.

The problem of ephemeral data, though, is a philosophical one. If the data are truly “stale”, how does freshening them improve inference? 

If the world is changing that quickly, won’t even fresh data be outdated by the time the paper survives review, is accepted, is typeset, is processed, and finally is printed in the journal several years later? 

And won’t the data be stale again when people notice the paper several years after that, while assembling reading lists for their own papers, their syllabi, or their students?

There’s no natural solution to this problem. Because researchers and practitioners are intertwined – one studies the other, and the other changes its behavior in response to the research – it is inevitable that data are ephemeral. The social “data generating process” is a moving target, and researchers are themselves embedded mechanisms.
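To illustrate that moving target, here is a small, purely illustrative simulation (again my own toy, not an analysis of real data): the slope linking two variables drifts a little each year, so an estimate fitted on 2010 data is systematically off by 2017, and an estimate “freshened” today will drift out of date on the same clock.

# Purely illustrative: a data generating process whose slope drifts over
# time, so estimates from "stale" data miss the current relationship.
import numpy as np

rng = np.random.default_rng(1)

def draw_sample(slope, n=200):
    """Draw one cross-section from a DGP with the given slope."""
    x = rng.normal(size=n)
    y = slope * x + rng.normal(scale=1.0, size=n)
    return x, y

# The "true" slope drifts a little each year.
slopes = {year: 1.0 + 0.15 * (year - 2010) for year in range(2010, 2018)}

# Estimate the slope from 2010 data, then compare it with the 2017 truth.
x_old, y_old = draw_sample(slopes[2010])
estimate_2010 = np.polyfit(x_old, y_old, 1)[0]

print("slope estimated on 2010 data:", round(float(estimate_2010), 2))
print("true slope in 2017:          ", round(slopes[2017], 2))

In this toy world, refitting on 2017 data would help for a while, but only until the slope moves again; “freshening” buys time, not validity.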

Perhaps I have this wrong, though. Comments are closed, but please feel free to correct my thinking on Twitter at @abwhitford.