The Information

What is information? Is it inseparably connected to the human condition? How is the exponentially growing flow of information affecting us as people, our societies, our democracies? When The Economist talks about a post-truth society, how much of that trend is related to the failure of fact-checking, the increasing polarization and fragmentation of the media, and the distrust of ‘experts’?  The Information opens with a reference to Borges’ Library of Babel:

The Library of Babel contains all books, in all languages.  Yet no knowledge can be discovered here, precisely because all knowledge is there, shelved side by side with all falsehood.  In the mirrored galleries, on the countless shelves, can be found everything and nothing.  There can be no more perfect case of information glut. We make our own storehouses.  The persistence of information, the difficulty of forgetting, so characteristic of our time, accretes confusion. (p. 373)

In The Information, James Gleick takes the reader on a historical world tour to trace the origins of our ‘Information Society’, an old term that keeps being reinvented. It’s a sweeping and monumental tour that takes us from African talking drums through alphabets, the beginnings of science, mathematical codes, data and electronics to the spooky world of quantum physics.  He shows how information has always been central to who we are as humans. He points to foreshadowings of the current information age, such as the origin of the word “network” in the 19th century and how “computers” were people before they were machines.

The core figure in the book is Claude Shannon. In 1948 he invented information theory by making a mathematical theory out of something that doesn’t seem mathematical. He was the first to use the word ‘bit’ as a measure of information. Until then, nobody would have thought to measure information in units, like meters or kilograms. He showed that all human creations, such as words, music and visual images, are related in ways that can be captured by bits. It’s amazing that this unifying idea of information, which has transformed our societies, was conceptualized less than 70 years ago.

“It’s Shannon whose fingerprints are on every electronic device we own, every computer screen we gaze into, every means of digital communication. He’s one of these people who so transform the world that, after the transformation, the old world is forgotten.” That old world, Gleick said, treated information as “vague and unimportant,” as something to be relegated to “an information desk at the library.” The new world, Shannon’s world, exalted information; information was everywhere. (New Yorker)
At its most fundamental, information is a binary choice.  A bit of information is one yes-or-no choice. This is a very powerful concept that has made much of modern technology possible. By this technical definition, all information has a definite quantity, regardless of the content of the message.  A message might take 1,000 bits and contain complete nonsense. This shows how information is at once empowering and desiccating. Information is everywhere, but as a result we find it increasingly hard to extract meaning.  Has the easy accessibility of ‘facts’ diminished the value we assign to them?
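Shannon’s measure can be sketched in a few lines of Python. The function below is my illustration, not from the book: it counts the bits needed to encode a message from its symbol frequencies, and is deliberately blind to meaning — a sentence and its scrambled letters need exactly the same number of bits.

```python
import math
from collections import Counter

def shannon_bits(message: str) -> float:
    """Bits needed to encode a message from its symbol frequencies:
    Shannon's entropy H = -sum(p * log2(p)), times the message length."""
    counts = Counter(message)
    n = len(message)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy * n

# The measure ignores meaning: shuffling the letters changes nothing.
sentence = "information is a binary choice"
shuffled = "".join(sorted(sentence))
print(math.isclose(shannon_bits(sentence), shannon_bits(shuffled)))  # True
```

A single fair coin flip is the unit itself: a message of two equiprobable symbols carries one bit per symbol.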
Despite the progress in producing and storing information, our human capacity to filter and process it has not changed. Gleick gives the example of his own writing process:
The tools at my disposal now compared to just 10 years ago are extraordinary. A sentence that once might have required a day of library work now might require no more than a few minutes on the Internet. That is a good thing. Information is everywhere, and facts are astoundingly accessible. But it’s also a challenge because authors today must pay more attention than ever to where we add value. And I can tell you this, the value we add is not in the few minutes of work it takes to dig up some factoid, because any reader can now dig up the same factoid in the same few minutes.
It’s interesting because this feeling of the precariousness of information is everywhere. We think information is so fragile, that if we don’t grab it and store it someplace, we’ll forget it and we’ll never have it again. The reality is that information is more persistent and robust now than it’s ever been in human history. Our ancestors, far more than us, needed to worry about how fragile information was and how easily it could vanish. When the library of Alexandria burned, most of the plays of Sophocles were lost, never to be seen again. Now, we preserve knowledge with an almost infinite ability.
Redundancy is a key characteristic of natural information networks. As Taleb taught us, decentralized networks are much more resilient than centralized structures.  Every natural language has redundancy built in. This is why people can understand text riddled with errors or missing letters and why they can understand conversation in a noisy room.  The best example of a natural information network may be life’s genetic make-up:
“DNA is the quintessential information molecule, the most advanced message processor at the cellular level—an alphabet and a code, 6 billion bits to form a human being.” “When the genetic code was solved, in the early 1960s, it turned out to be full of redundancy. Some codons are redundant; some actually serve as start signals and stop signals. The redundancy serves exactly the purpose that an information theorist would expect. It provides tolerance for errors.”
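The error tolerance that redundancy buys can be illustrated with the simplest possible scheme, a triple-repetition code (my sketch, not Gleick’s): send every bit three times, and a majority vote at the receiving end absorbs a single corrupted copy, just as a misspelled word is still readable in context.

```python
def encode(bits):
    """Triple-repetition code: build in redundancy by sending each bit three times."""
    return [b for b in bits for _ in range(3)]

def decode(coded):
    """Majority vote over each triple recovers the original bit
    even if one of the three copies was corrupted in transit."""
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

message = [1, 0, 1, 1]
coded = encode(message)
coded[4] ^= 1                      # noise flips one copy of the second bit
print(decode(coded) == message)    # True: the error is absorbed by redundancy
```

The price of this tolerance is the same one natural languages pay: the message gets longer than its bare information content requires.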
Technological innovation has always sparked anxiety. Gleick quotes Plato’s Socrates, who warned that the invention of writing “will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory.” (p. 30) McLuhan recognized the dawn of the information age in 1962.  He predicted the confusions and indecisions the new era would bring and wrote about a ‘global knowing’.  Thirty years earlier, H.G. Wells had written about a World Brain, a widespread world intelligence taking the form of a network.  Wells saw this network as a gigantic decentralized encyclopedia, managed by a small group of ‘people of authority’, that would rule in a ‘post-democratic’ world order.
Gleick writes that we’re still only at the start of the Information Age. Some effects on us and on our societies will only become apparent in the coming decades. Will the internet continue to evolve into a world brain, or will it splinter into various parts? Will the atomisation of our media into countless echo chambers continue, and what kind of society will it lead us into?
The library will endure; it is the universe. As for us, everything has not been written; we are not turning into phantoms. We walk the corridors, searching the shelves and rearranging them, looking for lines of meaning amid leagues of cacophony and incoherence, reading the history of the past and of the future, collecting our thoughts and collecting the thoughts of others, and every so often glimpsing mirrors, in which we recognize creatures of the information. (p.426)

Understandings and Misunderstandings about RCTs

Policy makers and the media have shown a remarkable preference for Randomized Controlled Trials (RCTs) in recent times. After their breakthrough in medicine, they are increasingly hailed as a way to bring the human sciences into the realm of ‘evidence’-based policy. RCTs are believed to be accurate, objective and independent of the expert knowledge that is so widely distrusted these days. Policy makers are attracted by the seemingly ideology-free and theory-free focus on ‘what works’ in the RCT discourse.

Part of the appeal of RCTs lies in their simplicity.  Trials are easily explained along the lines that random selection generates two otherwise identical groups, one treated and one not. All we need is to compare two averages.  Unlike other methods, RCTs don’t require specialized understanding of the subject matter or prior knowledge. As such, it seems a truly general tool that works in the same way in agriculture, medicine, economics and education.

Deaton cautions against this view of RCTs as a magic bullet for social research. In a lengthy but very readable NBER paper he outlines a range of misunderstandings about RCTs. These fall broadly into two categories: problems with the running of RCTs and problems with the interpretation of their results.

Firstly, RCTs require minimal assumptions, prior knowledge or insight into the context. They are non-parametric: no information is needed about the underlying nature of the data (no assumptions about covariates, heterogeneous treatment effects or the shape of the statistical distributions of the variables).  A crucial disadvantage of this simplicity is reduced precision, because no prior knowledge or theory can be used to design a more refined research hypothesis.  Precision is not the same as unbiasedness.  In RCTs, treatment and control groups come from the same underlying distribution. Randomization guarantees that the net average balance of other causes (the error term) is zero, but only when the RCT is repeated many times on the same population (which is rarely done). I hadn’t realized this before, and it’s almost never mentioned in reports.  But it makes sense: in any one trial, the difference in means equals the average treatment effect plus a term that reflects the imbalance in the net effects of the other causes. We do not know the size of this error term, and there is nothing in the randomization that limits it.
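This point about the error term is easy to check with a small simulation (my sketch, not from the paper): with a true treatment effect of zero, any single randomized trial still shows a nonzero difference in means; only the average over many repeated randomizations approaches zero.

```python
import random
import statistics

random.seed(1)

def one_trial(population, n=50):
    """Randomize 2n subjects into treatment and control arms.
    The true treatment effect is zero, so any difference in means
    is pure chance imbalance in the other causes."""
    sample = random.sample(population, 2 * n)
    treated, control = sample[:n], sample[n:]
    return statistics.mean(treated) - statistics.mean(control)

population = [random.gauss(0, 10) for _ in range(10_000)]
single = one_trial(population)     # one trial: typically well away from zero
repeated = statistics.mean(one_trial(population) for _ in range(2_000))
print(round(single, 2), round(repeated, 2))   # only the repeated average is near 0
```

Nothing in the design bounds the size of the single-trial imbalance; the guarantee is a property of the average over hypothetical repetitions, not of the one trial actually run.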

RCTs are based on the fact that the difference in two means is the mean of the individual differences, i.e. the treatment effects.  This is not valid for medians. This focus on the mean makes them sensitive to outliers in the data and to asymmetrical distributions. Deaton shows how an RCT can yield completely different results depending on whether an outlier falls in the treatment or control group.  Many treatment effects are asymmetric, especially when money or health is involved. In a micro-financing scheme, a few talented, but credit-constrained entrepreneurs may experience a large and positive effect, while there is no effect for the majority of borrowers. Similarly, a health intervention may have no effect on the majority, but a large effect on a small group of people.
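A toy version of the micro-finance example (my numbers, not Deaton’s) shows the outlier sensitivity: with one star entrepreneur among otherwise unaffected borrowers, the estimated effect flips sign depending on which arm she happens to land in.

```python
import statistics

# Forty borrowers with zero outcome, plus one star entrepreneur (+100).
outcomes = [0.0] * 40
star = 100.0

# The same experiment, differing only in where the outlier lands:
treated_a, control_a = outcomes[:20] + [star], outcomes[20:]   # star treated
treated_b, control_b = outcomes[:20], outcomes[20:] + [star]   # star in control

effect_a = statistics.mean(treated_a) - statistics.mean(control_a)
effect_b = statistics.mean(treated_b) - statistics.mean(control_b)
print(round(effect_a, 2), round(effect_b, 2))   # +4.76 vs -4.76: sign flips
```

A median-based comparison would report zero in both cases, but the difference in medians is not the median of the individual treatment effects, which is why RCTs are tied to the mean in the first place.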

A key argument in favour of randomization is the ability to blind both those receiving the treatment and those administering it.  In social science, however, blinding is rarely possible. Subjects usually know whether they are receiving the treatment, and can react to their assignment in ways that affect the outcome other than through the operation of the treatment itself. This is problematic, and not only because of selection bias: concerns about the placebo, Pygmalion, Hawthorne and John Henry effects are serious.

Deaton recognizes that RCTs have their uses within the social sciences. When combined with other methods, including conceptual and theoretical development, they can contribute to discovering not “what works,” but why things work.

Unless we are prepared to make assumptions, and to stand on what we know, making statements that will be incredible to some, all the credibility of RCTs is for naught.

Secondly, in cases where there is good reason to doubt the good faith of experimenters, as in some pharmaceutical trials, randomization will be the appropriate response. However, ignoring prior knowledge in the field should be resisted as a general prescription for scientific research.  Thirdly, an RCT may disprove a general theoretical proposition by providing a counterexample. Finally, an RCT, by demonstrating causality in some population, can be thought of as a proof of concept that the treatment is capable of working somewhere.

Economists and other social scientists know a great deal, and there are many areas of theory and prior knowledge that are jointly endorsed by large numbers of knowledgeable researchers.  Such information needs to be built on and incorporated into new knowledge, not discarded in the face of aggressive know-nothing ignorance.

The conclusions of RCTs are often wrongly applied to other contexts. RCTs do not have external validity.  Establishing causality does nothing in and of itself to guarantee generalizability; the results are not applicable outside the trial population. That doesn’t mean RCTs are useless in other contexts. We can often learn much from understanding why a replication failed, and use that knowledge to make appropriate use of the original findings by asking how the factors that caused the original result might operate differently in different settings. Generalizability, however, can only be obtained by thinking through the causal chain that generated the RCT result, the underlying structures that support this causal chain, and whether and how that chain might operate in a new setting with different joint distributions of the causal variables. We need to know why, and whether that why will apply elsewhere.

Bertrand Russell’s chicken provides an excellent example of the limitations to straightforward extrapolation from repeated successful replication.

The bird infers, based on multiple repeated evidence, that when the farmer comes in the morning, he feeds her. The inference serves her well until Christmas morning, when he wrings her neck and serves her for Christmas dinner. Of course, our chicken did not base her inference on an RCT. But had we constructed one for her, we would have obtained exactly the same result.

The results of RCTs must be integrated with other knowledge, including the practical wisdom of policy makers, if they are to be usable outside the context in which they were constructed.

Another limitation of RCT results relates to their scalability. As with other research methods, failure of trial results to replicate at a larger scale is likely to be the rule rather than the exception. Using RCT results is not the same as assuming the same result holds in all circumstances.  Giving one child a voucher to go to private school might improve her future, but doing so for everyone can decrease the quality of education for the children who are left in the public schools.

Knowing “what works” in a trial population is of limited value without understanding the political and institutional environment in which it is set. Jean Drèze notes, based on extensive experience in India, “when a foreign agency comes in with its heavy boots and suitcases of dollars to administer a `treatment,’ whether through a local NGO or government or whatever, there is a lot going on other than the treatment.” There is also the suspicion that a treatment that works does so because of the presence of the “treators,” often from abroad, rather than because of the people who will be called to work it in reality. Unfortunately, there are few RCTs which are replicated after the pilot on the scaled-up version of the experiment.

This readable paper from one of the foremost experts in development economics provides a valuable counterweight to the often unnuanced admiration for all things RCT.  In a previous post, I discussed Poor Economics by “randomistas” Duflo and Banerjee. For those who want to know more, there is an excellent debate online between Abhijit Banerjee (J-PAL, MIT) and Angus Deaton on the merits of RCTs.