Rich and I gave a talk two weeks back at the IA Summit about how the ever-increasing amount of data we’re creating makes possible a new type of science: one that looks for patterns and correlations in data and can provide the fuel for the next round of scientific discovery.
This week I opened the new Wired Magazine and found a perfectly relevant and current example of this trend – the Allen Institute for Brain Science and their quest to map out the genetic make-up of the brain. I wish I had come across this a week earlier – it’s complete reinforcement of our thesis:
To achieve this, the Allen Institute reimagined the scientific process. There was no grand hypothesis, or even a semblance of theory. The researchers just wanted the data, and, given the amount needed, it quickly became apparent that the work couldn’t be done by hand. So, shortly after the institute was founded in 2003, Jones and his team started thinking about how to industrialize the experimental process. While modern science remains, for the most part, a field of artisans – scientists performing their own experiments at their own benches – the atlas required a high-throughput model, in which everything would be done on an efficient assembly line. Thanks to a team of new laboratory robots, what would have taken a thousand technicians several years can now be accomplished in less than 20 months. The institute can produce more than a terabyte of data per day. (In comparison, the 3 billion base pairs in the human genome can fit in a text file that’s only 3 gigabytes.) And the project is just getting started.
The scientists at the Allen Institute are producing all of this data to enable easy access to the structure of the brain, specifically to catalog which genes are expressed in which of the brain’s regions. But in a larger sense, they’re producing the data first, without a specific purpose in mind, thinking that by making it available and accessible they are opening up the possibility of future discovery:
They remain excited by the idea of working on the frontier of science, by the possibility that their maps will allow others to make sense of this still inscrutable landscape. In other words, they are waiting for the future, for some scientist to invent an elegant theory that explains their enigmatic data.
One of the focal points of our talk was Tim Berners-Lee’s TED talk, where he expresses his frustration with the current Web – a delivery platform for human-readable documents – and implores all of us to make our raw data available (now!). He wants us all to stop hugging and beautifying our data and instead make it available now, so that others can gain value from it.
Sounds a lot like Jonah Lehrer describing the scientists working at the Allen Institute, when he says “we don’t even know what we don’t know.”
Rachel Lovinger Said:
Hi Tim – Great talk you guys gave at the IA Summit. Are you going to post it anywhere? I want to link to it from my blog post about the conference.
-Rachel
Tim Said:
Thanks, Rachel. Yes, we finally got the audio and will be putting it up shortly. I’ll link to it once it’s up.