As I See It: All Things Big
March 25, 2013 Victor Rozek
Many years ago, 60 Minutes did a segment on the Israeli Air Force. At the time, the United States had just sold the Israelis a handful of our latest jet fighters and 60 Minutes wanted to see how Israeli pilots were using their new toys. Turns out they had made a number of low-tech modifications (some classified), which the Israelis believed would make the planes more user-friendly. For one thing, they installed an inexpensive rear-view mirror so that the pilot could track enemy aircraft without turning his head. The reporter also asked about the vast array of gauges, flashing lights, buzzing indicators, computer readouts, and other assorted messaging systems that take up every available inch of cockpit space. “How do you keep track of it all?” he marveled.
“We don’t,” came the reply. “We shut it all off.”
The Israelis apparently didn’t want to risk having their pilots shot down because they were being distracted by too much data. At supersonic speeds, less is more.
History may record that this was the last time anyone deliberately opted for less data. In the decades since, data gathering has become as obsessive a pastime as football is in Alabama. Governments, corporations, police forces, political parties, anyone with the ability to capture, store, and analyze data is gorging on the stuff. The nation’s obesity problems are not limited to Big Gulps and Big Macs. Information crunchers are grappling with a pervasive form of digital corpulence otherwise known as Big Data.
The modifier “big” when describing data is both modest and inaccurate. Once a manageable trickle, data streams have mutated into something resembling giant cancerous life forms, always growing, ever in need of new hosts. They even require their own linguistic inventions. In relatively short order, we’ve burned through kilobytes, megabytes, gigabytes, terabytes, and petabytes. Big Data is now measured in exabytes, a mind-bending quintillion bytes, or one billion gigabytes, if you prefer. That’s a number commonly associated with grains of sand on the world’s beaches, galactic star counts, or visits by Lindsay Lohan to a rehab facility.
But the language of Big Data is already sprinting ahead of capacity. Zettabytes, yottabytes, brontobytes (do they come with fries?), and geobytes lurk around the corner. And why not? Estimates are that every day 2.5 quintillion bytes of new data are being created. And whether motivated by scientific discovery, paranoia, or greed, someone, somewhere is busy scooping them up and searching through the stack with the urgency of a single guy pawing through an overflowing clothes hamper looking for a usable pair of unds.
Sources of data collection are as ubiquitous as uses. From remote sensing technologies with military, archeological, atmospheric, and oceanographic applications, to street cameras, microphones, RFID readers, wireless sensor networks, anything to do with phones or the Internet, and fleets of drones soon to be crisscrossing the nation, data keeps rolling in with the inexorability of a bore tide.
As the modifier implies, Big Data is collected in service to all things aspiring to be big: Big Science, Big Commerce, Big Media, and Big Brother. It’s the ultimate manifestation of finding the pony in the manure pile. If you’re looking for the Higgs boson, be prepared to sort through a great deal of manure before you find your pony. According to the fount of all wisdom, Wikipedia, experiments conducted at the Large Hadron Collider “represent about 150 million sensors delivering data 40 million times per second,” which creates one hell of a pile. Likewise, genomics, astronomy, and just about any interdisciplinary scientific research generate dust storms of data. As do our personal viewing habits.
Recently Kevin Spacey became the poster boy for Big Data when Netflix crunched viewer preferences and concluded that Spacey plus political intrigue equaled happy paying viewers. So the company invested a ton of money remaking a British series called House of Cards. I was one of those happy paying viewers, but no need to thank me. According to Andrew Leonard writing for Salon, Netflix apparently not only knows what you watch and when and where you watch it, but also what device you’re using, and where you pause, rewind, or fast-forward through the video stream.
But even Big Data has limitations. It can record the fact that an event occurred, but it can’t tell you why that event occurred. Maybe you rewound the video to replay that torrid sex scene a half-dozen times. Maybe you were watching Marlon Brando and he was mumbling so you missed some part of the dialogue. Maybe you got up to make a sandwich. And if you fast-forward, was the movie boring, or were you just looking for the next sex scene? Big Data provides a reliable means to predict effect, but it is really lousy at determining cause. No one, for instance, has ever been able to figure out why vampire movies are so popular.
But if you were so inclined, you could use Big Data to figure out when women you’ve never met were pregnant. Target did. Charles Duhigg, intrepid reporter for the New York Times, discovered that the retailer tracked certain purchases couples made at specific times, such as vitamins, unscented soaps and lotions, and hand towels, from which it deduced an impending blessed event. Then, discount coupons for baby carriages, bassinets, and other newborn gear would darken the skies. In one instance an irate father of a teenage girl accused the retailer of trying to get his daughter pregnant, only to discover that Target knew the truth before he did. (In this case, however, the cause was relatively simple to deduce.)
More problematic than the cause of pregnancy is the cause of crime. Agencies with sufficient computing power at their disposal are using Big Data to solve old crimes and predict where the next crime is likely to occur (that’s easy, Wall Street). The emerging fields of algorithmic criminality and predictive policing have shown promise but evoke eerie comparisons to the movie Minority Report in which people were arrested for a crime they were only foreseen to commit.
Departments struggling with staffing cutbacks are using data analysis in lieu of neighborhood policing. Knowing, for example, the days of the week, times, and locations at which certain types of crimes are likely to be committed allows for more effective deployment of available personnel. Of course, eventually, criminals will use Big Data to figure out where the police will be.
Preemptive government, like preventive healthcare, is an attractive and sensible concept. But it doesn’t take an Alan Turing to figure out that the majority of crime, or at least the kind of crime that actually gets prosecuted, will occur in drug-addled, poverty-stricken neighborhoods. Data may be color blind, but in a budding surveillance society replete with players eager to record everything you do, the opportunity for abusing this technology is palpable. I, for one, plan to be careful about what I rewind.
It was none other than libertarian pin-up Ayn Rand who said: “Civilization is the progress toward a society of privacy.”
Well, when last seen privacy was sailing off on the good ship Algorithm powered by a limitless supply of invasive fuel called Big Data.