Feeling Like A Heel
November 28, 2011 Timothy Prickett Morgan
Correlation is not causation. That’s one of the first things you learn as an engineer or scientist. But as human beings who are genetically predisposed to find connections between disparate phenomena, we just can’t help ourselves. We are, in the final analysis, pattern recognition machines that are wired for small tribes and that have a tendency toward gluttony with fatty, sweet, or salty foods. (Which is why Chubby Hubby ice cream from Ben and Jerry’s should be outlawed, right after I finish this here pint.)
We may be designed to live in small tribes and do hunting and gathering, knowing perhaps 200 people reasonably well, but we live in a world with 7 billion people who are increasingly connected to each other economically (it really makes little sense to talk about state or national economies, as if there were an external frame of reference) and literally through social media like Facebook or the various phone, Web, messaging, and email networks. Everything we do on these networks is tracked by someone or many different entities, and there is a tremendous amount of data from which information can be gleaned and services can be provided by making correlations between data sets. This is how Google makes tens of billions of dollars a year, after all.
In the past, the companies that had access to capital and knew how to deploy it and grow it were the ones who ran the economy, which is why we call our economic system capitalism. While capital is still important–just try to start a company on $500 and a dream, as I have done twice now, and you will see what I mean–raw data and the ability to leverage it, to put it through the combine and turn it into information, is what matters most. I guess our economy could more accurately be called informationism. And what that means is that the companies that have access to the data that affects their business and their customers (as if they were different things in an ecosystem) will win out. It’s no surprise that there is so much volatility in financial markets, with so many people betting on or against just about everything we do.
The recent “big data” craze is just a reflection of the belief that keeping all operational data from a business and correlating it to larger sets of information external to the corporation will provide some kind of insight that will protect sales or drive new revenues or somehow give companies an edge. And while information certainly is not free, no matter how much Internet enthusiasts may tell you so, just like those who had pools of capital got to rule during the Age of Capitalism, those with pools of data and the knowledge of how to leverage that information are going to be able to transmute that into capital. (After all, money is the way we all keep score in the economy.) So, to a certain way of thinking, information is a first order integration of capital: If you have money, you can build a business that can gather, store, and harvest mountains of data and convert it into more money.
IBM doesn’t just talk about smart analytics. It does it, and such analysis is continuously performed on its own operations to find ways to cut costs. I told you about one of these tweaks to IBM’s operations back in March, when it killed off the quick ship option for Power Systems servers in one of thousands of moves that IBM will make from 2011 through 2015 to remove $8 billion in costs from its own operations. Now, instead of having partners do final assembly based on real system orders and Big Blue doing emergency quick ships, IBM does all final assembly and all shipping. In effect, everything is quick ship and IBM cut out $3.5 million in annual inventory costs.
IBM is not just trying to show that analytics can help run your business, but that it has its finger on the pulse of the Internet and the social media sites that allow it to be on the front end of consumer and corporate trends and that it has the propellerheads in its IBM Research division that can chew through mountains of data and come up with the algorithms to make (heaven help me I hate this word but this is how C-level execs talk) “actionable information.” I’ll give you a few examples.
Here’s a recent one where IBM trolled the social media sites (presumably Facebook but it could be other sites as well) to show the correlation between the height of women’s high heeled shoes and the strength of the economy.
I knew there was a rough inverse correlation between the length of women’s skirts and economic strength–the shorter the skirt, the better we are doing, apparently. What qualifies as short has changed over time as well, as you can see from this chart, compliments of Wikipedia:
Generally speaking, the dips in the hems correspond to dips in the economy, according to this rendition and generalization of the trends. My point in showing this is not that I want to argue about cause and effect, that when the economy is tight people are feeling a little more exposed and want more fabric for their money, and that if you are a dressmaker, miniskirts are probably a tough sell right about now. I am merely pointing out that this is precisely the kind of needle-in-the-haystack correlation that the IT giants are promising that they can find about your business and your customers. This one only helps the fashion mags and clothiers.
Just after The Four Hundred went on hiatus for the Thanksgiving holiday, IBM’s big data experts chewed through 100 years of heel height data for women’s shoes and came up with a correlation that fits, more or less, with the skirts. Heels get higher during economic recessions and depressions in the Western economies, or at least have historically. But in the past four years, IBM researchers found that out there on the fashion blogs and other social media, the chatter about heel height is about shorter heels, to the point that we are averaging around 2 inches compared to 7 inches in the belly of the Great Recession back in 2008. Take a gander:
Personally, I think this is not actionable information at all, but that is my own bias showing through. People who have tons of money to spend on expensive shoes live in a different world from the one I inhabit, where a good steak and a great beer matter a lot more and comfortable sneakers and shoes are a lot more important than fashion. (I still fit into the Brooks Brothers suit I bought to do job interviews 24 years ago. I am not exactly a fashion bug.) But these kinds of correlations are what people are yammering about when they talk about big data. Instead of looking at heels, data munchers are looking to figure out who we are and what we like so they can pitch the right ad or right product or the right service to us at the right time. It is like they want to change the world into one giant Amazon.com shopping experience.
This neglects a few things. First, there’s a funny thing about people. They don’t like to be perceived as being predictable, even if it is true, and the knowledge that they are being so thoroughly metered will, I think, drive people to behave in different ways. They will not rage against the machine so much as try to mislead it. Second, we are not the sum of our data sets, or, as the financial services companies put it, past performance is no guarantee of future performance. We evolve. New people come into our lives and introduce us to new experiences that change who we are and what we do. The data munchers are always playing catch-up. Well, for now. In a “perfect” world with absolute tracking and frictionless, instantaneous information exchange, Google would have been able to figure out what I was writing about for this essay as I was doing it. In fact, it may have even been trying for all I know, since I used Google about 25 times as I wrote.
In another example from this month, IBM created a mobile shopping index that is derived from the buying habits of customers shopping through the Web and through smartphones. The index is based on data culled from retailers that use IBM’s recently acquired Coremetrics online marketing software in conjunction with their storefronts, and it shows not only a rough correlation between online surfers and buyers coming in via mobile devices, but that both surfing and buying are on the rise, from around 4 percent of total traffic and sales in October 2010 to roughly 10 percent in October 2011. IBM is predicting that during the holiday season this year, which technically started on Black Friday, 15 percent of shoppers online will come in through a smartphone or tablet. What this means is this: if your company’s online systems aren’t optimized for these devices, you lose.
What IBM is pitching to your company, by bringing up these metrics at all, is that it has expertise in analyzing data, which you probably don’t, and perhaps more importantly, that through its deep connections with the IT departments of the world, it has access to data that you do not. Do you have full access to Twitter’s feed? Or all the searches on Google or all the page views that Google’s Chrome browser sees? Or similarly, can you see all the Bing search data and track all the Internet Explorer traffic bouncing around? Can you churn through 800 million people and their connections on Facebook and see patterns? No, you can’t.
But I would be willing to bet Sam Palmisano’s last dollar that IBM Global Services would be willing to charge you a pretty penny to help you make correlations in data big and small and make you feel like you know more than perhaps you actually do. Ditto for Microsoft, Google, and anyone else who has big pools of consumer and corporate data. But before you give them big piles of money, you might want to figure out what structured and unstructured data you are already collecting and how you might make use of it yourself and how you might bring in other datasets from the outside to look for interesting trends. The open source Hadoop big data cruncher is just a job scheduler and analytical engine written in Java, so don’t be afraid of it. Take some time, build a skunkworks, and learn about it.
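If you want a feel for what "making a correlation" between one of your own data sets and an outside one actually involves, here is a minimal sketch in plain Python. It is not anything IBM or Hadoop does under the covers; it simply computes the textbook Pearson correlation coefficient. The function name `pearson` and both number series are made up for illustration and are not real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Placeholder numbers, NOT real data: a hypothetical in-house series
# (monthly orders) and a hypothetical outside index you pulled in.
monthly_orders = [120, 135, 150, 160, 170, 190]
outside_index = [98, 101, 104, 107, 109, 114]

r = pearson(monthly_orders, outside_index)
print(f"r = {r:.2f}")
```

A value near +1 or -1 says the two series move together or in opposite directions; a value near 0 says there is no linear relationship. None of those outcomes says anything about which one causes the other, which is where the gray matter comes in.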
And don’t forget to use that gray matter in your head to make your own connections and correlations. Google just has a bandwidth advantage and access to the data. You are just as good at it as Google, so don’t feel like a heel.