The data explosion: so much to look at, so much to sift through! We are so fascinated with big! Big boats, big cigars and big shoes. But what, really, is big data? What does it mean to your business? And how can you or should you manage it? Although adoption of the buzz word is new, the management of big data is not.
So What Is Big Data?
Big data vs. data is really about the challenge that the data explosion presents to us. Today we have unprecedented access to multiple data sources about society, economics, products and people. It comes to us from multiple channels — systems, web and mobile. In addition, our generation, unlike past generations, is much more likely to share information with trading partners, necessitating the use of secure, managed data transport technology.
Companies have systematically accumulated a lot of their own historical data. That data also lets them look at patterns in customer and environmental behavior and build predictive analytics that help with everything from avoiding catastrophic events and detecting stress patterns in machines (so they know when to repair them) to predicting demand for new products. The problem is that accumulating all this data takes space, and analyzing it takes software. Thus the current rage for discussing big data.
“Big data seems like a marketing buzz word for technology that was once commoditized like ‘computer hardware’ to get back some attention in the media,” said one CIO we talked to. But is there more to it? As with all tech hype, the seers (big consulting firms) are exclaiming that big data will change the world and improve your ROI. And the big companies touting big data are the ones, such as IBM, HP, SAS, EMC, and Microsoft, to name a few, that sell the storage and analytics to help you get to the bottom of all that data!
Big data is all the buzz now, because new sources and new forms of data are available. We have our existing data plus new unexplored or unmined opportunities afforded to us from the internet and social networks, which bring us structured and unstructured data. The theory is that there are things to learn there — about customers, about markets, about innovation — that can mean bigger opportunities for us.
The science community has been grappling with this problem for many decades. The Human Genome Project, NASA, research labs around the world, as well as various government agencies, have acquired massive data centers. They use modeling software to catalogue, track and predict everything from weather patterns to human (and bird) migration patterns to disease outbreaks. And, of course, law enforcement and counter-terrorism are big customers of big data.
But big data is among the common folks now. We personally own so much storage in the form of back-up drives (terabyte drives are now available inexpensively), sticks, CDs and DVDs. This doesn’t even include the elastic, cloud-based backup services you can buy for all those photographs you might never look at again. (But it’s cheap — right?)
Another CIO flatly stated, “What I care about is source, size, security and sense — that is making sense of it, or analytics. Just because there is data all over the place, I am not sure of where it comes from and if it tells us anything useful about our customers. And size? That is how much money I need in the budget to deal with all the databases the business users want. And security. I don’t want users downloading stuff with malware. My main issue with all these is what’s the point? Does all this data matter to us?”
When the topic of big comes along, I often think about Big Daddy from Cat on a Hot Tin Roof, the master of living large. He asked, “Wouldja look at all this stuff? . . . Got any idea how much it’s worth?”
That is just the question you have to pose for yourself. We are storing, streaming, searching and analyzing, but what is relevant information? We have talked for many years about the issues with data management (searching, sensing, acquiring and analyzing), but now the data seems to be piling up. And with the adoption of serialization, RFID, Visual Intelligence and other types of media, the pile keeps getting bigger and more costly to manage.
One of the super brilliant database people, Tao Lin,1 explained to me that there are real limitations in how organizations store and how application providers ‘serve up’ data to users. “We store a lot of data, but what matters to users is a small fraction of that data, which is relevant to only him, or her.” In addition, technologies like “Auto-ID and serialization can tell you what is in front of you — 20 cartons — but your database says you should have 24. So I really want to know about the other 4 and I don’t have that data. Some call this exception management, but that is really what we are looking for. So how we search and acquire should be based on the use case.”
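To make Tao's carton example concrete, here is a minimal sketch in Python of what "exception management" could look like. It assumes each carton carries a unique serial number; the function name, the serial format and the sample data are hypothetical, made up for illustration, and not taken from any particular system. The idea is simply to compare what the database expects with what was actually scanned and report only the difference.

```python
def find_exceptions(expected_serials, scanned_serials):
    """Return only the exceptions: serials expected but not seen, and vice versa."""
    expected = set(expected_serials)
    scanned = set(scanned_serials)
    return {
        "missing": sorted(expected - scanned),     # in the database, not on the dock
        "unexpected": sorted(scanned - expected),  # on the dock, not in the database
    }


# Hypothetical data: the database says 24 cartons, the reader sees 20.
expected = [f"CARTON-{n:03d}" for n in range(1, 25)]
not_on_dock = {"CARTON-005", "CARTON-011", "CARTON-017", "CARTON-022"}
scanned = [serial for serial in expected if serial not in not_on_dock]

report = find_exceptions(expected, scanned)
print(len(scanned), "scanned;", len(report["missing"]), "missing:", report["missing"])
```

The user never has to look at the 20 cartons that are fine. Only the 4 that matter come back, which is the kind of use-case-driven search Tao is describing.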
Edward Tufte, the master of envisioning and displaying quantitative data,2 scoffs at this recent buzz on the topic. He stated (and he knows from working with a wide range of organizations in science, financial services, publishing, government, and the creative community) that people are chasing huge databases, but there is truly only one bit that might be important to know, track, and chart.
He mentioned in a recent seminar we had with him in Boston that many organizations collect everything, surmising that there might be an answer to something in there, rather than asking straightforward questions and then extracting the data relevant to solving the real problems they have.
So Tao and Tufte — experts on data usage and applicability — seem to be pointing us to relevancy.
Pardon that meandering above, but I hope it will lead us to the relevant.
Data Management vs. Big Data
So big data, we hear, may or may not be a new thing for many companies. Either way, there are a few aspects of big data that end users need to consider:
- Design and management: create data stores and data warehouses that support the work of different users and departments
- Standards: adhere to and participate in industry data standards
- Mining and analytics: leverage data for predictive tasks such as risk management, weather prediction and consumer behavior (forecasting and modeling)
- Security: the more accounts and the more storage, the more vulnerability. This has to be a consideration today with so much hacking!
Most governments seem to be on a quest to collect even more data about their citizens. From Beijing to Buenos Aires to Barcelona to Boston, let’s collect as much data as we can! Data from a wide range of services, from healthcare to drivers’ licenses, has skyrocketed. And they do correlate this data. If you have a misspelling or used your nickname years ago on some application, you may find yourself unable to get healthcare or renew your license without a major reconciliation of your data.
On the upside, we have noticed a very strong correlation between data management strategies, in general, and improved performance in business. Successful firms such as Amazon, Walmart, Dell, Apple, and many modest-sized organizations (you can join a webinar we are doing next month with two such firms) embrace the value of information as a source of wealth, and not just for what it can tell us about the future. These corporations also find ways to make data actionable. They do not have aimless data collection schemes. Rather, their data collection is application driven, and, therefore, pertinent to managing their business processes.
These firms have better cash positions and seem to have been in control of their supply chains, thanks to their adherence to data standards and their use of communications technologies (read about MFT in this issue) that allow them to reduce their information cycle times. This, of course, also helps with the management of all that data.
Conclusion: Big Data Is Big Business
Users are also like Imelda Marcos: more closet space is always required. Here lies a very advantageous opportunity for cloud services, which provide off-site elasticity. Since we seem so enamored with collecting and staring at data, the technology sector is quite willing to scale its services to us: storage, backup and recovery; alternative site backup for risk mitigation; and surge capacity for big analytic exercises, seasonal work volumes (Christmas), or web pages for the Weather Channel during hurricane season.
Cloud applications are springing up for tracking and tracing, from shipping to item-level, standards-based identification. Location-based services, mass couponing for promotional sales, freeware and free storage, social network sites that store content, and, of course, Google, the master of really big data, are just a few of the big data closets out there, so you don’t have to get your own mass storage.
Over the years, watching and sometimes participating in upsizing, downsizing, and moving the business, I observed a curious thing: the data we thought we must have — we don’t use very much of it. It becomes waste. And as I was buying yet another shredder recently, the philosophical end-point of all big data has become clear to me.
Some good definitions and information can be obtained at Wikipedia.