Big Data is not all Social Media
It has been a little over a year since the term “Big Data” became a catchphrase, and it still evokes a strong response, though with an interesting twist: the initial sense of “awe” is turning into a sense of “confusion”. There is still no single crisp, consistent definition of the term, which is one of the reasons for the confusion. [It reminds me of the .Net days, when the name referred to everything from a coding framework to a database to office products, confusing the heck out of everyone.]
However, one commonly used phrase describing the virtues of big data, agreed upon by both Gartner and Forrester (yes, that does happen every now and then), is the 3Vs: volume, velocity and variety.
Social media data clearly has volume, velocity and variety, which appears to have led a large number of folks to conclude that big data is simply the ever-growing data from social media. In fact, some folks in the corporate world who have no presence in the social media space, and who do not believe that social media data has any value for them (another big fallacy), conclude that they do NOT have to deal with big data.
Well, for one, I do not believe any enterprise will be immune to what is happening in the social media world. That is the reason so many enterprises are taking an active role in defining their brand’s social media presence and messaging. If they do not take up this responsibility, the gap will be filled in for them, and the result may not be in tune with their expectations.
Social media data aside, big data attempts to bring another valuable and completely untouched dataset into information processing and insight generation systems. It comes from the 3rd “v”, variety: the “unstructured” data. Unstructured data is a term commonly used to refer to data in the form of documents, presentations, web pages, emails etc., most of which sits on file servers and rarely on structured database platforms.
Although the potential value in mining all of this unstructured data can be huge, it is an enormous exercise to even attempt to do this.
One “crawl before you run” approach I recommend for leveraging the power of big data is to take advantage of the unstructured data that already exists within our structured databases: the free text comments captured in almost every major enterprise application, from order processing systems to call center applications to technical support systems and so on. Specifically in the Healthcare and Life Sciences world, these are things like patient disease diagnosis, trial enrollment, sales field interactions data, survey data and so on. In short, it is any data that gets captured in the free-form text fields that exist on almost every screen of any major enterprise application. Enterprises today have the opportunity to mine all of these free text datasets and bring the resulting insights into their information management platforms. Using advanced text analytics, data mining and machine learning techniques, the insights from the traditional structured world can be significantly enhanced.
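To make the idea concrete, here is a minimal sketch of what a first pass over such free text fields might look like. It only counts the most frequent terms in a comment column, grouped by a structured column, which is far simpler than the advanced text analytics mentioned above. The rows, the column layout, the stopword list and the function names are all made up for illustration; in practice the rows would come from a query against your own application database.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rows, as they might be pulled from a call center table:
# (ticket_id, product, free_text_comment). All values are invented.
ROWS = [
    (1, "pump", "Customer reports battery drains fast, wants refund"),
    (2, "pump", "Battery issue again; device shuts off overnight"),
    (3, "monitor", "Screen flickers after update"),
    (4, "monitor", "Customer happy with new screen, no issue"),
]

# A tiny, hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "after", "again", "and", "customer", "new", "no",
             "reports", "the", "wants", "with"}


def tokenize(text):
    """Lowercase the comment and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms_by_group(rows, n=3):
    """Return the n most frequent comment terms for each structured group value."""
    counts = defaultdict(Counter)
    for _ticket_id, group, comment in rows:
        counts[group].update(tokenize(comment))
    return {group: [term for term, _ in counter.most_common(n)]
            for group, counter in counts.items()}


if __name__ == "__main__":
    for product, terms in top_terms_by_group(ROWS).items():
        print(product, terms)
```

The point of grouping by a structured column is that the free text gets tied back to something the business already reports on (a product, a region, a trial site), which is where the enhancement to the structured world comes from.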
There are several tools out there that can be leveraged, and many of them have evolved and matured over the last couple of years to make things a lot easier. Be forewarned that even so, this step can be a little daunting to begin with. But do persist, and all of the comments / notes / feedback that got put into your systems will bring out some very insightful business nuggets. Also, do not forget the importance of having people who understand both your business and the data work on this. Structured data analysis can be tough too, but the “insightful business nuggets” from the unstructured world are even harder to glean for people who cannot relate them to the business.
1) Saama SixthSense for Saama's point-of-view presentation on Big Data and why you need to listen to your Big Data.
2) CIO Analytics: Managing Business Value of IT for a whitepaper on how to measure, plan and cost the business value of IT using a Service Oriented Delivery Model.
3) Big Data is the Answer - What was the Question? for a recorded version of the webinar hosted on February 17, 2012.