Why large enterprises and EDW owners suddenly care about Big Data

While most Big Data tooling is geared towards social media and stream analytics, traditional EDWs can also leverage its power. The concept of Big Data is not new; banks have been doing it for a while using mainframe-sized computers. The reason it is being talked about so much now is that, for the first time, cheap and massive computing power and even cheaper storage have put mainframe-scale capability in the hands of every organization, right at the time when organizations have been struggling to justify the ROI of processing exponentially growing data volumes.


Big Data is not a performance engine: it is not a traditional database that runs queries faster, and it will not replace traditional reporting strategies. What it can do is batch-process millions and billions of records, both unstructured and structured, much faster and more cheaply. Big Data analytics also makes it possible to merge all analysis onto one platform. As a direct result, data analysis becomes more accurate, well-rounded, reliable, and focused on a specific business capability or advantage.

Before investing money in commodity hardware and calling in consultants to wave the Big Data magic wand, companies should do a lot of soul-searching, because once you set the wheels in motion, the effort is likely to consume much of your organization's focus. To decide where you are on the Big Data spectrum, look at the four V's of your data (Volume, Velocity, Variety and Variability), as shown in the infographic below.



A key question to ask is whether you have enough data volume at the source to justify Big Data processing (average data set > 300 GB). If you don't, consider investing in a traditional enterprise data warehouse and fine-tuning your reporting metrics. If you do, move on to the next question: how do you want to process this amount of data?

One of the key technologies widely accepted by large enterprises for Big Data processing is Hadoop. While Hadoop provides the processing power, the algorithms that make sense of the data still need to be developed in-house. The most frequent application of Hadoop is to support the "Transform" step in traditional ETL (Extract, Transform, Load): data arrives in a myriad of unstructured, semi-structured, and structured formats and is loaded into terabyte-scale analytical data marts, where predictive modelers and other data scientists can work their magic.
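To make the "Transform" idea concrete, here is a minimal sketch of the kind of per-record cleanup logic a Hadoop Streaming mapper might run. The pipe-delimited log format and the field names (`ts`, `customer`, `amount`) are hypothetical, invented purely for illustration; the point is only the shape of the step: read raw lines, drop malformed records, emit structured output.

```python
# Sketch of a Hadoop-Streaming-style "Transform" step.
# The input format "timestamp|customer_id|raw_amount" is a made-up example.
import json
import sys

def transform(line):
    """Parse one semi-structured log line into a structured record.

    Returns None for malformed lines so bad records are skipped
    rather than failing the whole batch job.
    """
    parts = line.strip().split("|")
    if len(parts) != 3:
        return None
    ts, customer_id, raw_amount = parts
    try:
        amount = round(float(raw_amount), 2)
    except ValueError:
        return None
    return {"ts": ts, "customer": customer_id, "amount": amount}

def main(stream=sys.stdin):
    # Hadoop Streaming feeds one input record per line on stdin and
    # expects one output record per line on stdout.
    for line in stream:
        record = transform(line)
        if record is not None:
            print(json.dumps(record))

if __name__ == "__main__":
    main()
```

In a real deployment this script would be shipped to the cluster as the mapper of a streaming job, letting Hadoop fan the same logic out across billions of input lines.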

Hadoop and traditional EDW technologies can co-exist in the same ecosystem, as shown below. Each has its own strengths, and when combined they provide a potent mix for your analytical needs, a combination we have seen work at a few large companies.


Traditional EDWs built on relational, columnar, and other approaches to storing, manipulating, and managing data will continue to exist. All of your investments in pre-Hadoop EDWs, data marts, operational data stores and the like are reasonably safe from obsolescence.

The reality is that the EDW is evolving into a virtualized cloud ecosystem in which all of these database architectures can and will coexist in a pluggable "Big Data" storage layer alongside HDFS, HBase (Hadoop's columnar database), Cassandra (a sibling Apache project that supports peer-to-peer persistence for complex event processing and other real-time applications), Neo4j (a graph database), and other "NoSQL" platforms.

Beginning a Big Data implementation really boils down to one basic question: do you have the use cases for it? We will post a few sample use cases being adopted by large enterprises in our next posting. Stay tuned…


About Saama Executives

Saama Executives is an exclusive group of thinkers, leaders, mentors and innovators within the company. The members of this group come together from time to time to pen their thoughts on topics that matter most to the industry. Over the last few years, the group has written some brilliant pieces on the Insurance, Life-Sciences, Healthcare and CPG industries.


claudius says:

Not all data needs to be present in memory, and the data that is present may be compressed. If the exercise involves huge computations, the scientist should know which data to keep in memory. Machine Learning computes hypotheses through parallel computation, as neural networks are heavily parallel. Data reduction is also interesting: sometimes the choice of algorithm matters less than data quality. I may be able to help you with huge data. Write to todorf@videotron.ca

Durga Prasad says:

@claudius: Sincerely appreciate your feedback. We are happy to receive your active participation and hope you will continue to read and comment on our future postings too. We will surely get in touch with you when the need arises. Till then, happy blogging.

Tomas Kuzar says:

Big Data is a very new topic built on young open-source technology, and I am not sure whether big companies will move existing structures and data from the EDW to Big Data after reaching some amount of data (>300 GB). From my perspective, Big Data offers the opportunity to explore original new use cases, e.g. processing data captured from sensors or cameras, or correlating social web data with existing ERP systems, etc.
