Total Cost of Ownership, or simply TCO, is a topic that is amply discussed in the IT – Business coalition. Yet most businesses struggle to provide evidence of value at reduced cost. Citing the example of a BI and Data Warehouse solution implementation initiative, Vasant Shetty gives an insider view of how discussions evolve and how cost can play a major role in risk mitigation and value maximization.
Data! Data Analytics!! Big Data!!! Data Scientists!!!!…I hear these words more often than ever before. If you meet any Enterprise Level CXO, they all say they are on a journey to implement big data and analytics initiatives.
Every organization has some initiative or other going on around data analytics.
Drawing on my experience of meeting senior IT executives at a few enterprises, I can point to 3 major topics that come up quite often:
- Huge investments have already been made in large BI and Data warehouse solutions
- Defining ROI for big data projects is a challenge
- How can I leverage some use case for my big data and analytics initiative?
Prediction: In the next few years, IT teams will be overwhelmed by the sheer amount of data that systems generate or that gets stored within a data warehouse.
Along with investments in analytics, there is always pressure on IT to reduce cost and also be innovative. Outsourcing, cost arbitrage and the like are things of the past.
So what should next-generation enterprises focus on? The current challenge resembles the early 2000s, when Business Intelligence was entering mainstream business.
In my opinion, the time is right for IT to relook at the whole ecosystem around data and look at the Total Cost of Ownership (TCO) for delivering analytics to business.
Most business intelligence systems were designed and developed almost a decade ago and have been improved and upgraded since then. In the last 3-4 years, the cost of managing (storing), processing and transferring data has fallen dramatically, which should lead us to believe that the total cost of ownership has fallen too.
Well, that’s not true! Large companies are unable to derive the benefits of reduced cost and upgrades because of existing sunk costs, an inability to adapt quickly to new technology and a failure to prove the value of that technology to the business.
In today’s world, new start-ups and modern organizations are able to move faster because they don’t have to carry the baggage of system retirement. They can therefore perform the same feats as large enterprises with a flatter technology stack and fewer people.
Now let’s talk a bit about the business case for moving to the next-generation data and analytics platform. Cost, along with user experience and business value, is a major driver for change. Businesses can realize direct savings by categorizing costs under:
A) Cost of Data Storage
Most enterprises are actually spending more money on managing, maintaining and enhancing data warehouses than on data analytics.
Let me explain. Enterprises developed various data warehouses at different times and have ended up with a large volume of duplicated data. They therefore need more storage and more processing time, because the same field is processed on various servers, storage systems, etc.
These systems used a Schema-on-Write method where you define your schema, then you write your data, then you read your data and it comes back in the schema you defined up-front.
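To make the Schema-on-Write idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names and sample values are hypothetical and chosen only for illustration; they do not come from any particular enterprise system.

```python
import sqlite3

# Schema-on-write: the table structure is declared before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dob TEXT NOT NULL)"
)

# A write that fits the declared schema succeeds.
conn.execute(
    "INSERT INTO customer (id, name, dob) VALUES (?, ?, ?)",
    (1, "Asha", "1985-04-12"),
)

# A record carrying a field the schema did not anticipate is rejected at write time.
try:
    conn.execute(
        "INSERT INTO customer (id, name, dob, voice_note) VALUES (?, ?, ?, ?)",
        (2, "Ravi", "1990-01-30", "call-0042.wav"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Reads come back in exactly the shape defined up front.
rows = conn.execute("SELECT id, name, dob FROM customer").fetchall()
```

The second insert fails because the schema has to be changed before new kinds of data can be stored, which is where much of the enhancement effort goes in such systems.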
Then came unstructured data, arriving in high volume and in real time as voice, streams, text, etc., which needs more storage, cleansing and structuring before it can be consumed and analyzed. This makes systems harder to enhance and change, and makes the whole process complex for IT to deliver.
Because of this variety, you can never define the right schema up front, as you don’t know how the data will be used. The answer to the problem is Schema-on-Read. Here, a schema is applied to the data as it is pulled out of the stored location, rather than when it is stored. This is a different paradigm altogether.
So teams are caught between the old and the new paradigm, and the outcome is a hybrid that is more complicated than either. The solution path usually chosen is either to replicate the old data on the new system or to stream the new data back into the old system.
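A minimal sketch of Schema-on-Read, assuming the raw data is kept as JSON lines. The store, the field names and the two "views" are hypothetical and exist only to illustrate the idea; a real platform would read from a distributed file system rather than a Python list.

```python
import json

# Raw records are stored exactly as they arrive; nothing is enforced on write.
RAW_STORE = [
    '{"customer_id": 1, "name": "Asha", "dob": "1985-04-12", "channel": "web"}',
    '{"customer_id": 2, "name": "Ravi", "voice_note": "call-0042.wav"}',
    '{"customer_id": 3, "name": "Meera", "dob": "1978-09-03", "tweet": "great service"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep the requested fields, None where missing."""
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# Two business contexts read the same raw data through different schemas.
marketing_view = list(read_with_schema(RAW_STORE, ["customer_id", "name", "channel"]))
compliance_view = list(read_with_schema(RAW_STORE, ["customer_id", "dob"]))
```

Each consumer decides which fields matter at the moment of reading, so a new kind of record can be stored without changing anything up front.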
This increases the cost of ownership of data and the time to integrate and make the information available to the business.
Instead, perhaps a better approach is to take a step back and build the system on the new paradigm: load all the data in one place as raw data, then develop new analytical apps for business users based on their context, that is, on the schema required for the read (and not the write).
The net effect is that the Data Stored Cost (DSC) is paid once instead of once per duplicate copy.
Let me cite an example. If a customer’s Name, DOB and other required information are stored in multiple data warehouses even though a central data warehouse is available, storage is paid for several times over.
Old Paradigm: DSC1 + DSC2 + DSC3 + … + DSCn (Data is cleaned and stored in different business schemas – Pay all)
New Paradigm: DSC1 only (Data is stored in raw format and accessed through various schemas depending on the context – Pay one)
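The comparison above can be worked through as simple arithmetic. In this sketch the cost figures are made-up units, and it assumes, purely for illustration, that storing the single raw copy costs about the same as one of the cleaned copies.

```python
def old_paradigm_cost(dsc_per_copy):
    """Pay all: every warehouse holding a copy of the data incurs its own cost."""
    return sum(dsc_per_copy)


def new_paradigm_cost(dsc_raw):
    """Pay one: the data is stored once in raw form."""
    return dsc_raw


# Hypothetical example: the same customer data held in three warehouses.
copies = [100.0, 100.0, 100.0]

old = old_paradigm_cost(copies)       # DSC1 + DSC2 + DSC3
new = new_paradigm_cost(copies[0])    # DSC1 only
saving = old - new
```

With three equal copies the old paradigm costs three times as much, and the gap widens with every additional warehouse that duplicates the data.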
B) License cost
One of the costs of a business analytics system is the license for the database management software. It creates lock-in, with no flexibility for change, and the effort and cost of migration are high too.
Thankfully, there is a positive change with the introduction of Hadoop (data storage using HDFS, distributed processing using MapReduce) and various other technology stacks that have a very low cost of ownership in terms of licenses and significantly reduce your Fixed Operating Costs.
C) Data processing cost
As discussed earlier, multiple instances of data storage also mean that the same data is processed multiple times, on standalone servers and in separate datacenters, before it is delivered to the business user.
Our recommendation, from experience, is that IT move to a Cloud-based system for analytics that is modelled on a pay-per-use basis. The cost is then driven by the business user’s needs (a business cost) and not treated as an IT-centric standalone cost.
Processing is charged for the data processed during consumption, not for storing and retrieval. Once the data is on the Cloud, retrieval of data and information is simpler, as analytic systems are designed on the Schema-on-Read model.
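As a rough illustration of how pay-per-use turns processing into a business cost, the sketch below charges each business unit for the data its queries consume. The price, the business units and the volumes are all hypothetical and do not reflect any cloud vendor's actual pricing model.

```python
def pay_per_use_cost(queries, price_per_tb):
    """Total charge: data processed during consumption times a unit price."""
    return sum(tb for _, tb in queries) * price_per_tb


def cost_by_business_unit(queries, price_per_tb):
    """Attribute the charge to the business unit that ran each query."""
    totals = {}
    for unit, tb in queries:
        totals[unit] = totals.get(unit, 0.0) + tb * price_per_tb
    return totals


# Hypothetical month: (business unit, terabytes processed by the query).
queries = [("marketing", 2.0), ("finance", 0.5), ("marketing", 1.5)]
price_per_tb = 5.0  # made-up price per terabyte processed

total = pay_per_use_cost(queries, price_per_tb)
per_unit = cost_by_business_unit(queries, price_per_tb)
```

Because every charge traces back to a query, the bill can be read per business unit instead of as a single fixed IT line item.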
D) People cost
With this change, fewer people are required to manage and maintain the system, which reduces the total cost of FTEs on these specific activities.
People bandwidth can then be invested in analyzing data and in deriving better insights rather than cleaning, processing and integrating data.
Today, organizations on this journey have reached a stage where they are providing data as a service (DaaS) to business users. Data is stored in raw format with a platform on top which provides data as a service.
While there are costs associated with any initiative, understanding those costs and categorizing them helps in taking steps to reduce the TCO at a modular level.
In the next part of the series, we will see how Data and Analytics platforms are going to replace traditional product-based Business Intelligence systems.