When we talk about analytics in the cloud, the two big issues that have historically come up are scale and security. While there has been a lot of progress and innovation in both areas, a larger issue has emerged: how to deal with data that is fragmented across the cloud.
Cloud computing has made enterprise data management far more complex. In the on-premises world, most enterprise apps resided within the firewall. Data was extracted from transactional systems and moved into an enterprise data warehouse (EDW), data marts and, more recently, data lakes to perform analytics. Data movement was cheap, but data storage was expensive. In this context, big data stacks made it possible to move compute to the data as the economics of data storage changed.
Where in the cloud is my data?
In the cloud world, though, data no longer sits inside the firewall, and often not even in a single cloud. This makes data movement very expensive. There is still no clear answer to how this fragmented cloud data problem should be solved in the context of analytics.
The full vision of an Enterprise Data Warehouse was probably never achieved in practice, and a Data Lake strategy is now the trend, with falling hardware costs, distributed computing and open source serving as catalysts. Both strategies aim for the same goal: to bring data together from different systems into one logical place where it can be analyzed and mined.
In the cloud, the initial bulk data load can be handled by physically shipping data to the cloud provider, an approach that works as long as the subsequent incremental loads are not huge. So if bandwidth issues can be worked out, scale in terms of compute, memory and storage is to a large extent achieved. Architecturally, it is possible to imagine a cloud-based data infrastructure that is elastic in compute, memory and storage.
Security issues are of course real, but they are also a matter of risk management. I think cloud data management could see faster adoption once the lawyers figure out how liabilities play out around data privacy and security. Meanwhile, progress is being made toward making cloud infrastructure more secure and compliant with stringent data security norms, especially in the government, healthcare and financial services sectors.
The Future ahead…
One possible end-state scenario for analytics in the cloud is to consolidate all your applications with a single cloud vendor and do analytics on that vendor's stack. Along these lines, Microsoft recently acquired Metanautix, whose technology allows IT teams to connect information across private and public clouds without going through the costly and complex process of moving data into a centralized system.
It will be interesting to see how this unfolds, but I believe the launch of basic cloud-based BI and analytics offerings from the major cloud vendors is a first step in the right direction.
Contact us to find out how Saama can help you navigate this changing landscape.
Also see Saama’s three industry-specific analytics QuickStarts on the Microsoft Azure cloud platform – Real World Analytics for life sciences, Patient Experience Analytics for healthcare, and Fraud Analytics for insurance.