Saama integrates terabytes of data for a CPG customer using Google Cloud Platform
Google’s leading retail customer needed a solution to a complex data problem: market basket analysis and store traffic patterns by category, shelf space, and brand. Saama, using the Google Cloud Platform, loaded several terabytes of data into Google BigQuery to perform Point of Sale (POS) and Market Basket analysis of the busiest and most profitable hours of a given store.
Saama is a premier strategic implementation partner for the Google Cloud Platform division. We help Google customers with Big Data strategy, architecture development, and implementation, as well as the ongoing operation of those implementations. The case below is based on an implementation in which we were tasked with developing an end-to-end data ingestion pipeline and incremental data loads with specific requirements.
A large U.S. retail chain wanted to load several terabytes of data into Google BigQuery for performing Point of Sale (POS) and Market Basket analysis. Specifically, they needed to quickly analyze the number of household trips for profitability during the busiest and most profitable hours within a given store for any selected product or brand. As a part of this project, five objectives were established, mainly associated with data loading, processing, and querying.
- Moving data from disparate locations to Google Cloud.
- One-time data load in under 10 hours.
- Pre-processing of data before loading data into Google BigQuery.
- Lights out operation.
- Scalable solution.
Saama chose Talend for its data integration capabilities, which included native connectivity to Google Cloud Storage and Google BigQuery. While Talend was not immediately available on the Google Compute Engine, we were able to work with the highly responsive team at Talend to make some tweaks and get a Talend multi-node deployment to work on the Google Compute Engine.
We made extensive use of the utilities provided by Google to manage data transfer into Google Cloud Storage. These Python-based utilities are extremely versatile and can easily be built into scripts with proper error handling.
The result was an architecture that could scale on demand, process the data sets with ease, and drive lights out operation with the help of built-in error handling.
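As an illustration of scripting a transfer with error handling, here is a minimal sketch. It is not the script used in the project. The article does not name the utilities, so the sketch assumes the `gsutil` command-line tool is installed and authenticated; the function names, retry count, and delay are hypothetical choices made for the example.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_seconds=5.0):
    """Run a command, retrying when it exits with a non-zero status.

    Returns the completed process on success and raises RuntimeError
    once every attempt has failed.
    """
    last_error = ""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        last_error = result.stderr.strip()
        print(f"attempt {attempt}/{attempts} failed: {last_error}", file=sys.stderr)
        if attempt < attempts:
            # Linear back-off: wait a little longer after each failure.
            time.sleep(delay_seconds * attempt)
    raise RuntimeError(f"command failed after {attempts} attempts: {last_error}")


def upload_directory(local_dir, bucket_url):
    """Copy a local directory to Cloud Storage (assumes gsutil is on PATH)."""
    # -m runs the copy in parallel; -r recurses into the directory.
    return run_with_retries(["gsutil", "-m", "cp", "-r", local_dir, bucket_url])
```

A scheduled job could then call, for example, `upload_directory("/data/pos_extracts", "gs://example-bucket/pos/")` (both paths are placeholders) and alert an operator only when `RuntimeError` is raised, which is the kind of behavior an unattended pipeline relies on.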
Saama was able to demonstrate the power of the Google Cloud Platform, transforming terabytes of fast-moving data into insights within seconds. The Google Cloud, Talend, and Saama Technologies solution gave the retailer’s data scientists the ability to ask on-demand business questions. Because the wait time for information was reduced to seconds, the scientists were productive in days as opposed to weeks or months.
The number of household trips can be analyzed for profitability at a speed that was not achievable in the past. The busiest and most profitable hours within a given store can now be analyzed within seconds for any selected product or brand.
Saama continues to work with CPG and other industry-leading clients, converting repeatable problems into framework solutions that deliver not only daily query time savings but also significant overall time-to-market value and critical insights.