Datawarehouse to Datalake Modernization

Recent research across many industries points to a new addition to the list of an organization's most valuable assets. So what is it that the industry now talks about on the same scale as gold or oil?

The answer is DATA.

Is it right to conclude that DATA is the most valuable asset for an organization?

Look at our day-to-day routine: as customers, we expect more and more personalized content and better experiences, without paying anything extra. This is where DATA becomes valuable to an organization, because it is what lets the organization give its customers personalized content and a sense of special attention.

Almost every organization generates or captures data from multiple sources. Some of these are systems and applications within the organization, such as social media, CRM, marketing platforms and log files; others lie outside it, such as competitor data. Data exists all around us, but the variety, volume and velocity with which it grows make it difficult to use in its raw form.

How can such data be made useful? A Datalake might be an answer.

What is a Datalake?

A datalake is a system that stores RAW data in its original format. We may or may not know in advance what type of data it is or what format it takes: it can be structured or unstructured, machine-to-machine data, log files or real-time streams. This vast amount of data is not processed or made actionable until it is needed or the business has identified a purpose for it. The term datalake is often confused with the term datawarehouse.
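To make the idea of storing RAW data in its original format more concrete, here is a minimal sketch using only the Python standard library. It assumes a local temporary folder stands in for the datalake; the helper names `land_raw` and `read_bookings`, the source names and the sample records are all hypothetical and only illustrate the pattern. It does not describe any particular product or implementation.

```python
import csv
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store a payload byte for byte under its source name: no parsing, no schema."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)
    return target


def read_bookings(lake_root: Path) -> list[dict]:
    """Interpret the raw CSV only now, once a purpose for it has been identified."""
    path = lake_root / "raw" / "crm" / "bookings.csv"
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            {"booking_id": row["booking_id"], "fare": float(row["fare"])}
            for row in csv.DictReader(handle)
        ]


# Hypothetical sample data: three sources, three formats, all landed untouched.
lake = Path(tempfile.mkdtemp(prefix="lake_"))
land_raw(lake, "crm", "bookings.csv", b"booking_id,fare\nB1,120.50\nB2,89.00\n")
land_raw(lake, "weblogs", "access.log", b"GET /search 200\nGET /book 500\n")
land_raw(lake, "sensors", "events.jsonl", b'{"gate": "A1", "temp": 21.5}\n')

# Only the bookings file is parsed; the other files stay raw until they are needed.
bookings = read_bookings(lake)
total_fare = sum(booking["fare"] for booking in bookings)
```

The point of the sketch is the ordering: everything is stored first, and structure is applied only when a file is read for a known purpose. The log file and the sensor events cost nothing to process until someone finds a use for them.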

What is the difference between Datalake and Datawarehouse?

Hardly any DATA-related discussion ends without these two terms coming up. People often treat them as synonyms, but in fact they are very different; what they have in common is that both store data. It is therefore important to understand the difference between the two, as they serve different business objectives.

The user bases of the two systems are also quite different. A Datalake is typically used by data scientists and data engineers, who fetch raw datasets for work such as machine learning, whereas a Datawarehouse is used by data analysts and business owners to get actionable insights for a specific, already identified business use. The purpose of a Datalake is to make RAW data useful and accessible, and to help identify how it can serve business intelligence and analytical needs.

At IGT, we identified a business problem for a low-cost carrier operating in APAC and Europe.

The carrier had ample data, most of it in a raw, unprocessed state that had never been analyzed to deliver any insights. The cost of storing, accessing and processing raw data without a defined purpose was too high for the business to absorb. They already had a datawarehouse serving their analytical needs, but with limitations such as the inability to perform operational analytics and real-time intelligence.

IGT proposed and implemented a datalake solution. It gave the carrier the ability to integrate and blend different kinds and formats of data faster and more efficiently for its operational and real-time intelligence needs. Data that had previously been unusable, and had never been accessed or utilized, became available to users at minimal cost.

The datawarehouse was also migrated to a new scalable solution within the datalake. Some of the data sources in the datalake then became sources for the datawarehouse, as they had become more readable and a use for them had been identified.
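The flow described above, where a raw datalake source becomes a source for the datawarehouse once its use is known, can be sketched roughly as follows. This is an illustration under stated assumptions, not the solution that was actually built: an in-memory SQLite database stands in for the datawarehouse, and the `fares` table, the `parse_raw` helper, and the flight and route values are all hypothetical.

```python
import json
import sqlite3

# Hypothetical raw records as they might sit in the datalake, including a bad line.
RAW_EVENTS = """\
{"flight": "XY101", "route": "DEL-SIN", "fare": "120.50"}
{"flight": "XY102", "route": "DEL-SIN", "fare": "99.50"}
not valid json
{"flight": "XY201", "route": "LHR-DUB", "fare": "45.00"}
"""


def parse_raw(lines):
    """Yield (flight, route, fare) tuples, skipping records that cannot be parsed."""
    for line in lines:
        try:
            record = json.loads(line)
            yield record["flight"], record["route"], float(record["fare"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue


# The warehouse side has a fixed, typed schema defined before any data is loaded.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fares (flight TEXT PRIMARY KEY, route TEXT NOT NULL, fare REAL NOT NULL)"
)
warehouse.executemany(
    "INSERT INTO fares VALUES (?, ?, ?)", parse_raw(RAW_EVENTS.splitlines())
)
warehouse.commit()

# An analyst-style question answered from the curated table, not from the raw data.
revenue_by_route = dict(
    warehouse.execute("SELECT route, SUM(fare) FROM fares GROUP BY route ORDER BY route")
)
```

The raw records stay in the datalake unchanged, unparseable lines included, while the datawarehouse receives only the cleaned, typed subset that serves an identified business question.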

IGT’s datalake solution enabled unstructured data analytics: users now had self-service BI, could update raw data at lower cost, and could analyze raw data to identify more useful business cases before transforming it as needed. The new system could also parse data and deliver complex analysis that had not been possible earlier.

The solution also reduced resource costs: hardware infrastructure was cut by 40%, and performance improved by 50%, owing to the scalable nature of the architecture.

Following IGT’s datalake implementation, the customer is bringing other business areas into the same solution to provide more targeted content to business, marketing and sales. The data is now also being used for machine learning and predictive modeling for revenue and inventory management.