Back Issues

How To Modernize Data Lakes With A Data Lakehouse Architecture

IBM News, Wednesday, July 5,2023

Data Lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Some argue though that the vast majority of these deployments have now become data 'swamps'. Regardless of which side of this controversy you sit in, reality is that there is still a lot of data held in these systems. Such data volumes are not easy to move, migrate or modernize.

Data lakes are, at a high level, single repositories of data at scale. Data may be stored in its raw original form or optimized into a different format suitable for consumption by specialized engines.

In the case of Hadoop, one of the more popular data lakes, the promise of implementing such a repository using open-source software and having it all run on commodity hardware meant you could store a lot of data on these systems at a very low cost. Data could be persisted in open data formats, democratizing its consumption, as well as replicated automatically which helped you sustain high availability. The default processing framework offered the ability to recover from failures mid-flight. This was, without a question, a significant departure from traditional analytic environments, which often meant vendor-lock in and the inability to work with data at scale.

more → · More from IBM →