Key Data Science

May
02

Lakes

I assume that everyone’s heard about the Data Lake by now. Well implemented and managed, it can be a great addition to any organisation.

Let’s start with some advantages of the data lake:

  • Structured, semi-structured and unstructured data of any size is stored in one place
  • It’s designed to be low cost
  • It’s highly agile
  • It allows faster data insights
  • With a properly maintained central repository, the data you need is faster to find
  • It can bring analytics to near real-time
  • It’s schema-on-read: the structure is applied when the data is read, not when it is stored
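Schema on read is probably the least obvious point on that list, so here is a minimal sketch of the idea in Python. The raw records, the `ORDER_SCHEMA` mapping and the `read_with_schema` helper are all made up for illustration; a real lake would use a query engine rather than a hand-written loop, but the principle is the same: the data is stored as it arrives, and the reader decides which fields and types it wants.

```python
import json

# Raw records land in the lake as-is; nothing is validated at write time.
# (Invented sample data, for illustration only.)
RAW_EVENTS = [
    '{"user": "anna", "amount": "12.50", "ts": "2023-05-02"}',
    '{"user": "ben", "amount": 7, "channel": "web"}',
    '{"user": "cara"}',
]

# Hypothetical schema chosen by the reader: field -> (converter, default).
ORDER_SCHEMA = {
    "user": (str, ""),
    "amount": (float, 0.0),
}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project them onto a schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            value = record.get(field)
            # Missing fields fall back to the default; extra fields are ignored.
            row[field] = convert(value) if value is not None else default
        rows.append(row)
    return rows


orders = read_with_schema(RAW_EVENTS, ORDER_SCHEMA)
for row in orders:
    print(row)
```

A different reader could apply a different schema to exactly the same raw records, which is what makes the lake so flexible and also what makes a good catalogue so important.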

One of the major disadvantages is that your data lake can quickly become a data swamp. That’s why a central repository and periodic data cleaning are so important. By cleaning I don’t necessarily mean deleting anything, although deletion would be the safest choice from a security perspective. Any data not touched for a year or more can safely go to a secure, encrypted archive like Amazon Glacier.
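The "not touched for a year" rule is easy to sketch if the central repository records when each object was last read. The catalogue below is a hypothetical dictionary with invented keys and dates, and `archive_candidates` is my own helper name; the actual move to an archive tier is deliberately left out, since it depends on the storage service you use.

```python
from datetime import date, timedelta

# Hypothetical catalogue: object key -> date it was last read.
CATALOGUE = {
    "raw/clicks/2021.json": date(2021, 12, 31),
    "raw/orders/2023.json": date(2023, 4, 20),
    "raw/sensors/2022.csv": date(2022, 5, 1),
}


def archive_candidates(catalogue, today, max_idle_days=365):
    """Return keys that have not been read for max_idle_days or more."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(key for key, last_read in catalogue.items() if last_read <= cutoff)


stale = archive_candidates(CATALOGUE, today=date(2023, 5, 2))
print(stale)
```

Passing `today` in explicitly keeps the check repeatable, so the same run can be replayed later to explain why something was archived.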

End users must be aware that the lake stores raw, often highly unstructured data. It’s a fantastic tool for Data Scientists and Data Analysts. Business users are a different story: in my experience, even those keen on doing analyses themselves mostly prefer more structured, easier-to-understand datasets.

Will Data Lake ever replace Data Warehouse?

Personally, I don’t think so. The two complement each other beautifully. The data lake can feed the data warehouse and, at the same time, be a playground for more data-orientated people. The warehouse ensures that the less data-orientated people can digest the same data with tools like Tableau, Power BI or QuickSight. And hopefully, it will ensure there are no arguments about whose data is correct, because everything comes from the same source.
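To make the "lake feeds the warehouse" idea concrete, here is a small sketch, assuming raw JSON records in the lake and using an in-memory SQLite database as a stand-in for the warehouse. The sample records, the `sales` table and the `load_warehouse` function are all invented for illustration; a real pipeline would use proper ETL tooling and a real warehouse engine.

```python
import json
import sqlite3

# Raw lake records (invented sample data): inconsistent types, extra fields,
# and one record that does not fit the warehouse table at all.
RAW_SALES = [
    '{"region": "north", "amount": "100.0", "note": "promo"}',
    '{"region": "south", "amount": 40}',
    '{"region": "north", "amount": 25.5}',
    '{"comment": "record without a region or amount"}',
]


def load_warehouse(raw_lines):
    """Load the usable raw records into a strictly typed warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
    for line in raw_lines:
        record = json.loads(line)
        # Only records that fit the warehouse schema are loaded;
        # the rest stay in the lake for the data people to explore.
        if "region" in record and "amount" in record:
            conn.execute(
                "INSERT INTO sales VALUES (?, ?)",
                (record["region"], float(record["amount"])),
            )
    conn.commit()
    return conn


conn = load_warehouse(RAW_SALES)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The business user only ever sees the tidy `sales` table, while the analyst can still go back to the raw records. Both are looking at the same source.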

So, use the best tool for the job, and mix the tools, old and new, to achieve better results.

*  There’s ‘lake’ in the title so I couldn’t resist. Here’s a photo of the Lake District, one of the most amazing and peaceful places I’ve found in England.

Data Lake kk