Digitalization has become an important transformation on every organisation's roadmap, and the data lake is a key cog in this process. A data lake ecosystem needs to enable self-service, let users adapt quickly to change, ensure data security, provide access to any and all data available within the organisation, and support data-driven decision-making.
A data lake ecosystem consists of seven key zones:
Data import – landing place for all information that originates outside the organisation
Datalake – landing place for all information that is internal to the organisation; in most cases, this is a replica of the operational systems
Data Lab – environment where users create reports and build models for descriptive/predictive analytics or machine learning
BI Portal – environment where all enterprise-wide reporting is available
Model Store – container-based environment where models are executed
Data processing – zone with the compute power to transform data from the data lake for the Data Lab, Model Store or BI Portal
Data Export – landing place for all information that is provided to external users
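To make the zone layout concrete, here is a minimal Python sketch that models the seven zones listed above as an enumeration. It adds one hypothetical rule taken from the data processing description: processing reads from the data lake and writes only to the Data Lab, Model Store or BI Portal. The names `Zone`, `PROCESSING_SOURCES`, `PROCESSING_TARGETS` and `is_valid_processing_flow` are illustrative assumptions, not part of any specific platform.

```python
from enum import Enum


class Zone(Enum):
    """The seven zones of the data lake ecosystem."""
    DATA_IMPORT = "Data import"
    DATALAKE = "Datalake"
    DATA_LAB = "Data Lab"
    BI_PORTAL = "BI Portal"
    MODEL_STORE = "Model Store"
    DATA_PROCESSING = "Data processing"
    DATA_EXPORT = "Data Export"


# Hypothetical rule: the data processing zone reads from the data lake and
# writes only to the consumer-facing zones named in its description.
PROCESSING_SOURCES = {Zone.DATALAKE}
PROCESSING_TARGETS = {Zone.DATA_LAB, Zone.MODEL_STORE, Zone.BI_PORTAL}


def is_valid_processing_flow(source: Zone, target: Zone) -> bool:
    """Return True if a processing job may move data from source to target."""
    return source in PROCESSING_SOURCES and target in PROCESSING_TARGETS


if __name__ == "__main__":
    print(len(Zone))  # 7
    print(is_valid_processing_flow(Zone.DATALAKE, Zone.BI_PORTAL))  # True
    print(is_valid_processing_flow(Zone.DATA_IMPORT, Zone.DATA_EXPORT))  # False
```

Encoding the allowed flows as data rather than scattering checks through job code keeps the zone boundaries in one reviewable place.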
Each of these zones hosts a variety of platforms that help users and the organisation make decisions based on data.
Automation is in the DNA of this ecosystem, and CI/CD tools make it possible.
Data pipelines in the data processing zone are fully configurable.
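The text does not say how pipeline configuration works, so the following is only a sketch under stated assumptions. A pipeline is described declaratively as JSON (source zone, target zone, ordered steps), and a small runner looks up each step in a registry and applies it. The step names `drop_nulls` and `rename`, the config keys, `STEPS` and `run_pipeline` are all hypothetical.

```python
import json
from typing import Any, Callable

Row = dict[str, Any]


def drop_nulls(rows: list[Row], column: str) -> list[Row]:
    """Keep only rows where `column` has a value."""
    return [r for r in rows if r.get(column) is not None]


def rename(rows: list[Row], old: str, new: str) -> list[Row]:
    """Rename a column in every row."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


# Hypothetical registry of reusable steps that configurations refer to by name.
STEPS: dict[str, Callable[..., list[Row]]] = {
    "drop_nulls": drop_nulls,
    "rename": rename,
}

# Per the zone description: processing reads from the data lake and feeds these zones.
ALLOWED_SOURCES = {"Datalake"}
ALLOWED_TARGETS = {"Data Lab", "Model Store", "BI Portal"}

# Hypothetical pipeline definition; in practice this would live in a versioned file.
PIPELINE_CONFIG = json.loads("""
{
  "source": "Datalake",
  "target": "BI Portal",
  "steps": [
    {"name": "drop_nulls", "args": {"column": "amount"}},
    {"name": "rename", "args": {"old": "amount", "new": "total"}}
  ]
}
""")


def run_pipeline(config: dict[str, Any], rows: list[Row]) -> tuple[str, list[Row]]:
    """Validate the config, apply its steps in order, and return (target zone, rows)."""
    if config["source"] not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported source zone: {config['source']}")
    if config["target"] not in ALLOWED_TARGETS:
        raise ValueError(f"unsupported target zone: {config['target']}")
    for step in config["steps"]:
        rows = STEPS[step["name"]](rows, **step.get("args", {}))
    return config["target"], rows


if __name__ == "__main__":
    sample = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]
    print(run_pipeline(PIPELINE_CONFIG, sample))
    # ('BI Portal', [{'id': 1, 'total': 10}])
```

With this design, adding a pipeline means writing a new configuration rather than new code. Such configurations could then be versioned and validated by the same CI/CD tooling as any other artefact.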