Our client is a global alliance of healthcare and technology companies built around the idea of digitizing various kinds of health data and using AI to derive insights from it.
The long-term goal of our client is to build a number of data-driven healthcare products, including web and mobile applications, medical devices and customer-oriented healthcare gadgets.
The client had accumulated a tremendous amount of data over a long period of time, and this data was scattered across multiple locations rather than concentrated in one. It was not used for business insights or decision-making. This created a need to build centralized data repositories that would support data exploration and analytics workloads and also serve as data sources for other repositories. The data had to be cataloged and governed, and its quality had to be manageable.
A Data Lake (DL) and a Data Warehouse (DWH) were built as a pair of company-wide reference data repositories. The DL was used as centralized storage for unstructured and semi-structured information, as well as storage for “raw” structured data from individual products. The DWH, in contrast, was used as a repository of well-prepared, trusted datasets for data analysis and self-service analytics.
From a data governance perspective, we conducted a company-wide assessment using the Stanford Data Governance Maturity Model. Based on the results, we implemented a “Data governance layer”: a set of practices and processes that allowed the customer to catalog existing data across the DL and to establish key processes such as metadata management, data auditing, data lineage, and data quality management.