Our client is a leading technology-enabled life science services company that simplifies how patients get on and stay on drug therapies.
The company works with biopharmaceutical manufacturers to help commercialize branded and specialty medications and maximize their benefits. They leverage industry-leading, tech-enabled services to accelerate speed to therapy throughout the medication journey. Their unparalleled patient and provider network and advanced analytics power their ability to improve medication adherence and access.
After an acquisition, the company faced challenges integrating data from various sources. The diversity of that data made analysis difficult and time-consuming. Because the company operates with personal, sensitive data, it needed a centralized repository of all the business information required for internal and external purposes.
As a solution, we proposed designing and implementing a data warehouse to serve as the central repository for all integrated data sources. Since the customer needed to gather data from diverse sources such as flat files and different databases, we had to define an approach for data integration.
We implemented data extraction, validation, and integration in a staging area (database) hosted in the customer's colocated environment. De-identification of sensitive data was implemented to meet the customer's data security requirements.
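The de-identification step can be sketched as deterministic salted hashing, so records stay joinable without exposing raw values. This is a minimal illustration, not the production implementation; the field names and salt handling are assumptions.

```python
import hashlib

# Illustrative salt; in production this would come from a managed secret store.
SALT = b"example-salt"

def deidentify(value: str) -> str:
    """Replace a sensitive field with a deterministic salted hash so
    records remain joinable without exposing the raw value."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

# Hypothetical source record; only the sensitive field is hashed.
record = {"patient_name": "Jane Doe", "therapy": "DrugX"}
staged = {**record, "patient_name": deidentify(record["patient_name"])}
```

Because the hash is deterministic, the same patient always maps to the same token, which is what allows downstream joins and metrics to keep working.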
As the next step, we identified the load priority of the data sources and designed the rules for writing data into the enterprise data warehouse (EDW).
We introduced audit functionality that captures metrics for each data load status change (row counts, load duration, etc.). We also introduced a unique identifier for each load, so every record is tied to the load that produced it. This makes it possible to clean up data after load issues or failures caused by incorrect source data, or to reload with an updated data set.
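The audit and load-identifier mechanism can be sketched roughly as follows. This is an in-memory illustration; the real system would persist these events in an audit table, and the status names shown are assumed.

```python
import uuid
from datetime import datetime, timezone

class LoadAudit:
    """In-memory sketch of the audit log: each load gets a unique id,
    and every status change is recorded with a row count and timestamp."""
    def __init__(self):
        self.events = []  # (load_id, status, row_count, timestamp)

    def start(self) -> str:
        load_id = str(uuid.uuid4())
        self.log(load_id, "STARTED", 0)
        return load_id

    def log(self, load_id: str, status: str, row_count: int) -> None:
        self.events.append((load_id, status, row_count,
                            datetime.now(timezone.utc)))

audit = LoadAudit()
load_id = audit.start()
audit.log(load_id, "LOADED", 1200)  # rows written under this load id
audit.log(load_id, "FAILED", 0)     # e.g. bad source data detected
# Loads marked FAILED can have their rows deleted by load_id and reloaded.
failed_loads = {e[0] for e in audit.events if e[1] == "FAILED"}
```

Keying every warehouse row by its load id is what makes targeted cleanup possible: a failed load's rows can be deleted and the load rerun without touching other data.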
We chose Amazon Redshift as the main storage and loaded data via an S3 bucket using a delete-insert flow. To avoid affecting users, all synchronization runs during the nightly performance window.
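The delete-insert flow boils down to two statements per load: delete any rows left by a previous run of the same load, then COPY the fresh file from S3 into Redshift. The sketch below only builds the SQL; the table, file path, and IAM role names are placeholders, not values from the real project.

```python
def delete_insert_sql(table: str, load_id: str,
                      s3_path: str, iam_role: str) -> list[str]:
    """Build the two statements of a delete-insert load: remove rows
    from a previous run of this load, then COPY the fresh file."""
    return [
        f"DELETE FROM {table} WHERE load_id = '{load_id}';",
        f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV;",
    ]

# Placeholder names: table, file, and role are illustrative only.
stmts = delete_insert_sql(
    "edw.fact_rx_fill", "20240101-001",
    "s3://edw-staging/fills.csv",
    "arn:aws:iam::123456789012:role/redshift-copy",
)
```

Because the delete is scoped to the load id, rerunning a load is idempotent: the same file can be reloaded after a failure without creating duplicates.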
The developed system performs validation at the connection, file, and data-field levels to ensure effective exception handling and data consistency. Within the scope of the project, we enabled data exchange between a state system and three internal systems, each with its own integration point and constraints.
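File-level and field-level checks of this kind can be sketched in a few lines. The expected columns below are hypothetical; each real source has its own schema and rules.

```python
# Hypothetical schema; each real source has its own expected columns.
EXPECTED_COLUMNS = {"member_id", "drug_code", "fill_date"}

def validate_rows(rows: list[dict]) -> list[tuple[int, str]]:
    """File-level check: every row has the expected columns.
    Field-level check: required fields are non-empty."""
    errors = []
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            errors.append((i, f"missing columns: {sorted(missing)}"))
            continue
        if not row["member_id"]:
            errors.append((i, "empty member_id"))
    return errors

rows = [
    {"member_id": "M1", "drug_code": "D1", "fill_date": "2024-01-01"},  # valid
    {"member_id": "", "drug_code": "D2", "fill_date": "2024-01-02"},    # field error
    {"drug_code": "D3"},                                                # file error
]
errors = validate_rows(rows)
```

Collecting errors with row positions instead of failing on the first one lets the load either quarantine bad rows or reject the whole file with a complete report.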
We implemented integrated star-schema storage that contains structured data from the diverse sources and makes analyzing that data straightforward; the star schema also supports ad hoc reporting. Sensitive data stays in the co-located Stage, with a mapping to the corresponding hashed values in the EDW, so all metrics remain available for analysis without exposing sensitive fields.
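The Stage-to-EDW split can be illustrated with a toy re-identification step: EDW fact rows carry only a hash plus the metrics, while the co-located Stage holds the mapping back to the raw value for authorized use. All names and values here are illustrative assumptions.

```python
# Stage (co-located): mapping from hash back to the raw sensitive value.
stage_mapping = {"a1b2c3": "Jane Doe"}
# EDW (cloud): fact rows carry only the hash, plus the metrics.
edw_facts = [{"patient_hash": "a1b2c3", "fills": 3}]

def reidentify(fact_rows: list[dict], mapping: dict) -> list[dict]:
    """Join EDW facts back to raw identities via the Stage mapping
    (for authorized use within the co-located environment only)."""
    return [{**row, "patient_name": mapping.get(row["patient_hash"])}
            for row in fact_rows]

enriched = reidentify(edw_facts, stage_mapping)
```

Analysts can compute every metric against the hashed EDW data alone; the join back to the Stage is needed only when an authorized workflow requires the real identity.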
Outcomes of Project & Success Metrics:
- Structured, up-to-date key data consolidated in the cloud EDW
- A separate cloud database to load and store low-value or inconsistent data for further review and analysis
- Ability to monitor and track data load progress
- Sensitive PHI data de-identified