ConnectiveRx is a leading technology-enabled life science services company dedicated to simplifying how patients get on, and stay on, drug therapies. The company works closely with biopharmaceutical manufacturers to commercialize branded and specialty medications and optimize their benefits.
A distinctive part of their offering is the use of industry-leading, tech-enabled services to accelerate how quickly patients receive therapy throughout the medication journey. Their services range from patient and provider networks to advanced analytics, all aimed at improving medication adherence and access.
When our team first began working with ConnectiveRx, we quickly identified a major challenge: a recent acquisition had left the company with a complex data landscape, with data spread across multiple sources. This fragmentation made data analysis time-consuming and difficult.
Moreover, because the client handles personal and sensitive data as part of their life science services business, they recognized the need for a centralized data repository for all their business information. This repository would serve as a single source of truth for both internal and external purposes, making data easier to manage and secure.
To address the challenge of integrating data from multiple sources, we proposed designing and implementing a Data Warehouse to serve as the central repository for all integrated data sources. Because ConnectiveRx needed to gather data from diverse sources, such as flat files and different databases, we defined a data integration approach based on fuzzy matching.
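Fuzzy matching reconciles records that refer to the same entity but are formatted differently across sources. As a minimal sketch of the idea (not ConnectiveRx's actual implementation), the following uses Python's standard-library `difflib` to match an incoming record against a master list by name similarity; the field names and the 0.85 threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two case/whitespace-normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def fuzzy_match(record: dict, candidates: list, threshold: float = 0.85):
    """Return the candidate with the highest name similarity above the threshold,
    or None if no candidate clears it."""
    best, best_score = None, threshold
    for cand in candidates:
        score = similarity(record["name"], cand["name"])
        if score >= best_score:
            best, best_score = cand, score
    return best
```

In practice a production pipeline would compare several fields (name, address, identifiers) and weight them, but the thresholded-similarity core is the same.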
To achieve this, we implemented data extraction, validation, and integration in a staging area hosted in the client's colocated environment. We also implemented a sensitive-data de-identification framework to meet the client's data security requirements.
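A de-identification framework typically replaces direct identifiers with deterministic surrogates, so records can still be joined without exposing the underlying values. Below is a minimal sketch of that pattern using salted SHA-256 hashing; the salt, field names, and record shape are illustrative assumptions, not the client's actual configuration:

```python
import hashlib

# Illustrative secret; in practice the salt would live in the secured
# colocated environment, never in source code.
SALT = b"example-secret-salt"

def deidentify(value: str) -> str:
    """Replace a sensitive value with a deterministic salted SHA-256 digest."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

def deidentify_record(record: dict, phi_fields: set) -> dict:
    """Hash only the fields flagged as PHI; pass all other fields through."""
    return {k: deidentify(v) if k in phi_fields else v
            for k, v in record.items()}
```

Determinism is the point: the same identifier always hashes to the same surrogate, so de-identified records from different sources still join correctly downstream.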
Next, we prioritized the data source loads and designed the rules for writing data into the enterprise data warehouse (EDW). We also introduced audit functionality that captures data load status changes, including record counts and load duration, along with a unique identifier for each load so that every record is tied to the load that produced it. This makes data cleanup straightforward when a load fails or must be rerun, whether because of incorrect source data or a reload with an updated data set.
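The load-identifier idea can be sketched as follows: every load gets a unique ID, every status change is appended to an audit log, and every loaded row carries its load ID so a failed load's rows can be purged precisely. This is a simplified in-memory illustration (the function and field names are assumptions), not the warehouse's actual audit schema:

```python
import uuid
from datetime import datetime, timezone

audit_log = []  # one entry per load status change

def start_load(source: str) -> str:
    """Begin a load: mint a unique identifier and record the status change."""
    load_id = str(uuid.uuid4())
    audit_log.append({"load_id": load_id, "source": source,
                      "status": "STARTED", "rows": 0,
                      "ts": datetime.now(timezone.utc)})
    return load_id

def finish_load(load_id: str, rows: int, status: str = "COMPLETED") -> None:
    """Close out a load with its final status and row count."""
    audit_log.append({"load_id": load_id, "status": status, "rows": rows,
                      "ts": datetime.now(timezone.utc)})

def purge_load(table: list, load_id: str) -> list:
    """Drop every row written by a given (e.g. failed) load, leaving other loads intact."""
    return [row for row in table if row["load_id"] != load_id]
```

Because every row is stamped with its load ID, cleaning up after a bad load is a single targeted delete rather than a full-table reload.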
To store the data, we chose Amazon Redshift as the main storage solution and loaded data via an S3 bucket using a delete-insert flow. To minimize the impact on users, we performed all synchronization during the nightly performance window.
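A delete-insert flow first removes any previously loaded versions of the incoming rows, then appends the new batch, which makes reloads idempotent. In Redshift this is commonly done with a `DELETE ... USING` against the staged batch followed by an `INSERT INTO ... SELECT`; the pure-Python sketch below shows the same logic on an in-memory table (the key and field names are assumptions):

```python
def delete_insert(target: list, incoming: list, key: str = "id") -> list:
    """Delete target rows whose key appears in the incoming batch,
    then append the incoming batch (Redshift-style delete-insert)."""
    incoming_keys = {row[key] for row in incoming}
    kept = [row for row in target if row[key] not in incoming_keys]
    return kept + incoming
```

Running the same batch twice yields the same result, which is exactly the property that makes nightly reloads safe.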
Overall, our solution allowed the client to consolidate data from multiple sources into a centralized repository, facilitating easy access to accurate and secure data. Our approach to data integration, along with the use of Amazon Redshift as the main storage solution, provided the client with a scalable and efficient solution that met their unique needs and requirements.
We implemented an integrated star-schema storage solution containing structured data from diverse sources, making data analysis straightforward. The EDW star schema supports ad hoc reporting while keeping sensitive data in the colocated staging area, with a mapping to the corresponding hashed values in the EDW. This allows analysis with all metrics available while maintaining the security of sensitive data.
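The stage/EDW separation can be sketched like this: each staged row is split into a de-identified dimension row (which goes to the EDW star schema) and a PHI mapping keyed by the same hashed surrogate (which stays in the colocated stage). The field names and unsalted hashing here are illustrative assumptions:

```python
import hashlib

def surrogate_key(value: str) -> str:
    """Deterministic hashed key linking EDW rows back to staged PHI."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def split_patient_rows(stage_rows: list):
    """Return (EDW dimension rows without PHI, stage-side PHI mapping).
    Only the hashed surrogate key crosses into the EDW."""
    dim, phi_map = [], {}
    for row in stage_rows:
        key = surrogate_key(row["patient_id"])
        phi_map[key] = {"patient_id": row["patient_id"], "name": row["name"]}
        dim.append({"patient_key": key, "state": row["state"], "plan": row["plan"]})
    return dim, phi_map
```

Analysts query the EDW dimension freely; re-identification requires the mapping, which never leaves the secured stage.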
The project outcomes and success metrics include:
Structured and up-to-date key data co-located in the cloud-based EDW
Separate cloud database to load and store low-value or inconsistent data for further review and analysis
Ability to review data load progress and track the process
Sensitive PHI data de-identified
Our solution enabled the client to consolidate data from diverse sources, providing an easy-to-use platform for data analysis while maintaining data security. The EDW star schema design allowed for ad hoc reporting, giving the client flexibility in their data analysis. The ability to track data load progress and separate low-value data for review and analysis provided valuable insights and streamlined the client's data management processes. Overall, our solution delivered significant value to the client and improved their ability to manage data effectively.