Mergers, acquisitions and consolidations regularly introduce new challenges for healthcare managers and administrators - a pattern we're likely to see increase in our present economic climate. For example, merging and consolidating hospitals often use different information systems and data vocabularies for key markers such as providers, allergies, and drug formularies. As a result, many medical centers must shoehorn differently organized data into a common warehouse, as well as combine dissimilar information systems into coherent, manageable wholes. We, at Penn Medicine, were one of them.
In the past year, following a series of network expansions and additions, we faced the need to assimilate separate information systems. Our first step was to determine what we called our clinical data aggregation (CDA) score (see the CDA scoring tool below). A CDA score helps establish baseline knowledge across the health system as a first step toward harmonizing disparate data pools. CDA score assessments can also yield information to help correct source-system errors, maintain vocabulary server mappings, and monitor workflow process changes to ensure data quality and completeness.
The Penn Medicine experience
At Penn Medicine, we embarked on the creation of an aggregated data warehouse - we called it Penn Data Store - in 2007. Our goal? A single data repository with all patient, administrative, financial, and supply chain data mapped to a standard data model and vocabulary. We created a number of objectives in support of the goal in areas such as research, finance and patient care.
At the outset, we focused on clinical data sources because of their wide applicability and their key role in improving patient outcomes. For example, knowing the average length of stay for a given diagnosis is necessary for reducing it. However, determining why lengths of stay are high requires data on such measures as drug administration timing, infections and patient falls. Such data is generally available only in clinical systems or paper charts.
Based on our experience, we came up with five questions to consider when developing or refining your own plans for a unified clinical data warehouse, questions that can be clarified through an assessment of your CDA score.
1. Which patient data are housed on common electronic systems by function across entities?
Most major medical centers have many parts: hospitals, specialized centers, physician practices, home health agencies, etc. A given patient may have critical data stored in four or five places on four or five systems. Getting convenient access to this data is crucial. The obvious solution is common EMRs across all entities, yet even that is not enough. Data models, field values, and patient identification techniques also need to be the same even if a common EMR is in use. Your CDA score can identify gaps in these areas.
2. Which administrative and clinical vocabularies are consistent across systems and entities?
While data can be mapped from source-system values to data-warehouse values, this process slows the delivery of near real-time data to the warehouse and requires constant vigilance over the mapping process to ensure consistency. Switching context between the warehouse and real-time systems also creates room for error and misinterpretation. This section of your CDA score can help direct efforts to rectify divergence in your administrative and clinical vocabularies.
Clinical Data Aggregation (CDA) scoring tool | Average Score | 0% |
Dimension | Significance | Percentage Calculated Score |
Percentage of Patients on Electronic Common Clinical Systems by Function across entities | 0% | |
Patient Registration | 1 | 0% |
Patient Scheduling | 2 | 0% |
Inpatient CPOE | 1 | 0% |
Inpatient EMR | 1 | 0% |
Outpatient EMR | 1 | 0% |
LAB | 2 | 0% |
Radiology | 2 | 0% |
Pharmacy | 1 | 0% |
Emergency Room | 2 | 0% |
Operating Room | 3 | 0% |
Anesthesia | 3 | 0% |
Infection Control | 3 | 0% |
Patient tracking | 3 | 0% |
Percentage of consistent vocabularies in administrative and clinical systems | 0% | |
Race | 1 | 0% |
Religion | 3 | 0% |
Gender | 1 | 0% |
Marital Status | 2 | 0% |
Patient Type | 1 | 0% |
Insurance Plans | 3 | 0% |
Subscriber relation to patient | 3 | 0% |
Providers | 1 | 0% |
Locations | 1 | 0% |
Specialties | 1 | 0% |
Orderable procedures | 1 | 0% |
Orderable drugs | 1 | 0% |
Diagnosis codes | 1 | 0% |
Procedure codes | 1 | 0% |
Percentage of source system values that are mapped to industry standard values using a Vocabulary Server to correct inconsistencies across entities and systems in current data AND historical data? | 1 | 0% |
Percentage of source systems that are included in a data governance structure that assigns and organizes Data Stewards for each source system across all entities? | 1 | 0% |
Percentage of governance duties that are owned by the data stewards? | 0% | |
Source system error correction | 2 | 0% |
Maintenance of Vocabulary Server mappings | 1 | 0% |
Workflow process changes to ensure data quality and completeness | 2 | 0% |
Percentage of the target data model for the clinical repository that adheres to national standards | 2 | 0% |
Percentage of patients that have a single Universal Identifier (UI) number assigned by an Electronic Master Patient Index (EMPI) to ensure each human being has only one unique identifying number and that all previously assigned numbers are mapped to the UI? | 1 | 0% |
Percentage of patient, provider, and family member data that can be quickly de-identified | 2 | 0% |
Percentage of patient data in clinical trial databases that uses the same EMPI number and technology to identify patients? | 2 | 0% |
What percentage of the source data can be extracted from clinical systems into a Clinical Data Warehouse on a monthly or more frequent basis? | 0% | |
Monthly | 1 | 0% |
Weekly | 2 | 0% |
Daily | 3 | 0% |
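The scoring tool lists a significance rank and a percentage for each dimension, but it does not spell out how the two combine into the average score. One possible reading, offered only as a sketch: treat significance 1 as the most important rank and weight each percentage by the inverse of its rank. The weighting rule, the function name `weighted_score`, and the sample percentages are all assumptions made for illustration, not the method used at Penn Medicine.

```python
def weighted_score(rows):
    """Combine (dimension, significance, percent) rows into one score.

    Assumption: significance 1 matters most, so weight = 1 / significance.
    """
    total_weight = sum(1 / sig for _, sig, _ in rows)
    if total_weight == 0:
        return 0.0  # no rows supplied
    return sum(pct / sig for _, sig, pct in rows) / total_weight


# Hypothetical percentages for three rows of the tool (not real data).
sample = [
    ("Patient Registration", 1, 100.0),
    ("Patient Scheduling", 2, 50.0),
    ("Operating Room", 3, 0.0),
]
score = weighted_score(sample)
print(f"{score:.1f}%")
```

Under this assumed rule, a gap in a significance-1 dimension lowers the overall score more than the same gap in a significance-3 dimension. A different weighting would shift the number but not the ranking logic.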
3. What percentage of source system values is mapped to industry standard values?
Many medical centers collaborate with other research institutions. To build effective partnerships, data in the common warehouse should be mapped to national standards - not simply to common internal standards. Since consensus on which ones to use is often lacking, it's prudent to map to multiple standards.
Standards for vocabularies such as race, religion, gender, and marital status should be the initial focus, since they are easier to address than vocabularies for lab codes, drug codes, and allergies. To develop mappings for these more difficult areas, third-party vocabulary server software and both initial and ongoing mapping services should be considered. Subscribing to an ongoing mapping service automates the handling of new terms and concepts as they are developed in the normal course of healthcare advancement.
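As a rough illustration of the "percentage of source system values mapped" measure, the sketch below keeps a small crosswalk from local values to more than one target standard and reports coverage for each. The source system names, local values, and standard names (`standard_a`, `standard_b`) are hypothetical placeholders, not real code sets or the interface of any vocabulary server product.

```python
# Hypothetical crosswalk: (source_system, local_value) -> {standard: code}.
CROSSWALK = {
    ("hospital_a", "M"): {"standard_a": "male", "standard_b": "1"},
    ("hospital_a", "F"): {"standard_a": "female", "standard_b": "2"},
    ("hospital_b", "MALE"): {"standard_a": "male"},
}


def mapping_coverage(observed, crosswalk, standard):
    """Return the percentage of distinct observed values mapped to `standard`."""
    distinct = set(observed)
    if not distinct:
        return 0.0
    mapped = sum(1 for key in distinct if standard in crosswalk.get(key, {}))
    return 100.0 * mapped / len(distinct)


# Values seen in source extracts (made up for this example).
observed_values = [
    ("hospital_a", "M"),
    ("hospital_a", "F"),
    ("hospital_b", "MALE"),
    ("hospital_b", "FEMALE"),  # not yet mapped to anything
]

coverage_a = mapping_coverage(observed_values, CROSSWALK, "standard_a")
coverage_b = mapping_coverage(observed_values, CROSSWALK, "standard_b")
print(coverage_a, coverage_b)
```

Reporting coverage per standard makes it visible when one target vocabulary lags behind another, which matters if partners have not agreed on a single standard.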
4. What percentage of data in clinical applications and clinical trial databases uses the same tracking numbers and technology to identify patients?
Achieving consistent patient management and tracking is one of the largest challenges to attaining full clinical data aggregation. Your CDA score can help by identifying key steps that may need to be taken. The first is to put in place central patient identification software and services to ensure that each human (and in many research settings, each animal) in the clinical or research system is assigned a Universal Identification (UI) number. The second step is more complex, and involves the review and re-identification of potentially millions of patient records to ensure that only one UI exists for each patient and that all existing numbers are mapped to this UI. There are vendors that can assist with this phase of duplicate identification. Once the duplicate sets have been determined, the cleanup phase can begin. Unfortunately, not all applications enable identifier-merge transactions. In addition, since “unmerging” an incorrect merge is more costly and labor-intensive than manually verifying merges, staff members may prefer the manual option.
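The second step, mapping every existing number to a single UI, can be sketched as a grouping problem. The example below is a minimal illustration that assumes the duplicate pairs have already been verified by a separate matching or manual review step; it is not how any particular EMPI product works. The function name, the `UI-000001` number format, and the sample identifiers are invented for the example.

```python
def assign_universal_ids(local_ids, duplicate_pairs):
    """Map every local identifier to one universal identifier (UI).

    local_ids: iterable of (system, local_number) tuples.
    duplicate_pairs: pairs of identifiers from local_ids already verified
    as the same person (assumed to come from an earlier review step).
    """
    parent = {lid: lid for lid in local_ids}

    def find(lid):
        # Follow parent links to the representative of the group.
        while parent[lid] != lid:
            parent[lid] = parent[parent[lid]]  # shorten the path as we go
            lid = parent[lid]
        return lid

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the groups in sorted order so the result is repeatable.
    ui_of_root = {}
    crosswalk = {}
    for lid in sorted(parent):
        root = find(lid)
        if root not in ui_of_root:
            ui_of_root[root] = f"UI-{len(ui_of_root) + 1:06d}"
        crosswalk[lid] = ui_of_root[root]
    return crosswalk


# Made-up identifiers from three systems; the first three are one person.
records = [
    ("inpatient_emr", "1001"),
    ("outpatient_emr", "A-77"),
    ("lab", "L-5"),
    ("inpatient_emr", "1002"),
]
verified_duplicates = [
    (("inpatient_emr", "1001"), ("outpatient_emr", "A-77")),
    (("outpatient_emr", "A-77"), ("lab", "L-5")),
]
crosswalk = assign_universal_ids(records, verified_duplicates)
```

Because only verified pairs are fed in, the sketch never merges on its own judgment, which reflects the point that an incorrect merge is expensive to undo.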
5. What percentage of the source data can be frequently extracted from clinical systems into a clinical data warehouse for integration and analysis?
After decades of requests from users, most vendors of the core applications (EMR, pharmacy, radiology, etc.) provide data extraction tools for the purpose of integration and analysis. However, many departmental or niche application vendors are new to the scene or too small to invest in these add-on features. If you have successfully established a connection to the needed data, the next challenge will be establishing a schedule for extracts.
This can be complicated by technical and political considerations. The technical challenges center on the impact that large extract queries may have on production systems, the availability of full backups that can be restored to a staging area, and the fact that the application may be remotely hosted. The political challenges will focus on the “owners” of the application and their willingness to share “their” data with others outside their department. Often an unwilling department head can be swayed by an offer of access to the data warehouse, which will now include their data integrated with that of many others.
Building a powerful clinical data warehouse is arguably a much more challenging task than building warehouses focused on financial and administrative data. In fact, we found that there are few, if any, “off the shelf” vendor products available that provide the advanced data management and analytic capabilities clinical data requires. Faced with the reality that academic medical centers will need to develop one from scratch, CIOs should invest the time to ascertain their likelihood of success up front. A detailed assessment of the suitability, availability and applicability of myriad clinical data sources can be completed using this clinical data aggregation scoring tool. The CDA score can also serve as a high-level gap analysis to determine organization-specific needs in areas such as staffing, toolsets, infrastructure, and organizational and process change.
Having a predictor of the likelihood of success in advance may eliminate one of the items that keep today's CIO up at night.