OFRA

State agencies in Oregon, and often discrete programs within those agencies, collect and store data about the people they serve using systems and platforms that do not interact with one another. In 2005, in response to this critical need to integrate agency program data, Oregon established the Oregon Department of Human Services (ODHS)/Oregon Health Authority (OHA) Integrated Client Services Data Warehouse (ICS). For more information, please read our Memorandum of Understanding with ODHS and OHA.

ICS is a small unit within the Office of Forecasting, Research and Analysis (OFRA) and is a shared service between ODHS and OHA. Our primary task is to connect individuals across multiple state programs and agencies. This individual-level link allows us to a) support a group of forecasters/researchers who forecast the ODHS and OHA caseloads used for program budgeting; and b) support the statewide need for bespoke integrated data sets for ongoing research projects, program improvement projects, and data requests requiring integrated data.
Since ICS maintains an individual-level link across multiple programs and agencies, we are often leveraged to create bespoke datasets used for research, program improvement, and data requests.

During our monthly ETL process, ICS primarily receives the data elements necessary for linking individuals across systems (e.g., name, DOB, gender, race, ethnicity), while the majority of source-level attribute data about individuals (e.g., services received, diagnoses, test scores, offenses) remain with the data owners at the programs and agencies. A request to leverage ICS for the creation of an integrated data set generally follows the same set of steps.

Preparation
  1. Researcher or analyst (requestor) has a need for an integrated data set.
  2. Requestor reaches out to data sources (programs and/or agencies) to discover where the data reside and to learn more from the data owners (experts) about the needed variables.
  3. Requestor reaches out to ICS to work out details of how the requested data elements can be connected into an integrated data set.
    • This sometimes involves a meeting between requestors, ICS, and data sources to work out logistics of creating a dataset.
    • Data sources need to communicate limitations with requested data elements.
Data Use Requests
  1. Requestor officially requests data elements from sources; this almost always requires the requestor to complete a data use agreement with each individual data source.
    • ICS requests copies of all approved data use agreements and project IRB where applicable.
  2. Requestor completes ICS Data Use Agreement, provided by ICS.
    • Complete pages 2 and 4 with detailed information about the project.
    • Sign as requestor, and have every individual who will have access to the data set sign to acknowledge that they have read the terms and conditions of the agreement.
    • Return DUA to ICS.
    • ICS to obtain approval from all data sources contributing data to data set.
    • ICS to fully execute agreement and provide final copy of agreement to requestor.
Data Matching
  1. A match generally begins with a researcher wanting to connect additional information to a universe of individuals.
  2. ICS receives source identifiers for the study universe.
    • If individuals in the study universe are not part of ICS's client index, full names and demographics of individuals are required. Please see ICS Record Linkage Standards for more information about ad hoc linkages.
  3. ICS creates a Study Identifier for everyone in the study universe.
  4. ICS uses its client index to connect source identifiers of data sources involved in project to everyone in the study universe.
  5. Each source requested to contribute data to the integrated data set receives a study crosswalk from ICS consisting of the Study Identifier and their Source Identifier.
  6. Each source uses their Source Identifier in the ICS provided crosswalk to attach the requested data elements.
  7. Each source removes the identifiable Source Identifier and sends the de-identified source dataset to the requestor with the Study Identifier attached.
  8. Researcher receives source datasets and uses Study Identifier to connect datasets.
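
The data matching steps above can be sketched in a few lines of Python. This is an illustration only, not ICS's implementation: the client index layout, the program names, the field names, and the use of random UUIDs as Study Identifiers are all assumptions made for the example.

```python
import uuid

# Hypothetical client index: one entry per person, mapping source name -> source identifier.
CLIENT_INDEX = [
    {"program_a": "A-001", "program_b": "B-917"},
    {"program_a": "A-002", "program_b": "B-455"},
    {"program_a": "A-003"},  # person not known to program_b
]

def build_crosswalks(universe_ids, universe_source, contributing_sources, client_index):
    """ICS side (steps 2-5): return {source: {source_id: study_id}} for each source."""
    by_universe_id = {p[universe_source]: p for p in client_index if universe_source in p}
    crosswalks = {source: {} for source in contributing_sources}
    for source_id in universe_ids:
        person = by_universe_id.get(source_id)
        if person is None:
            continue  # not in the index; would need an ad hoc linkage on names and demographics
        study_id = uuid.uuid4().hex  # random value that carries no source information
        for source in contributing_sources:
            if source in person:
                crosswalks[source][person[source]] = study_id
    return crosswalks

def deidentify(source_rows, crosswalk, id_field):
    """Data source side (steps 6-7): attach the study ID and drop the source ID."""
    out = []
    for row in source_rows:
        study_id = crosswalk.get(row[id_field])
        if study_id is None:
            continue  # row is outside the study universe
        clean = {k: v for k, v in row.items() if k != id_field}
        clean["study_id"] = study_id
        out.append(clean)
    return out

def merge_on_study_id(*datasets):
    """Researcher side (step 8): combine de-identified datasets on study_id."""
    merged = {}
    for dataset in datasets:
        for row in dataset:
            merged.setdefault(row["study_id"], {}).update(row)
    return list(merged.values())
```

In this sketch the researcher only ever sees the Study Identifier, and each source only ever sees its own Source Identifier alongside the Study Identifier.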

ICS brings in data monthly from nearly all ODHS and OHA programs, as well as a few Oregon agencies (see ICS Key Partners). Specifically, we extract, transform, and load (ETL) the data elements needed to identify and link an individual across systems (e.g., names, demographics) as well as limited service-level data from ODHS and OHA programs required by our forecasters/researchers. These data are processed into caseloads, and changes in those caseloads over time are forecast and used in the calculations that create program budgets. Please see the OFRA webpage for more information about forecasting.

During our monthly client matching, ICS uses a rigorous combination of probabilistic, deterministic, and manual matching, allowing us to create and maintain the best possible individual-level link across programs and agencies. Every individual in ICS receives a unique ICS Master Identification number that allows us to identify an individual across data sources. ICS essentially maintains a master crosswalk for all its key partners and gives the state a better view of individuals utilizing multiple state programs and agencies.
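
As an illustration only, the general idea of combining deterministic, score-based, and manual matching can be sketched as follows. This is not ICS's production method: the field names, weights, thresholds, and the use of `difflib.SequenceMatcher` as a string-similarity measure are all assumptions chosen for the example, and a weighted similarity score is a simplified stand-in for a true probabilistic model.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def classify_pair(rec_a, rec_b, match_at=0.90, review_at=0.65):
    """Classify a pair of person records as 'match', 'manual_review', or 'non_match'."""
    # Deterministic pass: an identical, non-empty identifier settles the question.
    if rec_a.get("ssn") and rec_a.get("ssn") == rec_b.get("ssn"):
        return "match"
    # Score-based pass: weighted agreement on names and date of birth (assumed weights).
    score = (
        0.3 * similarity(rec_a["first"], rec_b["first"])
        + 0.4 * similarity(rec_a["last"], rec_b["last"])
        + 0.3 * (1.0 if rec_a["dob"] == rec_b["dob"] else 0.0)
    )
    if score >= match_at:
        return "match"
    # Ambiguous pairs are routed to a person for manual review.
    if score >= review_at:
        return "manual_review"
    return "non_match"
```

The middle band between the two thresholds is where manual matching fits in: pairs that are neither clearly the same person nor clearly different are set aside for a human decision.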

ICS provides ad hoc record linkage (aka “matching”, “fuzzy matching”, “probabilistic matching”) between person records when unique identifiers are not available. Names and dates of birth are not unique identifiers but can be used to link the records of the same person between data systems or data files. Record linkage of administrative data is rarely, if ever, 100% accurate, but some data practices can increase both the accuracy and completeness of record linkage. Where possible, ICS will report record linkage metrics (e.g., precision, recall, F-score) that can assist researchers in assessing whether a record linkage project is of sufficiently high accuracy to support research needs. Researchers should recognize that high rates of false matches and/or missed matches (incomplete linkage) can result in incorrect or biased research conclusions. Researchers should also be aware of their responsibility to correctly interpret the possible effects of inadequate or incorrect record linkage.
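
For readers less familiar with the linkage metrics mentioned above, the standard definitions can be written out as a short Python sketch. The function and argument names are hypothetical, and the counts would come from an evaluation sample in which the true match status of each record pair is known.

```python
def linkage_metrics(true_matches, false_matches, missed_matches):
    """Return precision, recall, and F-score for a record linkage result.

    true_matches:   linked pairs that really are the same person
    false_matches:  linked pairs that are different people
    missed_matches: same-person pairs that the linkage failed to link
    """
    linked = true_matches + false_matches
    actual = true_matches + missed_matches
    precision = true_matches / linked if linked else 0.0
    recall = true_matches / actual if actual else 0.0
    denominator = precision + recall
    # F-score here is the balanced F1: the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}
```

Low precision signals many false matches, while low recall signals incomplete linkage; either can bias research conclusions.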

Minimal fields for record linkage:
    • full first name
    • full last name
    • full middle name (or middle initial, if the full middle name is not available)
    • date of birth
    • gender/sex
    • Additional fields may increase the match rate, including social security number, county of residence, zip code, and race/ethnicity. Consult with ICS to determine whether any program-specific fields may help increase match rates or whether specific fields are available. Record linkage projects will ideally include fields in addition to the minimal fields in order to improve linkage accuracy, especially when a very large number of records require linkage.
  • Names: first, last, and middle names separated into columns.
    • Last names with multiple names should remain in the same column (e.g., hyphenated names, compound surnames, multiple surnames).
    • Non-alphanumeric characters (e.g., accent marks, symbols) should be removed (“cleaned”) from the data whenever possible.
  • Dates: dates can be in any standard format, including separation of date components into different fields (i.e., day, month, and year of birth in separate fields). The date format used within a field should be consistent for the entire field (e.g., avoid mixing date formats such as “12/25/2001” and “2003-12-25”).
  • Unique persons should be de-duplicated in program records: each individual person should occupy one row within a data set, not multiple rows. If programs cannot de-duplicate records, then be sure to let ICS know that records have not been de-duplicated.
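
A minimal Python sketch of the data preparation described above (cleaning names, making date formats consistent, and removing duplicate rows) might look like the following. The field names and the two accepted date formats are assumptions for the example, as is the choice to keep hyphens and spaces so that compound surnames stay in one column; the de-duplication shown only catches rows that are identical after cleaning.

```python
import re
import unicodedata
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats

def clean_name(name):
    """Strip accent marks and symbols; keep letters, digits, spaces, and hyphens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Za-z0-9 \-]", "", no_marks).strip()

def standardize_date(text):
    """Convert a date in any of the assumed formats to ISO format (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def prepare(records):
    """Clean each record and keep one row per distinct cleaned person."""
    seen = set()
    out = []
    for r in records:
        row = {
            "first": clean_name(r["first"]),
            "middle": clean_name(r.get("middle", "")),
            "last": clean_name(r["last"]),
            "dob": standardize_date(r["dob"]),
            "sex": r.get("sex", "").strip(),
        }
        key = tuple(value.casefold() for value in row.values())
        if key in seen:
            continue  # exact duplicate after cleaning
        seen.add(key)
        out.append(row)
    return out
```

Note that letters with no accent-free decomposition (for example "ø") are dropped by this simple approach, so a real cleaning step may need an explicit substitution table.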
Some issues to consider when determining whether to match two data sources, or an external data source to ICS:
  • The proportion of records with missing identifiers (e.g., % of records with missing last names).
  • How data sources have been validated (e.g., are any variables estimated or leveraged from an unreliable source).
  • The variables available for linkage (e.g., data sources without a middle name or SSN, for instance, can lead to high rates of false matches).
  • Previous match metrics or evaluation of match outcomes.
  • How data owners and ICS want to handle false matches or low rates of overall matches.
  • How issues with linked data (e.g., bias, incomplete matches, low match rate) will be communicated to data end users.
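
The first consideration in the list above, the proportion of records with missing identifiers, is straightforward to check before a match is attempted. The sketch below is one possible way to do so in Python; the function name and field names are hypothetical, and a value is treated as missing if it is absent, `None`, or blank.

```python
def missing_rates(records, fields):
    """Return the percentage of records with a missing value for each field."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if not str(r.get(field) or "").strip())
        rates[field] = 100.0 * missing / total if total else 0.0
    return rates
```

For example, `missing_rates(records, ["first", "middle", "last", "dob"])` gives a quick profile that can be shared with ICS when discussing whether a linkage is likely to be reliable.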


The Integrated Client Services unit (ICS), within the Office of Forecasting, Research, and Analysis, was established in 2005 to support data integration across OHA and ODHS programs (“the Agencies”). ICS has since grown to support integrated data efforts beyond OHA and ODHS to include external data (e.g., K-12 education (ODE) and the Department of Corrections (DOC)). The integrated data that ICS maintains is leveraged widely for analysis and research, both within state agencies and at other institutions such as universities.

ICS widely shares state integrated data, at both the identified and deidentified level, which requires data governance policies and processes that incorporate legal agreements for the protection of personally identifiable or sensitive data. The primary ICS governance structure was established through a memorandum of understanding (MOU) between OHA and ODHS that sets forth the respective commitments of the Agencies concerning network and information systems access and information sharing between the Agencies. This MOU allows ICS to receive the program data necessary both to support statistical forecasting efforts and to maintain an agency-wide client index (“CI”: a unique person-level identifier that can be applied to each client record across Agency programs, regardless of the type of unique identifier that programs use within their own data systems). Additional agreements are also maintained with entities that share data with ICS but do not fall under the OHA and ODHS MOU (e.g., Center for Health Statistics, Women, Infants and Children program, Oregon Department of Education, Oregon Employment Department).

Concerning the information received and stored in ICS, a governance committee was established with members representing each of the data sources contributing data to ICS. Each member contributes to determining overall ICS data use and use of their own program data (stored in ICS).  

Additionally, when analysts or researchers request to leverage the ICS client index to create an integrated data product (e.g., a deidentified research data set), ICS will convene a data use committee composed of representatives of the programs from which data are requested. Before ICS leverages its CI to create an integrated data product, an official request must be made by completing the ICS Data Use Agreement (DUA) request form, which must be approved by all members of the data use committee.

A request for a state integrated data set generally requires the requestor (analyst, university researcher, etc.) to complete data use agreements with each individual data source. ICS requests copies of all approved source data use agreements and project IRBs where applicable.


 Link to Governance Diagram
