5. Qualifications from MSD data

cheryl.remington
30 October 2023

Output Table

SQL: [IDI_Community].cm_read_MSD_QUALS.msd_quals_YYYYMM
SAS: libname cm_msd_q ODBC dsn=idi_community_srvprd schema=cm_read_MSD_QUALS;
How to access a code module in the Data Lab : Read here

Context

This script uses the qualification data collected as part of MSD benefits to derive a non-MoE qualifications spells for individuals. The business key for the output spells is one record per individual(snz_uid) per qualification attained date per snz_unique_nbr.

The data is collected based on the highest qualifications self-reported by MSD’s Work & Income clients. Whenever a client reports a higher qualification than previously reported to MSD, the older record gets retired (with an end date as the reporting date) and a new record is started from that point onwards,thus providing a continuous time-series. However, there are a small number of records that have overlapping time periods with different qualifications, which can mostly be attributed to different SWN numbers even though the snz_uid is the same. An SWN number is an MSD identifier for individuals - when this MSD data gets loaded into the IDI, StatsNZ creates snz_msd_uid identifier from encrypted SWNs. StatsNZ then applies various matching algorithms to ascertain individuals by linking this with individual identifiers from other agencies and creates “snz_uid”. This linking and matching process may lead to multiple SWNs mapping to the same snz_uid (that is, same individuals have different SWNs from the perspective of the IDI).

Key Business Rules

  1. The spell structure assumes that a qualification is gained for life, so the start date of the qualification is the date it was self-reported (qual_recorded_date) and the qualification expiry date is assumed to be 31-Dec-9999. Note that the start date is not the qualification attained date - it is merely the date on which the qualification is reported to MSD. Hence this is an overestimate of the actual date on which the qualification was attained - and it is not clear if it can be reliably used as a qualification attained date at all. In the absence of any other data from Education sources, this might be an indicative but crude estimate, so the data can be considered left-censored if the qual_recorded_date is used as the qualification attainment date.
  2. The day on which a new, higher qualification date is self-reported to MSD by the individual is also retained in the output as “qual_highest_until_date”. This becomes the date as from which the current qual was the highest qualifcation (as known to MSD).
  3. The education code is a source-system specific code that represents the kind of qualification earned by the individual. A detailed description column also exists to interpret this code. Using these code values, MSD has identified a mapping to NZQF levels.
    • “NO FORMAL SCHOOL QUALS OR <3 YRS” (Code: A), “LESS THAN 3 SC PASSES OR EQUIV.” (Code: B), “(NCEA) : 1-79 CREDITS” (Code: I) map to NZQF “0”.
    • “3 OR MORE SC PASSES OR EQUIV.” (Code: C), “(NCEA) LEVEL 1: > = 80 CREDITS” (Code: J) map to NZQF “1”.
    • “SIXTH FORM CERT UE OR EQUIV.” (Code: D), “OTHER SCHOOL QUALS” (Code: F), “(NCEA) LEVEL 2: > = 80 CREDITS” (Code: K) map to NZQF “2”.
    • “SCHOLARSHIP BURSARY HS” (Code: E), “POST SECONDARY QUALS” (Code: L), “(NCEA) LEVEL 3: > = 80 CREDITS” (Code: G) map to NZQF “3”. These include University Scholarship, University Bursary, or Higher School Certificate.
    • “DEGREE OR PROFESSIONAL QUALS” (Code: H), “(NCEA) LEVEL 4: > = 72 CREDITS” (Code: M) map to NZQF “4”. Note this is a conservative estimate; Degrees usually map to NZQF level 7+, but professional qualifications range from 4 upwards. Because of this ambiguity, the NZQF is assigned as 4.

Note regarding Education Code-to NZQF Mapping

An example of the rationale behind MSD’s Education Code-to-NZQF mapping can be observed in the case of NCEA-related Education codes. At least 80 credits are required to achieve NCEA Level 1. Hence 1 - 79 credits are treated as ‘0’ on NZQF scale, and NCEA Level 1 >=80 credits map to NZQF ‘1’. Note that it is unclear if this mapping takes into account the need to pass both literacy and numeracy requirements under NCEA - for example, if a client has 85 credits but has not met NCEA numeracy requirement, then it is not clear if MSD frontiline staff maps this individual to Code ‘B’ or Code ‘C’- the former will correctly map to NZQF ‘0’ but the latter will incorrectly map to NZQF ‘1’.

The actual NCEA level 2 requirement is 60+ credits achieved at level 2 and 20+ further credits achieved at any level plus meeting the level 1 literacy and numeracy requirements being met. It is not clear if the MSD frontline takes these criteria into consideration while assigning Education Code ‘J’. The code description states 80+ credits, which might not exactly be the same as an NCEA Level 2 achievement. Hence there might be errors in the mapping here as well.

The rule for passing School Certificate (SC), which was the pre-NCEA standard for secondary qualification, changed over time. It started out as averaging 50%+, getting a ‘C’ grade or better in all subjects, and passing English and Mathematics. Over time, it changed to the 3+ Cs rule. Lastly, it was weakened to mean an overall score of 180 in the top four subjects, at school discretion.

Lastly, it is not clear what NCEA Level 4 refers to in the Education Code description - such an NCEA level is unknown.

Inputs

  1. IDI_UserCode: The SQL database on which the spell datasets are to be created.
  2. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  3. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets with the same name.

Outputs

Column name Description
snz_uid The unique StatsNZ person identifier for the student
data_source A tag signifying a source dataset description
msd_qual_code A source system code that identifies qualification levels self-reported by MSD clients - this is the highest qualification as on the recording day
msd_qual_description The narrative descriptions for the MSD qualification code
qual_recorded_date The date on which a specific qualification level was disclosed to MSD by the client
qual_expiry_date A pseudo-date when the qualification expires - this is hard-coded to 31 Dec 9999.
qual_highest_until_date The date on which a new, higher qualification was reported by the MSD client
nzqflevel A derived value signifying the academic level of the highest qualification as reported to MSD by the client. The qualification code is mapped to the NZ Qualifications Framework scale. The source of this equivalence logic is unclear. (Refer Business rules section)

Dependencies

[IDI_Clean_202310].msd_clean.msd_education_history

Version Log

Date Version Comments
12-Oct-22 Changes made after consultation with the Community of Interest (Marianna Pekar)
15-Sep-20 Initial version from Michele Morris’s code (Vinay Benny)

List of key contacts from the Community of Interest

Domain Agency Person
Business Rules Independent researcher Michele Morris
Lead SME MSD Marc de Boer
IDI Expert SWA Simon Anastasiadis
Data Supply Experts MoE/MSD Agathe Ponder-Sutton, Michael Smit, Marian Loader
Policy or operational SWA/MSD/MoE Andrew Weber, Samuel Mortlock, David Scott

Code


/* Establish database for writing views */
/*
:setvar targetdb "IDI_UserCode"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "[IDI_Clean_202310]"
*/

-- Example
-- SQLCMD mode

:setvar targetdb "IDI_UserCode"
:setvar targetschema "DL-MAA2020-47"
:setvar projprefix "tmp"
:setvar idicleanversion "IDI_Clean_202306"
GO


/* Assign the target database to which all the components need to be created in. */
USE $(targetdb);
GO

/* Delete the database object if it already exists */
DROP VIEW IF EXISTS [$(targetschema)].[$(projprefix)_msd_quals];
GO

/* Create the database object */
create view [$(targetschema)].[$(projprefix)_msd_quals] as

--<!

    select 
        b.snz_uid
        ,cast('MSD_EDU_HIST' as varchar(50)) as data_source
        ,b.msd_edh_education_code as msd_qual_code
        ,b.msd_edh_education_desc_text as msd_qual_description
        ,b.msd_edh_educ_lvl_start_date as qual_recorded_date
        ,datefromparts(9999, 12, 31) as qual_expiry_date
        /* <NOTE>: Is the below logic required at all? there isnt a single instance where msd_edh_educ_lvl_end_date is null or less than start date.
            Use startdate if enddate is missing or enddate is less than startdate */
        ,case 
            when msd_edh_educ_lvl_start_date > isnull(msd_edh_educ_lvl_end_date,msd_edh_educ_lvl_start_date) 
                then msd_edh_educ_lvl_start_date  
            else isnull(msd_edh_educ_lvl_end_date,msd_edh_educ_lvl_start_date) 
        end as qual_highest_until_date
        ,case   
            when msd_edh_education_code in ('A' ,'B' ,'I') then 0
            when msd_edh_education_code in ('C', 'J') then 1
            when msd_edh_education_code in ('D', 'F', 'K') then 2
            /* 'G' is post secondary school quals so it might be potentially higher than 4 on NQF level.*/
            when msd_edh_education_code in ('E', 'L','G') then 3
            /* H is degree or prof quals so could be potentially higher than 4 on NQF level.*/
            when msd_edh_education_code in ('H','M')   then 4 
            else null 
        end  as nzqflevel
        --,row_number() over(partition by 1 order by snz_uid) + 1000000 as provider_code /* generate pseudo provider code. each row is unique */   
    from [$(idicleanversion)].msd_clean.msd_education_history b
    /*where msd_edh_education_code in ('A','B','I','C','J','D','F','K','E','L','G','H','M')*/ /* keep only those with an education code (remove those that do not have an education code) */
;
GO

--!>