Outputs:
SQL: [IDI_Community].[edu_quals_msd].[quals_msd_202603]
SAS: libname cm_quals_msd dsn=IDI_community_srvprd schema=edu_quals_msd; run ;
How to access a code module in the Data Lab: Read here
Inputs:
| Dependency | Dependency Type |
|---|---|
[IDI_Clean_202603].[msd_clean].[msd_education_history] |
Source |
Output Table
SQL: [IDI_Community].edu_quals_msd.quals_msd_YYYYMM
SAS: libname cm_msd_q ODBC dsn=IDI_community_srvprd schema=edu_quals_msd;
How to access a code module in the Data Lab : Read here
Context
This script uses the qualification data collected as part of MSD benefits to derive a non-MoE qualifications spells for individuals. The business key for the output spells is one record per individual(snz_uid) per qualification attained date per snz_unique_nbr.
The data is collected based on the highest qualifications self-reported by MSD’s Work & Income clients. Whenever a client reports a higher qualification than previously reported to MSD, the older record gets retired (with an end date as the reporting date) and a new record is started from that point onwards,thus providing a continuous time-series. However, there are a small number of records that have overlapping time periods with different qualifications, which can mostly be attributed to different SWN numbers even though the snz_uid is the same. An SWN number is an MSD identifier for individuals - when this MSD data gets loaded into the IDI, StatsNZ creates snz_msd_uid identifier from encrypted SWNs. StatsNZ then applies various matching algorithms to ascertain individuals by linking this with individual identifiers from other agencies and creates “snz_uid”. This linking and matching process may lead to multiple SWNs mapping to the same snz_uid (that is, same individuals have different SWNs from the perspective of the IDI).
Key Business Rules
- The spell structure assumes that a qualification is gained for life, so the start date of the qualification is the date it was self-reported (qual_recorded_date) and the qualification expiry date is assumed to be 31-Dec-9999. Note that the start date is not the qualification attained date - it is merely the date on which the qualification is reported to MSD. Hence this is an overestimate of the actual date on which the qualification was attained - and it is not clear if it can be reliably used as a qualification attained date at all. In the absence of any other data from Education sources, this might be an indicative but crude estimate, so the data can be considered left-censored if the qual_recorded_date is used as the qualification attainment date.
- The day on which a new, higher qualification date is self-reported to MSD by the individual is also retained in the output as “qual_highest_until_date”. This becomes the date as from which the current qual was the highest qualifcation (as known to MSD).
- The education code is a source-system specific code that represents the kind of qualification earned by the individual. A detailed description column also exists to interpret this code. Using these code values, MSD has identified a mapping to NZQF levels.
- “NO FORMAL SCHOOL QUALS OR <3 YRS” (Code: A), “LESS THAN 3 SC PASSES OR EQUIV.” (Code: B), “(NCEA) : 1-79 CREDITS” (Code: I) map to NZQF “0”.
- “3 OR MORE SC PASSES OR EQUIV.” (Code: C), “(NCEA) LEVEL 1: > = 80 CREDITS” (Code: J) map to NZQF “1”.
- “SIXTH FORM CERT UE OR EQUIV.” (Code: D), “OTHER SCHOOL QUALS” (Code: F), “(NCEA) LEVEL 2: > = 80 CREDITS” (Code: K) map to NZQF “2”.
- “SCHOLARSHIP BURSARY HS” (Code: E), “POST SECONDARY QUALS” (Code: L), “(NCEA) LEVEL 3: > = 80 CREDITS” (Code: G) map to NZQF “3”. These include University Scholarship, University Bursary, or Higher School Certificate.
- “DEGREE OR PROFESSIONAL QUALS” (Code: H), “(NCEA) LEVEL 4: > = 72 CREDITS” (Code: M) map to NZQF “4”. Note this is a conservative estimate; Degrees usually map to NZQF level 7+, but professional qualifications range from 4 upwards. Because of this ambiguity, the NZQF is assigned as 4.
Note regarding Education Code-to NZQF Mapping
An example of the rationale behind MSD’s Education Code-to-NZQF mapping can be observed in the case of NCEA-related Education codes. At least 80 credits are required to achieve NCEA Level 1. Hence 1 - 79 credits are treated as ‘0’ on NZQF scale, and NCEA Level 1 >=80 credits map to NZQF ‘1’. Note that it is unclear if this mapping takes into account the need to pass both literacy and numeracy requirements under NCEA - for example, if a client has 85 credits but has not met NCEA numeracy requirement, then it is not clear if MSD frontiline staff maps this individual to Code ‘B’ or Code ‘C’- the former will correctly map to NZQF ‘0’ but the latter will incorrectly map to NZQF ‘1’.
The actual NCEA level 2 requirement is 60+ credits achieved at level 2 and 20+ further credits achieved at any level plus meeting the level 1 literacy and numeracy requirements being met. It is not clear if the MSD frontline takes these criteria into consideration while assigning Education Code ‘J’. The code description states 80+ credits, which might not exactly be the same as an NCEA Level 2 achievement. Hence there might be errors in the mapping here as well.
The rule for passing School Certificate (SC), which was the pre-NCEA standard for secondary qualification, changed over time. It started out as averaging 50%+, getting a ‘C’ grade or better in all subjects, and passing English and Mathematics. Over time, it changed to the 3+ Cs rule. Lastly, it was weakened to mean an overall score of 180 in the top four subjects, at school discretion.
Lastly, it is not clear what NCEA Level 4 refers to in the Education Code description - such an NCEA level is unknown.
Inputs
- {targetdb}: The SQL database on which the spell datasets are to be created.
- {targetschema}: The project schema under the target database into which the spell datasets are to be created.
- {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets with the same name.
Version Log
| Date | Version Comments |
|---|---|
| 12-Oct-22 | Changes made after consultation with the Community of Interest (Marianna Pekar) |
| 15-Sep-20 | Initial version from Michele Morris’s code (Vinay Benny) |
List of key contacts from the Community of Interest
| Domain | Agency | Person |
|---|---|---|
| Business Rules | Independent researcher | Michele Morris |
| Lead SME | MSD | Marc de Boer |
| IDI Expert | SWA | Simon Anastasiadis |
| Data Supply Experts | MoE/MSD | Agathe Ponder-Sutton, Michael Smit, Marian Loader |
| Policy or operational | SWA/MSD/MoE | Andrew Weber, Samuel Mortlock, David Scott |
Code
/* Establish database for writing views */
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
/* Assign the target database to which all the components need to be created in. */
USE $(targetdb);
GO
DROP VIEW IF EXISTS [$(targetschema)].[$(projprefix)_quals_msd];
GO
CREATE VIEW [$(targetschema)].[$(projprefix)_quals_msd] AS
SELECT b.snz_uid
, CAST('MSD_EDU_HIST' AS VARCHAR(50)) AS data_source
, b.msd_edh_education_code AS msd_qual_code
, b.msd_edh_education_desc_text AS msd_qual_description
, b.msd_edh_educ_lvl_start_date AS qual_recorded_date
, DATEFROMPARTS(9999, 12, 31) AS qual_expiry_date
/* NOTE: Is the below logic required at all? there isnt a single instance where msd_edh_educ_lvl_end_date is null or less than start date.
Use startdate if enddate is missing or enddate is less than startdate */
, CASE
WHEN msd_edh_educ_lvl_start_date > ISNULL(msd_edh_educ_lvl_end_date,msd_edh_educ_lvl_start_date)
THEN msd_edh_educ_lvl_start_date
ELSE ISNULL(msd_edh_educ_lvl_end_date,msd_edh_educ_lvl_start_date)
END AS qual_highest_until_date
, CASE
WHEN msd_edh_education_code IN ('A' ,'B' ,'I') THEN 0
WHEN msd_edh_education_code IN ('C', 'J') THEN 1
WHEN msd_edh_education_code IN ('D', 'F', 'K') THEN 2
WHEN msd_edh_education_code IN ('E', 'L','G') THEN 3 /* 'G' is post secondary school quals so it might be potentially higher than 4 on NQF level.*/
WHEN msd_edh_education_code IN ('H','M') THEN 4 /* H is degree or prof quals so could be potentially higher than 4 on NQF level.*/
ELSE NULL
END AS nzqflevel
, ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY snz_uid) + 1000000 AS provider_code /* generate pseudo provider code. each row is unique */
FROM [$(idicleanversion)].msd_clean.msd_education_history b
/*where msd_edh_education_code in ('A','B','I','C','J','D','F','K','E','L','G','H','M')*/ /* keep only those with an education code (remove those that do not have an education code) */
;
GO