Highest Qualifications - from Census data
Output Table
SQL: [IDI_Community].[edu_quals_census_2013].quals_census_2013_YYYYMM
SAS: libname cm_c13_q ODBC dsn=idi_community_srvprd schema=edu_quals_census_2013;
How to access a code module in the Data Lab: Read here
Context
This module creates the highest qualification for individuals as observed in the census data from 2013, 2018 and 2023 in a spell-based format. The business key is one record per individual.
Key Business Rules
- The census datasets do not state when the qualification was attained. They only provide the date it was self-reported and recorded.
- The data sources do not exhaustively list all qualifications for an individual. They provide the highest qualification attained as on census date. The census is a good source of qualification data for individuals who attained their qualification from institutions outside New Zealand and for older individuals who gained qualifications before the early 2000s.
- The date on which the qualification is recorded is assumed to be the respective census nights: for Census 2013, this is 05 March 2013, for Census 2018, this is 06 March 2018, and for Census 2023 this is 07 March 2023. Qualifications are assumed to never expire, and the pseudo-expiry date is therefore set to 31 -Dec 9999. For post-school qualifications, the subject area of the qualification is also provided. There are several records where the Census census response might be “Not Stated” or “Response Unidentifiable” - these are not currently filtered out currently.
- The metadata tables for the codes used in each census may be different.
- Census 2018 suffered from various population coverage problems, which led to several variables being imputed from other administrative sources. This spell dataset filters out such imputed values for qualifications and only retains the values directly measured from the population.
- While NZQF (NZ Qualifications Framework) levels are not directly provided, the “cen_ind_std_highest_qual_code” column in the census is closely based on the NZQF level codes, except for values “11” Overseas Secondary Qual, “97” Response Unidentifiable, “99”- Not Stated. Overseas-based secondary school qualifications do not directly have an NZQF level directly assigned- hence so we assign a conservative NZQF estimate of NZQF level 1to these records a conservative NZQF estimate of NZQF level 1. This might be an underestimate for the actual qualification level. In cases where the response is unidentifiable or unstated, these are mapped to a NULL NZQF level.
- A ““Level-10 ID” column has been created as an output column for this dataset - this is an approximation of the NZQF Qualifications Framework levels (NZQF), and are is used internally by the Ministry of Education (MoE). The rationale of this use is not clear, but the column has been added here nevertheless. This business logic that maps the “cen_ind_std_highest_qual_code” to derive the Level 10 ID is as follows, but the origin of this equivalence is also uncertain. Hence(Note: there is no 7 on this mapping.)
- Levels “01” and Overseas Secondary school quals (“11”) are mapped to 1 on the Level 10 scale
- Levels “02” to “04” are mapped to Level 10 scale 2 to 4 respectively.
- Levels “05” and “06” are mapped to 5 on the Level 10 scale
- Levels “07” is mapped to 6 on the Level 10 scale
- Levels “08” to “10” are mapped to Level 10 scale 8 to 10 respectively. Hence there is no 7 on this mapping.
- Levels “00” is mapped to 0 on the Level 10 scale
- Other levels are mapped to NULL on the Level 10 scale
- The scope of Census census data covers only individuals aged 15 and above.
Inputs
- IDI_UserCode: The SQL database on which the spell datasets are to be created.
- {targetschema}: The project schema under the target database into which the spell datasets are to be created.
- {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have with the same name.
Outputs
| Column name | Description |
|---|---|
| snz_uid | The unique STATSNZ person identifier for the student |
| data_source | A tag signifying a source dataset description |
| highest_qual_code | A code representing the highest qualification obtained by the individual as on Census night |
| highest_qual_text | The description of the highest qualification obtained by the individual as on Census night |
| subject_area_code | A code representing the general subject area of the highest qualification |
| subject_area_text | The description of the general subject area of the highest qualification |
| qual_recorded_date | The census night on which the data is recorded. |
| qual_expiry_date | A pseudo-date when the attained qualification expires - this is hard-coded to 31 Dec 9999. |
| high_qual_nqf | A derived value that is equivalent to NZ Qualifications Framework level for the qualification - represents the academic level of the highest qualification. This is a 10-point scale, and higher values represent more advanced qualifications. (Refer Business rules section) |
Dependencies
[IDI_Clean_YYYYMM].[cen_clean].census_individual_2013
[IDI_metadata].[clean_read_CLASSIFICATIONS].cen_stdhq06
[IDI_Metadata].[clean_read_CLASSIFICATIONS].[CEN_QFIELD]
[IDI_Clean_YYYYMM].[cen_clean].census_individual_2018
[IDI_Metadata].[clean_read_CLASSIFICATIONS].[CEN_QFIELD2_V3.0]
[IDI_Clean_YYYYMM].[cen_clean].[census_individual_2023]
Version Log
| Date | Version Comments |
|---|---|
| 13-Mar-25 | Added in Census 2023 (Todd Nicholson) |
| 12-Oct-22 | Changes made after consultation with the Community of Interest (Marianna Pekar) |
| 15-Sep-20 | Initial version from Michele Morris’s code (Vinay Benny) |
List of key contacts from the Community of Interest
| Domain | Agency | Person |
|---|---|---|
| Business Rules | Independent researcher | Michele Morris |
| Lead SME | MSD | Marc de Boer |
| IDI Expert | SWA | Simon Anastasiadis |
| Data Supply Experts | MoE/MSD | Agathe Ponder-Sutton, Michael Smit, Marian Loader |
| Policy or operational | SWA/MSD/MoE | Andrew Weber, Samuel Mortlock, David Scott |
Code
/* Establish database for writing views */
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
/*
:setvar targetdb "IDI_UserCode"
:setvar targetschema "DL-MAA2020-47"
:setvar projprefix "mdb"
:setvar idicleanversion "IDI_Clean_202410"
*/
/* Assign the target database to which all the components need to be created in. */
USE $(targetdb);
GO
DROP VIEW IF EXISTS [$(targetschema)].[$(projprefix)_cen2013_quals];
GO
/* Create the database object */
CREATE VIEW [$(targetschema)].[$(projprefix)_cen2013_quals] AS
SELECT
b.snz_uid
,cast('CENSUS_2013' as varchar(50)) as data_source
/*Census highest qualification is obtained for people aged 15 years and over. This census variable combines highest
secondary school qualification and post-school qualification to obtain a single highest qualification by category
of attainment.
*/
,b.cen_ind_std_highest_qual_code as highest_qual_code
,c.descriptor_text as highest_qual_text
,b.cen_ind_post_scl_subjt_code as subject_area_code
,subj.descriptor_text as subject_area_text
/* 05 March 2013 was Census night for 2013, hence this is assumed to be the date the qualification was recorded.*/
,datefromparts(2013, 3, 5) as qual_recorded_date
,datefromparts(9999, 12, 31) as qual_expiry_date
,case
when b.cen_ind_std_highest_qual_code ='00' then 0
when b.cen_ind_std_highest_qual_code ='01' then 1
when b.cen_ind_std_highest_qual_code ='02' then 2
when b.cen_ind_std_highest_qual_code ='03' then 3
when b.cen_ind_std_highest_qual_code ='04' then 4
when b.cen_ind_std_highest_qual_code ='05' then 5
when b.cen_ind_std_highest_qual_code ='06' then 6
when b.cen_ind_std_highest_qual_code ='07' then 7
when b.cen_ind_std_highest_qual_code ='08' then 8
when b.cen_ind_std_highest_qual_code ='09' then 9
when b.cen_ind_std_highest_qual_code ='10' then 10
/*Code value: 11 is for overseas secondary school qualification. This has been tentatively assigned to
NQF level 1, which may be underestimating highest qual. Change this if you want a different assumption. */
when b.cen_ind_std_highest_qual_code ='11' then 1
/* Code values: 97- Response Unidentifiable, 99-Not Stated.*/
when b.cen_ind_std_highest_qual_code in ('97','99') then null
else null
end as high_qual_nqf
from [$(idicleanversion)].[cen_clean].[census_individual_2013] b
left join [IDI_Metadata].[clean_read_CLASSIFICATIONS].[CEN_STDHQ06] c
on (b.cen_ind_std_highest_qual_code = c.cat_code)
left join [IDI_Metadata].[clean_read_CLASSIFICATIONS].[CEN_QFIELD] subj
on (b.cen_ind_post_scl_subjt_code = subj.cat_code)
where cen_ind_indiv_substitute_code = 0 ;
/*Ensure that the census response is from Non-Dummy Individual Form. */