Outputs:
SQL: [IDI_Community].[edu_school_enrolment_status].[school_enrolment_status_202603]
SAS: libname cm_school_enrolment_status dsn=IDI_community_srvprd schema=edu_school_enrolment_status; run ;
How to access a code module in the Data Lab: Read here
Inputs:
| Dependency | Dependency Type |
|---|---|
[edu_school_enrolment_spells].[school_enrolment_spells] |
Code module |
[IDI_Clean_202603].[data].[personal_detail] |
Source |
[IDI_Clean_202603].[moe_clean].[provider_profile] |
Source |
[IDI_Clean_202603].[moe_clean].[school_student_attendance] |
Source |
[IDI_Clean_202603].[moe_clean].[student_enrol] |
Source |
[IDI_Metadata_202603].[moe_school].[rsn_for_leaving_code] |
Source |
Purpose
This code creates spells of school enrolment for akonga who have enrolled in school in NZ at some point. There are two distinct purposes for this data:
i) information about whether a student is enrolled in any school at a given time (enrolment status)
ii) (NOT INCLUDED - use $(idicleanversion).[moe_clean].[student_enrol] table) information about which schools a student is enroleld in at a given time (‘total’ or all enrolments)
iii) information about a which single school is the ‘main’ or ‘prioritised’ school at a given time if there are overlaps in the underlying data (unique school-student pairs)
Key Concepts
Comparison agiants other sources
At a given snapshot of time the enrolment spells can be used to re-create school roll counts. Snapshot roll counts are summarised in the tables [IDI_Clean].$(idicleanversion).[moe_clean].[school_roll_return]
at two different dates in each year: March and July.
Additionally, school enrolments can be validated against the school attendance table [IDI_Clean].$(idicleanversion).[moe_clean].[school_student_attendance].
References & Contacts
- James Swindells
- Cheryl Remington
- Angus Prain
Code module technical information
The following section provides detail on how moe_school_enrolments is built in the IDI.
Community of Interest
| Domain | Agency | Person |
|---|---|---|
| Lead SME | SIA | Andrew Webber |
| Lead SME | MOE | Peng Sun |
| Lead SME | MOE | Mercey Mhuru |
| Lead SME | MOE | Vahid Arabnejad |
| IDI analysts | Stats NZ | Angus Prain |
| Data Supply | MOE | ENROL |
| Data Supply | MOE | student_attendance |
| Policy/Operational Experts | MOE |
Module Business Rules
The script uses the following business rules to break overlapping spells, where possible:
0. short spells are removed
- Missing enddates are imputed to extraction date
- Same startdate, same enddate, same school, direct de-duplication
- Same startdate, same school - take longest spell
- Same enddate, same school - take longest spell
- Same startdate, same enddate - use atendance data
a. if either school has global null attendance* then no comparison, both spells left
b. if both schools have global non-null attendance*, take school with highest attendance count (null attendance is ranked lower) - Same startdate - use attendance data
a. if either school has global null attendance* then no comparison, both spells are left
b. if both schools have global non-null attendance*, take school with highest attendance count (null attendance is ranked lower)
c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping - Same enddate - use attendance data
a. if either school has global null attendance* then no comparison, both spells are left
b. if both schools have global non-null attendance, take school with highest attendance count (null attendance is ranked lower)
c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping - Overlapped spells with no matching dates - use attendance data
a. if either school has global null attendance* then no comparison, both spells are left
b. if both schools have global non-null attendance, take school with highest attendance count (null attendance is ranked lower)
c. snip spells to the overlap region and the overhanging bits, carrying out prioritsation only on the overlapping bit
d. if one of the spells happens to be an inner spell of the previous one, modify the rule to make it harder to prioritise it
-
Open Issues/Comments
-
Missing enrolment enddates
-
Poorly joined snz_uid
-
Logitudinal school profile metadata
-
No use has been made of the [tekura] tables for enrolments in Te Kura (Correspondence School)
-
When attendance ranking is tied, perhaps both schools should be kept
-
Attendance data is used in a critical way in this script, and therefore its quality and coverage plays a key role in the overall quality and coverage of thie module.
Attendance data coverage is currently as follows:- Term 2 data only from 2011 until 2018
- Terms 1 through to 4 from 2019 until term 3 2024
- Max attendance date is 2024-09-27
-
TODO: additional rows in #inner_spells_indicated fro somewhere… not sure!
-
TODO: implement a further refinement of the attendance comparison algorithm
- when comparing attendance, we keep both spells if one spell has no attendance records at all for any student in a given year (we call this “global null attendance”)
- the update would be to relax this when the total attendance days in the overlap in the other spell is “almost everyday” i.e. n_days_in_overlap >~ 5/7 * overlap length (weekdays, holidays etc need to be removed)
Parameters
The following parameters should be supplied to this module to run it in the database:
- {targetdb}: The SQL database on which the spell datasets are to be created.
- “{idicleanversion}”: The IDI Clean version that the spell datasets need to be based on.
- {targetschema}: The project schema under the target database into which the spell datasets are to be created.
- {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.
Variable Descriptions
[school_enrolment_status]
| Column name | Description |
|---|---|
[school_enrolment_enrolment_spells_clean]
| Column name | Description |
|---|---|
Module Version & Change History
| Date | Version | Comments |
|---|---|---|
| 27 May 2025 | Initial | Version based on specifications from Commissioning document. |
Code
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
:setvar idimetaversion "{idimetaversion}"
:setvar max_attendance_date "{max_attendance_date}"
/*
:setvar targetdb "[IDI_Sandpit]"
:setvar targetschema "[DL-MAA2020-47]"
:setvar projprefix "aprain"
:setvar idicleanversion "[IDI_Clean_202503]"
:setvar idimetaversion "[IDI_Metadata_202503]"
:setvar max_attendance_date "2024-09-27"
*/
/*
:::::::::::::::::::::
0. GLOBAL SETUP
:::::::::::::::::::::
*/
DROP TABLE IF EXISTS #t1_complete_spells;
SELECT *
, ROW_NUMBER() OVER(PARTITION BY snz_uid ORDER BY startdate) AS rn
INTO #t1_complete_spells
FROM [$(targetdb)].[$(targetschema)].[$(projprefix)_school_enrolment_spells];
GO
PRINT('school_enrolment_spells TABLE imported to scope');
/*
BEGIN check
select * from #complete_spells order by snz_uid, startdate, enddate
END check
*/
/*
::::::::::::::::::::::::::::::::::::::::::
6. Generate the derived 'enrolment_status' table
::::::::::::::::::::::::::::::::::::::::::
This table is only enrolment statud, as per the commissioning document
*/
DROP TABLE IF EXISTS #clean_spells; ;
WITH long_dates AS(
SELECT snz_uid
, startdate AS date_value
, 'startdate' AS date_type
, CASE WHEN unique_spell_reason = 'enrolment_gap' THEN 'not_enrolled' ELSE 'enrolled' END AS spell_type
, rn
, sch_enddate_imputed
, spell_end_reason
, start_level
FROM #t1_complete_spells
UNION ALL
SELECT snz_uid
, enddate AS date_value
, 'enddate' AS date_type
, CASE WHEN unique_spell_reason = 'enrolment_gap' THEN 'not_enrolled' ELSE 'enrolled' END AS spell_type
, rn
, sch_enddate_imputed
, spell_end_reason
, start_level
FROM #t1_complete_spells
)
, pre_spells AS(
SELECT snz_uid
, date_value
, date_type
, spell_type
, rn
, start_level
, spell_end_reason
, sch_enddate_imputed
, MAX(rn) OVER(PARTITION BY snz_uid) AS max_rn
, CASE WHEN rn = MAX(rn) OVER(PARTITION BY snz_uid) AND sch_enddate_imputed <> 1
AND date_type = 'enddate' THEN 'END' WHEN LEAD(spell_type) OVER(PARTITION BY snz_uid ORDER BY rn, date_value) <> spell_type THEN 'END' WHEN rn = 1
AND date_type = 'startdate' THEN 'start' WHEN LAG(spell_type) OVER(PARTITION BY snz_uid ORDER BY rn, date_value) <> spell_type
AND rn <> 1 THEN 'start' WHEN sch_enddate_imputed = 1
AND date_type = 'enddate' THEN 'extraction_date' END AS boundary
FROM long_dates
)
, long_spells AS(
SELECT *
, CASE WHEN LEAD(rn) OVER(PARTITION BY snz_uid ORDER BY rn) >= rn THEN LEAD(date_value) OVER(PARTITION BY snz_uid ORDER BY rn) END AS enddate
, CASE WHEN LEAD(rn) OVER(PARTITION BY snz_uid ORDER BY rn) >= rn THEN LEAD(spell_end_reason) OVER(PARTITION BY snz_uid ORDER BY rn) END AS reason_for_leaving_next
, CASE WHEN LEAD(boundary) OVER(PARTITION BY snz_uid ORDER BY rn) = 'extraction_date' THEN 'extraction_date' END AS is_extraction_date
FROM pre_spells
WHERE boundary IS NOT NULL
)
SELECT a.snz_uid
, spell_type
, DATEDIFF(d, date_value, enddate) AS spell_length
, date_value AS startdate
, enddate
, start_level
, CASE WHEN is_extraction_date IS NOT NULL THEN 'Enddate Imputed'
WHEN spell_type = 'not_enrolled' THEN 'New Enrolment'
ELSE reason_for_leaving_next END AS spell_end_reason
INTO #clean_spells
FROM long_spells a
WHERE date_type = 'startdate';
GO
PRINT('#clean_spells TABLE generated');
/* adding age at start and end of spell */
DROP TABLE IF EXISTS #clean_spells_with_age;
SELECT a.*
, DATEDIFF(YEAR, snz_birth_date_proxy, startdate) AS age_at_startdate
, DATEDIFF(YEAR, snz_birth_date_proxy, enddate) AS age_at_enddate
INTO #clean_spells_with_age
FROM #clean_spells a
JOIN $(idicleanversion).[data].[personal_detail] b
ON a.snz_uid = b.snz_uid ;
GO
PRINT('#clean_spells_with_age TABLE generated');
DROP TABLE IF EXISTS [$(targetdb)].[$(targetschema)].[$(projprefix)_school_enrolment_status];
SELECT *
INTO [$(targetdb)].[$(targetschema)].[$(projprefix)_school_enrolment_status]
FROM #clean_spells_with_age;