School Enrolment Status

grace.fox
27 June 2025

Outputs:

SQL: [IDI_Community].[edu_school_enrolment_status].[school_enrolment_status_202603]
SAS: libname cm_school_enrolment_status dsn=IDI_community_srvprd schema=edu_school_enrolment_status; run ;
How to access a code module in the Data Lab: Read here

Inputs:

Dependency Dependency Type
[edu_school_enrolment_spells].[school_enrolment_spells] Code module
[IDI_Clean_202603].[data].[personal_detail] Source
[IDI_Clean_202603].[moe_clean].[provider_profile] Source
[IDI_Clean_202603].[moe_clean].[school_student_attendance] Source
[IDI_Clean_202603].[moe_clean].[student_enrol] Source
[IDI_Metadata_202603].[moe_school].[rsn_for_leaving_code] Source

Purpose

This code creates spells of school enrolment for akonga who have enrolled in school in NZ at some point. There are two distinct purposes for this data:
i) information about whether a student is enrolled in any school at a given time (enrolment status)
ii) (NOT INCLUDED - use $(idicleanversion).[moe_clean].[student_enrol] table) information about which schools a student is enroleld in at a given time (‘total’ or all enrolments)
iii) information about a which single school is the ‘main’ or ‘prioritised’ school at a given time if there are overlaps in the underlying data (unique school-student pairs)

Key Concepts

Comparison agiants other sources

At a given snapshot of time the enrolment spells can be used to re-create school roll counts. Snapshot roll counts are summarised in the tables [IDI_Clean].$(idicleanversion).[moe_clean].[school_roll_return]
at two different dates in each year: March and July.
Additionally, school enrolments can be validated against the school attendance table [IDI_Clean].$(idicleanversion).[moe_clean].[school_student_attendance].

References & Contacts

  1. James Swindells
  2. Cheryl Remington
  3. Angus Prain

Code module technical information

The following section provides detail on how moe_school_enrolments is built in the IDI.

Community of Interest

Domain Agency Person
Lead SME SIA Andrew Webber
Lead SME MOE Peng Sun
Lead SME MOE Mercey Mhuru
Lead SME MOE Vahid Arabnejad
IDI analysts Stats NZ Angus Prain
Data Supply MOE ENROL
Data Supply MOE student_attendance
Policy/Operational Experts MOE

Module Business Rules

The script uses the following business rules to break overlapping spells, where possible:
0. short spells are removed

  1. Missing enddates are imputed to extraction date
  2. Same startdate, same enddate, same school, direct de-duplication
  3. Same startdate, same school - take longest spell
  4. Same enddate, same school - take longest spell
  5. Same startdate, same enddate - use atendance data
    a. if either school has global null attendance* then no comparison, both spells left
    b. if both schools have global non-null attendance*, take school with highest attendance count (null attendance is ranked lower)
  6. Same startdate - use attendance data
    a. if either school has global null attendance* then no comparison, both spells are left
    b. if both schools have global non-null attendance*, take school with highest attendance count (null attendance is ranked lower)
    c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping
  7. Same enddate - use attendance data
    a. if either school has global null attendance* then no comparison, both spells are left
    b. if both schools have global non-null attendance, take school with highest attendance count (null attendance is ranked lower)
    c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping
  8. Overlapped spells with no matching dates - use attendance data
    a. if either school has global null attendance* then no comparison, both spells are left
    b. if both schools have global non-null attendance, take school with highest attendance count (null attendance is ranked lower)
    c. snip spells to the overlap region and the overhanging bits, carrying out prioritsation only on the overlapping bit
    d. if one of the spells happens to be an inner spell of the previous one, modify the rule to make it harder to prioritise it
    -

Open Issues/Comments

  1. Missing enrolment enddates

  2. Poorly joined snz_uid

  3. Logitudinal school profile metadata

  4. No use has been made of the [tekura] tables for enrolments in Te Kura (Correspondence School)

  5. When attendance ranking is tied, perhaps both schools should be kept

  6. Attendance data is used in a critical way in this script, and therefore its quality and coverage plays a key role in the overall quality and coverage of thie module.
    Attendance data coverage is currently as follows:

    • Term 2 data only from 2011 until 2018
    • Terms 1 through to 4 from 2019 until term 3 2024
    • Max attendance date is 2024-09-27
  7. TODO: additional rows in #inner_spells_indicated fro somewhere… not sure!

  8. TODO: implement a further refinement of the attendance comparison algorithm

    • when comparing attendance, we keep both spells if one spell has no attendance records at all for any student in a given year (we call this “global null attendance”)
    • the update would be to relax this when the total attendance days in the overlap in the other spell is “almost everyday” i.e. n_days_in_overlap >~ 5/7 * overlap length (weekdays, holidays etc need to be removed)

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. “{idicleanversion}”: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.

Variable Descriptions

[school_enrolment_status]

Column name Description

[school_enrolment_enrolment_spells_clean]

Column name Description

Module Version & Change History

Date Version Comments
27 May 2025 Initial Version based on specifications from Commissioning document.

Code

 :setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
:setvar idimetaversion "{idimetaversion}"
:setvar max_attendance_date "{max_attendance_date}"

/*
:setvar targetdb "[IDI_Sandpit]"
:setvar targetschema "[DL-MAA2020-47]"
:setvar projprefix "aprain"
:setvar idicleanversion "[IDI_Clean_202503]"
:setvar idimetaversion "[IDI_Metadata_202503]"
:setvar max_attendance_date "2024-09-27"
*/

/*
	:::::::::::::::::::::
	0. GLOBAL SETUP
	:::::::::::::::::::::
*/

DROP TABLE IF EXISTS #t1_complete_spells;
SELECT *
    , ROW_NUMBER() OVER(PARTITION BY snz_uid ORDER BY startdate) AS rn
INTO #t1_complete_spells
FROM [$(targetdb)].[$(targetschema)].[$(projprefix)_school_enrolment_spells];

GO
 PRINT('school_enrolment_spells TABLE imported to scope');

/*
	BEGIN check

		select * from #complete_spells order by snz_uid, startdate, enddate

	END check
*/

/*

	::::::::::::::::::::::::::::::::::::::::::
	6. Generate the derived 'enrolment_status' table
	::::::::::::::::::::::::::::::::::::::::::
	This table is only enrolment statud, as per the commissioning document

*/

DROP TABLE IF EXISTS #clean_spells; ;
WITH long_dates AS(
    SELECT snz_uid
        , startdate AS date_value
        , 'startdate' AS date_type
        , CASE WHEN unique_spell_reason = 'enrolment_gap' THEN 'not_enrolled' ELSE 'enrolled' END AS spell_type
        , rn
        , sch_enddate_imputed
        , spell_end_reason
        , start_level
    FROM #t1_complete_spells 

    UNION ALL 

    SELECT snz_uid
        , enddate AS date_value
        , 'enddate' AS date_type
        , CASE WHEN unique_spell_reason = 'enrolment_gap' THEN 'not_enrolled' ELSE 'enrolled' END AS spell_type
        , rn
        , sch_enddate_imputed
        , spell_end_reason
        , start_level
    FROM #t1_complete_spells
)
    , pre_spells AS(
    SELECT snz_uid
        , date_value
        , date_type
        , spell_type
        , rn
        , start_level
        , spell_end_reason
        , sch_enddate_imputed
        , MAX(rn) OVER(PARTITION BY snz_uid) AS max_rn
        , CASE WHEN rn = MAX(rn) OVER(PARTITION BY snz_uid) AND sch_enddate_imputed <> 1
    AND date_type = 'enddate' THEN 'END' WHEN LEAD(spell_type) OVER(PARTITION BY snz_uid ORDER BY rn, date_value) <> spell_type THEN 'END' WHEN rn = 1
    AND date_type = 'startdate' THEN 'start' WHEN LAG(spell_type) OVER(PARTITION BY snz_uid ORDER BY rn, date_value) <> spell_type
    AND rn <> 1 THEN 'start' WHEN sch_enddate_imputed = 1
    AND date_type = 'enddate' THEN 'extraction_date' END AS boundary
    FROM long_dates
)
    , long_spells AS(
    SELECT *
        , CASE WHEN LEAD(rn) OVER(PARTITION BY snz_uid ORDER BY rn) >= rn THEN LEAD(date_value) OVER(PARTITION BY snz_uid ORDER BY rn) END AS enddate
        , CASE WHEN LEAD(rn) OVER(PARTITION BY snz_uid ORDER BY rn) >= rn THEN LEAD(spell_end_reason) OVER(PARTITION BY snz_uid ORDER BY rn) END AS reason_for_leaving_next
        , CASE WHEN LEAD(boundary) OVER(PARTITION BY snz_uid ORDER BY rn) = 'extraction_date' THEN 'extraction_date' END AS is_extraction_date
    FROM pre_spells
    WHERE boundary IS NOT NULL
)
SELECT a.snz_uid
    , spell_type
    , DATEDIFF(d, date_value, enddate) AS spell_length
    , date_value AS startdate
    , enddate
    , start_level
    , CASE WHEN is_extraction_date IS NOT NULL THEN 'Enddate Imputed' 
	WHEN spell_type = 'not_enrolled' THEN 'New Enrolment'
	ELSE reason_for_leaving_next END AS spell_end_reason
INTO #clean_spells
FROM long_spells a
WHERE date_type = 'startdate'; 

GO
 PRINT('#clean_spells TABLE generated'); 

/* adding age at start and end of spell */
DROP TABLE IF EXISTS #clean_spells_with_age;
SELECT a.*
    , DATEDIFF(YEAR, snz_birth_date_proxy, startdate) AS age_at_startdate
    , DATEDIFF(YEAR, snz_birth_date_proxy, enddate) AS age_at_enddate
INTO #clean_spells_with_age
FROM #clean_spells a
JOIN $(idicleanversion).[data].[personal_detail] b
ON a.snz_uid = b.snz_uid ;

GO
 PRINT('#clean_spells_with_age TABLE generated'); 

DROP TABLE IF EXISTS [$(targetdb)].[$(targetschema)].[$(projprefix)_school_enrolment_status];

SELECT *

INTO [$(targetdb)].[$(targetschema)].[$(projprefix)_school_enrolment_status]

FROM #clean_spells_with_age;