School attendance - Half-day granularity

cheryl.remington
24 March 2024

Outputs:

SQL: [IDI_Community].[edu_sch_att_gran].[sch_att_gran_202603]
SAS: libname cm_sch_att_gran dsn=IDI_community_srvprd schema=edu_sch_att_gran; run ;
How to access a code module in the Data Lab: Read here

Inputs:

Dependency Dependency Type
[IDI_Clean_202603].[moe_clean].[provider_profile] Source
[IDI_Clean_202603].[moe_clean].[school_roll_return] Source
[IDI_Clean_202603].[moe_clean].[school_student_attendance] Source
[IDI_Clean_202603].[security].[concordance] Source
[IDI_Metadata].[clean_read_CLASSIFICATIONS].[moe_attendance_codes] Source
[IDI_Metadata_202603].[moe_school].[sch_funding_type_code] Source
[IDI_Metadata_202603].[moe_school].[student_type_code] Source
[IDI_Metadata_202603].[moe_school].[ter_funding_type_code] Source

NOTE THAT:

Context:

Note that the where clause in the “where SchoolAttendanceCode is not NULL” limits the data to just school attendance.

Data quality (source data)

Scope and coverage (source data)

The following tables are created:

Granular

Summaries

Key Business Rules

  1. The lockdowns that occurred when covid was spreading quickly would have a big impact on attendance. So, as recommended by MoE we exclude any observations that look like they occur when a school is in lockdown. The attendance code = ‘F’ when the school is lockdown (note that this attendance code is also used sporadically on other days). To reliably identify when a school or region (TLA) is in lockdown, one of these two criteria need to be met:
  1. For reproducing the official numbers we only include state schools (which are the vast majority of schools) and state integrated schools. This is done using the variable provider_funding_type. However, for completeness we leave in all schools here.

  2. Similarly the offical numbers only include the following student types:

  1. Entities have been included in the three output tables:
  1. The code limits the output to 2016 onwards. However, the data goes back to 2011 (almost entirely term 2).

Inputs:

Parameters:

  1. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  2. {targetprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.
  3. {idicleanversion}: The version of IDI_clean to use as the inputs
  4. {idimetaversion}: The version of the metadata that corresponds to the input data
  5. {targetdb}: The database to run your table to.

Note that we expect the source dataset to move from the IDI_Adhoc library to IDI_clean

Version Log:

Date Version Comments
29 Nov 2022 Initial version (Todd Nicholson)
16 Sep 2023 Updated to allow the easy exclusion of different types of students (e.g. exchange students) and different types of schools (e.g. private schools) (Todd Nicholson)
18 Nov 2023 Updated to rename the output and adjust the inputs, which have now moved to the IDI_Clean area with new variable names (Todd Nicholson)
10 Mar 2024 Updated to add entities to the output tables, rename tables to be consistent with other modules, add in 2011 and 2012 rolls to allow the code to work all the way back to the start of the attendance data, add in comment about pre-2016 data. (Todd Nicholson)
22 Sep 2024 Updated the way summarisation works for the table #moe_sch_att_by_day in response to feedback from community
31 Mar 2025 Updated the module into three scripts a Tier 1, Tier 2, and Tier 3 script to be compliant with the modular coding standard.
10 Mar 2026 Updated the _gran table with the addition of a new column halfday_key varchar(14) to deduplicate rows that originate from a repeated letter fulldaycode code e.g. ‘UU’

Community of Interest:

Person/Artifact/Agency Involvement
Cheryl Remington Steward
Alan Bristow SME (at MoE)
Simon Anastasiadis IDI expert
Marc De Boer IDI expert
Andrew Webber SME
Todd Nicholson Module Coder
/* PART 0: Parameters */

/*PARAMETERS
SQLCMD only (Activate by clicking Query->SQLCMD Mode)
Already in master.sql; Uncomment when running individually 
*/
 :setvar targetprefix "{targetprefix}" 
:setvar idicleanversion "{idicleanversion}"
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar idimetaversion "{idimetaversion}"

/* Part 1: Identify covid lockdown regions and times */
/*Start with categorising all the days into lockdown (F) or not and summarising by school.*/

DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_sch_pre;

SELECT moe_ssa_provider_code
    , moe_ssa_attendance_date
    , CAST(AVG(lockdown) AS float) AS school_lockdown
    , COUNT(moe_ssa_attendance_date) AS school_count
INTO #moe_sch_att_ld_hunt_sch_pre
FROM (
    SELECT a.moe_ssa_attendance_date
        , CAST(LEFT(a.moe_ssa_schl_attendance_code, 2) AS VARCHAR(2)) AS moe_ssa_schl_attendance_code
        , CAST(LEFT(a.moe_ssa_provider_code, 5) AS VARCHAR(5)) AS moe_ssa_provider_code
        , CAST(LEFT(a.moe_ssa_duration, 5) AS VARCHAR(5)) AS moe_ssa_duration
        , CASE WHEN a.moe_ssa_schl_attendance_code = 'F' THEN 1.0 ELSE 0.0 END lockdown
    FROM [$(idicleanversion)].[moe_clean].[school_student_attendance] a
    WHERE moe_ssa_schl_attendance_code IS NOT NULL
) AS c
GROUP BY moe_ssa_provider_code
    , moe_ssa_attendance_date
ORDER BY moe_ssa_provider_code
    , moe_ssa_attendance_date ;

DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_sch;

SELECT b.moe_pp_provider_ta_auth_code
    , a.moe_ssa_provider_code
    , a.moe_ssa_attendance_date
    , a.school_lockdown
    , a.school_count
INTO #moe_sch_att_ld_hunt_sch
FROM #moe_sch_att_ld_hunt_sch_pre AS a
LEFT JOIN [$(idicleanversion)].[moe_clean].[provider_profile] b
ON a.moe_ssa_provider_code = b.moe_pp_provider_code ; 

/* ANGUS UPDATE: New optimisation step that helps with the transaction logs */
DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_sch_pre;

/* Now categorising all the days into lockdown (F) or not and summarising by TLA.*/
DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_tla;

SELECT moe_pp_provider_ta_auth_code
    , moe_ssa_attendance_date
    , CAST(AVG(lockdown) AS float) AS tla_lockdown
    , COUNT(moe_ssa_attendance_date) AS tla_count
INTO #moe_sch_att_ld_hunt_tla
FROM (
    SELECT moe_ssa_attendance_date AS moe_ssa_attendance_date
        , CAST(LEFT(a.moe_ssa_schl_attendance_code, 2) AS VARCHAR(2)) AS moe_ssa_schl_attendance_code
        , CAST(LEFT(a.moe_ssa_provider_code, 5) AS VARCHAR(5)) AS moe_ssa_provider_code
        , CAST(LEFT(a.moe_ssa_duration, 5) AS VARCHAR(5)) AS moe_ssa_duration
        , b.moe_pp_provider_ta_auth_code
        , CASE WHEN moe_ssa_schl_attendance_code = 'F' THEN 1.0 ELSE 0.0 END lockdown

    FROM [$(idicleanversion)].[moe_clean].[school_student_attendance] a
    LEFT JOIN [$(idicleanversion)].[moe_clean].[provider_profile] b
    ON a.moe_ssa_provider_code = b.moe_pp_provider_code
    WHERE moe_ssa_schl_attendance_code IS NOT NULL
) AS c
GROUP BY moe_pp_provider_ta_auth_code
    , moe_ssa_attendance_date
ORDER BY moe_pp_provider_ta_auth_code
    , moe_ssa_attendance_date ;

/* Put it together and summarise into a binary indicator for lockdown by school and date*/
/* The thresholds here are based on analysis of the distributions during known lockdown periods 
	and known non-lockdown periods.  */
DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_sum;

SELECT a.moe_pp_provider_ta_auth_code
    , a.moe_ssa_provider_code
    , a.moe_ssa_attendance_date
    , a.school_lockdown
    , a.school_count
    , b.tla_lockdown
    , b.tla_count
    , CASE
	 	WHEN a.moe_ssa_attendance_date > '2020-03-24'
AND b.TLA_lockdown > 0.35 THEN 1 WHEN a.moe_ssa_attendance_date > '2020-03-24'
AND a.school_lockdown > 0.7 THEN 1 ELSE 0 END lockdown
INTO #moe_sch_att_ld_hunt_sum
FROM #moe_sch_att_ld_hunt_sch AS a
LEFT JOIN #moe_sch_att_ld_hunt_tla AS b
ON a.moe_pp_provider_ta_auth_code = b.moe_pp_provider_ta_auth_code
AND a.moe_ssa_attendance_date = b.moe_ssa_attendance_date ;

/* ANGUS UPDATE : optimisation */
DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_sch;
DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_tla;

/* using the calculated last, first attendance of the previous record created in the query before, create a new last attendance date. this is to fill in the end dates
of each student type for a student and provider (as end dates only exist generally for the last collection for that student at the particular provider). if there are 
no last attendance dates for a student and provider, assume the student is still attending at the provider and give a default end date of 31-12-9999. */

DROP TABLE IF EXISTS #sr_holding;

SELECT snz_moe_uid
    , moe_srr_provider_code
    , moe_srr_student_type_code
    , MIN([moe_srr_first_attendance_date]) AS FirstAttendance
    , COALESCE(MAX([moe_srr_last_attendance_date]),'2099-01-01') AS max_lastattendance
INTO #sr_holding
FROM [$(idicleanversion)].[moe_clean].[school_roll_return]
GROUP BY snz_moe_uid
    , moe_srr_provider_code
    , moe_srr_student_type_code ;

DROP TABLE IF EXISTS #sr_holding2;

SELECT snz_moe_uid
    , moe_srr_provider_code
    , moe_srr_student_type_code
    , FirstAttendance
    , max_lastattendance
    , LAG(FirstAttendance, 1, '2030-01-01') OVER (PARTITION BY snz_moe_uid ORDER BY snz_moe_uid, 
	FirstAttendance DESC) AS prev_FirstAttendance
    , LAG(max_lastattendance, 1, '2000-01-01') OVER (PARTITION BY snz_moe_uid ORDER BY snz_moe_uid, FirstAttendance DESC) AS prev_max_lastattendance
INTO #sr_holding2
FROM #sr_holding ;

/* ANGUS UPDATE - optimisation */
DROP TABLE IF EXISTS #sr_holding;

DROP TABLE IF EXISTS #sr_holding3;

SELECT snz_moe_uid
    , moe_srr_provider_code
    , moe_srr_student_type_code
    , FirstAttendance
    , CASE 
		WHEN max_lastattendance < prev_FirstAttendance THEN max_lastattendance WHEN max_lastattendance = prev_FirstAttendance THEN DATEADD(DAY,-1,max_lastattendance) WHEN max_lastattendance > prev_FirstAttendance THEN DATEADD(DAY,-1,prev_FirstAttendance) END AS LastAttendance
    , max_lastattendance
    , prev_FirstAttendance
    , prev_max_lastattendance
INTO #sr_holding3
FROM #sr_holding2 ;

/* ANGUS UPDATE - optimisation */
DROP TABLE IF EXISTS #sr_holding2;

DROP TABLE IF EXISTS #moe_sch_att_roll_stud_type;

SELECT a.snz_moe_uid
    , a.moe_srr_student_type_code AS student_type_code
    , b.StudentTypeName AS student_type_desc
    , a.moe_srr_provider_code AS providernumber
    , a.FirstAttendance AS schoolroll_date_start
    , a.LastAttendance AS schoolroll_date_end
INTO #moe_sch_att_roll_stud_type
FROM #sr_holding3 AS a
LEFT JOIN [$(idimetaversion)].[moe_school].[student_type_code] AS b
ON a.moe_srr_student_type_code = b.studenttypecode ;

/*Combine the two funding tables just in case the tertiary one is required*/

DROP TABLE IF EXISTS #funding_type_code;

SELECT *
INTO #funding_type_code
FROM (
    

    SELECT SchoolFundingTypeID AS ProviderFundingTypeID
        , SchoolFundingTypeDescription AS ProviderFundingType
    FROM [$(idimetaversion)].[moe_school].[sch_funding_type_code]
    UNION ALL
    SELECT [TertiaryFundingTypeID] AS ProviderFundingTypeID
        , TertiaryFundingTypeDescription AS ProviderFundingType
    FROM [$(idimetaversion)].[moe_school].[ter_funding_type_code]
) AS a ;

/* Part 2: Join attendance data and then exclude days when a school was in lockdown */

/*Bring in the attendance data and link it to school dates, attendance codes and provider information */
DROP TABLE IF EXISTS #moe_sch_att_daily_attend;

/*Create intermediate table of school attendance: daily attendance by person, provider, attendance code and concordance*/
/*~1 hour*/
SELECT concord.snz_uid
    , a.snz_moe_uid
    , a.moe_ssa_attendance_date AS moe_ssa_attendance_date
    , a.[moe_ssa_term_yr] AS [YEAR]
    , a.[moe_ssa_term_code] AS term
    , CAST(LEFT(a.moe_ssa_provider_code, 5) AS VARCHAR(5)) AS moe_ssa_provider_code
    , CAST(LEFT(a.moe_ssa_schl_attendance_code, 2) AS VARCHAR(2)) AS moe_ssa_schl_attendance_code
    , CAST(LEFT(c.reporting_category, 2) AS VARCHAR(2)) AS reporting_category
    , a.moe_ssa_duration
    , CASE 
		WHEN ptype.ProviderFundingTypeID IN (46000,46001,46002,46004,46005,46006,46012,46016,46022) THEN 'State School' 
		ELSE CAST(LEFT(ptype.ProviderFundingType, 60) AS VARCHAR(60))
	END AS provider_funding_type
    , COALESCE(student_type_code, 'DEFAULT RE') AS student_type_code
    , CAST(LEFT(COALESCE(student_type_desc, 'DEFAULT Regular student'), 60) AS VARCHAR(60)) AS student_type_desc
INTO #moe_sch_att_daily_attend
FROM [$(idicleanversion)].[moe_clean].[school_student_attendance] a
LEFT JOIN [IDI_Metadata].[clean_read_CLASSIFICATIONS].[moe_attendance_codes] c
ON a.moe_ssa_schl_attendance_code = c.school_code
LEFT JOIN [$(idicleanversion)].[security].[concordance] concord
ON a.snz_moe_uid = concord.snz_moe_uid
LEFT JOIN [$(idicleanversion)].[moe_clean].[provider_profile] pp
ON a.moe_ssa_provider_code = pp.moe_pp_provider_code
LEFT JOIN #funding_type_code ptype
ON pp.moe_pp_sch_fund_type_code = ptype.ProviderFundingTypeID
LEFT JOIN #moe_sch_att_roll_stud_type st
ON a.snz_moe_uid = st.snz_moe_uid
AND a.moe_ssa_provider_code = st.providernumber
AND moe_ssa_attendance_date BETWEEN schoolroll_date_start AND schoolroll_date_end
WHERE a.moe_ssa_schl_attendance_code IS NOT NULL
AND [moe_ssa_term_yr] >= 2016
AND c.reporting_category IS NOT NULL ;

/* ANGUS UPDATE : optimisation */
DROP TABLE IF EXISTS #moe_sch_att_roll_stud_type;
DROP TABLE IF EXISTS #funding_type_code;

 ALTER TABLE #moe_sch_att_daily_attend REBUILD PARTITION = ALL WITH (data_compression = page);

/* Exclude days when a school is likely to be in lockdown ~ 1.25hrs */
DROP TABLE IF EXISTS #moe_sch_att_daily_wo_lock;

SELECT snz_uid
    , snz_moe_uid
    , a.moe_ssa_attendance_date
    , YEAR
    , term
    , a.moe_ssa_provider_code
    , moe_ssa_schl_attendance_code
    , reporting_category
    , moe_ssa_duration
    , provider_funding_type
    , student_type_code
    , student_type_desc
INTO #moe_sch_att_daily_wo_lock
FROM #moe_sch_att_daily_attend AS a
LEFT JOIN #moe_sch_att_ld_hunt_sum AS b
ON a.moe_ssa_provider_code = b.moe_ssa_provider_code
AND a.moe_ssa_attendance_date = b.moe_ssa_attendance_date
WHERE lockdown ! = 1 ;

/* ANGUS UPDATE: optimisation */
DROP TABLE IF EXISTS #moe_sch_att_ld_hunt_sum;
DROP TABLE IF EXISTS #moe_sch_att_daily_attend;

 ALTER TABLE #moe_sch_att_daily_wo_lock REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE) ;

/* Part 3: Summarise the 26 codes into 4 main ones and turn time periods reported into half days */
/*School attendance by day: this table provides a breakdown of halfday codes by person, day, term and school.  ~ 1.42hrs */

DROP TABLE IF EXISTS #moe_sch_att_by_day;

WITH school_attendance_aggregated AS(
    SELECT snz_uid
        , snz_moe_uid
        , YEAR
        , term
        , moe_ssa_provider_code
        , moe_ssa_attendance_date
        , provider_funding_type
        , student_type_code
        , student_type_desc
        , SUM(time_p) AS time_p
        , SUM(time_j) AS time_j
        , SUM(time_u) AS time_u
        , SUM(time_x) AS time_x
        , SUM(tot_dur) AS tot_dur
    FROM (
        SELECT snz_uid
            , snz_moe_uid
            , YEAR
            , term
            , moe_ssa_provider_code
            , moe_ssa_attendance_date
            , provider_funding_type
            , student_type_code
            , student_type_desc
            , CASE WHEN reporting_category = 'P' THEN SUM(moe_ssa_duration) END AS time_p
            , CASE WHEN reporting_category = 'J' THEN SUM(moe_ssa_duration) END AS time_j
            , CASE WHEN reporting_category = 'U' THEN SUM(moe_ssa_duration) END AS time_u
            , CASE WHEN reporting_category = 'X' THEN SUM(moe_ssa_duration) END AS time_x
            , SUM(moe_ssa_duration) AS tot_dur
        FROM #moe_sch_att_daily_wo_lock
        GROUP BY snz_uid
            , snz_moe_uid
            , YEAR
            , term
            , moe_ssa_attendance_date
            , moe_ssa_provider_code
            , reporting_category
            , provider_funding_type
            , student_type_code
            , student_type_desc
    ) AS a
    GROUP BY snz_uid
        , snz_moe_uid
        , YEAR
        , term
        , moe_ssa_attendance_date
        , moe_ssa_provider_code
        , provider_funding_type
        , student_type_code
        , student_type_desc
)

SELECT snz_uid
    , snz_moe_uid
    , YEAR
    , term
    , moe_ssa_attendance_date
    , moe_ssa_provider_code
    , provider_funding_type
    , student_type_code
    , student_type_desc
    , CASE
		WHEN tot_dur < 120 THEN 'None'
		WHEN tot_dur < 240 THEN
			CASE
				WHEN time_p >= 120 THEN 'P'
				WHEN time_u >    0 THEN 'U'
				WHEN time_x >= 120 THEN 'X'
				ELSE 'J'
			END WHEN time_p >= 240 THEN 'PP' WHEN time_p >= 120 THEN CASE
				WHEN time_u >	 0 THEN 'PU'
				WHEN time_x >= 120 THEN 'PX'
				ELSE 'PJ'
			END WHEN time_u > 0 THEN CASE
				WHEN time_p + time_j + time_x >= 120 THEN
					CASE
						WHEN time_x >= 120 THEN 'UX'
						ELSE 'JU'
					END ELSE 'UU' END WHEN time_x >= 240 THEN 'XX' WHEN time_x >= 120 THEN 'JX' ELSE 'JJ' END AS fulldaycode
INTO #moe_sch_att_by_day
FROM school_attendance_aggregated ;

/* UPDATE ANGUS : optimisation */
DROP TABLE IF EXISTS #moe_sch_att_daily_wo_lock;

 ALTER TABLE #moe_sch_att_by_day REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

/* Expand the days so that the proportion calculation will work.  Double any day with two halfdays ~ 3.16hrs*/
DROP TABLE IF EXISTS #moe_sch_att_gran;

SELECT *
INTO #moe_sch_att_gran
FROM (
    SELECT *
        , SUBSTRING(fulldaycode,1,1) AS halfdaycode
        , CAST(moe_ssa_attendance_date AS VARCHAR(12))+ '-a' AS halfday_key
    FROM #moe_sch_att_by_day
    WHERE fulldaycode ! = 'None'
    UNION ALL
    SELECT *
        , SUBSTRING(fulldaycode,2,1) AS halfdaycode
        , CAST(moe_ssa_attendance_date AS VARCHAR(12)) + '-b' AS halfday_key
    FROM #moe_sch_att_by_day
    WHERE fulldaycode IN ('PP','PU','PX','PJ','UX','JU','UU','XX','JX','JJ')
) AS a ;  ALTER TABLE #moe_sch_att_gran REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

 /* UPDATE ANGUS : optimisation */
DROP TABLE IF EXISTS #moe_sch_att_by_day;

DROP TABLE IF EXISTS [$(targetdb)].[$(targetschema)].[$(projprefix)_sch_att_gran];

SELECT *

INTO [$(targetdb)].[$(targetschema)].[$(projprefix)_sch_att_gran]

FROM #moe_sch_att_gran
 ;