School attendance

cheryl.remington
24 March 2024

Output Tables

SQL:
[IDI_Community].cm_read_MOE_SCH_ATT_YEAR.sch_att_year_YYYYMM
[IDI_Community].cm_read_MOE_SCH_ATT_GRAN.sch_att_gran_YYYYMM
[IDI_Community].cm_read_MOE_SCH_ATT_TERM.sch_att_term_YYYYMM
SAS:
How to access a code module in the Data Lab : Read here
NOTE THAT EACH OF THE QUERIES FROM PART 2 ONWARDS TAKES ABOUT 100 MINUTES TO RUN

Context

This script creates daily and term-level attendance education spells derived from regular clean data loads that the Ministry of Education have supplied. This program only looks at attendance for schools.

Attendance is compulsory at school for all students aged from 6 to 16 years of age. Participation at school could be considered a broad indicator of engagement and absence could be an indicator of barriers (relating to economic, health, engagement or wellbeing factors) for learners or their whanau.

As part of the Education and Training Act 2020 and supported by School Attendance Rules 2025, schools are required to record the attendance of their students at least twice daily. This information is recorded in their student management systems (SMS), using a standard specification provided by the Ministry. The information in the Student Attendance table is based on files that schools submit to the Ministry, which are automatically generated by the SMS.

Note that the where clause in the “where SchoolAttendanceCode is not NULL” limits the data to just school attendance.

Data quality (source data)

The school attendance collection is monitored for data quality and has some data cleaning processes applied by the Ministry of Education. This takes the form of business rules that flag implausible or incomplete scenarios, such as missing student identifiers, dates outside of the expected range, or durations that are greater than an expected school day. Generally, these are rectified by either flagging the business rule violation at the point of entry or discarding offending records while processing.

With the introduction of Daily Attendance data collection from the beginning of term 1 in 2025, the business rules that check a school’s file as it is submitted have been enhanced to ensure that only those student records with the identified issue are removed.

Due to the extremely large amount of records in this data, these records are not otherwise closely monitored, verified or audited. The Ministry cannot guarantee that any quality issues that are picked up are fixed at source. For this reason, they recommended that we perform basic data cleaning steps, such as identifying and removing outlier learners with very large recorded durations. A limit of 450 minutes has been set, so any duration longer than this showing in the earlier data would be considered an error.

A student’s attendance at school is recorded by allocating the appropriate school attendance code. Prior to 2025 there were 26 different attendance codes. From 2025, these have been refreshed and consolidated to 15 attendance codes. While some guidelines are provided by the Ministry of Education, schools each have their own attendance policies and exercise their own judgment as to how any particular attendance/absence should be recorded. This means inevitably there will be some grey areas that are recorded differently in different schools, particularly with respect to distinguishing between justified and unjustified absences (as well as post-COVID use of the “F” code for learning off-site).

To align with the Ministry of Education definition for counting student attendance, only students with student type codes of RE (regular student), EX (exchange student), FF (Foreign fee-paying student) and FE (MFAT scholarship) should be counted. The source data where student type is stored required cleaning steps so that for each student, provider and student type, the spells are continuous (with no overlaps or duplicate dates). The official counts also only include providers that are government funded (where provider funding is state or state integrated schools and from term 1 2025, charter schools). Private schools should be excluded (note that private schools are rare so in practice this makes little actual difference). This is identified through the school funding type ‘State School’.

The Ministry is continually working to improve the quality of the data. This includes working with the SMS vendors to standardise the data collections, as well as implementing more granular business rules that effectively allow more “good” data through which will include more schools, students and attendance codes, including the unknown code of “?”.

Significant events (such as covid, cyclones etc) can cause a large amount of noise in the data so you should be careful around these events. Covid can be filtered out reasonably well because we know in advance what it will look like in the data. Other major events may be more difficult to spot and remove.

Scope and coverage (source data)

Prior to 2024, the school attendance collection was voluntary but strongly encouraged; from 2025 schools are now required to submit their daily school attendance collection. While the table includes attendance records for almost all state and state integrated schools in New Zealand, a small number do not submit attendance data to the Ministry and a slightly larger number miss some years of the collection. School attendance has been collected and reported in its current form since 2011. Over the first few years of the collection, the number of participating schools increased until it stabilised at a relatively high level around 2015. In 2023 for term 2, data was received from 2,358 schools, representing approximately 99% of enrolled students at state and state integrated schools across New Zealand (775,457).

Up until 2019, the Ministry only requested schools to send in attendance data for Term 2, which formed the basis for national reporting. Schools could also send in attendance data for other terms, but relatively few did. Since 2019, the Ministry has also requested data for Terms 1, 3 and 4. Response rates for these other terms are now broadly like the Term 2 response rates (covering 95%+ enrolled students).

The following tables are created:

Granular

Summaries

Key Business Rules

  1. The lockdowns that occurred when covid was spreading quickly had a big impact on attendance. So, as recommended by MoE, we exclude any observations that look like they occur when a school is in lockdown. The attendance code = ‘F’ when the school is in lockdown (note that this attendance code is also used sporadically on other days). To reliably identify when a school or region (TLA) is in lockdown, one of these two criteria needs to be met:

    • over 70% of observations for the school and date have an attendance code of ‘F’
    • over 35% of observations for the region and date have an attendance code of ‘F’
  2. For reproducing the official numbers, we only include state schools (which are most schools) and state integrated schools. This is done using the variable provider_funding_type. However, for completeness we leave in all schools here.

  3. Similarly the offical numbers only include the following student types:

    • Regular student
    • Exchange student
    • Foreign fee paying student
    • MFAT scholarship
      In the case where the student type is NULL we replace the NULL values with Default Regular student. Again we leave in all student types here for completeness but you can exclude them if it suits your particular use case.
  4. Entities have been included in each of the three output tables. The entity variable is called moe_ssa_provider_code. In cases where a student went to more than one school in a term, only the most common provider is included in the output table. Note that the days from all schools are retained in all the counts.

  5. The code limits the output to 2016 onwards. However, the data goes back to 2011 (almost entirely term 2).

    If you remove the line “and [moe_ssa_term_yr] >= 2016” then the code will also include the 2011 to 2015 data. However, the completeness of the data dimishes the further back you go. Anecdotally, it is likely that the schools who didn’t report in earlier years had lower attendance rates.

Inputs

Parameters

  1. [$(targetschema)]: The project schema under the target database into which the spell datasets are to be created.
  2. [$(targetprefix)]: A (short) prefix that enables you to identify the spell dataset easily in the schema and prevent overwriting any existing datasets that have the same name.
  3. [$(idicleanversion)]: The version of IDI_clean to use as the inputs
  4. [$(metaversion)]: The version of the metadata that corresponds to the input data
  5. {populationdata}: A population table - the output spell tables/views in the spell dataset will be subset only to include the subjects in this population, further filtered to the dates specified for each subject. the table is expected to contain the ID, start date and end date columns
  6. {idcolumn}: The name of ID column in the population table, that identifies a subject of the population. This column is used to filter the database objects- the output spell tables will only contain these subjects.
  7. {startdatecolumn}: The name of start date column in the population table. The events in the spell tables are subset to those intersecting with this column for that subject. Each subject may have a different start & end date.
  8. {enddatecolumn}: The name of end date column in the population table.

Note that we expect the source dataset to move from the IDI_Adhoc library to IDI_clean

Data sources

  1. [$(idicleanversion)].[clean_read_MOE].[StudentAttendance] - A very large dataset showing attendance in minutes by person as reported by schools
  2. [$(idicleanversion)].[moe_clean].[Provider_Profile] - Contains the school type and the region of the school
  3. [$(metaversion)].[clean_read_CLASSIFICATIONS].[moe_attendance_codes] - Shows how to group the 26 attendance codes into present, justify absence, unjustified absence
  4. [$(metaversion)].[clean_read_CLASSIFICATIONS].[moe_school_term_dates] - Shows the start and end date of each term. This table will need to be updated as more term dates are released.
  5. [$(idicleanversion)].[security].[concordance] - links snz_uid and snz_moe_uid
  6. [$(metaversion)].[moe_school].[sch_funding_type_code] - turns the student type codes into English versions
  7. [$(metaversion)].[moe_school].[ter_funding_type_code] - turns the provider type into English version
  8. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2011 - used to identify who is a regular student etc
  9. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2012 - used to identify who is a regular student etc
  10. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2013 - used to identify who is a regular student etc
  11. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2014 - used to identify who is a regular student etc
  12. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2015 - used to identify who is a regular student etc
  13. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2016 - used to identify who is a regular student etc
  14. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2017 - used to identify who is a regular student etc
  15. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2018 - used to identify who is a regular student etc
  16. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2019 - used to identify who is a regular student etc
  17. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2020 - used to identify who is a regular student etc
  18. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2021 - used to identify who is a regular student etc
  19. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2022 - used to identify who is a regular student etc
  20. IDI_Adhoc.clean_read_MOE.School_Roll_Return_2023 - used to identify who is a regular student etc

Outputs

[moe_sch_att_gran_$(YYYYMM)];
[moe_sch_att_term_$(YYYYMM)];
[moe_sch_att_year_$(YYYYMM)];

The main output is [moe_sch_att_year_$(YYYYMM)] which includes the following columns

Column name Description
snz_uid The STATSNZ person identifier for each individual.
year Year of attendance at school
P Student attends - is present
U Unjustified absence
J Justified absence
attendance How regular a student attends school. E.g. regular attendance, irregular absence, chronic absence

Dependencies

To be updated

Version Log

Date Version Comments
29 Nov 2022 Initial version (Todd Nicholson)
16 Sep 2023 Updated to allow the easy exclusion of different types of students (e.g. exchange students) and different types of schools (e.g. private schools) (Todd Nicholson)
18 Nov 2023 Updated to rename the output and adjust the inputs, which have now moved to the idi_clean area with new variable names
10 Mar 2024 Updated to add entities to the output tables, rename tables to be consistent with other modules, add in 2011 and 2012 rolls to allow the code to work all the way back to the start of the attendance data, add in comment about pre-2016 data.

Community of Interest

Person/Artifact/Agency Involvement
Cheryl Remington Steward
Alan Bristow SME (at MoE)
Simon Anastasiadis IDI expert
Marc De Boer IDI expert
Andrew Webber SME
Todd Nicholson Module Coder

Code

***************************************************************************************************************************/
/* PART 0: Parameters */

/*PARAMETERS
SQLCMD only (Activate by clicking Query->SQLCMD Mode)
Already in master.sql; Uncomment when running individually 
*/

:setvar targetprefix "tmp" 
:setvar idicleanversion "IDI_Clean_202406" 
:setvar targetschema "DL-MAA2020-47"
:setvar metaversion "IDI_Metadata_202406"
:setvar yyyymm "202406"

/* Assign the target database to which all the components need to be created in. */
use IDI_UserCode;
go

/* Part 1: Identify covid lockdown regions and times */

/*Start with categorising all the days into lockdown (F) or not and summarising by school.
*/

drop table if exists [#moe_sch_att_ld_hunt_sch_pre_$(yyyymm)]
go

select  
	moe_ssa_provider_code, 
	moe_ssa_attendance_date, 
	cast(avg(lockdown) as float) as school_lockdown, 
	count(moe_ssa_attendance_date) as school_count
into [#moe_sch_att_ld_hunt_sch_pre_$(yyyymm)]
from (
select a.moe_ssa_attendance_date,
	cast(left(a.moe_ssa_schl_attendance_code, 2) as varchar(2)) as moe_ssa_schl_attendance_code,
	cast(left(a.moe_ssa_provider_code, 5) as varchar(5)) as moe_ssa_provider_code,
	cast(left(a.moe_ssa_duration, 5) as varchar(5)) as moe_ssa_duration,
	case when a.moe_ssa_schl_attendance_code = 'F' then 1.0
		else 0.0
	end lockdown
from [$(idicleanversion)].[moe_clean].[school_student_attendance] a
where moe_ssa_schl_attendance_code is not NULL) as c
group by moe_ssa_provider_code, moe_ssa_attendance_date
order by moe_ssa_provider_code, moe_ssa_attendance_date
;

drop table if exists [#moe_sch_att_ld_hunt_sch_$(yyyymm)]
go

select b.moe_pp_provider_ta_auth_code, 
	a.moe_ssa_provider_code, 
	a.moe_ssa_attendance_date, 
	a.school_lockdown, 
	a.school_count 
into [#moe_sch_att_ld_hunt_sch_$(yyyymm)]
from [#moe_sch_att_ld_hunt_sch_pre_$(yyyymm)] as a
left join [$(idicleanversion)].[moe_clean].[Provider_Profile] b on a.moe_ssa_provider_code = b.moe_pp_provider_code
; 

/* Now categorising all the days into lockdown (F) or not and summarising by TLA.*/
drop table if exists [#moe_sch_att_ld_hunt_tla_$(yyyymm)]
go

select moe_pp_provider_ta_auth_code, 
	moe_ssa_attendance_date, 
	cast(avg(lockdown) as float) as tla_lockdown, 
	count(moe_ssa_attendance_date) as tla_count
into #moe_sch_att_ld_hunt_tla_$(yyyymm)
from (
select moe_ssa_attendance_date as moe_ssa_attendance_date 
	, cast(left(a.moe_ssa_schl_attendance_code, 2) as varchar(2)) as moe_ssa_schl_attendance_code
	, cast(left(a.moe_ssa_provider_code, 5) as varchar(5)) as moe_ssa_provider_code 
	, cast(left(a.moe_ssa_duration, 5) as varchar(5)) as moe_ssa_duration
	, b.moe_pp_provider_ta_auth_code
	, case when moe_ssa_schl_attendance_code = 'F' then 1.0
		else 0.0
	end lockdown

from [$(idicleanversion)].[moe_clean].[school_student_attendance] a
left join [$(idicleanversion)].[moe_clean].[Provider_Profile] b
on a.moe_ssa_provider_code = b.moe_pp_provider_code
where moe_ssa_schl_attendance_code is not NULL) as c
group by moe_pp_provider_ta_auth_code, moe_ssa_attendance_date
order by moe_pp_provider_ta_auth_code, moe_ssa_attendance_date
;

/* Put it together and summarise into a binary indicator for lockdown by school and date*/
/* The thresholds here are based on analysis of the distributions during known lockdown periods 
	and known non-lockdown periods.  */
drop table if exists [#moe_sch_att_ld_hunt_sum_$(yyyymm)]
;

select a.moe_pp_provider_ta_auth_code, 
	a.moe_ssa_provider_code, 
	a.moe_ssa_attendance_date, 
	a.school_lockdown, 
	a.school_count,
	b.tla_lockdown, 
	b.tla_count,
	case
	 	when a.moe_ssa_attendance_date > '2020-03-24' AND b.TLA_lockdown > 0.35 then 1
	 	when a.moe_ssa_attendance_date > '2020-03-24' AND a.school_lockdown > 0.7 then 1
	 	else 0
	end lockdown
into [#moe_sch_att_ld_hunt_sum_$(yyyymm)]
from [#moe_sch_att_ld_hunt_sch_$(yyyymm)] as a
left join [#moe_sch_att_ld_hunt_tla_$(yyyymm)] as b
on a.moe_pp_provider_ta_auth_code = b.moe_pp_provider_ta_auth_code and a.moe_ssa_attendance_date = b.moe_ssa_attendance_date
;

/* Put together all school roll information from all school roll return year tables to identify students classified as a regular student, exchange student etc. */
drop table if exists [#moe_sch_att_srr_$(yyyymm)];

create table [#moe_sch_att_srr_$(yyyymm)] (
	snz_moe_uid int,
	collectiondate date,
	studenttype varchar(25),
	providernumber int,
	FirstAttendance date,
	LastAttendance date)

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2011]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2012]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2013]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2014]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2015]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2016]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2017]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2018]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2019]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2020]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2021]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2022]

insert into [#moe_sch_att_srr_$(yyyymm)]
select distinct snz_moe_uid, collectiondate, studenttype, providernumber, FirstAttendance, LastAttendance
from [IDI_Adhoc].[clean_read_MOE].[School_Roll_Return_2023]

/* identify the start and end time period for each student enrolment type. there are multiple rolls per student as snapshot at each collection date is collected. 
students can also have different student type descriptions over time, so we identify the start and end dates at each collection point*/ 

/* joining student type description to the school roll return data. identify the earliest and latest collection date and attendance dates, by student, student type and provider. 
a student may have different student types per provider and a student may have different student type codes for different schools they attend. The student first attendance at a 
provider appears to be the first ever attendance for a given provider. e.g. if a student leaves school A to go to school B, but then returns to school A. The date of first 
attendance for school A for when the student returns, is the first date of attendance EVER at school A. */

drop table if exists [#moe_sch_att_stud_type_$(yyyymm)]
;

select 
	sr.snz_moe_uid
	,sr.studenttype
	,left(st.StudentTypeName,50) as student_type_code
	,sr.providernumber
	,sr.firstattendance
	,min(sr.collectiondate) as min_collectiondate
	,max(sr.collectiondate) as max_collectiondate
	,max(sr.lastattendance) as max_lastattendance
	
into [#moe_sch_att_stud_type_$(yyyymm)]
from [#moe_sch_att_srr_$(yyyymm)] as sr
	left join [$(metaversion)].[moe_school].[student_type23_code] st on sr.studenttype = st.[StudentTypeCode]

group by
	sr.snz_moe_uid
	,sr.studenttype
	,st.StudentTypeName 
	,sr.providernumber
	,sr.firstattendance 
;

/* if there are multiple student types per student, take the minimum and maximum collection date of the previous record (for the same provider). this will be used to
identify start and end dates for records inbetween the first attendance and last attendance records. this will help to identify duplicate periods for the same provider
that may overlap for a given provider */
drop table if exists [#moe_sch_att_stud_lag_$(yyyymm)]

select
	snz_moe_uid
	,studenttype
	,student_type_code
	,providernumber
	,min_collectiondate
	,max_collectiondate
	,firstattendance
	,max_lastattendance
	,lag(min_collectiondate) over(partition by snz_moe_uid, providernumber order by snz_moe_uid, providernumber, min_collectiondate) as previous_min_collectiondate
	,lag(max_collectiondate) over(partition by snz_moe_uid, providernumber order by snz_moe_uid, providernumber, min_collectiondate) as previous_max_collectiondate
into [#moe_sch_att_stud_lag_$(yyyymm)]
from
	[#moe_sch_att_stud_type_$(yyyymm)]
;

/* identify duplicate periods for the same provider that overlap each other */ 
drop table if exists [#moe_sch_att_stud_dups_$(yyyymm)]

select
	snz_moe_uid
	,studenttype
	,student_type_code
	,providernumber
	,min_collectiondate
	,max_collectiondate
	,firstattendance
	,max_lastattendance
	,previous_min_collectiondate
	,previous_max_collectiondate
	,case when min_collectiondate >= previous_min_collectiondate and max_collectiondate <= previous_max_collectiondate then 1 else 0 end as dup_ind
into [#moe_sch_att_stud_dups_$(yyyymm)]
from [#moe_sch_att_stud_lag_$(yyyymm)]
;

drop table if exists [#moe_sch_att_stud_type_cln_$(yyyymm)]
;
/* remove records that overlap / duplicate. calculate the previous collection date once duplicates are removed. */
	
	select
	snz_moe_uid
	,studenttype
	,student_type_code
	,providernumber
	,min_collectiondate
	,max_collectiondate
	,firstattendance
	,max_lastattendance
	,lag(firstattendance) over (partition by snz_moe_uid, providernumber order by snz_moe_uid, min_collectiondate) as lag_firstattendance
	,lag(max_collectiondate) over (partition by snz_moe_uid, providernumber order by snz_moe_uid, providernumber, max_collectiondate) as lag_max_collectiondate
	into [#moe_sch_att_stud_type_cln_$(yyyymm)]
	from [#moe_sch_att_stud_dups_$(yyyymm)]
	where
		dup_ind = 0
;

/* calculate a new variable where if for a given student and provider, if there is a change in student type, take the latest collection date of the previous record.
this allows the start and end dates for a student type to be continuous. however, the end date of one record may be the same date as the start date of the next record. */
drop table if exists [#moe_sch_att_stud_type_stt_$(yyyymm)]
;

select
	*
	,case 
		when firstattendance = lag_firstattendance then lag_max_collectiondate 
		else firstattendance
	end as firstattendance2
into [#moe_sch_att_stud_type_stt_$(yyyymm)]
from
	[#moe_sch_att_stud_type_cln_$(yyyymm)]
;

/* calculate the last, first attendance of the previous record of a given student and provider. */

drop table if exists [#moe_sch_att_stud_type_end_$(yyyymm)]
;

select
	*
	,lag(firstattendance2) over (partition by snz_moe_uid, providernumber order by snz_moe_uid, firstattendance2 desc) as end_firstattendance
into [#moe_sch_att_stud_type_end_$(yyyymm)]
from
	[#moe_sch_att_stud_type_stt_$(yyyymm)]
;

/* using the calculated last, first attendance of the previous record created in the query before, create a new last attendance date. this is to fill in the end dates
of each student type for a student and provider (as end dates only exist generally for the last collection for that student at the particular provider). if there are 
no last attendance dates for a student and provider, assume the student is still attending at the provider and give a default end date of 31-12-9999. */
drop table if exists [#moe_sch_att_roll_stud_type_$(yyyymm)]
;

select
	snz_moe_uid
	,studenttype as student_type_code
	,student_type_code as student_type_desc
	,providernumber as providernumber
	,firstattendance2 as schoolroll_date_start
	,coalesce(case 
		when max_lastattendance is null and firstattendance <> end_firstattendance then dateadd(day,-1,end_firstattendance)
		when max_lastattendance is null and firstattendance = end_firstattendance then max_collectiondate
		else max_lastattendance
	end, '9999-12-31') as schoolroll_date_end
into [#moe_sch_att_roll_stud_type_$(yyyymm)]
from
	[#moe_sch_att_stud_type_end_$(yyyymm)]
;

/*Combine the two funding tables just in case the tertiary one is required*/
drop table if exists #funding_type_code

select *
into #funding_type_code
from (

select SchoolFundingTypeID as ProviderFundingTypeID, SchoolFundingTypeDescription as ProviderFundingType
from [$(metaversion)].[moe_school].[sch_funding_type_code]
union all 
select [TertiaryFundingTypeID] as ProviderFundingTypeID, TertiaryFundingTypeDescription as ProviderFundingType
from [$(metaversion)].[moe_school].[ter_funding_type_code]
) as a

/* Part 2: Join attendance data and then exclude days when a school was in lockdown */

/*Bring in the attendance data and link it to school dates, attendance codes and provider information */
drop table if exists [#moe_sch_att_daily_attend_$(yyyymm)]
;

/*Create intermediate table of school attendance: daily attendance by person, provider, attendance code and concordance*/
/*~1 hour*/
select
	concord.snz_uid
	,a.snz_moe_uid
	,a.moe_ssa_attendance_date as moe_ssa_attendance_date
	,a.[moe_ssa_term_yr] as [year]
	,a.[moe_ssa_term_code] as term
	,cast(left(a.moe_ssa_provider_code, 5) as varchar(5)) as moe_ssa_provider_code 
	,cast(left(a.moe_ssa_schl_attendance_code, 2) as varchar(2)) as moe_ssa_schl_attendance_code
	,cast(left(c.reporting_category, 2) as varchar(2)) as reporting_category
	,a.moe_ssa_duration 
	,case 
		when ptype.ProviderFundingTypeID in (46000,46001,46002,46004,46005,46006,46012,46016,46022) then 'State School' 
		else cast(left(ptype.ProviderFundingType, 60) as varchar(60))
	end as provider_funding_type
	,coalesce(student_type_code, 'Default RE') as student_type_code
	,cast(left(coalesce(student_type_desc, 'Default Regular student'), 60) as varchar(60)) as student_type_desc
into [#moe_sch_att_daily_attend_$(yyyymm)]
from [$(idicleanversion)].[moe_clean].[school_student_attendance] a 

left join [IDI_Metadata].[clean_read_CLASSIFICATIONS].[moe_attendance_codes] c on a.moe_ssa_schl_attendance_code = c.school_code
left join [$(idicleanversion)].[security].[concordance] concord on  a.snz_moe_uid = concord.snz_moe_uid
left join [$(idicleanversion)].[MOE_clean].[Provider_Profile] pp on a.moe_ssa_provider_code = pp.moe_pp_provider_code
left join #funding_type_code ptype on pp.moe_pp_sch_fund_type_code = ptype.ProviderFundingTypeID
left join [#moe_sch_att_roll_stud_type_$(yyyymm)] st on a.snz_moe_uid = st.snz_moe_uid and a.moe_ssa_provider_code = st.providernumber and moe_ssa_attendance_date between schoolroll_date_start and schoolroll_date_end
where a.moe_ssa_schl_attendance_code is not null 
and [moe_ssa_term_yr] >= 2016
and c.reporting_category is not null
;

alter table [#moe_sch_att_daily_attend_$(yyyymm)] rebuild partition = all with (data_compression = page)

/* Exclude days when a school is likely to be in lockdown ~ 1.25hrs */
drop table if exists [IDI_Sandpit].[$(targetschema)].[moe_sch_att_daily_wo_lock_$(yyyymm)]
;

select snz_uid
	, snz_moe_uid
	, a.moe_ssa_attendance_date
	, year
	, term
	, a.moe_ssa_provider_code
	, moe_ssa_schl_attendance_code
	, reporting_category
	, moe_ssa_duration
	, provider_funding_type
	, student_type_code
	, student_type_desc
into [IDI_Sandpit].[$(targetschema)].[moe_sch_att_daily_wo_lock_$(yyyymm)]
from [#moe_sch_att_daily_attend_$(yyyymm)] as a
left join [#moe_sch_att_ld_hunt_sum_$(yyyymm)] as b
on a.moe_ssa_provider_code = b.moe_ssa_provider_code and a.moe_ssa_attendance_date = b.moe_ssa_attendance_date
where lockdown != 1
;

alter table [IDI_Sandpit].[$(targetschema)].[moe_sch_att_daily_wo_lock_$(yyyymm)] rebuild partition = ALL WITH (DATA_COMPRESSION = PAGE) 
;

/* Part 3: Summarise the 26 codes into 4 main ones and turn time periods reported into half days */
/*School attendance by day: this table provides a breakdown of halfday codes by person, day, term and school.  ~ 1.42hrs */

drop table if exists [#moe_sch_att_by_day_$(yyyymm)];
go

with school_attendance_aggregated as (
	select snz_uid,
		snz_moe_uid,
		year, 
		term,
		moe_ssa_provider_code,
		moe_ssa_attendance_date,
		provider_funding_type,
		student_type_code,
		student_type_desc,
		sum(time_p) as time_p,
		sum(time_j) as time_j,
		sum(time_u) as time_u,
		sum(time_x) as time_x,
		sum(tot_dur) as tot_dur
	from (
	select 
		snz_uid,
		snz_moe_uid,
		year, 
		term,
		moe_ssa_provider_code,
		moe_ssa_attendance_date,
		provider_funding_type,
		student_type_code,
		student_type_desc,
		case when reporting_category = 'P' then sum(moe_ssa_duration) end as time_p, 
		case when reporting_category = 'J' then sum(moe_ssa_duration) end as time_j,
		case when reporting_category = 'U' then sum(moe_ssa_duration) end as time_u,
		case when reporting_category = 'X' then sum(moe_ssa_duration) end as time_x,
		sum(moe_ssa_duration) as tot_dur
	from 
		[IDI_Sandpit].[$(targetschema)].[moe_sch_att_daily_wo_lock_$(yyyymm)]
	group by 	
		snz_uid,
		snz_moe_uid,
		year, 
		term,
		moe_ssa_attendance_date,
		moe_ssa_provider_code,
		reporting_category,
		provider_funding_type,
		student_type_code,
		student_type_desc
	) as a
	group by snz_uid,
		snz_moe_uid,
		year, 
		term,
		moe_ssa_attendance_date,
		moe_ssa_provider_code,
		provider_funding_type,
		student_type_code,
		student_type_desc)

select
	snz_uid,
	snz_moe_uid,
	year, 
	term,	
	moe_ssa_attendance_date,
	moe_ssa_provider_code,
	provider_funding_type,
	student_type_code,
	student_type_desc,
	case
		when tot_dur < 120 then 'None'
		when tot_dur < 240 then
			case
				when time_p >= 120 then 'P'
				when time_u >    0 then 'U'
				when time_x >= 120 then 'X'
				else 'J'
			end
		when time_p >= 240 then 'PP'
		when time_p >= 120 then
			case
				when time_u >	 0 then 'PU'
				when time_x >= 120 then 'PX'
				else 'PJ'
			end
		when time_u > 0 then
			case
				when time_p + time_j + time_x >= 120 then
					case
						when time_x >= 120 then 'UX'
						else 'JU'
					end
				else 'UU'
			end
		when time_x >= 240 then 'XX'
		when time_x >= 120 then 'JX'
		else 'JJ'
	end as fulldaycode
	into [#moe_sch_att_by_day_$(yyyymm)]
from	
	school_attendance_aggregated
;
go

alter table [#moe_sch_att_by_day_$(yyyymm)] rebuild partition = ALL WITH (DATA_COMPRESSION = PAGE)

/* Expand the days so that the proportion calculation will work.  Double any day with two halfdays ~ 3.16hrs*/
drop table if exists [#moe_sch_att_gran_$(yyyymm)];
go

select *
into [#moe_sch_att_gran_$(yyyymm)]
from 
(select *, substring(fulldaycode,1,1) as halfdaycode
from [#moe_sch_att_by_day_$(yyyymm)]
where fulldaycode != 'None'
union all
select *, substring(fulldaycode,2,1) as halfdaycode
from [#moe_sch_att_by_day_$(yyyymm)]
where fulldaycode in ('PP','PU','PX','PJ','UX','JU','UU','XX','JX','JJ')) as a
;

alter table [#moe_sch_att_gran_$(yyyymm)] rebuild partition = ALL WITH (DATA_COMPRESSION = PAGE)

/* Part 4: Person summary - Summarise by term and categorise into Regular attendance, Irregular absence etc  */

/*moe_sch_att_term_$(yyyymm): this table provides a breakdown of halfday code proportions by person, year and term*/

drop table if exists [#moe_sch_att_pre_term_$(yyyymm)];
go

with cte_count as (
	select
		snz_uid,
		year,
		term,
		halfdaycode,
		count(*) as count
	from
		[#moe_sch_att_gran_$(yyyymm)]
	group by		
		snz_uid,
		year,
		term,
		halfdaycode
),
cte_prop as (
	select
		cnt.*,
		tot.count as total,
		cnt.count / (tot.count * 1.0) as prop,
		1 as one
	from
		cte_count cnt
	inner join
		(
			select 
				snz_uid,
				year,
				term,
				sum(count) as count
			from
				cte_count
			group by
				snz_uid,
				year,
				term
				
		) tot
	on
		cnt.snz_uid = tot.snz_uid and
		cnt.year = tot.year and
		cnt.term = tot.term
)
select 
	snz_uid,
	year,
	term,
	coalesce(sum(case when halfdaycode = 'P' then prop end), 0) as P,
	coalesce(sum(case when halfdaycode = 'U' then prop end), 0) as U,
/*	coalesce(sum(case when halfdaycode = 'X' then prop end), 0) as X,  */
	coalesce(sum(case when halfdaycode = 'J' then prop end), 0) as J,
	case 
		when coalesce(sum(case when halfdaycode = 'P' then prop end), 0) > 0.9 then 'Regular Attendance'
		when coalesce(sum(case when halfdaycode = 'P' then prop end), 0) > 0.8 then 'Irregular Absence'
		when coalesce(sum(case when halfdaycode = 'P' then prop end), 0) > 0.7 then 'Moderate Absence'
		else 'Chronic Absence'
	end as attendance
into [#moe_sch_att_pre_term_$(yyyymm)]
from 
	cte_prop 
group by
	snz_uid,
	year,
	term
;
GO

/*Add in the entities to make output checking easier.  We will use the rule that only the most common provider 
is retained.  This will slightly undercount the number of entities but allows the output table to remain as 
one row per snz_uid, year, term*/

drop table if exists [#moe_sch_att_term_ent_$(yyyymm)];
go

with cte as (
select
		snz_uid,
		year,
		term,
		moe_ssa_provider_code,
		row_number() over (partition by snz_uid, year, term order by count(moe_ssa_provider_code) desc) rn
	from
		[#moe_sch_att_gran_$(yyyymm)]
	group by		
		snz_uid,
		year,
		term,
		moe_ssa_provider_code)

select snz_uid,
		year,
		term,
		moe_ssa_provider_code
into [#moe_sch_att_term_ent_$(yyyymm)]
from cte 
where rn = 1

/*Join the attendance counts to the entities*/
drop table if exists [#moe_sch_att_term_$(yyyymm)];
go

select
		a.snz_uid,
		a.year,
		a.term,
		a.P,
		a.U,
		a.J,
		a.attendance,
		b.moe_ssa_provider_code
into [#moe_sch_att_term_$(yyyymm)]
from [#moe_sch_att_pre_term_$(yyyymm)] as a
left join [#moe_sch_att_term_ent_$(yyyymm)] as b
on a.snz_uid = b.snz_uid and a.year = b.year and a.term = b.term

alter table [#moe_sch_att_term_$(yyyymm)] rebuild partition = ALL WITH (DATA_COMPRESSION = PAGE)

/*moe_sch_att_year_$(yyyymm): this table provides a breakdown of halfday code proportions by person and year*/

drop table if exists [#moe_sch_att_pre_year_$(yyyymm)];
go

with cte_count as (
	select
		snz_uid,
		year,
		halfdaycode,
		count(*) as count
	from
		[#moe_sch_att_gran_$(yyyymm)]
	group by		
		snz_uid,
		year,
		halfdaycode
),
cte_prop as (
	select
		cnt.*,
		tot.count as total,
		cnt.count / (tot.count * 1.0) as prop,
		1 as one
	from
		cte_count cnt
	inner join
		(
			select 
				snz_uid,
				year,
				sum(count) as count
			from
				cte_count
			group by
				snz_uid,
				year
				
		) tot
	on
		cnt.snz_uid = tot.snz_uid and
		cnt.year = tot.year
)
select 
	snz_uid,
	year,
	coalesce(sum(case when halfdaycode = 'P' then prop end), 0) as P,
	coalesce(sum(case when halfdaycode = 'U' then prop end), 0) as U,
/*	coalesce(sum(case when halfdaycode = 'X' then prop end), 0) as X,  */
	coalesce(sum(case when halfdaycode = 'J' then prop end), 0) as J,
	case 
		when coalesce(sum(case when halfdaycode = 'P' then prop end), 0) > 0.9 then 'Regular Attendance'
		when coalesce(sum(case when halfdaycode = 'P' then prop end), 0) > 0.8 then 'Irregular Absence'
		when coalesce(sum(case when halfdaycode = 'P' then prop end), 0) > 0.7 then 'Moderate Absence'
		else 'Chronic Absence'
	end as attendance
into [#moe_sch_att_pre_year_$(yyyymm)]
from 
	cte_prop 
group by
	snz_uid,
	year
;
GO

alter table [#moe_sch_att_pre_year_$(yyyymm)] rebuild partition = ALL WITH (DATA_COMPRESSION = PAGE)

/*Add in the entities to make output checking easier.  We will use the rule that only the most common provider 
is retained.  This will slightly undercount the number of entities but allows the output table to remain as 
one row per snz_uid and year*/

drop table if exists [#moe_sch_att_year_ent_$(yyyymm)];
go

with cte as (
select
		snz_uid,
		year,
		moe_ssa_provider_code,
		row_number() over (partition by snz_uid, year order by count(moe_ssa_provider_code) desc) rn
	from
		[#moe_sch_att_gran_$(yyyymm)]
	group by		
		snz_uid,
		year,
		moe_ssa_provider_code)

select snz_uid,
		year,
		moe_ssa_provider_code
into [#moe_sch_att_year_ent_$(yyyymm)]
from cte 
where rn = 1

/*Join the attendance counts to the entities*/
drop table if exists [#moe_sch_att_year_$(yyyymm)];
go

select
		a.snz_uid,
		a.year,
		a.P,
		a.U,
		a.J,
		a.attendance,
		b.moe_ssa_provider_code
into [#moe_sch_att_year_$(yyyymm)]
from [#moe_sch_att_pre_year_$(yyyymm)] as a
left join [#moe_sch_att_year_ent_$(yyyymm)] as b
on a.snz_uid = b.snz_uid and a.year = b.year

alter table [#moe_sch_att_year_$(yyyymm)] rebuild partition = ALL WITH (DATA_COMPRESSION = PAGE)

drop table if exists [IDI_Sandpit].[$(targetschema)].[moe_sch_att_term_hash_$(yyyymm)]

select *
into [IDI_Sandpit].[$(targetschema)].[moe_sch_att_term_hash_$(yyyymm)]
from [#moe_sch_att_term_$(yyyymm)]

drop table if exists [IDI_Sandpit].[$(targetschema)].[moe_sch_att_year_hash_$(yyyymm)]

select *
into [IDI_Sandpit].[$(targetschema)].[moe_sch_att_year_hash_$(yyyymm)]
from [#moe_sch_att_year_$(yyyymm)]

drop table if exists [IDI_Sandpit].[$(targetschema)].[moe_sch_att_gran_hash_$(yyyymm)]

select *
into [IDI_Sandpit].[$(targetschema)].[moe_sch_att_gran_hash_$(yyyymm)]
from [#moe_sch_att_gran_$(yyyymm)]