School Enrolment Spells

angus.prain
27 June 2025

Module Output

**SQL: [IDI_Community].[edu_school_enrolment_spells].[school_enrolment_spells_YYYYMM]
**SAS: libname cm_dlr_ODBC dsn=idi_community_srvprd schema=edu_school_enrolment_spells; proc print data = cm_dlr.school_enrolment_spells_YYYYMM ;run ;
How to access a code module in the Data Lab:Read here

Purpose

This code creates spells of school enrolment for ākonga enrolled in school in New Zealand. There are two output tables associated with this module corresponding to two use cases:

  1. a table containing information about which school each student is enrolled in (enrolment spells).
  2. a table containing information about whether a student is enrolled in any school (enrolment status).

This script is responsible for the first table.

Key Concepts

ENROL vs School Roll Return data sources

ENROL is a register of student enrolments. Schools are legally required to update enrolment records in ENROL within 5 school days as students enrol, change schools or leave the school system. The source of the student rolls in the school directory come from ENROL. This is updated on a monthly basis (during the school year). ENROL is also the source data for this module.

School Roll Return (SRR) is another source of roll information. There are four school roll returns every year. All schools complete a snapshot of roll information as at 1 March and 1 July. The Ministry of Education also collects 1 June and 1 September rolls from state and state-integrated schools with students in funding year levels 9 to 15. Charter Schools also complete the 1 June and 1 September returns. Most schools use a student management system (SMS) to manage their student records, and to complete the roll returns for the Ministry. SRR data is reported on the MoE education counts public facing page.

Enrolments vs Attendance

If a student is attending a school, then the school has a legal requirement to enrol that student in ENROL. Please see this public facing Minsitry page.

A child can start school when they turn 5, but they are not legally required to start until they turn 6. Once they are in enrolled in school they are legally required to attend until they are 16. If a student does not attend school while they are enrolled, this is dealt with based on the school’s internal policies. Once the student has been absent for 20 consecutive school days (unless they have been told that the student’s absence is only temporary) the school should then withdraw them with their last date of attendance at the school. Please see this public facing Ministry page

There are a small number of situations where a student can be enrolled at multiple schools simultaneously. These include dual enrolments with Te Kura, or off-site enrolments with Health Schools or Alternative Education Providers.

Comparison against other sources

At a given snapshot of time, the enrolment spells module can be used to re-create school roll counts. Snapshot roll counts can be summarised from the table [IDI_Clean_].[moe_clean].[school_roll_return] at two different dates every year: March 1st and July 1st. Mismatches can be investigated at the student level.

Public facing school roll summaries, derived from the snapshot School Roll Return, are also available from MoE’s EdCounts page. These public facing summaries do not always match those obtained from the [moe_clean].[school_roll_return] table.

The module tables are derived from the source table [IDI_Clean_].[moe_clean].[student_enrol] through a cleaning process. A comparison to this source can be used to understand the extent of the cleaning process.

References & Contacts

  1. James Swindells (MoE)
  2. Cheryl Remington (MoE)
  3. Angus Prain (Stats)

Development team

Domain Agency Person
Lead SME SIA Andrew Webber
Lead SME UC Peng Sun
IDI analysts Stats Angus Prain
Data Supply MoE ENROL
Policy/Operational Experts MoE Chelsea Shaw

Module Business Rules

The enrolment_spells output table consists of unique combinations of (student, spell start, spell end, school) (which are snz_uid, startdate, enddate, schoolnbr as column names in the table), with overlapping spells reduced to only those overlaps which cannot be disentangled using attendance data prioritisation.

The script uses the following business rules:

  1. Short spells are removed (length of shortest spell determined by user parameter).
  2. Missing spell enddates are imputed (imputed date determined by user parameter, intended to be equal to last attendance date in [moe_clean].[student_attendance] table).
  3. Same startdate, same enddate, same school: remove duplicate rows.
  4. Same startdate, different enddate, same school: take longest spell.
  5. Different startdate, same enddate, same school: take longest spell.

We also apply the following general business rule to a range of different cases of overlapped enrolment spells:

General Rule. When a student is enrolled at more than one school simultaneously, the school that the student attends for more days in the overlapping period is prioritised and other spells have the overlapping region snipped away, leaving shorter, non-overlapping spells. When attendance data is not available for comparison due to historical missing data or school-specific reasons, no prioritisation is carried out and overlapping spells remain in the table.

The following 4 business rules are special cases of the general rule:

  1. Same startdate, same enddate, different school: use attendance data to break overlap
    a. if either school has global null attendance*, no comparison is possible, both spells left
    b. if both schools have global non-null attendance**, select school with highest attendance count in overlap.
  2. Same startdate, different enddate, different school: use attendance data to break overlap
    a. if either school has global null attendance*, no comparison is possible, both spells left
    b. if both schools have global non-null attendance**, select school with highest attendance count in overlap
    c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping.
  3. Different startdate, same enddate, different school: use attendance data to break overlap
    a. if either school has global null attendance*, no comparison is possible, both spells left
    b. if both schools have global non-null attendance**, select school with highest attendance count in overlap
    c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping.
  4. Overlapping spells with no matching dates: use attendance data to break overlap
    a. if either school has global null attendance*, no comparison is possible, both spells left
    b. if both schools have global non-null attendance**, select school with highest attendance count in overlap
    c. Snip overlapped portion of de-prioritised spell off, leaving the on-overlapped bit(s).

*Global null attendance is a property that a school can have in a given year when the school has zero attendance records in the table [moe_clean].[student_attendance] in that year.
**Global non-null attendance is where a school has some attendance records in the table [moe_clean].[student_attendance] in a given year.

Note: Attendance data collection is terms 1-4 from 2019 onwards, with only Term 2 attendance recorded from 2018 and older. This results in some overlaps before 2019 being difficult to prioritise if they do not occur in Term 2. In these cases, a school for which the student has attended at any point, even outside the overlapped region (zero overlapped attendance) is prioritised over a school for which the student has never attended (NULL overlapped attendance). When both schools have zero overlapped attendance or both have null overlapped attendance, then the overlaps are broken using the rule of thumb that enddates are less reliable than startdates - in this case we snip the enddate of the earlier spell to match the startdate of the later starting spell. If this situation occurs in the special case of an inner / outer spell pair, we prioritise the outer spell, removing the inner spell from the table.

Open Issues/Comments

  1. Missing enddates that do not correspond to students enrolled at extraction (source data issue)
  2. Poorly joined snz_uid (IDI linking issue)
  3. Absence of a longitudinal school profile metadata (proposals exist to address this non-critical issue)
  4. We have not used the [te_kura] datasets for enrolments in Te Kura (Correspondence School) (future implementation is possible)
  5. Student attendance data is used in a critical way in this script, and therefore its quality and coverage plays a key role in the overall quality and coverage of the module. Attendance data coverage is currently (June 2025) as follows:
    a. Term 2 data only from 2011 until 2018
    b. Terms 1 through to 4 from 2019 until term 3 2024
    c. Max attendance date is 2024-09-27
  6. The [moe_clean].[student_enrol] table goes back to 1985 does not have any good coverage until 2005.

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.
  5. {max_attendance_date}: the max date in student_attendance table

Dependencies

  1. [moe_clean].[student_enrol]
  2. [moe_clean].[student_attendance]
  3. [moe_clean].[provider_profile]

Outputs

{targetdb}.{targetschema}.{projprefix}_school_enrolment_spells

Variable Descriptions

Column name Description
snz_uid Stats nz unique identifier
startdate Spell start date
enddate Spell end date
enrol_days Length of spell in days
schoolnbr School profile number
unique_spell_reason Description of any cleaning that has occurred on the spell or on an overlapping adjacent spell
special_flag Flag indicating Te Kra and Homeschool
start_level School level of the student at spell start date
spell_end_reason Description of why the spell ended - this derived from SpellEndReason in the [student_enrol] table
this_spell_inner Binary flag indicating that the spell is entirely containes in a larger (outer) spell
overlap_with_next_startdate Binary flag indicating that the enddate overlaps with the startdate of an adjacent spell (it has not been deprioritised)
final_spell The last spell in the table for the student (ordered by startdate, enddate)
enrolment_gap The spell is a period of non-enrolment
sch_enddate_imputed Binary flag indicating that the spell enddate was NULL in the source data
summer_break Binary flag indicating that the spell is a gap, but it’s a gap over the summer break

Module Version & Change History

Date Version Comments
4 June 2025 Initial Version based on specifications from Commissioning document.

Code

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
:setvar idimetaversion "{idimetaversion}"
:setvar max_attendance_date "{max_attendance_date}"
/*
	:::::::::::::::::::::
	0. GLOBAL SETUP
	:::::::::::::::::::::
*/

/*
	::::::::::::::::::::::::::::
	0.1. GLOBAL
	::::::::::::::::::::::::::::
	Some parameters that get used later on
*/

DROP TABLE IF EXISTS #global;
SELECT start_year = 2018
    , end_year = 2025
    , max_summer_break = 60	/* maximum gap over summer that we allowe without inducing non_enrolment */
    , short = 4				/* minimum length of shortest spells pulled through into the base table before processing */
    , max_attendance_date = '2024-09-27'
    , lowest_age = 4
    , highest_age = 21		/* ORS students may be enrolled up to age 21 */	
INTO #global;

GO
 PRINT('#global TABLE generated');

/*
	:::::::::::::::::::::::::::::::
	SCHOOL METADATA 
	:::::::::::::::::::::::::::::::
	for convenience a useful table that has all the relevant school metadata

		drop table if exists #school_metadata;
		select *
		into #school_metadata
		from [IDI_Sandpit].[DL-MAA2020-47].[school_metadata];

*/

/*
	:::::::::::::::::::::::::::::::::::
	STUDENT POPULATION SAMPLE DEFINITION
	:::::::::::::::::::::::::::::::::::
	This is a tmp student sample for testing purposes
	The sample population consists of ~20k students who were enrolled 
	on June 15 of the year given by global.start_year (defined above)

		drop table if exists #tmp_student_sample;
			select 
			distinct top 20000 
			snz_uid	
			into #tmp_student_sample
			from $(idicleanversion).[moe_clean].[student_enrol]
			join #global g
			on 1 = 1
			/* everyone with a spell overlapping with June 15 on start year */	
WHERE DATEFROMPARTS(g.start_year, 6, 15) BETWEEN moe_esi_start_date AND ISNULL(moe_esi_end_date, g.max_attendance_date); 

GO
 PRINT('#tmp_student_sample TABLE generated')
 */

/*	
	:::::::::::::::::::::::::::::::::::::::::
	0.2 Schools that are closed
	:::::::::::::::::::::::::::::::::::::::::
*/

DROP TABLE IF EXISTS #closed_schools;
SELECT moe_pp_provider_code
    , moe_pp_provider_close_date
INTO #closed_schools
FROM $(idicleanversion).[moe_clean].[provider_profile]
WHERE moe_pp_provider_close_date IS NOT NULL;

GO
 PRINT('#closed_schools TABLE generated');

/*
	::::::::::::::::::::::::::::
	1. STUDENT ENROLMENT DATA 
	::::::::::::::::::::::::::::
	Note home schooling episodes are included in these data, provider code = 972 (labelled as 'homeschool' in special_flag in next section)
	Note also that Te Kura spells are included, provider code = 498 (labelled as 'correspondance' in special_flag in next section)
*/

/*  
	::::::::::::::::::::::::::::::::::::::::::::::
	1.1. Get the (mostly) raw records of students
	::::::::::::::::::::::::::::::::::::::::::::::
	Spells are required to fit in the appropriate window of dates
	Short spells are removed

*/

DROP TABLE IF EXISTS #student_enrol_all_raw;
SELECT a.snz_uid
    , a.snz_moe_uid
    , moe_esi_start_date AS startdate
	  /* if enddate is missing use extraction date*/	  
      /* ,isnull(moe_esi_end_date, moe_esi_extrtn_date) as enddate */
	  /* change in code to map to max_attendance date only, for comparison  */
    , ISNULL(moe_esi_end_date, g.max_attendance_date) AS enddate
    , moe_esi_leave_rsn_code
    , DATEDIFF(d,moe_esi_start_date,ISNULL(moe_esi_end_date,moe_esi_extrtn_date)) AS enrol_days
    , moe_esi_provider_code AS schoolnbr
    , moe_esi_entry_year_lvl_nbr AS start_level
    , CASE WHEN moe_esi_end_date IS NULL THEN 1 ELSE 0 END AS sch_enddate_imputed
	  /* include a row number for selecting the record if duplicate rows with same school (threee types of duplicate) */
    , ROW_NUMBER() OVER(PARTITION BY a.snz_uid, moe_esi_start_date, moe_esi_end_date, moe_esi_provider_code ORDER BY moe_esi_entry_year_lvl_nbr) AS rn_both	/*same startdate and enddate: choose randomly*/
    , ROW_NUMBER() OVER(PARTITION BY a.snz_uid, moe_esi_start_date, moe_esi_provider_code ORDER BY moe_esi_end_date DESC) AS rn_start /* same startdate: choose latest enddate */
    , ROW_NUMBER() OVER(PARTITION BY a.snz_uid, moe_esi_end_date, moe_esi_provider_code ORDER BY moe_esi_start_date) AS rn_end /* same enddate: choose earliest startdate */
    , DATEDIFF(YEAR, pd.snz_birth_date_proxy, moe_esi_start_date) AS age_at_startdate
    , DATEDIFF(YEAR, pd.snz_birth_date_proxy, ISNULL(moe_esi_end_date, moe_esi_extrtn_date)) AS age_at_enddate
    , CASE WHEN closed.moe_pp_provider_close_date IS NOT NULL 
			AND ISNULL(moe_esi_end_date, moe_esi_extrtn_date) > closed.moe_pp_provider_close_date 
			THEN 1 ELSE NULL END AS closed_school_at_enddate
INTO #student_enrol_all_raw
FROM $(idicleanversion).[moe_clean].student_enrol a
JOIN #global g
ON 1 = 1
JOIN $(idicleanversion).[data].personal_detail pd
ON pd.snz_uid = a.snz_uid
LEFT JOIN #closed_schools closed
ON closed.moe_pp_provider_code = a.moe_esi_provider_code
WHERE DATEDIFF(d,moe_esi_start_date, ISNULL(moe_esi_end_date, g.max_attendance_date)) > g.short /* delete short spells */ 
/* BEGIN STUDENT SAMPLE FILTER */
 /*and a.snz_uid in (select snz_uid from #tmp_student_sample ) -- temp student sample */
/* END STUDENT SAMPLE FILTER */
AND moe_esi_start_date <= g.max_attendance_date
AND ISNULL(moe_esi_end_date, g.max_attendance_date) >= DATEFROMPARTS(g.start_year, 1,1); /* some overlap with selected time period */

GO
 PRINT('#student_enrol_all_raw TABLE built'); 

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	1.2. Find all students who have a 'triple lag'
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	- Table finds all lags
	- Lags are where the start date of one spell overlaps with an enddate of a spell that started earlier
	- Triple lagged spells occur when a student is enrolled in 4 distinct schools simultaneously
	we coinsider this to be an indicatio that something is wrong with the linking of such student 
	and therefore they are removed from the table

*/

DROP TABLE IF EXISTS #multiple_lags;
SELECT *
    , CASE WHEN LAG(enddate, 2) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS double_lag
    , CASE WHEN LAG(enddate, 3) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS triple_lag
    , CASE WHEN LAG(enddate, 4) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS quad_lag
    , CASE WHEN LAG(enddate, 5) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS pent_lag
    , CASE WHEN LAG(enddate, 6) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS hex_lag
    , CASE WHEN LAG(enddate, 7) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS hep_lag
    , CASE WHEN LAG(enddate, 8) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS oct_lag
    , CASE WHEN LAG(enddate, 9) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS non_lag
    , CASE WHEN LAG(enddate, 10) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 
	ELSE 0 END AS dec_lag
INTO #multiple_lags
FROM #student_enrol_all_raw;

GO
 PRINT('#multiple_lags TABLE generated');

/*

	BEGIN testing 

		select distinct snz_uid from #multiple_lags where triple_lag = 1 -- 
		select distinct snz_uid from #multiple_lags where quad_lag = 1 -- 
		select distinct schoolnbr from #multiple_lags where triple_lag = 1

		select 
		sum(double_lag) as double_lags --
		, sum(triple_lag) as triple_lag --
		, sum(quad_lag) as quad_lag -- 
		, sum(pent_lag) as pent_lag -- 
		, sum(hex_lag) as hex_lag -- 
		, sum(hep_lag) as hep_lag -- 
		, sum(oct_lag) as oct_lag -- 
		, sum(non_lag) as non_lag -- 
		, sum(dec_lag) as dec_lag -- 
		from #multiple_lags

	BEGIN testing 

*/

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	1.3. Filtering raw data to appropriate spells
	:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Build the base table by deduplicating on snz_uid, schoolnbr, 
	startdate, enddate and adding a special flag for homeschool and te kura
	filtering the triple lagged spells and spells associated with closed schools 
	and filtering to the appropriate age range

*/

DROP TABLE IF EXISTS #student_enrol_all_without_att_check;
SELECT snz_uid
    , startdate
    , enddate
    , moe_esi_leave_rsn_code
    , enrol_days
    , schoolnbr
    , start_level
    , sch_enddate_imputed
    , CASE WHEN schoolnbr = 498 THEN 'correspondance' 
	WHEN schoolnbr = 972 THEN 'homeschool' 
	ELSE NULL END AS special_flag
INTO #student_enrol_all_without_att_check
FROM #student_enrol_all_raw
WHERE rn_both = 1
AND rn_start = 1
AND rn_end = 1 /* duplicate spells for a given student at the same school, select only the longest spell */
AND age_at_enddate BETWEEN 4 AND 19
AND age_at_startdate BETWEEN 4 AND 19 
/* filter out spells that end after the school lis closed */
AND closed_school_at_enddate IS NULL 
/* this is where we filterf out students enrolled in 4 schools simultaneously */
AND snz_uid NOT IN (SELECT DISTINCT snz_uid FROM #multiple_lags WHERE triple_lag = 1); 

GO
 PRINT('#student_enrol_all_without_att_check generated'); 

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	1.4. Build a table of the overlaps between attendance data and enrolment data
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	The idea is we should not allow comparisons to schools which did not have any attendance footprint (for whatever reason)	

*/

DROP TABLE IF EXISTS #att_enrol_coverage ;
WITH cte AS(
    SELECT DISTINCT schoolnbr
        , YEAR(startdate) AS start_year
    FROM #student_enrol_all_without_att_check 

    UNION

    SELECT DISTINCT schoolnbr
        , YEAR(enddate) AS start_year
    FROM #student_enrol_all_without_att_check
)
    , all_attendance AS(
    SELECT DISTINCT moe_ssa_provider_code
        , YEAR(moe_ssa_attendance_date) AS att_year
    FROM $(idicleanversion).[moe_clean].[school_student_attendance]
)
SELECT a.*
    , b.*
INTO #att_enrol_coverage
FROM cte a
LEFT JOIN all_attendance b
ON a.schoolnbr = b.moe_ssa_provider_code
AND a.start_year = b.att_year;

GO
 PRINT('#att_enrol_coverage  TABLE generated');

/*

	BEGIN check

		select * from #att_enrol_coverage order by schoolnbr
		select count(*) from #att_enrol_coverage		

		select start_year, count(*) from #att_enrol_coverage 
		where att_year is null
		group by start_year order by start_year

	END check

*/

/* 
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	1.5. remove the schools that are not comparible using attendance data
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	This table is our base table, that is used in the rest of this script

*/

DROP TABLE IF EXISTS #student_enrol_all ;
WITH all_spells AS(
    SELECT b.*
        , CASE WHEN (c.att_year IS NOT NULL OR d.att_year IS NOT NULL) THEN 1 ELSE 0 END AS att_footprint_year
    FROM #student_enrol_all_without_att_check b
    LEFT JOIN #att_enrol_coverage c
    ON c.schoolnbr = b.schoolnbr
    AND c.start_year = YEAR(b.startdate)
    LEFT JOIN #att_enrol_coverage d
    ON d.schoolnbr = b.schoolnbr
    AND d.start_year = YEAR(b.enddate)
)
SELECT *
INTO #student_enrol_all
FROM all_spells
WHERE att_footprint_year = 1; 

GO
 PRINT('#student_enrol_all TABLE generated');

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	1.6. Keep a record of those removed spells 
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	These records will be added back at the end

*/

DROP TABLE IF EXISTS #student_enrol_removed_spells ;
WITH all_spells AS(
    SELECT b.*
        , CASE WHEN (c.att_year IS NOT NULL OR d.att_year IS NOT NULL) THEN 1 ELSE 0 END AS att_footprint_year
    FROM #student_enrol_all_without_att_check b
    LEFT JOIN #att_enrol_coverage c
    ON c.schoolnbr = b.schoolnbr
    AND c.start_year = YEAR(b.startdate)
    LEFT JOIN #att_enrol_coverage d
    ON d.schoolnbr = b.schoolnbr
    AND d.start_year = YEAR(b.enddate)
)
SELECT *
INTO #student_enrol_removed_spells
FROM all_spells
WHERE att_footprint_year = 0; 

GO
 PRINT('#student_enrol_removed_spells TABLE generated');

/*

	BEGIN check

		select count(*) from #student_enrol_all_without_att_check -- 
		select count(*) from #student_enrol_removed_spells -- 
		select count(*) from #student_enrol_all-- 
		 and  =  +  

		select * from #student_enrol_removed_spells order by snz_uid, startdate, enddate

		select * 
		from #school_metadata
		where moe_pp_provider_code in (select distinct schoolnbr from #student_enrol_removed_spells)

	END check

*/

/*

	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2. BREAKING OVERLAPPED SPELLS WITH ATTENDANCE DATA
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	This section carries out the overlap breaking in three steps:
		- 2.1. Overlapping spells which have at least one matching date 
		- 2.2. Overlapping spells which do not have a matching date and which are inner spells
		- 2.3. Overlapping spells which do not have a matching date and which are not inner spells
	The reason for this is that the second step is vastly complicated if you dont first carry out the first one!

*/

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.1. Breaking overlapped spells that have at least one matching date
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	There are three cases to consider
		- overlapped spells with a matching startdate and matching enddate
		- overlapped spells wth only a matching startdate
		- overlapped spells with only a mathing enddate

*/

/*	
	:::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.1.1.  Spells with mathching startdate and matching enddate
	:::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Generate a '#duplicate_spells_indicated' table of the same length as student_enrol_all
	- spells with duplicate enddates, startdates (but always different schools they have been dealth with elsewhere) or both are indicated
	- priority is assigned based on attendance data in the overlap

*/

DROP TABLE IF EXISTS #duplicate_spells_indicated ;
WITH cte AS /* all spells that have same snz_uid, startdate, enddate */
(
    SELECT snz_uid
        , startdate
        , enddate
        , COUNT(*) AS n_rows
        , COUNT(DISTINCT schoolnbr) AS n_schools
    FROM #student_enrol_all
    GROUP BY snz_uid
        , startdate
        , enddate
    HAVING COUNT(DISTINCT schoolnbr) > 1
)
    , duplicate_spells AS /* pick up the actual spell information from #student_enroll_all for these duplicates */
(
    SELECT b.*
    FROM cte a
    JOIN #student_enrol_all b
    ON a.snz_uid = b.snz_uid
    AND a.startdate = b.startdate
    AND a.enddate = b.enddate /* join on both enddate and startdate*/
)
    , attendance_in_spell AS /* pick up the attendance records for these spells, label in-scope records*/
(
    SELECT a.*
        , b.moe_ssa_attendance_date
        , CASE WHEN b.moe_ssa_provider_code IS NULL THEN NULL	
		WHEN b.moe_ssa_provider_code IS NOT NULL AND (b.moe_ssa_attendance_date BETWEEN a.startdate AND a.enddate
)THEN 1 WHEN b.moe_ssa_provider_code IS NOT NULL
AND NOT(
    b.moe_ssa_attendance_date BETWEEN a.startdate AND a.enddate
)THEN 0 ELSE -99 END AS attendance_checker
FROM duplicate_spells a
LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
ON a.snz_uid = b.snz_uid
AND a.schoolnbr = b.moe_ssa_provider_code
)
, attendance_summary_in_spell AS /* summarise the attendance situation per spell */
(
SELECT snz_uid
    , schoolnbr
    , startdate
    , enddate
    , MIN(moe_ssa_attendance_date) AS first_attendance
    , MAX(moe_ssa_attendance_date) AS last_attendance
	/*, count(distinct moe_ssa_attendance_date) as n_days_in_overlap*/
    , SUM(attendance_checker) AS n_days_in_overlap
	/*, case when sum(attendance_checker) is null then 1 else 0 end as null_checker*/
    , ROW_NUMBER() OVER(PARTITION BY snz_uid, startdate, enddate ORDER BY SUM(attendance_checker) DESC) AS rn /* if tied its random, if null its de-prioritised */
FROM attendance_in_spell
GROUP BY snz_uid
    , schoolnbr
    , startdate
    , enddate
    , att_footprint_year
)
, final_table AS(
SELECT *
    , CASE WHEN rn <> 1 		
		THEN 0 ELSE 1 END AS unique_school_flag
    , CASE WHEN rn = 1 THEN 'matching_startdate_and_enddate' ELSE 'matching_startdate_and_enddate' END AS duplicate_reason	
	FROM attendance_summary_in_spell	
)
SELECT 
*
, SUM(unique_school_flag) OVER(PARTITION BY snz_uid, startdate, enddate) AS sum_priority
INTO #duplicate_spells_indicated
FROM final_table;

GO

PRINT('#duplicate_spells_indicated TABLE generated');

/*

	BEGIN check

		select count(*) from #duplicate_spells_indicated	--  full run	
		select count(*) from #duplicate_spells_indicated where rn <>1 --  full run
		select * from #duplicate_spells_indicated order by snz_uid

	END check

*/

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::
	2.1.2. Build table with full duplicates deduplicated 
	:::::::::::::::::::::::::::::::::::::::::::::::::::
	doesn't matter how, this step is only required to find matching start dates spells
	full duplicate spell dates are still present in the table, just one copy of them

*/

DROP TABLE IF EXISTS #no_duplicate_spells
;WITH cte AS 
(
	SELECT
	*
	, ROW_NUMBER() OVER(PARTITION BY snz_uid, startdate, enddate ORDER BY schoolnbr DESC) AS rn_tmp
	FROM #student_enrol_all	
)
SELECT
*
INTO #no_duplicate_spells 
FROM cte
WHERE rn_tmp = 1; /* takes highest school number to break ties (BAD PRACTICE in general, in this case no big deal) */

GO

PRINT('#no_duplicate_spells generated'); 

/*

	BEGIN check

		select count(*) from #student_enrol_all -- x
		select count(*) from #duplicate_spells_indicated where rn <> 1 -- y removed spells
		select count(*) from #no_duplicate_spells -- z remaining spells

		and x = z + y

	END check

*/

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::
	2.1.3. Spells with matching startdate 
	:::::::::::::::::::::::::::::::::::::::::::::::::::	
	- carry out the same exercise as above but this time for matching startdates only
*/

DROP TABLE IF EXISTS #duplicate_startdate_indicated
;WITH cte AS /* starting from de-duplicated start + end dates, count matching startdates spells */
(
	SELECT 
	snz_uid
	, startdate	
	, COUNT(*) AS n_rows
	/* , count(distinct schoolnbr) as n_schools */
	FROM #no_duplicate_spells
	GROUP BY snz_uid, startdate
	HAVING COUNT(*) > 1		
)
, duplicate_spells AS /* pick out the limits of the overlap date interval and the "overhung" enddate */
(	

	SELECT 
	b.*
	, MIN(b.enddate) OVER(PARTITION BY b.snz_uid, b.startdate) AS overlap_enddate
	, MAX(b.enddate) OVER(PARTITION BY b.snz_uid, b.startdate) AS latest_enddate
	, DATEDIFF(DAY, b.startdate, b.enddate) AS spell_length	
	FROM cte a
	JOIN #student_enrol_all b
	ON a.snz_uid = b.snz_uid AND a.startdate = b.startdate
)
, attendance_in_spell AS /* pick up the attendance recorsd for these spells */
(
	SELECT
	a.*
	, b.moe_ssa_attendance_date AS overlap_attendance_date
	, CASE WHEN b.moe_ssa_provider_code IS NULL THEN NULL	
		WHEN b.moe_ssa_provider_code IS NOT NULL AND (b.moe_ssa_attendance_date BETWEEN a.startdate AND a.overlap_enddate) THEN 1
		WHEN b.moe_ssa_provider_code IS NOT NULL AND NOT (b.moe_ssa_attendance_date BETWEEN a.startdate AND a.overlap_enddate) THEN 0
		ELSE -99 END AS attendance_checker
	FROM duplicate_spells a
	LEFT JOIN $(idicleanversion).[moe_clean].school_student_attendance b
	ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code  
)
, attendance_summary_in_spell AS /* summarise the attendance records */
(
	SELECT 
	snz_uid
	, schoolnbr
	, startdate	
	, overlap_enddate
	, latest_enddate 	
	, spell_length		
	, MIN(overlap_attendance_date) AS first_attendance
	, MAX(overlap_attendance_date) AS last_attendance	
	, SUM(attendance_checker) AS n_days_in_overlap
	, DATEDIFF(DAY, startdate, overlap_enddate) AS overlap_length
	/*, att_footprint_year*/
	/* , case when sum(attendance_checker) is null then 1 else 0 end as null_checker */
	/*, ROW_NUMBER() over(partition by snz_uid order by count(distinct overlap_attendance_date) desc) as rn -- give a ranking to the spells based on numbe of attendance days*/
	/* give a ranking to the spells based on numbe of attendance days (or spell length if tied) */
	, ROW_NUMBER() OVER(PARTITION BY snz_uid, overlap_enddate, startdate ORDER BY SUM(attendance_checker) DESC, spell_length DESC) AS rn
	FROM attendance_in_spell
	GROUP BY snz_uid
	, schoolnbr
	, startdate
	, overlap_enddate
	, latest_enddate
	, spell_length
	/*, att_footprint_year -- these are common across the spells with same startdate except school*/	
)
, final_table AS
(
	SELECT 
	*
	/*, min(att_footprint_year) over(partition by snz_uid, startdate) as min_att_footprint*/
	, CASE WHEN rn <> 1 
		/*and min(att_footprint_year) over(partition by snz_uid, startdate) > 0		*/
		THEN 0 ELSE 1 END AS unique_school_flag
	, CASE WHEN rn = 1 THEN 'matching_startdate' ELSE 'matching_startdate' END AS duplicate_reason	
	FROM attendance_summary_in_spell
)
SELECT 
*
, SUM(unique_school_flag) OVER(PARTITION BY snz_uid, startdate) AS sum_priority
INTO #duplicate_startdate_indicated
FROM final_table;

GO

PRINT('#duplicate_startdate_indicated TABLE generated'); 

/*

	BEGIN check

		select * from #duplicate_startdate_indicated order by snz_uid, rn

	END check

*/

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::
	2.1.4. Spells with matching enddate
	:::::::::::::::::::::::::::::::::::::::::::::::::::			
	- finally break the ties for the remaining matching enddate ones
	- starting from the no duplicate spells table so that we dont pick up the full duplicates

	ISSUE: when there are duplicate startdates, and duplicate edndates for a given student and these pairs interact!
	TODO: fix this

*/

DROP TABLE IF EXISTS #duplicate_enddate_indicated
;WITH cte AS
(
	SELECT 
	snz_uid
	, enddate	
	, COUNT(*) AS n_rows
	, COUNT(DISTINCT schoolnbr) AS n_schools
	FROM #no_duplicate_spells
	GROUP BY snz_uid, enddate
	HAVING COUNT(*) > 1 
)
, duplicate_spells AS
(
	SELECT 
	b.*
	, MAX(b.startdate) OVER(PARTITION BY b.snz_uid, b.enddate) AS overlap_startdate /* same enddate, need to find the limit of the overlap */
	, MIN(b.startdate) OVER(PARTITION BY b.snz_uid, b.enddate) AS earliest_startdate /* the earliest startdate (incase there is an overlap and we need to split a spell to fill an enrolment gap) */
	, DATEDIFF(DAY, b.startdate, b.enddate) AS spell_length	
	FROM cte a
	JOIN #student_enrol_all b
	ON a.snz_uid = b.snz_uid AND a.enddate = b.enddate 
)
, attendance_in_spell AS
(
	SELECT
	a.*
	, b.moe_ssa_attendance_date AS overlap_attendance_date
	, CASE WHEN b.moe_ssa_provider_code IS NULL THEN NULL	
		WHEN b.moe_ssa_provider_code IS NOT NULL AND (b.moe_ssa_attendance_date BETWEEN a.overlap_startdate AND a.enddate) THEN 1
		WHEN b.moe_ssa_provider_code IS NOT NULL AND NOT (b.moe_ssa_attendance_date BETWEEN a.overlap_startdate AND a.enddate) THEN 0
		ELSE -99 END AS attendance_checker	
	FROM duplicate_spells a
	LEFT JOIN $(idicleanversion).[moe_clean].school_student_attendance b
	ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code
)
, attendance_summary_in_spell AS
(
	SELECT 
	snz_uid
	, schoolnbr	
	, overlap_startdate
	, earliest_startdate
	, enddate
	, spell_length	
	, MIN(overlap_attendance_date) AS first_attendance
	, MAX(overlap_attendance_date) AS last_attendance	
	, SUM(attendance_checker) AS n_days_in_overlap
	/* , ROW_NUMBER() over(partition by snz_uid order by count(distinct overlap_attendance_date) desc) as rn */
	, ROW_NUMBER() OVER(PARTITION BY snz_uid, overlap_startdate, enddate ORDER BY SUM(attendance_checker) DESC, spell_length DESC) AS rn	
	FROM attendance_in_spell
	GROUP BY snz_uid, schoolnbr, enddate, overlap_startdate, earliest_startdate, spell_length	
)
, final_table AS
(
	SELECT 
	*		
	, CASE WHEN rn <> 1		
		THEN 0 ELSE 1 END AS unique_school_flag
	, CASE WHEN rn = 1 THEN 'matching_enddate' ELSE 'matching_enddate' END AS duplicate_reason /* relic from when we named the not-prioritised one differently */
FROM attendance_summary_in_spell
)
SELECT *
, SUM(unique_school_flag) OVER(PARTITION BY snz_uid, enddate) AS sum_priority
INTO #duplicate_enddate_indicated
FROM final_table;

GO
 PRINT('#duplicate_enddate_indicated TABLE generated'); 

/*

	:::::::::::::::::::::::::::::::::::::::::::::::
	2.1.5. Bring these together in one table
	:::::::::::::::::::::::::::::::::::::::::::::::
	the filtering to prioritised spells is not carried out yet at this stage

*/

DROP TABLE IF EXISTS #student_enrol_all_indicated;
SELECT a.*
------------------------------------------------------
, CASE WHEN c.latest_enddate = a.enddate
AND c.unique_school_flag = 0 THEN c.overlap_enddate WHEN d.earliest_startdate = a.startdate
AND d.unique_school_flag = 0 THEN d.earliest_startdate ELSE NULL END AS overhang_spell_startdate
------------------------------------------------------
, CASE WHEN c.latest_enddate = a.enddate
AND c.unique_school_flag = 0 THEN c.latest_enddate WHEN d.earliest_startdate = a.startdate
AND d.unique_school_flag = 0 THEN d.overlap_startdate ELSE NULL END AS overhang_spell_enddate
------------------------------------------------------
, CASE WHEN c.latest_enddate = a.enddate
AND c.unique_school_flag = 0 THEN 'enddate_overhang' WHEN d.earliest_startdate = a.startdate
AND d.unique_school_flag = 0 THEN 'startdate_overhang' ELSE NULL END AS overhang_date_type
------------------------------------------------------
, CASE WHEN b.snz_uid IS NOT NULL THEN b.unique_school_flag 
	WHEN c.snz_uid IS NOT NULL THEN c.unique_school_flag
	WHEN d.snz_uid IS NOT NULL THEN d.unique_school_flag
	ELSE 1 END AS unique_school_flag
------------------------------------------------------
, CASE WHEN b.snz_uid IS NOT NULL THEN b.duplicate_reason 
	WHEN c.snz_uid IS NOT NULL THEN c.duplicate_reason 
	WHEN d.snz_uid IS NOT NULL THEN d.duplicate_reason	
	END AS duplicate_reason
------------------------------------------------------
, CASE WHEN b.snz_uid IS NOT NULL THEN b.n_days_in_overlap 
	WHEN c.snz_uid IS NOT NULL THEN c.n_days_in_overlap 
	WHEN d.snz_uid IS NOT NULL THEN d.n_days_in_overlap 
	END AS attendance_days_in_overlap
	------------------------------------------------------
, COALESCE(b.sum_priority, c.sum_priority, d.sum_priority) AS sum_priority 
------------------------------------------------------
INTO #student_enrol_all_indicated
FROM #student_enrol_all a
LEFT JOIN #duplicate_spells_indicated b
ON b.snz_uid = a.snz_uid
AND b.schoolnbr = a.schoolnbr
AND b.startdate = a.startdate
AND b.enddate = a.enddate
LEFT JOIN #duplicate_startdate_indicated c
ON c.snz_uid = a.snz_uid
AND c.schoolnbr = a.schoolnbr
AND c.startdate = a.startdate
LEFT JOIN #duplicate_enddate_indicated d
ON d.snz_uid = a.snz_uid
AND d.schoolnbr = a.schoolnbr
AND d.enddate = a.enddate; 

GO
 PRINT('#student_enrol_all_indicated TABLE generated'); 

/*

	BEGIN testing

		select count(*) from #student_enrol_all_indicated 
		select count(*) from #no_duplicate_spells 
		select count(*) from #student_enrol_all 

		select * from #student_enrol_all_indicated order by snz_uid, startdate, enddate

		select * from #student_enrol_all_indicated
		where snz_uid in (select distinct snz_uid  from #student_enrol_all_indicated where duplicate_reason is not null)
		order by snz_uid, startdate, enddate 

	END testing

*/

/*

	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.1.6. Filter non-prioritised spells to make a clean dataset of spells
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	- In this step we also split off and append bits of non-prioritised spells that overlap with the prioritised biy so as not to induce non-enrolment spell later on below

*/

DROP TABLE IF EXISTS #big_clean_spells;
SELECT snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, sum_priority
, att_footprint_year
, special_flag
, CASE WHEN duplicate_reason IS NOT NULL AND (sum_priority IS NULL OR sum_priority = 1) THEN CONCAT(duplicate_reason, '_spell_removed')
	WHEN duplicate_reason IS NOT NULL AND sum_priority = 2 THEN 'overlapping_priority'
	WHEN duplicate_reason IS NULL THEN 'no_other_spell' 	
	ELSE 'error' END AS unique_spell_reason
INTO #big_clean_spells
FROM #student_enrol_all_indicated
WHERE unique_school_flag = 1 /* bring all the unique ones */

UNION ALL 

SELECT snz_uid
, overhang_spell_startdate AS startdate
, overhang_spell_enddate AS enddate
, CASE WHEN overhang_date_type = 'startdate_overhang' THEN start_level
	ELSE NULL END AS start_level
, CASE WHEN overhang_date_type = 'enddate_overhang' THEN moe_esi_leave_rsn_code
	ELSE NULL END AS moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, sum_priority
, att_footprint_year
, special_flag
, overhang_date_type AS unique_spell_reason
FROM  #student_enrol_all_indicated
JOIN #global g ON 1=1
WHERE overhang_date_type IN ('startdate_overhang', 'enddate_overhang')
AND unique_school_flag = 0
AND DATEDIFF(DAY, startdate, enddate) > g.short;

GO

PRINT('#big_clean_spells TABLE generated'); 

/*

	BEGIN testing

		generic scan

		select * from #big_clean_spells where snz_uid in (select distinct snz_uid  from #big_clean_spells where sum_priority is not null) order by snz_uid, startdate, enddate
		select distinct unique_spell_reason from #big_clean_spells
		select count(*) from #big_clean_spells 

		single student

		select * from #big_clean_spells where snz_uid = x order by startdate, enddate
		select * from #student_enrol_all_indicated where snz_uid = x order by startdate, enddate
		select * from #duplicate_enddate_indicated where snz_uid = x

	END testing

*/

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.2. Breaking overlapped spells that have no matching date
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Above, the spells with at least of matching date have been dealth with 
	This step now deals with the non-matching date overlapping spells

	There are two general cases to consider
		- inner spells: if an overlapped spell is entirely contained within another spell it is called inner
		- generic overlaps: any remaining overlap which is not inner		

	This section deals only with the first case, the second case will be the subject of step 2.3.

*/

/*	
	:::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.2.1 Immediate neighbour inner spells - first pass
	:::::::::::::::::::::::::::::::::::::::::::::::::::::
	When ordered by startdate, one can easily identify an inner spell if the encompassing 'outer' spell is in the immediately previous row
	This next section deals with inner spells that are of this adjacent row kind

	There are two complicated types of inner spell that can persist even after the completion of this step:
		(i) It can happen that a spell is inner but that its corresponding outer spell is not an immediate nearest neighbour
		(ii) After prioritising an inner through attendance data it can happen that new inner spells are created from the clipped ends

	These two possibilities (i)-(ii) are dealth with by passing the entire dataset through a second inner-spell prioritisation step after the first, 
	in conjunction with a global filtering of the data of students that have a very deep stack of contemporaneous spells (done above in step 1.3 in
	the generation of the #student_enrol_all_without_att_check table via the #multiple_lags table)

	Input table check:
	select top 10 * from #big_clean_spells order by snz_uid, startdate, enddate

*/

DROP TABLE IF EXISTS #inner_spells_indicated
;WITH no_duplicates AS /* start with big clean spells table which has already been prioritised for all the matching dates ones */
(
	SELECT 
	snz_uid
	, startdate
	, enddate
	, schoolnbr
	, start_level
	, sch_enddate_imputed		
	, unique_spell_reason	
	FROM #big_clean_spells 
)
, overlaps_labelled AS /*  label the consecutive spells that have an overlap */
(
	SELECT 
	*	
	, CASE WHEN LEAD(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) < no_duplicates.enddate THEN 1 ELSE 0 END AS next_spell_inner	
	, CASE WHEN LAG(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) > no_duplicates.enddate THEN 1 ELSE 0 END AS this_spell_inner	
	FROM no_duplicates	
)
, only_overlaps AS /* filter to only those spells that are overlapped in the inner-outer relationship */
(
	SELECT 
	* 
	, LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_startdate
	, LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_enddate
	, LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_enddate
	, LAG(overlaps_labelled.schoolnbr) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_school
	, CASE WHEN next_spell_inner = 1 
				THEN LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
			WHEN this_spell_inner = 1 
				THEN startdate
				ELSE NULL END AS overlap_startdate
	, CASE WHEN next_spell_inner = 1 
				THEN LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
			WHEN this_spell_inner = 1 
				THEN enddate
				ELSE NULL END AS overlap_enddate
	FROM overlaps_labelled
	WHERE (this_spell_inner = 1 OR next_spell_inner = 1) 	
)
, only_overlaps_with_attendance AS /* join the attendance date to these overlapped spells, add flags when attendance lies in the various date ranges */
(
	SELECT 
	a.*
	, b.moe_ssa_attendance_date	
	, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
		WHEN moe_ssa_attendance_date IS NOT NULL AND 		
			moe_ssa_attendance_date BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 1 -- this general condition accounts for inner spells if next inner
		WHEN moe_ssa_attendance_date IS NOT NULL 
			AND moe_ssa_attendance_date NOT BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 0 -- this general condition accounts for inner spells if next inner
		END AS overlap_with_next_attendance_flag

	, CASE WHEN moe_ssa_attendance_date IS NULL 
			THEN NULL /*  if no attendance records at all for that student at that school then null */
		WHEN moe_ssa_attendance_date IS NOT NULL AND  /*  if attendance records do exist 				 */
			moe_ssa_attendance_date BETWEEN	startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) /* and attendenace record for that school and its in the right date range */
		/* last_enddate is null accounts for the spells that are 'next spell is inner' which have null last_enddate entries */
			THEN 1 /* then label as good */

		WHEN moe_ssa_attendance_date IS NOT NULL /* if attendance records do exist  */
			AND moe_ssa_attendance_date NOT BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate)  /* and attendance record is not in the right date range  */
		/* last_enddate is null accounts for the spells that are 'next spell is inner' which have null last_enddate entries */
			THEN 0 /* then label it as bad */
		END AS overlap_with_last_attendance_flag
	------------------------------------------
	, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
		WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date BETWEEN startdate AND enddate THEN 1
		WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date NOT BETWEEN startdate AND enddate THEN 0
		END AS this_spell_attendance_flag		
	------------------------------------------
	FROM only_overlaps a
	LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
	ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code /*  need to join both on school and student	 */	
)
, summarised_overlaps_attendance AS /* summarise the overlapped attendance situation */
(
	SELECT 
	snz_uid
	, startdate
	, enddate
	, schoolnbr
	, last_school	
	, unique_spell_reason		
	, this_spell_inner
	, next_spell_inner
	, last_enddate
	, next_startdate
	, overlap_startdate
	, overlap_enddate
	, SUM(overlap_with_last_attendance_flag) AS n_rows_in_overlap_with_last_raw	
	, SUM(overlap_with_next_attendance_flag) AS n_rows_in_overleap_with_next_raw
	, SUM(this_spell_attendance_flag) AS n_rows_in_range
	FROM only_overlaps_with_attendance
	GROUP BY snz_uid
	, startdate
	, enddate
	, schoolnbr
	, last_school		
	, unique_spell_reason
	, this_spell_inner
	, next_spell_inner-- 
	, next_startdate
	, last_enddate
	, overlap_startdate
	, overlap_enddate		
)
, summarised_labelled_overlaps AS /* count chain of overlapped spells */
(
	SELECT 
	a.*
	, COUNT(*) OVER(PARTITION BY snz_uid) AS paired_spells
	, CASE WHEN next_spell_inner = 1			
			AND n_rows_in_overleap_with_next_raw IS NULL
		THEN 0 	/* replace null with zero when the school legit does have attendance records somewherebut just not in range */
		ELSE n_rows_in_overleap_with_next_raw
		END AS n_rows_in_overleap_with_next
	,CASE WHEN this_spell_inner = 1		
		AND n_rows_in_overlap_with_last_raw IS NULL
		THEN 0 /* replace null with zero when the school legit does have attendance records somewhere */
		ELSE n_rows_in_overlap_with_last_raw
		END AS n_rows_in_overlap_with_last	
	FROM summarised_overlaps_attendance a			
)
, pairs_indicated AS /*  create the new spells on each end if they need to be made */
(
	SELECT 
	snz_uid
	, startdate
	, enddate
	, schoolnbr
	, last_school	
	, this_spell_inner
	, next_spell_inner
	, last_enddate
	, next_startdate
	, n_rows_in_overlap_with_last
	, n_rows_in_overleap_with_next
	, overlap_startdate
	, overlap_enddate	
	, unique_spell_reason
	, paired_spells		
	, startdate AS first_new_startdate /*  first indicate the start of the whole thing */	
	/* clip off the start of the outer (longer) spell if the attendance comparison prioritises the inner */
	, CASE WHEN next_spell_inner = 1 /* if there is an overlap with the next spell	*/
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) /* and its not prioritised by attendenace  */			
			THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the enddate to the next startdate */
			ELSE enddate END AS first_new_enddate	 /* else leave the enddate untouched */
	/*----------------------------------------------*/
	/* while also clip the end of the outer spell off */
	, CASE WHEN next_spell_inner = 1 /* if there is an overlap with the next spell				 */
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) /* and its not prioritised by attendenace  */
			/* if either are globally null then no comparison can be made and we leave them untouched */
			THEN LEAD(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the enddate to the next startdate */
			ELSE NULL END AS second_new_startdate	 /* else leave the second one NULL */
	/*----------------------------------------------*/
	/*  and clipping the end of the spell off */
	, CASE WHEN next_spell_inner = 1 /* if there is an overlap with the next spell				 */
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) /* and its not prioritised by attendenace */ 
			/*  if either are null then no comparison can be made and we leave them untouched */
			THEN enddate /* then clip the enddate to the next startdate */
			ELSE NULL END AS second_new_enddate	 /*  else leave the second one NULL */
	-----------------------------------------------
	/* flag to indicate priority */
	, CASE WHEN next_spell_inner = 1				
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) 			
			THEN 0 /* then label it as having received a clipping */
			WHEN next_spell_inner = 1 
			AND ((n_rows_in_overleap_with_next >= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and it IS not prioritised by attendenace  */
				 OR LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate) IS NULL) /*  deprioritise inner spells even if they have global null attendance (NEW CONDITION) */
			THEN 1 /* label is as prioritised */
			ELSE NULL END AS priotity_spell	 
	/*----------------------------------------------*/
	/* flag to indicate priotity */	
	, CASE WHEN this_spell_inner = 1
			AND (n_rows_in_overlap_with_last <= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate) /* de-prioritize for tied attendance conditions */
					OR n_rows_in_overlap_with_last IS NULL ) /* and its not prioritised by attendenace or its got null attendance						 */
			THEN 0 /* then deprioritise it */
			WHEN this_spell_inner = 1
			AND last_school = schoolnbr /* deprioritize if its the same school! */
			THEN 0
			WHEN this_spell_inner = 1 AND last_school <> schoolnbr /* if there is an overlap with the previous spell */
			AND (n_rows_in_overlap_with_last >= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and it has more attendance records */
			THEN 1 
			ELSE NULL END AS priority_spell_inner	
	/*----------------------------------------------*/
	FROM summarised_labelled_overlaps
)
, split_pairs AS
(
	SELECT
	snz_uid
	, startdate AS old_startdate
	, enddate AS old_enddate
	, schoolnbr	
	, this_spell_inner
	, next_spell_inner	
	, n_rows_in_overlap_with_last
	, n_rows_in_overleap_with_next
	, overlap_startdate
	, overlap_enddate
	, first_new_startdate AS startdate
	, first_new_enddate AS enddate	
	,unique_spell_reason
	, 1 AS union_number
	, COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
	FROM pairs_indicated	

	UNION ALL 

	SELECT
	snz_uid
	, startdate AS old_startdate
	, enddate AS old_enddate
	, schoolnbr
	, this_spell_inner
	, next_spell_inner	
	, n_rows_in_overlap_with_last
	, n_rows_in_overleap_with_next
	, overlap_startdate
	, overlap_enddate
	, second_new_startdate AS startdate
	, second_new_enddate AS enddate	
	, unique_spell_reason
	, 2 AS union_number
	, COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
	FROM pairs_indicated
	WHERE second_new_startdate IS NOT NULL
)
SELECT 
*
, SUM(priority_spell) OVER(PARTITION BY snz_uid, overlap_startdate, overlap_enddate) AS sum_priority
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 'enddate_snipped_outer_spell' WHEN this_spell_inner = 0
AND priority_spell = 0
AND enddate = old_enddate THEN 'startdate_snipped_outer_spell' WHEN this_spell_inner = 0
AND priority_spell = 1 THEN 'priority_outer_spell' WHEN this_spell_inner = 1
AND priority_spell = 0 THEN 'not_priority_inner_spell' WHEN this_spell_inner = 1
AND priority_spell = 1 THEN 'priority_inner_spell' WHEN priority_spell IS NULL THEN 'no_comparison_possible' ELSE NULL END AS inner_priority_reason
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 1 
	WHEN this_spell_inner = 0 AND priority_spell = 0 AND enddate = old_enddate THEN 1 WHEN this_spell_inner = 0
AND priority_spell = 1 THEN 1 WHEN this_spell_inner = 1
AND priority_spell = 0 THEN 0 WHEN this_spell_inner = 1
AND priority_spell = 1 THEN 1 WHEN priority_spell IS NULL THEN NULL ELSE NULL END AS inner_priority
INTO #inner_spells_indicated
FROM split_pairs a;

GO
 PRINT('#inner_spells_indicated TABLE generated');

/*

	BEGIN testing

		select count(*) from #big_clean_spells  
		select count(*) from #inner_spells_indicated 

		select distinct inner_priority_reason from #inner_spells_indicated
		select distinct unique_spell_reason from #inner_spells_indicated

		select top 10 * from #big_clean_spells
		select * from #inner_spells_indicated order by snz_uid, startdate, enddate		

		select * from #inner_spells_indicated join #global g on 1=1 where datediff(day, startdate, enddate) <= g.short 

		select union_number , count(*) from #inner_spells_indicated group by union_number

	END testing

*/

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.2.2 Add back the first pass of prioritised inner spells 
	:::::::::::::::::::::::::::::::::::::::::::::::::::::
	Now add these prioritised or split spells back onto the full dataset

*/

DROP TABLE IF EXISTS #inner_spells_added_back ;
WITH cte AS /* first attach the inner and outer spells back onto big clean spells  */
(
SELECT a.*
    , b.this_spell_inner
    , b.next_spell_inner
    , b.n_rows_in_overleap_with_next
    , b.n_rows_in_overlap_with_last
    , b.startdate AS new_startdate
    , b.enddate AS new_enddate
    , b.old_startdate
    , b.old_enddate
    , inner_priority
    , inner_priority_reason
FROM #big_clean_spells a
LEFT JOIN #inner_spells_indicated b
ON a.snz_uid = b.snz_uid
AND a.startdate = b.old_startdate
AND a.enddate = b.old_enddate
AND a.schoolnbr = b.schoolnbr
)
, inner_spells_added_back AS /* change the dates to match the snipped dates, if snipped */
(
SELECT snz_uid
    , CASE WHEN new_startdate IS NULL THEN startdate
		ELSE new_startdate
		END AS startdate
    , CASE WHEN new_enddate IS NULL THEN enddate /* leave date unchanged if its not snipped */
		ELSE new_enddate END AS enddate
    , start_level
    , moe_esi_leave_rsn_code
    , schoolnbr
    , sch_enddate_imputed
    , unique_spell_reason
    , special_flag
    , CASE WHEN this_spell_inner IS NULL THEN unique_spell_reason
		WHEN this_spell_inner IS NOT NULL AND sum_priority = 2 THEN 'overlapping_priority' /* this should not happen anymore as prioritisation should always occur now */
		ELSE inner_priority_reason
		END AS unique_spell_reason_with_inner
    , inner_priority
FROM cte
)
SELECT *
INTO #inner_spells_added_back
FROM inner_spells_added_back
JOIN #global g
ON 1 = 1
WHERE inner_priority = 1
OR inner_priority IS NULL /* filter off the de-prioritised inner spells: the deprioritised outer spells are snipped */
AND DATEDIFF(DAY, startdate, enddate) > g.short; /* remove short snipped bits */

GO
 PRINT('#inner_spells_added_back TABLE generated');

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.2.3 Immediate neighbour inner spells - second pass
	:::::::::::::::::::::::::::::::::::::::::::::::::::::
	Steps 2.2.3 and 2.2.4 are intended to be identical to 2.2.1 and 2.2.2, but starting from #inner_spells_added_back instead of #big_clean_spells
	In this way we are capturing any new nearest neighbour inner spells that exist due to eiher of the reasons (i)-(ii) discussed at the start of step 2.2.1	

*/

DROP TABLE IF EXISTS #inner_spells_indicated_again ;
WITH no_duplicates AS(
SELECT snz_uid
    , startdate
    , enddate
    , schoolnbr
    , sch_enddate_imputed
    , unique_spell_reason
    , unique_spell_reason_with_inner
FROM #inner_spells_added_back
)
, overlaps_labelled AS(
SELECT *
    , CASE WHEN LEAD(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) < no_duplicates.enddate THEN 1 ELSE 0 END AS next_spell_inner
    , CASE WHEN LAG(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) > no_duplicates.enddate THEN 1 ELSE 0 END AS this_spell_inner
FROM no_duplicates
)
, only_overlaps AS(
SELECT *
    , LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_startdate
    , LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_enddate
    , LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_enddate
    , LAG(overlaps_labelled.schoolnbr) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_school
    , CASE WHEN next_spell_inner = 1 
				THEN LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
			WHEN this_spell_inner = 1 
				THEN startdate
				ELSE NULL END AS overlap_startdate
    , CASE WHEN next_spell_inner = 1 
				THEN LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
			WHEN this_spell_inner = 1 
				THEN enddate ELSE NULL END AS overlap_enddate
FROM overlaps_labelled
WHERE(
    this_spell_inner = 1
    OR next_spell_inner = 1
)
)
, only_overlaps_with_attendance AS(
SELECT a.*
    , b.moe_ssa_attendance_date
    , CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 1 WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 0 END AS overlap_with_next_attendance_flag
    , CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date
BETWEEN startdate
AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) THEN 1 WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) THEN 0 END AS overlap_with_last_attendance_flag
    , CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date BETWEEN startdate AND enddate THEN 1 WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN startdate AND enddate THEN 0 END AS this_spell_attendance_flag
FROM only_overlaps a
LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
ON a.snz_uid = b.snz_uid
AND a.schoolnbr = b.moe_ssa_provider_code
)
, summarised_overlaps_attendance AS(
SELECT snz_uid
    , startdate
    , enddate
    , schoolnbr
    , last_school
    , unique_spell_reason
    , unique_spell_reason_with_inner
    , this_spell_inner
    , next_spell_inner
    , last_enddate
    , next_startdate
    , overlap_startdate
    , overlap_enddate
    , SUM(overlap_with_last_attendance_flag) AS n_rows_in_overlap_with_last_raw
    , SUM(overlap_with_next_attendance_flag) AS n_rows_in_overleap_with_next_raw
    , SUM(this_spell_attendance_flag) AS n_rows_in_range
FROM only_overlaps_with_attendance
GROUP BY snz_uid
    , startdate
    , enddate
    , schoolnbr
    , last_school
    , unique_spell_reason
    , unique_spell_reason_with_inner
    , next_startdate
    , last_enddate
    , overlap_startdate
    , overlap_enddate
    , this_spell_inner
    , next_spell_inner
)
, summarised_labelled_overlaps AS(
SELECT a.*
    , COUNT(*) OVER(PARTITION BY snz_uid) AS paired_spells
    , CASE WHEN next_spell_inner = 1			
			AND n_rows_in_overleap_with_next_raw IS NULL
		THEN 0 	
		ELSE n_rows_in_overleap_with_next_raw
		END AS n_rows_in_overleap_with_next
    , CASE WHEN this_spell_inner = 1		
		AND n_rows_in_overlap_with_last_raw IS NULL
		THEN 0 
		ELSE n_rows_in_overlap_with_last_raw
		END AS n_rows_in_overlap_with_last
FROM summarised_overlaps_attendance a
)
, pairs_indicated AS(
SELECT snz_uid
    , startdate
    , enddate
    , schoolnbr
    , last_school
    , this_spell_inner
    , next_spell_inner
    , last_enddate
    , next_startdate
    , n_rows_in_overlap_with_last
    , n_rows_in_overleap_with_next
    , overlap_startdate
    , overlap_enddate
    , unique_spell_reason
    , unique_spell_reason_with_inner
    , startdate AS first_new_startdate
    , CASE WHEN next_spell_inner = 1
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) 			
			THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) 
			ELSE enddate END AS first_new_enddate
    , CASE WHEN next_spell_inner = 1 
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) 			
			THEN LEAD(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate) 
			ELSE NULL END AS second_new_startdate
    , CASE WHEN next_spell_inner = 1 
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) 			
			THEN enddate ELSE NULL END AS second_new_enddate
    , CASE WHEN next_spell_inner = 1
			AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) 			
			THEN 0 
			WHEN next_spell_inner = 1 
			AND ((n_rows_in_overleap_with_next >= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate))
				 OR LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate) IS NULL) 
			THEN 1 
			ELSE NULL END AS priotity_spell
    , CASE WHEN this_spell_inner = 1
			AND (n_rows_in_overlap_with_last <= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)
					OR n_rows_in_overlap_with_last IS NULL ) 
			THEN 0 
			WHEN this_spell_inner = 1
			AND last_school = schoolnbr
			THEN 0
			WHEN this_spell_inner = 1 AND last_school <> schoolnbr 
			AND (n_rows_in_overlap_with_last >= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) 
			THEN 1 
			ELSE NULL END AS priority_spell_inner
FROM summarised_labelled_overlaps
)
, split_pairs AS(
SELECT snz_uid
    , startdate AS old_startdate
    , enddate AS old_enddate
    , schoolnbr
    , this_spell_inner
    , next_spell_inner
    , n_rows_in_overlap_with_last
    , n_rows_in_overleap_with_next
    , overlap_startdate
    , overlap_enddate
    , first_new_startdate AS startdate
    , first_new_enddate AS enddate
    , unique_spell_reason
    , unique_spell_reason_with_inner
    , COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
FROM pairs_indicated 

UNION ALL 

SELECT snz_uid
    , startdate AS old_startdate
    , enddate AS old_enddate
    , schoolnbr
    , this_spell_inner
    , next_spell_inner
    , n_rows_in_overlap_with_last
    , n_rows_in_overleap_with_next
    , overlap_startdate
    , overlap_enddate
    , second_new_startdate AS startdate
    , second_new_enddate AS enddate
    , unique_spell_reason
    , unique_spell_reason_with_inner
    , COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
FROM pairs_indicated
WHERE second_new_startdate IS NOT NULL
)
SELECT *
, SUM(priority_spell) OVER(PARTITION BY snz_uid, overlap_startdate, overlap_enddate) AS sum_priority
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 'enddate_snipped_outer_spell' 
	WHEN this_spell_inner = 0 AND priority_spell = 0 AND enddate = old_enddate THEN 'startdate_snipped_outer_spell' 
	WHEN this_spell_inner = 0 AND priority_spell = 1 THEN 'priority_outer_spell'
	WHEN this_spell_inner = 1 AND priority_spell = 0 THEN 'not_priority_inner_spell'
	WHEN this_spell_inner = 1 AND priority_spell = 1 THEN 'priority_inner_spell'
	WHEN priority_spell IS NULL THEN 'no_comparison_possible'
	ELSE NULL END AS inner_priority_reason_again
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 1
	WHEN this_spell_inner = 0 AND priority_spell = 0 AND enddate = old_enddate THEN 1
	WHEN this_spell_inner = 0 AND priority_spell = 1 THEN 1
	WHEN this_spell_inner = 1 AND priority_spell = 0 THEN 0
	WHEN this_spell_inner = 1 AND priority_spell = 1 THEN 1
	WHEN priority_spell IS NULL THEN NULL
	ELSE NULL END AS inner_priority_again
INTO #inner_spells_indicated_again
FROM split_pairs;

GO

PRINT('#inner_spells_indicated_again TABLE generated');

/*
	::::::::::::::::::::::::::::::::::::::::::
	2.2.4 Add the inner spells back agaig to the main dataset
	::::::::::::::::::::::::::::::::::::::::::
	This step is equivalent to 2.2.2 above, but this is the second pass
*/

DROP TABLE IF EXISTS #inner_spells_added_back_again
;WITH cte AS 
(
	SELECT
	a.*
	, this_spell_inner
	, next_spell_inner
	, n_rows_in_overleap_with_next
	, n_rows_in_overlap_with_last
	, b.startdate  AS new_startdate
	, b.enddate AS new_enddate
	, b.old_startdate
	, b.old_enddate
	, inner_priority_again	
	, inner_priority_reason_again
	FROM #inner_spells_added_back a
	LEFT JOIN #inner_spells_indicated_again b
	ON a.snz_uid = b.snz_uid 
	AND a.startdate = b.old_startdate AND a.enddate = b.old_enddate
	AND a.schoolnbr = b.schoolnbr
)
, inner_spells_added_back_again AS 
(
	SELECT 
	snz_uid	
	,CASE WHEN new_startdate IS NULL THEN startdate
		ELSE new_startdate
		END AS startdate	
	, CASE WHEN new_enddate IS NULL THEN enddate
		ELSE new_enddate
		END AS enddate
	, start_level
	, moe_esi_leave_rsn_code
	, schoolnbr
	, sch_enddate_imputed
	, unique_spell_reason
	, unique_spell_reason_with_inner
	, special_flag
	, CASE WHEN this_spell_inner IS NULL THEN unique_spell_reason_with_inner		
		ELSE inner_priority_reason_again
		END AS unique_spell_reason_with_inner_again		
	, inner_priority_again
	FROM cte		
)
SELECT a.* 
INTO #inner_spells_added_back_again
FROM inner_spells_added_back_again a
JOIN #global g ON 1=1
WHERE inner_priority_again IS NULL OR inner_priority_again = 1
AND DATEDIFF(DAY, startdate, enddate) > g.short;

GO

PRINT('#inner_spells_added_back_again TABLE generated');

/*

	BEGIN IMPORTANT TEST

		;with cte as
		(
			select * 
			, case when lead(enddate) over(partition by snz_uid order by startdate, enddate) < enddate then 1 else 0 end as next_spell_inner			
				, case when lag(enddate) over(partition by snz_uid order by startdate, enddate) > enddate then 1 else 0 end as this_spell_inner	
			from #inner_spells_added_back_again
		)		
		select 
		distinct snz_uid 
		from cte
		where snz_uid in (select distinct snz_uid from cte where this_spell_inner = 1) 
		order by snz_uid, startdate, enddate; 

	! NEEDS TO BE 0 FOR SCRIPT TO BE VALID ! 
	The idea is that we have not generated any new inner spells on this round of deduplicating. 
	If there are more inner spells present, we need to iterate one more time to deduplicate it

	END testing

*/

/*	
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.2.5 Create clean input table for the next step
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Now clean up this mess, ready for input to the generic overlaps step

*/

DROP TABLE IF EXISTS #input_to_generic_overlaps_indicated;
SELECT 
snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, unique_spell_reason_with_inner_again AS unique_spell_reason
, special_flag
INTO #input_to_generic_overlaps_indicated
FROM #inner_spells_added_back_again; 

GO

PRINT('#input_to_generic_overlaps_indicated TABLE generated');

/*

	BEGIN check

		select count(*) from #input_to_generic_overlaps_indicated

		x has a level 2 deep overlap
		select * from #big_clean_spells
		where snz_uid = x order by startdate, enddate

		select * from #input_to_generic_overlaps_indicated
		where snz_uid = x order by startdate, enddate

		next school is the same school, even if there is a gap over summer, needs to be filled 

	END check

*/

/*
	::::::::::::::::::::::::::::::::::::::::::::
	2.3. de-deuplication of non-inner spells 
	::::::::::::::::::::::::::::::::::::::::::::
	This final prioritisation step! 
	This step is easier than the inenr spells bit
	The only possible complication would be if a chain of overlapped spells resulted in a negative length spell by snipping it at both ends. 
	Its possible that the code will naturally deal with that in any case

*/

DROP TABLE IF EXISTS #generic_overlaps_idicated
;WITH overlaps_labelled AS /* label the consecutive spells that have an overlap */
(
	SELECT 
	*
	, CASE WHEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) < enddate THEN 1 ELSE 0 END AS overlap_with_next_startdate	
	, CASE WHEN LAG(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 ELSE 0 END AS overlap_with_last_enddate		
	FROM #input_to_generic_overlaps_indicated		
)
, only_overlaps AS /* filter to only those spells that are overlapped */
(
	SELECT 
	* 
	, LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_startdate
	, LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_enddate
	, LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_enddate
	, CASE WHEN overlap_with_next_startdate = 1 
				THEN LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
			WHEN overlap_with_last_enddate = 1 
				THEN startdate
			ELSE NULL END AS overlap_startdate
	, CASE WHEN overlap_with_next_startdate = 1  
				THEN enddate
			WHEN overlap_with_last_enddate = 1 
				THEN LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
				ELSE NULL END AS overlap_enddate
	FROM overlaps_labelled
	WHERE (overlap_with_next_startdate = 1 OR overlap_with_last_enddate = 1)	
)
, only_overlaps_with_attendance AS /* join the attendance date to these overlapped spells, add flags when attendance lies in the various date ranges */
(
	SELECT 
	a.*
	, b.moe_ssa_attendance_date
	/*------------------------------------------*/
	, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL /* if no attendance records at all for that school thrn null */
		WHEN moe_ssa_attendance_date IS NOT NULL AND  /* if attendance records do exist */
			moe_ssa_attendance_date BETWEEN	startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) /* and attendenace record for that school and its in the right date range */
			THEN 1 /* then label as good */
		WHEN moe_ssa_attendance_date IS NOT NULL /* if attendance records do exist  */
			AND moe_ssa_attendance_date NOT BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate)  /* and attendance record is not in the right date range */
			THEN 0 /* then label it as bad */
		END AS overlap_with_last_attendance_flag
	/*------------------------------------------*/
	, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
		WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date BETWEEN startdate AND enddate THEN 1
		WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date NOT BETWEEN startdate AND enddate THEN 0
		END AS this_spell_attendance_flag	
	/*------------------------------------------*/
	, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
		WHEN moe_ssa_attendance_date IS NOT NULL AND 		
			moe_ssa_attendance_date BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 1
		WHEN moe_ssa_attendance_date IS NOT NULL 
			AND moe_ssa_attendance_date NOT BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 0
		END AS overlap_with_next_attendance_flag
	/*------------------------------------------*/
	FROM only_overlaps a
	LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b	
	ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code		
)
, summarised_overlaps_attendance AS /* summarise the overlapped attendance situation  */
(
	SELECT 
	snz_uid
	, startdate
	, enddate
	, schoolnbr	
	, unique_spell_reason
	, overlap_with_next_startdate
	, overlap_with_last_enddate		
	, next_startdate
	, last_enddate	
	, overlap_startdate
	, overlap_enddate
	, SUM(overlap_with_next_attendance_flag) AS n_rows_in_overleap_with_next
	, SUM(overlap_with_last_attendance_flag) AS n_rows_in_overlap_with_last	
	, SUM(this_spell_attendance_flag) AS n_rows_in_range
	FROM only_overlaps_with_attendance
	GROUP BY snz_uid
	, startdate
	, enddate
	, schoolnbr		
	, unique_spell_reason	
	, overlap_with_next_startdate
	, overlap_with_last_enddate
	, next_startdate
	, last_enddate
	, overlap_startdate
	, overlap_enddate		
)
, summarised_labelled_overlaps AS /* count chain of overlapped spells */
(
	SELECT 
	a.*
	, COUNT(*) OVER(PARTITION BY snz_uid) AS paired_spells
	FROM summarised_overlaps_attendance a
	LEFT JOIN #att_enrol_coverage b
	ON a.schoolnbr = b.schoolnbr AND YEAR(a.startdate) = b.start_year
)
, pairs_of_overlapped AS 
(
	SELECT
	a.snz_uid
	, a.startdate
	, a.enddate
	, a.schoolnbr
	, a.overlap_with_next_startdate
	, a.overlap_with_last_enddate	
	, next_startdate
	, last_enddate
	, CASE WHEN overlap_with_next_startdate = 1			
			AND n_rows_in_overleap_with_next IS NULL
		THEN 0 	/* replace null with zero when the school legit does have attendance records somewhere */
		ELSE n_rows_in_overleap_with_next
		END AS n_rows_in_overleap_with_next
	, CASE WHEN overlap_with_last_enddate = 1		
		AND n_rows_in_overlap_with_last IS NULL
		THEN 0 /* replace null with zero when the school legit does have attendance records somewhere */
		ELSE n_rows_in_overlap_with_last
		END AS n_rows_in_overlap_with_last	
	, n_rows_in_range
	, paired_spells
	, unique_spell_reason	
	, overlap_startdate
	, overlap_enddate	
	FROM summarised_labelled_overlaps a	
)
, pairs_indicated AS
(
	SELECT 
	*
	, CASE WHEN overlap_with_last_enddate = 1 /* if there is an overlap with the previous spell	*/
			AND (n_rows_in_overlap_with_last < LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and its not prioritised by attendenace*/
			THEN LAG(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the startdate to be the previous enddate */
			ELSE startdate END AS new_startdate 

	, CASE WHEN overlap_with_next_startdate = 1 /* if there is an overlap with the next spell	*/
			AND (n_rows_in_overleap_with_next <= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and its not prioritised by attendenace*/
			THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the enddate to the next startdate */
			ELSE enddate END AS new_enddate	

	, CASE WHEN overlap_with_next_startdate = 1 /* if there is an overlap with the next spell	*/
			AND (n_rows_in_overleap_with_next <= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and its not prioritised by attendenace */
			THEN 0
			WHEN overlap_with_next_startdate = 1 -- if there is an overlap with the next spell					
			AND (n_rows_in_overleap_with_next > LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and it IS prioritised*/
			THEN 1
			ELSE NULL END AS priority_spell_2	 

	, CASE WHEN overlap_with_last_enddate = 1 /* if there is an overlap with the next spell	*/
			AND (n_rows_in_overlap_with_last < LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) 
			THEN 0 
			WHEN overlap_with_last_enddate = 1 
			AND (n_rows_in_overlap_with_last >= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) 
			THEN 1
			ELSE NULL END AS priority_spell
	FROM pairs_of_overlapped	
)
, split_pairs AS
(
	SELECT
	snz_uid
	, startdate AS old_startdate
	, enddate AS old_enddate
	, schoolnbr
	, overlap_with_next_startdate
	, overlap_with_last_enddate		
	, n_rows_in_overleap_with_next
	, n_rows_in_overlap_with_last	
	, new_startdate AS startdate
	, new_enddate AS enddate
	, overlap_startdate
	, overlap_enddate
	, priority_spell AS priority_spell_raw
	, priority_spell_2 AS priority_spell_2_raw		
	, COALESCE(priority_spell, priority_spell_2) AS priority_spell
	FROM pairs_indicated		
)
SELECT 
*
, SUM(priority_spell) OVER(PARTITION BY snz_uid, overlap_startdate, overlap_enddate) AS sum_priority
,CASE WHEN overlap_with_last_enddate = 0 AND priority_spell = 0 THEN 'enddate_clipped_generic_overlap' 
	WHEN overlap_with_last_enddate = 0 AND priority_spell = 1 THEN 'priority_generic overlap'
	WHEN overlap_with_last_enddate = 1 AND priority_spell = 0 THEN 'startdate_clipped_generic_overlap'
	WHEN overlap_with_last_enddate = 1 AND priority_spell = 1 THEN 'priority_generic overlap'
	WHEN priority_spell IS NULL THEN 'no_comparison_possible'
	ELSE NULL END AS priority_reason
INTO #generic_overlaps_idicated
FROM split_pairs;

GO

PRINT('#generic_overlaps_idicated TABLE generated');

/*
	:::::::::::::::::::::::::::::::::::::::::::::::::::::::
	2.3.2 Adding back the transformed non-inner generic overlaps
	:::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Adding back the generic overlaps to the main dataset

*/

DROP TABLE IF EXISTS #generic_spells_added_back
;WITH cte AS
(
	SELECT
	a.*
	, b.overlap_with_next_startdate
	, b.overlap_with_last_enddate
	, b.startdate AS new_startdate
	, b.enddate AS new_enddate
	, b.old_startdate
	, b.old_enddate
	, priority_spell
	, sum_priority
	, priority_reason
	FROM #input_to_generic_overlaps_indicated a
	LEFT JOIN #generic_overlaps_idicated b
	ON a.snz_uid = b.snz_uid 
	AND a.startdate = b.old_startdate AND a.enddate = b.old_enddate
	AND a.schoolnbr = b.schoolnbr
)
, generic_spells_added_back AS
(
SELECT 
	snz_uid	
	,CASE WHEN new_startdate IS NULL THEN startdate
		ELSE new_startdate
		END AS startdate	
	, CASE WHEN new_enddate IS NULL THEN enddate
		ELSE new_enddate
		END AS enddate
	, start_level
	, moe_esi_leave_rsn_code
	, schoolnbr
	, sch_enddate_imputed
	, CASE WHEN overlap_with_last_enddate IS NULL THEN unique_spell_reason
		WHEN overlap_with_last_enddate IS NOT NULL AND sum_priority = 2 THEN 'overlapping_priority' 
		ELSE priority_reason
		END AS unique_spell_reason		
	, special_flag
	FROM cte
)
SELECT *
INTO #generic_spells_added_back
FROM generic_spells_added_back
JOIN #global g ON 1 = 1
WHERE DATEDIFF(DAY, startdate, enddate) > g.short;

GO

PRINT('#generic_spells_added_back TABLE generated');

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	3. Appending the spells that have no attendance data associated 
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Initially in step x.x we removed all spells that were not suseptible to an attendance confrontation due to missing attendance data
	We now add these spells back

	input table check:
		select * from #big_clean_spells
		select * from #generic_spells_added_back
		select * from #student_enrol_removed_spells

*/

DROP TABLE IF EXISTS #null_att_added_back;
SELECT 
snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, 0 AS att_footprint_year
, unique_spell_reason
, CASE WHEN schoolnbr = 498 THEN 'correspondance' 
	WHEN schoolnbr = 972 THEN 'homeschool' 
	ELSE NULL END AS special_flag
INTO #null_att_added_back
FROM #generic_spells_added_back

UNION ALL 

SELECT
snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, att_footprint_year
, 'no_attendance_data' AS unique_spell_reason
, special_flag
FROM #student_enrol_removed_spells; 

GO

PRINT('#null_att_added_back TABLE generated');

/*

	BEGIN check

		select count(*) from #null_att_added_back

	END check

*/

/*
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	4. Adding derived metadata to the full table
	::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
	Now that we have out fully cleaned table, we can compute all the relevant info, to see where we have gaps etc.

*/

/*

	::::::::::::::::::::::::
	4.1 Adding it back in initially 
	::::::::::::::::::::::::

*/

DROP TABLE IF EXISTS #big_clean_spells_info;
;WITH cte AS
(
	SELECT 
	*
	, DATEDIFF(DAY, startdate, enddate) AS enrol_days
	, ROW_NUMBER() OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) AS rn /* should all now be unique startdays	 */
	FROM #null_att_added_back	
)
SELECT 
cte.snz_uid
, cte.startdate
, cte.enddate
, cte.enrol_days
, cte.schoolnbr
, CASE WHEN (cte.unique_spell_reason IN ('enddate_overhang', 
										'startdate_snipped_outer_spell',										
										'startdate_clipped_generic_overlap')) THEN cte.unique_spell_reason
	ELSE CAST(start_level AS VARCHAR) END AS start_level
, cte.rn /* we need this when adding gaps back in */
, unique_spell_reason
, special_flag
, CASE WHEN (cte.unique_spell_reason IN ('startdate_overhang', 
										'enddate_sniped_outer_spell', 										
										'enddate_clipped_generic_overlap')) THEN cte.unique_spell_reason
	ELSE b.ReasonForLeavingDescription END AS reason_for_leaving
, CASE WHEN LEAD(cte.startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) < cte.enddate THEN 1 ELSE 0 END AS overlap_with_next_startdate
/*------------------------------------------*/
, CASE WHEN LEAD(cte.startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) < cte.enddate 
	AND LEAD(cte.enddate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) < cte.enddate 
	THEN 1 ELSE 0 END AS next_spell_inner
, CASE WHEN LAG(cte.enddate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) > cte.enddate 	
	AND LAG(cte.enddate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) > cte.startdate 	
	THEN 1 ELSE 0 END AS this_spell_inner
/*------------------------------------------*/
, CASE WHEN LEAD(cte.startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) > cte.enddate THEN 1 ELSE 0 END AS gap_to_next_startdate
, DATEDIFF(DAY, enddate, LEAD(startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn)) AS days_to_next_spell
, cte.sch_enddate_imputed 
, CASE WHEN YEAR(enddate) + 1 = YEAR(LEAD(startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn)) 
	AND DATEDIFF(DAY, enddate, LEAD(startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn)) < g.max_summer_break
	THEN 1 ELSE 0 END AS old_year_old_school
INTO #big_clean_spells_info
FROM cte 
LEFT JOIN $(idimetaversion).[moe_school].[rsn_for_leaving_code] b
ON cte.moe_esi_leave_rsn_code = b.ReasonForLeavingId
JOIN #global g ON 1=1
JOIN $(idicleanversion).[data].[personal_detail] p
ON p.snz_uid = cte.snz_uid
ORDER BY cte.snz_uid, rn; 

GO

PRINT('#big_clean_spells_info TABLE generated'); 

/*
	BEGIN check

		select * from #big_clean_spells_info order by enrol_days 
		select count(*) from #big_clean_spells_info

	END check
*/

/*
	:::::::::::::::::::::::::::::::::::::::
	4.2 Remove short spells again
	:::::::::::::::::::::::::::::::::::::::
*/

DROP TABLE IF EXISTS #big_clean_spells_info_no_short_gaps
;WITH cte AS 
(
	SELECT 
	a.* 
	, CASE WHEN days_to_next_spell BETWEEN 1 AND g.short 
		THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) 
		ELSE enddate
		END AS new_enddate
	/*, case when days_to_next_spell between 1 and g.short 
		then 1
		else NULL
		end as changed_enddate */
	FROM #big_clean_spells_info a
	JOIN #global g ON 1=1
)
SELECT
snz_uid
, startdate
, new_enddate AS enddate
, DATEDIFF(DAY, startdate, new_enddate) AS enrol_days
, schoolnbr
, start_level
, rn
, unique_spell_reason
, special_flag
, reason_for_leaving
, next_spell_inner
, this_spell_inner
, overlap_with_next_startdate
, CASE WHEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY rn) > new_enddate THEN 1 ELSE 0 END AS gap_to_next_startdate
, DATEDIFF(DAY, new_enddate, LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY rn)) AS days_to_next_spell
, sch_enddate_imputed
, old_year_old_school
INTO #big_clean_spells_info_no_short_gaps
FROM cte;

GO

PRINT('#big_clean_spells_info_no_short_gaps TABLE generated'); 

/*

	BEGIN check

		select count(*) from #big_clean_spells_info
		select count(*) from #big_clean_spells_info_no_short_gaps

		select * from #big_clean_spells_info_no_short_gaps where days_to_next_spell is not null and days_to_next_spell > 0 order by days_to_next_spell 
		where changed_enddate is not null

		select * from #big_clean_spells_info order by snz_uid, startdate, enddate
		select * from #big_clean_spells_info where overlap_with_next_startdate > 0 

		when null att is added back there are now overlaps

		select 
			snz_uid
			, startdate
			, enddate
			, enrol_days
			, schoolnbr
			, start_level
			--, age_at_start
			, unique_spell_reason
			, reason_for_leaving
			, overlap_with_next_startdate
			--, gap_to_next_startdate
			, next_spell_inner
			, this_spell_inner
			, days_to_next_spell
			, sch_enddate_imputed
			, old_year_old_school
		from #big_clean_spells_info_no_short_gaps 
		where snz_uid in (select distinct snz_uid from #big_clean_spells_info where overlap_with_next_startdate > 0)
		order by snz_uid, startdate, enddate -- x affected spells 

		Can we deal with them in some way? FUTURE IMPROVEMENT TODO

	END check

*/

/*
	::::::::::::::::::::::::::::::::::::
	5. Build the 'complete' spells table including labelled gaps in enrolment
	::::::::::::::::::::::::::::::::::::
	This step consists of two seperate steps
		(i) Find the gaps
		(ii) Add them back in

*/

/*

	:::::::::::::::::::::::::::::::::
	5.1 Find the gaps in enrolment that remain
	:::::::::::::::::::::::::::::::::
	just the gaps that remain

	input table check:
	select * from #big_clean_spells_info_no_short_gaps

*/

DROP TABLE IF EXISTS #enrolment_gaps;
SELECT 
snz_uid
, enddate AS startdate
, DATEADD(DAY, days_to_next_spell, enddate) AS enddate
, days_to_next_spell AS enrol_days
, NULL AS schoolnbr
, NULL AS start_level
, rn+1 AS rn
, CASE WHEN old_year_old_school = 1 THEN 'School change OVER Summer' ELSE 'New Enrolment' END AS reason_for_leaving
, 0 AS this_spell_inner
, 1 AS enrolment_gap
, 0 AS sch_enddate_imputed
, CASE WHEN old_year_old_school = 1 THEN 1 ELSE 0 END AS summer_break
INTO #enrolment_gaps
FROM #big_clean_spells_info_no_short_gaps
WHERE gap_to_next_startdate = 1;

GO

PRINT('#enrolment_gaps TABLE generated'); 

/*
	:::::::::::::::::::::::::::::::::::::
	5.2. append the 'gap' spells to the main dataset
	:::::::::::::::::::::::::::::::::::::
	The resulting table has no missing dates, every student has a row that covers any given day
	From the first day of their enroilment until max_attendance (or extraction_date if we're using that)

*/

DROP TABLE IF EXISTS #complete_spells
;WITH cte AS
(	
	SELECT 
	snz_uid
	, startdate
	, enddate
	, enrol_days
	, schoolnbr
	, start_level
	, reason_for_leaving
	, unique_spell_reason
	, special_flag
	, this_spell_inner
	, 0 AS enrolment_gap
	, sch_enddate_imputed
	, 0 AS summer_break
	FROM #big_clean_spells_info_no_short_gaps

	UNION ALL

	SELECT 	
	snz_uid
	, startdate
	, enddate
	, enrol_days
	, schoolnbr	
	, CAST(start_level AS VARCHAR) AS start_level 
	, reason_for_leaving	
	, CASE WHEN summer_break = 1 THEN 'summer_break'
		ELSE 'enrolment_gap' END AS unique_spell_reason	
	, NULL AS special_flag
	, this_spell_inner
	, enrolment_gap	
	, sch_enddate_imputed	
	, summer_break
	FROM #enrolment_gaps			
)
SELECT 
snz_uid
, startdate
, enddate
, enrol_days
, schoolnbr
, unique_spell_reason
, special_flag
, start_level
, CASE WHEN sch_enddate_imputed = 1 THEN 'Spell Enddate Imputed' ELSE reason_for_leaving END AS spell_end_reason
, this_spell_inner
, CASE WHEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) < enddate THEN 1 ELSE 0 END AS overlap_with_next_startdate
, CASE WHEN DATEDIFF(DAY, enddate, LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate))  IS NULL THEN 1
	ELSE 0 END AS final_spell
, enrolment_gap
, sch_enddate_imputed
, summer_break
INTO #complete_spells
FROM cte;

GO
 PRINT('#complete_spells TABLE generated');

/*

	BEGIN check

		select distinct days_to_next_spell from #complete_spells 
		select distinct this_spell_inner from #complete_spells 
		select distinct gap_to_next_startdate from #complete_spells 
		select distinct overlap_with_next_startdate from #complete_spells 

		select * from #complete_spells order by snz_uid, startdate
		select * from #complete_spells 
		where snz_uid in (select distinct snz_uid from #complete_spells where overlap_with_next_startdate = 1)
		order by snz_uid, startdate, enddate -- x spells contaminated by overlaps in one of the spells for a snz_uid

		select  distinct unique_spell_reason from #complete_spells where schoolnbr is null -- enrolment_gap

	END check

*/

/*
	:::::::::::::::::::::::::::
	END OF TABLE BUILD SECTION
	:::::::::::::::::::::::::::	
*/

DROP TABLE IF EXISTS $(targetdb).$(targetschema).$(projprefix)_school_enrolment_spells;

SELECT *

INTO $(targetdb).$(targetschema).$(projprefix)_school_enrolment_spells

FROM #complete_spells;

/*
	TABLES TO DROP 

	drop table if exists #big_clean_spells
	drop table if exists #rec_cte
	drop table if exists #big_clean_spells_info_all_inner_removed
	drop table if exists #big_clean_spells_inner_spells
	drop table if exists #duplicate_enddate_indicated
	drop table if exists #duplicate_startdate_indicated
	drop table if exists #duplicate_spells_indicated
	drop table if exists #enrolment_gaps
	drop table if exists #no_duplicate_spells

	--some new ones I made since last update

	drop table if exists #att_enrol_coverage
	drop table if exists #big_clean_spells_info
	drop table if exists #big_clean_spells_info_no_short_gaps
	drop table if exists #closed_schools
	drop table if exists #generic_overlaps_idicated
	drop table if exists #generic_spells_added_back
	drop table if exists #global
	drop table if exists #inner_spells_added_back
	drop table if exists #inner_spells_added_back_again
	drop table if exists #inner_spells_indicated
	drop table if exists #inner_spells_indicated_again
	drop table if exists #input_to_generic_overlaps_indicated
	drop table if exists #multiple_lags
	drop table if exists #null_att_added_back
	drop table if exists #student_enrol_all
	drop table if exists #student_enrol_all_indicated
	drop table if exists #student_enrol_all_without_att_check
	drop table if exists #student_enrol_removed_spells
	drop table if exists #tmp_student_sample

*/