Module Output
**SQL: [IDI_Community].[edu_school_enrolment_spells].[school_enrolment_spells_YYYYMM]
**SAS: libname cm_dlr_ODBC dsn=idi_community_srvprd schema=edu_school_enrolment_spells; proc print data = cm_dlr.school_enrolment_spells_YYYYMM ;run ;
How to access a code module in the Data Lab:Read here
Purpose
This code creates spells of school enrolment for ākonga enrolled in school in New Zealand. There are two output tables associated with this module corresponding to two use cases:
- a table containing information about which school each student is enrolled in (enrolment spells).
- a table containing information about whether a student is enrolled in any school (enrolment status).
This script is responsible for the first table.
Key Concepts
ENROL vs School Roll Return data sources
ENROL is a register of student enrolments. Schools are legally required to update enrolment records in ENROL within 5 school days as students enrol, change schools or leave the school system. The source of the student rolls in the school directory come from ENROL. This is updated on a monthly basis (during the school year). ENROL is also the source data for this module.
School Roll Return (SRR) is another source of roll information. There are four school roll returns every year. All schools complete a snapshot of roll information as at 1 March and 1 July. The Ministry of Education also collects 1 June and 1 September rolls from state and state-integrated schools with students in funding year levels 9 to 15. Charter Schools also complete the 1 June and 1 September returns. Most schools use a student management system (SMS) to manage their student records, and to complete the roll returns for the Ministry. SRR data is reported on the MoE education counts public facing page.
Enrolments vs Attendance
If a student is attending a school, then the school has a legal requirement to enrol that student in ENROL. Please see this public facing Minsitry page.
A child can start school when they turn 5, but they are not legally required to start until they turn 6. Once they are in enrolled in school they are legally required to attend until they are 16. If a student does not attend school while they are enrolled, this is dealt with based on the school’s internal policies. Once the student has been absent for 20 consecutive school days (unless they have been told that the student’s absence is only temporary) the school should then withdraw them with their last date of attendance at the school. Please see this public facing Ministry page
There are a small number of situations where a student can be enrolled at multiple schools simultaneously. These include dual enrolments with Te Kura, or off-site enrolments with Health Schools or Alternative Education Providers.
Comparison against other sources
At a given snapshot of time, the enrolment spells module can be used to re-create school roll counts. Snapshot roll counts can be summarised from the table [IDI_Clean_].[moe_clean].[school_roll_return] at two different dates every year: March 1st and July 1st. Mismatches can be investigated at the student level.
Public facing school roll summaries, derived from the snapshot School Roll Return, are also available from MoE’s EdCounts page. These public facing summaries do not always match those obtained from the [moe_clean].[school_roll_return] table.
The module tables are derived from the source table [IDI_Clean_].[moe_clean].[student_enrol] through a cleaning process. A comparison to this source can be used to understand the extent of the cleaning process.
References & Contacts
- James Swindells (MoE)
- Cheryl Remington (MoE)
- Angus Prain (Stats)
Development team
| Domain | Agency | Person |
|---|---|---|
| Lead SME | SIA | Andrew Webber |
| Lead SME | UC | Peng Sun |
| IDI analysts | Stats | Angus Prain |
| Data Supply | MoE | ENROL |
| Policy/Operational Experts | MoE | Chelsea Shaw |
Module Business Rules
The enrolment_spells output table consists of unique combinations of (student, spell start, spell end, school) (which are snz_uid, startdate, enddate, schoolnbr as column names in the table), with overlapping spells reduced to only those overlaps which cannot be disentangled using attendance data prioritisation.
The script uses the following business rules:
- Short spells are removed (length of shortest spell determined by user parameter).
- Missing spell enddates are imputed (imputed date determined by user parameter, intended to be equal to last attendance date in [moe_clean].[student_attendance] table).
- Same startdate, same enddate, same school: remove duplicate rows.
- Same startdate, different enddate, same school: take longest spell.
- Different startdate, same enddate, same school: take longest spell.
We also apply the following general business rule to a range of different cases of overlapped enrolment spells:
General Rule. When a student is enrolled at more than one school simultaneously, the school that the student attends for more days in the overlapping period is prioritised and other spells have the overlapping region snipped away, leaving shorter, non-overlapping spells. When attendance data is not available for comparison due to historical missing data or school-specific reasons, no prioritisation is carried out and overlapping spells remain in the table.
The following 4 business rules are special cases of the general rule:
- Same startdate, same enddate, different school: use attendance data to break overlap
a. if either school has global null attendance*, no comparison is possible, both spells left
b. if both schools have global non-null attendance**, select school with highest attendance count in overlap. - Same startdate, different enddate, different school: use attendance data to break overlap
a. if either school has global null attendance*, no comparison is possible, both spells left
b. if both schools have global non-null attendance**, select school with highest attendance count in overlap
c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping. - Different startdate, same enddate, different school: use attendance data to break overlap
a. if either school has global null attendance*, no comparison is possible, both spells left
b. if both schools have global non-null attendance**, select school with highest attendance count in overlap
c. if longer spell is de-prioritised, keep the overhanging bit of the longer spell via snipping. - Overlapping spells with no matching dates: use attendance data to break overlap
a. if either school has global null attendance*, no comparison is possible, both spells left
b. if both schools have global non-null attendance**, select school with highest attendance count in overlap
c. Snip overlapped portion of de-prioritised spell off, leaving the on-overlapped bit(s).
*Global null attendance is a property that a school can have in a given year when the school has zero attendance records in the table [moe_clean].[student_attendance] in that year.
**Global non-null attendance is where a school has some attendance records in the table [moe_clean].[student_attendance] in a given year.
Note: Attendance data collection is terms 1-4 from 2019 onwards, with only Term 2 attendance recorded from 2018 and older. This results in some overlaps before 2019 being difficult to prioritise if they do not occur in Term 2. In these cases, a school for which the student has attended at any point, even outside the overlapped region (zero overlapped attendance) is prioritised over a school for which the student has never attended (NULL overlapped attendance). When both schools have zero overlapped attendance or both have null overlapped attendance, then the overlaps are broken using the rule of thumb that enddates are less reliable than startdates - in this case we snip the enddate of the earlier spell to match the startdate of the later starting spell. If this situation occurs in the special case of an inner / outer spell pair, we prioritise the outer spell, removing the inner spell from the table.
Open Issues/Comments
- Missing enddates that do not correspond to students enrolled at extraction (source data issue)
- Poorly joined snz_uid (IDI linking issue)
- Absence of a longitudinal school profile metadata (proposals exist to address this non-critical issue)
- We have not used the [te_kura] datasets for enrolments in Te Kura (Correspondence School) (future implementation is possible)
- Student attendance data is used in a critical way in this script, and therefore its quality and coverage plays a key role in the overall quality and coverage of the module. Attendance data coverage is currently (June 2025) as follows:
a. Term 2 data only from 2011 until 2018
b. Terms 1 through to 4 from 2019 until term 3 2024
c. Max attendance date is 2024-09-27 - The [moe_clean].[student_enrol] table goes back to 1985 does not have any good coverage until 2005.
Parameters
The following parameters should be supplied to this module to run it in the database:
{targetdb}: The SQL database on which the spell datasets are to be created.{idicleanversion}: The IDI Clean version that the spell datasets need to be based on.{targetschema}: The project schema under the target database into which the spell datasets are to be created.{projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.{max_attendance_date}: the max date in student_attendance table
Dependencies
[moe_clean].[student_enrol][moe_clean].[student_attendance][moe_clean].[provider_profile]
Outputs
{targetdb}.{targetschema}.{projprefix}_school_enrolment_spells
Variable Descriptions
| Column name | Description |
|---|---|
| snz_uid | Stats nz unique identifier |
| startdate | Spell start date |
| enddate | Spell end date |
| enrol_days | Length of spell in days |
| schoolnbr | School profile number |
| unique_spell_reason | Description of any cleaning that has occurred on the spell or on an overlapping adjacent spell |
| special_flag | Flag indicating Te Kra and Homeschool |
| start_level | School level of the student at spell start date |
| spell_end_reason | Description of why the spell ended - this derived from SpellEndReason in the [student_enrol] table |
| this_spell_inner | Binary flag indicating that the spell is entirely containes in a larger (outer) spell |
| overlap_with_next_startdate | Binary flag indicating that the enddate overlaps with the startdate of an adjacent spell (it has not been deprioritised) |
| final_spell | The last spell in the table for the student (ordered by startdate, enddate) |
| enrolment_gap | The spell is a period of non-enrolment |
| sch_enddate_imputed | Binary flag indicating that the spell enddate was NULL in the source data |
| summer_break | Binary flag indicating that the spell is a gap, but it’s a gap over the summer break |
Module Version & Change History
| Date | Version | Comments |
|---|---|---|
| 4 June 2025 | Initial Version based on specifications from Commissioning document. |
Code
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
:setvar idimetaversion "{idimetaversion}"
:setvar max_attendance_date "{max_attendance_date}"
/*
:::::::::::::::::::::
0. GLOBAL SETUP
:::::::::::::::::::::
*/
/*
::::::::::::::::::::::::::::
0.1. GLOBAL
::::::::::::::::::::::::::::
Some parameters that get used later on
*/
DROP TABLE IF EXISTS #global;
SELECT start_year = 2018
, end_year = 2025
, max_summer_break = 60 /* maximum gap over summer that we allowe without inducing non_enrolment */
, short = 4 /* minimum length of shortest spells pulled through into the base table before processing */
, max_attendance_date = '2024-09-27'
, lowest_age = 4
, highest_age = 21 /* ORS students may be enrolled up to age 21 */
INTO #global;
GO
PRINT('#global TABLE generated');
/*
:::::::::::::::::::::::::::::::
SCHOOL METADATA
:::::::::::::::::::::::::::::::
for convenience a useful table that has all the relevant school metadata
drop table if exists #school_metadata;
select *
into #school_metadata
from [IDI_Sandpit].[DL-MAA2020-47].[school_metadata];
*/
/*
:::::::::::::::::::::::::::::::::::
STUDENT POPULATION SAMPLE DEFINITION
:::::::::::::::::::::::::::::::::::
This is a tmp student sample for testing purposes
The sample population consists of ~20k students who were enrolled
on June 15 of the year given by global.start_year (defined above)
drop table if exists #tmp_student_sample;
select
distinct top 20000
snz_uid
into #tmp_student_sample
from $(idicleanversion).[moe_clean].[student_enrol]
join #global g
on 1 = 1
/* everyone with a spell overlapping with June 15 on start year */
WHERE DATEFROMPARTS(g.start_year, 6, 15) BETWEEN moe_esi_start_date AND ISNULL(moe_esi_end_date, g.max_attendance_date);
GO
PRINT('#tmp_student_sample TABLE generated')
*/
/*
:::::::::::::::::::::::::::::::::::::::::
0.2 Schools that are closed
:::::::::::::::::::::::::::::::::::::::::
*/
DROP TABLE IF EXISTS #closed_schools;
SELECT moe_pp_provider_code
, moe_pp_provider_close_date
INTO #closed_schools
FROM $(idicleanversion).[moe_clean].[provider_profile]
WHERE moe_pp_provider_close_date IS NOT NULL;
GO
PRINT('#closed_schools TABLE generated');
/*
::::::::::::::::::::::::::::
1. STUDENT ENROLMENT DATA
::::::::::::::::::::::::::::
Note home schooling episodes are included in these data, provider code = 972 (labelled as 'homeschool' in special_flag in next section)
Note also that Te Kura spells are included, provider code = 498 (labelled as 'correspondance' in special_flag in next section)
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::
1.1. Get the (mostly) raw records of students
::::::::::::::::::::::::::::::::::::::::::::::
Spells are required to fit in the appropriate window of dates
Short spells are removed
*/
DROP TABLE IF EXISTS #student_enrol_all_raw;
SELECT a.snz_uid
, a.snz_moe_uid
, moe_esi_start_date AS startdate
/* if enddate is missing use extraction date*/
/* ,isnull(moe_esi_end_date, moe_esi_extrtn_date) as enddate */
/* change in code to map to max_attendance date only, for comparison */
, ISNULL(moe_esi_end_date, g.max_attendance_date) AS enddate
, moe_esi_leave_rsn_code
, DATEDIFF(d,moe_esi_start_date,ISNULL(moe_esi_end_date,moe_esi_extrtn_date)) AS enrol_days
, moe_esi_provider_code AS schoolnbr
, moe_esi_entry_year_lvl_nbr AS start_level
, CASE WHEN moe_esi_end_date IS NULL THEN 1 ELSE 0 END AS sch_enddate_imputed
/* include a row number for selecting the record if duplicate rows with same school (threee types of duplicate) */
, ROW_NUMBER() OVER(PARTITION BY a.snz_uid, moe_esi_start_date, moe_esi_end_date, moe_esi_provider_code ORDER BY moe_esi_entry_year_lvl_nbr) AS rn_both /*same startdate and enddate: choose randomly*/
, ROW_NUMBER() OVER(PARTITION BY a.snz_uid, moe_esi_start_date, moe_esi_provider_code ORDER BY moe_esi_end_date DESC) AS rn_start /* same startdate: choose latest enddate */
, ROW_NUMBER() OVER(PARTITION BY a.snz_uid, moe_esi_end_date, moe_esi_provider_code ORDER BY moe_esi_start_date) AS rn_end /* same enddate: choose earliest startdate */
, DATEDIFF(YEAR, pd.snz_birth_date_proxy, moe_esi_start_date) AS age_at_startdate
, DATEDIFF(YEAR, pd.snz_birth_date_proxy, ISNULL(moe_esi_end_date, moe_esi_extrtn_date)) AS age_at_enddate
, CASE WHEN closed.moe_pp_provider_close_date IS NOT NULL
AND ISNULL(moe_esi_end_date, moe_esi_extrtn_date) > closed.moe_pp_provider_close_date
THEN 1 ELSE NULL END AS closed_school_at_enddate
INTO #student_enrol_all_raw
FROM $(idicleanversion).[moe_clean].student_enrol a
JOIN #global g
ON 1 = 1
JOIN $(idicleanversion).[data].personal_detail pd
ON pd.snz_uid = a.snz_uid
LEFT JOIN #closed_schools closed
ON closed.moe_pp_provider_code = a.moe_esi_provider_code
WHERE DATEDIFF(d,moe_esi_start_date, ISNULL(moe_esi_end_date, g.max_attendance_date)) > g.short /* delete short spells */
/* BEGIN STUDENT SAMPLE FILTER */
/*and a.snz_uid in (select snz_uid from #tmp_student_sample ) -- temp student sample */
/* END STUDENT SAMPLE FILTER */
AND moe_esi_start_date <= g.max_attendance_date
AND ISNULL(moe_esi_end_date, g.max_attendance_date) >= DATEFROMPARTS(g.start_year, 1,1); /* some overlap with selected time period */
GO
PRINT('#student_enrol_all_raw TABLE built');
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
1.2. Find all students who have a 'triple lag'
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- Table finds all lags
- Lags are where the start date of one spell overlaps with an enddate of a spell that started earlier
- Triple lagged spells occur when a student is enrolled in 4 distinct schools simultaneously
we coinsider this to be an indicatio that something is wrong with the linking of such student
and therefore they are removed from the table
*/
DROP TABLE IF EXISTS #multiple_lags;
SELECT *
, CASE WHEN LAG(enddate, 2) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS double_lag
, CASE WHEN LAG(enddate, 3) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS triple_lag
, CASE WHEN LAG(enddate, 4) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS quad_lag
, CASE WHEN LAG(enddate, 5) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS pent_lag
, CASE WHEN LAG(enddate, 6) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS hex_lag
, CASE WHEN LAG(enddate, 7) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS hep_lag
, CASE WHEN LAG(enddate, 8) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS oct_lag
, CASE WHEN LAG(enddate, 9) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS non_lag
, CASE WHEN LAG(enddate, 10) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1
ELSE 0 END AS dec_lag
INTO #multiple_lags
FROM #student_enrol_all_raw;
GO
PRINT('#multiple_lags TABLE generated');
/*
BEGIN testing
select distinct snz_uid from #multiple_lags where triple_lag = 1 --
select distinct snz_uid from #multiple_lags where quad_lag = 1 --
select distinct schoolnbr from #multiple_lags where triple_lag = 1
select
sum(double_lag) as double_lags --
, sum(triple_lag) as triple_lag --
, sum(quad_lag) as quad_lag --
, sum(pent_lag) as pent_lag --
, sum(hex_lag) as hex_lag --
, sum(hep_lag) as hep_lag --
, sum(oct_lag) as oct_lag --
, sum(non_lag) as non_lag --
, sum(dec_lag) as dec_lag --
from #multiple_lags
BEGIN testing
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
1.3. Filtering raw data to appropriate spells
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Build the base table by deduplicating on snz_uid, schoolnbr,
startdate, enddate and adding a special flag for homeschool and te kura
filtering the triple lagged spells and spells associated with closed schools
and filtering to the appropriate age range
*/
DROP TABLE IF EXISTS #student_enrol_all_without_att_check;
SELECT snz_uid
, startdate
, enddate
, moe_esi_leave_rsn_code
, enrol_days
, schoolnbr
, start_level
, sch_enddate_imputed
, CASE WHEN schoolnbr = 498 THEN 'correspondance'
WHEN schoolnbr = 972 THEN 'homeschool'
ELSE NULL END AS special_flag
INTO #student_enrol_all_without_att_check
FROM #student_enrol_all_raw
WHERE rn_both = 1
AND rn_start = 1
AND rn_end = 1 /* duplicate spells for a given student at the same school, select only the longest spell */
AND age_at_enddate BETWEEN 4 AND 19
AND age_at_startdate BETWEEN 4 AND 19
/* filter out spells that end after the school lis closed */
AND closed_school_at_enddate IS NULL
/* this is where we filterf out students enrolled in 4 schools simultaneously */
AND snz_uid NOT IN (SELECT DISTINCT snz_uid FROM #multiple_lags WHERE triple_lag = 1);
GO
PRINT('#student_enrol_all_without_att_check generated');
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
1.4. Build a table of the overlaps between attendance data and enrolment data
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
The idea is we should not allow comparisons to schools which did not have any attendance footprint (for whatever reason)
*/
DROP TABLE IF EXISTS #att_enrol_coverage ;
WITH cte AS(
SELECT DISTINCT schoolnbr
, YEAR(startdate) AS start_year
FROM #student_enrol_all_without_att_check
UNION
SELECT DISTINCT schoolnbr
, YEAR(enddate) AS start_year
FROM #student_enrol_all_without_att_check
)
, all_attendance AS(
SELECT DISTINCT moe_ssa_provider_code
, YEAR(moe_ssa_attendance_date) AS att_year
FROM $(idicleanversion).[moe_clean].[school_student_attendance]
)
SELECT a.*
, b.*
INTO #att_enrol_coverage
FROM cte a
LEFT JOIN all_attendance b
ON a.schoolnbr = b.moe_ssa_provider_code
AND a.start_year = b.att_year;
GO
PRINT('#att_enrol_coverage TABLE generated');
/*
BEGIN check
select * from #att_enrol_coverage order by schoolnbr
select count(*) from #att_enrol_coverage
select start_year, count(*) from #att_enrol_coverage
where att_year is null
group by start_year order by start_year
END check
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
1.5. remove the schools that are not comparible using attendance data
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
This table is our base table, that is used in the rest of this script
*/
DROP TABLE IF EXISTS #student_enrol_all ;
WITH all_spells AS(
SELECT b.*
, CASE WHEN (c.att_year IS NOT NULL OR d.att_year IS NOT NULL) THEN 1 ELSE 0 END AS att_footprint_year
FROM #student_enrol_all_without_att_check b
LEFT JOIN #att_enrol_coverage c
ON c.schoolnbr = b.schoolnbr
AND c.start_year = YEAR(b.startdate)
LEFT JOIN #att_enrol_coverage d
ON d.schoolnbr = b.schoolnbr
AND d.start_year = YEAR(b.enddate)
)
SELECT *
INTO #student_enrol_all
FROM all_spells
WHERE att_footprint_year = 1;
GO
PRINT('#student_enrol_all TABLE generated');
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
1.6. Keep a record of those removed spells
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
These records will be added back at the end
*/
DROP TABLE IF EXISTS #student_enrol_removed_spells ;
WITH all_spells AS(
SELECT b.*
, CASE WHEN (c.att_year IS NOT NULL OR d.att_year IS NOT NULL) THEN 1 ELSE 0 END AS att_footprint_year
FROM #student_enrol_all_without_att_check b
LEFT JOIN #att_enrol_coverage c
ON c.schoolnbr = b.schoolnbr
AND c.start_year = YEAR(b.startdate)
LEFT JOIN #att_enrol_coverage d
ON d.schoolnbr = b.schoolnbr
AND d.start_year = YEAR(b.enddate)
)
SELECT *
INTO #student_enrol_removed_spells
FROM all_spells
WHERE att_footprint_year = 0;
GO
PRINT('#student_enrol_removed_spells TABLE generated');
/*
BEGIN check
select count(*) from #student_enrol_all_without_att_check --
select count(*) from #student_enrol_removed_spells --
select count(*) from #student_enrol_all--
and = +
select * from #student_enrol_removed_spells order by snz_uid, startdate, enddate
select *
from #school_metadata
where moe_pp_provider_code in (select distinct schoolnbr from #student_enrol_removed_spells)
END check
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2. BREAKING OVERLAPPED SPELLS WITH ATTENDANCE DATA
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
This section carries out the overlap breaking in three steps:
- 2.1. Overlapping spells which have at least one matching date
- 2.2. Overlapping spells which do not have a matching date and which are inner spells
- 2.3. Overlapping spells which do not have a matching date and which are not inner spells
The reason for this is that the second step is vastly complicated if you dont first carry out the first one!
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2.1. Breaking overlapped spells that have at least one matching date
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
There are three cases to consider
- overlapped spells with a matching startdate and matching enddate
- overlapped spells wth only a matching startdate
- overlapped spells with only a mathing enddate
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2.1.1. Spells with mathching startdate and matching enddate
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Generate a '#duplicate_spells_indicated' table of the same length as student_enrol_all
- spells with duplicate enddates, startdates (but always different schools they have been dealth with elsewhere) or both are indicated
- priority is assigned based on attendance data in the overlap
*/
DROP TABLE IF EXISTS #duplicate_spells_indicated ;
WITH cte AS /* all spells that have same snz_uid, startdate, enddate */
(
SELECT snz_uid
, startdate
, enddate
, COUNT(*) AS n_rows
, COUNT(DISTINCT schoolnbr) AS n_schools
FROM #student_enrol_all
GROUP BY snz_uid
, startdate
, enddate
HAVING COUNT(DISTINCT schoolnbr) > 1
)
, duplicate_spells AS /* pick up the actual spell information from #student_enroll_all for these duplicates */
(
SELECT b.*
FROM cte a
JOIN #student_enrol_all b
ON a.snz_uid = b.snz_uid
AND a.startdate = b.startdate
AND a.enddate = b.enddate /* join on both enddate and startdate*/
)
, attendance_in_spell AS /* pick up the attendance records for these spells, label in-scope records*/
(
SELECT a.*
, b.moe_ssa_attendance_date
, CASE WHEN b.moe_ssa_provider_code IS NULL THEN NULL
WHEN b.moe_ssa_provider_code IS NOT NULL AND (b.moe_ssa_attendance_date BETWEEN a.startdate AND a.enddate
)THEN 1 WHEN b.moe_ssa_provider_code IS NOT NULL
AND NOT(
b.moe_ssa_attendance_date BETWEEN a.startdate AND a.enddate
)THEN 0 ELSE -99 END AS attendance_checker
FROM duplicate_spells a
LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
ON a.snz_uid = b.snz_uid
AND a.schoolnbr = b.moe_ssa_provider_code
)
, attendance_summary_in_spell AS /* summarise the attendance situation per spell */
(
SELECT snz_uid
, schoolnbr
, startdate
, enddate
, MIN(moe_ssa_attendance_date) AS first_attendance
, MAX(moe_ssa_attendance_date) AS last_attendance
/*, count(distinct moe_ssa_attendance_date) as n_days_in_overlap*/
, SUM(attendance_checker) AS n_days_in_overlap
/*, case when sum(attendance_checker) is null then 1 else 0 end as null_checker*/
, ROW_NUMBER() OVER(PARTITION BY snz_uid, startdate, enddate ORDER BY SUM(attendance_checker) DESC) AS rn /* if tied its random, if null its de-prioritised */
FROM attendance_in_spell
GROUP BY snz_uid
, schoolnbr
, startdate
, enddate
, att_footprint_year
)
, final_table AS(
SELECT *
, CASE WHEN rn <> 1
THEN 0 ELSE 1 END AS unique_school_flag
, CASE WHEN rn = 1 THEN 'matching_startdate_and_enddate' ELSE 'matching_startdate_and_enddate' END AS duplicate_reason
FROM attendance_summary_in_spell
)
SELECT
*
, SUM(unique_school_flag) OVER(PARTITION BY snz_uid, startdate, enddate) AS sum_priority
INTO #duplicate_spells_indicated
FROM final_table;
GO
PRINT('#duplicate_spells_indicated TABLE generated');
/*
BEGIN check
select count(*) from #duplicate_spells_indicated -- full run
select count(*) from #duplicate_spells_indicated where rn <>1 -- full run
select * from #duplicate_spells_indicated order by snz_uid
END check
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::
2.1.2. Build table with full duplicates deduplicated
:::::::::::::::::::::::::::::::::::::::::::::::::::
doesn't matter how, this step is only required to find matching start dates spells
full duplicate spell dates are still present in the table, just one copy of them
*/
DROP TABLE IF EXISTS #no_duplicate_spells
;WITH cte AS
(
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY snz_uid, startdate, enddate ORDER BY schoolnbr DESC) AS rn_tmp
FROM #student_enrol_all
)
SELECT
*
INTO #no_duplicate_spells
FROM cte
WHERE rn_tmp = 1; /* takes highest school number to break ties (BAD PRACTICE in general, in this case no big deal) */
GO
PRINT('#no_duplicate_spells generated');
/*
BEGIN check
select count(*) from #student_enrol_all -- x
select count(*) from #duplicate_spells_indicated where rn <> 1 -- y removed spells
select count(*) from #no_duplicate_spells -- z remaining spells
and x = z + y
END check
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::
2.1.3. Spells with matching startdate
:::::::::::::::::::::::::::::::::::::::::::::::::::
- carry out the same exercise as above but this time for matching startdates only
*/
DROP TABLE IF EXISTS #duplicate_startdate_indicated
;WITH cte AS /* starting from de-duplicated start + end dates, count matching startdates spells */
(
SELECT
snz_uid
, startdate
, COUNT(*) AS n_rows
/* , count(distinct schoolnbr) as n_schools */
FROM #no_duplicate_spells
GROUP BY snz_uid, startdate
HAVING COUNT(*) > 1
)
, duplicate_spells AS /* pick out the limits of the overlap date interval and the "overhung" enddate */
(
SELECT
b.*
, MIN(b.enddate) OVER(PARTITION BY b.snz_uid, b.startdate) AS overlap_enddate
, MAX(b.enddate) OVER(PARTITION BY b.snz_uid, b.startdate) AS latest_enddate
, DATEDIFF(DAY, b.startdate, b.enddate) AS spell_length
FROM cte a
JOIN #student_enrol_all b
ON a.snz_uid = b.snz_uid AND a.startdate = b.startdate
)
, attendance_in_spell AS /* pick up the attendance recorsd for these spells */
(
SELECT
a.*
, b.moe_ssa_attendance_date AS overlap_attendance_date
, CASE WHEN b.moe_ssa_provider_code IS NULL THEN NULL
WHEN b.moe_ssa_provider_code IS NOT NULL AND (b.moe_ssa_attendance_date BETWEEN a.startdate AND a.overlap_enddate) THEN 1
WHEN b.moe_ssa_provider_code IS NOT NULL AND NOT (b.moe_ssa_attendance_date BETWEEN a.startdate AND a.overlap_enddate) THEN 0
ELSE -99 END AS attendance_checker
FROM duplicate_spells a
LEFT JOIN $(idicleanversion).[moe_clean].school_student_attendance b
ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code
)
, attendance_summary_in_spell AS /* summarise the attendance records */
(
SELECT
snz_uid
, schoolnbr
, startdate
, overlap_enddate
, latest_enddate
, spell_length
, MIN(overlap_attendance_date) AS first_attendance
, MAX(overlap_attendance_date) AS last_attendance
, SUM(attendance_checker) AS n_days_in_overlap
, DATEDIFF(DAY, startdate, overlap_enddate) AS overlap_length
/*, att_footprint_year*/
/* , case when sum(attendance_checker) is null then 1 else 0 end as null_checker */
/*, ROW_NUMBER() over(partition by snz_uid order by count(distinct overlap_attendance_date) desc) as rn -- give a ranking to the spells based on numbe of attendance days*/
/* give a ranking to the spells based on numbe of attendance days (or spell length if tied) */
, ROW_NUMBER() OVER(PARTITION BY snz_uid, overlap_enddate, startdate ORDER BY SUM(attendance_checker) DESC, spell_length DESC) AS rn
FROM attendance_in_spell
GROUP BY snz_uid
, schoolnbr
, startdate
, overlap_enddate
, latest_enddate
, spell_length
/*, att_footprint_year -- these are common across the spells with same startdate except school*/
)
, final_table AS
(
SELECT
*
/*, min(att_footprint_year) over(partition by snz_uid, startdate) as min_att_footprint*/
, CASE WHEN rn <> 1
/*and min(att_footprint_year) over(partition by snz_uid, startdate) > 0 */
THEN 0 ELSE 1 END AS unique_school_flag
, CASE WHEN rn = 1 THEN 'matching_startdate' ELSE 'matching_startdate' END AS duplicate_reason
FROM attendance_summary_in_spell
)
SELECT
*
, SUM(unique_school_flag) OVER(PARTITION BY snz_uid, startdate) AS sum_priority
INTO #duplicate_startdate_indicated
FROM final_table;
GO
PRINT('#duplicate_startdate_indicated TABLE generated');
/*
BEGIN check
select * from #duplicate_startdate_indicated order by snz_uid, rn
END check
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::
2.1.4. Spells with matching enddate
:::::::::::::::::::::::::::::::::::::::::::::::::::
- finally break the ties for the remaining matching enddate ones
- starting from the no duplicate spells table so that we dont pick up the full duplicates
ISSUE: when there are duplicate startdates, and duplicate edndates for a given student and these pairs interact!
TODO: fix this
*/
DROP TABLE IF EXISTS #duplicate_enddate_indicated
;WITH cte AS
(
SELECT
snz_uid
, enddate
, COUNT(*) AS n_rows
, COUNT(DISTINCT schoolnbr) AS n_schools
FROM #no_duplicate_spells
GROUP BY snz_uid, enddate
HAVING COUNT(*) > 1
)
, duplicate_spells AS
(
SELECT
b.*
, MAX(b.startdate) OVER(PARTITION BY b.snz_uid, b.enddate) AS overlap_startdate /* same enddate, need to find the limit of the overlap */
, MIN(b.startdate) OVER(PARTITION BY b.snz_uid, b.enddate) AS earliest_startdate /* the earliest startdate (incase there is an overlap and we need to split a spell to fill an enrolment gap) */
, DATEDIFF(DAY, b.startdate, b.enddate) AS spell_length
FROM cte a
JOIN #student_enrol_all b
ON a.snz_uid = b.snz_uid AND a.enddate = b.enddate
)
, attendance_in_spell AS
(
SELECT
a.*
, b.moe_ssa_attendance_date AS overlap_attendance_date
, CASE WHEN b.moe_ssa_provider_code IS NULL THEN NULL
WHEN b.moe_ssa_provider_code IS NOT NULL AND (b.moe_ssa_attendance_date BETWEEN a.overlap_startdate AND a.enddate) THEN 1
WHEN b.moe_ssa_provider_code IS NOT NULL AND NOT (b.moe_ssa_attendance_date BETWEEN a.overlap_startdate AND a.enddate) THEN 0
ELSE -99 END AS attendance_checker
FROM duplicate_spells a
LEFT JOIN $(idicleanversion).[moe_clean].school_student_attendance b
ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code
)
, attendance_summary_in_spell AS
(
SELECT
snz_uid
, schoolnbr
, overlap_startdate
, earliest_startdate
, enddate
, spell_length
, MIN(overlap_attendance_date) AS first_attendance
, MAX(overlap_attendance_date) AS last_attendance
, SUM(attendance_checker) AS n_days_in_overlap
/* , ROW_NUMBER() over(partition by snz_uid order by count(distinct overlap_attendance_date) desc) as rn */
, ROW_NUMBER() OVER(PARTITION BY snz_uid, overlap_startdate, enddate ORDER BY SUM(attendance_checker) DESC, spell_length DESC) AS rn
FROM attendance_in_spell
GROUP BY snz_uid, schoolnbr, enddate, overlap_startdate, earliest_startdate, spell_length
)
, final_table AS
(
SELECT
*
, CASE WHEN rn <> 1
THEN 0 ELSE 1 END AS unique_school_flag
, CASE WHEN rn = 1 THEN 'matching_enddate' ELSE 'matching_enddate' END AS duplicate_reason /* relic from when we named the not-prioritised one differently */
FROM attendance_summary_in_spell
)
SELECT *
, SUM(unique_school_flag) OVER(PARTITION BY snz_uid, enddate) AS sum_priority
INTO #duplicate_enddate_indicated
FROM final_table;
GO
PRINT('#duplicate_enddate_indicated TABLE generated');
/*
:::::::::::::::::::::::::::::::::::::::::::::::
2.1.5. Bring these together in one table
:::::::::::::::::::::::::::::::::::::::::::::::
the filtering to prioritised spells is not carried out yet at this stage
*/
DROP TABLE IF EXISTS #student_enrol_all_indicated;
SELECT a.*
------------------------------------------------------
, CASE WHEN c.latest_enddate = a.enddate
AND c.unique_school_flag = 0 THEN c.overlap_enddate WHEN d.earliest_startdate = a.startdate
AND d.unique_school_flag = 0 THEN d.earliest_startdate ELSE NULL END AS overhang_spell_startdate
------------------------------------------------------
, CASE WHEN c.latest_enddate = a.enddate
AND c.unique_school_flag = 0 THEN c.latest_enddate WHEN d.earliest_startdate = a.startdate
AND d.unique_school_flag = 0 THEN d.overlap_startdate ELSE NULL END AS overhang_spell_enddate
------------------------------------------------------
, CASE WHEN c.latest_enddate = a.enddate
AND c.unique_school_flag = 0 THEN 'enddate_overhang' WHEN d.earliest_startdate = a.startdate
AND d.unique_school_flag = 0 THEN 'startdate_overhang' ELSE NULL END AS overhang_date_type
------------------------------------------------------
, CASE WHEN b.snz_uid IS NOT NULL THEN b.unique_school_flag
WHEN c.snz_uid IS NOT NULL THEN c.unique_school_flag
WHEN d.snz_uid IS NOT NULL THEN d.unique_school_flag
ELSE 1 END AS unique_school_flag
------------------------------------------------------
, CASE WHEN b.snz_uid IS NOT NULL THEN b.duplicate_reason
WHEN c.snz_uid IS NOT NULL THEN c.duplicate_reason
WHEN d.snz_uid IS NOT NULL THEN d.duplicate_reason
END AS duplicate_reason
------------------------------------------------------
, CASE WHEN b.snz_uid IS NOT NULL THEN b.n_days_in_overlap
WHEN c.snz_uid IS NOT NULL THEN c.n_days_in_overlap
WHEN d.snz_uid IS NOT NULL THEN d.n_days_in_overlap
END AS attendance_days_in_overlap
------------------------------------------------------
, COALESCE(b.sum_priority, c.sum_priority, d.sum_priority) AS sum_priority
------------------------------------------------------
INTO #student_enrol_all_indicated
FROM #student_enrol_all a
LEFT JOIN #duplicate_spells_indicated b
ON b.snz_uid = a.snz_uid
AND b.schoolnbr = a.schoolnbr
AND b.startdate = a.startdate
AND b.enddate = a.enddate
LEFT JOIN #duplicate_startdate_indicated c
ON c.snz_uid = a.snz_uid
AND c.schoolnbr = a.schoolnbr
AND c.startdate = a.startdate
LEFT JOIN #duplicate_enddate_indicated d
ON d.snz_uid = a.snz_uid
AND d.schoolnbr = a.schoolnbr
AND d.enddate = a.enddate;
GO
PRINT('#student_enrol_all_indicated TABLE generated');
/*
BEGIN testing
select count(*) from #student_enrol_all_indicated
select count(*) from #no_duplicate_spells
select count(*) from #student_enrol_all
select * from #student_enrol_all_indicated order by snz_uid, startdate, enddate
select * from #student_enrol_all_indicated
where snz_uid in (select distinct snz_uid from #student_enrol_all_indicated where duplicate_reason is not null)
order by snz_uid, startdate, enddate
END testing
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2.1.6. Filter non-prioritised spells to make a clean dataset of spells
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- In this step we also split off and append bits of non-prioritised spells that overlap with the prioritised biy so as not to induce non-enrolment spell later on below
*/
DROP TABLE IF EXISTS #big_clean_spells;
SELECT snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, sum_priority
, att_footprint_year
, special_flag
, CASE WHEN duplicate_reason IS NOT NULL AND (sum_priority IS NULL OR sum_priority = 1) THEN CONCAT(duplicate_reason, '_spell_removed')
WHEN duplicate_reason IS NOT NULL AND sum_priority = 2 THEN 'overlapping_priority'
WHEN duplicate_reason IS NULL THEN 'no_other_spell'
ELSE 'error' END AS unique_spell_reason
INTO #big_clean_spells
FROM #student_enrol_all_indicated
WHERE unique_school_flag = 1 /* bring all the unique ones */
UNION ALL
SELECT snz_uid
, overhang_spell_startdate AS startdate
, overhang_spell_enddate AS enddate
, CASE WHEN overhang_date_type = 'startdate_overhang' THEN start_level
ELSE NULL END AS start_level
, CASE WHEN overhang_date_type = 'enddate_overhang' THEN moe_esi_leave_rsn_code
ELSE NULL END AS moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, sum_priority
, att_footprint_year
, special_flag
, overhang_date_type AS unique_spell_reason
FROM #student_enrol_all_indicated
JOIN #global g ON 1=1
WHERE overhang_date_type IN ('startdate_overhang', 'enddate_overhang')
AND unique_school_flag = 0
AND DATEDIFF(DAY, startdate, enddate) > g.short;
GO
PRINT('#big_clean_spells TABLE generated');
/*
BEGIN testing
generic scan
select * from #big_clean_spells where snz_uid in (select distinct snz_uid from #big_clean_spells where sum_priority is not null) order by snz_uid, startdate, enddate
select distinct unique_spell_reason from #big_clean_spells
select count(*) from #big_clean_spells
single student
select * from #big_clean_spells where snz_uid = x order by startdate, enddate
select * from #student_enrol_all_indicated where snz_uid = x order by startdate, enddate
select * from #duplicate_enddate_indicated where snz_uid = x
END testing
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2.2. Breaking overlapped spells that have no matching date
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Above, the spells with at least of matching date have been dealth with
This step now deals with the non-matching date overlapping spells
There are two general cases to consider
- inner spells: if an overlapped spell is entirely contained within another spell it is called inner
- generic overlaps: any remaining overlap which is not inner
This section deals only with the first case, the second case will be the subject of step 2.3.
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::::
2.2.1 Immediate neighbour inner spells - first pass
:::::::::::::::::::::::::::::::::::::::::::::::::::::
When ordered by startdate, one can easily identify an inner spell if the encompassing 'outer' spell is in the immediately previous row
This next section deals with inner spells that are of this adjacent row kind
There are two complicated types of inner spell that can persist even after the completion of this step:
(i) It can happen that a spell is inner but that its corresponding outer spell is not an immediate nearest neighbour
(ii) After prioritising an inner through attendance data it can happen that new inner spells are created from the clipped ends
These two possibilities (i)-(ii) are dealth with by passing the entire dataset through a second inner-spell prioritisation step after the first,
in conjunction with a global filtering of the data of students that have a very deep stack of contemporaneous spells (done above in step 1.3 in
the generation of the #student_enrol_all_without_att_check table via the #multiple_lags table)
Input table check:
select top 10 * from #big_clean_spells order by snz_uid, startdate, enddate
*/
DROP TABLE IF EXISTS #inner_spells_indicated
;WITH no_duplicates AS /* start with big clean spells table which has already been prioritised for all the matching dates ones */
(
SELECT
snz_uid
, startdate
, enddate
, schoolnbr
, start_level
, sch_enddate_imputed
, unique_spell_reason
FROM #big_clean_spells
)
, overlaps_labelled AS /* label the consecutive spells that have an overlap */
(
SELECT
*
, CASE WHEN LEAD(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) < no_duplicates.enddate THEN 1 ELSE 0 END AS next_spell_inner
, CASE WHEN LAG(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) > no_duplicates.enddate THEN 1 ELSE 0 END AS this_spell_inner
FROM no_duplicates
)
, only_overlaps AS /* filter to only those spells that are overlapped in the inner-outer relationship */
(
SELECT
*
, LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_startdate
, LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_enddate
, LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_enddate
, LAG(overlaps_labelled.schoolnbr) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_school
, CASE WHEN next_spell_inner = 1
THEN LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
WHEN this_spell_inner = 1
THEN startdate
ELSE NULL END AS overlap_startdate
, CASE WHEN next_spell_inner = 1
THEN LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
WHEN this_spell_inner = 1
THEN enddate
ELSE NULL END AS overlap_enddate
FROM overlaps_labelled
WHERE (this_spell_inner = 1 OR next_spell_inner = 1)
)
, only_overlaps_with_attendance AS /* join the attendance date to these overlapped spells, add flags when attendance lies in the various date ranges */
(
SELECT
a.*
, b.moe_ssa_attendance_date
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
WHEN moe_ssa_attendance_date IS NOT NULL AND
moe_ssa_attendance_date BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 1 -- this general condition accounts for inner spells if next inner
WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 0 -- this general condition accounts for inner spells if next inner
END AS overlap_with_next_attendance_flag
, CASE WHEN moe_ssa_attendance_date IS NULL
THEN NULL /* if no attendance records at all for that student at that school then null */
WHEN moe_ssa_attendance_date IS NOT NULL AND /* if attendance records do exist */
moe_ssa_attendance_date BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) /* and attendenace record for that school and its in the right date range */
/* last_enddate is null accounts for the spells that are 'next spell is inner' which have null last_enddate entries */
THEN 1 /* then label as good */
WHEN moe_ssa_attendance_date IS NOT NULL /* if attendance records do exist */
AND moe_ssa_attendance_date NOT BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) /* and attendance record is not in the right date range */
/* last_enddate is null accounts for the spells that are 'next spell is inner' which have null last_enddate entries */
THEN 0 /* then label it as bad */
END AS overlap_with_last_attendance_flag
------------------------------------------
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date BETWEEN startdate AND enddate THEN 1
WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date NOT BETWEEN startdate AND enddate THEN 0
END AS this_spell_attendance_flag
------------------------------------------
FROM only_overlaps a
LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code /* need to join both on school and student */
)
, summarised_overlaps_attendance AS /* summarise the overlapped attendance situation */
(
SELECT
snz_uid
, startdate
, enddate
, schoolnbr
, last_school
, unique_spell_reason
, this_spell_inner
, next_spell_inner
, last_enddate
, next_startdate
, overlap_startdate
, overlap_enddate
, SUM(overlap_with_last_attendance_flag) AS n_rows_in_overlap_with_last_raw
, SUM(overlap_with_next_attendance_flag) AS n_rows_in_overleap_with_next_raw
, SUM(this_spell_attendance_flag) AS n_rows_in_range
FROM only_overlaps_with_attendance
GROUP BY snz_uid
, startdate
, enddate
, schoolnbr
, last_school
, unique_spell_reason
, this_spell_inner
, next_spell_inner--
, next_startdate
, last_enddate
, overlap_startdate
, overlap_enddate
)
, summarised_labelled_overlaps AS /* count chain of overlapped spells */
(
SELECT
a.*
, COUNT(*) OVER(PARTITION BY snz_uid) AS paired_spells
, CASE WHEN next_spell_inner = 1
AND n_rows_in_overleap_with_next_raw IS NULL
THEN 0 /* replace null with zero when the school legit does have attendance records somewherebut just not in range */
ELSE n_rows_in_overleap_with_next_raw
END AS n_rows_in_overleap_with_next
,CASE WHEN this_spell_inner = 1
AND n_rows_in_overlap_with_last_raw IS NULL
THEN 0 /* replace null with zero when the school legit does have attendance records somewhere */
ELSE n_rows_in_overlap_with_last_raw
END AS n_rows_in_overlap_with_last
FROM summarised_overlaps_attendance a
)
, pairs_indicated AS /* create the new spells on each end if they need to be made */
(
SELECT
snz_uid
, startdate
, enddate
, schoolnbr
, last_school
, this_spell_inner
, next_spell_inner
, last_enddate
, next_startdate
, n_rows_in_overlap_with_last
, n_rows_in_overleap_with_next
, overlap_startdate
, overlap_enddate
, unique_spell_reason
, paired_spells
, startdate AS first_new_startdate /* first indicate the start of the whole thing */
/* clip off the start of the outer (longer) spell if the attendance comparison prioritises the inner */
, CASE WHEN next_spell_inner = 1 /* if there is an overlap with the next spell */
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) /* and its not prioritised by attendenace */
THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the enddate to the next startdate */
ELSE enddate END AS first_new_enddate /* else leave the enddate untouched */
/*----------------------------------------------*/
/* while also clip the end of the outer spell off */
, CASE WHEN next_spell_inner = 1 /* if there is an overlap with the next spell */
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) /* and its not prioritised by attendenace */
/* if either are globally null then no comparison can be made and we leave them untouched */
THEN LEAD(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the enddate to the next startdate */
ELSE NULL END AS second_new_startdate /* else leave the second one NULL */
/*----------------------------------------------*/
/* and clipping the end of the spell off */
, CASE WHEN next_spell_inner = 1 /* if there is an overlap with the next spell */
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate)) /* and its not prioritised by attendenace */
/* if either are null then no comparison can be made and we leave them untouched */
THEN enddate /* then clip the enddate to the next startdate */
ELSE NULL END AS second_new_enddate /* else leave the second one NULL */
-----------------------------------------------
/* flag to indicate priority */
, CASE WHEN next_spell_inner = 1
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate))
THEN 0 /* then label it as having received a clipping */
WHEN next_spell_inner = 1
AND ((n_rows_in_overleap_with_next >= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and it IS not prioritised by attendenace */
OR LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate) IS NULL) /* deprioritise inner spells even if they have global null attendance (NEW CONDITION) */
THEN 1 /* label is as prioritised */
ELSE NULL END AS priotity_spell
/*----------------------------------------------*/
/* flag to indicate priotity */
, CASE WHEN this_spell_inner = 1
AND (n_rows_in_overlap_with_last <= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate) /* de-prioritize for tied attendance conditions */
OR n_rows_in_overlap_with_last IS NULL ) /* and its not prioritised by attendenace or its got null attendance */
THEN 0 /* then deprioritise it */
WHEN this_spell_inner = 1
AND last_school = schoolnbr /* deprioritize if its the same school! */
THEN 0
WHEN this_spell_inner = 1 AND last_school <> schoolnbr /* if there is an overlap with the previous spell */
AND (n_rows_in_overlap_with_last >= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and it has more attendance records */
THEN 1
ELSE NULL END AS priority_spell_inner
/*----------------------------------------------*/
FROM summarised_labelled_overlaps
)
, split_pairs AS
(
SELECT
snz_uid
, startdate AS old_startdate
, enddate AS old_enddate
, schoolnbr
, this_spell_inner
, next_spell_inner
, n_rows_in_overlap_with_last
, n_rows_in_overleap_with_next
, overlap_startdate
, overlap_enddate
, first_new_startdate AS startdate
, first_new_enddate AS enddate
,unique_spell_reason
, 1 AS union_number
, COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
FROM pairs_indicated
UNION ALL
SELECT
snz_uid
, startdate AS old_startdate
, enddate AS old_enddate
, schoolnbr
, this_spell_inner
, next_spell_inner
, n_rows_in_overlap_with_last
, n_rows_in_overleap_with_next
, overlap_startdate
, overlap_enddate
, second_new_startdate AS startdate
, second_new_enddate AS enddate
, unique_spell_reason
, 2 AS union_number
, COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
FROM pairs_indicated
WHERE second_new_startdate IS NOT NULL
)
SELECT
*
, SUM(priority_spell) OVER(PARTITION BY snz_uid, overlap_startdate, overlap_enddate) AS sum_priority
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 'enddate_snipped_outer_spell' WHEN this_spell_inner = 0
AND priority_spell = 0
AND enddate = old_enddate THEN 'startdate_snipped_outer_spell' WHEN this_spell_inner = 0
AND priority_spell = 1 THEN 'priority_outer_spell' WHEN this_spell_inner = 1
AND priority_spell = 0 THEN 'not_priority_inner_spell' WHEN this_spell_inner = 1
AND priority_spell = 1 THEN 'priority_inner_spell' WHEN priority_spell IS NULL THEN 'no_comparison_possible' ELSE NULL END AS inner_priority_reason
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 1
WHEN this_spell_inner = 0 AND priority_spell = 0 AND enddate = old_enddate THEN 1 WHEN this_spell_inner = 0
AND priority_spell = 1 THEN 1 WHEN this_spell_inner = 1
AND priority_spell = 0 THEN 0 WHEN this_spell_inner = 1
AND priority_spell = 1 THEN 1 WHEN priority_spell IS NULL THEN NULL ELSE NULL END AS inner_priority
INTO #inner_spells_indicated
FROM split_pairs a;
GO
PRINT('#inner_spells_indicated TABLE generated');
/*
BEGIN testing
select count(*) from #big_clean_spells
select count(*) from #inner_spells_indicated
select distinct inner_priority_reason from #inner_spells_indicated
select distinct unique_spell_reason from #inner_spells_indicated
select top 10 * from #big_clean_spells
select * from #inner_spells_indicated order by snz_uid, startdate, enddate
select * from #inner_spells_indicated join #global g on 1=1 where datediff(day, startdate, enddate) <= g.short
select union_number , count(*) from #inner_spells_indicated group by union_number
END testing
*/
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::::
2.2.2 Add back the first pass of prioritised inner spells
:::::::::::::::::::::::::::::::::::::::::::::::::::::
Now add these prioritised or split spells back onto the full dataset
*/
DROP TABLE IF EXISTS #inner_spells_added_back ;
WITH cte AS /* first attach the inner and outer spells back onto big clean spells */
(
SELECT a.*
, b.this_spell_inner
, b.next_spell_inner
, b.n_rows_in_overleap_with_next
, b.n_rows_in_overlap_with_last
, b.startdate AS new_startdate
, b.enddate AS new_enddate
, b.old_startdate
, b.old_enddate
, inner_priority
, inner_priority_reason
FROM #big_clean_spells a
LEFT JOIN #inner_spells_indicated b
ON a.snz_uid = b.snz_uid
AND a.startdate = b.old_startdate
AND a.enddate = b.old_enddate
AND a.schoolnbr = b.schoolnbr
)
, inner_spells_added_back AS /* change the dates to match the snipped dates, if snipped */
(
SELECT snz_uid
, CASE WHEN new_startdate IS NULL THEN startdate
ELSE new_startdate
END AS startdate
, CASE WHEN new_enddate IS NULL THEN enddate /* leave date unchanged if its not snipped */
ELSE new_enddate END AS enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, unique_spell_reason
, special_flag
, CASE WHEN this_spell_inner IS NULL THEN unique_spell_reason
WHEN this_spell_inner IS NOT NULL AND sum_priority = 2 THEN 'overlapping_priority' /* this should not happen anymore as prioritisation should always occur now */
ELSE inner_priority_reason
END AS unique_spell_reason_with_inner
, inner_priority
FROM cte
)
SELECT *
INTO #inner_spells_added_back
FROM inner_spells_added_back
JOIN #global g
ON 1 = 1
WHERE inner_priority = 1
OR inner_priority IS NULL /* filter off the de-prioritised inner spells: the deprioritised outer spells are snipped */
AND DATEDIFF(DAY, startdate, enddate) > g.short; /* remove short snipped bits */
GO
PRINT('#inner_spells_added_back TABLE generated');
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::::
2.2.3 Immediate neighbour inner spells - second pass
:::::::::::::::::::::::::::::::::::::::::::::::::::::
Steps 2.2.3 and 2.2.4 are intended to be identical to 2.2.1 and 2.2.2, but starting from #inner_spells_added_back instead of #big_clean_spells
In this way we are capturing any new nearest neighbour inner spells that exist due to eiher of the reasons (i)-(ii) discussed at the start of step 2.2.1
*/
DROP TABLE IF EXISTS #inner_spells_indicated_again ;
WITH no_duplicates AS(
SELECT snz_uid
, startdate
, enddate
, schoolnbr
, sch_enddate_imputed
, unique_spell_reason
, unique_spell_reason_with_inner
FROM #inner_spells_added_back
)
, overlaps_labelled AS(
SELECT *
, CASE WHEN LEAD(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) < no_duplicates.enddate THEN 1 ELSE 0 END AS next_spell_inner
, CASE WHEN LAG(no_duplicates.enddate) OVER(PARTITION BY no_duplicates.snz_uid ORDER BY no_duplicates.startdate, no_duplicates.enddate) > no_duplicates.enddate THEN 1 ELSE 0 END AS this_spell_inner
FROM no_duplicates
)
, only_overlaps AS(
SELECT *
, LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_startdate
, LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_enddate
, LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_enddate
, LAG(overlaps_labelled.schoolnbr) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_school
, CASE WHEN next_spell_inner = 1
THEN LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
WHEN this_spell_inner = 1
THEN startdate
ELSE NULL END AS overlap_startdate
, CASE WHEN next_spell_inner = 1
THEN LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
WHEN this_spell_inner = 1
THEN enddate ELSE NULL END AS overlap_enddate
FROM overlaps_labelled
WHERE(
this_spell_inner = 1
OR next_spell_inner = 1
)
)
, only_overlaps_with_attendance AS(
SELECT a.*
, b.moe_ssa_attendance_date
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 1 WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 0 END AS overlap_with_next_attendance_flag
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date
BETWEEN startdate
AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) THEN 1 WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) THEN 0 END AS overlap_with_last_attendance_flag
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date BETWEEN startdate AND enddate THEN 1 WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN startdate AND enddate THEN 0 END AS this_spell_attendance_flag
FROM only_overlaps a
LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
ON a.snz_uid = b.snz_uid
AND a.schoolnbr = b.moe_ssa_provider_code
)
, summarised_overlaps_attendance AS(
SELECT snz_uid
, startdate
, enddate
, schoolnbr
, last_school
, unique_spell_reason
, unique_spell_reason_with_inner
, this_spell_inner
, next_spell_inner
, last_enddate
, next_startdate
, overlap_startdate
, overlap_enddate
, SUM(overlap_with_last_attendance_flag) AS n_rows_in_overlap_with_last_raw
, SUM(overlap_with_next_attendance_flag) AS n_rows_in_overleap_with_next_raw
, SUM(this_spell_attendance_flag) AS n_rows_in_range
FROM only_overlaps_with_attendance
GROUP BY snz_uid
, startdate
, enddate
, schoolnbr
, last_school
, unique_spell_reason
, unique_spell_reason_with_inner
, next_startdate
, last_enddate
, overlap_startdate
, overlap_enddate
, this_spell_inner
, next_spell_inner
)
, summarised_labelled_overlaps AS(
SELECT a.*
, COUNT(*) OVER(PARTITION BY snz_uid) AS paired_spells
, CASE WHEN next_spell_inner = 1
AND n_rows_in_overleap_with_next_raw IS NULL
THEN 0
ELSE n_rows_in_overleap_with_next_raw
END AS n_rows_in_overleap_with_next
, CASE WHEN this_spell_inner = 1
AND n_rows_in_overlap_with_last_raw IS NULL
THEN 0
ELSE n_rows_in_overlap_with_last_raw
END AS n_rows_in_overlap_with_last
FROM summarised_overlaps_attendance a
)
, pairs_indicated AS(
SELECT snz_uid
, startdate
, enddate
, schoolnbr
, last_school
, this_spell_inner
, next_spell_inner
, last_enddate
, next_startdate
, n_rows_in_overlap_with_last
, n_rows_in_overleap_with_next
, overlap_startdate
, overlap_enddate
, unique_spell_reason
, unique_spell_reason_with_inner
, startdate AS first_new_startdate
, CASE WHEN next_spell_inner = 1
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate))
THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate)
ELSE enddate END AS first_new_enddate
, CASE WHEN next_spell_inner = 1
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate))
THEN LEAD(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate)
ELSE NULL END AS second_new_startdate
, CASE WHEN next_spell_inner = 1
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid, overlap_startdate ORDER BY startdate))
THEN enddate ELSE NULL END AS second_new_enddate
, CASE WHEN next_spell_inner = 1
AND (n_rows_in_overleap_with_next < LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate))
THEN 0
WHEN next_spell_inner = 1
AND ((n_rows_in_overleap_with_next >= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate))
OR LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate) IS NULL)
THEN 1
ELSE NULL END AS priotity_spell
, CASE WHEN this_spell_inner = 1
AND (n_rows_in_overlap_with_last <= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)
OR n_rows_in_overlap_with_last IS NULL )
THEN 0
WHEN this_spell_inner = 1
AND last_school = schoolnbr
THEN 0
WHEN this_spell_inner = 1 AND last_school <> schoolnbr
AND (n_rows_in_overlap_with_last >= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate))
THEN 1
ELSE NULL END AS priority_spell_inner
FROM summarised_labelled_overlaps
)
, split_pairs AS(
SELECT snz_uid
, startdate AS old_startdate
, enddate AS old_enddate
, schoolnbr
, this_spell_inner
, next_spell_inner
, n_rows_in_overlap_with_last
, n_rows_in_overleap_with_next
, overlap_startdate
, overlap_enddate
, first_new_startdate AS startdate
, first_new_enddate AS enddate
, unique_spell_reason
, unique_spell_reason_with_inner
, COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
FROM pairs_indicated
UNION ALL
SELECT snz_uid
, startdate AS old_startdate
, enddate AS old_enddate
, schoolnbr
, this_spell_inner
, next_spell_inner
, n_rows_in_overlap_with_last
, n_rows_in_overleap_with_next
, overlap_startdate
, overlap_enddate
, second_new_startdate AS startdate
, second_new_enddate AS enddate
, unique_spell_reason
, unique_spell_reason_with_inner
, COALESCE(priotity_spell, priority_spell_inner) AS priority_spell
FROM pairs_indicated
WHERE second_new_startdate IS NOT NULL
)
SELECT *
, SUM(priority_spell) OVER(PARTITION BY snz_uid, overlap_startdate, overlap_enddate) AS sum_priority
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 'enddate_snipped_outer_spell'
WHEN this_spell_inner = 0 AND priority_spell = 0 AND enddate = old_enddate THEN 'startdate_snipped_outer_spell'
WHEN this_spell_inner = 0 AND priority_spell = 1 THEN 'priority_outer_spell'
WHEN this_spell_inner = 1 AND priority_spell = 0 THEN 'not_priority_inner_spell'
WHEN this_spell_inner = 1 AND priority_spell = 1 THEN 'priority_inner_spell'
WHEN priority_spell IS NULL THEN 'no_comparison_possible'
ELSE NULL END AS inner_priority_reason_again
, CASE WHEN this_spell_inner = 0 AND priority_spell = 0 AND startdate = old_startdate THEN 1
WHEN this_spell_inner = 0 AND priority_spell = 0 AND enddate = old_enddate THEN 1
WHEN this_spell_inner = 0 AND priority_spell = 1 THEN 1
WHEN this_spell_inner = 1 AND priority_spell = 0 THEN 0
WHEN this_spell_inner = 1 AND priority_spell = 1 THEN 1
WHEN priority_spell IS NULL THEN NULL
ELSE NULL END AS inner_priority_again
INTO #inner_spells_indicated_again
FROM split_pairs;
GO
PRINT('#inner_spells_indicated_again TABLE generated');
/*
::::::::::::::::::::::::::::::::::::::::::
2.2.4 Add the inner spells back agaig to the main dataset
::::::::::::::::::::::::::::::::::::::::::
This step is equivalent to 2.2.2 above, but this is the second pass
*/
DROP TABLE IF EXISTS #inner_spells_added_back_again
;WITH cte AS
(
SELECT
a.*
, this_spell_inner
, next_spell_inner
, n_rows_in_overleap_with_next
, n_rows_in_overlap_with_last
, b.startdate AS new_startdate
, b.enddate AS new_enddate
, b.old_startdate
, b.old_enddate
, inner_priority_again
, inner_priority_reason_again
FROM #inner_spells_added_back a
LEFT JOIN #inner_spells_indicated_again b
ON a.snz_uid = b.snz_uid
AND a.startdate = b.old_startdate AND a.enddate = b.old_enddate
AND a.schoolnbr = b.schoolnbr
)
, inner_spells_added_back_again AS
(
SELECT
snz_uid
,CASE WHEN new_startdate IS NULL THEN startdate
ELSE new_startdate
END AS startdate
, CASE WHEN new_enddate IS NULL THEN enddate
ELSE new_enddate
END AS enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, unique_spell_reason
, unique_spell_reason_with_inner
, special_flag
, CASE WHEN this_spell_inner IS NULL THEN unique_spell_reason_with_inner
ELSE inner_priority_reason_again
END AS unique_spell_reason_with_inner_again
, inner_priority_again
FROM cte
)
SELECT a.*
INTO #inner_spells_added_back_again
FROM inner_spells_added_back_again a
JOIN #global g ON 1=1
WHERE inner_priority_again IS NULL OR inner_priority_again = 1
AND DATEDIFF(DAY, startdate, enddate) > g.short;
GO
PRINT('#inner_spells_added_back_again TABLE generated');
/*
BEGIN IMPORTANT TEST
;with cte as
(
select *
, case when lead(enddate) over(partition by snz_uid order by startdate, enddate) < enddate then 1 else 0 end as next_spell_inner
, case when lag(enddate) over(partition by snz_uid order by startdate, enddate) > enddate then 1 else 0 end as this_spell_inner
from #inner_spells_added_back_again
)
select
distinct snz_uid
from cte
where snz_uid in (select distinct snz_uid from cte where this_spell_inner = 1)
order by snz_uid, startdate, enddate;
! NEEDS TO BE 0 FOR SCRIPT TO BE VALID !
The idea is that we have not generated any new inner spells on this round of deduplicating.
If there are more inner spells present, we need to iterate one more time to deduplicate it
END testing
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2.2.5 Create clean input table for the next step
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Now clean up this mess, ready for input to the generic overlaps step
*/
DROP TABLE IF EXISTS #input_to_generic_overlaps_indicated;
SELECT
snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, unique_spell_reason_with_inner_again AS unique_spell_reason
, special_flag
INTO #input_to_generic_overlaps_indicated
FROM #inner_spells_added_back_again;
GO
PRINT('#input_to_generic_overlaps_indicated TABLE generated');
/*
BEGIN check
select count(*) from #input_to_generic_overlaps_indicated
x has a level 2 deep overlap
select * from #big_clean_spells
where snz_uid = x order by startdate, enddate
select * from #input_to_generic_overlaps_indicated
where snz_uid = x order by startdate, enddate
next school is the same school, even if there is a gap over summer, needs to be filled
END check
*/
/*
::::::::::::::::::::::::::::::::::::::::::::
2.3. de-deuplication of non-inner spells
::::::::::::::::::::::::::::::::::::::::::::
This final prioritisation step!
This step is easier than the inenr spells bit
The only possible complication would be if a chain of overlapped spells resulted in a negative length spell by snipping it at both ends.
Its possible that the code will naturally deal with that in any case
*/
DROP TABLE IF EXISTS #generic_overlaps_idicated
;WITH overlaps_labelled AS /* label the consecutive spells that have an overlap */
(
SELECT
*
, CASE WHEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) < enddate THEN 1 ELSE 0 END AS overlap_with_next_startdate
, CASE WHEN LAG(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) > startdate THEN 1 ELSE 0 END AS overlap_with_last_enddate
FROM #input_to_generic_overlaps_indicated
)
, only_overlaps AS /* filter to only those spells that are overlapped */
(
SELECT
*
, LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_startdate
, LEAD(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS next_enddate
, LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate) AS last_enddate
, CASE WHEN overlap_with_next_startdate = 1
THEN LEAD(overlaps_labelled.startdate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
WHEN overlap_with_last_enddate = 1
THEN startdate
ELSE NULL END AS overlap_startdate
, CASE WHEN overlap_with_next_startdate = 1
THEN enddate
WHEN overlap_with_last_enddate = 1
THEN LAG(overlaps_labelled.enddate) OVER(PARTITION BY overlaps_labelled.snz_uid ORDER BY overlaps_labelled.startdate)
ELSE NULL END AS overlap_enddate
FROM overlaps_labelled
WHERE (overlap_with_next_startdate = 1 OR overlap_with_last_enddate = 1)
)
, only_overlaps_with_attendance AS /* join the attendance date to these overlapped spells, add flags when attendance lies in the various date ranges */
(
SELECT
a.*
, b.moe_ssa_attendance_date
/*------------------------------------------*/
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL /* if no attendance records at all for that school thrn null */
WHEN moe_ssa_attendance_date IS NOT NULL AND /* if attendance records do exist */
moe_ssa_attendance_date BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) /* and attendenace record for that school and its in the right date range */
THEN 1 /* then label as good */
WHEN moe_ssa_attendance_date IS NOT NULL /* if attendance records do exist */
AND moe_ssa_attendance_date NOT BETWEEN startdate AND IIF(last_enddate <= enddate OR last_enddate IS NULL, last_enddate, enddate) /* and attendance record is not in the right date range */
THEN 0 /* then label it as bad */
END AS overlap_with_last_attendance_flag
/*------------------------------------------*/
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date BETWEEN startdate AND enddate THEN 1
WHEN moe_ssa_attendance_date IS NOT NULL AND moe_ssa_attendance_date NOT BETWEEN startdate AND enddate THEN 0
END AS this_spell_attendance_flag
/*------------------------------------------*/
, CASE WHEN moe_ssa_attendance_date IS NULL THEN NULL
WHEN moe_ssa_attendance_date IS NOT NULL AND
moe_ssa_attendance_date BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 1
WHEN moe_ssa_attendance_date IS NOT NULL
AND moe_ssa_attendance_date NOT BETWEEN next_startdate AND IIF(enddate <= next_enddate, enddate, next_enddate) THEN 0
END AS overlap_with_next_attendance_flag
/*------------------------------------------*/
FROM only_overlaps a
LEFT JOIN $(idicleanversion).[moe_clean].[school_student_attendance] b
ON a.snz_uid = b.snz_uid AND a.schoolnbr = b.moe_ssa_provider_code
)
, summarised_overlaps_attendance AS /* summarise the overlapped attendance situation */
(
SELECT
snz_uid
, startdate
, enddate
, schoolnbr
, unique_spell_reason
, overlap_with_next_startdate
, overlap_with_last_enddate
, next_startdate
, last_enddate
, overlap_startdate
, overlap_enddate
, SUM(overlap_with_next_attendance_flag) AS n_rows_in_overleap_with_next
, SUM(overlap_with_last_attendance_flag) AS n_rows_in_overlap_with_last
, SUM(this_spell_attendance_flag) AS n_rows_in_range
FROM only_overlaps_with_attendance
GROUP BY snz_uid
, startdate
, enddate
, schoolnbr
, unique_spell_reason
, overlap_with_next_startdate
, overlap_with_last_enddate
, next_startdate
, last_enddate
, overlap_startdate
, overlap_enddate
)
, summarised_labelled_overlaps AS /* count chain of overlapped spells */
(
SELECT
a.*
, COUNT(*) OVER(PARTITION BY snz_uid) AS paired_spells
FROM summarised_overlaps_attendance a
LEFT JOIN #att_enrol_coverage b
ON a.schoolnbr = b.schoolnbr AND YEAR(a.startdate) = b.start_year
)
, pairs_of_overlapped AS
(
SELECT
a.snz_uid
, a.startdate
, a.enddate
, a.schoolnbr
, a.overlap_with_next_startdate
, a.overlap_with_last_enddate
, next_startdate
, last_enddate
, CASE WHEN overlap_with_next_startdate = 1
AND n_rows_in_overleap_with_next IS NULL
THEN 0 /* replace null with zero when the school legit does have attendance records somewhere */
ELSE n_rows_in_overleap_with_next
END AS n_rows_in_overleap_with_next
, CASE WHEN overlap_with_last_enddate = 1
AND n_rows_in_overlap_with_last IS NULL
THEN 0 /* replace null with zero when the school legit does have attendance records somewhere */
ELSE n_rows_in_overlap_with_last
END AS n_rows_in_overlap_with_last
, n_rows_in_range
, paired_spells
, unique_spell_reason
, overlap_startdate
, overlap_enddate
FROM summarised_labelled_overlaps a
)
, pairs_indicated AS
(
SELECT
*
, CASE WHEN overlap_with_last_enddate = 1 /* if there is an overlap with the previous spell */
AND (n_rows_in_overlap_with_last < LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and its not prioritised by attendenace*/
THEN LAG(enddate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the startdate to be the previous enddate */
ELSE startdate END AS new_startdate
, CASE WHEN overlap_with_next_startdate = 1 /* if there is an overlap with the next spell */
AND (n_rows_in_overleap_with_next <= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and its not prioritised by attendenace*/
THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) /* then clip the enddate to the next startdate */
ELSE enddate END AS new_enddate
, CASE WHEN overlap_with_next_startdate = 1 /* if there is an overlap with the next spell */
AND (n_rows_in_overleap_with_next <= LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and its not prioritised by attendenace */
THEN 0
WHEN overlap_with_next_startdate = 1 -- if there is an overlap with the next spell
AND (n_rows_in_overleap_with_next > LEAD(n_rows_in_overlap_with_last) OVER(PARTITION BY snz_uid ORDER BY startdate)) /* and it IS prioritised*/
THEN 1
ELSE NULL END AS priority_spell_2
, CASE WHEN overlap_with_last_enddate = 1 /* if there is an overlap with the next spell */
AND (n_rows_in_overlap_with_last < LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate))
THEN 0
WHEN overlap_with_last_enddate = 1
AND (n_rows_in_overlap_with_last >= LAG(n_rows_in_overleap_with_next) OVER(PARTITION BY snz_uid ORDER BY startdate))
THEN 1
ELSE NULL END AS priority_spell
FROM pairs_of_overlapped
)
, split_pairs AS
(
SELECT
snz_uid
, startdate AS old_startdate
, enddate AS old_enddate
, schoolnbr
, overlap_with_next_startdate
, overlap_with_last_enddate
, n_rows_in_overleap_with_next
, n_rows_in_overlap_with_last
, new_startdate AS startdate
, new_enddate AS enddate
, overlap_startdate
, overlap_enddate
, priority_spell AS priority_spell_raw
, priority_spell_2 AS priority_spell_2_raw
, COALESCE(priority_spell, priority_spell_2) AS priority_spell
FROM pairs_indicated
)
SELECT
*
, SUM(priority_spell) OVER(PARTITION BY snz_uid, overlap_startdate, overlap_enddate) AS sum_priority
,CASE WHEN overlap_with_last_enddate = 0 AND priority_spell = 0 THEN 'enddate_clipped_generic_overlap'
WHEN overlap_with_last_enddate = 0 AND priority_spell = 1 THEN 'priority_generic overlap'
WHEN overlap_with_last_enddate = 1 AND priority_spell = 0 THEN 'startdate_clipped_generic_overlap'
WHEN overlap_with_last_enddate = 1 AND priority_spell = 1 THEN 'priority_generic overlap'
WHEN priority_spell IS NULL THEN 'no_comparison_possible'
ELSE NULL END AS priority_reason
INTO #generic_overlaps_idicated
FROM split_pairs;
GO
PRINT('#generic_overlaps_idicated TABLE generated');
/*
:::::::::::::::::::::::::::::::::::::::::::::::::::::::
2.3.2 Adding back the transformed non-inner generic overlaps
:::::::::::::::::::::::::::::::::::::::::::::::::::::::
Adding back the generic overlaps to the main dataset
*/
DROP TABLE IF EXISTS #generic_spells_added_back
;WITH cte AS
(
SELECT
a.*
, b.overlap_with_next_startdate
, b.overlap_with_last_enddate
, b.startdate AS new_startdate
, b.enddate AS new_enddate
, b.old_startdate
, b.old_enddate
, priority_spell
, sum_priority
, priority_reason
FROM #input_to_generic_overlaps_indicated a
LEFT JOIN #generic_overlaps_idicated b
ON a.snz_uid = b.snz_uid
AND a.startdate = b.old_startdate AND a.enddate = b.old_enddate
AND a.schoolnbr = b.schoolnbr
)
, generic_spells_added_back AS
(
SELECT
snz_uid
,CASE WHEN new_startdate IS NULL THEN startdate
ELSE new_startdate
END AS startdate
, CASE WHEN new_enddate IS NULL THEN enddate
ELSE new_enddate
END AS enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, CASE WHEN overlap_with_last_enddate IS NULL THEN unique_spell_reason
WHEN overlap_with_last_enddate IS NOT NULL AND sum_priority = 2 THEN 'overlapping_priority'
ELSE priority_reason
END AS unique_spell_reason
, special_flag
FROM cte
)
SELECT *
INTO #generic_spells_added_back
FROM generic_spells_added_back
JOIN #global g ON 1 = 1
WHERE DATEDIFF(DAY, startdate, enddate) > g.short;
GO
PRINT('#generic_spells_added_back TABLE generated');
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
3. Appending the spells that have no attendance data associated
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Initially in step x.x we removed all spells that were not suseptible to an attendance confrontation due to missing attendance data
We now add these spells back
input table check:
select * from #big_clean_spells
select * from #generic_spells_added_back
select * from #student_enrol_removed_spells
*/
DROP TABLE IF EXISTS #null_att_added_back;
SELECT
snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, 0 AS att_footprint_year
, unique_spell_reason
, CASE WHEN schoolnbr = 498 THEN 'correspondance'
WHEN schoolnbr = 972 THEN 'homeschool'
ELSE NULL END AS special_flag
INTO #null_att_added_back
FROM #generic_spells_added_back
UNION ALL
SELECT
snz_uid
, startdate
, enddate
, start_level
, moe_esi_leave_rsn_code
, schoolnbr
, sch_enddate_imputed
, att_footprint_year
, 'no_attendance_data' AS unique_spell_reason
, special_flag
FROM #student_enrol_removed_spells;
GO
PRINT('#null_att_added_back TABLE generated');
/*
BEGIN check
select count(*) from #null_att_added_back
END check
*/
/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
4. Adding derived metadata to the full table
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Now that we have out fully cleaned table, we can compute all the relevant info, to see where we have gaps etc.
*/
/*
::::::::::::::::::::::::
4.1 Adding it back in initially
::::::::::::::::::::::::
*/
DROP TABLE IF EXISTS #big_clean_spells_info;
;WITH cte AS
(
SELECT
*
, DATEDIFF(DAY, startdate, enddate) AS enrol_days
, ROW_NUMBER() OVER(PARTITION BY snz_uid ORDER BY startdate, enddate) AS rn /* should all now be unique startdays */
FROM #null_att_added_back
)
SELECT
cte.snz_uid
, cte.startdate
, cte.enddate
, cte.enrol_days
, cte.schoolnbr
, CASE WHEN (cte.unique_spell_reason IN ('enddate_overhang',
'startdate_snipped_outer_spell',
'startdate_clipped_generic_overlap')) THEN cte.unique_spell_reason
ELSE CAST(start_level AS VARCHAR) END AS start_level
, cte.rn /* we need this when adding gaps back in */
, unique_spell_reason
, special_flag
, CASE WHEN (cte.unique_spell_reason IN ('startdate_overhang',
'enddate_sniped_outer_spell',
'enddate_clipped_generic_overlap')) THEN cte.unique_spell_reason
ELSE b.ReasonForLeavingDescription END AS reason_for_leaving
, CASE WHEN LEAD(cte.startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) < cte.enddate THEN 1 ELSE 0 END AS overlap_with_next_startdate
/*------------------------------------------*/
, CASE WHEN LEAD(cte.startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) < cte.enddate
AND LEAD(cte.enddate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) < cte.enddate
THEN 1 ELSE 0 END AS next_spell_inner
, CASE WHEN LAG(cte.enddate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) > cte.enddate
AND LAG(cte.enddate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) > cte.startdate
THEN 1 ELSE 0 END AS this_spell_inner
/*------------------------------------------*/
, CASE WHEN LEAD(cte.startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn) > cte.enddate THEN 1 ELSE 0 END AS gap_to_next_startdate
, DATEDIFF(DAY, enddate, LEAD(startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn)) AS days_to_next_spell
, cte.sch_enddate_imputed
, CASE WHEN YEAR(enddate) + 1 = YEAR(LEAD(startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn))
AND DATEDIFF(DAY, enddate, LEAD(startdate) OVER(PARTITION BY cte.snz_uid ORDER BY rn)) < g.max_summer_break
THEN 1 ELSE 0 END AS old_year_old_school
INTO #big_clean_spells_info
FROM cte
LEFT JOIN $(idimetaversion).[moe_school].[rsn_for_leaving_code] b
ON cte.moe_esi_leave_rsn_code = b.ReasonForLeavingId
JOIN #global g ON 1=1
JOIN $(idicleanversion).[data].[personal_detail] p
ON p.snz_uid = cte.snz_uid
ORDER BY cte.snz_uid, rn;
GO
PRINT('#big_clean_spells_info TABLE generated');
/*
BEGIN check
select * from #big_clean_spells_info order by enrol_days
select count(*) from #big_clean_spells_info
END check
*/
/*
:::::::::::::::::::::::::::::::::::::::
4.2 Remove short spells again
:::::::::::::::::::::::::::::::::::::::
*/
DROP TABLE IF EXISTS #big_clean_spells_info_no_short_gaps
;WITH cte AS
(
SELECT
a.*
, CASE WHEN days_to_next_spell BETWEEN 1 AND g.short
THEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate, enddate)
ELSE enddate
END AS new_enddate
/*, case when days_to_next_spell between 1 and g.short
then 1
else NULL
end as changed_enddate */
FROM #big_clean_spells_info a
JOIN #global g ON 1=1
)
SELECT
snz_uid
, startdate
, new_enddate AS enddate
, DATEDIFF(DAY, startdate, new_enddate) AS enrol_days
, schoolnbr
, start_level
, rn
, unique_spell_reason
, special_flag
, reason_for_leaving
, next_spell_inner
, this_spell_inner
, overlap_with_next_startdate
, CASE WHEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY rn) > new_enddate THEN 1 ELSE 0 END AS gap_to_next_startdate
, DATEDIFF(DAY, new_enddate, LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY rn)) AS days_to_next_spell
, sch_enddate_imputed
, old_year_old_school
INTO #big_clean_spells_info_no_short_gaps
FROM cte;
GO
PRINT('#big_clean_spells_info_no_short_gaps TABLE generated');
/*
BEGIN check
select count(*) from #big_clean_spells_info
select count(*) from #big_clean_spells_info_no_short_gaps
select * from #big_clean_spells_info_no_short_gaps where days_to_next_spell is not null and days_to_next_spell > 0 order by days_to_next_spell
where changed_enddate is not null
select * from #big_clean_spells_info order by snz_uid, startdate, enddate
select * from #big_clean_spells_info where overlap_with_next_startdate > 0
when null att is added back there are now overlaps
select
snz_uid
, startdate
, enddate
, enrol_days
, schoolnbr
, start_level
--, age_at_start
, unique_spell_reason
, reason_for_leaving
, overlap_with_next_startdate
--, gap_to_next_startdate
, next_spell_inner
, this_spell_inner
, days_to_next_spell
, sch_enddate_imputed
, old_year_old_school
from #big_clean_spells_info_no_short_gaps
where snz_uid in (select distinct snz_uid from #big_clean_spells_info where overlap_with_next_startdate > 0)
order by snz_uid, startdate, enddate -- x affected spells
Can we deal with them in some way? FUTURE IMPROVEMENT TODO
END check
*/
/*
::::::::::::::::::::::::::::::::::::
5. Build the 'complete' spells table including labelled gaps in enrolment
::::::::::::::::::::::::::::::::::::
This step consists of two seperate steps
(i) Find the gaps
(ii) Add them back in
*/
/*
:::::::::::::::::::::::::::::::::
5.1 Find the gaps in enrolment that remain
:::::::::::::::::::::::::::::::::
just the gaps that remain
input table check:
select * from #big_clean_spells_info_no_short_gaps
*/
DROP TABLE IF EXISTS #enrolment_gaps;
SELECT
snz_uid
, enddate AS startdate
, DATEADD(DAY, days_to_next_spell, enddate) AS enddate
, days_to_next_spell AS enrol_days
, NULL AS schoolnbr
, NULL AS start_level
, rn+1 AS rn
, CASE WHEN old_year_old_school = 1 THEN 'School change OVER Summer' ELSE 'New Enrolment' END AS reason_for_leaving
, 0 AS this_spell_inner
, 1 AS enrolment_gap
, 0 AS sch_enddate_imputed
, CASE WHEN old_year_old_school = 1 THEN 1 ELSE 0 END AS summer_break
INTO #enrolment_gaps
FROM #big_clean_spells_info_no_short_gaps
WHERE gap_to_next_startdate = 1;
GO
PRINT('#enrolment_gaps TABLE generated');
/*
:::::::::::::::::::::::::::::::::::::
5.2. append the 'gap' spells to the main dataset
:::::::::::::::::::::::::::::::::::::
The resulting table has no missing dates, every student has a row that covers any given day
From the first day of their enroilment until max_attendance (or extraction_date if we're using that)
*/
DROP TABLE IF EXISTS #complete_spells
;WITH cte AS
(
SELECT
snz_uid
, startdate
, enddate
, enrol_days
, schoolnbr
, start_level
, reason_for_leaving
, unique_spell_reason
, special_flag
, this_spell_inner
, 0 AS enrolment_gap
, sch_enddate_imputed
, 0 AS summer_break
FROM #big_clean_spells_info_no_short_gaps
UNION ALL
SELECT
snz_uid
, startdate
, enddate
, enrol_days
, schoolnbr
, CAST(start_level AS VARCHAR) AS start_level
, reason_for_leaving
, CASE WHEN summer_break = 1 THEN 'summer_break'
ELSE 'enrolment_gap' END AS unique_spell_reason
, NULL AS special_flag
, this_spell_inner
, enrolment_gap
, sch_enddate_imputed
, summer_break
FROM #enrolment_gaps
)
SELECT
snz_uid
, startdate
, enddate
, enrol_days
, schoolnbr
, unique_spell_reason
, special_flag
, start_level
, CASE WHEN sch_enddate_imputed = 1 THEN 'Spell Enddate Imputed' ELSE reason_for_leaving END AS spell_end_reason
, this_spell_inner
, CASE WHEN LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate) < enddate THEN 1 ELSE 0 END AS overlap_with_next_startdate
, CASE WHEN DATEDIFF(DAY, enddate, LEAD(startdate) OVER(PARTITION BY snz_uid ORDER BY startdate)) IS NULL THEN 1
ELSE 0 END AS final_spell
, enrolment_gap
, sch_enddate_imputed
, summer_break
INTO #complete_spells
FROM cte;
GO
PRINT('#complete_spells TABLE generated');
/*
BEGIN check
select distinct days_to_next_spell from #complete_spells
select distinct this_spell_inner from #complete_spells
select distinct gap_to_next_startdate from #complete_spells
select distinct overlap_with_next_startdate from #complete_spells
select * from #complete_spells order by snz_uid, startdate
select * from #complete_spells
where snz_uid in (select distinct snz_uid from #complete_spells where overlap_with_next_startdate = 1)
order by snz_uid, startdate, enddate -- x spells contaminated by overlaps in one of the spells for a snz_uid
select distinct unique_spell_reason from #complete_spells where schoolnbr is null -- enrolment_gap
END check
*/
/*
:::::::::::::::::::::::::::
END OF TABLE BUILD SECTION
:::::::::::::::::::::::::::
*/
DROP TABLE IF EXISTS $(targetdb).$(targetschema).$(projprefix)_school_enrolment_spells;
SELECT *
INTO $(targetdb).$(targetschema).$(projprefix)_school_enrolment_spells
FROM #complete_spells;
/*
TABLES TO DROP
drop table if exists #big_clean_spells
drop table if exists #rec_cte
drop table if exists #big_clean_spells_info_all_inner_removed
drop table if exists #big_clean_spells_inner_spells
drop table if exists #duplicate_enddate_indicated
drop table if exists #duplicate_startdate_indicated
drop table if exists #duplicate_spells_indicated
drop table if exists #enrolment_gaps
drop table if exists #no_duplicate_spells
--some new ones I made since last update
drop table if exists #att_enrol_coverage
drop table if exists #big_clean_spells_info
drop table if exists #big_clean_spells_info_no_short_gaps
drop table if exists #closed_schools
drop table if exists #generic_overlaps_idicated
drop table if exists #generic_spells_added_back
drop table if exists #global
drop table if exists #inner_spells_added_back
drop table if exists #inner_spells_added_back_again
drop table if exists #inner_spells_indicated
drop table if exists #inner_spells_indicated_again
drop table if exists #input_to_generic_overlaps_indicated
drop table if exists #multiple_lags
drop table if exists #null_att_added_back
drop table if exists #student_enrol_all
drop table if exists #student_enrol_all_indicated
drop table if exists #student_enrol_all_without_att_check
drop table if exists #student_enrol_removed_spells
drop table if exists #tmp_student_sample
*/