Total income module collection: Wages and salary

ivan.welsh
24 June 2024

Module output

SQL::[IDI_Community].[cm_read_income_t1_wages_salary].income_t1_wages_salary_YYYYMM
SAS: libname cmti ODBC dsn=idi_community_srvprd schema=cm_read_income_t1_wages_salary; proc print data = cmti.income_t1_wages_salary_YYYYMM
How to access a code module in the Data Lab : Read here

Purpose of Wages and Salary module

The purpose of this module is to construct spells of income from wages and salary for people in employed in New Zealand.

Note: Wages and Salary are paid by NZ registered employers to its employees. This does not imply the employees are physically in New Zealand.

Key concepts

Wages and salary income

Wages and salary income is income received from all current and previous wage and salary jobs held over the reference period, and any job-related bonuses, commissions, redundancies or other taxable income such as honoraria or directors fees.

Total income spell

An income spell is the minimum payment period per income type. Wages and salary in practice may be paid in a daily, weekly, fortnightly, monthly spell etc. Employers do not need to file if they do not pay their employees during their regular pay cycle. See Payday filing for more information on payday filing.

Source frequency

Source frequency is the frequency with which an income type is stored in the source data. Source frequency for wages and salary in the ir_ems table is monthly.

Gross income amount

Gross income amount is the taxable income, before tax, that a person earns.

Observed tax

Tax recorded as paid to IRD as part of the income filing process. Observed tax data for wages and salary is available in the IR EMS table as PAYE. This is the amount of tax that has been has been deducted from source (eg PAYE and tax filing).
Note: observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. This information has not been included in the initial version of the module.

Granularity

The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event in the above table:

Entity id: snz uid
Period: start end, end date
Event information: income source, taxable income indicator

Data sources

Employer monthly schedule: ir_clean.ird_ems

Employer monthly schedule (EMS) table contains employee level data from the EMS return. EMS is available in the LBD/IDI. This is a monthly tax return filed by all businesses with paid employees, which summarises the monthly wage and salary payments made to each of their employees, and the ‘pay-as-you-earn’ (PAYE) income tax deductions made. EMS is a mandatory reporting requirement for all employing firms.

Since April 2019, Inland Revenue has shifted from monthly filings under the EMS to compulsory payday filing under the Employment Information - Employee (EI-E) system. This follows a transition period since April 2018 in which firms could select whether to file monthly or through the payday filing system. For consistency, Stats NZ has made a processed version of the EI-E data available, allocating payday information to calendar months and including these records as part of the EMS tables.

Note:

IR3: ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014, ir_ir3_2013_to_2020

IR3 is the income tax return for individuals used to confirm the amount of personal income tax to be paid at the end of each tax year with data available from March 2000 to present. The main purpose of IR3 is to include any monetary payments that have not been taxed at source. IR3 is used for self-employment (filed annually by sole traders) and includes non-zero partnership, or shareholder salary income, as well as rental income. IR3 is mandatory for individuals who earned more than $200 (before tax) income other than salary, wages, interest, dividends, and/or taxable Māori authority distributions. IR3 is filed annually.

There may be a substantial lag in reporting IR3 data as individual businesses and those using tax agents have up to a year after the end of the financial year to file an annual tax return with IR.

IR3 data comes from a different table to the EIE data but is still direct information submitted by a customer

Note: 3 IR2 tables (2 ad-hoc tables and 1 clean) are required for creating the timeseries of IR3 data. IR3s belonging to people who reported no self-employment, partnership or shareholder income were excluded for tax years prior to 2019. They are in the ad-hoc tables.

IR4: ird_attachments_ir4s

IR4 is filed by companies and includes remuneration income paid to shareholders, directors, and relatives of shareholders with data available from March 1995 to present. This table holds information on non-zero shareholder income and shows non-PAYE salary payments made to working proprietors of businesses. IR4 and IR4s is required from all active NZ resident companies. IR4S is filed annually.

IR4 data comes from a different table to the EIE data but is still direct information submitted by a customer

IR20: ird_attachments_ir20

IR7 (formerly IR20) is filed for partnership and look-through companies with data available from March 1995 to present. It is filed annually. This table contains information about zero and non-zero partnership income

ird_customers

The IR customers file records all Inland Revenue’s entities, their description, status (active, deceased, etc), commencement/cessation, student loan/tax residency status and business industry code where relevant.

A full file is extracted each month containing all new records and changes to existing records.
The information in this file is not derived from an IR form. It is a combination of different fields that are updated when this specific customer information changes. IR can be notified by post, in person, by phone, email, myIR, secure mail.

The population is all entities that engage with IR. At the highest level it can be:

ird_tax_registrations

The IR tax registration file records the status of an entities tax type registration with IR. It contains start/end dates, tax return filing frequencies, if the entity has a tax agent, if they are a corporate entity and if they file electronically.
The population is all entities that engage with IR. At the highest level it can be:

ird_cross_reference

The IR cross reference file links two IRD numbers, documents the period they are linked and describes the relationship between them.

The information in this file is not derived from an IR form. It is a combination of different fields that are updated when relationships between entities change. A change can be submitted by post, in person, by phone, email, myIR, secure mail.

The population is individual to business and business to business relationships between IR entities

Practical notes

Period

Wages and Salary information is from 1999 onwards.

Identification of Earnings not liable to ACC levy and bonuses

All employees must pay an ACC earners’ levy to cover the cost of non work-related injuries. IR collect this on behalf of the ACC. The earners’ levy has been built into the PAYE tables and is deducted along with the tax. Almost all earnings subject to PAYE are liable for the levy with the exception of schedular payments, retirement payments, redundancy payments, benefits from an employee share scheme, jury fees, witness fees, and taxable pensions. See IR employers guide for more information. Inclusion of earnings not liable for ACC levy is not essential for this module. Additionally, we will not identify bonus in this module as that is not easy to identify in IDI due to quality of the lumpsum payment flag, as not all employers update it consistently.

Other exclusions from this module

We are not able to identify and quantify income support payment off-sets, debt repayments and re-directions.

Negative amounts

No exclusions made from EMS for negative amounts.

Seasonal workers identification

Not applicable total income module. Refer to employment spells module for more information.

Staged approach to include tax within the total income module

Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.

Sources of validation

Wage and salary distributions for individuals.xls (Wage and salary statistics datasets)
The module output matches well with the validation dataset

References

  1. IDI Derived tables https://vhin.co.nz/wp-content/uploads/2017/04/data-dictionary-for-ir-derived-tables-in-the-idi.docx
  2. Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf Experimental estimates of income from linked administrative data | Stats NZ
  3. Payday filing
  4. Tax codes for individuals (ird.govt.nz) About tax codes
  5. Register as an employer (ird.govt.nz) Register as an employer
  6. Tax schedular payments and file employment information (ird.govt.nz) (Tax schedular payments and file employment information)

Key Contacts

Role Organisation Person
Dev lead MBIE Eric Krassoi Peach
Module Coder Nicholson Consulting Linda Martis
Code Supply Stats NZ
MSD
Independent contractor
Ivan Welsh
Marc de Boer
Keith McLeod
Peer review (code) Stats NZ Rodney Jer
Peer review (documentation) IRD
IRD
Joanne Butterfield
Scott Anderson
Module Steward TBD TBD

Module business rules

Defining spell start and end dates

Start and end dates are based on EMS table. The return period column in EMS is considered to be the end date and the start of that month will be the start date.

Person identifier snz_uid

A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Minimum income threshold

Currently no minimum threshold applied.

Interoperability with other modules

This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.

Taxable income indicator

Taxable income indicator to be hard coded in modules. Within wages and salary module, all income is taxable.

Code logic

The code is based on the rules used for APC and derived data tables in the IDI.
Steps 1 to 3 are mainly collating data from various IDI source tables. Rest of the steps are identifying wages and salary and excluding self-employment paye deducted income.

  1. We use ir_ems table to obtain wages and salary data with the intention of identifying wages and salary income and excluding self-employed income , i.e, sole traders/partners/directors earning PAYE deducted income.

    1.1 Extract all monthly data from ir_ems that have ir_ems_income_source_code ‘W&S’, which stands for wages and salary, and where return period between 2000 and 2024 (latest year). Period starts at 2000 as first records of the EMS table begin in April 1999.
    1.2 Withholding type code will be P (paye deductions) or W (withholding payments). Where an individual has both codes for a given period, we select P.

  2. Identify self-employed individuals from 3 IR3 tables, particularly sole traders earning PAYE income, in order to exclude them from ir_ems data. To do so, we will
    2.1 Extract data from IR3 tables, specifically ir_clean.ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014 and ir_ir3_2013_to_2020 from IDI_Adhoc. Table prioritisation will occur with ir_clean.ird_rtns_keypoints_ir3 being the first source of data, followed by adhoc 2000_to_2014 and adhoc 2013_to_2020.
    2.2 We will use data from net profit column.
    2.3 Remove duplicates in tables by using maximum of location number, return_version_number and ird_timestamp_date.

  3. Identify self-employed individuals from ird cross reference, ir20 and ir4s tables, particularly partners and company directors, in order to exclude them from ir_ems data. To do so, we will
    3.1 Extract data from ird_cross_reference where ir_xrf_reference_type_code (code for shareholder/partner etc) is PTR (partner), DIR, SHR or EOH (company dorector), and customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation).
    3.2 Extract data from ird_attachments_ir20 for those individuals whose customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). (These individuals are partners).
    3.3 Extract data from ird_attachments_ir4s for those individuals whose customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). (These individuals are company directors).

  4. Exclude sole traders earning PAYE deducted income from ir_ems data where all the below conditions are met
    4.1 income source code is W&S and withholding type code is ‘P’ in ir_ems data (implying PAYE deductions), and
    4.2 snz_ird_uid is equal to snz_employer_ird_uid (implying payee equal to payer), and
    4.3 customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation), and
    4.4 Either
    4.4.1 Tax registration type code is GST in ird_tax_registration table, within the time peroid of interest, or
    4.4.2 Net profit exists within IR3 data extracted above in Step 2 within the time period of interest

  5. Exclude company directors and partners earning PAYE deducted income from ir_ems data where all the below conditions are met
    5.1 Individuals identified in step 3 exist in the ir_ems data (matched using unique identifiers and unique employer identifiers within the time period of interest, and
    5.2 Payer not equal to payee, i.e., the individuals do not have the same unique identifiers (snz_uid) and unique employer identifiers (snz_employer_uid), and
    5.3 withholding type code is ‘P’ in ir_ems data (implying PAYE deductions) and income source code is W&S in ir_ems_data (implying wages and salary)

  6. The remaining individuals from the ir_ems tables will be those who earn wages and salary.

  7. Aggregate the ir_ems data by year, snz_uid and income source code and sum up the monthly amounts. These monthly amounts will be the wages and salary amounts.

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.

Datasets dependencies:

[IDI_Clean_YYYYMM].[ir_clean].[ird_ems]
[IDI_Clean_YYYYMM].[ir_clean].[ird_rtns_keypoints_ir3]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir4s]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir20]
[IDI_Clean_YYYYMM].[ir_clean].[ird_customers]
[IDI_Clean_YYYYMM].[ir_clean].[ird_tax_registrations]
[IDI_Clean_YYYYMM].[ir_clean].[ird_cross_reference] 

Outputs

{targetdb}.{targetschema}.{projprefix}_wages_salary_{refresh}

Variable Descriptions

Aspect Variables Description
Entity snz_uid Current refresh snz_uid
Period period_start_date
period_end_date
Payment start date
Payment end date
Event source_frequency Frequency of income recorded in source data
Wages and Salary is recorded in EMS as monthly income
Information income_source Type of income being derived in this module
W&S-EMS means wages and salary from EMS table
taxable_income_indicator Indicator for whether income is taxable
1:income is taxable, 0:income is not taxable
Wages and salary are taxable
gross_income Income before tax

Module Version & Change History

Date Version Comments
2023-07 Original code sourced from APC code created by Stats NZ (Ivan Welsh) in the IDI
2024-06 Initial streamlining and preparation for code module by Linda Martis, Nicholson Consulting

Code

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"

/* Assign the target database to which all the components need to be created in. */
use $(targetdb);

/*
This script combines a number of different source of IR3 data into a single table.
Sources:IR3 table, adhoc 2000_to_2014, adhoc 2013_to_2020

Duplicates are removed from the data by filtering for the maximum return_version_nbr,
ird_timestamp_date, source_code, snz_unique_nbr and finally location_nbr.

Create the storage table. This follows the format of the refresh IR3 table,
but without the leading ir_ir3_ prefix.

*/

DROP TABLE IF EXISTS #CombinedIR3;
CREATE TABLE #CombinedIR3 (
	    snz_uid                   INT NOT NULL
       ,snz_ird_uid               INT NOT NULL
       ,location_nbr              INT NOT NULL
       ,return_period_date        DATE NOT NULL
       ,snz_unique_nbr            BIGINT NOT NULL
       ,return_version_nbr        SMALLINT NULL
      ,net_profit_amt            DECIMAL(13,2) NULL
       ,income_imp_ind            VARCHAR(1) NOT NULL
       ,tot_wholding_paymnts_amt  DECIMAL(13,2) NULL
	   ,sum_all_income_amt		  DECIMAL(13,2) NULL
       ,ird_timestamp_date        DATE NULL
	    /* 1 for refresh IR3 table, 2 for adhoc 2000_to_2014, 3 for adhoc 2013_to_2020 */
       ,source_table_code         TINYINT NOT NULL 
       ,PRIMARY KEY (snz_ird_uid, return_period_date)
);

/* Combine the source tables into one */
INSERT INTO #CombinedIR3
SELECT con.snz_uid, con.snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
	   ir_ir3_net_profit_amt, 
	   ir_ir3_income_imp_ind, 
	    ir_ir3_tot_wholding_paymnts_amt, 
      isnull(ir_ir3_net_profit_amt,0)+
       isnull(ir_ir3_tot_wholding_paymnts_amt,0)
	   AS sum_all_income_amt, 
	   ir_ir3_ird_timestamp_date, source_table_code
	FROM( 
		SELECT *, ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, ir_ir3_return_period_date ORDER BY ir_ir3_return_version_nbr desc, ir_ir3_ird_timestamp_date desc, source_table_code asc, ir_ir3_snz_unique_nbr desc, ir_ir3_location_nbr desc) AS row_nbr 
			FROM ( 
				SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
                                                      ir_ir3_net_profit_amt, 
													 ir_ir3_income_imp_ind, 
                                                      ir_ir3_tot_wholding_paymnts_amt, 
													 ir_ir3_ird_timestamp_date, 1 AS source_table_code
                                                FROM $(idicleanversion).ir_clean.ird_rtns_keypoints_ir3
                                               UNION ALL
                SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
													  ir_ir3_net_profit_amt, 
													 ir_ir3_income_imp_ind, 
													  ir_ir3_tot_wholding_paymnts_amt, 
													 ir_ir3_ird_timestamp_date, 2 AS source_table_code
                                                FROM IDI_Adhoc.clean_read_IR.ir_ir3_2000_to_2014
                                               UNION ALL
                SELECT snz_ird_uid, location_number AS ir_ir3_location_nbr, period AS ir_ir3_return_period_date,
                                                     ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, period ORDER BY return_version, selfemployed_income desc, withholding_payment desc) AS ir_ir3_snz_unique_nbr,
                                                     return_version AS ir_ir3_return_version_nbr, 
													 selfemployed_income AS ir_ir3_net_profit_amt,
                                                     'N' AS ir_ir3_income_imp_ind, 
													 withholding_payment AS ir_ir3_tot_wholding_paymnts_amt,
                                                     CAST(timestamp AS DATE) AS ir_ir3_ird_timestamp_date, 3 AS source_table_code
                                                FROM IDI_Adhoc.clean_read_IR.ir_ir3_2013_to_2020
                                             ) tu
        ) t5
 INNER JOIN $(idicleanversion).security.concordance con
       ON con.snz_ird_uid = t5.snz_ird_uid
WHERE row_nbr = 1
   AND YEAR(ir_ir3_return_period_date) BETWEEN 2000 AND year(GETDATE()); 

RAISERROR('Inserted %d rows from IR3 into #CombinedIR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
This script brings EMS data for specific income sources into one table.

Data is stored in calendar year format mth_01_amt = January and so on
*/

/* Create the equivalent of the data.income_cal_yr table */
DROP TABLE IF EXISTS #IncomeCalYr;
CREATE TABLE #IncomeCalYr (
    year_nbr SMALLINT NOT NULL
	,return_period DATE NOT NULL
	,snz_uid INT NOT NULL
    ,snz_ird_uid INT NOT NULL
    ,snz_employer_ird_uid INT NOT NULL
    ,income_source_code CHAR(7) NOT NULL
    ,withholding_type_code CHAR(1) NOT NULL
    ,gross_income DECIMAL(19,2) NOT NULL
    ,PRIMARY KEY (snz_ird_uid, return_period, snz_employer_ird_uid, income_source_code, withholding_type_code)
);

/*
Populate the income cal with EMS data
*/
INSERT INTO #IncomeCalYr WITH (TABLOCK) (
    year_nbr, return_period, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code, withholding_type_code, gross_income
)
SELECT YEAR(ir_ems_return_period_date) AS year_nbr
			   , EOMONTH(ir_ems_return_period_date) AS return_period
               ,MAX(snz_uid) AS snz_uid  
               ,MAX(snz_ird_uid) AS snz_ird_uid
               ,snz_employer_ird_uid
               ,ir_ems_income_source_code + '-EMS' AS income_source_code
               ,MIN(ir_ems_withholding_type_code) AS withholding_type_code
               ,SUM(ir_ems_gross_earnings_amt) as gross_income
          FROM $(idicleanversion).ir_clean.ird_ems
         WHERE YEAR(ir_ems_return_period_date) BETWEEN 2000 - 1 AND year(GETDATE())
               AND snz_uid > 0
			   AND ir_ems_income_source_code in ('W&S')
         GROUP BY YEAR(ir_ems_return_period_date)
				,EOMONTH(ir_ems_return_period_date)
               ,snz_ird_uid 
               ,snz_employer_ird_uid
               ,ir_ems_income_source_code
             
 
RAISERROR('Inserted %d rows from EMS into #IncomeCalYr', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/* Update W&S income when it is actually self employed income
 This step adds the (S01) - Sole Trader recieving PAYE deducted income source
 Rules: (A) W&S income
        (B) PAYE
        (C) Payee = Payer
        (D) Paye is an individual
        (E) Evidence of self employed: IR3 net profit <> 0 or GST registered
*/
UPDATE inc WITH (TABLOCK)
   SET income_source_code = 'S01-EMS'
  FROM #IncomeCalYr inc
 WHERE inc.income_source_code    = 'W&S-EMS'                   /* Wages & Salaries */
   AND inc.withholding_type_code = 'P'                         /* PAYE */
   AND inc.snz_ird_uid = inc.snz_employer_ird_uid              /* payEE = payER */
   AND EXISTS ( SELECT 1                                         
                  FROM $(idicleanversion).ir_clean.ird_customers cus            
                 WHERE cus.snz_ird_uid = inc.snz_ird_uid     
                   AND cus.ir_cus_entity_type_code = 'I')      /* payEE is an individual */
   AND (EXISTS (SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_tax_registrations reg      /* GST registered */
                 WHERE reg.snz_ird_uid = inc.snz_ird_uid
                   AND reg.ir_treg_tax_type_code = 'GST'
                   AND inc.year_nbr BETWEEN YEAR(reg.ir_treg_treg_start_date)
                                        AND YEAR(reg.ir_treg_treg_end_date)
                ) 
        OR EXISTS ( SELECT 1
                      FROM #CombinedIR3 ir3  
                     WHERE ir3.snz_ird_uid = inc.snz_ird_uid
                       AND ir3.net_profit_amt <> 0             /* net profit non zero */
                       AND inc.year_nbr BETWEEN YEAR(ir3.return_period_date) - 1
                                            AND YEAR(ir3.return_period_date)
                )
        )

RAISERROR('Updated %d rows from W&S to S01', 0, 1, @@ROWCOUNT) WITH NOWAIT;

/*
 Build temp table to hold self employed income details from
         cross reference, IR20 & IR4
*/

DROP TABLE IF EXISTS #tmp_sei_individual;
CREATE TABLE #tmp_sei_individual (
    snz_ird_uid INT NOT NULL,
    snz_employer_ird_uid INT NOT NULL,
    start_cal_year_nbr INT NOT NULL,
    end_cal_year_nbr INT NOT NULL,
    sei_type_code CHAR(1) NOT NULL
);

INSERT INTO #tmp_sei_individual (
    snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr, sei_type_code
)
/* Relationships in cross ref table */
SELECT cln.ir_xrf_to_snz_ird_uid AS snz_ird_uid,
       cln.ir_xrf_from_snz_ird_uid AS snz_employer_ird_uid,
       cln.ir_xrf_first_year_nbr AS start_cal_year_nbr,
       COALESCE(cln.ir_xrf_latest_year_nbr, YEAR(cln.ir_xrf_applied_date)) AS end_cal_year_nbr,
       CASE
            WHEN cln.ir_xrf_reference_type_code = 'PTR' THEN 'P'
            WHEN cln.ir_xrf_reference_type_code IN ('DIR', 'SHR', 'EOH') THEN 'C'
            ELSE 'U'
        END
  FROM $(idicleanversion).ir_clean.ird_cross_reference cln
 WHERE cln.ir_xrf_reference_type_code IN ('PTR', 'DIR', 'SHR', 'EOH')
   AND EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = cln.ir_xrf_to_snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               )
/* Relationships in IR20 table */
UNION
SELECT cln.snz_ird_uid AS snz_ird_uid,
       cln.snz_employer_ird_uid AS snz_employer_ird_uid,
       YEAR(cln.ir_ir20_return_period_date) - 1 AS start_cal_year_nbr,
       YEAR(cln.ir_ir20_return_period_date) AS end_cal_year_nbr,
       'P' AS sei_type_code
  FROM $(idicleanversion).ir_clean.ird_attachments_ir20 cln
 WHERE EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = cln.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               )
/* Relationships in IR4 table */
UNION
SELECT cln.snz_ird_uid AS snz_ird_uid,
       cln.snz_employer_ird_uid AS snz_employer_ird_uid,
       1900 AS start_cal_year_nbr,
       9999 AS end_cal_year_nbr,
       'C' AS sei_type_code
  FROM $(idicleanversion).ir_clean.ird_attachments_ir4s cln
 WHERE EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = cln.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               );

RAISERROR('Inserted %d rows for cross referencing', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
/*
 Update W&S income when it is actually self employed income
 This step adds the (C01) - Company director/shareholder recieving PAYE deducted income
                and (P01) - Partner recieving PAYE deducted income
 Rules: (A) W&S income
        (B) PAYE
        (C) Payee <> Payer
        (D) Payee is an individual
        (E) Evidence of self employed: IR20 or IR4 or cross ref
        (F) Favour C02 over P02
*/

UPDATE inc WITH (TABLOCk)
   SET income_source_code = CASE WHEN sei.sei_type_code = 'C' THEN 'C01-EMS'
                                 WHEN sei.sei_type_code = 'P' THEN 'P01-EMS'
                                 ELSE 'U01-EMS'
                             END
  FROM #IncomeCalYr inc
  JOIN ( SELECT snz_ird_uid, snz_employer_ird_uid, 
                start_cal_year_nbr, end_cal_year_nbr,
                sei_type_code = MIN(sei_type_code)  /* favour 'C' over 'P' */
           FROM #tmp_sei_individual
          GROUP BY snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr
       ) sei ON sei.snz_ird_uid = inc.snz_ird_uid
            AND sei.snz_employer_ird_uid = inc.snz_employer_ird_uid
            AND inc.year_nbr BETWEEN sei.start_cal_year_nbr AND sei.end_cal_year_nbr
 WHERE inc.income_source_code = 'W&S-EMS'
   AND inc.withholding_type_code = 'P'
   AND inc.snz_ird_uid <> inc.snz_employer_ird_uid;

RAISERROR('Updated %d rows from W&S to C01/P01', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Create a Sandpit table using #IncomeCalYr 
Filtering for wages and salary only
All wages and salary stored in wide format
Note: data stored as tax year mth_01_amt = January and so on
*/

DROP TABLE IF EXISTS $(targetdb).$(targetschema).$(projprefix)_wages_salary; 
CREATE TABLE $(targetdb).$(targetschema).$(projprefix)_wages_salary (
	 snz_uid INT NOT NULL
	,period_start_date DATE NOT NULL
	,period_end_date DATE NOT NULL
	,source_frequency CHAR(7) NOT NULL
	,income_source CHAR(7) NOT NULL
	,taxable_income_indicator TINYINT NOT NULL 
	,gross_income DECIMAL(19,2) NULL
	);

INSERT INTO $(targetdb).$(targetschema).$(projprefix)_wages_salary

SELECT snz_uid
		,DATEFROMPARTS(year(return_period), month(return_period), 01) as period_start_date
		,return_period AS period_end_date
		,'monthly' as source_frequency
		,income_source_code as income_source
		,1 as taxable_income_indicator
		,SUM(gross_income) AS gross_income
  FROM #IncomeCalYr
  WHERE snz_uid IS NOT NULL
  and income_source_code in ('W&S-EMS')
  and gross_income <> 0
 GROUP BY snz_uid, return_period, income_source_code;

RAISERROR('Created table in Sandpit with %d rows', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

ALTER TABLE $(targetdb).$(targetschema).$(projprefix)_wages_salary
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);

DROP TABLE IF EXISTS #CombinedIR3;
DROP TABLE IF EXISTS #IncomeCalYr;
DROP TABLE IF EXISTS #tmp_sei_individual