Total income module collection: Self employment

cori.qian
24 June 2024

Module output

SQL::[IDI_Community].[inc_self_employment]. self_employment_YYYYMM
SAS: libname cmti ODBC dsn=idi_community_srvprd schema=inc_self_employment; proc print data = cmti.self_employment_YYYYMM
How to access a code module in the Data Lab : Read here

Purpose of Self employment (SE) module

The purpose of this module is to construct the total self-employment (SE) income for people in New Zealand.

Key concepts

Self-employment income

Self-employment income is the net profit or loss received from all current and previous self-employment jobs held over the reference period, including drawings (cash or goods the respondent takes out of the business instead of receiving a ’wage’). Self-employment income covers schedular payments, company director income, partnership income, sole trader income (excluding rental income) and shareholder-employee salary without PAYE deducted earned by business owners.

Total income spell

An income spell is the minimum payment period per income type. SE income in practice may be paid in a daily, weekly, fortnightly, monthly spell etc.

Source frequency

Source frequency is the frequency with which an income type is stored in the data. Source frequency for SE income is monthly when it is paid as schedular payments and identified in EMS and annual when it is filed in IR3 returns.

Gross income amount

For the purpose of this module, gross income is Self -employment income and income before tax for schedular payments.

Observed tax

Tax recorded as paid to IRD as part of the income filing process. Observed tax data for SE income is spread across multiple IR tables in IDI, including EMS and IR3. This is the amount of tax that has been has been deducted from source (eg: PAYE and tax filing).
Note: observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. This information has not been included in the initial version of the module.

Granularity

The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event in the above table:

Entity id: snz uid
Period: start end, end date
Event information: income source, taxable income indicator

Data sources

IR3: ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014, ir_ir3_2013_to_2020

IR3 is the income tax return for individuals used to confirm the amount of personal income tax to be paid at the end of each tax year with data available from March 2000 to present. The main purpose of IR3 is to include any monetary payments that have not been taxed at source. IR3 is used for self-employment (filed annually by sole traders) and includes non-zero partnership, or shareholder salary income, as well as rental income. IR3 is mandatory for individuals who earned more than $200 (before tax) income other than salary, wages, interest, dividends, and/or taxable Māori authority distributions. IR3 is filed annually.

There may be a substantial lag in reporting IR3 data as individual businesses and those using tax agents have up to a year after the end of the financial year to file an annual tax return with IR.

IR3 data comes from a different table to the EIE data but is still direct information submitted by a customer

Note: 3 IR2 tables (2 ad-hoc tables and 1 clean) are required for creating the timeseries of IR3 data. IR3s belonging to people who reported no self-employment, partnership or shareholder income were excluded for tax years prior to 2019. They are in the ad-hoc tables.

IR4: ird_attachments_ir4s

IR4 is filed by companies and includes remuneration income paid to shareholders, directors, and relatives of shareholders with data available from March 1995 to present. This table holds information on non-zero shareholder income and shows non-PAYE salary payments made to working proprietors of businesses. IR4 and IR4s is required from all active NZ resident companies. IR4S is filed annually.

IR4 data comes from a different table to the EIE data but is still direct information submitted by a customer

IR20 (ird_attachments_ir20

IR7 (formerly IR20) is filed for partnership and look-through companies with data available from March 1995 to present. It is filed annually. This table contains information about zero and non-zero partnership income.

Employer monthly schedule: ir_clean.ird_ems

Employer monthly schedule (EMS) table contains employee level data from the EMS return. Regular earnings of some self-employed persons can be reported in the EMS as wages or salaries. The earnings of independent contractors in the EMS are known as withholding or schedular payments.

EMS is available in the LBD/IDI. This is a monthly tax return filed by all businesses with paid employees, which summarises the monthly wage and salary payments made to each of their employees, and the ‘pay-as-you-earn’ (PAYE) income tax deductions made. EMS is a mandatory reporting requirement for all employing firms.

Since April 2019, Inland Revenue has shifted from monthly filings under the EMS to compulsory payday filing under the Employment Information - Employee (EI-E) system. This follows a transition period since April 2018 in which firms could select whether to file monthly or through the payday filing system. For consistency, Stats NZ has made a processed version of the EI-E data available, allocating payday information to calendar months and including these records as part of the EMS tables.

Note:

ird_customers

The IR customers file records all Inland Revenue’s entities, their description, status (active, deceased, etc), commencement/cessation, student loan/tax residency status and business industry code where relevant.

A full file is extracted each month containing all new records and changes to existing records.
The information in this file is not derived from an IR form. It is a combination of different fields that are updated when this specific customer information changes. IR can be notified by post, in person, by phone, email, myIR, secure mail.

The population is all entities that engage with IR. At the highest level it can be:

ird_tax_registrations:

The IR tax registration file records the status of an entities tax type registration with IR. It contains start/end dates, tax return filing frequencies, if the entity has a tax agent, if they are a corporate entity and if they file electronically.

The population is all entities that engage with IR. At the highest level it can be:

ird_cross_reference:

The IR cross reference file links two IRD numbers, documents the period they are linked and describes the relationship between them.

The information in this file is not derived from an IR form. It is a combination of different fields that are updated when relationships between entities change. A change can be submitted by post, in person, by phone, email, myIR, secure mail.

The population is individual to business and business to business relationships between IR entities

Practical notes

Period

Self employment income information is from 1999 onwards.

Identification of Earnings not liable to ACC levy and bonuses

All employees must pay an ACC earners’ levy to cover the cost of non work-related injuries. IR collect this on behalf of the ACC. The earners’ levy has been built into the PAYE tables and is deducted along with the tax. Almost all earnings subject to PAYE are liable for the levy with the exception of schedular payments, retirement payments, redundancy payments, benefits from an employee share scheme, jury fees, witness fees, and taxable pensions. See IR employers guide for more information. Inclusion of earnings not liable for ACC levy is not essential for this module.

Additionally, we will not identify bonus in this module as that is not easy to identify in IDI due to quality of the lumpsum payment flag, as not all employers update it consistently.

Other exclusions from this module

We are not able to identify and quantify income support payment off-sets, debt repayments and re-directions.

Negative amounts

No exclusions made from EMS for negative amounts.

Delay in filing returns by contractors

There is a time lag to pay ACC levy, income tax, and/or Student loan after receiving their income. Those unable to pay the full amount may choose a monthly repayment schedule for a fixed period. The timing of repayments will affect the net income amount. In this module, we only report on gross income and do not capture other deductions to arrive at net income.
Other time lags can be due to individuals delayed filing of income, source agency delay in providing information to Stats NZ, or Stats NZ delay in adding information received from source agency into IDI. eg: Filing happens up to 12 months following tax year and then they have to wait for the IDI refresh. This may explain some dips in an income time series towards the end of the tax year. Due to this, historic information is more likely to be complete. Data from 12 - 18 months in the past it is likely to not have complete income deductions and tax information.

Seasonal workers identification

Not applicable total income module. Refer to employment spells module for more information

Staged approach to include tax within the total income module

Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.

Sources of Validation

No validation dataset available, hence unable to validate the module output.

References

  1. IDI Derived tables
  2. Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf
  3. Timelines at the end of the tax year
  4. Individual income tax return - IR3
  5. ir3-2024.pdf
  6. Extension of time arrangements
  7. ir4-2024.pdf
  8. ir4s-2024.pdf
  9. Income tax for partnerships
  10. File a Partnerships and look-through companies income tax return – IR7
  11. ir7-2024.pdf
  12. File a Partnership income loss attribution – IR7P
  13. ir7p-2024.pdf
  14. Tax types - Choosing the right account type
  15. ACC levies set for 2023, 2024 and 2025
  16. Paying your tax

Key Contacts

Role Organisation Person
Dev lead MBIE Eric Krassoi Peach
Module Coder Nicholson Consulting Linda Martis
Code Supply Stats NZ MSD Ivan Welsh Marc de Boer
Peer review (code) Stats NZ Rodney Jer
Peer review (documentation) IRD IRD Joanne Butterfield Scott Anderson
Module Steward Stats NZ Cori Qian

Module business rules

Defining spell start and end dates

Start and end dates are based on EMS, IR3, IR4s, IR7 dates. The return period column in the tables is considered to be the end date and the start of that period (monthly for EMS and annual tax year for other tables) will be the start date.

Person identifier snz_uid

A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Minimum income threshold

Currently no minimum threshold applied.

Interoperability with other modules

This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.

Duplication in data

APC code removes duplication in WHP appearing in IR3 data and WHP, S02, P02, C02 appearing in EMS data. Individuals can have multiple income types (eg: sole trader and partner receiving PAYE deducted income) in one financial year. These are not treated as duplicates as they have different amounts.

Taxable income indicator

Taxable income indicator to be hard coded in modules. Within self employment module, all income is taxable.

Code logic:

The code is based on the rules used for APC and derived data tables in the IDI.
Steps 1 to 3 are mainly collating data from various IDI source tables. Rest of the steps are identifying wages and salary and excluding self-employment paye deducted income.

  1. We use ir_ems table to obtain wages and salary data with the intention of identifying and relabelling self-employed income i.e, individuals earning PAYE deducted or withholding tax deducted income.

    1.1 Individuals will be relabelled as ‘C01’ - Company director/shareholder receiving PAYE deducted income, ‘P01’ - Partner receiving PAYE deducted income, ‘S01’ - Sole Trader receiving PAYE deducted income, ‘C02’ - Company director/shareholder receiving WHT deducted income, ‘P02’ Partner receiving withholding tax deducted income and ‘S02’ Sole Trader receiving withholding tax deducted income.

    1.2 We extract all monthly data from ir_ems that have ir_ems_income_source_code ‘W&S’ and ‘WHP’, which stands for wages and salary and withholding payment, and where return period between 2000 and 2024 (latest year). Period starts at 2000 as first records of the EMS table begin in April 1999.

    1.3 Withholding type code will be P (paye deductions) or W (withholding payments). Where an individual has both codes for a given period, we assign the code P.

  2. Identify self-employed individuals from 3 IR3 tables. This is for the purpose of relabelling individuals who appear in EMS data, and also to identify other self-employed individuals. To do so, we will

    2.1 Extract data from IR3 tables, specifically ir_clean.ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014 and ir_ir3_2013_to_2020 from IDI_Adhoc. Table prioritisation will occur with ir_clean.ird_rtns_keypoints_ir3 being the first source of data, followed by adhoc 2000_to_2014 and adhoc 2013_to_2020.

    2.2 We will use data from partnership income and net profit column.

    2.3 Remove duplicates in tables by using maximum of location number, return_version_number and ird_timestamp_date.

  3. Identify self-employed individuals from ird cross reference, ir20 and ir4s tables, particularly partners and company directors, in order to relabel them in ir_ems data. To do so, we will

    3.1 Extract data from ird_cross_reference where ir_xrf_reference_type_code (code for shareholder/partner etc) is PTR (partner), DIR, SHR or EOH (company director), and customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). This identified the type of employee-employer relationship.

    3.2 Extract data from ird_attachments_ir20 for those individuals whose customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). (These individuals are partners). This implies tax form has been filed for payee-payer relationship for period in question and evidence of ‘Company director/shareholder’ relationship exists.

    3.3 Extract data from ird_attachments_ir4s for those individuals whose customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). (These individuals are company directors). This implies tax form has been filed for payee-payer relationship for period in question and evidence of ‘partner’ relationship exists.

  4. Identify sole traders earning PAYE deducted income from ir_ems data (S01-EMS) where all the below conditions are met
    4.1 income source code is W&S and withholding type code is ‘P’ in ir_ems data (implying PAYE deductions), and

    4.2 snz_ird_uid is equal to snz_employer_ird_uid (implying payee equal to payer), and

    4.3 customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation), and

    4.4 Either
    4.4.1 Tax registration type code is GST in ird_tax_registration table, within the time peroid of interest, or
    4.4.2 Net profit exists within IR3 data extracted above in Step 2 within the time period of interest

  5. Identify company directors and partners earning PAYE deducted income from ir_ems (P01-EMS, C01-EMS) data where all the below conditions are met

    5.1 Individuals identified in step 3 exist in the ir_ems data (matched using unique identifiers and unique employer identifiers within the time period of interest), and

    5.2 Payer not equal to payee, i.e., the individuals do not have the same unique identifiers (snz_uid) and unique employer identifiers (snz_employer_uid), and

    5.3 Withholding type code is ‘P’ in ir_ems data (implying PAYE deductions) and income source code is W&S in ir_ems_data (implying wages and salary)

  6. Identify company directors, partners and sole traders earning withholding payments from ir_ems data (S02-EMS, P02-EMS, C02-EMS) where all the below conditions are met

    6.1 Individuals identified in step 3 exist in the ir_ems data (identified using unique identifiers and unique employer identifiers within the time period of interest, and

    6.2 Withholding type code is ‘W’ in ir_ems data (implying withholding tax) and income source code is ‘WHP’ in ir_ems_data (implying withholding payments), and

    6.3 customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation).

    6.4 Those not labelled as company directors and partners (because they do not exist in Step 3), but are identified as individuals in ird_customers will be labelled as sole traders. The rest will continue to be labelled as WHP-EMS based on their EMS data in Step 1.

  7. Identification of self-employed individuals from ir_ems data is complete. These individuals are self-employed who earn PAYE or WHT deducted income.

  8. Identify sole trader from IR3 data (S00-IR3) using IR3 table created in Step 2. Their income comes from the net profit column, given it is not equal 0.

  9. Identify company director/shareholder using IR4s data from ird_attachments_ir4s (C00-IR4). Their income comes from the total shareholder salary column, given it is not equal 0. We exclude any income imputations since that is not reliable. Also criteria for customer entity type code from ird_customers is ‘I’ must also be met, so that we identify individuals and not organisations.

  10. Identify partner using IR20 data from ird_attachments_ir20 (P00-IR7). Their income comes from the total share of income column, given it is not equal 0. Also, criteria for customer entity type code from ird_customers is ‘I’ must also be met, so that we identify individuals and not organisations. Additionally, they should exist in IR3 table created in Step 2 and their partnership income should not be 0.

  11. Identify self-employed income with withholding tax deductions identified in IR3 table (WHP-IR3) created in Step 2. Their income comes from withholding payments amount column, given it is not equal to 0.

  12. Aggregate all the income identified from various tables (highlighted in yellow). These will be aggregated in tax year format. Their incomes will be the sum of their monthly (from ems only) and annual incomes.

  13. There may be duplication in data for withholding payments identified through EMS and IR3 datasets. Where the unique identifier and time period is the same for an individual earning withholding payments, delete their EMS data (WHP-EMS, S02-EMS, P02-EMS, C02-EMS) so that only one copy of their income (WHP-IR3) exists.

  14. To summarise, all self-employment income have now been identified and tagged as ‘C00-IR4’, ‘C01-EMS’, ‘C02-EMS’, ‘P00-IR7’, ‘P01-EMS’, ‘P02-EMS’, ‘S00-IR3’, ‘S01-EMS’, ‘S02-EMS’, ‘WHP-EMS’, ‘WHP-IR3’. We exclude all remaining data from EMS with income source W&S.

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.

Dependencies

[IDI_Clean_YYYYMM].[ir_clean].[ird_ems]
[IDI_Clean_YYYYMM].[ir_clean].[ird_rtns_keypoints_ir3]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir4s]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir20]
[IDI_Clean_YYYYMM].[ir_clean].[ird_customers]
[IDI_Clean_YYYYMM].[ir_clean].[ird_tax_registrations]
[IDI_Clean_YYYYMM].[ir_clean].[ird_cross_reference] 

Outputs

{targetdb}.{targetschema}.{projprefix}_self_employment_{refresh}

Variable Descriptions

Aspect Variables Description
Entity snz_uid Current refresh snz_uid
Period period_start_date period_end_date Payment start date Payment end date
Event source_frequency Frequency of income recorded in source data Note that self employment is recorded as both monthly and annual since it is sourced from multiple tables with different source frequencies
Information income_source Type of income being derived in this module
Note that: - C00-IR4 is Company director/shareholder income from IR4S table - C01-EMS is Company director/shareholder PAYE deducted income from EMS table - C02-EMS is Company director/shareholder WHT deducted income from EMS table - P00-IR7 is Partnership income from IR20/IR7 table - P01-EMS is Partner PAYE deducted income from EMS table - P02-EMS is Partner WHT deducted income from EMS table - S00-IR3 is Sole trader income from IR3 tables - S01-EMS is Sole Trader PAYE deducted income from EMS table - S02-EMS is Sole Trader WHT deducted income from EMS table - WHP-EMS is Withholding payments from EMS table - WHP-IR3 is Withholding payments from IR3 tables
taxable_income_indicator Indicator for whether income is taxable Note that: 1:income is taxable 0:income is not taxable Self employment income is taxable
gross_income Income before tax

Module Version & Change History

Date Version Comments
2023-07 Original code sourced from APC code created by Stats NZ (Ivan Welsh) in the IDI
2024-06 Initial streamlining and preparation for code module by Linda Martis, Nicholson Consulting

Code

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"

/* Assign the target database to which all the components need to be created in. */
use $(targetdb);

/*
This script combines a number of different source of IR3 data into a single table.
Sources:IR3 table, adhoc 2000_to_2014, adhoc 2013_to_2020

Duplicates are removed from the data by filtering for the maximum return_version_nbr,
ird_timestamp_date, source_code, snz_unique_nbr and finally location_nbr

Create the storage table. This follows the format of the refresh IR3 table,
but without the leading ir_ir3_ prefix.

*/

DROP TABLE IF EXISTS #CombinedIR3;
CREATE TABLE #CombinedIR3 (
	    snz_uid                   INT NOT NULL
       ,snz_ird_uid               INT NOT NULL
       ,location_nbr              INT NOT NULL
       ,return_period_date        DATE NOT NULL
       ,snz_unique_nbr            BIGINT NOT NULL
       ,return_version_nbr        SMALLINT NULL
       ,tot_pship_income_amt      DECIMAL(13,2) NULL
      ,net_profit_amt            DECIMAL(13,2) NULL
       ,income_imp_ind            VARCHAR(1) NOT NULL
       ,tot_wholding_paymnts_amt  DECIMAL(13,2) NULL
	   ,sum_all_income_amt		  DECIMAL(13,2) NULL
       ,ird_timestamp_date        DATE NULL
	   /*  1 for refresh IR3 table, 2 for adhoc 2000_to_2014, 3 for adhoc 2013_to_2020 */
       ,source_table_code         TINYINT NOT NULL 
       ,PRIMARY KEY (snz_ird_uid, return_period_date)
);

/* Combine the source tables into one */
INSERT INTO #CombinedIR3
SELECT con.snz_uid, con.snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
       ir_ir3_tot_pship_income_amt, 
	   ir_ir3_net_profit_amt, 
	   ir_ir3_income_imp_ind, 
	    ir_ir3_tot_wholding_paymnts_amt, 
	   isnull(ir_ir3_tot_pship_income_amt, 0) +
      isnull(ir_ir3_net_profit_amt,0)+
       isnull(ir_ir3_tot_wholding_paymnts_amt,0)
	   AS sum_all_income_amt, 
	   ir_ir3_ird_timestamp_date, source_table_code
	FROM( 
		SELECT *, ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, ir_ir3_return_period_date ORDER BY ir_ir3_return_version_nbr desc, ir_ir3_ird_timestamp_date desc, source_table_code asc, ir_ir3_snz_unique_nbr desc, ir_ir3_location_nbr desc) AS row_nbr 
			FROM ( 
				SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
                                                      ir_ir3_tot_pship_income_amt, 
													  ir_ir3_net_profit_amt, 
													 ir_ir3_income_imp_ind, 
													  ir_ir3_tot_wholding_paymnts_amt, 
													 ir_ir3_ird_timestamp_date, 1 AS source_table_code
                                                FROM $(idicleanversion).ir_clean.ird_rtns_keypoints_ir3
                                               UNION ALL
                SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
                                                      ir_ir3_tot_pship_income_amt, 
													  ir_ir3_net_profit_amt, 
													 ir_ir3_income_imp_ind, 
													  ir_ir3_tot_wholding_paymnts_amt, 
													 ir_ir3_ird_timestamp_date, 2 AS source_table_code
                                                FROM IDI_Adhoc.clean_read_IR.ir_ir3_2000_to_2014
                                               UNION ALL
                SELECT snz_ird_uid, location_number AS ir_ir3_location_nbr, period AS ir_ir3_return_period_date,
                                                     ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, period ORDER BY return_version, partnership_income desc, selfemployed_income desc, withholding_payment desc) AS ir_ir3_snz_unique_nbr,
                                                     return_version AS ir_ir3_return_version_nbr, 
													  partnership_income AS ir_ir3_tot_pship_income_amt,
													 selfemployed_income AS ir_ir3_net_profit_amt,
                                                     'N' AS ir_ir3_income_imp_ind, 
													 withholding_payment AS ir_ir3_tot_wholding_paymnts_amt,
                                                     CAST(timestamp AS DATE) AS ir_ir3_ird_timestamp_date, 3 AS source_table_code
                                                FROM IDI_Adhoc.clean_read_IR.ir_ir3_2013_to_2020
                                             ) tu
        ) t5
 INNER JOIN $(idicleanversion).security.concordance con
       ON con.snz_ird_uid = t5.snz_ird_uid
	WHERE row_nbr = 1
   AND YEAR(ir_ir3_return_period_date) BETWEEN 2000 AND year(GETDATE());

RAISERROR('Inserted %d rows from IR3 into #CombinedIR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
This script brings EMS data for specific income sources into one table.

Data is stored in calendar year format mth_01_amt = January and so on
*/

/* Create the equivalent of the data.income_cal_yr table */
DROP TABLE IF EXISTS #IncomeCalYr;
CREATE TABLE #IncomeCalYr (
    year_nbr SMALLINT NOT NULL
	  ,snz_uid INT NOT NULL
    ,snz_ird_uid INT NOT NULL
    ,snz_employer_ird_uid INT NOT NULL
    ,income_source_code CHAR(7) NOT NULL
    ,withholding_type_code CHAR(1) NOT NULL
    ,mth_01_amt DECIMAL(19,2) NOT NULL
    ,mth_02_amt DECIMAL(19,2) NOT NULL
    ,mth_03_amt DECIMAL(19,2) NOT NULL
    ,mth_04_amt DECIMAL(19,2) NOT NULL
    ,mth_05_amt DECIMAL(19,2) NOT NULL
    ,mth_06_amt DECIMAL(19,2) NOT NULL
    ,mth_07_amt DECIMAL(19,2) NOT NULL
    ,mth_08_amt DECIMAL(19,2) NOT NULL
    ,mth_09_amt DECIMAL(19,2) NOT NULL
    ,mth_10_amt DECIMAL(19,2) NOT NULL
    ,mth_11_amt DECIMAL(19,2) NOT NULL
    ,mth_12_amt DECIMAL(19,2) NOT NULL
    ,tot_yr_amt DECIMAL(19,2) NOT NULL
    ,PRIMARY KEY (snz_ird_uid, year_nbr, snz_employer_ird_uid, income_source_code, withholding_type_code)
);

/*
 STEP 1: Populate the income cal with EMS data
 */
INSERT INTO #IncomeCalYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code, withholding_type_code,
    mth_01_amt, mth_02_amt, mth_03_amt, mth_04_amt, mth_05_amt, mth_06_amt,
    mth_07_amt, mth_08_amt, mth_09_amt, mth_10_amt, mth_11_amt, mth_12_amt, tot_yr_amt
)
SELECT year_nbr
       ,snz_uid
       ,snz_ird_uid
       ,snz_employer_ird_uid
       ,income_source_code
       ,withholding_type_code
       ,ISNULL(m1, 0) AS mth_01_amt
       ,ISNULL(m2, 0) AS mth_02_amt
       ,ISNULL(m3, 0) AS mth_03_amt
       ,ISNULL(m4, 0) AS mth_04_amt
       ,ISNULL(m5, 0) AS mth_05_amt
       ,ISNULL(m6, 0) AS mth_06_amt
       ,ISNULL(m7, 0) AS mth_07_amt
       ,ISNULL(m8, 0) AS mth_08_amt
       ,ISNULL(m9, 0) AS mth_09_amt
       ,ISNULL(m10, 0) AS mth_10_amt
       ,ISNULL(m11, 0) AS mth_11_amt
       ,ISNULL(m12, 0) AS mth_12_amt
       ,ISNULL(m1, 0) + ISNULL(m2, 0) + ISNULL(m3, 0) + ISNULL(m4, 0) + ISNULL(m5, 0) + ISNULL(m6, 0) +
       ISNULL(m7, 0) + ISNULL(m8, 0) + ISNULL(m9, 0) + ISNULL(m10, 0) + ISNULL(m11, 0) + ISNULL(m12, 0) AS tot_yr_amt
  FROM (SELECT YEAR(ir_ems_return_period_date) AS year_nbr
               ,CONCAT('m', MONTH(ir_ems_return_period_date)) AS month_label
               ,MAX(snz_uid) AS snz_uid  
               ,MAX(snz_ird_uid) AS snz_ird_uid
               ,snz_employer_ird_uid
               ,ir_ems_income_source_code + '-EMS' AS income_source_code
               ,MIN(ir_ems_withholding_type_code) AS withholding_type_code
               ,SUM(ir_ems_gross_earnings_amt) as amt
          FROM $(idicleanversion).ir_clean.ird_ems
         WHERE YEAR(ir_ems_return_period_date) BETWEEN 2000 - 1 AND year(GETDATE()) 
               AND ■■■■■■■■■■■
			   AND ir_ems_income_source_code in ('W&S','WHP')
         GROUP BY YEAR(ir_ems_return_period_date) 
               ,MONTH(ir_ems_return_period_date) 
               ,snz_ird_uid 
               ,snz_employer_ird_uid
               ,ir_ems_income_source_code
             ) AS month_groups
 PIVOT (SUM(amt) FOR month_label IN (m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12)) AS pivoted;
 
RAISERROR('Inserted %d rows from EMS into #IncomeCalYr', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
 Update W&S income when it is actually self employed income
 This step adds the (S01) - Sole Trader recieving PAYE deducted income source
 Rules: (A) W&S income
        (B) PAYE
        (C) Payee = Payer
        (D) Paye is an individual
        (E) Evidence of self employed: IR3 net profit <> 0 or GST registered
*/ 
UPDATE inc WITH (TABLOCK)
   SET income_source_code = 'S01-EMS'
  FROM #IncomeCalYr inc
 WHERE inc.income_source_code    = 'W&S-EMS'                   /* Wages & Salaries */
   AND inc.withholding_type_code = 'P'                         /* PAYE */
   AND inc.snz_ird_uid = inc.snz_employer_ird_uid              /* payEE = payER */
   AND EXISTS ( SELECT 1                                         
                  FROM $(idicleanversion).ir_clean.ird_customers cus            
                 WHERE cus.snz_ird_uid = inc.snz_ird_uid     
                   AND cus.ir_cus_entity_type_code = 'I')      /* payEE is an individual */
   AND (EXISTS (SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_tax_registrations reg      /* GST registered */
                 WHERE reg.snz_ird_uid = inc.snz_ird_uid
                   AND reg.ir_treg_tax_type_code = 'GST'
                   AND inc.year_nbr BETWEEN YEAR(reg.ir_treg_treg_start_date)
                                        AND YEAR(reg.ir_treg_treg_end_date)
                ) 
        OR EXISTS ( SELECT 1
                      FROM #CombinedIR3 ir3
                     WHERE ir3.snz_ird_uid = inc.snz_ird_uid
                       AND ir3.net_profit_amt <> 0             /* net profit non zero */
                       AND inc.year_nbr BETWEEN YEAR(ir3.return_period_date) - 1
                                            AND YEAR(ir3.return_period_date)
                )
        )

RAISERROR('Updated %d rows from W&S to S01', 0, 1, @@ROWCOUNT) WITH NOWAIT;

/*
Build temp table to hold self employed income details from
         cross reference, IR20 & IR4
*/ 

DROP TABLE IF EXISTS #tmp_sei_individual;
CREATE TABLE #tmp_sei_individual (
    snz_ird_uid INT NOT NULL,
    snz_employer_ird_uid INT NOT NULL,
    start_cal_year_nbr INT NOT NULL,
    end_cal_year_nbr INT NOT NULL,
    sei_type_code CHAR(1) NOT NULL
);

INSERT INTO #tmp_sei_individual (
    snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr, sei_type_code
)
/* Relationships in cross ref table */
SELECT cln.ir_xrf_to_snz_ird_uid AS snz_ird_uid,
       cln.ir_xrf_from_snz_ird_uid AS snz_employer_ird_uid,
       cln.ir_xrf_first_year_nbr AS start_cal_year_nbr,
       COALESCE(cln.ir_xrf_latest_year_nbr, YEAR(cln.ir_xrf_applied_date)) AS end_cal_year_nbr,
       CASE
            WHEN cln.ir_xrf_reference_type_code = 'PTR' THEN 'P'
            WHEN cln.ir_xrf_reference_type_code IN ('DIR', 'SHR', 'EOH') THEN 'C'
            ELSE 'U'
        END
  FROM $(idicleanversion).ir_clean.ird_cross_reference cln
 WHERE cln.ir_xrf_reference_type_code IN ('PTR', 'DIR', 'SHR', 'EOH')
   AND EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = cln.ir_xrf_to_snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               )
/* Relationships in IR20 table */
UNION
SELECT cln.snz_ird_uid AS snz_ird_uid,
       cln.snz_employer_ird_uid AS snz_employer_ird_uid,
       YEAR(cln.ir_ir20_return_period_date) - 1 AS start_cal_year_nbr,
       YEAR(cln.ir_ir20_return_period_date) AS end_cal_year_nbr,
       'P' AS sei_type_code
  FROM $(idicleanversion).ir_clean.ird_attachments_ir20 cln
 WHERE EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = cln.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               )
/* Relationships in IR4 table */
UNION
SELECT cln.snz_ird_uid AS snz_ird_uid,
       cln.snz_employer_ird_uid AS ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ AS start_cal_year_nbr,
       9999 AS end_cal_year_nbr,
       'C' AS sei_type_code
  FROM $(idicleanversion).ir_clean.ird_attachments_ir4s cln
 WHERE EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = cln.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               );

RAISERROR('Inserted %d rows for cross referencing', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Update W&S income when it is actually self employed income
 This step adds the (C01) - Company director/shareholder recieving PAYE deducted income
                and (P01) - Partner recieving PAYE deducted income
 Rules: (A) W&S income
        (B) PAYE
        (C) Payee <> Payer
        (D) Payee is an individual
        (E) Evidence of self employed: IR20 or IR4 or cross ref
        (F) Favour C02 over P02
*/

UPDATE inc WITH (TABLOCk)
   SET income_source_code = CASE WHEN sei.sei_type_code = 'C' THEN 'C01-EMS'
                                 WHEN sei.sei_type_code = 'P' THEN 'P01-EMS'
                                 ELSE 'U01-EMS'
                             END
  FROM #IncomeCalYr inc
  JOIN ( SELECT snz_ird_uid, snz_employer_ird_uid, 
                start_cal_year_nbr, end_cal_year_nbr,
                sei_type_code = MIN(sei_type_code)  /* favour 'C' over 'P' */
           FROM #tmp_sei_individual
          GROUP BY snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr
       ) sei ON sei.snz_ird_uid = inc.snz_ird_uid
            AND sei.snz_employer_ird_uid = inc.snz_employer_ird_uid
            AND inc.year_nbr BETWEEN sei.start_cal_year_nbr AND sei.end_cal_year_nbr
 WHERE inc.income_source_code = 'W&S-EMS'
   AND inc.withholding_type_code = 'P'
   AND inc.snz_ird_uid <> inc.snz_employer_ird_uid;

RAISERROR('Updated %d rows from W&S to C01/P01', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Update WHP income when it is actually self employed income
 This step adds the (C02) - Company director/shareholder recieving WHT deducted income
                and (P02) - Partner recieving WHT deducted income
                and (S02) - Sole Trader recieving WHT deducted income
 Rules: (A) WHP income
        (B) WHT
        (C) Payee is an individual
        (D) Evidence of self employed: then C02 or P02 else S02
        (E) Favour C02 over P02, THEN S02
*/

UPDATE inc WITH (TABLOCK)
   SET income_source_code = CASE WHEN sei.sei_type_code = 'C' THEN 'C02-EMS'
                                 WHEN sei.sei_type_code = 'P' THEN 'P02-EMS'
                                 ELSE 'S02-EMS'
                             END
  FROM #IncomeCalYr inc
  LEFT JOIN ( SELECT snz_ird_uid, snz_employer_ird_uid, 
                     start_cal_year_nbr, end_cal_year_nbr,
                     sei_type_code = MIN(sei_type_code)
                FROM #tmp_sei_individual
               GROUP BY snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr
            ) sei ON sei.snz_ird_uid = inc.snz_ird_uid
                 AND sei.snz_employer_ird_uid = inc.snz_employer_ird_uid
                 AND inc.year_nbr BETWEEN sei.start_cal_year_nbr AND sei.end_cal_year_nbr
 WHERE inc.income_source_code = 'WHP-EMS'
   AND inc.withholding_type_code = 'W'
   AND EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = inc.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
              );

RAISERROR('Updated %d rows from W&S to C02/P02/S02', 0, 1, @@ROWCOUNT) WITH NOWAIT; 
GO

/*
This script converts #IncomeCalYr to tax year format mth_01_amt = April and so on
*/

/* Create the equivalent of the data.income_tax_yr table */
DROP TABLE IF EXISTS #IncomeTaxYr;
CREATE TABLE #IncomeTaxYr (
    year_nbr SMALLINT NOT NULL
    ,snz_uid INT NOT NULL
    ,snz_ird_uid INT NOT NULL
    ,snz_employer_ird_uid INT NOT NULL
    ,income_source_code CHAR(7) NOT NULL
    ,withholding_type_code CHAR(1) NOT NULL
    ,mth_01_amt DECIMAL(19,2) NULL
    ,mth_02_amt DECIMAL(19,2) NULL
    ,mth_03_amt DECIMAL(19,2) NULL
    ,mth_04_amt DECIMAL(19,2) NULL
    ,mth_05_amt DECIMAL(19,2) NULL
    ,mth_06_amt DECIMAL(19,2) NULL
    ,mth_07_amt DECIMAL(19,2) NULL
    ,mth_08_amt DECIMAL(19,2) NULL
    ,mth_09_amt DECIMAL(19,2) NULL
    ,mth_10_amt DECIMAL(19,2) NULL
    ,mth_11_amt DECIMAL(19,2) NULL
    ,mth_12_amt DECIMAL(19,2) NULL
    ,tot_yr_amt DECIMAL(19,2) NULL
);

/*
Build the income tax year table from cal year table
*/

INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code, withholding_type_code,
    mth_01_amt, mth_02_amt, mth_03_amt, mth_04_amt, mth_05_amt, mth_06_amt,
    mth_07_amt, mth_08_amt, mth_09_amt, mth_10_amt, mth_11_amt, mth_12_amt, 
    tot_yr_amt
)
SELECT inc.year_nbr AS year_nbr
       ,inc.snz_uid AS snz_uid
       ,inc.snz_ird_uid AS snz_ird_uid
       ,inc.snz_employer_ird_uid AS snz_employer_ird_uid
       ,inc.income_source_code AS income_source_code
       ,inc.withholding_type_code AS withholding_type_code
       ,ISNULL(inc.prev_apr_amt, 0) AS mth_01_amt 
       ,ISNULL(inc.prev_may_amt, 0) AS mth_02_amt
       ,ISNULL(inc.prev_jun_amt, 0) AS mth_03_amt
       ,ISNULL(inc.prev_jul_amt, 0) AS mth_04_amt
       ,ISNULL(inc.prev_aug_amt, 0) AS mth_05_amt
       ,ISNULL(inc.prev_sep_amt, 0) AS mth_06_amt
       ,ISNULL(inc.prev_oct_amt, 0) AS mth_07_amt
       ,ISNULL(inc.prev_nov_amt, 0) AS mth_08_amt
       ,ISNULL(inc.prev_dec_amt, 0) AS mth_09_amt
       ,ISNULL(inc.this_jan_amt, 0) AS mth_10_amt
       ,ISNULL(inc.this_feb_amt, 0) AS mth_11_amt
       ,ISNULL(inc.this_mar_amt, 0) AS mth_12_amt
       ,ISNULL(inc.prev_apr_amt, 0) + ISNULL(inc.prev_may_amt, 0) + ISNULL(inc.prev_jun_amt, 0) +
       ISNULL(inc.prev_jul_amt, 0) + ISNULL(inc.prev_aug_amt, 0) + ISNULL(inc.prev_sep_amt, 0) +
       ISNULL(inc.prev_oct_amt, 0) + ISNULL(inc.prev_nov_amt, 0) + ISNULL(inc.prev_dec_amt, 0) +
       ISNULL(inc.this_jan_amt, 0) + ISNULL(inc.this_feb_amt, 0) + ISNULL(inc.this_mar_amt, 0) AS tot_yr_amt
  FROM (SELECT ISNULL(this_yr.year_nbr, prev_yr.year_nbr + 1) AS year_nbr
               ,ISNULL(this_yr.snz_uid, prev_yr.snz_uid) AS snz_uid
               ,ISNULL(this_yr.snz_ird_uid, prev_yr.snz_ird_uid) AS snz_ird_uid
               ,ISNULL(this_yr.snz_employer_ird_uid, prev_yr.snz_employer_ird_uid) AS snz_employer_ird_uid
               ,ISNULL(this_yr.income_source_code, prev_yr.income_source_code) AS income_source_code
               ,ISNULL(this_yr.withholding_type_code, prev_yr.withholding_type_code) AS withholding_type_code
               ,ISNULL(prev_yr.mth_04_amt, 0) AS prev_apr_amt
               ,ISNULL(prev_yr.mth_05_amt, 0) AS prev_may_amt
               ,ISNULL(prev_yr.mth_06_amt, 0) AS prev_jun_amt
               ,ISNULL(prev_yr.mth_07_amt, 0) AS prev_jul_amt
               ,ISNULL(prev_yr.mth_08_amt, 0) AS prev_aug_amt
               ,ISNULL(prev_yr.mth_09_amt, 0) AS prev_sep_amt
               ,ISNULL(prev_yr.mth_10_amt, 0) AS prev_oct_amt
               ,ISNULL(prev_yr.mth_11_amt, 0) AS prev_nov_amt
               ,ISNULL(prev_yr.mth_12_amt, 0) AS prev_dec_amt
               ,ISNULL(this_yr.mth_01_amt, 0) AS this_jan_amt
               ,ISNULL(this_yr.mth_02_amt, 0) AS this_feb_amt
               ,ISNULL(this_yr.mth_03_amt, 0) AS this_mar_amt
          FROM #IncomeCalYr this_yr
          FULL JOIN #IncomeCalYr prev_yr
               ON prev_yr.year_nbr = this_yr.year_nbr - 1
              AND prev_yr.snz_uid = this_yr.snz_uid
              AND prev_yr.snz_ird_uid = this_yr.snz_ird_uid
              AND prev_yr.snz_employer_ird_uid = this_yr.snz_employer_ird_uid
              AND prev_yr.income_source_code = this_yr.income_source_code
              AND prev_yr.withholding_type_code = this_yr.withholding_type_code
        ) inc
 WHERE (0 <> ISNULL(inc.prev_apr_amt, 0) + ISNULL(inc.prev_may_amt, 0) + ISNULL(inc.prev_jun_amt, 0) +
           ISNULL(inc.prev_jul_amt, 0) + ISNULL(inc.prev_aug_amt, 0) + ISNULL(inc.prev_sep_amt, 0) +
           ISNULL(inc.prev_oct_amt, 0) + ISNULL(inc.prev_nov_amt, 0) + ISNULL(inc.prev_dec_amt, 0) +
           ISNULL(inc.this_jan_amt, 0) + ISNULL(inc.this_feb_amt, 0) + ISNULL(inc.this_mar_amt, 0))
   AND inc.year_nbr BETWEEN 2000 AND year(GETDATE()); 

RAISERROR('Inserted %d rows converting cal_yr to tax_yr', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
This script adds self employment data from tables other than EMS into #IncomeTaxYr table
*/

/*
Add self employed income from IR3 to tax_yr table - Sole trader
 Adds the (S00) - Sole Trader income from IR3
 Rules: (A) IR3 net profit amount <> 0
*/
INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT YEAR(ir3.return_period_date) AS year_nbr
       ,MIN(ir3.snz_uid) AS snz_uid
       ,MIN(ir3.snz_ird_uid) AS snz_ird_uid
       ,MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid
       ,'S00-IR3' AS income_source_code
       ,'X' AS withholding_type_code
       ,SUM(ir3.net_profit_amt) AS tot_yr_amt
  FROM #CombinedIR3 ir3
 WHERE ir3.net_profit_amt <> 0
   AND YEAR(ir3.return_period_date) BETWEEN 2000 AND year(GETDATE())
 GROUP BY YEAR(ir3.return_period_date), ir3.snz_ird_uid;

RAISERROR('Inserted %d rows for S00-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
 Add self employed income from IR4 to tax_yr table
 Adds the (C00) - Company director/shareholder income from IR4S
 Rules: (A) IR■■■■■■■■ exists
 */

INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT YEAR(ir4.ir_ir4_return_period_date) AS year_nbr
       ,MIN(ir4.snz_uid) AS snz_uid
       ,MIN(ir4.snz_ird_uid) AS snz_ird_uid
       ,MIN(ir4.snz_employer_ird_uid) AS snz_employer_ird_uid
       ,'C00-IR4' AS income_source_code
       ,'X' AS withholding_type_code
       ,SUM(ir4.ir_ir4_tot_sholder_sal_809_amt) AS tot_yr_amt
  FROM $(idicleanversion).ir_clean.ird_attachments_ir4s ir4
 WHERE EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = ir4.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               )
   AND YEAR(ir4.ir_ir4_return_period_date) BETWEEN 2000 AND year(GETDATE()) 
   AND ir4.ir_ir4_income_imp_ind <> 'Y'  /* Exclude the 'imputed' incomes as they be wrong */
   AND ir4.ir_ir4_tot_sholder_sal_809_amt <> 0
 GROUP BY YEAR(ir4.ir_ir4_return_period_date), ir4.snz_ird_uid, ir4.snz_employer_ird_uid;

RAISERROR('Inserted %d rows for C00-IR4', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Add self employed income from IR20 to tax_yr table
 Adds the (P00) - Partnership income from I20
 Rules: (A) IR20 tot share income <> 0
*/

INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT YEAR(ir20.ir_ir20_return_period_date) AS year_nbr,
       MIN(ir20.snz_uid) AS snz_uid,
       MIN(ir20.snz_ird_uid) AS snz_ird_uid,
       MIN(ir20.snz_employer_ird_uid) AS snz_employer_ird_uid,
       'P00-IR7' AS income_source_code,
       'X' AS withholding_type_code,
       SUM(ir20.ir_ir20_tot_share_of_inc_865_amt) AS tot_yr_amt
  FROM $(idicleanversion).ir_clean.ird_attachments_ir20 ir20
 WHERE ir20.ir_ir20_tot_share_of_inc_865_amt <> 0
   AND EXISTS ( SELECT 1
                  FROM $(idicleanversion).ir_clean.ird_customers cus
                 WHERE cus.snz_ird_uid = ir20.snz_ird_uid
                   AND cus.ir_cus_entity_type_code = 'I'
               )
   AND EXISTS ( SELECT 1
                  FROM #CombinedIR3 ir3
                 WHERE ir3.snz_ird_uid = ir20.snz_ird_uid
                   AND ir3.return_period_date = ir20.ir_ir20_return_period_date
                   AND ir3.tot_pship_income_amt <> 0
               )
 AND YEAR(ir20.ir_ir20_return_period_date) BETWEEN 2000 AND year(GETDATE()) 
 GROUP BY YEAR(ir20.ir_ir20_return_period_date), ir20.snz_ird_uid, ir20.snz_employer_ird_uid;

RAISERROR('Inserted %d rows for P00-IR7', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Add self employed income from IR3 to tax_yr table - witholding payments
 Adds the (WHP-IR3) - Other income from IR3
 Rules: (A) IR3 tot_wholding_paymnts_amt <> 0
*/

INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
   withholding_type_code, tot_yr_amt
)
SELECT YEAR(ir3.return_period_date) AS year_nbr,
       MIN(ir3.snz_uid) AS snz_uid,
       MIN(ir3.snz_ird_uid) AS snz_ird_uid,
       MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid,
       'WHP-IR3' AS income_source_code,
       'W' AS withholding_type_code,
       SUM(ir3.tot_wholding_paymnts_amt) AS tot_yr_amt
  FROM #CombinedIR3 ir3
 WHERE ir3.tot_wholding_paymnts_amt <> 0
   AND YEAR(ir3.return_period_date) BETWEEN 2000 AND year(GETDATE())
 GROUP BY YEAR(ir3.return_period_date), ir3.snz_ird_uid;

RAISERROR('Inserted %d rows for WHP-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
 Deleting wages and salary from #IncomeTaxYr table as it is not self employment. All remaining data is self employment
*/

DELETE FROM #IncomeTaxYr 
  FROM #IncomeTaxYr pop
 WHERE pop.income_source_code IN ('W&S-EMS')

RAISERROR('Deleted %d ■■■■■■■■■■■, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Update the table to aggregate income sources from #IncomeTaxYr table
We don't care about the employer or withholding type code
So create a final table that aggregates on these
Note: data stored as tax year mth_01_amt = April and so on
*/

DROP TABLE IF EXISTS #AggregateIRIncome; 
CREATE TABLE #AggregateIRIncome (
    year_nbr SMALLINT NOT NULL,
    snz_uid INT NOT NULL,
    income_source_code CHAR(7) NOT NULL
	,mth_01_amt DECIMAL(19,2) NOT NULL
    ,mth_02_amt DECIMAL(19,2) NOT NULL
    ,mth_03_amt DECIMAL(19,2) NOT NULL
    ,mth_04_amt DECIMAL(19,2) NOT NULL
    ,mth_05_amt DECIMAL(19,2) NOT NULL
    ,mth_06_amt DECIMAL(19,2) NOT NULL
    ,mth_07_amt DECIMAL(19,2) NOT NULL
    ,mth_08_amt DECIMAL(19,2) NOT NULL
    ,mth_09_amt DECIMAL(19,2) NOT NULL
    ,mth_10_amt DECIMAL(19,2) NOT NULL
    ,mth_11_amt DECIMAL(19,2) NOT NULL
    ,mth_12_amt DECIMAL(19,2) NOT NULL
    ,tot_yr_amt DECIMAL(19,2) NOT NULL
);

INSERT INTO #AggregateIRIncome WITH (TABLOCK) (
    year_nbr, snz_uid, income_source_code, 
	mth_01_amt, mth_02_amt, mth_03_amt, mth_04_amt, mth_05_amt, mth_06_amt,
    mth_07_amt, mth_08_amt, mth_09_amt, mth_10_amt, mth_11_amt, mth_12_amt, 
	tot_yr_amt
)
SELECT year_nbr,
       MIN(snz_uid) AS snz_uid,
       income_source_code,
	   sum(ISNULL(mth_01_amt,0)), 
	   sum(ISNULL(mth_02_amt,0)), 
	   sum(ISNULL(mth_03_amt,0)), 
	   sum(ISNULL(mth_04_amt,0)), 
	   sum(ISNULL(mth_05_amt,0)), 
	   sum(ISNULL(mth_06_amt,0)),
	   sum(ISNULL(mth_07_amt,0)), 
	   sum(ISNULL(mth_08_amt,0)), 
	   sum(ISNULL(mth_09_amt,0)), 
	   sum(ISNULL(mth_10_amt,0)), 
	   sum(ISNULL(mth_11_amt,0)), 
	   sum(ISNULL(mth_12_amt,0)),
       SUM(ISNULL(tot_yr_amt,0)) AS tot_yr_amt
  FROM #IncomeTaxYr
  WHERE snz_uid IS NOT NULL
 GROUP BY year_nbr, snz_uid, income_source_code;
 
RAISERROR('Inserted %d rows when aggregating', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Delete double counting of WHP income from EMS when it exists in IR3.
*/

DELETE FROM #AggregateIRIncome
  FROM #AggregateIRIncome pop
 WHERE EXISTS (SELECT 1
                 FROM #AggregateIRIncome pop2
				WHERE pop.snz_uid = pop2.snz_uid
				  AND pop.year_nbr = pop2.year_nbr
				  AND pop2.income_source_code = 'WHP-IR3')
       AND pop.income_source_code IN ('WHP-EMS', 'C02-EMS', 'P02-EMS', 'S02-EMS')

RAISERROR('Deleted %d ■■■■■■■■■■■, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Create a table using #AggregateIRIncome
Data converted from tax year to calendar year format mth_01_amt = January and so on
*/

DROP TABLE IF EXISTS $(targetschema).$(projprefix)_self_employment; 
CREATE TABLE $(targetschema).$(projprefix)_self_employment (
	 snz_uid INT NOT NULL
	,period_start_date DATE NOT NULL
	,period_end_date DATE NOT NULL
	,source_frequency CHAR(7) NOT NULL
	,income_source CHAR(7) NOT NULL
	,taxable_income_indicator TINYINT NOT NULL 
	,gross_income DECIMAL(19,2) NULL
	);

WITH table_unpivot as (
SELECT 
[snz_uid]
,month_nbr
,year_nbr
,CASE WHEN income_source_code LIKE ('%EMS') THEN 'monthly' /* EMS data is monthly format */
	WHEN income_source_code NOT LIKE ('%EMS') THEN 'annual'  /* IR3, IR4, IR7, IR20 data is tax year format */
	END as source_frequency
,income_source_code as income_source
,1 as taxable_income_indicator
,gross_income
FROM (
select 
	snz_uid
	,year_nbr
	,income_source_code
	,mth_01_amt, mth_02_amt, mth_03_amt, mth_04_amt, mth_05_amt, mth_06_amt,mth_07_amt, mth_08_amt, mth_09_amt, mth_10_amt, mth_11_amt, mth_12_amt,tot_yr_amt
FROM #AggregateIRIncome
  ) source_table
  UNPIVOT
	(gross_income FOR month_nbr IN
		(mth_01_amt, mth_02_amt, mth_03_amt, mth_04_amt, mth_05_amt, mth_06_amt,mth_07_amt, mth_08_amt, mth_09_amt, mth_10_amt, mth_11_amt, mth_12_amt,tot_yr_amt)
	) AS unpvt
)
,table_concat as (
SELECT *
FROM table_unpivot t1
WHERE CONCAT(month_nbr,source_frequency) != 'tot_yr_amtmonthly' /* excluding rows that are just the sum of EMS monthly amounts */
)
,table_months as (
select 
	tx.*
	,CASE WHEN month_nbr = 'mth_01_amt' THEN '04'
		WHEN month_nbr = 'mth_02_amt' THEN '05'
		WHEN month_nbr = 'mth_03_amt' THEN '06'
		WHEN month_nbr = 'mth_04_amt' THEN '07'
		WHEN month_nbr = 'mth_05_amt' THEN '08'
		WHEN month_nbr = 'mth_06_amt' THEN '09'
		WHEN month_nbr = 'mth_07_amt' THEN '10'
		WHEN month_nbr = 'mth_08_amt' THEN '11'
		WHEN month_nbr = 'mth_09_amt' THEN '12'
		WHEN month_nbr = 'mth_10_amt' THEN '01'
		WHEN month_nbr = 'mth_11_amt' THEN '02'
		WHEN month_nbr = 'mth_12_amt' THEN '03'
		WHEN month_nbr = 'tot_yr_amt' THEN '03' /* always annual. so start date will always be 1-Apr prev year */
	END as month_cal_yr
	FROM table_concat tx
)
,table_calyr as (
select 
	t2.*
	,CASE WHEN month_cal_yr in (04,05,06,07,08,09,10,11,12) THEN year_nbr-1 /* previous year number */
		WHEN month_cal_yr in (01,02,03) THEN year_nbr /* current year number */
	END as year_cal_yr
	FROM table_months t2
)
 INSERT INTO $(targetschema).$(projprefix)_self_employment 

select snz_uid
,CASE WHEN month_nbr = 'tot_yr_amt' THEN DATEFROMPARTS(year_cal_yr-1, month_cal_yr+1, 01) /* start date is 1 Apr previous year */
	WHEN month_nbr != 'tot_yr_amt' THEN DATEFROMPARTS(year_cal_yr, month_cal_yr, 01) /* start date is start of month */
END as 'period_start_date'
,EOMONTH(DATEFROMPARTS(year_cal_yr, month_cal_yr, 01)) as 'period_end_date' /* end date is end of month */
,source_frequency
,income_source
,taxable_income_indicator
,gross_income
FROM table_calyr
WHERE gross_income <> 0;

RAISERROR('Created table in Sandpit with %d ■■■■■■■■■■■, @@ROWCOUNT) WITH NOWAIT;
GO

ALTER TABLE $(targetdb).$(targetschema).$(projprefix)_self_employment
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);

DROP TABLE IF EXISTS #CombinedIR3;
DROP TABLE IF EXISTS #IncomeCalYr;
DROP TABLE IF EXISTS #tmp_sei_individual;
DROP TABLE IF EXISTS #IncomeTaxYr;
DROP TABLE IF EXISTS #AggregateIRIncome