Module output
SQL::[IDI_Community].[cm_read_income_t1_wages_salary].income_t1_wages_salary_YYYYMM
SAS: libname cmti ODBC dsn=idi_community_srvprd schema=cm_read_income_t1_wages_salary; proc print data = cmti.income_t1_wages_salary_YYYYMM
How to access a code module in the Data Lab : Read here
Purpose of Wages and Salary module
The purpose of this module is to construct spells of income from wages and salary for people in employed in New Zealand.
Note: Wages and Salary are paid by NZ registered employers to its employees. This does not imply the employees are physically in New Zealand.
Key concepts
Wages and salary income
Wages and salary income is income received from all current and previous wage and salary jobs held over the reference period, and any job-related bonuses, commissions, redundancies or other taxable income such as honoraria or directors fees.
Total income spell
An income spell is the minimum payment period per income type. Wages and salary in practice may be paid in a daily, weekly, fortnightly, monthly spell etc. Employers do not need to file if they do not pay their employees during their regular pay cycle. See Payday filing for more information on payday filing.
Source frequency
Source frequency is the frequency with which an income type is stored in the source data. Source frequency for wages and salary in the ir_ems table is monthly.
Gross income amount
Gross income amount is the taxable income, before tax, that a person earns.
Observed tax
Tax recorded as paid to IRD as part of the income filing process. Observed tax data for wages and salary is available in the IR EMS table as PAYE. This is the amount of tax that has been has been deducted from source (eg PAYE and tax filing).
Note: observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. This information has not been included in the initial version of the module.
Granularity
The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event in the above table:
Entity id: snz uid
Period: start end, end date
Event information: income source, taxable income indicator
Data sources
Employer monthly schedule: ir_clean.ird_ems
Employer monthly schedule (EMS) table contains employee level data from the EMS return. EMS is available in the LBD/IDI. This is a monthly tax return filed by all businesses with paid employees, which summarises the monthly wage and salary payments made to each of their employees, and the ‘pay-as-you-earn’ (PAYE) income tax deductions made. EMS is a mandatory reporting requirement for all employing firms.
Since April 2019, Inland Revenue has shifted from monthly filings under the EMS to compulsory payday filing under the Employment Information - Employee (EI-E) system. This follows a transition period since April 2018 in which firms could select whether to file monthly or through the payday filing system. For consistency, Stats NZ has made a processed version of the EI-E data available, allocating payday information to calendar months and including these records as part of the EMS tables.
Note:
-
Employers are able to file late returns and/or amend EMS returns relating to prior periods.
-
The return period associated with an EMS is not always the same as the month in which a person was employed because it records the month in which they were paid.
-
Regular earnings of some self-employed persons can be reported in the EMS as wages or salaries. The earnings of independent contractors in the EMS are known as withholding or schedular payments. (These are excluded from this module)
-
Each record in the EMS corresponds to a job (an employer-employee relationship) and includes the employee’s tax code and employment start or end dates if they are in the month in which they were paid.
-
The EMS form is also used by government agencies to report government transfer payments to individuals that are taxed at source. They include income-tested benefits, New Zealand superannuation, student allowances, paid parental leave, and accident compensation payments.
-
No ‘cleaning’ is done before the data leaves IR in the EIE file. The data is as it is submitted by employers and/or drawn directly from IRs admin system. If errors in the information provided by employers is identified (either by IR or the employer), then the underlying record will be updated, The next file will show this change. All data points in the files supplied by IR are either drawn directly from a return/other ‘file’ a customer has submitted, or it is system generated. From the EIE file for example, information about the type of return, processing date, return version, customer type/subtype etc is system generated. Information about pay periods, employment periods, earnings, deductions, and contributions come directly from the return.
-
Supplementary tables used to identify and exclude income that is not classified as wages and salary
IR3: ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014, ir_ir3_2013_to_2020
IR3 is the income tax return for individuals used to confirm the amount of personal income tax to be paid at the end of each tax year with data available from March 2000 to present. The main purpose of IR3 is to include any monetary payments that have not been taxed at source. IR3 is used for self-employment (filed annually by sole traders) and includes non-zero partnership, or shareholder salary income, as well as rental income. IR3 is mandatory for individuals who earned more than $200 (before tax) income other than salary, wages, interest, dividends, and/or taxable Māori authority distributions. IR3 is filed annually.
There may be a substantial lag in reporting IR3 data as individual businesses and those using tax agents have up to a year after the end of the financial year to file an annual tax return with IR.
IR3 data comes from a different table to the EIE data but is still direct information submitted by a customer
Note: 3 IR2 tables (2 ad-hoc tables and 1 clean) are required for creating the timeseries of IR3 data. IR3s belonging to people who reported no self-employment, partnership or shareholder income were excluded for tax years prior to 2019. They are in the ad-hoc tables.
IR4: ird_attachments_ir4s
IR4 is filed by companies and includes remuneration income paid to shareholders, directors, and relatives of shareholders with data available from March 1995 to present. This table holds information on non-zero shareholder income and shows non-PAYE salary payments made to working proprietors of businesses. IR4 and IR4s is required from all active NZ resident companies. IR4S is filed annually.
IR4 data comes from a different table to the EIE data but is still direct information submitted by a customer
IR20: ird_attachments_ir20
IR7 (formerly IR20) is filed for partnership and look-through companies with data available from March 1995 to present. It is filed annually. This table contains information about zero and non-zero partnership income
ird_customers
The IR customers file records all Inland Revenue’s entities, their description, status (active, deceased, etc), commencement/cessation, student loan/tax residency status and business industry code where relevant.
A full file is extracted each month containing all new records and changes to existing records.
The information in this file is not derived from an IR form. It is a combination of different fields that are updated when this specific customer information changes. IR can be notified by post, in person, by phone, email, myIR, secure mail.
The population is all entities that engage with IR. At the highest level it can be:
- a person
- a company
- an incorporated or unincorporated society or club
- a joint venture or partnership
- a trustee or a trust or estate
- a public or local authority.
ird_tax_registrations
The IR tax registration file records the status of an entities tax type registration with IR. It contains start/end dates, tax return filing frequencies, if the entity has a tax agent, if they are a corporate entity and if they file electronically.
The population is all entities that engage with IR. At the highest level it can be:
- a person
- a company
- an incorporated or unincorporated society or club
- a joint venture or partnership
- a trustee or a trust or estate
- a public or local authority.
ird_cross_reference
The IR cross reference file links two IRD numbers, documents the period they are linked and describes the relationship between them.
The information in this file is not derived from an IR form. It is a combination of different fields that are updated when relationships between entities change. A change can be submitted by post, in person, by phone, email, myIR, secure mail.
The population is individual to business and business to business relationships between IR entities
Practical notes
Period
Wages and Salary information is from 1999 onwards.
Identification of Earnings not liable to ACC levy and bonuses
All employees must pay an ACC earners’ levy to cover the cost of non work-related injuries. IR collect this on behalf of the ACC. The earners’ levy has been built into the PAYE tables and is deducted along with the tax. Almost all earnings subject to PAYE are liable for the levy with the exception of schedular payments, retirement payments, redundancy payments, benefits from an employee share scheme, jury fees, witness fees, and taxable pensions. See IR employers guide for more information. Inclusion of earnings not liable for ACC levy is not essential for this module. Additionally, we will not identify bonus in this module as that is not easy to identify in IDI due to quality of the lumpsum payment flag, as not all employers update it consistently.
Other exclusions from this module
We are not able to identify and quantify income support payment off-sets, debt repayments and re-directions.
Negative amounts
No exclusions made from EMS for negative amounts.
Seasonal workers identification
Not applicable total income module. Refer to employment spells module for more information.
Staged approach to include tax within the total income module
Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.
Sources of validation
Wage and salary distributions for individuals.xls (Wage and salary statistics datasets)
The module output matches well with the validation dataset
References
- IDI Derived tables https://vhin.co.nz/wp-content/uploads/2017/04/data-dictionary-for-ir-derived-tables-in-the-idi.docx
- Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf Experimental estimates of income from linked administrative data | Stats NZ
- Payday filing
- Tax codes for individuals (ird.govt.nz) About tax codes
- Register as an employer (ird.govt.nz) Register as an employer
- Tax schedular payments and file employment information (ird.govt.nz) (Tax schedular payments and file employment information)
Key Contacts
Role | Organisation | Person |
---|---|---|
Dev lead | MBIE | Eric Krassoi Peach |
Module Coder | Nicholson Consulting | Linda Martis |
Code Supply | Stats NZ MSD Independent contractor |
Ivan Welsh Marc de Boer Keith McLeod |
Peer review (code) | Stats NZ | Rodney Jer |
Peer review (documentation) | IRD IRD |
Joanne Butterfield Scott Anderson |
Module Steward | TBD | TBD |
Module business rules
Defining spell start and end dates
Start and end dates are based on EMS table. The return period column in EMS is considered to be the end date and the start of that month will be the start date.
Person identifier snz_uid
A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.
Minimum income threshold
Currently no minimum threshold applied.
Interoperability with other modules
This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.
Taxable income indicator
Taxable income indicator to be hard coded in modules. Within wages and salary module, all income is taxable.
Code logic
The code is based on the rules used for APC and derived data tables in the IDI.
Steps 1 to 3 are mainly collating data from various IDI source tables. Rest of the steps are identifying wages and salary and excluding self-employment paye deducted income.
-
We use ir_ems table to obtain wages and salary data with the intention of identifying wages and salary income and excluding self-employed income , i.e, sole traders/partners/directors earning PAYE deducted income.
1.1 Extract all monthly data from ir_ems that have ir_ems_income_source_code ‘W&S’, which stands for wages and salary, and where return period between 2000 and 2024 (latest year). Period starts at 2000 as first records of the EMS table begin in April 1999.
1.2 Withholding type code will be P (paye deductions) or W (withholding payments). Where an individual has both codes for a given period, we select P. -
Identify self-employed individuals from 3 IR3 tables, particularly sole traders earning PAYE income, in order to exclude them from ir_ems data. To do so, we will
2.1 Extract data from IR3 tables, specifically ir_clean.ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014 and ir_ir3_2013_to_2020 from IDI_Adhoc. Table prioritisation will occur with ir_clean.ird_rtns_keypoints_ir3 being the first source of data, followed by adhoc 2000_to_2014 and adhoc 2013_to_2020.
2.2 We will use data from net profit column.
2.3 Remove duplicates in tables by using maximum of location number, return_version_number and ird_timestamp_date. -
Identify self-employed individuals from ird cross reference, ir20 and ir4s tables, particularly partners and company directors, in order to exclude them from ir_ems data. To do so, we will
3.1 Extract data from ird_cross_reference where ir_xrf_reference_type_code (code for shareholder/partner etc) is PTR (partner), DIR, SHR or EOH (company dorector), and customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation).
3.2 Extract data from ird_attachments_ir20 for those individuals whose customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). (These individuals are partners).
3.3 Extract data from ird_attachments_ir4s for those individuals whose customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation). (These individuals are company directors). -
Exclude sole traders earning PAYE deducted income from ir_ems data where all the below conditions are met
4.1 income source code is W&S and withholding type code is ‘P’ in ir_ems data (implying PAYE deductions), and
4.2 snz_ird_uid is equal to snz_employer_ird_uid (implying payee equal to payer), and
4.3 customer entity type code from ird_customers is ‘I’ (implying payee is an individual, not an organisation), and
4.4 Either
4.4.1 Tax registration type code is GST in ird_tax_registration table, within the time peroid of interest, or
4.4.2 Net profit exists within IR3 data extracted above in Step 2 within the time period of interest -
Exclude company directors and partners earning PAYE deducted income from ir_ems data where all the below conditions are met
5.1 Individuals identified in step 3 exist in the ir_ems data (matched using unique identifiers and unique employer identifiers within the time period of interest, and
5.2 Payer not equal to payee, i.e., the individuals do not have the same unique identifiers (snz_uid) and unique employer identifiers (snz_employer_uid), and
5.3 withholding type code is ‘P’ in ir_ems data (implying PAYE deductions) and income source code is W&S in ir_ems_data (implying wages and salary) -
The remaining individuals from the ir_ems tables will be those who earn wages and salary.
-
Aggregate the ir_ems data by year, snz_uid and income source code and sum up the monthly amounts. These monthly amounts will be the wages and salary amounts.
Parameters
The following parameters should be supplied to this module to run it in the database:
- {targetdb}: The SQL database on which the spell datasets are to be created.
- {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
- {targetschema}: The project schema under the target database into which the spell datasets are to be created.
- {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.
Datasets dependencies:
[IDI_Clean_YYYYMM].[ir_clean].[ird_ems]
[IDI_Clean_YYYYMM].[ir_clean].[ird_rtns_keypoints_ir3]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir4s]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir20]
[IDI_Clean_YYYYMM].[ir_clean].[ird_customers]
[IDI_Clean_YYYYMM].[ir_clean].[ird_tax_registrations]
[IDI_Clean_YYYYMM].[ir_clean].[ird_cross_reference]
Outputs
{targetdb}.{targetschema}.{projprefix}_wages_salary_{refresh}
Variable Descriptions
Aspect | Variables | Description |
---|---|---|
Entity | snz_uid | Current refresh snz_uid |
Period | period_start_date period_end_date |
Payment start date Payment end date |
Event | source_frequency | Frequency of income recorded in source data Wages and Salary is recorded in EMS as monthly income |
Information | income_source | Type of income being derived in this module W&S-EMS means wages and salary from EMS table |
taxable_income_indicator | Indicator for whether income is taxable 1:income is taxable, 0:income is not taxable Wages and salary are taxable |
|
gross_income | Income before tax |
Module Version & Change History
Date | Version | Comments |
---|---|---|
2023-07 | Original code sourced from APC code created by Stats NZ (Ivan Welsh) in the IDI | |
2024-06 | Initial | streamlining and preparation for code module by Linda Martis, Nicholson Consulting |
Code
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
/* Assign the target database to which all the components need to be created in. */
use $(targetdb);
/*
This script combines a number of different source of IR3 data into a single table.
Sources:IR3 table, adhoc 2000_to_2014, adhoc 2013_to_2020
Duplicates are removed from the data by filtering for the maximum return_version_nbr,
ird_timestamp_date, source_code, snz_unique_nbr and finally location_nbr.
Create the storage table. This follows the format of the refresh IR3 table,
but without the leading ir_ir3_ prefix.
*/
DROP TABLE IF EXISTS #CombinedIR3;
CREATE TABLE #CombinedIR3 (
snz_uid INT NOT NULL
,snz_ird_uid INT NOT NULL
,location_nbr INT NOT NULL
,return_period_date DATE NOT NULL
,snz_unique_nbr BIGINT NOT NULL
,return_version_nbr SMALLINT NULL
,net_profit_amt DECIMAL(13,2) NULL
,income_imp_ind VARCHAR(1) NOT NULL
,tot_wholding_paymnts_amt DECIMAL(13,2) NULL
,sum_all_income_amt DECIMAL(13,2) NULL
,ird_timestamp_date DATE NULL
/* 1 for refresh IR3 table, 2 for adhoc 2000_to_2014, 3 for adhoc 2013_to_2020 */
,source_table_code TINYINT NOT NULL
,PRIMARY KEY (snz_ird_uid, return_period_date)
);
/* Combine the source tables into one */
INSERT INTO #CombinedIR3
SELECT con.snz_uid, con.snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr,
ir_ir3_net_profit_amt,
ir_ir3_income_imp_ind,
ir_ir3_tot_wholding_paymnts_amt,
isnull(ir_ir3_net_profit_amt,0)+
isnull(ir_ir3_tot_wholding_paymnts_amt,0)
AS sum_all_income_amt,
ir_ir3_ird_timestamp_date, source_table_code
FROM(
SELECT *, ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, ir_ir3_return_period_date ORDER BY ir_ir3_return_version_nbr desc, ir_ir3_ird_timestamp_date desc, source_table_code asc, ir_ir3_snz_unique_nbr desc, ir_ir3_location_nbr desc) AS row_nbr
FROM (
SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr,
ir_ir3_net_profit_amt,
ir_ir3_income_imp_ind,
ir_ir3_tot_wholding_paymnts_amt,
ir_ir3_ird_timestamp_date, 1 AS source_table_code
FROM $(idicleanversion).ir_clean.ird_rtns_keypoints_ir3
UNION ALL
SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr,
ir_ir3_net_profit_amt,
ir_ir3_income_imp_ind,
ir_ir3_tot_wholding_paymnts_amt,
ir_ir3_ird_timestamp_date, 2 AS source_table_code
FROM IDI_Adhoc.clean_read_IR.ir_ir3_2000_to_2014
UNION ALL
SELECT snz_ird_uid, location_number AS ir_ir3_location_nbr, period AS ir_ir3_return_period_date,
ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, period ORDER BY return_version, selfemployed_income desc, withholding_payment desc) AS ir_ir3_snz_unique_nbr,
return_version AS ir_ir3_return_version_nbr,
selfemployed_income AS ir_ir3_net_profit_amt,
'N' AS ir_ir3_income_imp_ind,
withholding_payment AS ir_ir3_tot_wholding_paymnts_amt,
CAST(timestamp AS DATE) AS ir_ir3_ird_timestamp_date, 3 AS source_table_code
FROM IDI_Adhoc.clean_read_IR.ir_ir3_2013_to_2020
) tu
) t5
INNER JOIN $(idicleanversion).security.concordance con
ON con.snz_ird_uid = t5.snz_ird_uid
WHERE row_nbr = 1
AND YEAR(ir_ir3_return_period_date) BETWEEN 2000 AND year(GETDATE());
RAISERROR('Inserted %d rows from IR3 into #CombinedIR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
/*
This script brings EMS data for specific income sources into one table.
Data is stored in calendar year format mth_01_amt = January and so on
*/
/* Create the equivalent of the data.income_cal_yr table */
DROP TABLE IF EXISTS #IncomeCalYr;
CREATE TABLE #IncomeCalYr (
year_nbr SMALLINT NOT NULL
,return_period DATE NOT NULL
,snz_uid INT NOT NULL
,snz_ird_uid INT NOT NULL
,snz_employer_ird_uid INT NOT NULL
,income_source_code CHAR(7) NOT NULL
,withholding_type_code CHAR(1) NOT NULL
,gross_income DECIMAL(19,2) NOT NULL
,PRIMARY KEY (snz_ird_uid, return_period, snz_employer_ird_uid, income_source_code, withholding_type_code)
);
/*
Populate the income cal with EMS data
*/
INSERT INTO #IncomeCalYr WITH (TABLOCK) (
year_nbr, return_period, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code, withholding_type_code, gross_income
)
SELECT YEAR(ir_ems_return_period_date) AS year_nbr
, EOMONTH(ir_ems_return_period_date) AS return_period
,MAX(snz_uid) AS snz_uid
,MAX(snz_ird_uid) AS snz_ird_uid
,snz_employer_ird_uid
,ir_ems_income_source_code + '-EMS' AS income_source_code
,MIN(ir_ems_withholding_type_code) AS withholding_type_code
,SUM(ir_ems_gross_earnings_amt) as gross_income
FROM $(idicleanversion).ir_clean.ird_ems
WHERE YEAR(ir_ems_return_period_date) BETWEEN 2000 - 1 AND year(GETDATE())
AND snz_uid > 0
AND ir_ems_income_source_code in ('W&S')
GROUP BY YEAR(ir_ems_return_period_date)
,EOMONTH(ir_ems_return_period_date)
,snz_ird_uid
,snz_employer_ird_uid
,ir_ems_income_source_code
RAISERROR('Inserted %d rows from EMS into #IncomeCalYr', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
/* Update W&S income when it is actually self employed income
This step adds the (S01) - Sole Trader recieving PAYE deducted income source
Rules: (A) W&S income
(B) PAYE
(C) Payee = Payer
(D) Paye is an individual
(E) Evidence of self employed: IR3 net profit <> 0 or GST registered
*/
UPDATE inc WITH (TABLOCK)
SET income_source_code = 'S01-EMS'
FROM #IncomeCalYr inc
WHERE inc.income_source_code = 'W&S-EMS' /* Wages & Salaries */
AND inc.withholding_type_code = 'P' /* PAYE */
AND inc.snz_ird_uid = inc.snz_employer_ird_uid /* payEE = payER */
AND EXISTS ( SELECT 1
FROM $(idicleanversion).ir_clean.ird_customers cus
WHERE cus.snz_ird_uid = inc.snz_ird_uid
AND cus.ir_cus_entity_type_code = 'I') /* payEE is an individual */
AND (EXISTS (SELECT 1
FROM $(idicleanversion).ir_clean.ird_tax_registrations reg /* GST registered */
WHERE reg.snz_ird_uid = inc.snz_ird_uid
AND reg.ir_treg_tax_type_code = 'GST'
AND inc.year_nbr BETWEEN YEAR(reg.ir_treg_treg_start_date)
AND YEAR(reg.ir_treg_treg_end_date)
)
OR EXISTS ( SELECT 1
FROM #CombinedIR3 ir3
WHERE ir3.snz_ird_uid = inc.snz_ird_uid
AND ir3.net_profit_amt <> 0 /* net profit non zero */
AND inc.year_nbr BETWEEN YEAR(ir3.return_period_date) - 1
AND YEAR(ir3.return_period_date)
)
)
RAISERROR('Updated %d rows from W&S to S01', 0, 1, @@ROWCOUNT) WITH NOWAIT;
/*
Build temp table to hold self employed income details from
cross reference, IR20 & IR4
*/
DROP TABLE IF EXISTS #tmp_sei_individual;
CREATE TABLE #tmp_sei_individual (
snz_ird_uid INT NOT NULL,
snz_employer_ird_uid INT NOT NULL,
start_cal_year_nbr INT NOT NULL,
end_cal_year_nbr INT NOT NULL,
sei_type_code CHAR(1) NOT NULL
);
INSERT INTO #tmp_sei_individual (
snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr, sei_type_code
)
/* Relationships in cross ref table */
SELECT cln.ir_xrf_to_snz_ird_uid AS snz_ird_uid,
cln.ir_xrf_from_snz_ird_uid AS snz_employer_ird_uid,
cln.ir_xrf_first_year_nbr AS start_cal_year_nbr,
COALESCE(cln.ir_xrf_latest_year_nbr, YEAR(cln.ir_xrf_applied_date)) AS end_cal_year_nbr,
CASE
WHEN cln.ir_xrf_reference_type_code = 'PTR' THEN 'P'
WHEN cln.ir_xrf_reference_type_code IN ('DIR', 'SHR', 'EOH') THEN 'C'
ELSE 'U'
END
FROM $(idicleanversion).ir_clean.ird_cross_reference cln
WHERE cln.ir_xrf_reference_type_code IN ('PTR', 'DIR', 'SHR', 'EOH')
AND EXISTS ( SELECT 1
FROM $(idicleanversion).ir_clean.ird_customers cus
WHERE cus.snz_ird_uid = cln.ir_xrf_to_snz_ird_uid
AND cus.ir_cus_entity_type_code = 'I'
)
/* Relationships in IR20 table */
UNION
SELECT cln.snz_ird_uid AS snz_ird_uid,
cln.snz_employer_ird_uid AS snz_employer_ird_uid,
YEAR(cln.ir_ir20_return_period_date) - 1 AS start_cal_year_nbr,
YEAR(cln.ir_ir20_return_period_date) AS end_cal_year_nbr,
'P' AS sei_type_code
FROM $(idicleanversion).ir_clean.ird_attachments_ir20 cln
WHERE EXISTS ( SELECT 1
FROM $(idicleanversion).ir_clean.ird_customers cus
WHERE cus.snz_ird_uid = cln.snz_ird_uid
AND cus.ir_cus_entity_type_code = 'I'
)
/* Relationships in IR4 table */
UNION
SELECT cln.snz_ird_uid AS snz_ird_uid,
cln.snz_employer_ird_uid AS snz_employer_ird_uid,
1900 AS start_cal_year_nbr,
9999 AS end_cal_year_nbr,
'C' AS sei_type_code
FROM $(idicleanversion).ir_clean.ird_attachments_ir4s cln
WHERE EXISTS ( SELECT 1
FROM $(idicleanversion).ir_clean.ird_customers cus
WHERE cus.snz_ird_uid = cln.snz_ird_uid
AND cus.ir_cus_entity_type_code = 'I'
);
RAISERROR('Inserted %d rows for cross referencing', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
/*
Update W&S income when it is actually self employed income
This step adds the (C01) - Company director/shareholder recieving PAYE deducted income
and (P01) - Partner recieving PAYE deducted income
Rules: (A) W&S income
(B) PAYE
(C) Payee <> Payer
(D) Payee is an individual
(E) Evidence of self employed: IR20 or IR4 or cross ref
(F) Favour C02 over P02
*/
UPDATE inc WITH (TABLOCk)
SET income_source_code = CASE WHEN sei.sei_type_code = 'C' THEN 'C01-EMS'
WHEN sei.sei_type_code = 'P' THEN 'P01-EMS'
ELSE 'U01-EMS'
END
FROM #IncomeCalYr inc
JOIN ( SELECT snz_ird_uid, snz_employer_ird_uid,
start_cal_year_nbr, end_cal_year_nbr,
sei_type_code = MIN(sei_type_code) /* favour 'C' over 'P' */
FROM #tmp_sei_individual
GROUP BY snz_ird_uid, snz_employer_ird_uid, start_cal_year_nbr, end_cal_year_nbr
) sei ON sei.snz_ird_uid = inc.snz_ird_uid
AND sei.snz_employer_ird_uid = inc.snz_employer_ird_uid
AND inc.year_nbr BETWEEN sei.start_cal_year_nbr AND sei.end_cal_year_nbr
WHERE inc.income_source_code = 'W&S-EMS'
AND inc.withholding_type_code = 'P'
AND inc.snz_ird_uid <> inc.snz_employer_ird_uid;
RAISERROR('Updated %d rows from W&S to C01/P01', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
/*
Create a Sandpit table using #IncomeCalYr
Filtering for wages and salary only
All wages and salary stored in wide format
Note: data stored as tax year mth_01_amt = January and so on
*/
DROP TABLE IF EXISTS $(targetdb).$(targetschema).$(projprefix)_wages_salary;
CREATE TABLE $(targetdb).$(targetschema).$(projprefix)_wages_salary (
snz_uid INT NOT NULL
,period_start_date DATE NOT NULL
,period_end_date DATE NOT NULL
,source_frequency CHAR(7) NOT NULL
,income_source CHAR(7) NOT NULL
,taxable_income_indicator TINYINT NOT NULL
,gross_income DECIMAL(19,2) NULL
);
INSERT INTO $(targetdb).$(targetschema).$(projprefix)_wages_salary
SELECT snz_uid
,DATEFROMPARTS(year(return_period), month(return_period), 01) as period_start_date
,return_period AS period_end_date
,'monthly' as source_frequency
,income_source_code as income_source
,1 as taxable_income_indicator
,SUM(gross_income) AS gross_income
FROM #IncomeCalYr
WHERE snz_uid IS NOT NULL
and income_source_code in ('W&S-EMS')
and gross_income <> 0
GROUP BY snz_uid, return_period, income_source_code;
RAISERROR('Created table in Sandpit with %d rows', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
ALTER TABLE $(targetdb).$(targetschema).$(projprefix)_wages_salary
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);
DROP TABLE IF EXISTS #CombinedIR3;
DROP TABLE IF EXISTS #IncomeCalYr;
DROP TABLE IF EXISTS #tmp_sei_individual