Outputs:
SQL: [IDI_Community].[inc_other_income].[other_income_202603]
SAS: libname cm_other_income dsn=IDI_community_srvprd schema=inc_other_income; run ;
How to access a code module in the Data Lab: Read here
Inputs:
| Dependency | Dependency Type |
|---|---|
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014] |
Source |
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020] |
Source |
[IDI_Clean_202603].[ir_clean].[ird_rtns_keypoints_ir3] |
Source |
[IDI_Clean_202603].[security].[concordance] |
Source |
Other income module
Module output
SQL:[IDI_Community].[cm_read_other_income].other_income_YYYYMM
SAS: libname cmti ODBC dsn=IDI_community_srvprd schema=cm_read_other_income; proc print data = cmti.other_income_YYYYMM
How to access a code module in the Data Lab : Read here
Purpose of Other income module
The purpose of this module is to construct the other income for people in New Zealand and their relevant income spells.
Within this module, other income includes:
- Overseas income
- Other income
Key concepts
Other income
Other income includes overseas income and other sources of personal income from sources not defined under investments. “Other sources of personal income” include any sources of income not reported elsewhere. These may include taxable sales of shares or property, financial arrangement income, cash jobs, barter transactions, illegal income, and employee share scheme income if not otherwise reported by the employer (Refer IR3G pages 39-41)
Other income does not include social transfers in kind such as public education or government-subsidised health care services. Also excluded are child support payments, reimbursement of expenses, money received from borrowing, contingent income, and unrealised income. Irregular payments such as lump sum inheritance payments are excluded.
Total income spell
A total income spell is the minimum period per income type. Other income in practice may be earned in a daily, weekly, fortnightly, monthly etc.
Source frequency
Source frequency is the frequency with which an income type is stored in the data. Source frequency for other income is annual.
Gross income amount
Gross income amount is the taxable income before tax that a person receives.
Observed tax
Tax recorded as paid to IRD in the IDI. This is the amount of tax that has been has been deducted from source (eg PAYE and tax filing). Observed tax data for other income can be sourced from IR3 table.
Note: observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. This information has not been included in the initial version of the module.
Granularity
The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event in the above table:
Entity id: snz uid
Period: start end, end date
Event information: income source, taxable income indicator
Data sources
IR3 :ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014, ir_ir3_2013_to_2020
IR3 is the income tax return for individuals used to confirm the amount of personal income tax to be paid at the end of each tax year with data available from March 2000 to present. The main purpose of IR3 is to include any monetary payments that have not been taxed at source.
IR3 is used for self-employment (filed annually by sole traders) and includes non-zero partnership, or shareholder salary income, as well as rental income. IR3 is mandatory for individuals who earned more than $200 (before tax) income other than salary, wages, interest, dividends, and/or taxable Maori authority distributions. IR3 is filed annually.
Starting from July 2017, IR has extended the supply of IR3 information to the IDI to include further information about investment income, including earnings from interest, dividend, estate trust, overseas, and other sources. Also included are non-taxable income from a range of tax credit entitlements, tax rebate, and student loan-liable income.
There may be a substantial lag in reporting IR3 data as individual businesses and those using tax agents have up to a year after the end of the financial year to file an annual tax return with IR.
Note: 3 IR2 tables (2 ad-hoc tables and 1 clean) are required for creating the timeseries of IR3 data. IR3s belonging to people who reported no self-employment, partnership or shareholder income were excluded for tax years prior to 2019. They are in the ad-hoc tables.
Practical notes
Period
The other income information is from 1999 onwards.
Negative amounts
No exclusions made for negative amounts.
Staged approach to include tax within the total income module
Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.
Sources of Validation
No validation dataset available, hence unable to validate the module output.
References
- Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf https://www.stats.govt.nz/experimental/experimental-estimates-of-income-from-linked-administrative-data/
- My income is not taxed before I get paid My income is not taxed before I get paid
Key Contacts
| Role | Organisation | Person |
|---|---|---|
| Dev lead | MBIE | Eric Krassoi Peach |
| Module Coder | Nicholson Consulting | Linda Martis |
| Code Supply | Stats NZ MSD |
Ivan Welsh Marc de Boer |
| Peer review (code) | Stats NZ Stats NZ Stats NZ |
Michael Challands Dean Edwards Eriko Kamikubo-Gould |
| Peer review (documentation) | IRD IRD |
Joanne Butterfield Scott Anderson |
| Module Steward | TBD | TBD |
Module business rules
Defining spell start and end dates
Start and end dates are based on IR3 dates. The return period column in the tables is considered to be the end date. Start will be that start of the annual tax year (1-Apr).
Person identifier snz_uid
A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.
Minimum income threshold
Currently no minimum threshold applied.
Interoperability with other modules
This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.
Taxable income indicator
Taxable income indicator to be hard coded in modules. Within other income module, all income is taxable.
Code logic
The code is based on the rules used for APC and derived data tables in the IDI.
- Identify individuals earning overseas income and other income from 3 IR3 tables. To do so, we will
a. Extract data from IR3 tables, specifically ir_clean.ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014 and ir_ir3_2013_to_2020 from IDI_Adhoc. Table prioritisation will occur with ir_clean.ird_rtns_keypoints_ir3 being the first source of data, followed by adhoc 2000_to_2014 and adhoc 2013_to_2020.
b. We will use data which contains overseas income and other income amounts.
c. Remove duplicates in tables by using maximum of location number, return_version_number and ird_timestamp_date. - Using the data obtained above in point 1,
a. aggregate overseas income by person’s unique identifiers and time period. Relabel income as SEA-IR3.
b. aggregate other income by person’s unique identifiers and time period. Relabel income as OTH-IR3. - To summarise, all the labelled income above becomes other income.
Parameters
The following parameters should be supplied to this module to run it in the database:
- {targetdb}: The SQL database on which the spell datasets are to be created.
- {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
- {targetschema}: The project schema under the target database into which the spell datasets are to be created.
- {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.
Variable Descriptions
| Aspect | Variables | Description |
|---|---|---|
| Entity | snz_uid | Current refresh snz_uid |
| Period | period_start_date period_end_date |
Payment start date Payment end date |
| Event | source_frequency | Frequency of income recorded in source data Note that Other income is recorded in IR3 as annual income |
| Information | income_source | Type of income being derived in this module |
| Note that: - SEA-IR3 is overseas income from IR3 tables - OTH-IR3 is other income from IR3 tables |
||
| taxable_income_indicator | Indicator for whether income is taxable Note that: 1:income is taxable 0:income is not taxable Other income is taxable |
|
| gross_income | Income before tax |
Module Version & Change History
| Date | Version | Comments |
|---|---|---|
| 2023-07 | Original code sourced from APC code created by Stats NZ (Ivan Welsh) in the IDI | |
| 2024-06 | Initial | streamlining and preparation for code module by Linda Martis, Nicholson Consulting |
| 2024-10 | Namespace | Namespace clash iwth investment income module for temp tables #CombinedIR3 and #IncomeTaxYr changed to #otCombinedIR3 and #otIncomeTaxYr |
Code
:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
/* Assign the target database to which all the components need to be created in. */
use $(targetdb);
/*
This script combines a number of different source of IR3 data into a single table.
Sources:IR3 table, adhoc 2000_to_2014, adhoc 2013_to_2020
Duplicates are removed from the data by filtering for the maximum return_version_nbr,
ird_timestamp_date, source_code, snz_unique_nbr and finally location_nbr.
Create the storage table. This follows the format of the refresh IR3 table,
but without the leading ir_ir3_ prefix.
*/
DROP TABLE IF EXISTS #oiCombinedIR3;
CREATE TABLE #oiCombinedIR3(
snz_uid INT NOT NULL
, snz_ird_uid INT NOT NULL
, location_nbr INT NOT NULL
, return_period_date DATE NOT NULL
, snz_unique_nbr BIGINT NOT NULL
, return_version_nbr SMALLINT NULL
, income_imp_ind VARCHAR(1) NOT NULL
, overseas_income_amt DECIMAL(13,2) NULL
, other_income_amt DECIMAL(13,2) NULL
, sum_all_income_amt DECIMAL(13,2) NULL
, ird_timestamp_date DATE NULL
/* 1 for refresh IR3 table, 2 for adhoc 2000_to_2014, 3 for adhoc 2013_to_2020 */
, source_table_code TINYINT NOT NULL
,
PRIMARY KEY(
snz_ird_uid
, return_period_date
)
);
/* Combine the source tables into one */
INSERT INTO #oiCombinedIR3
SELECT con.snz_uid AS snz_uid
, con.snz_ird_uid AS snz_ird_uid
, ir_ir3_location_nbr AS location_nbr
, ir_ir3_return_period_date AS return_period_date
, ir_ir3_snz_unique_nbr AS snz_unique_nbr
, ir_ir3_return_version_nbr AS return_version_nbr
, ir_ir3_income_imp_ind AS income_imp_ind
, ir_ir3_overseas_income_amt AS overseas_income_amt
, ir_ir3_other_income_amt AS other_income_amt
, ISNULL(ir_ir3_overseas_income_amt,0)+ ISNULL(ir_ir3_other_income_amt,0) AS sum_all_income_amt
, ir_ir3_ird_timestamp_date AS ird_timestamp_date
, source_table_code
FROM (
SELECT *
, ROW_NUMBER() OVER (
PARTITION BY snz_ird_uid, ir_ir3_return_period_date
ORDER BY ir_ir3_return_version_nbr DESC, ir_ir3_ird_timestamp_date DESC, source_table_code ASC, ir_ir3_snz_unique_nbr DESC, ir_ir3_location_nbr DESC) AS row_nbr
FROM (
SELECT snz_ird_uid
, ir_ir3_location_nbr
, ir_ir3_return_period_date
, ir_ir3_snz_unique_nbr
, ir_ir3_return_version_nbr
, ir_ir3_income_imp_ind
, ir_ir3_overseas_income_amt
, ir_ir3_other_income_amt
, ir_ir3_ird_timestamp_date
, 1 AS source_table_code
FROM $(idicleanversion).ir_clean.ird_rtns_keypoints_ir3
UNION ALL
SELECT snz_ird_uid
, ir_ir3_location_nbr
, ir_ir3_return_period_date
, ir_ir3_snz_unique_nbr
, ir_ir3_return_version_nbr
, ir_ir3_income_imp_ind
, ir_ir3_overseas_income_amt
, ir_ir3_other_income_amt
, ir_ir3_ird_timestamp_date
, 2 AS source_table_code
FROM IDI_Adhoc.clean_read_IR.ir_ir3_2000_to_2014
UNION ALL
SELECT snz_ird_uid
, location_number AS ir_ir3_location_nbr
, period AS ir_ir3_return_period_date
, ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, period ORDER BY return_version, overseas_income DESC, other_income DESC) AS ir_ir3_snz_unique_nbr
, return_version AS ir_ir3_return_version_nbr
, 'N' AS ir_ir3_income_imp_ind
, overseas_income AS ir_ir3_overseas_income_amt
, other_income AS ir_ir3_other_income_amt
, CAST(timestamp AS DATE) AS ir_ir3_ird_timestamp_date
, 3 AS source_table_code
FROM IDI_Adhoc.clean_read_IR.ir_ir3_2013_to_2020
) tu
) t5
INNER JOIN $(idicleanversion).security.concordance con
ON con.snz_ird_uid = t5.snz_ird_uid
WHERE row_nbr = 1
AND YEAR(ir_ir3_return_period_date) BETWEEN 2000 AND YEAR(GETDATE());
RAISERROR(
'Inserted %d rows FROM IR3 INTO #oiCombinedIR3'
, 0
, 1
, @@ROWCOUNT
)
WITH NOWAIT;
GO
/*
This script creates #IncomeTaxYr table
Only annual data stored from #oiCombinedIR3
*/
/* Create the equivalent of the data.income_tax_yr table */
DROP TABLE IF EXISTS #oiIncomeTaxYr;
CREATE TABLE #oiIncomeTaxYr(
year_nbr DATE NOT NULL
, snz_uid INT NOT NULL
, snz_ird_uid INT NOT NULL
, snz_employer_ird_uid INT NOT NULL
, income_source_code CHAR(7) NOT NULL
, withholding_type_code CHAR(1) NOT NULL
, tot_yr_amt DECIMAL(19,2) NOT NULL
);
/*
Add self employed income from IR3 to tax_yr table - Overseas income
Adds the (SEA) - Overseas income from IR3
Rules: (A) IR3 overseas_income_amt <> 0
*/
INSERT INTO #oiIncomeTaxYr
WITH(
TABLOCK
)(
year_nbr
, snz_uid
, snz_ird_uid
, snz_employer_ird_uid
, income_source_code
, withholding_type_code
, tot_yr_amt
)
SELECT (ir3.return_period_date) AS year_nbr
, MIN(ir3.snz_uid) AS snz_uid
, MIN(ir3.snz_ird_uid) AS snz_ird_uid
, MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid
, 'SEA-IR3' AS income_source_code
, 'X' AS withholding_type_code
, SUM(ir3.overseas_income_amt) AS tot_yr_amt
FROM #oiCombinedIR3 ir3
WHERE ir3.overseas_income_amt <> 0
AND YEAR(ir3.return_period_date) BETWEEN 2000 AND YEAR(GETDATE())
GROUP BY (ir3.return_period_date)
, ir3.snz_ird_uid;
RAISERROR(
'Inserted %d rows for SEA-IR3'
, 0
, 1
, @@ROWCOUNT
)
WITH NOWAIT;
GO
/*
Add self employed income from IR3 to tax_yr table - Other income
Adds the (OTH) - Other income from IR3
Rules: (A) IR3 other_income_amt <> 0
*/
INSERT INTO #oiIncomeTaxYr
WITH(
TABLOCK
)(
year_nbr
, snz_uid
, snz_ird_uid
, snz_employer_ird_uid
, income_source_code
, withholding_type_code
, tot_yr_amt
)
SELECT(
ir3.return_period_date
) AS year_nbr
, MIN(ir3.snz_uid) AS snz_uid
, MIN(ir3.snz_ird_uid) AS snz_ird_uid
, MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid
, 'OTH-IR3' AS income_source_code
, 'X' AS withholding_type_code
, SUM(ir3.other_income_amt) AS tot_yr_amt
FROM #oiCombinedIR3 ir3
WHERE ir3.other_income_amt <> 0
AND YEAR(
ir3.return_period_date
) BETWEEN 2000 AND YEAR(GETDATE())
GROUP BY(
ir3.return_period_date
)
, ir3.snz_ird_uid;
RAISERROR(
'Inserted %d rows for OTH-IR3'
, 0
, 1
, @@ROWCOUNT
)
WITH NOWAIT;
GO
/*
Update the #oiIncomeTaxYr to aggregate income sources
We don't care about the employer or withholding type code
So create a final table that aggregates on these
*/
DROP TABLE IF EXISTS #AggregateIR3Income;
CREATE TABLE #AggregateIR3Income(
year_nbr DATE NOT NULL
, snz_uid INT NOT NULL
, income_source_code CHAR(7) NOT NULL
, tot_yr_amt DECIMAL(19,2) NOT NULL
);
INSERT INTO #AggregateIR3Income
WITH(
TABLOCK
)(
year_nbr
, snz_uid
, income_source_code
, tot_yr_amt
)
SELECT year_nbr
, MIN(snz_uid) AS snz_uid
, income_source_code
, SUM(tot_yr_amt) AS tot_yr_amt
FROM #oiIncomeTaxYr
WHERE snz_uid IS NOT NULL
GROUP BY year_nbr
, snz_uid
, income_source_code;
RAISERROR(
'Inserted %d rows WHEN aggregating'
, 0
, 1
, @@ROWCOUNT
)
WITH NOWAIT;
GO
/*
Create a Sandpit table using #AggregateIR3Income
No filters needed as all data is overseas and other income
Note: data stored as tax year and annual. No monthy data.
*/
DROP TABLE IF EXISTS [$(targetdb)].[$(targetschema)].[$(projprefix)_other_income];
CREATE TABLE [$(targetdb)].[$(targetschema)].[$(projprefix)_other_income](
snz_uid INT NOT NULL
, period_start_date DATE NOT NULL
, period_end_date DATE NOT NULL
, source_frequency CHAR(7) NOT NULL
, income_source CHAR(7) NOT NULL
, taxable_income_indicator TINYINT NOT NULL
, gross_income DECIMAL(19,2) NULL
);
INSERT INTO [$(targetdb)].[$(targetschema)].[$(projprefix)_other_income]
SELECT snz_uid
, DATEFROMPARTS(YEAR(year_nbr)-1, MONTH(DATEADD(MONTH,1,year_nbr)), 01) AS 'period_start_date'
, year_nbr AS 'period_end_date'
, 'annual' AS source_frequency
, income_source_code AS income_source
, 1 AS taxable_income_indicator
, tot_yr_amt AS gross_income
FROM #AggregateIR3Income
WHERE tot_yr_amt <> 0;
RAISERROR(
'Inserted %d rows INTO final TABLE IN Sandpit'
, 0
, 1
, @@ROWCOUNT
)
WITH NOWAIT;
GO
ALTER TABLE $(targetdb).$(targetschema).$(projprefix)_other_income REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);
DROP TABLE IF EXISTS #oiCombinedIR3;
DROP TABLE IF EXISTS #oiIncomeTaxYr;
DROP TABLE IF EXISTS #AggregateIR3Income;