Total income module collection: Other income

cori.qian
24 June 2024

Outputs:

SQL: [IDI_Community].[inc_other_income].[other_income_202603]
SAS: libname cm_other_income dsn=IDI_community_srvprd schema=inc_other_income; run ;
How to access a code module in the Data Lab: Read here

Inputs:

Dependency Dependency Type
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014] Source
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020] Source
[IDI_Clean_202603].[ir_clean].[ird_rtns_keypoints_ir3] Source
[IDI_Clean_202603].[security].[concordance] Source

Other income module

Module output

SQL:[IDI_Community].[cm_read_other_income].other_income_YYYYMM
SAS: libname cmti ODBC dsn=IDI_community_srvprd schema=cm_read_other_income; proc print data = cmti.other_income_YYYYMM
How to access a code module in the Data Lab : Read here

Purpose of Other income module

The purpose of this module is to construct the other income for people in New Zealand and their relevant income spells.

Within this module, other income includes:

Key concepts

Other income

Other income includes overseas income and other sources of personal income from sources not defined under investments. “Other sources of personal income” include any sources of income not reported elsewhere. These may include taxable sales of shares or property, financial arrangement income, cash jobs, barter transactions, illegal income, and employee share scheme income if not otherwise reported by the employer (Refer IR3G pages 39-41)

Other income does not include social transfers in kind such as public education or government-subsidised health care services. Also excluded are child support payments, reimbursement of expenses, money received from borrowing, contingent income, and unrealised income. Irregular payments such as lump sum inheritance payments are excluded.

Total income spell

A total income spell is the minimum period per income type. Other income in practice may be earned in a daily, weekly, fortnightly, monthly etc.

Source frequency

Source frequency is the frequency with which an income type is stored in the data. Source frequency for other income is annual.

Gross income amount

Gross income amount is the taxable income before tax that a person receives.

Observed tax

Tax recorded as paid to IRD in the IDI. This is the amount of tax that has been has been deducted from source (eg PAYE and tax filing). Observed tax data for other income can be sourced from IR3 table.

Note: observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. This information has not been included in the initial version of the module.

Granularity

The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event in the above table:

Entity id: snz uid
Period: start end, end date
Event information: income source, taxable income indicator

Data sources

IR3 :ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014, ir_ir3_2013_to_2020

IR3 is the income tax return for individuals used to confirm the amount of personal income tax to be paid at the end of each tax year with data available from March 2000 to present. The main purpose of IR3 is to include any monetary payments that have not been taxed at source.

IR3 is used for self-employment (filed annually by sole traders) and includes non-zero partnership, or shareholder salary income, as well as rental income. IR3 is mandatory for individuals who earned more than $200 (before tax) income other than salary, wages, interest, dividends, and/or taxable Maori authority distributions. IR3 is filed annually.

Starting from July 2017, IR has extended the supply of IR3 information to the IDI to include further information about investment income, including earnings from interest, dividend, estate trust, overseas, and other sources. Also included are non-taxable income from a range of tax credit entitlements, tax rebate, and student loan-liable income.

There may be a substantial lag in reporting IR3 data as individual businesses and those using tax agents have up to a year after the end of the financial year to file an annual tax return with IR.

Note: 3 IR2 tables (2 ad-hoc tables and 1 clean) are required for creating the timeseries of IR3 data. IR3s belonging to people who reported no self-employment, partnership or shareholder income were excluded for tax years prior to 2019. They are in the ad-hoc tables.

Practical notes

Period

The other income information is from 1999 onwards.

Negative amounts

No exclusions made for negative amounts.

Staged approach to include tax within the total income module

Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.

Sources of Validation

No validation dataset available, hence unable to validate the module output.

References

  1. Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf https://www.stats.govt.nz/experimental/experimental-estimates-of-income-from-linked-administrative-data/
  2. My income is not taxed before I get paid My income is not taxed before I get paid

Key Contacts

Role Organisation Person
Dev lead MBIE Eric Krassoi Peach
Module Coder Nicholson Consulting Linda Martis
Code Supply Stats NZ
MSD
Ivan Welsh
Marc de Boer
Peer review (code) Stats NZ
Stats NZ
Stats NZ
Michael Challands
Dean Edwards
Eriko Kamikubo-Gould
Peer review (documentation) IRD
IRD
Joanne Butterfield
Scott Anderson
Module Steward TBD TBD

Module business rules

Defining spell start and end dates

Start and end dates are based on IR3 dates. The return period column in the tables is considered to be the end date. Start will be that start of the annual tax year (1-Apr).

Person identifier snz_uid

A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Minimum income threshold

Currently no minimum threshold applied.

Interoperability with other modules

This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.

Taxable income indicator

Taxable income indicator to be hard coded in modules. Within other income module, all income is taxable.

Code logic

The code is based on the rules used for APC and derived data tables in the IDI.

  1. Identify individuals earning overseas income and other income from 3 IR3 tables. To do so, we will
    a. Extract data from IR3 tables, specifically ir_clean.ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014 and ir_ir3_2013_to_2020 from IDI_Adhoc. Table prioritisation will occur with ir_clean.ird_rtns_keypoints_ir3 being the first source of data, followed by adhoc 2000_to_2014 and adhoc 2013_to_2020.
    b. We will use data which contains overseas income and other income amounts.
    c. Remove duplicates in tables by using maximum of location number, return_version_number and ird_timestamp_date.
  2. Using the data obtained above in point 1,
    a. aggregate overseas income by person’s unique identifiers and time period. Relabel income as SEA-IR3.
    b. aggregate other income by person’s unique identifiers and time period. Relabel income as OTH-IR3.
  3. To summarise, all the labelled income above becomes other income.

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.

Variable Descriptions

Aspect Variables Description
Entity snz_uid Current refresh snz_uid
Period period_start_date
period_end_date
Payment start date
Payment end date
Event source_frequency Frequency of income recorded in source data
Note that Other income is recorded in IR3 as annual income
Information income_source Type of income being derived in this module
Note that:
- SEA-IR3 is overseas income from IR3 tables
- OTH-IR3 is other income from IR3 tables
taxable_income_indicator Indicator for whether income is taxable
Note that:
1:income is taxable
0:income is not taxable
Other income is taxable
gross_income Income before tax

Module Version & Change History

Date Version Comments
2023-07 Original code sourced from APC code created by Stats NZ (Ivan Welsh) in the IDI
2024-06 Initial streamlining and preparation for code module by Linda Martis, Nicholson Consulting
2024-10 Namespace Namespace clash iwth investment income module for temp tables #CombinedIR3 and #IncomeTaxYr changed to #otCombinedIR3 and #otIncomeTaxYr

Code

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"

/* Assign the target database to which all the components need to be created in. */
use $(targetdb);

/*
This script combines a number of different source of IR3 data into a single table.
Sources:IR3 table, adhoc 2000_to_2014, adhoc 2013_to_2020

Duplicates are removed from the data by filtering for the maximum return_version_nbr,
ird_timestamp_date, source_code, snz_unique_nbr and finally location_nbr.

Create the storage table. This follows the format of the refresh IR3 table,
but without the leading ir_ir3_ prefix.
*/

DROP TABLE IF EXISTS #oiCombinedIR3;
CREATE TABLE #oiCombinedIR3(
    snz_uid INT NOT NULL
        , snz_ird_uid INT NOT NULL
        , location_nbr INT NOT NULL
        , return_period_date DATE NOT NULL
        , snz_unique_nbr BIGINT NOT NULL
        , return_version_nbr SMALLINT NULL
        , income_imp_ind VARCHAR(1) NOT NULL
        , overseas_income_amt DECIMAL(13,2) NULL
        , other_income_amt DECIMAL(13,2) NULL
        , sum_all_income_amt DECIMAL(13,2) NULL
        , ird_timestamp_date DATE NULL
	/* 1 for refresh IR3 table, 2 for adhoc 2000_to_2014, 3 for adhoc 2013_to_2020 */
        , source_table_code TINYINT NOT NULL
        , 
    PRIMARY KEY(
        snz_ird_uid
            , return_period_date
    )
);

/* Combine the source tables into one */
INSERT INTO #oiCombinedIR3
SELECT con.snz_uid AS snz_uid
    , con.snz_ird_uid AS snz_ird_uid
    , ir_ir3_location_nbr AS location_nbr
    , ir_ir3_return_period_date AS return_period_date
    , ir_ir3_snz_unique_nbr AS snz_unique_nbr
    , ir_ir3_return_version_nbr AS return_version_nbr
    , ir_ir3_income_imp_ind AS income_imp_ind
    , ir_ir3_overseas_income_amt AS overseas_income_amt
    , ir_ir3_other_income_amt AS other_income_amt
    , ISNULL(ir_ir3_overseas_income_amt,0)+ ISNULL(ir_ir3_other_income_amt,0) AS sum_all_income_amt
    , ir_ir3_ird_timestamp_date AS ird_timestamp_date
    , source_table_code
FROM (
    SELECT *
        , ROW_NUMBER() OVER (
			PARTITION BY snz_ird_uid, ir_ir3_return_period_date 
			ORDER BY ir_ir3_return_version_nbr DESC, ir_ir3_ird_timestamp_date DESC, source_table_code ASC, ir_ir3_snz_unique_nbr DESC, ir_ir3_location_nbr DESC) AS row_nbr
    FROM (
        SELECT snz_ird_uid
            , ir_ir3_location_nbr
            , ir_ir3_return_period_date
            , ir_ir3_snz_unique_nbr
            , ir_ir3_return_version_nbr
            , ir_ir3_income_imp_ind
            , ir_ir3_overseas_income_amt
            , ir_ir3_other_income_amt
            , ir_ir3_ird_timestamp_date
            , 1 AS source_table_code
        FROM $(idicleanversion).ir_clean.ird_rtns_keypoints_ir3
        UNION ALL
        SELECT snz_ird_uid
            , ir_ir3_location_nbr
            , ir_ir3_return_period_date
            , ir_ir3_snz_unique_nbr
            , ir_ir3_return_version_nbr
            , ir_ir3_income_imp_ind
            , ir_ir3_overseas_income_amt
            , ir_ir3_other_income_amt
            , ir_ir3_ird_timestamp_date
            , 2 AS source_table_code
        FROM IDI_Adhoc.clean_read_IR.ir_ir3_2000_to_2014
        UNION ALL
        SELECT snz_ird_uid
            , location_number AS ir_ir3_location_nbr
            , period AS ir_ir3_return_period_date
            , ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, period ORDER BY return_version, overseas_income DESC, other_income DESC) AS ir_ir3_snz_unique_nbr
            , return_version AS ir_ir3_return_version_nbr
            , 'N' AS ir_ir3_income_imp_ind
            , overseas_income AS ir_ir3_overseas_income_amt
            , other_income AS ir_ir3_other_income_amt
            , CAST(timestamp AS DATE) AS ir_ir3_ird_timestamp_date
            , 3 AS source_table_code
        FROM IDI_Adhoc.clean_read_IR.ir_ir3_2013_to_2020
    ) tu
) t5
INNER JOIN $(idicleanversion).security.concordance con
ON con.snz_ird_uid = t5.snz_ird_uid
WHERE row_nbr = 1
AND YEAR(ir_ir3_return_period_date) BETWEEN 2000 AND YEAR(GETDATE());
 RAISERROR(
    'Inserted %d rows FROM IR3 INTO #oiCombinedIR3'
        , 0
        , 1
        , @@ROWCOUNT
)
WITH NOWAIT;
GO

/*
This script creates #IncomeTaxYr table
Only annual data stored from #oiCombinedIR3
*/

/* Create the equivalent of the data.income_tax_yr table */
DROP TABLE IF EXISTS #oiIncomeTaxYr;
CREATE TABLE #oiIncomeTaxYr(
    year_nbr DATE NOT NULL
        , snz_uid INT NOT NULL
        , snz_ird_uid INT NOT NULL
        , snz_employer_ird_uid INT NOT NULL
        , income_source_code CHAR(7) NOT NULL
        , withholding_type_code CHAR(1) NOT NULL
        , tot_yr_amt DECIMAL(19,2) NOT NULL
);

/*
Add self employed income from IR3 to tax_yr table - Overseas income
Adds the (SEA) - Overseas income from IR3
Rules: (A) IR3 overseas_income_amt <> 0
*/
INSERT INTO #oiIncomeTaxYr
WITH(
    TABLOCK
)(
    year_nbr
        , snz_uid
        , snz_ird_uid
        , snz_employer_ird_uid
        , income_source_code
        , withholding_type_code
        , tot_yr_amt
)
SELECT (ir3.return_period_date) AS year_nbr
    , MIN(ir3.snz_uid) AS snz_uid
    , MIN(ir3.snz_ird_uid) AS snz_ird_uid
    , MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid
    , 'SEA-IR3' AS income_source_code
    , 'X' AS withholding_type_code
    , SUM(ir3.overseas_income_amt) AS tot_yr_amt
FROM #oiCombinedIR3 ir3
WHERE ir3.overseas_income_amt <> 0
AND YEAR(ir3.return_period_date) BETWEEN 2000 AND YEAR(GETDATE())
GROUP BY (ir3.return_period_date)
    , ir3.snz_ird_uid;
 RAISERROR(
    'Inserted %d rows for SEA-IR3'
        , 0
        , 1
        , @@ROWCOUNT
)
WITH NOWAIT;
GO

/*
Add self employed income from IR3 to tax_yr table - Other income
Adds the (OTH) - Other income from IR3
Rules: (A) IR3 other_income_amt <> 0
*/
INSERT INTO #oiIncomeTaxYr
WITH(
    TABLOCK
)(
    year_nbr
        , snz_uid
        , snz_ird_uid
        , snz_employer_ird_uid
        , income_source_code
        , withholding_type_code
        , tot_yr_amt
)
SELECT(
    ir3.return_period_date
) AS year_nbr
    , MIN(ir3.snz_uid) AS snz_uid
    , MIN(ir3.snz_ird_uid) AS snz_ird_uid
    , MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid
    , 'OTH-IR3' AS income_source_code
    , 'X' AS withholding_type_code
    , SUM(ir3.other_income_amt) AS tot_yr_amt
FROM #oiCombinedIR3 ir3
WHERE ir3.other_income_amt <> 0
AND YEAR(
    ir3.return_period_date
) BETWEEN 2000 AND YEAR(GETDATE())
GROUP BY(
    ir3.return_period_date
)
    , ir3.snz_ird_uid;
 RAISERROR(
    'Inserted %d rows for OTH-IR3'
        , 0
        , 1
        , @@ROWCOUNT
)
WITH NOWAIT;
GO

/*
Update the #oiIncomeTaxYr to aggregate income sources
We don't care about the employer or withholding type code
So create a final table that aggregates on these
*/

DROP TABLE IF EXISTS #AggregateIR3Income;
CREATE TABLE #AggregateIR3Income(
    year_nbr DATE NOT NULL
        , snz_uid INT NOT NULL
        , income_source_code CHAR(7) NOT NULL
        , tot_yr_amt DECIMAL(19,2) NOT NULL
);

INSERT INTO #AggregateIR3Income
WITH(
    TABLOCK
)(
    year_nbr
        , snz_uid
        , income_source_code
        , tot_yr_amt
)
SELECT year_nbr
    , MIN(snz_uid) AS snz_uid
    , income_source_code
    , SUM(tot_yr_amt) AS tot_yr_amt
FROM #oiIncomeTaxYr
WHERE snz_uid IS NOT NULL
GROUP BY year_nbr
    , snz_uid
    , income_source_code;
 RAISERROR(
    'Inserted %d rows WHEN aggregating'
        , 0
        , 1
        , @@ROWCOUNT
)
WITH NOWAIT;
GO

/*
Create a Sandpit table using #AggregateIR3Income 
No filters needed as all data is overseas and other income
Note: data stored as tax year and annual. No monthy data.
*/

DROP TABLE IF EXISTS [$(targetdb)].[$(targetschema)].[$(projprefix)_other_income];
CREATE TABLE [$(targetdb)].[$(targetschema)].[$(projprefix)_other_income](
    snz_uid INT NOT NULL
        , period_start_date DATE NOT NULL
        , period_end_date DATE NOT NULL
        , source_frequency CHAR(7) NOT NULL
        , income_source CHAR(7) NOT NULL
        , taxable_income_indicator TINYINT NOT NULL
        , gross_income DECIMAL(19,2) NULL
);

INSERT INTO [$(targetdb)].[$(targetschema)].[$(projprefix)_other_income]

SELECT snz_uid
    , DATEFROMPARTS(YEAR(year_nbr)-1, MONTH(DATEADD(MONTH,1,year_nbr)), 01) AS 'period_start_date'
    , year_nbr AS 'period_end_date'
    , 'annual' AS source_frequency
    , income_source_code AS income_source
    , 1 AS taxable_income_indicator
    , tot_yr_amt AS gross_income
FROM #AggregateIR3Income
WHERE tot_yr_amt <> 0;
 RAISERROR(
    'Inserted %d rows INTO final TABLE IN Sandpit'
        , 0
        , 1
        , @@ROWCOUNT
)
WITH NOWAIT;
GO

ALTER TABLE $(targetdb).$(targetschema).$(projprefix)_other_income REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);

DROP TABLE IF EXISTS #oiCombinedIR3;
DROP TABLE IF EXISTS #oiIncomeTaxYr;
DROP TABLE IF EXISTS #AggregateIR3Income;