Module output
SQL::[IDI_Community].[inc_total_income].total_income_YYYYMM
SAS: libname cmti ODBC dsn=idi_community_srvprd schema=inc_total_income; proc print data = cmti.total_income_YYYYMM
How to access a code module in the Data Lab : Read here
This script creates the final combined table for the Total Income Module. The table is called Tier-2 Total Income. It is a combination of 9 Tier-1 Total income modules.
Purpose of the module
The purpose of this module is to calculate the total personal income for people in New Zealand by income source, and when they received it.
This can be used to produce estimates of the total income of people in any monthly or annual time period. Personal income is inferred from the following income sources available in the IDI from both the taxation and benefit system: wages and salary, income support payments, tax credits, ACC compensation, student allowance, paid parental leave, self-employment, investments other income.
For more documentation and code for each income source listed above, refer to Tier-1 modules in Commons
Key concepts
Income spell
An income spell is the minimum payment period per income type. Payment for each income type can vary depending on the income type – as daily, weekly, fortnightly, monthly, annual.
Source frequency
Source frequency is the frequency with which an income type is stored in the source data. Source frequency from source data for multiple income types can vary from monthly, annually, daily spells to multi-year spells.
Gross income amount
Gross income amount is the taxable income, before tax, that a person earns.
Observed tax
The amount of tax that has been deducted from source (eg: PAYE and tax filing). Note that observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. The information has not been included in the initial version of the module.
Granularity
The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event:
Entity id: snz uid
Period: start end and end date
Event information: income source, taxable income indicator
Data sources in the IDI
[IDI_Clean_YYYYMM].[ir_clean].[ird_ems]
[IDI_Clean_YYYYMM].[ir_clean].[ird_rtns_keypoints_ir3]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir4s]
[IDI_Clean_YYYYMM].[ir_clean].[ird_attachments_ir20]
[IDI_Clean_YYYYMM].[ir_clean].[ird_customers]
[IDI_Clean_YYYYMM].[ir_clean].[ird_tax_registrations]
[IDI_Clean_YYYYMM].[ir_clean].[ird_cross_reference]
[IDI_Clean_YYYYMM].[ir_clean].[ird_pts]
[IDI_Clean_YYYYMM].[ir_clean].[ird_autocalc_information]
[IDI_Clean_YYYYMM].[wff_clean].[fam_return_dtls]
[IDI_Clean_YYYYMM].[msd_clean].[msd_first_tier_expenditure]
[IDI_Clean_YYYYMM].[msd_clean].[msd_second_tier_expenditure]
[IDI_Clean_YYYYMM].[msd_clean].[msd_third_tier_expenditure]
For more details about how the underlying data is collected and transformed from source agency into IDI, and differences in data if any, refer to individual Tier-1 modules
Practical notes
Exclusions from this module
We are not able to identify and quantify income support payment off-sets, debt repayments and re-directions.
Negative amounts
No exclusions made for negative amounts.
Seasonal workers identification
Not applicable total income module. Refer to employment spells module for more information.
Staged approach to include tax within the total income module
Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.
Sources of validation
Validation is done at a Tier-1 module level
References
- IDI Derived tables https://vhin.co.nz/wp-content/uploads/2017/04/data-dictionary-for-ir-derived-tables-in-the-idi.docx
- Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf https://www.stats.govt.nz/experimental/experimental-estimates-of-income-from-linked-administrative-data/
Key Contacts
| Role | Organisation | Person |
|---|---|---|
| Dev lead | MBIE | Eric Krassoi Peach |
| Module Coder | Nicholson Consulting | Linda Martis |
| Peer review (code) | Independent | Vinay Benny |
| Module Steward | Stats NZ | Cori Qian |
Module business rules
Defining spell start and end dates
Start and end dates are based on Tier-1 income tables. The start date will always be first day of the month, and the end date will be the last day of the month.
Person identifier snz_uid
A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.
Minimum income threshold
Currently no minimum threshold applied.
Interoperability with other modules
This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.
Taxable income indicator
Taxable income indicator indicates whether an income type is taxable or not. This comes directly from the Tier-1 modules.
Code logic:
9 Tier-1 modules are merged into one ‘Tier 2 Total Income module’.
The income within each module will be grouped for each snz_uid and start and end dates.
- Grouping of start and end dates will be based on whether an income type is earned for the same time period. Eg: if an individual earns investment income from multiple sources with the same tax year, their investment incomes will be summed up for those common periods.
- In the case of self-employed income, some individuals can have both monthly and annual (tax year) income for a given tax year. In such cases, their self-employment income will be summed up at a tax year level. However, if they only earn monthly self-employment income, we will sum up similar months only.
- In the case of income support payments, all spells periods (which can vary from one day to multi-year periods) are broken down to months. If payments are received for consecutive full months, those full months are combined and their incomes summed (this approach helps reduce table storage space).
Parameters
The following parameters should be supplied to this module to run it in the database:
- {targetdb}: The SQL database on which the spell datasets are to be created.
- {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
- {targetschema}: The project schema under the target database into which the spell datasets are to be created.
- {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.
Dependencies
All datasets mentioned above in ‘Data source in the IDI’ section
Module dependencies:
[IDI_Community].[cm_read_{projprefix}_wages_salary].[{projprefix}_wages_salary_yyyymm]
[IDI_Community].[cm_read_{projprefix}_acc_income].[{projprefix}_acc_income_yyyymm]
[IDI_Community].[cm_read_{projprefix}_paid_parental].[{projprefix}_ paid_parental_yyyymm]
[IDI_Community].[cm_read_{projprefix}_stu_allowance].[{projprefix}_ stu_allowance_yyyymm]
[IDI_Community].[cm_read_{projprefix}_self_employment].[{projprefix}_self_employment_yyyymm]
[IDI_Community].[cm_read_{projprefix}_tax_credits].[{projprefix}_tax_credits_yyyymm]
[IDI_Community].[cm_read_{projprefix}_inc_support_paymt].[{projprefix}_inc_support_paymt_yyyymm]
[IDI_Community].[cm_read_{projprefix}_investments].[{projprefix}_investments_yyyymm]
[IDI_Community].[cm_read_{projprefix}_other_income].[{projprefix}_other_income_yyyymm]
Outputs
{targetdb}.{targetschema}.{projprefix}_total_income_{refresh}
Variable Descriptions
The business key for this table is one row per snz_uid, start_date, end_date, income_source and taxable_income_indicator
| Aspect | Variables | Description |
|---|---|---|
| Entity | snz_uid | Current refresh snz_uid |
| Period | period_start_date period_end_date | Income month start date Income month end date |
| Event | income_source | Name of income type Tier-1 module the data is coming from |
| Information | taxable_income_indicator | Indicator for whether income is taxable Note that: 1:income is taxable 0:income is not taxable |
| gross_income | Income before tax |
Module Version & Change History
| Date | Version | Comments |
|---|---|---|
| 2024-06 | Initial | Initial version created by Linda Martis, Nicholson Consulting |
Code
:setvar targetdbtable "{targetdbtable}"
:setvar targetdbview "{targetdbview}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
:setvar idimetadataversion "{idimetaversion}"
/* Assign the target database to which all the components need to be created in. */
use $(targetdb);
Create temp tables for all 9 tier-1 modules. These temp tables will be merged to create the total income tier-2 module
Create temp table #merge_7_tier1_tables for 7 tier-1 modules with simple structure:
Aggregating incomes in each module using start date and end date.
The dates are always beginning or end of month, hence easy to group.
Each of these modules are either monthly or annual, but not both.
Each of these modules are either taxable or not taxable income, but not both
7 Tier-1 Modules are:
Wages and Salary,
ACC income,
Paid parental leave,
Student Allowance,
Investments,
Other income,
Tax credits
DROP TABLE IF EXISTS #merge_7_tier1_tables;
WITH union_tables AS (
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'Wages and Salary' AS income_source /* all source data is monthly */
FROM $(targetdbtable).$(targetschema).[income_t1_wages_salary]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
UNION ALL
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'ACC Income' AS income_source /* all source data is monthly */
FROM $(targetdbview).$(targetschema).[income_t1_acc_income]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
UNION ALL
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'Paid Parental Leave' AS income_source /* all source data is monthly */
FROM $(targetdbview).$(targetschema).[income_t1_paid_parental]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
UNION ALL
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'Student Allowance' AS income_source /* all source data is monthly */
FROM $(targetdbview).$(targetschema).[income_t1_stu_allowance]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
UNION ALL
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'Investments' AS income_source /* all source data is annual */
FROM $(targetdbtable).$(targetschema).[income_t1_investments]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
UNION ALL
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'Other Income' AS income_source /* all source data is annual */
FROM $(targetdbtable).$(targetschema).[income_t1_other_income]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
UNION ALL
SELECT [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date, EOMONTH([period_end_date]) AS period_end_date, SUM([gross_income]) AS gross_income, [taxable_income_indicator]
, 'Tax Credits' AS income_source /* all source data is annual */
FROM $(targetdbtable).$(targetschema).[income_t1_tax_credits]
GROUP BY [snz_uid], DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1), EOMONTH([period_end_date]), [taxable_income_indicator]
)
SELECT *
INTO #merge_7_tier1_tables
FROM union_tables;
RAISERROR('Created #merge_7_tier1_tables with %d ■■■■■■■■■■■, @@ROWCOUNT) WITH NOWAIT;
GO
Create temp table #selfEmployment_tier2 for self employment income:
The self-employment tier-1 module has multiple source frequency - both annual and monthly
COI decision made to aggregate data based on lowest common denominator of source frequency for a given tax year
This means that same start and end dates will be grouped to get total income
If a tax year contains an annual source of income, both monthly and annual incomes for that tax year will be grouped to get total income for tax year
The dates are always beginning or end of month
All self employment income is taxable
DROP TABLE IF EXISTS #selfEmployment_tier2;
/* all start and end dates are set to the start and end of the respective months. Adding tax year which will be used later as a grouping variable */
WITH create_tax_year AS (
SELECT [snz_uid]
, DATEFROMPARTS(YEAR([period_start_date]),MONTH([period_start_date]),1) AS period_start_date
, EOMONTH([period_end_date]) AS period_end_date
, CASE
WHEN MONTH([period_end_date]) between 4 and 12 THEN YEAR([period_end_date])+1
ELSE YEAR([period_end_date])
END AS tax_year
, [gross_income]
,[source_frequency]
FROM $(targetdbtable).$(targetschema).[income_t1_self_employment]
)
/* summing income where dates are same; adding merge indicator */
, create_merge_indicator AS (
SELECT snz_uid
, period_start_date
, period_end_date
, tax_year
, SUM(gross_income) AS gross_income
, source_frequency
, MAX(CASE WHEN source_frequency = 'annual' THEN 1 ELSE 0 END) OVER (PARTITION BY snz_uid, tax_year) as merge_indicator /* adding indicator that checks whether 'annual' exists within tax year. We do this since we sum income based on lowest common denominator. */
FROM create_tax_year
GROUP BY snz_uid, period_start_date, period_end_date, tax_year, source_frequency
)
/* where merge_indicator = 1, sum the gross income and assign new start , end dates */
, grouping_rows AS (
SELECT snz_uid
, DATEFROMPARTS(tax_year-1,4,1) AS period_start_date /* start date = 1/04/xxxx */
, DATEFROMPARTS(tax_year,3,31) AS period_end_date /* end date = 31/03/xxxx */
, SUM(gross_income) AS gross_income
FROM create_merge_indicator
WHERE merge_indicator = 1
GROUP BY snz_uid, tax_year, merge_indicator
UNION ALL /* merging data to bring all monthly and annual data (grouped) together */
SELECT snz_uid
,period_start_date
,period_end_date
,gross_income
FROM create_merge_indicator
WHERE merge_indicator = 0
)
SELECT snz_uid
,period_start_date
,period_end_date
,gross_income
,1 as taxable_income_indicator
, 'Self Employment' AS income_source
INTO #selfEmployment_tier2
FROM grouping_rows;
RAISERROR('Created #selfEmployment_tier2 with %d ■■■■■■■■■■■, @@ROWCOUNT) WITH NOWAIT;
GO
Create temp table #income_support_payment_tier2 for income support payments:
The income support payments tier-1 module has spells with varying number of days - from one day to multiple years
COI decision made to split spells into months so users can select months to arrive at income for any period of time
For the purposes of limiting the length of this table, we will group consecutive full months of income within a tax year
Income support payments has both taxable and non-taxable components - Main benefits and Retrement benefits are taxable, One off and supplementary benefits are not taxable
Create a months look up temp table
(creates month lookup table to split income support payment spells into monthly periods)
DROP TABLE IF EXISTS #mthlookup;
declare @startdate date='19900101';
declare @cutoffdate date=datefromparts(year(GETDATE())+1, 01, 01);
with seq(n) as
(select 0 union all select n+1 from seq
where n+1<DATEDIFF(MONTH,@startdate,@cutoffdate)),
d(d) as
(select dateadd(MONTH,n,@startdate) from seq),
dd as
(select startdate=convert(date,d),
enddate=EOMONTH(convert(date,d))
from d)
select *
into #mthlookup
from dd
option (maxrecursion 0);
RAISERROR('Inserted %d rows in #mthlookup', 0, 1, @@ROWCOUNT) WITH NOWAIT;
CREATE NONCLUSTERED INDEX IX_mthlookup ON #mthlookup (startdate, enddate);
GO
DROP TABLE IF EXISTS #create_unique_id;
/* add unique payment id. Used for grouping later */
SELECT [snz_uid]
,[period_start_date]
,[period_end_date]
,[taxable_income_indicator]
,[gross_income]
, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS payment_unique_id /* unique id created to identify each benefit. Will be used as grouping variable later */
INTO #create_unique_id
FROM $(targetdbview).$(targetschema).[income_t1_inc_support_paymt] /* all source data is spell based */;
RAISERROR('Inserted %d rows in #create_unique_id', 0, 1, @@ROWCOUNT) WITH NOWAIT;
CREATE NONCLUSTERED INDEX IX_isp ON #create_unique_id (period_start_date, period_end_date, payment_unique_id);
GO
DROP TABLE IF EXISTS #income_support_payment_tier2;
/* this breaks spells into months with one row per month */
WITH split_spells AS (
SELECT cid.[snz_uid]
, cid.period_start_date
, cid.period_end_date
, mth.startdate AS month_start_date
, mth.enddate AS month_end_date
, cid.[gross_income] AS total_gross_income
, cid.[taxable_income_indicator]
, cid.payment_unique_id
FROM #create_unique_id cid
INNER JOIN #mthlookup mth ON cid.period_start_date<=mth.enddate AND cid.period_end_date>=mth.startdate
)
/* add tax year - used for grouping later; add row numbers - used for partition later */
, add_variables AS (
SELECT snz_uid
, period_start_date
, period_end_date
, month_start_date
, month_end_date
, CASE
WHEN MONTH([month_end_date]) between 4 and 12 THEN YEAR([month_end_date])+1
ELSE YEAR([month_end_date])
END AS tax_year
, total_gross_income
, taxable_income_indicator
, payment_unique_id
, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS row_nbr
FROM split_spells
)
/* creating merge indicator for rows that have full months */
, create_merge_indicator AS (
SELECT *
,MAX(CASE WHEN (month_start_date >= period_start_date AND month_end_date <= period_end_date) THEN 1 ELSE 0 END) OVER (PARTITION BY row_nbr) as merge_indicator
FROM add_variables
)
/* merge rows that contain full month */
, merging AS (
SELECT snz_uid
, period_start_date
, period_end_date
, MIN(month_start_date) AS month_start_date /* the first date of the consecutive monthly periods */
, MAX(month_end_date) AS month_end_date /* the last date of the consecutive monthly periods */
, total_gross_income
, taxable_income_indicator
FROM create_merge_indicator
WHERE merge_indicator = 1
GROUP BY snz_uid, period_start_date, period_end_date, total_gross_income, taxable_income_indicator, payment_unique_id, tax_year
UNION ALL
SELECT snz_uid
, period_start_date
, period_end_date
, month_start_date
, month_end_date
, total_gross_income
, taxable_income_indicator
FROM create_merge_indicator
WHERE merge_indicator = 0
)
, merged_rows AS (
SELECT *
FROM merging
)
/* calculate gross income per row*/
, number_of_days AS (
SELECT *
/* calculate number of days in each row */
,CASE
WHEN (period_start_date BETWEEN month_start_date AND month_end_date) AND (period_end_date BETWEEN month_start_date AND month_end_date) THEN DATEDIFF(day, period_start_date, period_end_date) + 1
WHEN month_start_date <= period_start_date THEN DATEDIFF(day, period_start_date, month_end_date) + 1
WHEN month_start_date > period_start_date AND month_end_date < period_end_date THEN DATEDIFF(day, month_start_date, month_end_date) + 1
WHEN month_end_date >= period_end_date THEN DATEDIFF(day, month_start_date, period_end_date) + 1
END AS number_days
/* calculate number of days within the whole spell */
,DATEDIFF(day, period_start_date, period_end_date) + 1 AS total_number_days
FROM merged_rows
)
, calculate_gross_income AS (
SELECT *
, CAST(total_gross_income AS DECIMAL(19,2))/CAST(total_number_days AS DECIMAL(19,2))*CAST(number_days AS DECIMAL(19,2)) AS gross_income
FROM number_of_days
)
/* sum gross income by grouping variables */
SELECT
snz_uid
, month_start_date AS period_start_date
, month_end_date AS period_end_date
, SUM(gross_income) AS gross_income
, taxable_income_indicator
, 'Income Support Payments' AS income_source
INTO #income_support_payment_tier2
FROM calculate_gross_income
GROUP BY snz_uid, month_start_date , month_end_date, taxable_income_indicator;
RAISERROR('Inserted %d rows in #income_support_payment_tier2', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
Create Final Tier-2 table which combines all income sources above
DROP TABLE IF EXISTS $(targetdbtable).$(targetschema).$(projprefix)_total_income;
CREATE TABLE $(targetdbtable).$(targetschema).$(projprefix)_total_income (
snz_uid INT NOT NULL
,period_start_date DATE NOT NULL
,period_end_date DATE NOT NULL
,gross_income DECIMAL(19,2) NOT NULL
,taxable_income_indicator TINYINT NOT NULL
,income_source VARCHAR(30) NOT NULL
);
INSERT INTO $(targetdbtable).$(targetschema).$(projprefix)_total_income
SELECT *
FROM #merge_7_tier1_tables
UNION ALL
SELECT *
FROM #selfEmployment_tier2
UNION ALL
SELECT *
FROM #income_support_payment_tier2;
RAISERROR('Inserted %d rows in Sandpit in t2_total_income', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
ALTER TABLE $(targetdbtable).$(targetschema).$(projprefix)_total_income
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);
DROP TABLE IF EXISTS #mthlookup;
DROP TABLE IF EXISTS #merge_7_tier1_tables;
DROP TABLE IF EXISTS #selfEmployment_tier2;
DROP TABLE IF EXISTS #create_unique_id;
DROP TABLE IF EXISTS #split_spells;
DROP TABLE IF EXISTS #add_variables;
DROP TABLE IF EXISTS #create_merge_indicator;
DROP TABLE IF EXISTS #merged_rows;
DROP TABLE IF EXISTS #calculate_gross_income;
DROP TABLE IF EXISTS #income_support_payment_tier2