Total income module collection: Investments

cori.qian
24 June 2024

Module output

SQL:[IDI_Community].[inc_investments].investments_YYYYMM
SAS: libname cmti ODBC dsn=idi_community_srvprd schema=inc_investments; proc print data = cmti.investments_YYYYMM
How to access a code module in the Data Lab : Read here

Purpose of Investments module

The purpose of this module is to construct the income through investments for people in New Zealand and their relevant income spells.
Within this module, Investment income includes:

Key concepts

Investment income

Investment income is the income, net profit or loss received from investments such as rent, Māori land or other leased land, dividends from New Zealand companies, income as the beneficiary of an estate or trust, royalties, interest from the following: banks, other financial institutions, bonds, stocks, money market funds, debentures or securities.

Total income spell

An income spell is the minimum payment period per income type. Investment income in practice may be earned in a daily, weekly, fortnightly, monthly etc depending on the type of investment.

Source frequency

Source frequency is the frequency with which an income type is stored in the data. Source frequency for investment income is annual.

Gross income amount

Gross income amount is the taxable income before tax that a person receives.

Observed tax

Tax recorded as paid to IRD in the IDI. This is the amount of tax that has been has been deducted from source (eg PAYE and tax filing). Observed tax data for investment income can be sourced from IR3, IR4 and IR7 tables.

Note: observed tax deduction is not the same as actual tax liability since this can only be calculated once all income sources have been identified over a tax period. This information has not been included in the initial version of the module.

Granularity

The modules are setup to be event format: One row per event, events have a person, start and end dates, and descriptive information. The variables below will uniquely identify each event in the above table:

Entity id: snz uid
Period: start end, end date
Event information: income source, taxable income indicator

Data sources

IR3: ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014, ir_ir3_2013_to_2020

IR3 is the income tax return for individuals used to confirm the amount of personal income tax to be paid at the end of each tax year with data available from March 2000 to present. The main purpose of IR3 is to include any monetary payments that have not been taxed at source. IR3 is used for self-employment (filed annually by sole traders) and includes non-zero partnership, or shareholder salary income, as well as rental income. IR3 is mandatory for individuals who earned more than $200 (before tax) income other than salary, wages, interest, dividends, and/or taxable Māori authority distributions. IR3 is filed annually.

Starting from July 2017, IR has extended the supply of IR3 information to the IDI to include further information about investment income, including earnings from interest, dividend, estate trust, overseas, and other sources. Also included are non-taxable income from a range of tax credit entitlements, tax rebate, and student loan-liable income.
There may be a substantial lag in reporting IR3 data as individual businesses and those using tax agents have up to a year after the end of the financial year to file an annual tax return with IR.

Note: 3 IR2 tables (2 ad-hoc tables and 1 clean) are required for creating the timeseries of IR3 data. IR3s belonging to people who reported no self-employment, partnership or shareholder income were excluded for tax years prior to 2019. They are in the ad-hoc tables.

PTS: ir_clean.ird_pts

This table contains information on personal tax summary (PTS) with data available from March 2000 to March 2020. PTS is a tax return for individual taxpayers to show their individual income and tax deduction details for a given tax year. This table can be used to capture interest income and dividend income.

Autocalc: ir_clean.ird_autocalc_information:

This table shows information used by IR to calculate automatic assessments with data available from April 2019 to present. This auto-calc process is applied to people whose income is only salary, wages, interest or dividends. This information is received by IR from employers and payers of investment income. Assessments are calculated and refunds are issued automatically.

Practical notes

Period

The Investment income information is from 1999 onwards.

Negative amounts

No exclusions made for negative amounts.

Staged approach to include tax within the total income module

Stage 1 of module creation will only include gross income amounts. Stage 2 (pending) will include observed tax deducted at source since they are available in EMS and MSD tables. This can be used to calculate the tax at a tax year level which includes tax deducted at source and from tax filings. Stage 3 (pending) aims to derive tax at a monthly level. This is complex and will require support from SME to apply rules to derive tax. This can be used to calculate tax for any time period.

References

  1. Document : experimental-income-estimates-from-linked-admin-data-methods-and-results-revised.pdf
  2. When we work out your tax for you: income tax assessments
  3. 2024 Individual income tax assessment - end of year process
  4. Investment income reporting

Key Contacts

Role Organisation Person
Dev lead MBIE Eric Krassoi Peach
Module Coder Nicholson Consulting Linda Martis
Code Supply Stats NZ MSD Ivan Welsh Marc de Boer
Peer review (code) Independent Vinay Benny
Peer review (documentation) IRD Joanne Butterfield
IRD Scott Anderson
Module Steward Stas NZ Cori Qian

Module business rules

Defining spell start and end dates

Start and end dates are based on IR3, AutoCalc and PTS dates. The return period column in the tables is considered to be the end date. Start will be that start of the annual tax year (1-Apr)

Person identifier snz_uid

A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh

Minimum income threshold

Currently no minimum threshold applied

Interoperability with other modules

This module is in the business landscape so could be that it is most closely associated with business profile modules and employment spells. It is also in the socioeconomic landscape where income is a determinant of positive social outcomes, so associated with social housing, education modules, health modules, driving modules, employment assistance modules etc.

Duplication in data

Interest and dividend data comes from IR3, PTS and Autocalc IR tables. APC code applies prioritisation of IR3 > AC > PTS to record interest and dividends and eliminate duplications.

Taxable income indicator

Taxable income indicator to be hard coded in modules. Within investment income module, all income is taxable.

Code logic:

The code is based on the rules used for APC and derived data tables in the IDI.

  1. Identify individuals earning investment income from 3 IR3 tables. To do so, we will
    a. Extract data from IR3 tables, specifically ir_clean.ird_rtns_keypoints_ir3, ir_ir3_2000_to_2014 and ir_ir3_2013_to_2020 from IDI_Adhoc. Table prioritisation will occur with ir_clean.ird_rtns_keypoints_ir3 being the first source of data, followed by adhoc 2000_to_2014 and adhoc 2013_to_2020.
    b. We will use data which contains rent amounts, gross interest amounts, dividends and estate trust income amounts.
    c. Remove duplicates in tables by using maximum of location number, return_version_number and ird_timestamp_date.
  2. Using the data obtained above in point 1,
    a. aggregate rental income by person’s unique identifiers and time period. Relabel income as S03-IR3.
    b. aggregate gross interest by person’s unique identifiers and time period. Relabel income as INT-IR3.
    c. aggregate gross dividend by person’s unique identifiers and time period. Relabel income as DIV-IR3.
    d. aggregate estate trust income by person’s unique identifiers and time period. Relabel income as EST-IR3.
  3. Identify individuals earning interest and dividend income from personal tax summary table (ir_clean.ird_pts table).
    a. Remove duplicates in table by using maximum of snz_uid and pts_timestamp_date.
  4. Using the data obtained above in point 3,
    a. Relabel interest income as INT-PTS.
    b. Relabel dividend income as DIV-PTS.
  5. Identify individuals earning interest, dividend, māori authority distributions and portfolio entity investment income from autocalc table (ir_clean.ird_autocalc_information table).
    a. Remove duplicates in table by using maximum of snz_uid, return_version_nbr number and ac_processing_date.
  6. Using the data obtained above in point 5,
    a. Relabel interest income as INT-AA.
    b. Relabel dividend income as DIV-AA.
    c. Relabel māori authority distributions income as MAD-AA.
    d. Relabel portfolio entity investment income as PIE-AA.
  7. To summarise, all the labelled income above becomes the investment income.
    a. For interest and dividend income, persons with income recorded in IR3, PTS and/or autocalc table within the same time period are considered as duplcates. To ensure only one entry for interest and divident income is recorded, we will prioritise IR3 data first, AA data second and PTS data third (IR3 > AC > PTS)

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. {idicleanversion}: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.

Dependencies

[IDI_Clean_YYYYMM].[ir_clean].[ird_rtns_keypoints_ir3]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2000_to_2014]
[IDI_Adhoc].[clean_read_IR].[ir_ir3_2013_to_2020]
[IDI_Clean_YYYYMM].[ir_clean].[ird_pts] 
[IDI_Clean_YYYYMM].[ir_clean].[ird_autocalc_information] 

Outputs

{targetdb}.{targetschema}.{projprefix}_investments_{refresh}

Variable Descriptions

Aspect Variables Description
Entity snz_uid Current refresh snz_uid
Period period_start_date period_end_date Payment start date Payment end date
Event source_frequency Frequency of income recorded in source data Note that investments are recorded in IR3, PTS, and Autocalc tables as annual income
Information income_source Type of income being derived in this module
Note that: - INT-IR3 is gross interest income from IR3 tables - INT-AA is gross interest income from Autocalc table - INT-PTS is gross interest income from PTS table - DIV-IR3 is gross dividend income from IR3 tables - DIV-AA is gross dividend income from Autocalc table - DIV-PTS is gross dividend income from PTS table - EST-IR3 is estate trust income from IR3 tables - MAD-AA is Total Maori Authority Distribution from Autocalc table - PIE-AA is Total Portfolio Entity Investment income from Autocalc table - S03-IR3 is rental income income from IR3 tables
taxable_income_indicator Indicator for whether income is taxable Note that: 1:income is taxable 0:income is not taxable Investments are taxable
gross_income Income before tax

Module Version & Change History

Date Version Comments
2023-07 Original code sourced from APC code created by Stats NZ (Ivan Welsh) in the IDI
2024-06 Initial streamlining and preparation for code module by Linda Martis, Nicholson Consulting

Code

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"

/* Assign the target database to which all the components need to be created in. */
use $(targetdb);

/*
This script combines a number of different source of IR3 data into a single table.
Sources:IR3 table, adhoc 2000_to_2014, adhoc 2013_to_2020

Duplicates are removed from the data by filtering for the maximum return_version_nbr,
ird_timestamp_date, source_code, snz_unique_nbr and finally location_nbr.

Create the storage table. This follows the format of the refresh IR3 table,
but without the leading ir_ir3_ prefix.
*/

DROP TABLE IF EXISTS #CombinedIR3;
CREATE TABLE #CombinedIR3 (
	    snz_uid                   INT NOT NULL
       ,snz_ird_uid               INT NOT NULL
       ,location_nbr              INT NOT NULL
       ,return_period_date        DATE NOT NULL
       ,snz_unique_nbr            BIGINT NOT NULL
       ,return_version_nbr        SMALLINT NULL
       ,income_imp_ind            VARCHAR(1) NOT NULL
       ,net_rents_826_amt         DECIMAL(13,2) NULL
       ,gross_interest_amt        DECIMAL(13,2) NULL
       ,gross_dividend_amt        DECIMAL(13,2) NULL
       ,estate_trust_income_amt   DECIMAL(13,2) NULL
	   ,sum_all_income_amt		  DECIMAL(13,2) NULL
       ,ird_timestamp_date        DATE NULL
	    /* 1 for refresh IR3 table, 2 for adhoc 2000_to_2014, 3 for adhoc 2013_to_2020 */
       ,source_table_code         TINYINT NOT NULL 
       ,PRIMARY KEY (snz_ird_uid, return_period_date)
);

/* Combine the source tables into one */
INSERT INTO #CombinedIR3
SELECT con.snz_uid AS snz_uid, con.snz_ird_uid AS snz_ird_uid, ir_ir3_location_nbr AS location_nbr, ir_ir3_return_period_date AS return_period_date, ir_ir3_snz_unique_nbr AS snz_unique_nbr, ir_ir3_return_version_nbr AS return_version_nbr, 
	   ir_ir3_income_imp_ind AS income_imp_ind, 
       ir_ir3_net_rents_826_amt AS net_rents_826_amt, 
       ir_ir3_gross_interest_amt AS gross_interest_amt, ir_ir3_gross_dividend_amt AS gross_dividend_amt, ir_ir3_estate_trust_income_amt AS estate_trust_income_amt,
       isnull(ir_ir3_net_rents_826_amt,0)+
       isnull(ir_ir3_gross_interest_amt,0)+
       isnull(ir_ir3_gross_dividend_amt,0)+
       isnull(ir_ir3_estate_trust_income_amt,0)
	   AS sum_all_income_amt, 
	   ir_ir3_ird_timestamp_date as ird_timestamp_date, source_table_code
	FROM( 
		SELECT *, ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, ir_ir3_return_period_date ORDER BY ir_ir3_return_version_nbr desc, ir_ir3_ird_timestamp_date desc, source_table_code asc, ir_ir3_snz_unique_nbr desc, ir_ir3_location_nbr desc) AS row_nbr 
			FROM ( 
                   SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
													 ir_ir3_income_imp_ind, 
                                                     ir_ir3_net_rents_826_amt, 
                                                     ir_ir3_gross_interest_amt, ir_ir3_gross_dividend_amt, ir_ir3_estate_trust_income_amt, 
													 ir_ir3_ird_timestamp_date, 1 AS source_table_code
                                                FROM $(idicleanversion).ir_clean.ird_rtns_keypoints_ir3
                                               UNION ALL
                     SELECT snz_ird_uid, ir_ir3_location_nbr, ir_ir3_return_period_date, ir_ir3_snz_unique_nbr, ir_ir3_return_version_nbr, 
													 ir_ir3_income_imp_ind, 
                                                     ir_ir3_net_rents_826_amt, 
                                                     ir_ir3_gross_interest_amt, ir_ir3_gross_dividend_amt, ir_ir3_estate_trust_income_amt, 
													 ir_ir3_ird_timestamp_date, 2 AS source_table_code
                                                FROM IDI_Adhoc.clean_read_IR.ir_ir3_2000_to_2014
                                               UNION ALL
                     SELECT snz_ird_uid, location_number AS ir_ir3_location_nbr, period AS ir_ir3_return_period_date,
                                                     ROW_NUMBER() OVER (PARTITION BY snz_ird_uid, period ORDER BY return_version, gross_interest desc, rents desc, gross_dividend desc, estate_income desc) AS ir_ir3_snz_unique_nbr,
                                                     return_version AS ir_ir3_return_version_nbr, 
                                                     'N' AS ir_ir3_income_imp_ind, 
													 rents AS ir_ir3_net_rents_826_amt, 
                                                     gross_interest AS ir_ir3_gross_interest_amt, gross_dividend AS ir_ir3_gross_dividend_amt,
                                                     estate_income AS ir_ir3_estate_trust_income_amt, 
                                                     CAST(timestamp AS DATE) AS ir_ir3_ird_timestamp_date, 3 AS source_table_code
                                                FROM IDI_Adhoc.clean_read_IR.ir_ir3_2013_to_2020
                                             ) tu
        ) t5
 INNER JOIN $(idicleanversion).security.concordance con
       ON con.snz_ird_uid = t5.snz_ird_uid
 WHERE row_nbr = 1
   AND YEAR(ir_ir3_return_period_date) BETWEEN 2000 AND year(GETDATE());

RAISERROR('Inserted %d rows from IR3 into #CombinedIR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
This script creates #IncomeTaxYr table
Only annual data stored from #CombinedIR3
*/

/* Create the equivalent of the data.income_tax_yr table */
DROP TABLE IF EXISTS #IncomeTaxYr;
CREATE TABLE #IncomeTaxYr (
    year_nbr DATE NOT NULL
    ,snz_uid INT NOT NULL
    ,snz_ird_uid INT NOT NULL
    ,snz_employer_ird_uid INT NOT NULL
    ,income_source_code CHAR(7) NOT NULL
    ,withholding_type_code CHAR(1) NOT NULL

    ,tot_yr_amt DECIMAL(19,2) NOT NULL
);

/*
Add self employed income from IR3 to tax_yr table - Rental income
Adds the (S03) - Rental income from IR3
Rules: (A) IR3 net rents amount <> 0
*/
INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT (ir3.return_period_date) AS year_nbr
       ,MIN(ir3.snz_uid) AS snz_uid
       ,MIN(ir3.snz_ird_uid) AS snz_ird_uid
       ,MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid
       ,'S03-IR3' AS income_source_code
       ,'X' AS withholding_type_code
       ,SUM(ir3.net_rents_826_amt) AS tot_yr_amt
  FROM #CombinedIR3 ir3
 WHERE ir3.net_rents_826_amt <> 0
   AND YEAR(ir3.return_period_date) BETWEEN 2000 AND year(GETDATE()) 
 GROUP BY (ir3.return_period_date), ir3.snz_ird_uid;

RAISERROR('Inserted %d rows for S03-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Add self employed income from IR3 to tax_yr table - Gross interest
Adds the (INT) - Gross interest from IR3
Rules: (A) IR3 gross_interest_amt <> 0
*/ 
INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT (ir3.return_period_date) AS year_nbr,
       MIN(ir3.snz_uid) AS snz_uid,
       MIN(ir3.snz_ird_uid) AS snz_ird_uid,
       MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid,
       'INT-IR3' AS income_source_code,
       'X' AS withholding_type_code,
       SUM(ir3.gross_interest_amt) AS tot_yr_amt
  FROM #CombinedIR3 ir3
 WHERE ir3.gross_interest_amt <> 0
   AND YEAR(ir3.return_period_date) BETWEEN 2000 AND year(GETDATE()) 
 GROUP BY (ir3.return_period_date), ir3.snz_ird_uid;

RAISERROR('Inserted %d rows for INT-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO
/*
Add self employed income from IR3 to tax_yr table - Gross dividend
Adds the (DIV) - Gross dividend from IR3
Rules: (A) IR3 gross_dividend_amt <> 0
*/

INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT (ir3.return_period_date) AS year_nbr,
       MIN(ir3.snz_uid) AS snz_uid,
       MIN(ir3.snz_ird_uid) AS snz_ird_uid,
       MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid,
       'DIV-IR3' AS income_source_code,
       'X' AS withholding_type_code,
       SUM(ir3.gross_dividend_amt) AS tot_yr_amt
  FROM #CombinedIR3 ir3
 WHERE ir3.gross_dividend_amt <> 0
   AND YEAR(ir3.return_period_date) BETWEEN 2000 AND year(GETDATE())
 GROUP BY (ir3.return_period_date), ir3.snz_ird_uid;

RAISERROR('Inserted %d rows for DIV-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Add self employed income from IR3 to tax_yr table - Estate trust income
 Adds the (EST) - Estate trust income from IR3
 Rules: (A) IR3 estate_trust_income_amt <> 0
*/
INSERT INTO #IncomeTaxYr WITH (TABLOCK) (
    year_nbr, snz_uid, snz_ird_uid, snz_employer_ird_uid, income_source_code,
    withholding_type_code, tot_yr_amt
)
SELECT (ir3.return_period_date) AS year_nbr,
       MIN(ir3.snz_uid) AS snz_uid,
       MIN(ir3.snz_ird_uid) AS snz_ird_uid,
       MIN(ir3.snz_ird_uid) AS snz_employer_ird_uid,
       'EST-IR3' AS income_source_code,
       'X' AS withholding_type_code,
       SUM(ir3.estate_trust_income_amt) AS tot_yr_amt
  FROM #CombinedIR3 ir3
 WHERE ir3.estate_trust_income_amt <> 0
   AND YEAR(ir3.return_period_date) BETWEEN 2000 AND year(GETDATE()) 
 GROUP BY (ir3.return_period_date), ir3.snz_ird_uid;

RAISERROR('Inserted %d rows for EST-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
This script creates #PTSIncome and #AAIncome table
Only annual data stored from ird.pts and ird_autocalc_information
*/

/*
Deduplicated PTS
 Deduplicates the PTS table
*/ 

DROP TABLE IF EXISTS #PTSIncomeDedup;
CREATE TABLE #PTSIncomeDedup (
    year_nbr DATE NOT NULL,
    snz_uid INT NOT NULL,
    interest_amt DECIMAL(19,2) NULL,
    dividend_amt DECIMAL(19,2) NULL,
    rwt_amt DECIMAL(19,2) NULL,
    dwt_amt DECIMAL(19,2) NULL
    ,PRIMARY KEY (snz_uid, year_nbr)
);

INSERT INTO #PTSIncomeDedup WITH (TABLOCK) ( 
    year_nbr, snz_uid, interest_amt, dividend_amt, rwt_amt, dwt_amt
)
SELECT year_nbr, snz_uid, interest_amt, dividend_amt, rwt_amt, dwt_amt
  FROM (/* Filter for max timestamp */
        SELECT *, MAX(ir_pts_snz_unique_nbr) OVER (PARTITION BY snz_uid, year_nbr) AS max_nbr
          FROM (/* Filter for valid interest amounts */
                SELECT snz_uid,
                       (ir_pts_return_period_date) AS year_nbr,
                       ir_pts_tot_interest_amt AS interest_amt,
                       ir_pts_tot_dividend_amt AS dividend_amt,
                       ir_pts_tot_rwt_amt AS rwt_amt,
                       ir_pts_tot_dwt_amt AS dwt_amt,
                       ir_pts_snz_unique_nbr,
                       ir_pts_timestamp_date,
                       MAX(ir_pts_timestamp_date) OVER (PARTITION BY snz_uid, 
                                                                     ir_pts_return_period_date) AS max_timestamp
                  FROM $(idicleanversion).ir_clean.ird_pts
                 WHERE YEAR(ir_pts_return_period_date) BETWEEN 2000 AND year(GETDATE())
                ) t
         WHERE ir_pts_timestamp_date = max_timestamp
        ) t3
 WHERE ir_pts_snz_unique_nbr = max_nbr;
 
RAISERROR('Inserted %d rows for dedpulicated PTS', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*  Create the table for PTS data */ 
DROP TABLE IF EXISTS #PTSIncome;
CREATE TABLE #PTSIncome (
    year_nbr DATE NOT NULL,
    snz_uid INT NOT NULL,
    income_source_code CHAR(7) NOT NULL,
    tot_yr_amt DECIMAL(19,2) NOT NULL,
);
/*
PTS Interest
Adds the INT source from the PTS table. Is deduplicated
first by timestamp then by unique_nbr
*/
INSERT INTO #PTSIncome WITH (TABLOCK) ( 
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, 'INT-PTS' AS income_source_code, interest_amt
  FROM #PTSIncomeDedup
 WHERE interest_amt <> 0;

RAISERROR('Inserted %d rows for INT-PTS', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
PTS Dividends
Adds the DIV source from the PTS table. Is deduplicated
first by timestamp then by unique_nbr
*/
INSERT INTO #PTSIncome WITH (TABLOCK) ( 
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, 'DIV-PTS' AS income_source_code, dividend_amt
  FROM #PTSIncomeDedup
 WHERE dividend_amt <> 0;

RAISERROR('Inserted %d rows for DIV-PTS', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/* Create the table for AA data */
DROP TABLE IF EXISTS #AAIncome;
CREATE TABLE #AAIncome (
    year_nbr DATE NOT NULL,
    snz_uid INT NOT NULL,
    income_source_code CHAR(7) NOT NULL,
    tot_yr_amt DECIMAL(19,2) NOT NULL,
);

/*
Deduplicated AA
Deduplicates the AA table
*/

DROP TABLE IF EXISTS #AAIncomeDedup;
CREATE TABLE #AAIncomeDedup (
    year_nbr DATE NOT NULL,
    snz_uid INT NOT NULL,
    interest_amt DECIMAL(19,2) NULL,
    dividend_amt DECIMAL(19,2) NULL,
    maori_authority_amt DECIMAL(19,2) NULL,
    pie_amt DECIMAL(19,2) NULL,
    rwt_amt DECIMAL(19,2) NULL,
    dwt_amt DECIMAL(19,2) NULL,
    pir_amt DECIMAL(19,2) NULL
    ,PRIMARY KEY (snz_uid, year_nbr)
);
 
INSERT INTO #AAIncomeDedup WITH (TABLOCK) ( 
    year_nbr, snz_uid, interest_amt, dividend_amt, maori_authority_amt, pie_amt, rwt_amt, dwt_amt, pir_amt
)
SELECT year_nbr, snz_uid, interest_amt, dividend_amt, maori_authority_amt, pie_amt, rwt_amt, dwt_amt, pir_amt
  FROM (/* Filter for max version number */
        SELECT *, MAX(ir_ac_snz_unique_nbr) OVER (PARTITION BY snz_uid, year_nbr) AS max_nbr
          FROM (/* Filter for max processing date */
                SELECT *, MAX(ir_ac_return_version_nbr) OVER (PARTITION BY snz_uid, year_nbr) AS max_version
                  FROM (/* Filter for valid interest amounts */
                        SELECT snz_uid,
                               (ir_ac_return_period_date) AS year_nbr,
                               ir_ac_tot_gross_intrst_amt AS interest_amt,
                               ir_ac_tot_gross_dvdnd_amt AS dividend_amt,
                               ir_ac_tot_ma_authrty_distbtn_amt AS maori_authority_amt,
                               ir_ac_tot_pie_income_amt AS pie_amt,
                               ir_ac_rwt_on_interest_amt AS rwt_amt,
                               ir_ac_rwt_on_dvdnd_amt AS dwt_amt,
                               ir_ac_tot_pie_deductions_amt AS pir_amt,
                               ir_ac_snz_unique_nbr,
                               ir_ac_return_version_nbr,
                               ir_ac_processing_date,
                               MAX(ir_ac_processing_date) OVER (PARTITION BY snz_uid, 
                                                                             ir_ac_return_period_date) AS max_processing
                          FROM $(idicleanversion).ir_clean.ird_autocalc_information
                         WHERE YEAR(ir_ac_return_period_date) BETWEEN 2000 AND year(GETDATE()) 
                        ) t
                 WHERE ir_ac_processing_date = max_processing
                ) t3
         WHERE ir_ac_return_version_nbr = max_version
        ) t4
 WHERE ir_ac_snz_unique_nbr = max_nbr;

RAISERROR('Inserted %d rows for dedpulicated AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
AA Interest
Adds the INT source from the AA table. Is deduplicated
first by processing date, then timestamp then by unique_nbr
*/
INSERT INTO #AAIncome WITH (TABLOCK) ( 
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, 'INT-AA' AS income_source_code, interest_amt
  FROM #AAIncomeDedup
 WHERE interest_amt <> 0;

RAISERROR('Inserted %d rows for INT-AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
AA Dividends
Adds the DIV source from the AA table. Is deduplicated
first by processing date, then timestamp then by unique_nbr
*/
INSERT INTO #AAIncome WITH (TABLOCK) ( 
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, 'DIV-AA' AS income_source_code, dividend_amt
  FROM #AAIncomeDedup
 WHERE dividend_amt <> 0;

RAISERROR('Inserted %d rows for DIV-AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
AA Maori authority distributions
Adds the MAD source from the AA table. Is deduplicated
first by processing date, then timestamp then by unique_nbr
*/
INSERT INTO #AAIncome WITH (TABLOCK) ( 
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, 'MAD-AA' AS income_source_code, maori_authority_amt
  FROM #AAIncomeDedup
 WHERE maori_authority_amt <> 0;

RAISERROR('Inserted %d rows for MAD in AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
AA PIE Income
Adds the PIE source from the AA table. Is deduplicated
first by processing date, then timestamp then by unique_nbr
*/
INSERT INTO #AAIncome WITH (TABLOCK) ( 
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, 'PIE-AA' AS income_source_code, pie_amt
  FROM #AAIncomeDedup
 WHERE pie_amt <> 0;

RAISERROR('Inserted %d rows for PIE-AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Create table #IRCombined to combine investment sources from IR3, AA and PTS
When combining the IR3, PTS and AA INT and DIV sources, the priority order is IR3 > AA > PTS.
Start by adding all #AggregateIR3Income INT/DIV data to the table, followed by #AAIncome and lastly by #PTSIncome and then remaining investments from #AggregateIR3Income and #AAIncome
*/

/*
IR3 INT/DIV data
*/

 DROP TABLE IF EXISTS #IRCombined; 
CREATE TABLE #IRCombined (
    year_nbr DATE NOT NULL,
    snz_uid INT NOT NULL,
    income_source_code CHAR(7) NOT NULL
    ,tot_yr_amt DECIMAL(19,2) NOT NULL
);

INSERT INTO #IRCombined WITH (TABLOCK) (
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr
	, MIN(snz_uid) AS snz_uid 
	, income_source_code
	, SUM(tot_yr_amt) AS  tot_yr_amt 
  FROM #IncomeTaxYr 
 WHERE income_source_code IN ('INT-IR3', 'DIV-IR3', 'TAX-IR3') 
 GROUP BY year_nbr, snz_uid, income_source_code;

RAISERROR('Inserted %d rows of INT-IR3, DIV-IR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
AA INT/DIV data
*/
INSERT INTO #IRCombined WITH (TABLOCK) (
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, income_source_code, tot_yr_amt
  FROM #AAIncome aa
 WHERE income_source_code IN ('INT-AA ', 'DIV-AA ')
   AND NOT EXISTS ( SELECT 1
                      FROM ( SELECT * 
                               FROM #IRCombined 
                              WHERE income_source_code IN ('INT-IR3', 'DIV-IR3')
                           ) ir3
                     WHERE aa.year_nbr = ir3.year_nbr
                       AND aa.snz_uid = ir3.snz_uid
                       AND LEFT(aa.income_source_code, 3) = LEFT(ir3.income_source_code, 3)
                   );

RAISERROR('Inserted %d rows of INT-AA, DIV-AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
PTS INT/DIV data
*/
INSERT INTO #IRCombined WITH (TABLOCK) (
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, income_source_code, tot_yr_amt
  FROM #PTSIncome pts
 WHERE income_source_code IN ('INT-PTS', 'DIV-PTS')
   AND NOT EXISTS ( SELECT 1
                      FROM ( SELECT * 
                               FROM #IRCombined 
                              WHERE LEFT(income_source_code, 3) IN ('INT', 'DIV')
                           ) ir3aa
                     WHERE pts.year_nbr = ir3aa.year_nbr
                       AND pts.snz_uid = ir3aa.snz_uid
                       AND LEFT(pts.income_source_code, 3) = LEFT(ir3aa.income_source_code, 3)
                   );

RAISERROR('Inserted %d rows of INT-PTS, DIV-PTS', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Remaining DerivedIR3 data, excluding zeroes
*/
INSERT INTO #IRCombined WITH (TABLOCK) (
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr
	, MIN(snz_uid) AS snz_uid
	, income_source_code
	, SUM(tot_yr_amt) AS  tot_yr_amt
  FROM #IncomeTaxYr 
 WHERE income_source_code NOT IN ('INT-IR3', 'DIV-IR3', 'TAX-IR3',
                                  'INT-AA ', 'DIV-AA ', 'RWT-AA ', 'DWT-AA ',
                                  'INT-PTS', 'DIV-PTS', 'RWT-PTS', 'DWT-PTS')
GROUP BY year_nbr, snz_uid, income_source_code;

RAISERROR('Inserted %d rows of non INT/DIV for DerivedIR3', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Remaining AA data (PTS has no more data)
*/
INSERT INTO #IRCombined WITH (TABLOCK) (
    year_nbr, snz_uid, income_source_code, tot_yr_amt
)
SELECT year_nbr, snz_uid, income_source_code, tot_yr_amt
  FROM #AAIncome
 WHERE income_source_code NOT IN ('INT-AA ', 'DIV-AA ', 'RWT-AA ', 'DWT-AA ');

RAISERROR('Inserted %d rows of non INT/DIV for AA', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

/*
Create a Sandpit table using #IRCombined 
Note: data stored as tax year data. No monthy data.
*/

DROP TABLE IF EXISTS $(targetdb).$(targetschema).$(projprefix)_investments;
CREATE TABLE $(targetdb).$(targetschema).$(projprefix)_investments (
	 snz_uid INT NOT NULL
	,period_start_date DATE NOT NULL
	,period_end_date DATE NOT NULL
	,source_frequency CHAR(7) NOT NULL
	,income_source CHAR(7) NOT NULL
	,taxable_income_indicator TINYINT NOT NULL 
	,gross_income DECIMAL(19,2) NULL
	);

INSERT INTO $(targetdb).$(targetschema).$(projprefix)_investments

SELECT
	snz_uid
	,DATEFROMPARTS(YEAR(year_nbr)-1, MONTH(DATEADD(month,1,year_nbr)), 01) as 'period_start_date'
	,year_nbr as 'period_end_date'
	,'annual' as source_frequency
	,income_source_code as income_source
	,1 as taxable_income_indicator
	,tot_yr_amt as gross_income
FROM #IRCombined
WHERE tot_yr_amt <> 0

RAISERROR('Inserted %d rows in final table in Sandpit', 0, 1, @@ROWCOUNT) WITH NOWAIT;
GO

ALTER TABLE $(targetdb).$(targetschema).$(projprefix)_investments
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION=PAGE);

DROP TABLE IF EXISTS #CombinedIR3;
DROP TABLE IF EXISTS #IncomeTaxYr;
DROP TABLE IF EXISTS #PTSIncomeDedup;
DROP TABLE IF EXISTS #PTSIncome;
DROP TABLE IF EXISTS #AAIncome;
DROP TABLE IF EXISTS #AAIncomeDedup;
DROP TABLE IF EXISTS #IRCombined