Ethnic Communities' ethnicities

nick.preval
27 June 2025

Module Output

SQL:[IDI_Community].[eth_ec_ethnicities].[ec_ethnicities_YYYYMM]
SAS: libname cm_ec_ethnicities dsn=idi_community_srvprd schema=eth_ec_ethnicities; run ;
How to access a code module in the Data Lab:Read here

Purpose

The purpose of this module is to provide IDI users with ethnicty data relating to the Ethnic Communities populations that the Ministry for Ethnic Communities (MEC) serves coded
according to the Stats NZ published Ethnic Communities View of ETHNIC05 V2.1 concordance available here:
https://aria.stats.govt.nz/aria/#ConcordanceView:uri=http://stats.govt.nz/cms/ConcordanceVersion/QtaGtA8eVTvtszOV

The expanded definition of African (African+) developed by MEC which includes ethnicities such as Afrikaans, and Zimbabwean European that are not included in that
standard Stats NZ definition may increase the potential to analyse and output African data or MELAA data using the MELAA+ (Middle Eastern, Latin American and African+) definition.
For example in the 2023 Census there were ten thousand people under “Other” which our African+ definition includes as African, this represents a significant increase in population size.
Similarly, the MEC Continental European grouping is not part of the standard Stats NZ ethnicity approach but is a separate group in the Ethnic Communities View of ETHNIC05 V2.1.
We hope that making this grouping more accessible to IDI users may increase the visibility of this group in IDI analyses and output. We also include ethnicity data for the following
Ethnic Communities ethnicities: Chinese, Indian, South East Asian, Other Asian, Latin American, and Middle Eastern. For completeness we include the following Non Ethnic Communities ethnicities:
Other European, Maori and Pacific Peoples. Finally, we include European and MELAA ethnic groupings based on the Standard Stats NZ definitions to aid in comparisons researchers way
want to make.

Comparison against other sources

Comparison of Census 2023 individual data outputted by Stats NZ using our Ethnic Communities ethnicity definition with Census 2023 data created using this code showed a high degree of similarity in terms of proportions for Asian and MELAA+, and a small discrepancy in the proportion for Continental Europeans (within 1%).

References & Contacts

  1. Aria concordance between Ethnicity New Zealand Standard Classification 2005 V2.1 and Ethnic Communities View of ETHNIC05 V2.1
    https://aria.stats.govt.nz/aria/#ConcordanceView:uri=http://stats.govt.nz/cms/ConcordanceVersion/QtaGtA8eVTvtszOV
  2. Personal details table documentation: “DRAFT IDI Personal Details dervided table - under the hood” provided by Caitlin Henderson (Stats NZ)
  3. This code built on “prioritised_level2_ethnicity”, SQL code written by Craig Wright and made available via Github:
    Social Wellbeing Agency. Definitions library. Source code. GitHub - nz-social-wellbeing-agency/definitions_library: Definitions from SWA's projects suitable for use with the dataset assembly tool.
  4. Fiona Thomson - Director, Analytics Monitoring and Evaluation (AME) - Ministry for Ethnic Communities - @fiona.thomson
  5. Nick Preval - Senior Analyst - Ministry for Ethnic Communities - @nick.preval
  6. Deb Potter - Principal Analyst - Ministry for Ethnic Communities - @deb.potter

Code module technical information

The following sections provide detail on how the EC_ethnicity table is built in the IDI.

Community of Interest

Role Organisation Name
Lead Developer Ministry for Ethnic Communities Nick Preval
Peer reviewers(code) Keith McLeod Consulting Keith McLeod
Ministry for Ethnic Communities Wimcor von Tonder
Keith McLeod Consulting Keith McLeod
Peer reviewers(documentation) Ministry for Ethnic Communities Deb Potter
Ministry for Ethnic Communities Fiona Thomson
Keith McLeod Consulting Keith McLeod
Stats NZ Hannes Diener
Policy expert Ministry for Ethnic Communities Deb Potter
Module steward Ministry for Ethnic Communities Fiona Thomson

Stewardship team

Role Organisation Name
Lead Developer Ministry for Ethnic Communities Nick Preval
Policy expert Ministry for Ethnic Communities Deb Potter
Module steward Ministry for Ethnic Communities Fiona Thomson

Module Business Rules

  1. The following sources are used to obtain ethnicity information. Group and rank information are taken from the Personal Details table construction methodology.
Source Abbreviation Group Rank Eth. data avail.
Census 2023 CEN2023* 1 1 L4
Census 2018 CEN2018* 1 2 L4
Census 2013 CEN2013 1 3 L4
Department of Internal Affairs DIA 1 4 L4
Ministry of Health MOH 1 5 L2
Ministry of Education MOE 1 6 L3
Accident Compensation Corp ACC 1 7 L2
Ministry of Social Development MSD 1 8 L2
Student loan data from Studylink SLA_MSD 2 1 L3
Household Economic Survey HES 2 2 L1
Survey of Family Income and Employment SOFIE 2 3 L2
Longitudinal Immigration Survey of NZ LISNZ 2 4 L4
Household Labour Force Survey HLFS 2 5 L4 (L1 pre-2016)
Auckland City Mission ACM 2 6 L2
General Social survey GSS 2 7 L4
PIAAC PIAAC 2 8 L4
NZ Crime and Victims Survey NZCVS 2 9 L1, L2 (Indian and Chinese only), L4
Working for Families WFF 3 1 L1
Student loan data from Ministry of Education SLA_SLAM_MOE 3 2 L1
Student loan data from Inland Revenue SLA_SLAM_IR 3 3 L1
*Imputed data is not used as per Personal Details table approach
  1. All available ethnicity information from relevant tables is processed into four columns (L1,L2,L3,L4). Lower level values are derived if necessary:
    e.g. if a source has L4 data only, L1-L3 are generated by the code.

  2. Rows which do not contain any L1 values other than 9 are dropped (consistent with Personal Details table business rules).

  3. The following ethnicity variables are generated using the four columns:

Variable name Rule used
Ethnic Community groups
Asian L1 = 4
Chinese L2 = 42 (Chinese)
Indian L2 = 43 (Indian)
Other Asian L2 = 40 (Asian nfd ) or
L3 = 444 (Other Asian)
South East Asian L2 = 41 (South East Asian)
MELAA+ L1 = 5 (MELAA) or
L4 = 61115 (Mauritian), 61116 (Seychelles), 61117 (Other South African),
12948 (South African European), 12949 (Afrikaner), 12950 (Zimbabwean European)
African+ L2 = 53 (African) or
L4 = 61115 (Mauritian), 61116 (Seychelles), 61117 (Other South African),
12948 (South African European), 12949 (Afrikaner), 12950 (Zimbabwean European)
Latin American L2 = 52 (Latin American)
Middle Eastern L2 = 51 (Middle Eastern)
Continental European L3 = 122 (Dutch), 123 (Greek), 124 (Polish), 125 (South Slav), 126 (Italian),
127 (German) or
L4 = 12911 (Albanian), 12912 (Armenian), 12913 (Austrian), 12914 (Belgian),
12915 (Bulgarian), 12916 (Belorussian), 12918 (Cypriot nfd), 12919 (Czech),
12920 (Danish), 12921 (Estonian), 12922 (Finnish), 12923 (Flemish),
12924 (French), 12926 (Hungarian), 12927 (Icelandic), 12928 (Latvian),
12929 (Lithuanian), 12930 (Maltese), 12931 (Norwegian), 12932 (Portuguese),
12933 (Romanian), 12934 (Romani), 12935 (Russian) ,12937 (Slavic),
12938 (Slovak), 12939 (Spanish), 12940 (Swiss), 12941 (Swedish),
12942 (Ukrainian)
---------------------------------------------------------------------------------------
Non-Ethnic Community groups
Other European L2 = 10 (European NFD) or 11 (New Zealand European)
L3 = 121 (British and Irish), 128 (Australian European)
L4 = 12943 (American), 12945 (Canadian), 12947 (New Caledonian), 12999 (European NEC)
Maori L1 = 2
Pacific Peoples L1 = 3
---------------------------------------------------------------------------------------
Standard Stats NZ groups
European L1 = 1
MELAA L1 = 5
  1. Where the code generates multiple rows for a given source and snz_uid, rows are collapsed into a single row, preserving all ethnicity information.

  2. The highest ranked source/row for each snz_uid is retained.

  3. A “Total Ethnic Communities” variable is derived (if Asian = 1 or MELAA+ = 1 or Continental European = 1)

  4. Zeros are replaced with NULL where there is not enough information to be certain that they are accurate. For example, if the top ranked data source for a particular snz_uid
    was L1 and the only L1 ethnicity was a 6 (Other) then we would code MELAA+ as NULL rather than zero because the 6 is consistent with L4 variables such as 61115 (Mauritian).

Open Issues/Comments

  1. This module draws ethnicity data from a broad range of datasets - the majority of these datasets do not contain Level 4 data. We were concerned to ensure that users do
    not conflate cases where there is no information about a given ethnicity and cases where there is but the correct value is 0. We used the approach described above, but more
    complex methods which combine the top ranked data sources available for each level for each snz_uid were considered as a different way of addressing this issue. Ultimately
    the increased complexity was not felt to be justified. Future iterations of this module might explore this question further and also review whether adopting the Personal
    Details table ranking is the best approach.

  2. Note that “Other Asian” is not a residual category, although it may appear to be so at first glance to users. The Ethnic Communities definition of Other Asian is an L3
    grouping and differs from the Stats NZ definition which is a L2 grouping and functions closer to being a residual category (except that it excludes Asian nfd).
    Ethnic Communities “Other Asian” includes Asian nfd but does not include Sri Lankan (L3 = 441),Japanese (L3 = 442), or Korean (L3 = 443) and are not
    included in the current table due to the relatively small size of these groups.

Parameters

The following parameters should be supplied to this module to run it in the database:

  1. {targetdb}: The SQL database on which the spell datasets are to be created.
  2. “{idicleanversion}”: The IDI Clean version that the spell datasets need to be based on.
  3. {targetschema}: The project schema under the target database into which the spell datasets are to be created.
  4. {projprefix}: A (short) prefix that enables you to identify the spell dataset easily in the schema, and prevent overwriting any existing datasets that have the same name.

Dependencies

  1. Tables used by code module
Source Tables
Census 2023 {idicleanversion}.[cen_clean].[census_individual_2023]
Census 2018 {idicleanversion}.[cen_clean].[census_individual_2018]
Census 2013 {idicleanversion}.[cen_clean].[census_individual_2013]
Department of Internal Affairs {idicleanversion}.[dia_clean].[births]
Department of Internal Affairs {idicleanversion}.[dia_clean].[deaths]
Ministry of Health {idicleanversion}.[moh_clean].[nes_enrolment]
Ministry of Health {idicleanversion}.[moh_clean].[pop_cohort_demographics]
Ministry of Health {idiadhoc}.[clean_read_MOH_CIR].[moh_cir_nhi_20230519]
Ministry of Education {idicleanversion}.[moe_clean].[student_per]
Ministry of Education {idicleanversion}.[moe_clean].[enrolment]
Accident Compensation Corp {idicleanversion}.[acc_clean].[clients]
Ministry of Social Development {idicleanversion}.[msd_clean].[msd_swn]
Student loan data from Studylink {idicleanversion}.[sla_clean].[msd_borrowing]
Household Economic Survey {idiadhoc}.[hes_clean].[hes_person]
Household Economic Survey {idicleanversion}.[hes_clean].[hes_person]
Survey of Families Income and Employment {idicleanversion}.[sofie_clean].[person_waves]
Longitudinal Immigration Survey for NZ {idicleanversion}.[lisnz_clean].[ethnicity]
Household Labourforce Survey {idicleanversion}.[hlfs_clean].[data]
Household Labourforce Survey {idicleanversion}.[hlfs_clean].[nzis_pre2016]
Auckland City Mission {idicleanversion}.[acm_clean].[acm_client]
General Social survey {idicleanversion}.[gss_clean].[gss_identity]
PIAAC {idicleanversion}.[piaac_clean].[data]
NZ Crime and Victims Survey {idicleanversion}.[nzcvs_clean].[core]
Working for Families {idicleanversion}.[wff_clean].[spells]
Student loan data from Ministry of Education {idicleanversion}.[sla_clean].[slam_moe]
Student loan data from Inland Revenue {idicleanversion}.[sla_clean].[slam_ird]

Outputs

{targetdb}.{targetschema}.{projprefix}_ec_ethnicities

Variable Descriptions

Column name Description Values
snz_uid The unique STATS NZ person identifier for the the individual snz_uid
asian Indicates whether the top ranked data source identifies an individual as Asian (1,0)
chinese Indicates whether the top ranked data source identifies an individual as Chinese (1,0,NULL)
indian Indicates whether the top ranked data source identifies an individual as Indian (1,0,NULL)
other_asian Indicates whether the top ranked data source identifies an individual as Other Asian (1,0,NULL)
se_asian Indicates whether the top ranked data source identifies an individual as South East Asian (1,0,NULL)
melaa_plus Indicates whether the top ranked data source identifies an individual as MELAA+ (1,0,NULL)
middle_eastern Indicates whether the top ranked data source identifies an individual as Middle Eastern (1,0,NULL)
latin_american Indicates whether the top ranked data source identifies an individual as Latin American (1,0,NULL)
african_plus Indicates whether the top ranked data source identifies an individual as African+ (1,0,NULL)
continental_euro Indicates whether the top ranked data source identifies an individual as Continental European (1,0,NULL)
total_ethnic_comm Indicates whether the top ranked data source identifies an individual as a member of any Ethnic Communities grouping (1,0,NULL)
other_euro Indicates whether the top ranked data source identifies an individual as Other European as per MEC definition (1,0,NULL)
maori Indicates whether the top ranked data source identifies an individual as Maori (1,0)
pacific Indicates whether the top ranked data source identifies an individual as Pacific (1,0)
melaa_snz Indicates whether the top ranked data source identifies an individual as MELAA as per Stats NZ definition (1,0)
euro_snz Indicates whether the top ranked data source identifies an individual as European as per Stats NZ definition (1,0)
source_name Top ranked data source
ethnic_code_l1 Individual Level 1 ethnicity values extracted from top ranked data source, separated using a “;”
ethnic_code_l2 Individual Level 2 ethnicity values extracted from top ranked data source, separated using a “;”
ethnic_code_l3 Individual Level 3 ethnicity values extracted from top ranked data source, separated using a “;”
ethnic_code_l4 Individual Level 4 ethnicity values extracted from top ranked data source, separated using a “;”

Module Version & Change History

Date Version Comments
10 June 2025 Initial Version based on specifications from Commissioning document.
20 October 2025 Update Updated NZCVS code reflecting various changes including dropped/ modified tables and new
L4 ethnicity data made available for the first time

Code


:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
:setvar idimetaversion "{idimetaversion}"

USE $(targetdb);
GO

DROP TABLE IF EXISTS #temp_ethnicity_list;

CREATE TABLE #temp_ethnicity_list (
	snz_uid INT,
	ethnic_code_l1  VARCHAR(45),
	ethnic_code_l2  VARCHAR(45),
	ethnic_code_l3  VARCHAR(80),
	ethnic_code_l4  VARCHAR(100),
	source_name VARCHAR(12),
	group_ INT,
	rank_  INT
);

/* CENSUS 2023 - note imputed data is excluded, no null values in ethnicity columns */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT 	[snz_uid]
		, CONCAT(
		SUBSTRING([cen_ind_ethnic_output_concat],1,1),';',
		SUBSTRING([cen_ind_ethnic_output_concat],7,1),';',
		SUBSTRING([cen_ind_ethnic_output_concat],13,1),';',
		SUBSTRING([cen_ind_ethnic_output_concat],19,1),';',
		SUBSTRING([cen_ind_ethnic_output_concat],25,1),';',
		SUBSTRING([cen_ind_ethnic_output_concat],31,1)) AS ethnic_code_l1
		, CONCAT(
		SUBSTRING([cen_ind_ethnic_output_concat],1,2),';',
		SUBSTRING([cen_ind_ethnic_output_concat],7,2),';',
		SUBSTRING([cen_ind_ethnic_output_concat],13,2),';',
		SUBSTRING([cen_ind_ethnic_output_concat],19,2),';',
		SUBSTRING([cen_ind_ethnic_output_concat],25,2),';',
		SUBSTRING([cen_ind_ethnic_output_concat],31,2)) AS ethnic_code_l2
		, CONCAT(
		SUBSTRING([cen_ind_ethnic_output_concat],1,3),';',
		SUBSTRING([cen_ind_ethnic_output_concat],7,3),';',
		SUBSTRING([cen_ind_ethnic_output_concat],13,3),';',
		SUBSTRING([cen_ind_ethnic_output_concat],19,3),';',
		SUBSTRING([cen_ind_ethnic_output_concat],25,3),';',
		SUBSTRING([cen_ind_ethnic_output_concat],31,3)) AS ethnic_code_l3
		, [cen_ind_ethnic_output_concat] AS ethnic_code_l4 
		, 'Cen_23' AS source_name
		, 1 AS group_
		, 1 AS rank_
FROM $(idicleanversion).[cen_clean].[census_individual_2023]
WHERE snz_uid IS NOT NULL AND [cen_ind_ethgr_src_code] IN ('11'); 

/* CEN2018 - note imputed data is excluded */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT 	[snz_uid]
		, [cen_ind_eth_output_level1] AS ethnic_code_l1
		, [cen_ind_eth_output_level2] AS ethnic_code_l2
		, [cen_ind_eth_output_level3] AS ethnic_code_l3 
		, [cen_ind_ethgr_code] AS ethnic_code_l4
		, 'Cen_18' AS source_name
		, 1 AS group_
		, 2 AS rank_
FROM $(idicleanversion).[cen_clean].[census_individual_2018]
WHERE [snz_uid] IS NOT NULL AND [cen_ind_ethgr_impt_ind] in ('11')
;

/* CEN2013 */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT	[snz_uid]
		, CONCAT(
		SUBSTRING([cen_ind_eth_rand6_grp1_code],1,1),';',
		SUBSTRING([cen_ind_eth_rand6_grp2_code],1,1),';',
		SUBSTRING([cen_ind_eth_rand6_grp3_code],1,1),';',
		SUBSTRING([cen_ind_eth_rand6_grp4_code],1,1),';',
		SUBSTRING([cen_ind_eth_rand6_grp5_code],1,1),';',
		SUBSTRING([cen_ind_eth_rand6_grp6_code],1,1))AS ethnic_code_l1
		, CONCAT(
		SUBSTRING([cen_ind_eth_rand6_grp1_code],1,2),';',
		SUBSTRING([cen_ind_eth_rand6_grp2_code],1,2),';',
		SUBSTRING([cen_ind_eth_rand6_grp3_code],1,2),';',
		SUBSTRING([cen_ind_eth_rand6_grp4_code],1,2),';',
		SUBSTRING([cen_ind_eth_rand6_grp5_code],1,2),';',
		SUBSTRING([cen_ind_eth_rand6_grp6_code],1,2)) AS ethnic_code_l2
		, CONCAT(
		SUBSTRING([cen_ind_eth_rand6_grp1_code],1,3),';',
		SUBSTRING([cen_ind_eth_rand6_grp2_code],1,3),';',
		SUBSTRING([cen_ind_eth_rand6_grp3_code],1,3),';',
		SUBSTRING([cen_ind_eth_rand6_grp4_code],1,3),';',
		SUBSTRING([cen_ind_eth_rand6_grp5_code],1,3),';',
		SUBSTRING([cen_ind_eth_rand6_grp6_code],1,3)) AS ethnic_code_l3
		,CONCAT(
		[cen_ind_eth_rand6_grp1_code],';',
		[cen_ind_eth_rand6_grp2_code],';',
		[cen_ind_eth_rand6_grp3_code],';',
		[cen_ind_eth_rand6_grp4_code],';',
		[cen_ind_eth_rand6_grp5_code],';',
		[cen_ind_eth_rand6_grp6_code]) AS ethnic_code_l4
		, 'Cen_13' AS source_name
		, 1 AS group_
		, 3 AS rank_
FROM $(idicleanversion).[cen_clean].[census_individual_2013]
WHERE [snz_uid] IS NOT NULL;

/* DIA */

/* Birth - child */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT	[snz_uid]
		,CONCAT(
		SUBSTRING([dia_bir_ethnic1_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic2_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic3_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic4_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic5_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic6_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic7_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic8_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic9_snz_code],1,1),';',
		SUBSTRING([dia_bir_ethnic10_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([dia_bir_ethnic1_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic2_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic3_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic4_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic5_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic6_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic7_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic8_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic9_snz_code],1,2),';',
		SUBSTRING([dia_bir_ethnic10_snz_code],1,2))  AS ethnic_code_l2
		,CONCAT(
		SUBSTRING([dia_bir_ethnic1_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic2_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic3_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic4_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic5_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic6_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic7_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic8_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic9_snz_code],1,3),';',
		SUBSTRING([dia_bir_ethnic10_snz_code],1,3)) AS ethnic_code_l3
		,CONCAT(
		[dia_bir_ethnic1_snz_code],';',
		[dia_bir_ethnic2_snz_code],';',
		[dia_bir_ethnic3_snz_code],';',
		[dia_bir_ethnic4_snz_code],';',
		[dia_bir_ethnic5_snz_code],';',
		[dia_bir_ethnic6_snz_code],';',
		[dia_bir_ethnic7_snz_code],';',
		[dia_bir_ethnic8_snz_code],';',
		[dia_bir_ethnic9_snz_code],';',
		[dia_bir_ethnic10_snz_code]) AS ethnic_code_l4
		, 'DIA' AS source_name
		, 1 AS group_
		, 4 AS rank_
FROM  $(idicleanversion).[dia_clean].[births]
WHERE snz_uid IS NOT NULL AND [dia_bir_ethnic1_snz_code] IS NOT NULL;

/* Birth - parent 1 */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[parent1_snz_uid] as [snz_uid]
		,CONCAT(
		SUBSTRING([dia_bir_parent1_ethnic1_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic2_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic3_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic4_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic5_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic6_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic7_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic8_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic9_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent1_ethnic10_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([dia_bir_parent1_ethnic1_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic2_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic3_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic4_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic5_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic6_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic7_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic8_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic9_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent1_ethnic10_snz_code],1,2))  AS ethnic_code_l2
		,CONCAT(
		SUBSTRING([dia_bir_parent1_ethnic1_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic2_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic3_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic4_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic5_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic6_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic7_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic8_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic9_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent1_ethnic10_snz_code],1,3)) AS ethnic_code_l3
		,CONCAT(
		[dia_bir_parent1_ethnic1_snz_code],';',
		[dia_bir_parent1_ethnic2_snz_code],';',
		[dia_bir_parent1_ethnic3_snz_code],';',
		[dia_bir_parent1_ethnic4_snz_code],';',
		[dia_bir_parent1_ethnic5_snz_code],';',
		[dia_bir_parent1_ethnic6_snz_code],';',
		[dia_bir_parent1_ethnic7_snz_code],';',
		[dia_bir_parent1_ethnic8_snz_code],';',
		[dia_bir_parent1_ethnic9_snz_code],';',
		[dia_bir_parent1_ethnic10_snz_code]) AS ethnic_code_l4
		, 'DIA' AS source_name
		, 1 AS group_
		, 4 AS rank_
FROM  $(idicleanversion).[dia_clean].[births]
WHERE snz_uid IS NOT NULL AND [dia_bir_parent1_ethnic1_snz_code] IS NOT NULL and dia_bir_parent1_ethnic1_snz_code is not null;

/* Birth - parent 2 */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	parent2_snz_uid as [snz_uid]
		,CONCAT(
		SUBSTRING([dia_bir_parent2_ethnic1_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic2_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic3_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic4_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic5_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic6_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic7_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic8_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic9_snz_code],1,1),';',
		SUBSTRING([dia_bir_parent2_ethnic10_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([dia_bir_parent2_ethnic1_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic2_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic3_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic4_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic5_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic6_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic7_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic8_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic9_snz_code],1,2),';',
		SUBSTRING([dia_bir_parent2_ethnic10_snz_code],1,2))  AS ethnic_code_l2
		,CONCAT(
		SUBSTRING([dia_bir_parent2_ethnic1_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic2_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic3_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic4_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic5_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic6_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic7_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic8_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic9_snz_code],1,3),';',
		SUBSTRING([dia_bir_parent2_ethnic10_snz_code],1,3)) AS ethnic_code_l3
		,CONCAT(
		[dia_bir_parent2_ethnic1_snz_code],';',
		[dia_bir_parent2_ethnic2_snz_code],';',
		[dia_bir_parent2_ethnic3_snz_code],';',
		[dia_bir_parent2_ethnic4_snz_code],';',
		[dia_bir_parent2_ethnic5_snz_code],';',
		[dia_bir_parent2_ethnic6_snz_code],';',
		[dia_bir_parent2_ethnic7_snz_code],';',
		[dia_bir_parent2_ethnic8_snz_code],';',
		[dia_bir_parent2_ethnic9_snz_code],';',
		[dia_bir_parent2_ethnic10_snz_code]) AS ethnic_code_l4
		, 'DIA' AS source_name
		, 1 AS group_
		, 4 AS rank_
FROM  $(idicleanversion).[dia_clean].[births]
WHERE snz_uid IS NOT NULL AND [dia_bir_parent2_ethnic1_snz_code] IS NOT NULL and dia_bir_parent2_ethnic1_snz_code is not null;

/* DIA deaths */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT	[snz_uid]
		,CONCAT(
		SUBSTRING([dia_dth_ethnic1_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic2_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic3_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic4_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic5_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic6_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic7_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic8_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic9_snz_code],1,1),';',
		SUBSTRING([dia_dth_ethnic10_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([dia_dth_ethnic1_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic2_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic3_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic4_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic5_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic6_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic7_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic8_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic9_snz_code],1,2),';',
		SUBSTRING([dia_dth_ethnic10_snz_code],1,2))  AS ethnic_code_l2
		,CONCAT(
		SUBSTRING([dia_dth_ethnic1_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic2_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic3_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic4_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic5_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic6_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic7_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic8_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic9_snz_code],1,3),';',
		SUBSTRING([dia_dth_ethnic10_snz_code],1,3)) AS ethnic_code_l3
		,CONCAT(
		[dia_dth_ethnic1_snz_code],';',
		[dia_dth_ethnic2_snz_code],';',
		[dia_dth_ethnic3_snz_code],';',
		[dia_dth_ethnic4_snz_code],';',
		[dia_dth_ethnic5_snz_code],';',
		[dia_dth_ethnic6_snz_code],';',
		[dia_dth_ethnic7_snz_code],';',
		[dia_dth_ethnic8_snz_code],';',
		[dia_dth_ethnic9_snz_code],';',
		[dia_dth_ethnic10_snz_code]) AS ethnic_code_l4
		, 'DIA' AS source_name
		, 1 AS group_
		, 4 AS rank_
FROM  $(idicleanversion).[dia_clean].[deaths]
WHERE snz_uid IS NOT NULL AND [dia_dth_ethnic1_snz_code] IS NOT NULL; 

/* MOH */

/*MOH PHO enrollment */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING([moh_nes_ethnic1_snz_code],1,1),';',
		SUBSTRING([moh_nes_ethnic2_snz_code],1,1),';',
		SUBSTRING([moh_nes_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([moh_nes_ethnic1_snz_code],1,2),';',
		SUBSTRING([moh_nes_ethnic2_snz_code],1,2),';',
		SUBSTRING([moh_nes_ethnic3_snz_code],1,2)) AS  ethnic_code_l2
		, 'NA'AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'MOH' AS source_name
		, 1 AS group_
		, 5 AS rank_
FROM $(idicleanversion).[moh_clean].[nes_enrolment] 
WHERE snz_uid IS NOT NULL and moh_nes_ethnic1_snz_code is not null;

/* MOH NHI population cohort demographics*/

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT	distinct 
		[snz_uid]
		,CONCAT(
		SUBSTRING([moh_pop_ethnic1_snz_code],1,1),';',
		SUBSTRING([moh_pop_ethnic2_snz_code],1,1),';',
		SUBSTRING([moh_pop_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([moh_pop_ethnic1_snz_code],1,2),';',
		SUBSTRING([moh_pop_ethnic2_snz_code],1,2),';',
		SUBSTRING([moh_pop_ethnic3_snz_code],1,2)) AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'MOH' AS source_name
		, 1 AS group_
		, 5 AS rank_
FROM $(idicleanversion).[moh_clean].[pop_cohort_demographics]
WHERE snz_uid IS NOT NULL;

/* MOH CIR (Covid immunisation register) */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	b.[snz_uid]
		,CONCAT(
		SUBSTRING(cast([ethnic_code_1] AS varchar),1,1),';',
		SUBSTRING(cast([ethnic_code_2] AS varchar),1,1),';',
		SUBSTRING(cast([ethnic_code_3] AS varchar),1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING(cast([ethnic_code_1] AS varchar),1,2),';',
		SUBSTRING(cast([ethnic_code_2] AS varchar),1,2),';',
		SUBSTRING(cast([ethnic_code_3] AS varchar),1,2)) AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'MOH' AS source_name
		, 1 AS group_
		, 5 AS rank_
FROM [IDI_Adhoc].[clean_read_MOH_CIR].[moh_cir_nhi_20230519] as a
INNER JOIN $(idicleanversion).[security].[concordance] AS b
ON a.snz_moh_uid = b.snz_moh_uid
WHERE a.snz_uid IS NOT NULL and ethnic_code_1 is not null; 

/* MOE */

/* MOE STUDENT PER */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING([moe_spi_ethnic1_snz_code],1,1),';',
		SUBSTRING([moe_spi_ethnic2_snz_code],1,1),';',
		SUBSTRING([moe_spi_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([moe_spi_ethnic1_snz_code],1,2),';',
		SUBSTRING([moe_spi_ethnic2_snz_code],1,2),';',
		SUBSTRING([moe_spi_ethnic3_snz_code],1,2)) AS ethnic_code_l2
		,CONCAT(
		[moe_spi_ethnic1_snz_code],';',
		[moe_spi_ethnic1_snz_code],';',
		[moe_spi_ethnic1_snz_code]) AS ethnic_code_l3
		,'NA' AS ethnic_code_l4
		, 'MOE' AS source_name
		, 1 AS group_
		, 6 AS rank_
FROM $(idicleanversion).[moe_clean].[student_per]
WHERE snz_uid IS NOT NULL and [moe_spi_ethnic1_snz_code] is not null;

/* MOE ENROLMENTS */ 

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
	,CONCAT(
		SUBSTRING([moe_enr_ethnic1_snz_code],1,1),';',
		SUBSTRING([moe_enr_ethnic2_snz_code],1,1),';',
		SUBSTRING([moe_enr_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([moe_enr_ethnic1_snz_code],1,2),';',
		SUBSTRING([moe_enr_ethnic2_snz_code],1,2),';',
		SUBSTRING([moe_enr_ethnic3_snz_code],1,2)) AS ethnic_code_l2
		,CONCAT(
		[moe_enr_ethnic1_snz_code],';',
		[moe_enr_ethnic1_snz_code],';',
		[moe_enr_ethnic1_snz_code]) AS ethnic_code_l3
		,'NA' AS ethnic_code_l4
		, 'MOE' AS source_name
		, 1 AS group_
		, 6 AS rank_
FROM $(idicleanversion).[moe_clean].[enrolment]
WHERE snz_uid IS NOT NULL and [moe_enr_ethnic1_snz_code] is not null;

/* ACC */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING([acc_cli_ethnic1_snz_code],1,1),';',
		SUBSTRING([acc_cli_ethnic2_snz_code],1,1),';',
		SUBSTRING([acc_cli_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		[acc_cli_ethnic1_snz_code],';',
		[acc_cli_ethnic2_snz_code],';',
		[acc_cli_ethnic3_snz_code]) AS ethnic_code_l2
		,'NA' AS ethnic_code_l3
		,'NA' AS ethnic_code_l4
		, 'ACC' AS source_name
		, 1 AS group_
		, 7 AS rank_
FROM $(idicleanversion).[acc_clean].[clients]
WHERE snz_uid IS NOT NULL and acc_cli_ethnic1_snz_code is not null; 

/* MSD */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct [snz_uid]
		,CONCAT(
		SUBSTRING([msd_swn_ucvii_ethnic1_snz_code],1,1),';',
		SUBSTRING([msd_swn_ucvii_ethnic2_snz_code],1,1),';',
		SUBSTRING([msd_swn_ucvii_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		[msd_swn_ucvii_ethnic1_snz_code],';',
		[msd_swn_ucvii_ethnic2_snz_code],';',
		[msd_swn_ucvii_ethnic3_snz_code]) AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'MSD' AS source_name
		, 1 AS group_
		, 8 AS rank_
FROM $(idicleanversion).[msd_clean].[msd_swn]
WHERE snz_uid IS NOT NULL and msd_swn_ucvii_ethnic1_snz_code is not null;

/* SLA_MSD */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING([msd_sla_ethnic1_snz_code],1,1),';',
		SUBSTRING([msd_sla_ethnic2_snz_code],1,1),';',
		SUBSTRING([msd_sla_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([msd_sla_ethnic1_snz_code],1,2),';',
		SUBSTRING([msd_sla_ethnic2_snz_code],1,2),';',
		SUBSTRING([msd_sla_ethnic3_snz_code],1,2)) AS ethnic_code_l2
		,CONCAT(
		[msd_sla_ethnic1_snz_code],';',
		[msd_sla_ethnic2_snz_code],';',
		[msd_sla_ethnic3_snz_code]) AS ethnic_code_l3
		,'NA' AS ethnic_code_l4
		, 'SLA_MSD' AS source_name
		, 2 AS group_
		, 1 AS rank_
FROM $(idicleanversion).[sla_clean].[msd_borrowing]
WHERE snz_uid IS NOT NULL and msd_sla_ethnic1_snz_code is not null;

/* HES  */

/* HES_PERSON table in IDI_CLEAN */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct  [snz_uid]
		, CONCAT(
		(case when [hes_per_ethnic_grp1_snz_ind] =1 then 1 else 0 END),';',
		(case when [hes_per_ethnic_grp2_snz_ind] =1 then 2 else 0 END),';',		
		(case when [hes_per_ethnic_grp3_snz_ind] =1 then 3 else 0 END),';',
		(case when [hes_per_ethnic_grp4_snz_ind] =1 then 4 else 0 END),';',
		(case when [hes_per_ethnic_grp5_snz_ind] =1 then 5 else 0 END),';',
		(case when [hes_per_ethnic_grp6_snz_ind] =1 then 6 else 0 END)) AS ethnic_code_l1
		, 'NA' AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'HES' AS source_name
		, 2 AS group_
		, 2 AS rank_
FROM	$(idicleanversion).[hes_clean].[hes_person]
WHERE snz_uid IS NOT NULL and hes_per_ethnic_grp1_snz_ind is not null;

/* HES adhoc tables covering years 2015/16- 2023/24 */

DROP TABLE IF EXISTS #temp_hes;

CREATE TABLE #temp_hes (
	snz_hes_uid INT,
	dvethnicity  VARCHAR(45));

INSERT INTO #temp_hes (snz_hes_uid,	dvethnicity)
	SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_person_1516] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_person_1617] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_person_1718] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_person_1819] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_persondem_1819] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_persondem_1920] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_persondem_2021] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_persondem_2122] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_persondem_2223] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL
UNION
SELECT DISTINCT 
	snz_hes_uid,
	dvethnicity
FROM [IDI_ADHOC].[clean_read_HES].[hes_persondem_2324] WHERE snz_hes_uid IS NOT NULL AND dvethnicity IS NOT NULL;

UPDATE #temp_hes
SET dvethnicity = REPLACE (DVETHNICITY,CHAR(34),'');

UPDATE #temp_hes
SET dvethnicity = REPLACE (DVETHNICITY,',',';');

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct  b.[snz_uid] as snz_uid
		,a.[dvethnicity] as ethnic_code_l1
		,'NA' AS ethnic_code_l2
		,'NA' AS ethnic_code_l3
		,'NA' AS ethnic_code_l4
		,'HES' as source_name
		, 2 AS group_
		, 2 AS rank_
FROM #temp_hes AS A
	INNER JOIN $(idicleanversion).[security].[concordance] as B
	ON a.[snz_hes_uid] = b.[snz_hes_uid]; 

DROP TABLE IF EXISTS #temp_hes;

/* SOFIE */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING([sofie_wav_ethnic1_snz_code],1,1),';',
		SUBSTRING([sofie_wav_ethnic2_snz_code],1,1),';',
		SUBSTRING([sofie_wav_ethnic3_snz_code],1,1),';',
		SUBSTRING([sofie_wav_ethnic4_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		[sofie_wav_ethnic1_snz_code],';',
		[sofie_wav_ethnic2_snz_code],';',
		[sofie_wav_ethnic3_snz_code],';',
		[sofie_wav_ethnic4_snz_code]) AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'SOFIE' AS source_name
		, 2 AS group_
		, 3 AS rank_
FROM $(idicleanversion).[sofie_clean].[person_waves]
WHERE snz_uid IS NOT NULL and sofie_wav_ethnic1_snz_code is not null; 

/* LISNZ  */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		, substring([lisnz_ethnic_snz_code], 1, 1) AS ethnic_code_l1
		, substring([lisnz_ethnic_snz_code], 1, 2) AS ethnic_code_l2
		, substring([lisnz_ethnic_snz_code], 1, 3) AS ethnic_code_l3
		, [lisnz_ethnic_snz_code] AS ethnic_code_l4
		, 'LISNZ' AS source_name
		, 2 AS group_
		, 4 AS rank_
FROM $(idicleanversion).[lisnz_clean].[ethnicity]
WHERE snz_uid IS NOT NULL and  [lisnz_ethnic_snz_code] IS NOT NULL;

/* HLFS */

/* HLFF hlfs_clean.nzis */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING([hlfs_urd_ethnic_1_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_2_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_3_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_4_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_5_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_6_code],1,1),';', 
		SUBSTRING([hlfs_urd_ethnic_7_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_8_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_9_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_10_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_11_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_12_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_13_code],1,1),';',
		SUBSTRING([hlfs_urd_ethnic_14_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		SUBSTRING([hlfs_urd_ethnic_1_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_2_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_3_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_4_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_5_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_6_code],1,2),';', 
		SUBSTRING([hlfs_urd_ethnic_7_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_8_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_9_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_10_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_11_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_12_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_13_code],1,2),';',
		SUBSTRING([hlfs_urd_ethnic_14_code],1,2)) AS ethnic_code_l2
		,CONCAT(
		SUBSTRING([hlfs_urd_ethnic_1_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_2_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_3_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_4_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_5_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_6_code],1,3),';', 
		SUBSTRING([hlfs_urd_ethnic_7_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_8_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_9_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_10_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_11_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_12_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_13_code],1,3),';',
		SUBSTRING([hlfs_urd_ethnic_14_code],1,3)) AS ethnic_code_l3
		, CONCAT(
		[hlfs_urd_ethnic_1_code],';',
		[hlfs_urd_ethnic_2_code],';',
		[hlfs_urd_ethnic_3_code],';',
		[hlfs_urd_ethnic_4_code],';',
		[hlfs_urd_ethnic_5_code],';',
		[hlfs_urd_ethnic_6_code],';',
		[hlfs_urd_ethnic_7_code],';',
		[hlfs_urd_ethnic_8_code],';',
		[hlfs_urd_ethnic_9_code],';',
		[hlfs_urd_ethnic_10_code],';',
		[hlfs_urd_ethnic_11_code],';',
		[hlfs_urd_ethnic_12_code],';',
		[hlfs_urd_ethnic_13_code],';',
		[hlfs_urd_ethnic_14_code],';') AS ethnic_code_l4
		, 'HLFS' AS source_name
		, 2 AS group_
		, 5 AS rank_
FROM $(idicleanversion).[hlfs_clean].[data] 
WHERE snz_uid IS NOT NULL and hlfs_urd_ethnic_1_code is not null; 

/* HLFS hlfs_clean.nzis_pre2016  */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,[nzis_is_ethnic_code] as ethnic_code_l1
		, 'NA' AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'HLFS' AS source_name
		, 2 AS group_
		, 5 AS rank_
FROM $(idicleanversion).[hlfs_clean].[nzis_pre2016] 
WHERE snz_uid IS NOT NULL and nzis_is_ethnic_code is not null; 

/* Auckland City Mission */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		substring([acm_cli_ethnic1_snz_code],1,1),';',
		substring([acm_cli_ethnic2_snz_code],1,1),';',
		substring([acm_cli_ethnic3_snz_code],1,1)) AS ethnic_code_l1
		,CONCAT(
		[acm_cli_ethnic1_snz_code],';',
		[acm_cli_ethnic2_snz_code],';',
		[acm_cli_ethnic3_snz_code]) as ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'ACM' AS source_name
		, 2 AS group_
		, 6 AS rank_
FROM $(idicleanversion).[acm_clean].[acm_client] 
WHERE snz_uid IS NOT NULL and acm_cli_ethnic1_snz_code is not null; 

/* GSS */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT  distinct
		a.snz_uid
		,a.e1 AS ethnic_code_l1
		, CONCAT(
		SUBSTRING(a.e4,1,2),';',
		SUBSTRING(a.e4,7,2),';',
		SUBSTRING(a.e4,13,2),';',
		SUBSTRING(a.e4,19,2),';',
		SUBSTRING(a.e4,25,2),';',
		SUBSTRING(a.e4,31,2),';',
		SUBSTRING(a.e4,37,2),';',
		SUBSTRING(a.e4,43,2)) AS ethnic_code_l2
		, CONCAT(
		SUBSTRING(a.e4,1,3),';',
		SUBSTRING(a.e4,7,3),';',
		SUBSTRING(a.e4,13,3),';',
		SUBSTRING(a.e4,19,3),';',
		SUBSTRING(a.e4,25,3),';',
		SUBSTRING(a.e4,31,3),';',
		SUBSTRING(a.e4,37,3),';',
		SUBSTRING(a.e4,43,3)) AS ethnic_code_l3, 
		e4 as ethnic_code_l4, 
		'GSS' AS source_name
		, 2 AS group_
		, 7 AS rank_
FROM	(select distinct
		snz_uid,
		replace([gss_id_dvethnic],', ',';') as e1,  
		replace([gss_id_dvethnic_code],', ',';') as e4,
		gss_id_dvethnic_code
		from $(idicleanversion).[gss_clean].[gss_identity] 
		WHERE snz_uid IS NOT NULL and gss_id_dvethnic is not null ) as a;

/* PIAAC */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING(CAST([piaac_ethmention1_code] as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST([piaac_ethmention2_code] as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST([piaac_ethmention3_code] as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST([piaac_ethmention4_code] as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST([piaac_ethmention5_code] as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST([piaac_ethmention6_code] as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST([piaac_ethmention7_code] as VARCHAR(10)),1,1))  AS ethnic_code_l1
		, CONCAT(
		SUBSTRING(CAST([piaac_ethmention1_code] as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST([piaac_ethmention2_code] as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST([piaac_ethmention3_code] as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST([piaac_ethmention4_code] as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST([piaac_ethmention5_code] as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST([piaac_ethmention6_code] as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST([piaac_ethmention7_code] as VARCHAR(10)),1,2))  AS ethnic_code_l2
		, CONCAT(
		SUBSTRING(CAST([piaac_ethmention1_code] as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST([piaac_ethmention2_code] as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST([piaac_ethmention3_code] as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST([piaac_ethmention4_code] as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST([piaac_ethmention5_code] as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST([piaac_ethmention6_code] as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST([piaac_ethmention7_code] as VARCHAR(10)),1,3)) AS ethnic_code_l3
		, CONCAT(
		[piaac_ethmention1_code],';',
		[piaac_ethmention2_code],';',
		[piaac_ethmention3_code],';',
		[piaac_ethmention4_code],';',
		[piaac_ethmention5_code],';',
		[piaac_ethmention6_code],';',
		[piaac_ethmention7_code]) AS ethnic_code_l4
		, 'PIAAC' AS source_name
		, 2 AS group_
		, 8 AS rank_
FROM $(idicleanversion).[piaac_clean].[data]
WHERE snz_uid IS NOT NULL and piaac_ethmention1_code is not null;

/* NZCVS - see three sources now - core , incident and module, note we obtain Indian and Chinese ethnicity data, 'Other Asian' was judged to be ambiguous so not used */

/* Core*/

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,CONCAT(
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE1 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE2 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE3 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE4 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE5 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE6 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE7 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE8 as VARCHAR(10)),1,1),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE9 as VARCHAR(10)),1,1),';',
		(CASE WHEN nzcvs_core_ethnic_grp1_snz_ind =1 then 1 else 0 end),';',
		(CASE WHEN nzcvs_core_ethnic_grp2_snz_ind =1 then 2 else 0 end),';',
		(CASE WHEN nzcvs_core_ethnic_grp3_snz_ind =1 then 3 else 0 end),';',
		(CASE WHEN nzcvs_core_ethnic_grp4_snz_ind =1 then 4 else 0 end),';',
		(CASE WHEN nzcvs_core_ethnic_grp5_snz_ind =1 then 5 else 0 end),';',
		(CASE WHEN nzcvs_core_ethnic_grp6_snz_ind =1 then 6 else 0 end),';',
		(CASE WHEN nzcvs_core_DVETHN_CHINESE =1 then 4 else 0 end),';',
		(CASE WHEN nzcvs_core_DVETHN_INDIAN  =1 then 4 else 0 end))
		AS ethnic_code_l1
		, CONCAT(
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE1 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE2 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE3 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE4 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE5 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE6 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE7 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE8 as VARCHAR(10)),1,2),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE9 as VARCHAR(10)),1,2),';',
		(CASE WHEN nzcvs_core_DVETHN_CHINESE =1 then 42 else 0 end),';',
		(CASE WHEN nzcvs_core_DVETHN_INDIAN  =1 then 43 else 0 end))
		 AS ethnic_code_l2
		,CONCAT(
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE1 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE2 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE3 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE4 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE5 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE6 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE7 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE8 as VARCHAR(10)),1,3),';',
		SUBSTRING(CAST(nzcvs_core_ETHNICCODE9 as VARCHAR(10)),1,3))
		AS ethnic_code_l3
		,CONCAT(
		nzcvs_core_ETHNICCODE1,';',
		nzcvs_core_ETHNICCODE2,';',
		nzcvs_core_ETHNICCODE3,';',
		nzcvs_core_ETHNICCODE4,';',
		nzcvs_core_ETHNICCODE5,';',
		nzcvs_core_ETHNICCODE6,';',
		nzcvs_core_ETHNICCODE7,';',
		nzcvs_core_ETHNICCODE8,';',
		nzcvs_core_ETHNICCODE9)
		AS ethnic_code_l4
		, 'NZCVS' AS source_name
		, 2 AS group_
		, 9 AS rank_
FROM  $(idicleanversion).[nzcvs_clean].[core]
WHERE snz_uid IS NOT NULL and nzcvs_core_ethnic_grp1_snz_ind is not null;

/* WFF */

/*primary caregiver */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		, [wff_spe_l_ethnic_snz_code] AS ethnic_code_l1
		, 'NA' AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'WFF' AS source_name
		, 3 AS group_
		, 1 AS rank_
FROM	$(idicleanversion).[wff_clean].[spells]
WHERE	snz_uid IS NOT NULL and wff_spe_l_ethnic_snz_code is not null; 

/* partner */
INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[partner_snz_uid] as snz_uid
		, [wff_spe_r_ethnic_snz_code] AS ethnic_code_l1
		, 'NA' AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'WFF' AS source_name
		, 3 AS group_
		, 1 AS rank_
FROM	$(idicleanversion).[wff_clean].[spells]
WHERE	partner_snz_uid IS NOT NULL and wff_spe_r_ethnic_snz_code is not null;

/* SLA_SLAM_MOE - note only one l1 ethnicity response is recorded per snz_uid */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		, [slam_moe_ethnic_snz_code] AS ethnic_code_l1
		, 'NA' AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'SLA_SLAM_MOE' AS source_name
		, 3 AS group_
		, 2 AS rank_
FROM	$(idicleanversion).[sla_clean].[slam_moe]
WHERE snz_uid IS NOT NULL and slam_moe_ethnic_snz_code is not null;

/* SLA_SLAM_IRD  */

INSERT INTO #temp_ethnicity_list (snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_ , rank_)
SELECT distinct	[snz_uid]
		,SUBSTRING ([slam_ird_ethnic_snz_code], 1,1 ) AS ethnic_code_l1
		, [slam_ird_ethnic_snz_code] AS ethnic_code_l2
		, 'NA' AS ethnic_code_l3
		, 'NA' AS ethnic_code_l4
		, 'SLA_SLAM_IR' AS source_name
		, 3 AS group_
		, 3 AS rank_
FROM $(idicleanversion).[sla_clean].[slam_ird]
WHERE snz_uid IS NOT NULL and slam_ird_ethnic_snz_code is not null; 

/* Process temp_ethnicity_list table */

/* Drop rows which don't have valid l1 ethnicity in l1  */

delete from #temp_ethnicity_list 
WHERE		iif	(ethnic_code_l1 like '%1%' 
			or ethnic_code_l1 like '%2%' 
			or ethnic_code_l1 like '%3%' 
			or ethnic_code_l1 like '%4%' 
			or ethnic_code_l1 like '%5%' 
			or ethnic_code_l1 like '%6%'
			,1,0)=0;

/* modify table to add EC variables */
drop table if exists #temp_ethnicity_listA;
create table  #temp_ethnicity_listA 
		(
		snz_uid INT,
		ethnic_code_l1  VARCHAR(45),
		ethnic_code_l2  VARCHAR(45),
		ethnic_code_l3  VARCHAR(80),
		ethnic_code_l4  VARCHAR(100),
		source_name VARCHAR(12),
		group_ INT,
		rank_  INT,
		final_rank int,
		asian int,
		se_asian int,
		chinese int,
		indian int,
		other_asian int,
		continental_euro int,
		african_plus int,
		middle_eastern int,
		latin_american int,
		melaa_plus int,
		other_euro int,
		maori int,
		pacific int,
		euro_snz int,
		melaa_snz int,
		total_ethnic_comm int
		);

GO

INSERT INTO #temp_ethnicity_listA  
		(snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name, group_, rank_)
SELECT
		snz_uid, ethnic_code_l1, ethnic_code_l2, ethnic_code_l3, ethnic_code_l4, source_name,	group_,	rank_
FROM	#temp_ethnicity_list;

drop table if exists #temp_ethnicity_list;

update #temp_ethnicity_listA 
set		asian =			IIF	(ethnic_code_l1 like '%4%', 1, 0),
		se_asian =		IIF	(ethnic_code_l2 like '%41%', 1, 0),
		chinese =		IIF	(ethnic_code_l2 like '%42%', 1, 0),
		indian =		IIF	(ethnic_code_l2 like '%43%', 1, 0),
		other_asian=	IIF	(ethnic_code_l3 like '%444%' or ethnic_code_l2 like '%40%', 1, 0),
		
		continental_euro = IIF  
			(ethnic_code_l3		like '%122%'
			or ethnic_code_l3	like '%123%'
			or ethnic_code_l3	like '%124%'
			or ethnic_code_l3	like '%125%'
			or ethnic_code_l3	like '%126%'
			or ethnic_code_l3	like '%127%'
			or ethnic_code_l4	like '%12911%'
			or ethnic_code_l4	like '%12912%'
			or ethnic_code_l4	like '%12913%'
			or ethnic_code_l4	like '%12914%'
			or ethnic_code_l4	like '%12915%'
			or ethnic_code_l4	like '%12916%'
			or ethnic_code_l4	like '%12918%'
			or ethnic_code_l4	like '%12919%'
			or ethnic_code_l4	like '%12920%'
			or ethnic_code_l4	like '%12921%'
			or ethnic_code_l4	like '%12922%'
			or ethnic_code_l4	like '%12923%'
			or ethnic_code_l4	like '%12924%'
			or ethnic_code_l4	like '%12926%'
			or ethnic_code_l4	like '%12927%'
			or ethnic_code_l4	like '%12928%'
			or ethnic_code_l4	like '%12929%'
			or ethnic_code_l4	like '%12930%'
			or ethnic_code_l4	like '%12931%'
			or ethnic_code_l4	like '%12932%'
			or ethnic_code_l4	like '%12933%'
			or ethnic_code_l4	like '%12934%'
			or ethnic_code_l4	like '%12935%'
			or ethnic_code_l4	like '%12937%'
			or ethnic_code_l4	like '%12938%'
			or ethnic_code_l4	like '%12939%'
			or ethnic_code_l4	like '%12940%'
			or ethnic_code_l4	like '%12941%'
			or ethnic_code_l4	like '%12942%', 1 , 0),
			
		middle_Eastern=	IIF (ethnic_code_l2 like '%51%', 1, 0), 
		latin_American=	IIF (ethnic_code_l2 like '%52%', 1, 0),
		african_plus = IIF (ethnic_code_l2 like '%53%' 
			 or ethnic_code_l4 like '%61115%'
			 or ethnic_code_l4 like '%61116%'
			 or ethnic_code_l4 like '%61117%' 
			 or ethnic_code_l4 like '%12948%'
			 or ethnic_code_l4 like '%12949%'
			 or ethnic_code_l4 like '%12950%', 1, 0),

		melaa_plus = IIF(ethnic_code_l1 like '%5%'
				or ethnic_code_l4 like '%61115%'
				or ethnic_code_l4 like '%61116%'
				or ethnic_code_l4 like '%61117%' 
				or ethnic_code_l4 like '%12948%'
				or ethnic_code_l4 like '%12949%'
				or ethnic_code_l4 like '%12950%', 1,0),
	
		other_euro  =	 IIF  
			(ethnic_code_l2		like '%10%'
			or ethnic_code_l2	like '%11%'
			or ethnic_code_l3	like '%121%'
			or ethnic_code_l3	like '%128%'
			or ethnic_code_l4	like '%12943%'
			or ethnic_code_l4	like '%12945%'
			or ethnic_code_l4	like '%12947%'
			or ethnic_code_l4	like '%12999%', 1 , 0),

		maori     =	IIF	(ethnic_code_l1 like '%2%', 1, 0),
		pacific   =	IIF	(ethnic_code_l1 like '%3%', 1, 0),
		melaa_snz = IIF (ethnic_code_l1 like '%5%', 1, 0),
		euro_snz  = IIF (ethnic_code_l1 like '%1%', 1, 0)
		;

/* Collapse multiple rows from same source */

DROP TABLE IF EXISTS #temp_ethnicity_list2;
  
select	snz_uid,
		max(asian) as asian,
		max(se_asian) as se_asian,
		max(chinese) as chinese,
		max(indian) as indian,
		max(other_asian) as other_asian,
		max(continental_euro) as continental_euro,
		max(middle_eastern) as middle_eastern,
		max(latin_american) as latin_american,
		max(african_plus) as african_plus,
		max(melaa_plus) as melaa_plus,
		max(other_euro) as other_euro,
		max(maori) as maori,
		max(pacific) as pacific,
		max(melaa_snz) as melaa_snz,
		max(euro_snz) as euro_snz,
		source_name,
		group_ , 
		rank_,
		final_rank,
		total_ethnic_comm,
		string_agg(ethnic_code_l1, ';') as ethnic_code_l1,
		string_agg(ethnic_code_l2, ';') as ethnic_code_l2,
		string_agg(ethnic_code_l3, ';') as ethnic_code_l3,
		string_agg(ethnic_code_l4, ';') as ethnic_code_l4

into #temp_ethnicity_list2
from #temp_ethnicity_listA 
	group by snz_uid, source_name, group_, rank_, final_rank, total_ethnic_comm;  

/* Identify top ranked row for each snz_uid */

with ranked as (select
				*,
				Row_number () OVER (PARTITION BY snz_uid ORDER BY group_ ASC, rank_ ASC) AS rankval
				FROM #temp_ethnicity_list2 
				)
update b
set final_rank= r.rankval
FROM #temp_ethnicity_list2 as b 
JOIN ranked as  r on b.snz_uid = r.snz_uid and b.group_ = r.group_ and r.rank_ = b.rank_;

/* Make final table "ec_ethnicity" by only keeping top row */

DROP TABLE IF EXISTS #EC_ethnicity;

select
*
--into [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
into #EC_ethnicity
FROM #temp_ethnicity_list2
where final_rank=1;

/* update ethnicity variables to NULL where there is incomplete information */

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set chinese = NULL where chinese = 0 and ethnic_code_l2 = 'NA' and ethnic_code_l1 like '%4%' ;

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set indian = NULL where indian = 0 and ethnic_code_l2 = 'NA' and ethnic_code_l1 like '%4%';

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set se_asian = NULL where se_asian = 0 and ethnic_code_l2 = 'NA' and ethnic_code_l1 like '%4%';

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set	other_asian = NULL where other_asian = 0 and (ethnic_code_l2 = 'NA' and ethnic_code_l1 like '%4%') 
											 or  (ethnic_code_l3 = 'NA' and (ethnic_code_l2 like '%44%'));

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set middle_eastern = NULL where middle_eastern = 0 and ethnic_code_l2 = 'NA' and ethnic_code_l1 like '%5%';

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set latin_american = NULL where latin_american = 0 and ethnic_code_l2 = 'NA' and ethnic_code_l1 like '%5%';

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set  continental_euro = NULL
where continental_euro = 0 and ((ethnic_code_l1 like '%1%' and ethnic_code_l2 ='NA') or (ethnic_code_l2 like '%12%' and ethnic_code_l3 ='NA') or (ethnic_code_l3 like '%129%' and ethnic_code_l4 ='NA'));

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set  other_euro = NULL
where other_euro = 0 and ((ethnic_code_l1 like '%1%' and ethnic_code_l2 ='NA') or (ethnic_code_l2 like '%12%' and ethnic_code_l3 ='NA') or (ethnic_code_l3 like '%129%' and ethnic_code_l4 ='NA'));

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set african_plus = NULL
where african_plus = 0 and (((ethnic_code_l1 like '%1%' or ethnic_code_l1 like '%6%') and ethnic_code_l2 = 'NA') or
						 ((ethnic_code_l2 like '%12%' or ethnic_code_l1 like '%61%') and ethnic_code_l3 = 'NA') or
						 ((ethnic_code_l3 like '%129%' or ethnic_code_l1 like '%611%') and ethnic_code_l4 = 'NA'));

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set melaa_plus = NULL 
where melaa_plus = 0 and (((ethnic_code_l1 like '%1%' or ethnic_code_l1 like '%6%') and ethnic_code_l2 = 'NA') or
						 ((ethnic_code_l2 like '%12%' or ethnic_code_l1 like '%61%') and ethnic_code_l3 = 'NA') or
						 ((ethnic_code_l3 like '%129%' or ethnic_code_l1 like '%611%') and ethnic_code_l4 = 'NA'));

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set total_ethnic_comm = CASE WHEN melaa_plus =1 or continental_euro =1 or asian = 1 THEN 1
						ELSE 0 END;

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set total_ethnic_comm = NULL where melaa_plus is NULL and asian = 0 and continental_euro is NULL;

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set total_ethnic_comm = NULL where melaa_plus is NULL and asian = 0 and continental_euro = 0;

--update [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity]
update #EC_ethnicity
set total_ethnic_comm = NULL where melaa_plus = 0 and asian = 0 and continental_euro is NULL;

/* select distinct total_ethnic_comm, melaa_plus, asian, continental_euro from [IDI_Sandpit].[DL-MAA2020-47].[EC_ethnicity] */
/* select distinct total_ethnic_comm, melaa_plus, asian, continental_euro from #EC_ethnicity */

/*clean up l1-l4 columns*/

/* code below removes excess ';'s and double ups within a column for ethicity l1-l4 */

/* L1 */				
WITH cleanvals AS (
SELECT snz_uid, string_agg(VALUE, ';') as cleaned
FROM ( 
SELECT  DISTINCT a.snz_uid,
		TRIM(value) as value
		FROM	#EC_ethnicity a
CROSS APPLY STRING_SPLIT(ethnic_code_l1, ';') as split
	where TRIM(split.value) <>'' 
		 ) as filtered
GROUP BY snz_uid
	 )
UPDATE a 
SET a.ethnic_code_l1 = b.cleaned
FROM #EC_ethnicity a
JOIN cleanvals b  on a.snz_uid = b.snz_uid;

/* L2 */
WITH cleanvals AS (
SELECT snz_uid, string_agg(VALUE, ';') as cleaned
FROM ( 
SELECT  DISTINCT a.snz_uid,
		TRIM(value) as value
		FROM	#EC_ethnicity a
CROSS APPLY STRING_SPLIT(ethnic_code_l2, ';') as split
	where TRIM(split.value) <>'' 
		 ) as filtered
GROUP BY snz_uid
	 )
UPDATE a 
SET a.ethnic_code_l2 = b.cleaned
FROM #EC_ethnicity a
JOIN cleanvals b  on a.snz_uid = b.snz_uid;

/* L3 */
WITH cleanvals AS (
SELECT snz_uid, string_agg(VALUE, ';') as cleaned
FROM ( 
SELECT  DISTINCT a.snz_uid,
		TRIM(value) as value
		FROM	#EC_ethnicity a
CROSS APPLY STRING_SPLIT(ethnic_code_l3, ';') as split
	where TRIM(split.value) <>'' 
		 ) as filtered
GROUP BY snz_uid
	 )
UPDATE a 
SET a.ethnic_code_l3 = b.cleaned
FROM #EC_ethnicity a
JOIN cleanvals b  on a.snz_uid = b.snz_uid;

/* L4 */
WITH cleanvals AS (
SELECT snz_uid, string_agg(VALUE, ';') as cleaned
FROM ( 
SELECT  DISTINCT a.snz_uid,
		TRIM(value) as value
		FROM	#EC_ethnicity a
CROSS APPLY STRING_SPLIT(ethnic_code_l4, ';') as split
	where TRIM(split.value) <>'' 
		 ) as filtered
GROUP BY snz_uid
	 )
UPDATE a 
SET a.ethnic_code_l4 = b.cleaned
FROM #EC_ethnicity a
JOIN cleanvals b  on a.snz_uid = b.snz_uid;

/* drop temporary tables */

DROP TABLE IF EXISTS #temp_ethnicity_listA;
DROP TABLE IF EXISTS #temp_ethnicity_list2;

/* Final select statement */

drop table if exists $(targetdb).$(targetschema).$(projprefix)_ec_communities;

SELECT 
		snz_uid,
		source_name,
		asian,
		se_asian,
		chinese,
		indian,
		other_asian,
		continental_euro,
		african_plus,
		middle_eastern,
		latin_american,
		melaa_plus,
		total_ethnic_comm,
		other_euro,
		maori,
		pacific,
		euro_snz,
		melaa_snz,
		ethnic_code_l1,
		ethnic_code_l2,
		ethnic_code_l3,
		ethnic_code_l4

INTO $(targetdb).$(targetschema).$(projprefix)_ec_communities

FROM #EC_ethnicity

;