2. HLFS Data Tools User Guide

alexandra.ferguson
16 March 2023

This post is intended to give an understanding of how to access and set up the HLFS tools library for use on HLFS data.

Pre-requisites for using the HLFS tools

  1. You should have access to SQL Server Management Console in the IDI. This is a tool that enables you to directly run queries and stored procedures on the SQL server database that holds the IDI data. You can verify this by logging into an IDI session, and checking through the tools available in the Start Menu and locating SQL Server Tools. This tool is usually automatically made available to IDI researchers.

  2. To use the HLFS tools, the primary pre-requisite is that you have access to the HLFS datasets in the IDI. You would need to request access to these datasets when you submit your request for a research project in the IDI, or else submit a variation request if you already have an IDI project that you need HLFS data to be part of.

  3. Assuming you already have access to access and use HLFS data (i.e., you can access datasets under IDI_Clean_YYYYMM.hlfs_clean schema), you should additionaly also have access to the IDI_UserCode database in the IDI SQL server (prtprdsql36) - this is usually granted automatically to IDI research projects. You can verify access by logging into SQL Server Management Console (instructions to do this are available in the IDI Wiki), and checking the databases available to you under the database list viewer on the left side of the SQL Server Management Console window.

How can I obtain access to the HLFS Summarisation tools?

The HLFS summarisation tools are a collection of SQL programs that enable you to easily create estimates (like weighted counts, means, rates, comparisons between HLFS quarters, etc.) and confidence intervals without having an intimate understanding of how to use replicate weights or the jackknife method of creating estimates. And for those who already know how this works - this collection of programs saves you the effort of doing it in an ad-hoc fashion.

The tools can either be accessed from ID Commons directly (links to these tools are available in the HLFS landing page, OR these can be accessed from within the IDI.

You will need to install these SQL programs on IDI_UserCode database before you use these. Details on how these should be installed are described in subsequent sections.

How do I install these tools?

Assuming you’ve checked and validated that all the pre-requisites are satisfied, use the following steps to install the tools for your use -

  1. Copy over the HLFS summarisation tools folder into a suitable location in your IDI project folder.

  2. Open SQL Server Management Console, and log into the SQL server.

  3. From the HLFS summarisation tools folder, open the hlfs_data_preparation.sql file in SQL Server Management Console. This script is meant to take the hlfs_clean.data from the IDI_Clean version of your choice, and apply some cleaning rules to this dataset and prepare it for use by the HLFS summarisation tools.

  4. From the Query Menu at the top of the SQL Server Management Studio, scroll down to SQLCMD Mode, and click on it to enable it.

  5. After the documentation section of the code file, at the beginning of the code section, you’ll notice 4 lines of code that look like this -

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
:setvar idicleanversion "{idicleanversion}"
  1. Now, select everything (Ctrl + A) and click Execute (or press F5) to run the script end-to-end. Note: If you get the following error: CREATE VIEW must be the first statement in a query batch, then comment out the statements use $(targetdb) and DROP VIEW IF EXISTS $(targetschema).$(projprefix)_hlfs_data and click Execute (or press F5) to run the script again.

  2. At the end of execution, a new database object will be created under IDI_UserCode.[DL-MAAXX-XX], by the name as specified by you - <projprefix>_hlfs_data. You can verify this by right-clicking IDI_UserCode database in the viewer and clicking refresh. Then navigate to IDI_UserCode → Views and checking for the object name as specified. This database object holds the cleaned HLFS data that you can use for generating your summaries.

  3. Now, from the HLFS summarisation tools folder, open the specific summarisation procedure you want to run. The details regarding each summarisation procedure can be found from the links provided in the HLFS landing page. For instance, the SELevel procedure creates weighted population counts by a set of grouping variables and the confidence intervals around this estimate as generated from the replicate weights.

  4. After the SE stored procedure file is opened, you’ll notice 3 lines of code that look like this (just like step 5) -

:setvar targetdb "{targetdb}"
:setvar targetschema "{targetschema}"
:setvar projprefix "{projprefix}"
  1. Select everything (Ctrl + A) and click Execute (or press F5) to run the code end-to-end. This will create the stored procedure in the database (for example - IDI_UserCode.[DL-MAA20XX-XX].SELevel). You can check that this is created, by You can verify this by right-clicking IDI_UserCode database in the viewer and clicking refresh. Then navigate to IDI_UserCode → Programmability → Stored Procedures and checking for the object name as specified.

  2. Repeat steps 7 to 9 for all the summarisation tools you want to install and use. Once you’re done, you are now ready to start using these tools to generate summaries from HLFS data.

How do I use the HLFS Summarisation tools?

From SQL Server

Assuming you’ve successfully installed the tools by using the directions provided above, now you are ready to start using the tools for creating summaries. Each summarisation tool creates a different kind of summary , and accepts a different set of parameters as input. Detailed information on what summaries are produced can be understood from the tool-specific pages (these links are available in the landing page).

In general, the syntax for calling the summarisation tool is as follows.

EXEC [DL-MAA20xx-xx].<procedurename> <parameter1>, <parameter2>, ...

The functionality is best explained with an example. For instance, assume that you have already run the hlfs_data.sql script and created a cleaned HLFS dataset with the name [DL-MAA2020-47].test_hlfs_data and also installed the SEMean.sql stored procedure under your project schema [DL-MAA2020-47], with the name test_SEMean. If you wanted to obtain the average of total working hours per week (hlfs_urd_usual_hr_tot_nbr) by region(hlfs_urd_region_code) and sex(hlfs_urd_sex_code) for the HLFS quarter 145, you would use the following code snippet. Note that the exact parameters to be supplied vary for different HLFS summarisation tools.

USE IDI_UserCode
GO

EXEC [DL-MAA2020-47].SELevel [DL-MAA2020-47].test_SEMean 
       '[DL-MAA2020-47].[test_hlfs_data]',
       'hlfs_urd_region_code, hlfs_urd_sex_code',
       'where hlfs_urd_quarter_nbr = 145',
       'hlfs_urd_usual_hr_tot_nbr'

More detailed examples are provided under the linked pages for each summarisation tool for your reference.

From SAS

/* If you're running this from SAS.*/
proc sql;
    connect to odbc(dsn=IDI_UserCode_srvprd);
    create table work.mysummarytable as
    select * from connection to odbc(
        EXEC [DL-MAA2020-47].SELevel [DL-MAA2020-47].test_SEMean 
       '[DL-MAA2020-47].[test_hlfs_data]',
       'hlfs_urd_region_code, hlfs_urd_sex_code',
       'where hlfs_urd_quarter_nbr = 145',
       'hlfs_urd_usual_hr_tot_nbr'
     );

    disconnect from odbc;
quit;

From R

# If you plan on running this from R.
require(DBI)

conn <- DBI::dbConnect(odbc::odbc()
                      ,Driver = <SQL Driver name>
                      ,Server = <IDI Server name>
                      ,Database= "IDI UserCode"
                      ,Trusted_Connection="Yes"
                      ,ConnectionPooling=FALSE)

res <- DBI::dbGetQuery(conn, "EXEC [DL-MAA2020-47].SELevel [DL-MAA2020-47].test_SEMean 
       '[DL-MAA2020-47].[test_hlfs_data]',
       'hlfs_urd_region_code, hlfs_urd_sex_code',
       'where hlfs_urd_quarter_nbr = 145',
       'hlfs_urd_usual_hr_tot_nbr'")

print(res)

What outputs can I expect from the HLFS Summarisation tools?

Each tool serves a different purpose, and produces a different summary. You can create weighted counts, rates, means, or even create differences between specific HLFS quarters. The output table also contains subtotals and grand totals by grouping columns that you supply as input. More details regarding outputs produced by each procedure can be found on the specifc pages dedicated to each tool (available on the HLFS landing page).

How will this data compare with published estimates?

What kind of sample errors does the HLFS data tool produce?

V( \hat{y}) = \frac{(G-1)}{G} \sum_{g=1}^G(\hat{y}_{(g)} - \hat{y})^2

where G is the total number of groups, g the replicate group, \hat{y} the final weight and \hat{y}_{(g)} the replicate weight . For more details, see Household Labour Force Survey sources and methods: 2025

How should sample errors be used or expressed in text?

How should you output data produced using this tool?