Small Area Estimation - State Poverty Rate Model Research Data Files:

  This zip file contains input files for the CPS equations of the SAIPE state 
poverty rate models for income years (IYs) 1989-2005 (omitting 1994, a year 
for which no SAIPE estimates were produced). These are intended for research, 
particularly when multiple years of data are of interest. The data are mostly 
those used in SAIPE production, though a few variables in a few years may have 
undergone minor later revisions. Note also that for IY 2005 the 
CPS data were replaced in production by ACS data, and so the 2005 CPS 
equation models were run only for comparisons.

  One difference from the production data concerns the variable fnlse, the 
final sampling std. errors obtained by iteratively updating the original gvf 
std. errors using results from REML estimation of the model. The fnlse values 
don't always agree with the sampling std. errors used in production because 
the values included here were obtained from iterative updates that used REML 
estimation of a common model across most years. (This model used census 
residuals as a regressor, while production used census poverty estimates as 
a regressor in some years. It also included the food stamp regressor for all 
years except 1998 and 1999, although the food stamp variable was dropped from 
the production state models in 1998 and not reinstituted until 2004.) This 
use of a "common model" facilitated the production of the fnlse values given 
here, and was thought to be more desirable for research purposes than was 
using the actual production sampling std. errors in all cases.

  The files cpsxxp.txt and cpsxxt.txt contain data for models for poverty rates 
of ages 0-4, 5-17, 18-64, and 65+. There are two versions for age 5-17. One 
(in the cpsxxp.txt files) is for 5-17 year old children related in families. 
The other (in the cpsxxt.txt files) is for total 5-17 year old children. Data 
for age 0-4 is for total children age 0-4, and so is contained in the 
cpsxxt.txt files. For many of the years the same 0-4 data is replicated in the 
cpsxxp.txt files, but for other years some or all of the 0-4 columns in the 
cpsxxp.txt files are zeroes. For 18-64 and 65+ there is no distinction between 
total and related, and these data are available in both the cpsxxp.txt and 
cpsxxt.txt files. 

  In each file, and for each age group, the following data columns are included 
(where xx denotes the Income Year):

     cpsxx = direct CPS estimated poverty rate
   irsprxx = pseudo-poverty rate tabulated from IRS tax data
   irsnfxx = tax nonfiler rate
      fsxx = food stamp participation proportion
     cpsse = direct estimated std. error of cpsxx
     gvfse = initial GVF estimate of std. error of cpsxx
     fnlse = final GVF estimate of std. error of cpsxx after it is updated 
               iteratively in conjunction with REML estimation of the model.

For age 65+ the SSI participation rate variable (SSIxx) replaces fsxx. 
The last column in each file contains:

   smpsize = CPS sample size (number of interviewed households).
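
  The column layout described above can be sketched in Python as follows. 
This is only an illustration under stated assumptions, not the layout logic 
of read-cpspov-files.r: it assumes the columns are whitespace-delimited, that 
there is no header row or state identifier column, and that the age groups 
appear in the order 0-4, 5-17, 18-64, 65+. The function names and the 
"_0_4"-style suffixes are hypothetical labels introduced here. Check the 
actual files and the R program before relying on it.

```python
def column_names(yy, age_groups=("0_4", "5_17", "18_64", "65plus")):
    """Build labels for one cpsxxp.txt / cpsxxt.txt file.

    yy is the two-digit income year as a string, e.g. "95".
    ASSUMPTION: seven columns per age group, in the order listed in this
    README, followed by a single smpsize column.
    """
    names = []
    for age in age_groups:
        # For age 65+ the SSI participation rate replaces the food stamp variable.
        program = "SSI" if age == "65plus" else "fs"
        stems = [f"cps{yy}", f"irspr{yy}", f"irsnf{yy}", f"{program}{yy}",
                 "cpsse", "gvfse", "fnlse"]
        names.extend(f"{stem}_{age}" for stem in stems)
    names.append("smpsize")
    return names


def parse_table(text, names):
    """Parse whitespace-delimited numeric rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError(
                f"expected {len(names)} columns, found {len(fields)}")
        rows.append(dict(zip(names, map(float, fields))))
    return rows
```

  If the column count check fails on a real file, that is a sign that one of 
the layout assumptions above does not hold for that file.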

  The file read-cpspov-files.r is an R program that will read the CPS 
cpsxxp.txt or cpsxxt.txt files for a specified set of years and a specified 
age group, and store the data in suitable arrays. See the program for details.
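
  For readers not working in R, the file naming convention can be sketched in 
Python. This is an illustration only and not a port of read-cpspov-files.r. 
The function names income_years and cps_file_name are hypothetical, and the 
two-digit form of xx (89, ..., 99, 00, ..., 05) is an assumption based on 
names such as cen89pov.txt.

```python
def income_years(first=1989, last=2005, omitted=(1994,)):
    """Income years covered by the CPS files; 1994 has no SAIPE estimates."""
    return [year for year in range(first, last + 1) if year not in omitted]


def cps_file_name(year, version="p"):
    """Return the assumed file name for one income year.

    version "p" selects the cpsxxp.txt files (5-17 related children);
    version "t" selects the cpsxxt.txt files (5-17 total children).
    ASSUMPTION: xx is the last two digits of the income year.
    """
    if version not in ("p", "t"):
        raise ValueError("version must be 'p' or 't'")
    return f"cps{year % 100:02d}{version}.txt"


# Example: names of all "total" files, in income-year order.
total_files = [cps_file_name(year, "t") for year in income_years()]
```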


Census Poverty Rate Equation Data Files:

  Data input files are also included for the census poverty rate equations. 
These files are cen89pov.txt (for estimates from the 1990 census for IY 1989) 
and cen99pov.txt (for estimates from the 2000 census for IY 1999). Variables 
are analogous to the first four from the CPS equation files; the final three 
are omitted since sampling error in the census long form estimates was 
negligible at the state level. The file cen89pov.txt contains data for modeling 
poverty rates of age 0-4 total, age 5-17 related, age 18-64, and age 65+. 
Production modeling used the age 5-17 related estimates, or the corresponding 
"census residuals," in both the CPS 5-17 related and 5-17 total models. The 
file cen99pov.txt contains data for both age 5-17 related and age 5-17 total 
(and so has 4 more columns than does cen89pov.txt). When the 2000 census 
estimates or residuals were used in the modeling, we made the distinction 
between 5-17 related and 5-17 total.

  The files cen89res.txt and cen99res.txt contain the "census residuals" (the 
residuals from fitting the census poverty rate regression equations). Thus, 
if the interest is in the CPS equation, these can just be read in and there is 
no need to read in the census equation data and refit the census equation. The 
first of the files contains the residuals for 0-4t, 5-17r, 18-64, and 65+, 
while the second contains the residuals for 0-4t, 5-17r, 5-17t, 18-64, and 65+. 
This is analogous to the files cen89pov.txt and cen99pov.txt.
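
  To make the term concrete, a regression residual is the observed value 
minus the fitted value from the regression. The Python sketch below computes 
ordinary least squares residuals using only the standard library. It is a 
generic illustration, not the SAIPE production procedure: the function name 
ols_residuals is hypothetical, and the inclusion of an intercept and the use 
of ordinary least squares are assumptions made here. The residuals supplied 
in cen89res.txt and cen99res.txt should be used as given.

```python
def ols_residuals(y, X):
    """Return residuals y - Xb, where b is the ordinary least squares fit.

    y is a list of responses (e.g. census poverty rates by state).
    X is a list of rows of regressors (one row per state).
    ASSUMPTION: an intercept column is added automatically.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                factor = A[k][c] / A[c][c]
                A[k] = [a - factor * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    # Residual = observed value minus fitted value.
    return [yi - sum(b * x for b, x in zip(beta, r))
            for r, yi in zip(rows, y)]
```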


Files of State Population Estimates:

  The files poptot0_4.txt, poptot5_17.txt, poptot18_64.txt, and poptot65+.txt 
are ASCII text files of state population estimates for the four respective 
age groups. For 1990-1999 these are post-censal estimates, meaning that they 
were constructed by starting with 1990 census counts (which refer to April 1, 1990) 
and updating them demographically (with birth, death, and migration data and 
estimates) through the decade. They were not modified to account for the 
Census 2000 counts, so that the transition from 1999 to 2000 can be larger 
than it would be for corresponding inter-censal population estimates, for 
which such modifications are made to smooth the transition to the next 
census results. The 2000-2005 population estimates are inter-censal estimates 
that were constructed starting from the Census 2000 results, but also taking 
into account the 2010 census results.

  The post-censal pop estimates are closer to what is used in SAIPE production, 
because inter-censal pop estimates are not yet available at production time. 
Also note that the 5-17 pop estimates included here are for the total 5-17 
population, not for the 5-17 related in families population. See the SAIPE 
web site for a discussion of the distinction.

  The pop estimates were obtained from the following websites:
https://www.census.gov/data/tables/time-series/demo/popest/1980s-state.html
https://www2.census.gov/programs-surveys/popest/tables/1990-2000/state/asrh/ 
https://www2.census.gov/programs-surveys/popest/datasets/2000-2010/intercensal/state/st-est00int-agesex.csv

  For further information on the SAIPE production models, input data, and 
estimation and prediction procedures and results at all geographic levels 
(state, county, and school district) see the SAIPE web site at 
https://www.census.gov/did/www/saipe/ or the following article:

  Bell, William R., Basel, Wesley W., and Maples, Jerry J. (2016), 
    "An Overview of the U.S. Census Bureau's Small Area Income and Poverty 
    Estimates Program," Chapter 19 in Analysis of Poverty Data by Small Area 
    Estimation, ed. Monica Pratesi, Wiley, pp. 349-378.


William R. Bell                          Carolina Franco
Associate Directorate for                Center for Statistical Research 
  Research and Methodology                 and Methods
U.S. Census Bureau                       U.S. Census Bureau
William.R.Bell@census.gov                Carolina.Franco@census.gov

First version: July 31, 2013
Revised: September 15, 2015; August 23, 2017; September 19, 2017
Program read-cpspov-files.r revised: January 16, 2020