Descriptive statistics of the demographic characteristics at the county and state levels are provided in Table 1. The average county population was 104 159, ranging from as few as 88 people to more than 10 million. At the state level, the average population was more than 6.4 million, ranging from about 0.6 million to almost 40 million. On average, a typical county had a larger younger (≤18 years) population (22%) than older (≥65 years) population (19%), as did a typical state (22% vs. 16%), and both counties and states had a roughly equal gender distribution. In terms of racial/ethnic composition, non-Hispanic White was the majority group in both counties (76%) and states (68%), followed by Hispanic (10% for counties and 12% for states) and then non-Hispanic Black (9% and 11%, respectively). The other racial/ethnic groups (i.e., Asian, American Indian & Alaska Native, and Native Hawaiian/Other Pacific Islander) made up the rest (<5% for counties and <8% for states). It is worth noting that in the HGLM analysis, we included the proportions of non-Hispanic Black and of Hispanic as the groups of interest and combined the remaining four groups into the reference category, the majority (~90%) of which was non-Hispanic White.
| Characteristic | County Min. | County Max. | County M | County SD | State Min. | State Max. | State M | State SD |
|---|---|---|---|---|---|---|---|---|
| Population size | 88 | 10 105 518 | 104 159.48 | 333 534.49 | 577 737 | 39 667 045 | 6 415 047.73 | 7 343 307.89 |
| Proportion of below 18 years of age | <0.01 | 0.42 | 0.22 | 0.03 | 0.18 | 0.30 | 0.22 | 0.02 |
| Proportion of 65 and older | 0.05 | 0.58 | 0.19 | 0.05 | 0.11 | 0.21 | 0.16 | 0.02 |
| Proportion of female | 0.27 | 0.57 | 0.50 | 0.02 | 0.48 | 0.53 | 0.51 | 0.01 |
| Proportion of non-Hispanic White | 0.03 | 0.98 | 0.76 | 0.20 | 0.22 | 0.93 | 0.68 | 0.16 |
| Proportion of non-Hispanic Black | <0.01 | 0.85 | 0.09 | 0.14 | 0.01 | 0.45 | 0.11 | 0.11 |
| Proportion of Hispanic | 0.01 | 0.96 | 0.10 | 0.14 | 0.02 | 0.49 | 0.12 | 0.10 |
| Proportion of Asian | <0.01 | 0.43 | 0.02 | 0.03 | 0.01 | 0.38 | 0.05 | 0.06 |
| Proportion of American Indian & Alaska Native | <0.01 | 0.93 | 0.02 | 0.08 | <0.01 | 0.15 | 0.02 | 0.03 |
| Proportion of Native Hawaiian/Other Pacific Islander | 0 | 0.49 | <0.01 | 0.01 | <0.01 | 0.10 | <0.01 | 0.14 |
| Proportion of rural | <0.01 | 1.00 | 0.59 | 0.31 | <0.01 | 0.61 | 0.26 | 0.15 |
| Rural-urban continuum code | 0 | 8 | 4.01 | 2.71 | – | – | – | – |
| Number of COVID-19 deaths | 0 | 6732 | 33.17 | 241.29 | 8 | 29 699 | 2043.75 | 4528.34 |
| Number of COVID-19 deaths per 100 000 population (COVID-19 mortality) | 0 | 311.98 | 12.85 | 28.48 | 1.08 | 151.97 | 26.20 | 33.61 |

County-level: n=3141; state-level: n=51. M: mean; SD: standard deviation.
Table 1. Descriptive statistics of demographic characteristics and COVID-19 outcomes
As for rurality, a typical county had more than half of its area (59%) as rural, whereas a typical state had only a little more than a quarter of its area (26%) as rural. The average rural-urban continuum code was 4.01 for counties, suggesting that a typical county fell in the category "nonmetro counties with an urban population of 20 000 or more, adjacent to a metro area". Rural-urban continuum codes were not available at the state level.
Table 1 also displays the summary statistics of COVID-19 outcomes. As of 31 May 2020, across all 3141 counties, the total number of COVID-19 deaths was 104 183, and the average number of COVID-19 deaths per county was 33.17, ranging from 0 to 6732. For the 50 states plus Washington, DC, the average number of COVID-19 deaths was 2043.75, ranging from 8 to 29 699. Because counties and states vary widely in population size, we computed a standardized measure, COVID-19 mortality, defined as the number of COVID-19 deaths per 100 000 population, to make the death counts comparable across counties and across states. Among the counties, the average COVID-19 mortality was 12.85 per 100 000 population, while the average COVID-19 mortality for states was more than double that, at 26.20 per 100 000 population. Fig. 2 also shows that COVID-19 mortality was widely spread, with several areas of high mortality in various regions across the country.
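The standardization described above is simple arithmetic, and a minimal sketch may make it concrete. The function name `mortality_per_100k` is hypothetical (it does not come from the study), and the pooled example only recombines figures already reported in Table 1 and in the text.

```python
def mortality_per_100k(deaths: float, population: float) -> float:
    """Return deaths per 100 000 population (the 'COVID-19 mortality' measure)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


# Pooled figure: total deaths across all counties divided by total population,
# where total population = number of counties x mean county population (Table 1).
total_population = 3141 * 104_159.48
pooled_rate = mortality_per_100k(104_183, total_population)
print(round(pooled_rate, 1))
```

The pooled rate comes out at roughly 32 per 100 000, well above the unweighted county average of 12.85, because the unweighted average gives a county of 88 people the same weight as a county of 10 million.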
To examine the raw associations between county-level demographics and COVID-19 mortality, we first established an HGLM base model that included demographic variables only. The base model set the stage for later identifying health factors uniquely associated with COVID-19 mortality. The results of the base model are presented in Table 2. As expected, the overdispersion parameter estimate (i.e., county-level residual variance σ²) was 18.90, meaning that the estimated within-state variance of COVID-19 mortality given the demographics was 18.90 times larger than the variance expected under a regular Poisson model, which justified the use of the overdispersion parameter. Further, the between-state variance (i.e., state-level residual variance τ) was 0.75, which was statistically significant (P<0.001), meaning that COVID-19 mortality varied significantly across states.
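For readers who prefer notation, one common way to write a two-level Poisson model with overdispersion and a random state intercept is sketched below. This is an illustrative formulation, not the authors' stated specification: the exposure term $m_{ij}$ (county population in units of 100 000), the random-intercept-only structure, and the symbols $\lambda_{ij}$, $\beta$, $\gamma_{00}$, and $u_{0j}$ are assumptions. Only $\sigma^2$ and $\tau$ correspond to quantities reported in the text.

```latex
\begin{align*}
\mathrm{E}(Y_{ij} \mid \lambda_{ij}) &= m_{ij}\,\lambda_{ij}, \\
\mathrm{Var}(Y_{ij} \mid \lambda_{ij}) &= \sigma^{2}\, m_{ij}\,\lambda_{ij}, \\
\log \lambda_{ij} &= \beta_{0j} + \sum_{q=1}^{7} \beta_{q} X_{qij}, \\
\beta_{0j} &= \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau).
\end{align*}
```

Here $Y_{ij}$ is the number of COVID-19 deaths in county $i$ of state $j$ and $X_{qij}$ are the seven demographic variables. Under this reading, $\sigma^2 = 1$ would recover an ordinary Poisson variance, so an estimate of 18.90 indicates substantial overdispersion, and $\tau = 0.75$ is the variance of the state intercepts on the log scale.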
| Demographic variable | Coeff. | Half-Std. Coeff. | Robust SE | t-Ratio | Approx. d.f. | P | ERR | Half-Std. ERR | % Change | Abs. % Change | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 2.04 | | 0.19 | 10.59 | 50 | <0.001 | 7.68 | – | – | – | – |
| Proportion of below 18 years of age | −0.38 | −0.01 | 1.08 | −0.35 | 3077 | 0.73 | 0.69 | 0.99 | −1.3% | 1.3% | 7 |
| Proportion of 65 and older | 4.42 | 0.21 | 1.33 | 3.32 | 3077 | <0.001 | 83.48 | 1.23 | 23.2% | 23.2% | 4 |
| Proportion of non-Hispanic Black | 2.10 | 0.30 | 0.44 | 4.73 | 3077 | <0.001 | 8.14 | 1.35 | 35.0% | 35.0% | 1 |
| Proportion of Hispanic | 2.06 | 0.28 | 0.28 | 7.39 | 3077 | <0.001 | 7.82 | 1.33 | 32.9% | 32.9% | 2 |
| Proportion of female | 2.33 | 0.05 | 3.22 | 0.72 | 3077 | 0.47 | 10.30 | 1.05 | 5.5% | 5.5% | 6 |
| Proportion of rural | −1.22 | −0.38 | 0.58 | −2.09 | 3077 | 0.04 | 0.30 | 0.68 | −31.9% | 31.9% | 3 |
| Rural-urban continuum code | −0.08 | −0.21 | 0.04 | −1.87 | 3077 | 0.06 | 0.93 | 0.81 | −18.9% | 18.9% | 5 |

ᵃCounty-level overdispersion parameter (σ²)=18.90, and state-level residual variance (τ)=0.75 (P<0.001). SE: standard error; d.f.: degrees of freedom; ERR: event rate ratio.
Table 2. Results of HGLM base modelᵃ for demographics
To compare the substantive importance of the demographic variables, the estimated regression coefficients were converted to half-standardized event rate ratios (ERRs). The half-standardized ERR is a measure of effect size, and its interpretation is similar to that of an odds ratio. That is, if a half-standardized ERR is greater than 1, the expected death rate is higher in the group of interest than in the reference group; otherwise, the expected death rate is equal to or lower than that in the reference group. The absolute percent (%) change was then ranked in the last column of Table 2 to indicate the substantive importance of each demographic variable.
The rank ordering by absolute % change indicated that the proportion of non-Hispanic Black had the highest rank (#1), with a half-standardized ERR of 1.35. This means that for two counties that differed by one standard deviation in the proportion of non-Hispanic Black (SD=0.14, Table 1), on average, the county with the higher proportion of non-Hispanic Black had 35.0% more deaths than the county with the lower proportion, controlling for the other demographic variables in the model. An example in the opposite direction is that if a county's proportion of rural went up by one standard deviation (SD=0.31, Table 1), the COVID-19 mortality of that county would go down by 31.9% on average. It is also interesting to see that race/ethnicity, measured as the proportions of non-Hispanic Black and Hispanic, occupied the top two ranks, indicating that racial/ethnic minority composition was the strongest risk factor for COVID-19 mortality. On the other hand, rurality, measured as the proportion of rural areas, worked as a protective factor.
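The conversion can be sketched in a few lines. This assumes, based on how the Table 2 columns relate to one another rather than on a formula stated in the text, that the half-standardized coefficient is the raw coefficient multiplied by the predictor's standard deviation and that the half-standardized ERR is its exponential. The function names are hypothetical.

```python
import math


def half_standardized_err(coefficient: float, sd: float) -> float:
    """Event rate ratio for a one-SD increase in the predictor.

    Assumption: half-standardized coefficient = coefficient * SD of the
    predictor, and ERR = exp(half-standardized coefficient).
    """
    return math.exp(coefficient * sd)


def percent_change(err: float) -> float:
    """Convert an event rate ratio into a percent change in the expected rate."""
    return (err - 1.0) * 100.0


# Rounded inputs taken from Table 2 (coefficients) and Table 1 (county-level SDs).
black = half_standardized_err(2.10, 0.14)
rural = half_standardized_err(-1.22, 0.31)
print(f"non-Hispanic Black: ERR={black:.2f}, change={percent_change(black):+.1f}%")
print(f"rural:              ERR={rural:.2f}, change={percent_change(rural):+.1f}%")
```

Because the inputs are the rounded values printed in the tables, the results land close to, but not exactly on, the reported 35.0% and −31.9%; the published figures were presumably computed from unrounded estimates.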
To make a fair comparison of the mean event rate of COVID-19 mortality, we controlled for demographic variables and general health by adding them to the HGLM model as covariates when identifying county-level health factors uniquely associated with COVID-19 mortality. To avoid potential model overfitting due to the large number of item-level health factors and general health items (77 items in total) in the County Health Rankings, we took a two-step approach to investigating how each health factor was uniquely related to COVID-19 mortality. The first step was a preliminary round to select potentially highly influential factors. To do that, we added one item-level factor at a time to the HGLM Poisson regression in addition to the demographic variables mentioned earlier. Using the criterion of half-standardized ERR≥0.30, which is considered to be a substantively moderate effect size, we initially selected 12 item-level health factors and general health items from the 77 items. The selected items are listed in Supplementary Table 1 (available online).
Furthermore, some of the selected items were highly correlated, such as food environment index and limited access to healthy foods within the factor category of Diet & Exercise (r=−0.77), and the proportions of uninsured, uninsured adults, and uninsured children in the category of Access to Care (all rs>0.73). For these, we created a composite score for the highly correlated items in each category by computing a z-score for each item, adding the z-scores up, and then re-standardizing the sum into a single overall z-score that represents the category, in order to increase the interpretability of the results. For the other items, we also created a z-score for each item so that we could compare the relative influence of each factor on COVID-19 mortality.
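The composite-score construction described above (z-score each item, sum, re-standardize) might look like the following minimal sketch. The function names and the toy data are hypothetical, the use of the sample standard deviation is an assumption, and the optional `signs` argument is an added convenience for reverse-coding negatively correlated items; the text does not say whether or how items were reverse-coded.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_z(items, signs=None):
    """Combine several item columns into one re-standardized composite score.

    items: list of equal-length lists, one list per item (one value per county).
    signs: optional list of +1/-1 per item, to reverse-code items (assumption).
    """
    if signs is None:
        signs = [1] * len(items)
    z_items = [[sign * z for z in zscores(col)] for sign, col in zip(signs, items)]
    totals = [sum(row) for row in zip(*z_items)]  # add the z-scores per county
    return zscores(totals)                        # re-standardize the sum


# Toy data for four hypothetical counties (not from the study).
uninsured = [0.10, 0.12, 0.08, 0.20]
uninsured_adults = [0.12, 0.15, 0.09, 0.25]
uninsured_children = [0.05, 0.06, 0.04, 0.10]
access_to_care = composite_z([uninsured, uninsured_adults, uninsured_children])
print([round(z, 2) for z in access_to_care])
```

Re-standardizing the sum keeps the composite on the same scale as the single-item z-scores, so its coefficient can be compared directly with theirs.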
In the second step of the analysis, the outcome, COVID-19 mortality, was regressed on these nine z-scores in addition to the seven demographic variables described in the previous section, using HGLM Poisson regression with overdispersion allowed. The results of this analysis are displayed in Table 3. Long commute driving alone, severe housing problems, and juvenile arrests rate were statistically significant health factors of COVID-19 mortality (P<0.05), and the rates of suicide and uninsured were marginally statistically significant (P<0.10). However, in terms of effect size measured by ERR, suicide rate was the most influential (−28.1%), followed by uninsured rate (20.7%), long commute driving alone (19.9%), juvenile arrests rate (−15.0%), severe housing problems (13.6%), access to healthy food (11.2%), and social associations number rate (−1.5%). It should be noted that the two general health items (child mortality number rate and average life expectancy) were not statistically significant and, as control covariates, do not need to be interpreted.
| Factor category | Item (in z-score) | Coefficient | Robust SE | t-Ratio | Approx. d.f. | P | ERR | Rank |
|---|---|---|---|---|---|---|---|---|
| Health factors > Health behaviors > Diet & Exercise | Access to healthy food | 0.11 | 0.15 | 0.69 | 1360 | 0.49 | 1.11 | 7 |
| Health factors > Clinical care > Access to care | Uninsured rate | 0.19 | 0.10 | 1.87 | 1360 | 0.06 | 1.21 | 2 |
| Health factors > Social & economic factors > Family & social support | Social associations number rate | −0.02 | 0.08 | −0.19 | 1360 | 0.85 | 0.98 | 9 |
| Health factors > Social & economic factors > Community safety | Suicides rate | −0.33 | 0.17 | −1.90 | 1360 | 0.06 | 0.72 | 1 |
| Health factors > Social & economic factors > Community safety | Juvenile arrests rate | −0.16 | 0.07 | −2.23 | 1360 | 0.03 | 0.85 | 4 |
| Health factors > Physical environment > Housing & Transit | Long commute driving alone | 0.18 | 0.06 | 2.85 | 1360 | <0.01 | 1.20 | 3 |
| Health factors > Physical environment > Housing & Transit | Severe housing problems | 0.13 | 0.05 | 2.70 | 1360 | 0.01 | 1.14 | 5 |
| General health > Length of life | Child mortality number rate | −0.10 | 0.11 | −0.87 | 1360 | 0.39 | 0.91 | 8 |
| General health > Length of life | Average life expectancy | −0.14 | 0.10 | −1.41 | 1360 | 0.16 | 0.87 | 6 |

ᵃThe seven demographic variables were controlled for in the model but are not shown in this table for clarity. The estimates of the intercept, the level-1 overdispersion parameter, and the level-2 residual variance are also not included in this table. SE: standard error; d.f.: degrees of freedom; ERR: event rate ratio.
Table 3. Results of HGLM Poisson regression with overdispersionᵃ for health factors and general health
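As a rough check on how the Rank column in Table 3 relates to the coefficients, the sketch below exponentiates each rounded coefficient and orders the items by absolute percent change. It assumes that ERR = exp(coefficient) for z-scored items and that rank follows the absolute percent change, as in Table 2. The variable names are illustrative only.

```python
import math

# Rounded coefficients copied from Table 3 (items are z-scores).
coefficients = {
    "Access to healthy food": 0.11,
    "Uninsured rate": 0.19,
    "Social associations number rate": -0.02,
    "Suicides rate": -0.33,
    "Juvenile arrests rate": -0.16,
    "Long commute driving alone": 0.18,
    "Severe housing problems": 0.13,
    "Child mortality number rate": -0.10,
    "Average life expectancy": -0.14,
}

# Percent change in the expected mortality rate per one-SD increase in each item.
pct_change = {name: (math.exp(b) - 1.0) * 100.0 for name, b in coefficients.items()}

# Rank items by the absolute size of the percent change, largest first.
ranked = sorted(pct_change, key=lambda name: abs(pct_change[name]), reverse=True)
for rank, name in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {pct_change[name]:+.1f}%")
```

With rounded inputs the ordering matches the Rank column, although individual percentages can differ from those quoted in the text by a few tenths of a point.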
Identification of county-level health factors associated with COVID-19 mortality in the United States
- COVID-19
- mortality
- health factors
- health disparity
- hierarchical generalized linear model
Abstract: Many studies have investigated causes of COVID-19 and explored safety measures for preventing COVID-19 infections. Unfortunately, these studies fell short of addressing disparities in health status and resources among decentralized communities in the United States. In this study, we used an advanced modeling technique to examine complex associations of county-level health factors with COVID-19 mortality for all 3141 counties in the United States. Our results indicated that counties with more uninsured people, more housing problems, more urbanized areas, and longer commutes are more likely to have higher COVID-19 mortality. Based on nationwide population-based data, this study also echoed prior research that used local data and confirmed that county-level sociodemographic factors, such as larger Black, Hispanic, and older subpopulations, are associated with a high risk of COVID-19 mortality. We hope that these findings will help set priorities for high-risk communities and subpopulations in the future fight against the novel virus.
Citation: Wei Pan, Yasuo Miyazaki, Hideyo Tsumura, Emi Miyazaki, Wei Yang. Identification of county-level health factors associated with COVID-19 mortality in the United States[J]. The Journal of Biomedical Research. doi: 10.7555/JBR.34.20200129