3.4 Labelled data in other packages
Although labelled datasets are relatively new and still somewhat niche, a few packages have started to make use of the additional metadata they carry.
3.4.1 Frequency tables with questionr
The questionr package provides a set of convenient helper functions for survey processing tasks. Some of these use label and missing value metadata for display purposes.
Among others, the `freq()` function provides an equivalent to the frequency tables produced in SPSS, and the `ltabs()` function provides a wrapper for `stats::xtabs()` that uses labels by default:
library(questionr)
freq(gss$HEALTH)
#>                 n    % val%
#> [0] IAP       774 33.0   NA
#> [1] EXCELLENT 359 15.3 22.9
#> [2] GOOD      771 32.8 49.1
#> [3] FAIR      355 15.1 22.6
#> [4] POOR       84  3.6  5.4
#> [8] DK          5  0.2   NA
#> [9] NA          0  0.0  0.0
ltabs(~ HELPSICK + HEALTH, gss)
#>                                                  HEALTH: Condition of health
#> HELPSICK: Should govt help pay for medical care? [0] IAP [1] EXCELLENT [2] GOOD
#> [0] IAP                                                0           177      391
#> [1] GOVT SHOULD HELP                                 250            52      132
#> [2] 2                                                138            44       79
#> [3] AGREE WITH BOTH                                  242            56      101
#> [4] 4                                                 62            17       34
#> [5] PEOPLE HELP SELVES                                63             9       28
#> [8] DK                                                19             4        6
#> [9] NA                                                 0             0        0
#>                                                  HEALTH: Condition of health
#> HELPSICK: Should govt help pay for medical care? [3] FAIR [4] POOR [8] DK
#> [0] IAP                                               171       45      1
#> [1] GOVT SHOULD HELP                                   73       20      1
#> [2] 2                                                  40        5      0
#> [3] AGREE WITH BOTH                                    44        3      2
#> [4] 4                                                  12        4      0
#> [5] PEOPLE HELP SELVES                                 10        7      0
#> [8] DK                                                  3        0      1
#> [9] NA                                                  2        0      0
#>                                                  HEALTH: Condition of health
#> HELPSICK: Should govt help pay for medical care? [9] NA
#> [0] IAP                                               0
#> [1] GOVT SHOULD HELP                                  0
#> [2] 2                                                 0
#> [3] AGREE WITH BOTH                                   0
#> [4] 4                                                 0
#> [5] PEOPLE HELP SELVES                                0
#> [8] DK                                                0
#> [9] NA                                                0
3.4.2 Tabling with gtsummary
gtsummary was originally developed as a complement to the [gt](https://gt.rstudio.com/) table presentation package. It makes it easy to produce summary tables of common indicators for datasets, regression models and so on.
Variable labels will be used for labelling tables by default, where they exist. Value labels are not used by default, but can easily be included by converting the variables to factors as demonstrated in the previous section.
library(gtsummary)  # dplyr and labelled are assumed to be loaded already
gss %>%
  select(HEALTH, HELPSICK, HELPPOOR) %>%
  to_factor(drop_unused_labels = TRUE, user_na_to_na = TRUE) %>%
  tbl_summary(by = HEALTH)
#> 779 observations missing `HEALTH` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `HEALTH` column before passing to `tbl_summary()`.
Characteristic | EXCELLENT, N = 359¹ | GOOD, N = 771¹ | FAIR, N = 355¹ | POOR, N = 84¹ |
---|---|---|---|---|
Should govt help pay for medical care? | | | | |
GOVT SHOULD HELP | 52 (29%) | 132 (35%) | 73 (41%) | 20 (51%) |
2 | 44 (25%) | 79 (21%) | 40 (22%) | 5 (13%) |
AGREE WITH BOTH | 56 (31%) | 101 (27%) | 44 (25%) | 3 (7.7%) |
4 | 17 (9.6%) | 34 (9.1%) | 12 (6.7%) | 4 (10%) |
PEOPLE HELP SELVES | 9 (5.1%) | 28 (7.5%) | 10 (5.6%) | 7 (18%) |
Unknown | 181 | 397 | 176 | 45 |
Should govt improve standard of living? | | | | |
GOVT ACTION | 28 (16%) | 67 (18%) | 33 (19%) | 13 (36%) |
2 | 23 (13%) | 58 (16%) | 26 (15%) | 3 (8.3%) |
AGREE WITH BOTH | 92 (51%) | 162 (43%) | 83 (47%) | 16 (44%) |
4 | 26 (14%) | 59 (16%) | 16 (9.0%) | 2 (5.6%) |
PEOPLE HELP SELVES | 11 (6.1%) | 27 (7.2%) | 19 (11%) | 2 (5.6%) |
Unknown | 179 | 398 | 178 | 48 |

¹ Statistics presented: n (%)
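The `to_factor()` call above is what makes the value labels, rather than the numeric codes, appear in the table. As a rough illustration of the idea only (this is a base R sketch, not how labelled implements `to_factor()`), the conversion can be pictured as follows. The toy vector `health` and the choice of `0` and `8` as user-defined missing codes are assumptions made for the example.

```r
# Hypothetical toy vector that mimics a labelled survey variable:
# numeric codes plus "label" and "labels" attributes.
health <- c(1, 2, 2, 3, 8, 0)
attr(health, "label")  <- "Condition of health"
attr(health, "labels") <- c(IAP = 0, EXCELLENT = 1, GOOD = 2,
                            FAIR = 3, POOR = 4, DK = 8)

# Assumed user-defined missing codes for this example.
user_na_codes <- c(0, 8)

value_labels <- attr(health, "labels")
codes <- as.vector(health)                     # plain codes, attributes dropped

# Rough analogue of user_na_to_na = TRUE: user-missing codes become NA.
codes[codes %in% user_na_codes] <- NA

# Rough analogue of drop_unused_labels = TRUE: keep only labels still in use.
used <- value_labels[value_labels %in% codes]

health_factor <- factor(codes, levels = unname(used), labels = names(used))
table(health_factor, useNA = "ifany")
```

Once the variable is an ordinary factor, any table function that understands factors can display the labels without knowing anything about labelled data.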
gss %>%
  transmute(RINCOME, REALINC = unclass(REALINC), FINRELA) %>%
  to_factor(drop_unused_labels = TRUE, user_na_to_na = TRUE) %>%
  tbl_summary(by = FINRELA, percent = "row")
#> 27 observations missing `FINRELA` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `FINRELA` column before passing to `tbl_summary()`.
Characteristic | FAR BELOW AVERAGE, N = 153¹ | BELOW AVERAGE, N = 597¹ | AVERAGE, N = 1,042¹ | ABOVE AVERAGE, N = 476¹ | FAR ABOVE AVERAGE, N = 53¹ |
---|---|---|---|---|---|
Respondents income | | | | | |
LT $1000 | 1 (3.0%) | 17 (52%) | 11 (33%) | 4 (12%) | 0 (0%) |
$1000 TO 2999 | 3 (9.4%) | 12 (38%) | 11 (34%) | 6 (19%) | 0 (0%) |
$3000 TO 3999 | 2 (6.2%) | 11 (34%) | 14 (44%) | 5 (16%) | 0 (0%) |
$4000 TO 4999 | 0 (0%) | 11 (52%) | 7 (33%) | 3 (14%) | 0 (0%) |
$5000 TO 5999 | 3 (14%) | 10 (48%) | 7 (33%) | 1 (4.8%) | 0 (0%) |
$6000 TO 6999 | 1 (8.3%) | 3 (25%) | 5 (42%) | 2 (17%) | 1 (8.3%) |
$7000 TO 7999 | 3 (17%) | 6 (33%) | 7 (39%) | 2 (11%) | 0 (0%) |
$8000 TO 9999 | 4 (12%) | 11 (33%) | 15 (45%) | 2 (6.1%) | 1 (3.0%) |
$10000 - 14999 | 6 (6.4%) | 39 (41%) | 40 (43%) | 8 (8.5%) | 1 (1.1%) |
$15000 - 19999 | 7 (12%) | 25 (42%) | 24 (40%) | 3 (5.0%) | 1 (1.7%) |
$20000 - 24999 | 6 (5.7%) | 34 (32%) | 56 (53%) | 10 (9.4%) | 0 (0%) |
$25000 OR MORE | 17 (2.0%) | 136 (16%) | 428 (51%) | 248 (29%) | 18 (2.1%) |
Unknown | 100 | 282 | 417 | 182 | 31 |
Family income in constant $ | 7,378 (4,086, 17,025) | 12,485 (5,108, 20,430) | 24,970 (12,485, 37,455) | 49,940 (30,645, 72,640) | 119,879 (30,645, 119,879) |

¹ Statistics presented: n (%); Median (IQR)
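Both `tbl_summary()` calls above report that observations with a missing `by` value were removed, and the message points to `forcats::fct_explicit_na()` as the way to keep them. The sketch below shows the same idea by hand in base R, so that it does not depend on a particular forcats version. The toy factor `health` and the level name `"(Missing)"` are assumptions chosen for the example.

```r
# Hypothetical toy factor with a missing value, standing in for the
# grouping variable after to_factor(user_na_to_na = TRUE).
health <- factor(c("GOOD", NA, "FAIR", "GOOD"))

# Turn NA into an explicit level so that grouped summaries keep those rows.
missing_level <- "(Missing)"
health_explicit <- factor(
  ifelse(is.na(health), missing_level, as.character(health)),
  levels = c(levels(health), missing_level)
)

table(health_explicit)
```

With the missing values recoded as a regular level, a call such as `tbl_summary(by = ...)` would be expected to treat them as one more group instead of dropping them.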
gss %>%
  to_factor(drop_unused_labels = TRUE, user_na_to_na = TRUE) %>%
  tbl_cross(HELPSICK, HEALTH, percent = "row")
Columns EXCELLENT to Unknown: Condition of health

Characteristic | EXCELLENT | GOOD | FAIR | POOR | Unknown | Total |
---|---|---|---|---|---|---|
Should govt help pay for medical care? | | | | | | |
GOVT SHOULD HELP | 52 (9.8%) | 132 (25%) | 73 (14%) | 20 (3.8%) | 251 (48%) | 528 (100%) |
2 | 44 (14%) | 79 (26%) | 40 (13%) | 5 (1.6%) | 138 (45%) | 306 (100%) |
AGREE WITH BOTH | 56 (12%) | 101 (23%) | 44 (9.8%) | 3 (0.7%) | 244 (54%) | 448 (100%) |
4 | 17 (13%) | 34 (26%) | 12 (9.3%) | 4 (3.1%) | 62 (48%) | 129 (100%) |
PEOPLE HELP SELVES | 9 (7.7%) | 28 (24%) | 10 (8.5%) | 7 (6.0%) | 63 (54%) | 117 (100%) |
Unknown | 181 (22%) | 397 (48%) | 176 (21%) | 45 (5.5%) | 21 (2.6%) | 820 (100%) |
Total | 359 (15%) | 771 (33%) | 355 (15%) | 84 (3.6%) | 779 (33%) | 2,348 (100%) |