3.4 Labelled data in other packages

Although labelled datasets are relatively new and still something of a niche, a few packages are starting to take advantage of the additional metadata they provide.

3.4.1 Frequency tables with questionr

The questionr package provides a set of convenient helper functions for survey processing tasks. Some of these use label and missing value metadata for display purposes.

Among others, the freq() function produces frequency tables equivalent to those in SPSS, and the ltabs() function wraps stats::xtabs() so that labels are used by default.

library(questionr)

freq(gss$HEALTH)
#>                 n    % val%
#> [0] IAP       774 33.0   NA
#> [1] EXCELLENT 359 15.3 22.9
#> [2] GOOD      771 32.8 49.1
#> [3] FAIR      355 15.1 22.6
#> [4] POOR       84  3.6  5.4
#> [8] DK          5  0.2   NA
#> [9] NA          0  0.0  0.0
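
The `val%` column reports each category as a share of the valid (non-missing) responses, whereas `%` uses all cases. As a rough check in base R, the two columns can be reproduced from the counts printed above. This is only a sketch: the vector names `n` and `is_missing` are illustrative, and the assumption that IAP and DK are the categories flagged as missing is inferred from the `NA` entries in the `val%` column, not from the data dictionary.

```r
# Counts copied from the freq(gss$HEALTH) output above
n <- c(IAP = 774, EXCELLENT = 359, GOOD = 771, FAIR = 355,
       POOR = 84, DK = 5, "NA" = 0)

# Assumption: IAP and DK are treated as missing (val% is NA for both)
is_missing <- names(n) %in% c("IAP", "DK")

# % uses all cases; val% uses only the valid cases as the denominator
pct <- round(100 * n / sum(n), 1)
val_pct <- ifelse(is_missing, NA, round(100 * n / sum(n[!is_missing]), 1))

data.frame(n, pct, val_pct)
```

Under that assumption the valid denominator is 1,569 rather than 2,348, which is why EXCELLENT moves from 15.3 in the `%` column to 22.9 in the `val%` column.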

ltabs(~ HELPSICK + HEALTH, gss)
#>                                                 HEALTH: Condition of health
#> HELPSICK: Should govt help pay for medical care? [0] IAP [1] EXCELLENT [2] GOOD
#>                           [0] IAP                      0           177      391
#>                           [1] GOVT SHOULD HELP       250            52      132
#>                           [2] 2                      138            44       79
#>                           [3] AGREE WITH BOTH        242            56      101
#>                           [4] 4                       62            17       34
#>                           [5] PEOPLE HELP SELVES      63             9       28
#>                           [8] DK                      19             4        6
#>                           [9] NA                       0             0        0
#>                                                 HEALTH: Condition of health
#> HELPSICK: Should govt help pay for medical care? [3] FAIR [4] POOR [8] DK
#>                           [0] IAP                     171       45      1
#>                           [1] GOVT SHOULD HELP         73       20      1
#>                           [2] 2                        40        5      0
#>                           [3] AGREE WITH BOTH          44        3      2
#>                           [4] 4                        12        4      0
#>                           [5] PEOPLE HELP SELVES       10        7      0
#>                           [8] DK                        3        0      1
#>                           [9] NA                        2        0      0
#>                                                 HEALTH: Condition of health
#> HELPSICK: Should govt help pay for medical care? [9] NA
#>                           [0] IAP                     0
#>                           [1] GOVT SHOULD HELP        0
#>                           [2] 2                       0
#>                           [3] AGREE WITH BOTH         0
#>                           [4] 4                       0
#>                           [5] PEOPLE HELP SELVES      0
#>                           [8] DK                      0
#>                           [9] NA                      0
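
To illustrate the idea behind a label-aware cross-tabulation, here is a minimal base R sketch. It is not how `ltabs()` is implemented; the toy data frame `d`, the lookup vectors and the helper `prefixed()` are all made up for the example. Codes are turned into factors with `[code] label` levels, and the dimension names of the `xtabs()` result are replaced with `name: label` strings, which mimics the layout of the output above.

```r
# Toy data with numeric codes (hypothetical values, not the GSS data)
d <- data.frame(
  HEALTH   = c(1, 2, 2, 3, 1, 2),
  HELPSICK = c(1, 1, 3, 5, 3, 1)
)

# Value labels and variable labels kept as plain lookup vectors
health_labels   <- c("1" = "EXCELLENT", "2" = "GOOD", "3" = "FAIR")
helpsick_labels <- c("1" = "GOVT SHOULD HELP", "3" = "AGREE WITH BOTH",
                     "5" = "PEOPLE HELP SELVES")
var_labels <- c(HEALTH   = "Condition of health",
                HELPSICK = "Should govt help pay for medical care?")

# Hypothetical helper: turn codes into factors with "[code] label" levels
prefixed <- function(x, labels) {
  factor(x, levels = as.numeric(names(labels)),
         labels = sprintf("[%s] %s", names(labels), labels))
}
d$HEALTH   <- prefixed(d$HEALTH, health_labels)
d$HELPSICK <- prefixed(d$HELPSICK, helpsick_labels)

# Cross-tabulate, then use "name: label" as the dimension headers
tab <- xtabs(~ HELPSICK + HEALTH, data = d)
names(dimnames(tab)) <- sprintf("%s: %s", names(dimnames(tab)),
                                var_labels[names(dimnames(tab))])
tab
```

Keeping the numeric code in the level text makes it easy to match a cell back to the original codebook, at the cost of wider tables.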

3.4.2 Tabling with gtsummary

gtsummary was originally developed as a complement to the [gt](https://gt.rstudio.com/) table presentation package. It makes it easy to produce summary tables for datasets, regression models and so on.

Variable labels will be used for labelling tables by default, where they exist. Value labels are not used by default, but can easily be included by converting the variables to factors as demonstrated in the previous section.
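
As a small illustration of that conversion, the sketch below builds a toy labelled vector and converts it with `to_factor()`. The vector `health` and its codes are invented for the example (they only mimic the HEALTH variable), and the haven and labelled packages are assumed to be installed.

```r
library(haven)     # labelled_spss()
library(labelled)  # to_factor()

# Toy labelled vector (hypothetical values); code 8 is a user-defined missing
health <- labelled_spss(
  c(1, 2, 2, 8, 3),
  labels    = c(EXCELLENT = 1, GOOD = 2, FAIR = 3, POOR = 4, DK = 8),
  na_values = 8,
  label     = "Condition of health"
)

# Convert to a factor: user-defined missings become NA,
# and labels that no longer occur in the data are dropped
health_f <- to_factor(health, drop_unused_labels = TRUE, user_na_to_na = TRUE)

levels(health_f)
table(health_f, useNA = "ifany")
```

With these two arguments the resulting factor only carries the substantive categories that are actually observed, which is usually what is wanted in a summary table.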

library(gtsummary)
gss %>%
  select(HEALTH, HELPSICK, HELPPOOR) %>%
  to_factor(drop_unused_labels = TRUE, user_na_to_na = TRUE) %>%
  tbl_summary(by = HEALTH)
#> 779 observations missing `HEALTH` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `HEALTH` column before passing to `tbl_summary()`.

| Characteristic | EXCELLENT, N = 359¹ | GOOD, N = 771¹ | FAIR, N = 355¹ | POOR, N = 84¹ |
|---|---|---|---|---|
| Should govt help pay for medical care? | | | | |
| GOVT SHOULD HELP | 52 (29%) | 132 (35%) | 73 (41%) | 20 (51%) |
| 2 | 44 (25%) | 79 (21%) | 40 (22%) | 5 (13%) |
| AGREE WITH BOTH | 56 (31%) | 101 (27%) | 44 (25%) | 3 (7.7%) |
| 4 | 17 (9.6%) | 34 (9.1%) | 12 (6.7%) | 4 (10%) |
| PEOPLE HELP SELVES | 9 (5.1%) | 28 (7.5%) | 10 (5.6%) | 7 (18%) |
| Unknown | 181 | 397 | 176 | 45 |
| Should govt improve standard of living? | | | | |
| GOVT ACTION | 28 (16%) | 67 (18%) | 33 (19%) | 13 (36%) |
| 2 | 23 (13%) | 58 (16%) | 26 (15%) | 3 (8.3%) |
| AGREE WITH BOTH | 92 (51%) | 162 (43%) | 83 (47%) | 16 (44%) |
| 4 | 26 (14%) | 59 (16%) | 16 (9.0%) | 2 (5.6%) |
| PEOPLE HELP SELVES | 11 (6.1%) | 27 (7.2%) | 19 (11%) | 2 (5.6%) |
| Unknown | 179 | 398 | 178 | 48 |

¹ Statistics presented: n (%)

gss %>%
  transmute(RINCOME, REALINC = unclass(REALINC), FINRELA) %>%
  to_factor(drop_unused_labels = TRUE, user_na_to_na = TRUE) %>%
  tbl_summary(by = FINRELA, percent = "row")
#> 27 observations missing `FINRELA` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `FINRELA` column before passing to `tbl_summary()`.

| Characteristic | FAR BELOW AVERAGE, N = 153¹ | BELOW AVERAGE, N = 597¹ | AVERAGE, N = 1,042¹ | ABOVE AVERAGE, N = 476¹ | FAR ABOVE AVERAGE, N = 53¹ |
|---|---|---|---|---|---|
| Respondents income | | | | | |
| LT $1000 | 1 (3.0%) | 17 (52%) | 11 (33%) | 4 (12%) | 0 (0%) |
| $1000 TO 2999 | 3 (9.4%) | 12 (38%) | 11 (34%) | 6 (19%) | 0 (0%) |
| $3000 TO 3999 | 2 (6.2%) | 11 (34%) | 14 (44%) | 5 (16%) | 0 (0%) |
| $4000 TO 4999 | 0 (0%) | 11 (52%) | 7 (33%) | 3 (14%) | 0 (0%) |
| $5000 TO 5999 | 3 (14%) | 10 (48%) | 7 (33%) | 1 (4.8%) | 0 (0%) |
| $6000 TO 6999 | 1 (8.3%) | 3 (25%) | 5 (42%) | 2 (17%) | 1 (8.3%) |
| $7000 TO 7999 | 3 (17%) | 6 (33%) | 7 (39%) | 2 (11%) | 0 (0%) |
| $8000 TO 9999 | 4 (12%) | 11 (33%) | 15 (45%) | 2 (6.1%) | 1 (3.0%) |
| $10000 - 14999 | 6 (6.4%) | 39 (41%) | 40 (43%) | 8 (8.5%) | 1 (1.1%) |
| $15000 - 19999 | 7 (12%) | 25 (42%) | 24 (40%) | 3 (5.0%) | 1 (1.7%) |
| $20000 - 24999 | 6 (5.7%) | 34 (32%) | 56 (53%) | 10 (9.4%) | 0 (0%) |
| $25000 OR MORE | 17 (2.0%) | 136 (16%) | 428 (51%) | 248 (29%) | 18 (2.1%) |
| Unknown | 100 | 282 | 417 | 182 | 31 |
| Family income in constant $ | 7,378 (4,086, 17,025) | 12,485 (5,108, 20,430) | 24,970 (12,485, 37,455) | 49,940 (30,645, 72,640) | 119,879 (30,645, 119,879) |

¹ Statistics presented: n (%); Median (IQR)

gss %>%
  to_factor(drop_unused_labels = TRUE, user_na_to_na = TRUE) %>%
  tbl_cross(HELPSICK, HEALTH, percent = "row")

| Characteristic | EXCELLENT | GOOD | FAIR | POOR | Unknown | Total |
|---|---|---|---|---|---|---|
| Should govt help pay for medical care? | | | | | | |
| GOVT SHOULD HELP | 52 (9.8%) | 132 (25%) | 73 (14%) | 20 (3.8%) | 251 (48%) | 528 (100%) |
| 2 | 44 (14%) | 79 (26%) | 40 (13%) | 5 (1.6%) | 138 (45%) | 306 (100%) |
| AGREE WITH BOTH | 56 (12%) | 101 (23%) | 44 (9.8%) | 3 (0.7%) | 244 (54%) | 448 (100%) |
| 4 | 17 (13%) | 34 (26%) | 12 (9.3%) | 4 (3.1%) | 62 (48%) | 129 (100%) |
| PEOPLE HELP SELVES | 9 (7.7%) | 28 (24%) | 10 (8.5%) | 7 (6.0%) | 63 (54%) | 117 (100%) |
| Unknown | 181 (22%) | 397 (48%) | 176 (21%) | 45 (5.5%) | 21 (2.6%) | 820 (100%) |
| Total | 359 (15%) | 771 (33%) | 355 (15%) | 84 (3.6%) | 779 (33%) | 2,348 (100%) |

The columns EXCELLENT to Unknown are the levels of the column variable, Condition of health.
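
The difference between row and column percentages can be checked by hand with base R's `prop.table()`. The sketch below copies the counts for the five substantive HELPSICK categories from the `tbl_cross()` table above (the Unknown row and the Total margins are left out); the object names `counts`, `row_pct` and `col_pct` are illustrative.

```r
# Counts copied from the tbl_cross() table above
# (Unknown row and Total margins omitted)
counts <- matrix(
  c(52, 132, 73, 20, 251,
    44,  79, 40,  5, 138,
    56, 101, 44,  3, 244,
    17,  34, 12,  4,  62,
     9,  28, 10,  7,  63),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    HELPSICK = c("GOVT SHOULD HELP", "2", "AGREE WITH BOTH", "4",
                 "PEOPLE HELP SELVES"),
    HEALTH   = c("EXCELLENT", "GOOD", "FAIR", "POOR", "Unknown")
  )
)

# Row percentages: each row sums to 100, as with percent = "row"
row_pct <- 100 * prop.table(counts, margin = 1)

# Column percentages: each column sums to 100
col_pct <- 100 * prop.table(counts, margin = 2)

round(row_pct, 1)
round(col_pct, 1)
```

For example, the 52 respondents in the GOVT SHOULD HELP / EXCELLENT cell are 52 / 528 of their row (the 9.8% shown in the cross table) but 52 / 178 of their column, which corresponds to the 29% shown in the first `tbl_summary()` table.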