3.3 Exploring datasets

The labelled package provides a helper function, look_for(), that finds the variables in your dataset whose variable labels or value labels match a search term.

The examples below show basic usage. For a more detailed rundown of look_for(), see the vignette.
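To try look_for() without the GSS data, here is a minimal sketch on a made-up data frame. The `toy` object, its columns, and its labels are invented for illustration; only `labelled()`, `var_label()`, and `look_for()` come from the labelled package.

```r
library(labelled)

# Hypothetical toy data, invented for illustration (not part of the GSS)
toy <- data.frame(id = 1:4, income = c(10, 20, 30, 40))

# A labelled column: a variable label plus value labels
toy$helpsick <- labelled(
  c(1, 3, 5, 1),
  labels = c(
    "Govt should help" = 1,
    "Agree with both" = 3,
    "People help selves" = 5
  ),
  label = "Should govt help pay for medical care?"
)
var_label(toy$income) <- "Total family income"

# Search for a keyword; the result has one row per matching variable
res <- look_for(toy, "medical")
res$variable
```

Only `helpsick` carries a label containing "medical", so that is the only variable that should be returned here.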

# look_for() is exported by labelled; gss is assumed to be loaded already
library(labelled)

# Find variables with "medical" in the label
look_for(gss, "medical")
#> pos   variable label                                col_type values             
#> <chr> <chr>    <chr>                                <chr>    <chr>              
#> 314   HELPSICK Should govt help pay for medical ca… dbl+lbl  [0] IAP            
#> ​      ​         ​                                     ​         [1] GOVT SHOULD HE…
#> ​      ​         ​                                     ​         [3] AGREE WITH BOTH
#> ​      ​         ​                                     ​         [5] PEOPLE HELP SE…
#> ​      ​         ​                                     ​         [8] DK             
#> ​      ​         ​                                     ​         [9] NA             
#> 390   INTMED   Interested in medical discoveries    dbl+lbl  [0] IAP            
#> ​      ​         ​                                     ​         [1] Very interested
#> ​      ​         ​                                     ​         [2] Moderately int…
#> ​      ​         ​                                     ​         [3] Not at all int…
#> ​      ​         ​                                     ​         [8] DONT KNOW      
#> ​      ​         ​                                     ​         [9] NA             
#> 498   MEDDOC   X should go to general medical doct… dbl+lbl  [0] IAP            
#> ​      ​         ​                                     ​         [1] YES            
#> ​      ​         ​                                     ​         [2] NO             
#> ​      ​         ​                                     ​         [8] DK             
#> ​      ​         ​                                     ​         [9] NA

# Only provide basic details
look_for(gss, "income", details = FALSE)
#>   pos variable label                                         
#> <int> <chr>    <chr>                                         
#>    15 ABPOOR   Low income--cant afford more children         
#>    16 ABPOORW  Wrong for woman to get abortion if low income?
#>   130 CONINC   Family income in constant dollars             
#>   136 CONRINC  Respondent income in constant dollars         
#>   209 EQWLTH   Should govt reduce income differences         
#>   252 FINRELA  Opinion of family income                      
#>   369 INCGAP   Income differentials in usa too big           
#>   370 INCOM16  R's family income when 16 yrs old             
#>   371 INCOME   Total family income                           
#>   372 INCOME16 Total family income                           
#>   373 INCUSPOP Estimated income status of housing unit       
#>   722 REALINC  Family income in constant $                   
#>   723 REALRINC R's income in constant $                      
#>   805 RINCBLLS Income alone is enough                        
#>   806 RINCOM16 Respondents income                            
#>   807 RINCOME  Respondents income                            
#>   927 TAX      R's federal income tax

# Search using a regular expression
look_for(gss, "medic(al|ation)", details = FALSE)
#>   pos variable label                                                  
#> <int> <chr>    <chr>                                                  
#>   314 HELPSICK Should govt help pay for medical care?                 
#>   390 INTMED   Interested in medical discoveries                      
#>   498 MEDDOC   X should go to general medical doctor for help         
#>   532 MUSTMED  X should be forced to take prescribed medication by law
#>   617 OTCMED   X should take non-prescription medication              
#>   813 RXMED    X should take prescription medication

# Provide a variable summary as a tibble
# (%>% and as_tibble() are available once dplyr is attached)
library(dplyr)
gss %>%
  look_for("medic(al|ation)") %>%
  as_tibble()
#> # A tibble: 6 x 13
#>     pos variable label col_type class type  levels value_labels na_values
#>   <int> <chr>    <chr> <chr>    <nam> <chr> <name> <named list> <named l>
#> 1   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NULL> <dbl [6]>    <dbl [3]>
#> 2   390 INTMED   Inte… dbl+lbl  <chr… doub… <NULL> <dbl [6]>    <dbl [3]>
#> 3   498 MEDDOC   X sh… dbl+lbl  <chr… doub… <NULL> <dbl [5]>    <dbl [3]>
#> 4   532 MUSTMED  X sh… dbl+lbl  <chr… doub… <NULL> <dbl [5]>    <dbl [3]>
#> 5   617 OTCMED   X sh… dbl+lbl  <chr… doub… <NULL> <dbl [5]>    <dbl [3]>
#> 6   813 RXMED    X sh… dbl+lbl  <chr… doub… <NULL> <dbl [5]>    <dbl [3]>
#> # … with 4 more variables: na_range <named list>, unique_values <int>,
#> #   n_na <int>, range <named list>

# Provide a variable summary as a tibble with one row per value
gss %>%
  look_for("medic(al|ation)") %>%
  lookfor_to_long_format()
#> # A tibble: 32 x 13
#>      pos variable label col_type class type  levels value_labels na_values
#>    <int> <chr>    <chr> <chr>    <nam> <chr> <chr>  <chr>        <named l>
#>  1   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NA>   [0] IAP      <dbl [3]>
#>  2   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NA>   [1] GOVT SH… <dbl [3]>
#>  3   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NA>   [3] AGREE W… <dbl [3]>
#>  4   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NA>   [5] PEOPLE … <dbl [3]>
#>  5   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NA>   [8] DK       <dbl [3]>
#>  6   314 HELPSICK Shou… dbl+lbl  <chr… doub… <NA>   [9] NA       <dbl [3]>
#>  7   390 INTMED   Inte… dbl+lbl  <chr… doub… <NA>   [0] IAP      <dbl [3]>
#>  8   390 INTMED   Inte… dbl+lbl  <chr… doub… <NA>   [1] Very in… <dbl [3]>
#>  9   390 INTMED   Inte… dbl+lbl  <chr… doub… <NA>   [2] Moderat… <dbl [3]>
#> 10   390 INTMED   Inte… dbl+lbl  <chr… doub… <NA>   [3] Not at … <dbl [3]>
#> # … with 22 more rows, and 4 more variables: na_range <named list>,
#> #   unique_values <int>, n_na <int>, range <named list>
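
The pattern "medic(al|ation)" used in the searches above is an ordinary regular expression with two alternatives. As a rough check outside look_for(), base R's grepl() can be applied to two label strings copied from the output above and one label that should not match. This only illustrates the pattern; it is not a description of how look_for() is implemented.

```r
# Labels copied from the output above, plus one that should not match
labels <- c(
  "Should govt help pay for medical care?",
  "X should take prescription medication",
  "Total family income"
)

# TRUE where the label contains "medical" or "medication"
matches <- grepl("medic(al|ation)", labels, ignore.case = TRUE)
labels[matches]
```

The first two labels match and the income label does not, which mirrors why INCOME is absent from the regular-expression results.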