The projectable workflow

This vignette steps through a typical projectable workflow, taking us from a simple dataframe via a metadata-rich table constructed using the prj_tbl_*() functions through to a projection of that table in gt format.

The projectable package was designed to gell with gt and the wider tidyverse, so – although it is by no means necessary – we will use dplyr verbs throughout.

library(projectable)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(gt)

Setting the table

The first step in the projectable workflow involves transforming an ordinary dataframe into a metadata-rich table-like object that is fit for projection. We term this ‘setting the table’ and we perform it using the prj_tbl_*() functions along with col_*() helpers.

Thus, to generate a frequency cross tabulation out of the mtcars dataset with rows corresponding to the number of cylinders and the transmission types, and columns corresponding to the engine shape, we write:

crs_tab <- mtcars %>% 
  # Specify the rows of the prj table
  prj_tbl_rows(
    Cylinders = cyl,
    Transmission = am,
  ) %>%
  # Specify the columns of the prj table
  prj_tbl_cols(
    `V-Shaped` = col_freq(n = vs %in% 1, N = vs %in% 0:1),
    `Not V-Shaped` = col_freq(n = vs %in% 0, N = vs %in% 0:1)
  ) %>%
  prj_tbl_summarise()

# Notice that the rows sum to one
knitr::kable(crs_tab)

row_spanner	rows	V-Shaped	Not V-Shaped
Cylinders	4	0.91	0.091
Cylinders	6	0.57	0.430
Cylinders	8	0.00	1.000
Transmission	0	0.37	0.630
Transmission	1	0.54	0.460

Behind the scenes, prj_tbl_summarise() loops over the row and column expressions it is provided. Each column expression provided via prj_tbl_cols() is evaluated over .data filtered down to each row expression provided via prj_tbl_rows().

In our example, col_freq(...) is evaluated over mtcars filtered to each unique value in mtcars$cyl, and then over mtcars filtered to each unique value in mtcars$am.

If we wanted to calculate the base for each frequency, not row-by-row, but over the whole column we would have to explicitly instruct R, when it is evaluating N, not to look for vs within the filtered version of mtcars, but in mtcars overall. Thus we would write

crs_tab_col <- mtcars %>% 
  prj_tbl_rows(
    Cylinders = cyl,
    Transmission = am,
  ) %>% 
  prj_tbl_cols(
    `V-Shaped` = col_freq(n = vs %in% 1, N = .data$vs %in% 1), 
    `Not V-Shaped` = col_freq(n = vs %in% 0, N = .data$vs %in% 0)
  ) %>%
  prj_tbl_summarise()

# Notice that, for each mutually exclusive and collectively exhaustive set of rows,
# the columns sum to one
knitr::kable(crs_tab_col)

row_spanner	rows	V-Shaped	Not V-Shaped
Cylinders	4	0.71	0.056
Cylinders	6	0.29	0.170
Cylinders	8	0.00	0.780
Transmission	0	0.50	0.670
Transmission	1	0.50	0.330

And if we wanted to calculate the base for each frequency, not over the whole column, but over the whole table, then we would write:

# Calculate frequency cell-by-cell
crs_tab_overall <- mtcars %>% 
  prj_tbl_rows(
    Cylinders = cyl,
    Transmission = am,
  ) %>% 
  prj_tbl_cols(
    `V-Shaped` = col_freq(n = vs %in% 1, N = mtcars$vs %in% 0:1), 
    `Not V-Shaped` = col_freq(n = vs %in% 0, N = mtcars$vs %in% 0:1)
  ) %>%
  prj_tbl_summarise()

# Notice that, for each mutually exclusive and collectively exhaustive set of rows,
# the cells sum to one
knitr::kable(crs_tab_overall)

row_spanner	rows	V-Shaped	Not V-Shaped
Cylinders	4	0.31	0.031
Cylinders	6	0.12	0.094
Cylinders	8	0.00	0.440
Transmission	0	0.22	0.380
Transmission	1	0.22	0.190

Once we have finished setting the table, what we should have is a table-like object with projectable_cols. In addition to a face value, each of these projectable_cols will have a set of metadata associated with it: a col_freq, for instance, contains n and N metadata alongside a p (proportion) face value.

Casting a shadow

When it comes to displaying this metadata-rich table-like object, we have a range of options.

In some circumstances, we may want to display only the face values of each projectable_col, in which case we can simply print the table as we have done above.

However, in many cases, we will also want to be able to display the associated metadata. Thus, there might be times when we want to display not only the proportions but the associated n and N too.

This is where the prj_project() function comes in: we use it to ‘unfold’ each projectable_col into whichever of its constituent elements we would like to display. We do this by passing through a named list of glue-like character vectors where each name corresponds to a column in the table passed through as .data. Each element in the character vector will be pulled out into a column in the output.

So, for instance, to display the proportion alongside the count in one column and the base in another column for the V-Shaped and Not V-shaped columns we created, we write:

crs_tab_display <- crs_tab %>% 
  prj_project(
    .cols = list(
      `V-Shaped` = c(Frequency = "{signif(p, 2)} ({n})", Sample = "{N}"),
      `Not V-Shaped` = c(Frequency = "{signif(p, 2)} ({n})", Sample = "{N}")
    )
  )

knitr::kable(crs_tab_display)

row_spanner	rows	V-Shaped.Frequency	V-Shaped.Sample	Not V-Shaped.Frequency	Not V-Shaped.Sample
Cylinders	4	0.91 (10)	11	0.09 (1)	11
Cylinders	6	0.57 (4)	7	0.43 (3)	7
Cylinders	8	0 (0)	14	1 (14)	14
Transmission	0	0.37 (7)	19	0.63 (12)	19
Transmission	1	0.54 (7)	13	0.46 (6)	13

We thus arrive at another ‘flat’ dataframe: a dataframe whose columns are simple atomic vectors. But with one caveat, the dataframe as a whole comes associated with metadata which keeps track of which columns in the output (what we call the projection) belong to which columns in the input (the metadata rich table).

The significance of this metadata will become clear in the next section.

Shaping with `gt`

At this stage, the substance of what we want to display is already there. But its not in a form that’s fit for general consumption.

For one thing, while the columns in the projection give some indication of their provenance by way of their names, it would be good to make the connection between input and output columns much more overt.

And for another, we might want to fiddle with the formatting: adding colours, touching up labels, annotating with footnotes.

The bad news is that projectable can’t do really any of this. The good news is that it doesn’t need to.

Instead of implementing any formatting functions, projectable implements a hand-off function, prj_gt(), which initialises a gt table object:

crs_tab_gt <- crs_tab %>% 
  prj_shadow(where(is_col_freq), .shadow = c(Frequency = "{signif(p, 2)} ({n})", Sample = "{N}")) %>% 
  prj_gt()

crs_tab_gt

	V-Shaped		Not V-Shaped
	Frequency	Sample	Frequency	Sample
Cylinders
4	0.91 (10)	11	0.09 (1)	11
6	0.57 (4)	7	0.43 (3)	7
8	0 (0)	14	1 (14)	14
Transmission
0	0.37 (7)	19	0.63 (12)	19
1	0.54 (7)	13	0.46 (6)	13

It is then possible to leverage all of the functionality of gt to apply whatever formatting we want:

crs_tab_gt %>% 
  gt::tab_header(title = "Engine Shape vs Other Vehicle Characteristics") %>% 
  gt::tab_stubhead(label = "Vehicle Characteristics") %>% 
  gt::tab_options(
    heading.background.color = "#080808",
    row_group.background.color = "#f0f0f0"
  )

Engine Shape vs Other Vehicle Characteristics
Vehicle Characteristics	V-Shaped		Not V-Shaped
Vehicle Characteristics	Frequency	Sample	Frequency	Sample
Cylinders
4	0.91 (10)	11	0.09 (1)	11
6	0.57 (4)	7	0.43 (3)	7
8	0 (0)	14	1 (14)	14
Transmission
0	0.37 (7)	19	0.63 (12)	19
1	0.54 (7)	13	0.46 (6)	13

Setting the table

Casting a shadow

Shaping with gt

Shaping with `gt`