This vignette steps through a typical projectable
workflow, taking us from a simple dataframe via a metadata-rich table constructed using the prj_tbl_*()
functions through to a projection of that table in gt
format.
The projectable
package was designed to gell with gt
and the wider tidyverse
, so – although it is by no means necessary – we will use dplyr
verbs throughout.
library(projectable)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(gt)
The first step in the projectable
workflow involves transforming an ordinary dataframe into a metadata-rich table-like object that is fit for projection. We term this ‘setting the table’ and we perform it using the prj_tbl_*()
functions along with col_*()
helpers.
Thus, to generate a frequency cross tabulation out of the mtcars
dataset with rows corresponding to the number of cylinders and the transmission types, and columns corresponding to the engine shape, we write:
crs_tab <- mtcars %>%
# Specify the rows of the prj table
prj_tbl_rows(
Cylinders = cyl,
Transmission = am,
) %>%
# Specify the columns of the prj table
prj_tbl_cols(
`V-Shaped` = col_freq(n = vs %in% 1, N = vs %in% 0:1),
`Not V-Shaped` = col_freq(n = vs %in% 0, N = vs %in% 0:1)
) %>%
prj_tbl_summarise()
# Notice that the rows sum to one
knitr::kable(crs_tab)
row_spanner | rows | V-Shaped | Not V-Shaped |
---|---|---|---|
Cylinders | 4 | 0.91 | 0.091 |
Cylinders | 6 | 0.57 | 0.430 |
Cylinders | 8 | 0.00 | 1.000 |
Transmission | 0 | 0.37 | 0.630 |
Transmission | 1 | 0.54 | 0.460 |
Behind the scenes, prj_tbl_summarise()
loops over the row and column expressions it is provided. Each column expression provided via prj_tbl_cols()
is evaluated over .data
filtered down to each row expression provided via prj_tbl_rows()
.
In our example, col_freq(...)
is evaluated over mtcars
filtered to each unique value in mtcars$cyl
, and then over mtcars
filtered to each unique value in mtcars$am
.
If we wanted to calculate the base for each frequency, not row-by-row, but over the whole column we would have to explicitly instruct R, when it is evaluating N
, not to look for vs
within the filtered version of mtcars
, but in mtcars
overall. Thus we would write
crs_tab_col <- mtcars %>%
prj_tbl_rows(
Cylinders = cyl,
Transmission = am,
) %>%
prj_tbl_cols(
`V-Shaped` = col_freq(n = vs %in% 1, N = .data$vs %in% 1),
`Not V-Shaped` = col_freq(n = vs %in% 0, N = .data$vs %in% 0)
) %>%
prj_tbl_summarise()
# Notice that, for each mutually exclusive and collectively exhaustive set of rows,
# the columns sum to one
knitr::kable(crs_tab_col)
row_spanner | rows | V-Shaped | Not V-Shaped |
---|---|---|---|
Cylinders | 4 | 0.71 | 0.056 |
Cylinders | 6 | 0.29 | 0.170 |
Cylinders | 8 | 0.00 | 0.780 |
Transmission | 0 | 0.50 | 0.670 |
Transmission | 1 | 0.50 | 0.330 |
And if we wanted to calculate the base for each frequency, not over the whole column, but over the whole table, then we would write:
# Calculate frequency cell-by-cell
crs_tab_overall <- mtcars %>%
prj_tbl_rows(
Cylinders = cyl,
Transmission = am,
) %>%
prj_tbl_cols(
`V-Shaped` = col_freq(n = vs %in% 1, N = mtcars$vs %in% 0:1),
`Not V-Shaped` = col_freq(n = vs %in% 0, N = mtcars$vs %in% 0:1)
) %>%
prj_tbl_summarise()
# Notice that, for each mutually exclusive and collectively exhaustive set of rows,
# the cells sum to one
knitr::kable(crs_tab_overall)
row_spanner | rows | V-Shaped | Not V-Shaped |
---|---|---|---|
Cylinders | 4 | 0.31 | 0.031 |
Cylinders | 6 | 0.12 | 0.094 |
Cylinders | 8 | 0.00 | 0.440 |
Transmission | 0 | 0.22 | 0.380 |
Transmission | 1 | 0.22 | 0.190 |
Once we have finished setting the table, what we should have is a table-like object with projectable_col
s. In addition to a face value, each of these projectable_col
s will have a set of metadata associated with it: a col_freq
, for instance, contains n
and N
metadata alongside a p
(proportion) face value.
When it comes to displaying this metadata-rich table-like object, we have a range of options.
In some circumstances, we may want to display only the face values of each projectable_col
, in which case we can simply print the table as we have done above.
However, in many cases, we will also want to be able to display the associated metadata. Thus, there might be times when we want to display not only the proportions but the associated n
and N
too.
This is where the prj_project()
function comes in: we use it to ‘unfold’ each projectable_col
into whichever of its constituent elements we would like to display. We do this by passing through a named list of glue-like character vectors where each name corresponds to a column in the table passed through as .data
. Each element in the character vector will be pulled out into a column in the output.
So, for instance, to display the proportion alongside the count in one column and the base in another column for the V-Shaped
and Not V-shaped
columns we created, we write:
crs_tab_display <- crs_tab %>%
prj_project(
.cols = list(
`V-Shaped` = c(Frequency = "{signif(p, 2)} ({n})", Sample = "{N}"),
`Not V-Shaped` = c(Frequency = "{signif(p, 2)} ({n})", Sample = "{N}")
)
)
knitr::kable(crs_tab_display)
row_spanner | rows | V-Shaped.Frequency | V-Shaped.Sample | Not V-Shaped.Frequency | Not V-Shaped.Sample |
---|---|---|---|---|---|
Cylinders | 4 | 0.91 (10) | 11 | 0.09 (1) | 11 |
Cylinders | 6 | 0.57 (4) | 7 | 0.43 (3) | 7 |
Cylinders | 8 | 0 (0) | 14 | 1 (14) | 14 |
Transmission | 0 | 0.37 (7) | 19 | 0.63 (12) | 19 |
Transmission | 1 | 0.54 (7) | 13 | 0.46 (6) | 13 |
We thus arrive at another ‘flat’ dataframe: a dataframe whose columns are simple atomic vectors. But with one caveat, the dataframe as a whole comes associated with metadata which keeps track of which columns in the output (what we call the projection
) belong to which columns in the input (the metadata rich table).
The significance of this metadata will become clear in the next section.
gt
At this stage, the substance of what we want to display is already there. But its not in a form that’s fit for general consumption.
For one thing, while the columns in the projection
give some indication of their provenance by way of their names, it would be good to make the connection between input and output columns much more overt.
And for another, we might want to fiddle with the formatting: adding colours, touching up labels, annotating with footnotes.
The bad news is that projectable
can’t do really any of this. The good news is that it doesn’t need to.
Instead of implementing any formatting functions, projectable
implements a hand-off function, prj_gt()
, which initialises a gt
table object:
crs_tab_gt <- crs_tab %>%
prj_shadow(where(is_col_freq), .shadow = c(Frequency = "{signif(p, 2)} ({n})", Sample = "{N}")) %>%
prj_gt()
crs_tab_gt
V-Shaped | Not V-Shaped | |||
---|---|---|---|---|
Frequency | Sample | Frequency | Sample | |
Cylinders | ||||
4 | 0.91 (10) | 11 | 0.09 (1) | 11 |
6 | 0.57 (4) | 7 | 0.43 (3) | 7 |
8 | 0 (0) | 14 | 1 (14) | 14 |
Transmission | ||||
0 | 0.37 (7) | 19 | 0.63 (12) | 19 |
1 | 0.54 (7) | 13 | 0.46 (6) | 13 |
It is then possible to leverage all of the functionality of gt
to apply whatever formatting we want:
crs_tab_gt %>%
gt::tab_header(title = "Engine Shape vs Other Vehicle Characteristics") %>%
gt::tab_stubhead(label = "Vehicle Characteristics") %>%
gt::tab_options(
heading.background.color = "#080808",
row_group.background.color = "#f0f0f0"
)
Engine Shape vs Other Vehicle Characteristics | ||||
---|---|---|---|---|
Vehicle Characteristics | V-Shaped | Not V-Shaped | ||
Frequency | Sample | Frequency | Sample | |
Cylinders | ||||
4 | 0.91 (10) | 11 | 0.09 (1) | 11 |
6 | 0.57 (4) | 7 | 0.43 (3) | 7 |
8 | 0 (0) | 14 | 1 (14) | 14 |
Transmission | ||||
0 | 0.37 (7) | 19 | 0.63 (12) | 19 |
1 | 0.54 (7) | 13 | 0.46 (6) | 13 |