{tibble}, {dplyr}
$ operator for vector in data frame
[1] 28
[1] 24.4
[1] 21
[1] 20
NOTE: The mean and median of x differ.
# A tibble: 1 × 2
x x_median
<dbl> <dbl>
1 28 28
NOTE: The mean and median of x are the same. The median() is computed on the new value of x that is assigned by the first line in summarize() (e.g., x = mean(x, na.rm = TRUE)), so it receives the single mean value rather than the original column. You would want to use a new variable name.
Assigning to names other than x:
# A tibble: 1 × 2
mean median
<dbl> <dbl>
1 28 21
NOTE: The mean and median of x differ.
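The overwriting behavior described above can be sketched as follows. The DATA values here are hypothetical, chosen only so that the results line up with the outputs shown; they are not necessarily the data used in the slides.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration)
DATA <- tibble(x = c(11, 13, 21, 44, 51))

# Reusing the name x: median() sees the already-summarized x (a single value)
overwritten <- DATA |>
  summarize(x = mean(x, na.rm = TRUE),
            x_median = median(x, na.rm = TRUE))

# Using new names: both functions see the original column
separate <- DATA |>
  summarize(mean = mean(x, na.rm = TRUE),
            median = median(x, na.rm = TRUE))

overwritten
separate
```

In the first call the two columns agree only because median() is applied to one number; in the second call they are computed independently.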
Coding each variable to include in the summarized data frame can be tedious.
across()
dplyr::across()
across() is used when you want to apply a function or set of functions across multiple variables. It requires you to pass arguments for the columns you want to summarize, the function(s) specifying how to summarize, and the names of the new output variables.
dplyr::across(): Parameters/Arguments
.cols: the columns to perform a function upon
.fns: the function(s) to apply to each column in .cols
.names: a glue specification that describes how to name the output columns; use {.col} to stand for the selected column name, and {.fn} for the name of the function being applied; defaults to "{.col}" when a single function is passed and "{.col}_{.fn}" when a list of functions is passed
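A minimal sketch of the three arguments used together is shown here. The DATA values are hypothetical and the function name `mean` inside the list is an illustrative choice, not something prescribed by the slides.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration)
DATA <- tibble(
  group = c("a", "a", "a", "b", "b"),
  x = c(11, 13, 21, 44, 51),
  y = c(20, 30, 33, 19, 20)
)

result <- DATA |>
  summarize(across(.cols = c(x, y),                               # columns to summarize
                   .fns = list(mean = ~mean(.x, na.rm = TRUE)),   # named function to apply
                   .names = "{.col}_{.fn}"))                      # output naming pattern

names(result)
```

With a named list, {.fn} expands to the list name, so the output columns here are named after both the column and the function.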
dplyr::across(): Passing Arguments (Cont.)
.cols = c(x, y)
.fns = ~mean(.x, na.rm = TRUE)
.names = NULL (default argument)
dplyr::across(): Passing Arguments (Cont.)
Passing an argument to .names, .names = "{col}_{fn}":
dplyr::across(): Passing Arguments (Cont.)
Or using a quoted vector for .cols: .cols = c("x", "y")
dplyr::across(): Passing Arguments (Cont.)
Or passing a quoted vector to .cols:
Use all_of() or any_of() for variable selection:
.cols = all_of(summarize_these)
Passing the vector directly, .cols = summarize_these, will produce a warning.
dplyr::across(): Passing Arguments (Cont.)
summarize_these <- c("x", "y")  # create the vector to pass

DATA |>
  group_by(group) |>
  summarize(across(.cols = any_of(summarize_these),
                   .fns = ~mean(.x, na.rm = TRUE),
                   .names = "{col}_{fn}"
                   )
            )
# A tibble: 2 × 3
group x_1 y_1
<chr> <dbl> <dbl>
1 a 15 27.7
2 b 47.5 19.5
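The practical difference between any_of() and all_of() shows up when the character vector names a column that does not exist. The sketch below uses hypothetical DATA values and a deliberately missing column name "z"; both are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration)
DATA <- tibble(
  group = c("a", "a", "a", "b", "b"),
  x = c(11, 13, 21, 44, 51),
  y = c(20, 30, 33, 19, 20)
)

summarize_these <- c("x", "y", "z")  # "z" is deliberately not a column in DATA

# any_of() skips names that are not present in the data
lenient <- DATA |>
  summarize(across(.cols = any_of(summarize_these),
                   .fns = ~mean(.x, na.rm = TRUE)))

# all_of() is strict: a missing name raises an error, caught here for display
strict <- tryCatch(
  DATA |>
    summarize(across(.cols = all_of(summarize_these),
                     .fns = ~mean(.x, na.rm = TRUE))),
  error = function(e) "error"
)

names(lenient)
strict
```

Which helper fits depends on whether a missing column should be tolerated or treated as a mistake.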
across()
NOTE: In the output column names (for x and y), {fn} results in a numeric value which is not diagnostic of the function.
Obtain the mean (e.g., afunction) of the numeric vector (e.g., num_vect):
.fns
.cols = c(x, y)
.fns = list(~mean(.x, na.rm = TRUE))
.fns (Cont.)
The ~ is used as a lambda-like operator that results in iterating the function over every column passed to .cols. In this case, list(~mean(.x, na.rm = TRUE)), the .x is not referring to the x column in the data frame but instead stands for the values in each variable passed to .cols. In this case, .x would be both x and y, in that order.
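One way to see what the lambda placeholder stands for is to compare the ~ shorthand with an ordinary anonymous function. This is a sketch on hypothetical DATA values; the argument name `col` is an arbitrary illustrative choice.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration)
DATA <- tibble(
  x = c(11, 13, 21, 44, 51),
  y = c(20, 30, 33, 19, 20)
)

# Formula shorthand: .x is a placeholder for each selected column in turn
with_lambda <- DATA |>
  summarize(across(.cols = c(x, y), .fns = ~mean(.x, na.rm = TRUE)))

# Equivalent anonymous function: col plays the same role as .x
with_function <- DATA |>
  summarize(across(.cols = c(x, y), .fns = function(col) mean(col, na.rm = TRUE)))

with_lambda
with_function
```

Both calls apply the function first to x and then to y, so the two results should match.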
For x and y, {fn} results in a numeric value which is not diagnostic of the function. Naming the function in the list gives {fn} a meaningful value:
.cols = c(x, y)
.fns = list(some_name = ~mean(.x, na.rm = TRUE))
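The effect of naming the list element can be sketched by comparing the output column names from an unnamed and a named list. The DATA values are hypothetical, and the name `mean` is just one possible label.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration)
DATA <- tibble(
  x = c(11, 13, 21, 44, 51),
  y = c(20, 30, 33, 19, 20)
)

# Unnamed list: {.fn} falls back to the function's position in the list
unnamed <- DATA |>
  summarize(across(.cols = c(x, y),
                   .fns = list(~mean(.x, na.rm = TRUE))))

# Named list: {.fn} uses the name supplied in the list
named <- DATA |>
  summarize(across(.cols = c(x, y),
                   .fns = list(mean = ~mean(.x, na.rm = TRUE))))

names(unnamed)
names(named)
```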
.fns
Pass summary_funcs, a list of function(s): .fns = summary_funcs
.fns (Cont.)
Create a list containing ~mean(.x, na.rm = TRUE) used in the previous example:
.fns (Cont.)
Then pass to .fns, .fns = summary_funcs:
.fns (Cont.)
Pair with .cols = summarize_these to summarize the variables in summarize_these using the function(s) in summary_funcs:
Add functions to the list to accomplish more
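A list with several named functions might look like the sketch below. The specific entries (mean, median, sd, and a non-missing count written as `sum(!is.na(.x))`) and the DATA values are assumptions for illustration; the slides' own summary_funcs definition is not shown in the text.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration)
DATA <- tibble(
  group = c("a", "a", "a", "b", "b"),
  x = c(11, 13, 21, 44, 51),
  y = c(20, 30, 33, 19, 20)
)

summarize_these <- c("x", "y")

# Hypothetical list of named summary functions
summary_funcs <- list(
  mean   = ~mean(.x, na.rm = TRUE),
  median = ~median(.x, na.rm = TRUE),
  sd     = ~sd(.x, na.rm = TRUE),
  n      = ~sum(!is.na(.x))     # count of non-missing values
)

result <- DATA |>
  summarize(across(.cols = all_of(summarize_these),
                   .fns = summary_funcs,
                   .names = "{.col}_{.fn}"))

names(result)
```

Keeping the functions in one list means a new summary can be added in a single place and reused across calls.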
.fns: Grouping
group_by(group)
.fns = summary_funcs
.fns: Grouping (Cont.)
DATA |>
  group_by(group) |>
  summarize(across(.cols = c(x, y),
                   .fns = summary_funcs,
                   .names = "{col}_{fn}"
                   )
            )
# A tibble: 2 × 9
  group x_mean x_median  x_sd   x_n y_mean y_median  y_sd   y_n
  <chr>  <dbl>    <dbl> <dbl> <int>  <dbl>    <dbl> <dbl> <int>
1 a       15       13    5.29     3   27.7     30   6.81      3
2 b       47.5     47.5  4.95     2   19.5     19.5 0.707     2