6 Arranging Columns with Across
I’m trying to arrange calculated columns immediately after source columns when using dplyr column-wise operations.
Example data
df <- tibble(
id = c(1, 2, 2),
id_row = c(1, 1, 2),
name_first = c("John", "Jane", "Jane"),
city = c("NY", "DAL", "HOU"),
x = c(0, 1, 0)
)
df
## # A tibble: 3 × 5
## id id_row name_first city x
## <dbl> <dbl> <chr> <chr> <dbl>
## 1 1 1 John NY 0
## 2 2 1 Jane DAL 1
## 3 2 2 Jane HOU 0
Within ID, the values of name_first and city should be constant. The values for id_row and x need not be constant. I want to test for and inspect differing values of name_first and city within-id.
One convenient way to do this is with mutate()
and across()
.
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
)
## # A tibble: 3 × 7
## # Groups: id [2]
## id id_row name_first city x name_first_all_match city_all_match
## <dbl> <dbl> <chr> <chr> <dbl> <lgl> <lgl>
## 1 1 1 John NY 0 TRUE TRUE
## 2 2 1 Jane DAL 1 TRUE FALSE
## 3 2 2 Jane HOU 0 TRUE FALSE
The issue is that the “all_match” (calculated) columns are added to the far right of the data frame rather than immediately after their source column. This makes it difficult to visually inspect differing values for the columns of interest.
Of course, in this small data frame, we could easily rearrange the columns using select()
.
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
select(id, id_row, starts_with("name_first"), starts_with("city"), x)
## # A tibble: 3 × 7
## # Groups: id [2]
## id id_row name_first name_first_all_match city city_all_match x
## <dbl> <dbl> <chr> <lgl> <chr> <lgl> <dbl>
## 1 1 1 John TRUE NY TRUE 0
## 2 2 1 Jane TRUE DAL FALSE 1
## 3 2 2 Jane TRUE HOU FALSE 0
The issue with that approach is that it quickly becomes pretty cumbersome with more columns. A more tractable approach would be to sort the names alphabetically…
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
select(sort(names(.)))
## # A tibble: 3 × 7
## # Groups: id [2]
## city city_all_match id id_row name_first name_first_all_match x
## <chr> <lgl> <dbl> <dbl> <chr> <lgl> <dbl>
## 1 NY TRUE 1 1 John TRUE 0
## 2 DAL FALSE 2 1 Jane TRUE 1
## 3 HOU FALSE 2 2 Jane TRUE 0
…but in my situation I need to preserve the original column order. I’d also prefer to stick with Tidyverse solutions if possible.
Any ideas are appreciated!
6.1 Solution by LMc:
# Use select because it allows for more complex column selection when working
# with more complex data frames.
inspect_cols <- df %>% select(name_first, city) %>% names()
# Set column order ahead of time. This assumes that you know the names of each
# of the columns you want to inspect
col_order <- purrr::map(
names(df),
function(x) {
if (x %in% inspect_cols) {
c(x, paste0(x, "_all_match"))
} else {
x
}
}
) %>%
unlist()
df %>%
group_by(id) %>%
mutate(
across(
.cols = all_of(inspect_cols),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
dplyr::select(all_of(col_order))
## # A tibble: 3 × 7
## # Groups: id [2]
## id id_row name_first name_first_all_match city city_all_match x
## <dbl> <dbl> <chr> <lgl> <chr> <lgl> <dbl>
## 1 1 1 John TRUE NY TRUE 0
## 2 2 1 Jane TRUE DAL FALSE 1
## 3 2 2 Jane TRUE HOU FALSE 0