dplyr mutate(): Create New Variables with mutate

Create new column with dplyr mutate

dplyr, an R package that is part of the tidyverse suite of packages, provides a great set of tools for manipulating datasets in tabular form. dplyr has a set of core functions for “data munging”, including select(), mutate(), filter(), summarise(), and arrange().

In this tidyverse tutorial, part of the tidyverse 101 series, we will learn how to use dplyr’s mutate() function, which creates new variables (columns) in a data frame. We will first use mutate() to create a single variable and then to create multiple variables at the same time.

library("tidyverse")

We will use the fantastic Palmer Penguins dataset to illustrate how mutate() works. Let us load the data from cmdlinetips.com’s GitHub page.

# URL of the Palmer Penguins CSV file on cmdlinetips' GitHub repository
path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)

The column specification printed by read_csv() shows that our data frame contains several variables measured in millimeters (mm) and one variable measured in grams (g).

## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )

How To Create A New Variable with mutate() in dplyr?

Let us create a single new column using dplyr’s mutate(). We will use an existing column to create the new column or variable.

Our new variable is body mass in kilograms (kg), which we will compute from the existing variable body_mass_g. To create it, we start with the data frame, pass it to mutate() with the pipe operator %>%, and inside mutate() write the name of the new variable, an equals sign, and the expression that computes it. In this example, we create the new variable body_mass_kg by dividing the existing variable body_mass_g by 1000.

penguins %>% 
  mutate(body_mass_kg = body_mass_g/1000)

The result is a data frame with the new column. By default, the new variable is added as the last column; in the printed tibble below it appears in the footer line that lists the variables that did not fit on screen.

## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <chr>, body_mass_kg <dbl>
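To check where mutate() puts the new column without downloading the data, here is a minimal sketch on a small hypothetical stand-in tibble, penguins_small, built from the first four rows printed above (only three columns kept). It assumes dplyr 1.0.0 or later, where mutate() also accepts a .after (or .before) argument to place the new column somewhere other than the end. It also shows that the all-NA row 4 simply yields NA for body_mass_kg.

```r
library(dplyr)

# Hypothetical stand-in for penguins: first four rows printed above,
# keeping only three columns (row 4 is the all-NA row)
penguins_small <- tibble(
  species = rep("Adelie", 4),
  flipper_length_mm = c(181, 186, 195, NA),
  body_mass_g = c(3750, 3800, 3250, NA)
)

# Default behaviour: the new column is appended as the last column
by_default <- penguins_small %>%
  mutate(body_mass_kg = body_mass_g / 1000)
names(by_default)        # "species" "flipper_length_mm" "body_mass_g" "body_mass_kg"
by_default$body_mass_kg  # the NA input in row 4 gives NA output

# dplyr >= 1.0.0: .after places the new column right after species
placed <- penguins_small %>%
  mutate(body_mass_kg = body_mass_g / 1000, .after = species)
names(placed)            # "species" "body_mass_kg" "flipper_length_mm" "body_mass_g"
```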

Note that creating a new column with mutate() does not modify the original data frame; mutate() returns a new data frame (a tibble here). To keep the new column, assign the result to a variable.
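A minimal sketch of this behaviour, assuming dplyr is installed and using a small hypothetical stand-in tibble (three body-mass values from the output above) so that no download is needed:

```r
library(dplyr)

# Hypothetical stand-in: body masses (g) from the first rows printed above
penguins_small <- tibble(body_mass_g = c(3750, 3800, 3250))

# mutate() returns a new tibble, which is printed but not stored
penguins_small %>% mutate(body_mass_kg = body_mass_g / 1000)
unchanged_names <- names(penguins_small)  # still only "body_mass_g"

# To keep the new column, assign the result (here, back to the same name)
penguins_small <- penguins_small %>% mutate(body_mass_kg = body_mass_g / 1000)
names(penguins_small)                     # "body_mass_g" "body_mass_kg"
```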

How To Create Two Variables with mutate()?

We can create two or more new variables with a single mutate() call. To create two new columns, we write both name = expression pairs inside mutate(), separated by a comma.

In the example below, we create two new variables from existing variables: body mass in kilograms and flipper length in meters.

penguins %>% 
  mutate(body_mass_kg = body_mass_g / 1000,
         flipper_length_m = flipper_length_mm / 1000)

## # A tibble: 344 x 9
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 3 more variables: sex <chr>, body_mass_kg <dbl>,
## #   flipper_length_m <dbl>
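The expressions inside one mutate() call are evaluated in order, so a later expression can use a column created earlier in the same call. A minimal sketch, assuming dplyr is installed; penguins_small is a hypothetical stand-in holding three flipper lengths from the output above, and flipper_length_cm is an illustrative intermediate column not used in the original post.

```r
library(dplyr)

# Hypothetical stand-in: flipper lengths (mm) from the first rows printed above
penguins_small <- tibble(flipper_length_mm = c(181, 186, 195))

# flipper_length_m is computed from flipper_length_cm,
# which is created earlier in the same mutate() call
lengths <- penguins_small %>%
  mutate(flipper_length_cm = flipper_length_mm / 10,
         flipper_length_m = flipper_length_cm / 100)
lengths
```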


How To Create a Fresh New Column with dplyr’s mutate

In the examples above, we created one or more new columns from existing columns. We can also use mutate() to create a column that does not depend on any existing column.

In this example, we use mutate() together with dplyr’s row_number() function to create a new column, ID, that holds each row’s number.

penguins %>% 
  mutate(ID = row_number())

This adds the ID column at the end of the data frame.


## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <chr>, ID <int>
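A column that does not depend on existing columns can also hold a single constant value, which mutate() recycles to every row. A minimal sketch, assuming dplyr is installed; penguins_small is a hypothetical stand-in built from the output above, and the column name study and its value "palmer" are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical stand-in built from the first rows printed above
penguins_small <- tibble(
  species = rep("Adelie", 3),
  body_mass_g = c(3750, 3800, 3250)
)

# row_number() numbers the rows 1, 2, 3, ...;
# the hypothetical constant "palmer" is recycled to every row
with_ids <- penguins_small %>%
  mutate(ID = row_number(),
         study = "palmer")
with_ids
```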

How To Overwrite an Existing Column with dplyr’s mutate

We can also use dplyr’s mutate() function to overwrite an existing column: if the name on the left-hand side of = is an existing column, mutate() replaces that column instead of adding a new one. In the example below, we overwrite the existing species variable.

penguins %>%
  mutate(species = stringr::str_to_upper(species))

We use the str_to_upper() function from the stringr package to convert the values of the character variable to upper case. Note that the values in the first column, species, are now all in upper case, and the tibble still has 7 columns because no new column was added.

## # A tibble: 344 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 ADELIE  Torge…           39.1          18.7              181        3750
##  2 ADELIE  Torge…           39.5          17.4              186        3800
##  3 ADELIE  Torge…           40.3          18                195        3250
##  4 ADELIE  Torge…           NA            NA                 NA          NA
##  5 ADELIE  Torge…           36.7          19.3              193        3450
##  6 ADELIE  Torge…           39.3          20.6              190        3650
##  7 ADELIE  Torge…           38.9          17.8              181        3625
##  8 ADELIE  Torge…           39.2          19.6              195        4675
##  9 ADELIE  Torge…           34.1          18.1              193        3475
## 10 ADELIE  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 1 more variable: sex <chr>
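To confirm that overwriting keeps the same columns in the same order, here is a minimal sketch on a small hypothetical stand-in tibble, assuming dplyr is installed. To keep dependencies to dplyr alone, it uses base R’s toupper() in place of stringr::str_to_upper(); for plain ASCII strings such as these the two give the same result.

```r
library(dplyr)

# Hypothetical stand-in built from the first rows printed above
penguins_small <- tibble(
  species = rep("Adelie", 3),
  body_mass_g = c(3750, 3800, 3250)
)

# Reusing the name species on the left of = replaces that column in place
upper <- penguins_small %>%
  mutate(species = toupper(species))
names(upper)    # same columns, same order as penguins_small
upper$species   # "ADELIE" "ADELIE" "ADELIE"
```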

The post dplyr mutate(): Create New Variables with mutate appeared first on Python and R Tips.