Patchwork R package goes nerd viral


twitter_post.png

Y’all, a few weeks ago I came across the patchwork package created by Thomas Lin Pedersen. When I saw how easy it was to mix and match multiple R plots into one image, I shared it in a quick tweet with some basic highlights of how to use the package. I knew it was a cool package, but I had no idea how excited people would get! For a relatively niche subject on Twitter, the tweet got a lot of traction, and not just from R users: a lot of Python users chimed in asking for the same functionality in matplotlib.

By the time the hype was over, the tweet had reached 6.9K likes and 1.8K retweets. While this volume might not traditionally be viewed as viral on Twitter, I would certainly say that this tweet went “nerd viral”. It was definitely a shock to me.

To pay homage to the tweet and the package, I’ve decided to conduct a little analysis of the tweet’s performance and its possible impact on package downloads. I’m going to get so meta with this analysis that I’ll then arrange the resulting plots with patchwork.

Bonus: Adding an image to patchwork

As a bonus, I’ll show y’all how I added an image to the patchwork layout by placing it within a ggplot graph and fixing the coordinates to avoid weird scaling issues.

Set Up

Install and Load the Packages

Thank you to “Dusty” who posted the tip to install and load packages using “easypackages” on my last tutorial.

#install.packages("easypackages")
library(easypackages)
packages("tidyverse", "rtweet", "tidytext", "wordcloud2", "patchwork", "cran.stats", "data.table", 
         "gameofthrones", "ggimage", "magick", "ggpubr", "jpeg", "png")

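If you would rather not add easypackages as a dependency, roughly the same install-if-missing-then-load behaviour can be sketched in base R. This is only an illustration, not how easypackages is implemented; the helper names missing_packages() and load_packages() are made up for this example.

```r
# Hypothetical helpers (not part of easypackages or base R).

# Return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install anything missing, then attach every package in `pkgs`.
load_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Base packages are always installed, so nothing is reported as missing here.
missing_packages(c("stats", "utils"))
```

Calling load_packages(c("tidyverse", "patchwork")) would then install whichever of the two is missing and load both.
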
Set up our colour palette

I’m using the beautiful Game of Thrones colour palette from Alejandro Jiménez’s “gameofthrones” package. Thank you to Divya Seernani for sharing!

#Set the palette using the beautiful GOT Arya palette from Alejandro Jiménez
pal <- got(20, option = "Arya")

#cherry-pick a couple of extra colours
c <- "#889999"
c2 <- "#AAB7AF"

Add your twitter credentials

Create your Twitter authentication token by following the steps in Michael Kearney’s beautiful rtweet documentation. Replace each “ADD YOUR CREDS” with your own credentials.

#create_token(
#  app = "ADD YOUR CREDS",
#  consumer_key = "ADD YOUR CREDS",
#  consumer_secret = "ADD YOUR CREDS")

1st Plot – Create a plot of the tweet stats (favorites, retweets)

Lookup the tweet and view stats

lt <- lookup_tweets('1229176433123168256')
lt

Create a chart with the tweet stats

p1 <- lt %>% 
  rename(Faves = favorite_count, RTs = retweet_count) %>% 
  select(Faves, RTs) %>%  #select only the desired columns
  gather("stat", "value") %>%  #reformat to make the table long which is easier for bar charts to consume
  ggplot(aes(stat, value)) +  #plot the bar chart
  geom_bar(stat="identity", fill=c2) +
  theme_classic() + 
  labs(title = "Tweet Stats",
       x = "Tweet Statistic", y = "Total")

p1
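To make the reshaping step concrete, here is a small sketch of what the wide-to-long conversion does to a one-row table. The counts are the rounded figures quoted earlier in this post, not the exact API values, and pivot_longer() (the newer tidyr equivalent of gather()) is used purely as an alternative.

```r
library(tibble)
library(tidyr)

# Rounded, illustrative counts (assumption: the real values come from lookup_tweets())
wide <- tibble(Faves = 6900, RTs = 1800)

# One row per statistic: a `stat` column with the name and a `value` column with the count
long <- pivot_longer(wide, cols = everything(), names_to = "stat", values_to = "value")
long
```

The long table has one row for Faves and one for RTs, which is the shape the bar chart expects.
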

 


2nd Plot – Create a plot of the most common words in retweeters’ profiles

Gather approx 1K of the retweet data

The get_retweets() function only allows a maximum of 100 retweets to be pulled per call, a limit imposed by the Twitter API. I had quite a difficult time pulling this data. Not only did a lot of the suggested cursor-based methods fail, but the rate limiting wasn’t consistent either. Sometimes I was able to get close to 1K retweets in batches of 100; sometimes the API blocked me for 15-minute intervals (as expected). Since this is just an example to show off patchwork, I decided to grab only 1K of the retweets, which is roughly half of the full set. I should also let you know that I did attempt to wrap this in a function, but I couldn’t find a system wait time that would both finish in a reasonable time and actually return the data. Please reach out if you have a better/proven method! In the meantime, here is my brute force method.

statusid <- '1229176433123168256' #the status id of the original tweet
rtweets <- get_retweets(statusid, n=100, parse = TRUE) #get the first 100 retweets
min_id <- min(rtweets$status_id) #lowest retweet status id in this batch

rtweets2 <- get_retweets(statusid, n=100, max_id = min_id, parse = TRUE) #get the next 100 retweets, continuing from the lowest id so far
min_id <- min(rtweets2$status_id)

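Repeat as needed; the full code is available here. The plotting code below assumes the batches have been combined into a single table called rtweet_table.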
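For anyone who wants to experiment with automating the repetition, here is one possible shape for the loop. It is only a sketch and has not been verified against the live API: collect_batches() and fetch_page() are hypothetical names, fetch_page() stands in for a wrapper around the get_retweets() call above, and the rate limiting described earlier would still need to be handled. The demo runs against a fake in-memory data source so that the paging logic can be checked without credentials.

```r
# Hypothetical helper: `fetch_page(max_id)` must return a data frame with a
# character `status_id` column (zero rows when nothing is left).
# In real use it might wrap something like
#   get_retweets(statusid, n = 100, max_id = max_id)   # assumption, untested
collect_batches <- function(fetch_page, max_batches = 10) {
  out <- NULL
  max_id <- NULL
  for (i in seq_len(max_batches)) {
    page <- fetch_page(max_id)
    # Keep only rows we have not seen yet (pages may overlap at the boundary id)
    fresh <- page[!(page$status_id %in% out$status_id), , drop = FALSE]
    if (nrow(fresh) == 0) break          # nothing new: stop paging
    out <- rbind(out, fresh)
    max_id <- min(out$status_id)         # string comparison; ids share one width
  }
  out
}

# Fake data source: 250 equal-width ids, newest first, at most 100 per page
all_ids <- sprintf("%04d", 250:1)
fake_fetch <- function(max_id, n = 100) {
  ids <- if (is.null(max_id)) all_ids else all_ids[all_ids <= max_id]
  data.frame(status_id = head(ids, n), stringsAsFactors = FALSE)
}

rtweet_table_demo <- collect_batches(fake_fetch)
nrow(rtweet_table_demo)
```

The loop stops either when a page brings back nothing new or when max_batches is reached, so it cannot spin forever if the API keeps returning the same boundary row.
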

Graph the most common words used in the retweeters profile descriptions

Most of the techniques used below to process and graph the data are taken from the Tidy Text Mining book by Julia Silge and David Robinson.

data(stop_words)
#Unnest the words - code via Tidy Text
rtweet_table2 <- rtweet_table %>% 
  unnest_tokens(word, description) %>%  #one row per word in each profile description
  anti_join(stop_words, by = "word") %>%  #drop common stop words
  count(word, sort = TRUE) %>%
  filter(!word %in% c('t.co', 'https'))  #drop URL fragments

p2 <- rtweet_table2 %>%
  filter(n > 50) %>%  #keep only words that appear more than 50 times
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  theme_classic() +
  geom_col(fill = c) +
  labs(title = "RT Profiles",
       x = "Key Words", y = "Total Occurrences") +
  coord_flip() 

p2
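If the tokenising step feels abstract, this toy example shows what unnest_tokens() and the stop word anti_join() do to two made-up profile descriptions. The descriptions are invented for illustration and are not from the retweet data.

```r
library(dplyr)
library(tidytext)

# Two invented profile descriptions
profiles <- tibble(description = c("Data scientist and rstats user",
                                   "rstats user, data nerd"))

# One lower-cased word per row, punctuation stripped
tokens <- profiles %>% unnest_tokens(word, description)

# Remove common stop words such as "and", then count what is left
counts <- tokens %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
counts
```
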

 


3rd Plot – Plot the patchwork CRAN download stats

Gather the data

To gather the patchwork download stats, I used the “cran.stats” package. The examples to process the download stats were very easy to follow and I used them as the basis for gathering the data. See examples here.

dt = read_logs(start = as.Date("2020-02-01"), 
               end = as.Date("2020-02-29"), 
               verbose = TRUE)
patchwork <- stats_logs(dt, type="daily", packages=c("patchwork"), 
                        dependency=TRUE, duration = 30L)

Plot the CRAN download data

I plotted the download data using ggplot2’s geom_line() function, with just a little extra fanciness to annotate the graph using the annotate() function. There are great annotation examples here.

p3 <- ggplot(patchwork, aes(x=key, y=tot_N, group=1)) +
  geom_line() + theme_classic() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ylim(0, 1500) + 
  labs(title = "Downloads of the R Patchwork Package",
       x = "Date", y = "Total Downloads") + 
  annotate("rect", xmin = "2020-02-16", xmax = "2020-02-20", ymin = 400, ymax = 900,
           alpha = .3, fill = c2)  +
  annotate(
    geom = "curve", alpha = 0.3, x = "2020-02-14", y = 650, xend = "2020-02-17", yend = 800, 
    curvature = .3, arrow = arrow(length = unit(2, "mm"))
  ) +
  annotate(geom = "text", x = "2020-02-07", y = 650, label = "Nerd viral #rstats tweet", hjust = "left", alpha = 0.5)

p3

 


Add the plots to the same graphic using patchwork

As is the focus of this post, people were very excited when the patchwork package, created by Thomas Lin Pedersen, was shared on Twitter. Not only is it incredibly easy to use, it also comes with great documentation.

Try a few layouts

Using the plots p1, p2 and p3 created above, try a few layouts following the package documentation.

#Stack the three plots on top of each other
p1/p2/p3

#Add the plots together and let patchwork lay them out
p1 + p2 + p3

#p1 on top, with p2 and p3 together underneath
p1/ (p2 +p3)

#Final Layout
p <- p3 / (p1 + p2)
p
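
If the default arrangement is not quite right, patchwork also lets you control the grid explicitly. A minimal sketch, using throwaway mtcars plots (a, b and d are placeholder names, not the plots from this post) so that it runs on its own:

```r
library(ggplot2)
library(patchwork)

# Throwaway plots standing in for p1, p2 and p3
a <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
b <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
d <- ggplot(mtcars, aes(hp, qsec)) + geom_line()

# `|` places plots side by side, `/` stacks them
side_by_side <- a | b | d

# plot_layout() sets relative sizes: here the top row is twice as tall as the bottom row
tall_top <- d / (a + b) + plot_layout(heights = c(2, 1))
tall_top
```
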

Annotate the final layout

We will take the final layout from the code block above and add an overall title, a caption and some formatting. This example is covered in the excellent patchwork annotation guide.

p + plot_annotation(
  title = 'Patchwork Went Nerd Viral',
  caption = 'Source: @littlemissdata'
) & 
  theme(text = element_text('mono'))
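
The & operator in the code above is what pushes the theme into every panel. As a small sketch of the difference (again with placeholder mtcars plots rather than the plots from this post): adding a theme with + only changes the last plot, while & applies it to all of them. tag_levels is an optional plot_annotation() argument that labels the panels.

```r
library(ggplot2)
library(patchwork)

# Placeholder plots
a <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
b <- ggplot(mtcars, aes(hp, qsec)) + geom_point()

# `+` adds the theme to the last plot only (b)
last_only <- a + b + theme_minimal()

# `&` applies the theme to every plot in the patchwork
all_plots <- (a + b) & theme_minimal()

# Tag the panels A, B, ... and add a shared title
tagged <- all_plots + plot_annotation(title = "Placeholder title", tag_levels = "A")
tagged
```
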

Add an image to the patchwork graphic

Bring in the image

Using an empty ggplot and the background_image() function from ggpubr, you can bring an image into a graph object. You can also fix the aspect ratio with the coord_fixed() function. This is important so that the image itself doesn’t get stretched when patchwork places it in the layout.

twitter <- image_read('https://raw.githubusercontent.com/lgellis/MiscTutorial/master/Patchwork/twitter_post.png')
twitter <- ggplot() +
  background_image(twitter) + coord_fixed()
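
An alternative worth knowing about (not the approach used for the final graphic here) is the image_ggplot() helper in the magick package, which turns an image into a ggplot object directly. To keep this sketch runnable offline, it uses a blank placeholder image created with image_blank() instead of the tweet screenshot.

```r
library(magick)
library(ggplot2)

# Placeholder image (assumption: swap in image_read() with your own file or URL)
img <- image_blank(width = 300, height = 200, color = "grey80")

# Convert the image to a ggplot object that can be combined with other plots
img_plot <- image_ggplot(img)
img_plot
```
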

Plot the image with patchwork

pF <- twitter + (p3/ (p1 + p2))

pF + plot_annotation(
  title = 'Patchwork Went Nerd Viral',
  caption = 'Source: @littlemissdata'
) 

Thank You

Please comment below if you enjoyed this blog, have questions, or would like to see something different in the future. Note that the full code is available on my GitHub repo.

If you have trouble downloading the files or cloning the repo from GitHub, please go to the main page of the repo and select “Clone or Download” and then “Download Zip”. Alternatively, you can execute the following R command to download the whole repo through R:

#install.packages("usethis")
usethis::use_course("https://github.com/lgellis/MiscTutorial/archive/master.zip")