How do I reinstall my packages for R 4.0 when many of them came from Github?

Motivation

R 4.0 is here, and that’s great. Be warned that if you install it, you will lose all of your installed packages. How should you reinstall them?

First, there’s something to be said for installing as you go rather than inheriting all of your packages from your past self. A major upgrade can be a good excuse to clean house, trim unnecessary stuff from your computer, and install packages from scratch. As much as I like to take this approach myself, I have to teach next week, so I want my R environment established quickly.[1]

If you don’t particularly want to install-as-you-go, one way to revitalize your R environment is to record which packages you have installed and automate their re-installation with some code. You can find several helpful online guides that walk through the main idea: save the names of installed packages as a vector, and then pass these names to install.packages() to do a batch installation from CRAN.
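
As a rough illustration of that main idea (my own sketch, not taken from any particular guide), the code below records package names before the upgrade and reads them back afterwards. The file name installed-pkgs.rds is an assumption made up for the example, and the install call is commented out so that nothing gets installed by accident.

```r
# Before upgrading: record the names of every installed package.
pkg_names <- rownames(installed.packages())

# Hypothetical file name. tempdir() is used only so the sketch runs anywhere;
# in practice, use a persistent path that survives the R upgrade.
pkg_file <- file.path(tempdir(), "installed-pkgs.rds")
saveRDS(pkg_names, pkg_file)

# After upgrading: read the names back and reinstall from CRAN.
old_pkgs <- readRDS(pkg_file)
# install.packages(old_pkgs)
```

This only works for packages that CRAN can serve, which is exactly the limitation the rest of this post deals with.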

But what about the packages that don’t live on CRAN? My experiments with R have led me to install several packages from Github: maybe they were “development versions” that had yet to be published on CRAN, or they will never be on CRAN due to policy incompatibilities or the wishes of the package developer. (See Reason #5 in this tweet.) How can we automate the reinstall process when these packages have different online sources? This post walks through a process for doing that, adapted from this Gist that I shared on Twitter. I adapted the whole thing to go into a little Github repository if you’d like to fork/clone that instead.

Roadmap

The routine has the following main ideas.

  1. What packages have I installed, and where did they come from?
  2. Check my local package versions against the CRAN versions. Are the most recent versions on CRAN, or was I using a Github version that I should keep using?
  3. Render unto Caesar: install from CRAN the packages that make sense to get from CRAN, and install from Github the packages that make sense to get from Github.

What is already installed?

First we collect information on the packages we already have installed. I am working out of a project directory that manages all of my R updating business, so I use the {here} package consistent with a project-oriented R workflow. We will also use tidyverse-style data manipulation.

library("here")
library("tidyverse")
## Warning: package 'tibble' was built under R version 4.0.2

The installed.packages() function returns a table of package information for all packages in your library. I convert this to a tibble to make things easier.

# data frame of all installed packages
local_pkgs <- installed.packages() %>%
  as_tibble() %>%
  print()
## # A tibble: 392 x 16
##    Package LibPath Version Priority Depends Imports LinkingTo Suggests Enhances
##    <chr>   <chr>   <chr>   <chr>    <chr>   <chr>   <chr>     <chr>    <chr>   
##  1 abind   /Libra… 1.4-5   <NA>     "R (>=… method… <NA>       <NA>    <NA>    
##  2 adagio  /Libra… 0.7.1   <NA>     "R (>=… graphi… <NA>       <NA>    <NA>    
##  3 ade4    /Libra… 1.7-15  <NA>     "R (>=… graphi… <NA>      "ade4Tk… <NA>    
##  4 AER     /Libra… 1.2-9   <NA>     "R (>=… stats,… <NA>      "boot, … <NA>    
##  5 Amelia  /Libra… 1.7.6   <NA>     "R (>=… foreig… Rcpp (>=… "tcltk,… <NA>    
##  6 arrayh… /Libra… 1.1-0   <NA>      <NA>   method… <NA>       <NA>    <NA>    
##  7 askpass /Libra… 1.1     <NA>      <NA>   sys (>… <NA>      "testth… <NA>    
##  8 assert… /Libra… 0.2.1   <NA>      <NA>   tools   <NA>      "testth… <NA>    
##  9 audio   /Libra… 0.1-7   <NA>     "R (>=… <NA>    <NA>       <NA>    <NA>    
## 10 backpo… /Libra… 1.1.7   <NA>     "R (>=… utils   <NA>       <NA>    <NA>    
## # … with 382 more rows, and 7 more variables: License <chr>,
## #   License_is_FOSS <chr>, License_restricts_use <chr>, OS_type <chr>,
## #   MD5sum <chr>, NeedsCompilation <chr>, Built <chr>

This is a start, but I also want to know if I got these packages from CRAN or from Github. I can do this with sessioninfo::package_info(), passing a vector of package names to the function.

# get source details (cran, github...) from package_info()
local_details <- 
  sessioninfo::package_info(pkgs = local_pkgs$Package) %>%
  as_tibble() %>%
  select(package, local_version = ondiskversion, source) %>%
  print()
## # A tibble: 378 x 3
##    package      local_version source        
##    <chr>        <chr>         <chr>         
##  1 abind        1.4-5         CRAN (R 4.0.0)
##  2 adagio       0.7.1         CRAN (R 4.0.0)
##  3 ade4         1.7-15        CRAN (R 4.0.0)
##  4 AER          1.2-9         CRAN (R 4.0.0)
##  5 Amelia       1.7.6         CRAN (R 4.0.0)
##  6 arrayhelpers 1.1-0         CRAN (R 4.0.0)
##  7 askpass      1.1           CRAN (R 4.0.0)
##  8 assertthat   0.2.1         CRAN (R 4.0.0)
##  9 audio        0.1-7         CRAN (R 4.0.0)
## 10 backports    1.1.7         CRAN (R 4.0.0)
## # … with 368 more rows

Notice that this new table has 14 fewer rows. That’s because sessioninfo::package_info() isn’t returning the base packages that show up in installed.packages(). That’s fine, since those will come with R 4.0 anyway.
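
If you want to see which packages ship with R itself, one option (an addition to the routine, not part of the original Gist) is to ask installed.packages() for base-priority packages only. The exact set depends on how your copy of R was built, so I am not showing expected output here.

```r
# Packages with priority "base" are bundled with R and never come from CRAN.
base_pkgs <- rownames(installed.packages(priority = "base"))

print(base_pkgs)
length(base_pkgs)
```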

The source column in this new table shows us what we want to know. For instance, let’s look at packages that I have from Github.

filter(local_details, str_detect(source, "Github"))
## # A tibble: 8 x 3
##   package       local_version source                                   
##   <chr>         <chr>         <chr>                                    
## 1 colorout      1.2-2         Github (jalvesaq/colorout@726d681)       
## 2 emo           0.0.0.9000    Github (hadley/emo@3f03b11)              
## 3 ggkeyboard    0.0.0.9009    Github (sharlagelfand/ggkeyboard@b1a965d)
## 4 mRkov         0.0.0.9000    Github (serrat839/mRkov@0f520e8)         
## 5 rethinking    2.00          Github (rmcelreath/rethinking@f393f30)   
## 6 Statamarkdown 0.4.5         Github (hemken/Statamarkdown@506cfc9)    
## 7 texreg        1.36.28       Github (leifeld/texreg@c1da5c8)          
## 8 waffle        1.0.1         Github (hrbrmstr/waffle@3f61463)
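
Those source strings also carry the repository slug (user/repo), which is what we will need later to reinstall from Github. Here is a small base R sketch of pulling the slug out with a regular expression, using strings copied from the output above. The variable names are made up for the example.

```r
# Source strings copied from the output above, plus one CRAN entry.
src <- c(
  "Github (hadley/emo@3f03b11)",
  "Github (rmcelreath/rethinking@f393f30)",
  "CRAN (R 4.0.0)"
)

# Keep only Github entries, then capture the text between "(" and "@".
is_github <- grepl("^Github \\(", src)
repo_slug <- sub("^Github \\(([^@]+)@.*$", "\\1", src[is_github])

print(repo_slug)
# repo_slug is c("hadley/emo", "rmcelreath/rethinking")
```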

Determining install source by comparing package versions

Before you update R, you may be using packages installed from Github, even if those packages are also on CRAN. We want to compare our locally installed package versions against the versions on CRAN. If the CRAN versions are more recent, we can go ahead and get those packages from CRAN. If the Github versions are still the most recent (or the only) versions of some packages, we want to get them from Github.

We will want to get a table of data on CRAN package versions. The available.packages() function returns info for every package available from your configured repositories, which by default means CRAN.

# available.packages() returns pkg info for ALL pkgs on CRAN.
cran_pkgs <- available.packages() %>% 
  as_tibble(.name_repair = tolower) %>%
  print()
## # A tibble: 16,578 x 17
##    package version priority depends imports linkingto suggests enhances license
##    <chr>   <chr>   <chr>    <chr>   <chr>   <chr>     <chr>    <chr>    <chr>  
##  1 A3      1.0.0   <NA>     R (>= …  <NA>   <NA>      randomF… <NA>     GPL (>…
##  2 aaSEA   1.1.0   <NA>     R(>= 3… "DT(>=… <NA>      knitr, … <NA>     GPL-3  
##  3 AATtoo… 0.0.1   <NA>     R (>= … "magri… <NA>      <NA>     <NA>     GPL-3  
##  4 ABACUS  1.0.0   <NA>     R (>= … "ggplo… <NA>      rmarkdo… <NA>     GPL-3  
##  5 abbyyR  0.5.5   <NA>     R (>= … "httr,… <NA>      testtha… <NA>     MIT + …
##  6 abc     2.1     <NA>     R (>= …  <NA>   <NA>      <NA>     <NA>     GPL (>…
##  7 abc.da… 1.0     <NA>     R (>= …  <NA>   <NA>      <NA>     <NA>     GPL (>…
##  8 ABC.RAP 0.9.0   <NA>     R (>= … "graph… <NA>      knitr, … <NA>     GPL-3  
##  9 abcADM  1.0     <NA>     <NA>    "Rcpp … Rcpp, BH  <NA>     <NA>     GPL-3  
## 10 ABCana… 1.2.1   <NA>     R (>= … "plotr… <NA>      <NA>     <NA>     GPL-3  
## # … with 16,568 more rows, and 8 more variables: license_is_foss <chr>,
## #   license_restricts_use <chr>, os_type <chr>, archs <chr>, md5sum <chr>,
## #   needscompilation <chr>, file <chr>, repository <chr>

We only care about the packages in this table that we have already installed, so we will narrow the table down using a join.

slimmer_frame <- 
  left_join(
    x = select(local_details, package, local_version, source),
    y = select(cran_pkgs, package, cran_version = version),
    by = "package"  # name the join key explicitly
  ) %>%
  print()
## # A tibble: 378 x 4
##    package      local_version source         cran_version
##    <chr>        <chr>         <chr>          <chr>       
##  1 abind        1.4-5         CRAN (R 4.0.0) 1.4-5       
##  2 adagio       0.7.1         CRAN (R 4.0.0) 0.7.1       
##  3 ade4         1.7-15        CRAN (R 4.0.0) 1.7-16      
##  4 AER          1.2-9         CRAN (R 4.0.0) 1.2-9       
##  5 Amelia       1.7.6         CRAN (R 4.0.0) 1.7.6       
##  6 arrayhelpers 1.1-0         CRAN (R 4.0.0) 1.1-0       
##  7 askpass      1.1           CRAN (R 4.0.0) 1.1         
##  8 assertthat   0.2.1         CRAN (R 4.0.0) 0.2.1       
##  9 audio        0.1-7         CRAN (R 4.0.0) 0.1-7       
## 10 backports    1.1.7         CRAN (R 4.0.0) 1.2.0       
## # … with 368 more rows
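
One thing the first ten rows don’t show: a left join keeps every local package, so anything that isn’t on CRAN ends up with a missing cran_version. Here is a toy base R sketch of the same lookup using match() instead of left_join(). The two small data frames are stand-ins I made up for local_details and cran_pkgs.

```r
# Toy stand-ins for local_details and cran_pkgs.
local <- data.frame(
  package = c("backports", "emo"),
  local_version = c("1.1.7", "0.0.0.9000"),
  stringsAsFactors = FALSE
)
cran <- data.frame(
  package = c("abind", "backports"),
  version = c("1.4-5", "1.2.0"),
  stringsAsFactors = FALSE
)

# match() returns NA for packages with no CRAN entry, like a left join does.
local$cran_version <- cran$version[match(local$package, cran$package)]
print(local)
```

Here the emo row gets an NA for cran_version, and that NA is what the categorization step relies on to spot packages that are missing from CRAN.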

Using this slimmer table, we categorize the sources of these packages and where we want to reinstall them from. Here is the basic idea: We want to install from Github only if our local Github version is more recent than the CRAN version. This also applies when there is no version of a package on CRAN.

Stated another way, we install a package from CRAN whenever the CRAN version is at least as recent as the local version. This is true even if the local version was installed from Github! Remember, we don’t install from Github simply because we did so in the past. We install from Github if there is no better choice.

There are edge cases to be aware of: we may find that the CRAN version of a package is behind our local version, even if the local version was installed from CRAN. This happens for (at least) two reasons: if a package version was reverted on CRAN (which appeared to happen in the case of StanHeaders), or if the package is currently unavailable for installation from CRAN (due to some incompatibility, perhaps).

The code below does this categorization using the utils::compareVersion() function, which interprets the version numbers so we don’t have to. We do an additional step to note the Github repository for any package that we still want to obtain from Github.

compare_frame <- slimmer_frame %>%
  group_by(package) %>% 
  mutate(
    source_locale = case_when(
      compareVersion(local_version, cran_version) == 1 &
        str_detect(source, "Github") ~ "Github",
      compareVersion(local_version, cran_version) == 1 &
        is.na(cran_version) &
        str_detect(source, "CRAN") ~ "Unavailable on CRAN",
      compareVersion(local_version, cran_version) == 1 &
        (is.na(cran_version) == FALSE) &
        str_detect(source, "CRAN") ~ "Downgraded on CRAN",
      compareVersion(local_version, cran_version) %in% c(-1, 0) ~ "CRAN"
    ),
    github_repo = case_when(
      source_locale == "Github" ~ 
        str_split(string = source, pattern = "@", simplify = TRUE)[,1] %>%
        str_replace("Github \\(", ""),
      TRUE ~ as.character(NA)
    ),
  ) %>%
  ungroup() %>%
  print()
## # A tibble: 378 x 6
##    package     local_version source       cran_version source_locale github_repo
##    <chr>       <chr>         <chr>        <chr>        <chr>         <chr>      
##  1 abind       1.4-5         CRAN (R 4.0… 1.4-5        CRAN          <NA>       
##  2 adagio      0.7.1         CRAN (R 4.0… 0.7.1        CRAN          <NA>       
##  3 ade4        1.7-15        CRAN (R 4.0… 1.7-16       CRAN          <NA>       
##  4 AER         1.2-9         CRAN (R 4.0… 1.2-9        CRAN          <NA>       
##  5 Amelia      1.7.6         CRAN (R 4.0… 1.7.6        CRAN          <NA>       
##  6 arrayhelpe… 1.1-0         CRAN (R 4.0… 1.1-0        CRAN          <NA>       
##  7 askpass     1.1           CRAN (R 4.0… 1.1          CRAN          <NA>       
##  8 assertthat  0.2.1         CRAN (R 4.0… 0.2.1        CRAN          <NA>       
##  9 audio       0.1-7         CRAN (R 4.0… 0.1-7        CRAN          <NA>       
## 10 backports   1.1.7         CRAN (R 4.0… 1.2.0        CRAN          <NA>       
## # … with 368 more rows
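
The group_by(package) step is there because compareVersion() compares one pair of version strings at a time. To see what it returns, here is a small sketch with a hypothetical wrapper, compare_versions(), that applies it pairwise with mapply(). The version strings are illustrative, and the last CRAN version is invented for the example.

```r
# Hypothetical helper: apply utils::compareVersion() to each pair of versions.
compare_versions <- function(local, cran) {
  as.numeric(unname(mapply(utils::compareVersion, local, cran)))
}

local_version <- c("1.1.7", "1.1", "0.0.0.9000", "1.36.28")
cran_version  <- c("1.2.0", "1.1", NA, "1.36.23")  # last value is made up

result <- compare_versions(local_version, cran_version)
print(result)
# -1: CRAN is newer, 0: same version, 1: local is newer (or CRAN version is NA)
```

A missing CRAN version compares as if the local version were newer, which is why the categorization code pairs the == 1 check with is.na(cran_version) to tell "not on CRAN" apart from "downgraded on CRAN".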

Updating R and reinstalling packages

When we are satisfied with our decisions about where to install a package from, save this comparison table to file.

# output data location
dir.create(here("data"), showWarnings = FALSE)  # no warning if it already exists

# output file
out_file <- as.character(str_glue("pkg-data_{Sys.Date()}.rds"))
write_rds(compare_frame, here("data", out_file))

After updating R, we reopen it and install our packages according to our classifications. We want to begin by installing {remotes} to enable installation from Github.

# to install from github
install.packages("remotes")

# should still be operating in your working directory
# so downloading {here} makes sense also
install.packages("here")

# read package data
pkgs <- readRDS(here::here("data", "pkg-data_2020-04-25.rds"))
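
The file name above is hard-coded to the date I saved my own table. If you would rather not type the date, here is a sketch of a hypothetical helper, latest_pkg_file(), that picks the newest date-stamped file in a folder. It assumes the pkg-data_YYYY-MM-DD.rds naming used when saving.

```r
# Hypothetical helper: return the path of the newest date-stamped data file.
latest_pkg_file <- function(dir) {
  files <- list.files(
    dir,
    pattern = "^pkg-data_[0-9]{4}-[0-9]{2}-[0-9]{2}\\.rds$",
    full.names = TRUE
  )
  if (length(files) == 0) stop("no pkg-data files found in ", dir)
  # ISO dates (YYYY-MM-DD) sort chronologically when sorted as text.
  sort(files, decreasing = TRUE)[1]
}

# Usage (not run here):
# pkgs <- readRDS(latest_pkg_file(here::here("data")))
```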

Everything that we want to install from Github, we install by iterating remotes::install_github over the github repository slugs that we saved previously. This requires us to write the code in “hard mode” because we aren’t using tidyverse dialect.

# install from github repos
# (%in% treats unclassified NA rows as FALSE rather than returning NA rows)
github_pkgs <- pkgs[pkgs$source_locale %in% "Github", ][["github_repo"]]

remotes::install_github(github_pkgs)
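
If one repository fails to build, you may not want it to halt the whole batch. Here is a sketch of a hypothetical helper, install_each(), that tries each item separately and reports which ones failed. It assumes that a failed install surfaces as an R error, and it takes the installer as an argument so the same function can wrap remotes::install_github.

```r
# Hypothetical helper: call `installer` on each item, recording failures
# instead of stopping at the first error.
install_each <- function(items, installer) {
  ok <- vapply(items, function(item) {
    tryCatch({
      installer(item)
      TRUE
    }, error = function(e) {
      message("Failed: ", item, " (", conditionMessage(e), ")")
      FALSE
    })
  }, logical(1))
  names(ok) <- items
  ok
}

# Usage (not run here):
# results <- install_each(github_pkgs, remotes::install_github)
# names(results)[!results]  # repositories to retry by hand
```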

Finally, everything that we said we wouldn’t get from Github, we get by using plain ol’ install.packages(). Before doing this, you may find it beneficial to filter out packages that you don’t use anymore or that you only installed as dependencies of other packages.

# install from cran with remaining package names
# (written with %in% so rows with an NA source_locale keep their package names)
cran_pkgs <- pkgs[!(pkgs$source_locale %in% "Github"), ][["package"]]

install.packages(cran_pkgs)
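
By this point the Github installs have probably pulled in a number of CRAN dependencies already. If you want to skip those, or if you need to re-run the batch after an interruption, here is a sketch of a hypothetical helper, still_missing(), that keeps only the names not yet in your library.

```r
# Hypothetical helper: names in `wanted` that are not yet in the library.
still_missing <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Usage (not run here):
# install.packages(still_missing(cran_pkgs))
```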

This should get you (more or less) up and running with R 4.0 and all of your old packages.

Fair warning: I’m already getting burned by some C++ configuration problems for packages that want to compile from source. I think this is particular to my own computer and the klugey fixes I undertook to set up {rstan} with macOS Catalina. I remember reading somewhere that R 4.0 fixed some of the Stan x Catalina problems, so maybe I will confront these choices again soon, but I will cross that bridge when I get to it.


  1. And maybe I will do a fresh install of everything once the semester is over.