5  Stata

While working on the L2C main outcomes paper, I wanted to double-check some of my results in Stata. It turns out you can run Stata code directly from within R, albeit with some limitations. This note needs some work, but the basics are here.

The first thing we need to do is load the RStata package.

library(dplyr, warn.conflicts = FALSE)
library(RStata) # For Stata

Then, we need to tell R where to find Stata on our local computer and which version of Stata we are using.

# For Stata
options("RStata.StataPath" = "/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp")
options("RStata.StataVersion" = 18)
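
If the path is wrong, the failure only shows up later when we try to run a Stata command, so it can help to check it up front. The following is a minimal sketch: `check_stata_path()` is a hypothetical helper of my own, not part of RStata. It only reads the `RStata.StataPath` option and confirms that a file exists at that location.

```r
# Hypothetical helper (not part of RStata): confirm that the configured
# Stata executable exists before trying to run any Stata code.
check_stata_path <- function(path = getOption("RStata.StataPath")) {
  if (is.null(path) || !nzchar(path)) {
    stop("RStata.StataPath is not set. Set it with options().")
  }
  if (!file.exists(path)) {
    stop("No file found at: ", path)
  }
  # Return the path invisibly so the check can be used inline.
  invisible(path)
}
```

Calling `check_stata_path()` right after the `options()` lines should stop with an informative error if Stata is installed somewhere else on a given machine.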

Then, we can run Stata directly in R code chunks in RStudio. For example, let’s calculate the frequency and percentage of observations of each cylinder type in the mtcars data set.

  1. Pass a Stata command (as a character string) – or a do file – to the src argument of the stata() function.
  2. Pass the name of a data frame that currently exists in the R global environment to the data.in argument of the stata() function.
stata(
  src = "tabulate cyl",
  data.in = mtcars
)
. tabulate cyl

        cyl |      Freq.     Percent        Cum.
------------+-----------------------------------
          4 |         11       34.38       34.38
          6 |          7       21.88       56.25
          8 |         14       43.75      100.00
------------+-----------------------------------
      Total |         32      100.00
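
Because the point of running Stata here is to double-check results, it can be useful to compute the same table in R and compare the two by eye. This is a sketch using dplyr; the object name `cyl_freq` and the column names are my own choices, not anything produced by Stata.

```r
library(dplyr, warn.conflicts = FALSE)

# Frequency, percentage, and cumulative percentage of each cyl value,
# mirroring the columns of Stata's tabulate output.
cyl_freq <- mtcars %>%
  count(cyl, name = "freq") %>%
  mutate(
    percent     = 100 * freq / sum(freq),
    cum_percent = cumsum(percent)
  )

cyl_freq
```

The percentages are left unrounded here. Stata displays them to two decimal places, so round before comparing if an exact match matters.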

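The stata() call prints the table to the console, but it doesn’t return the frequencies as an R object, so we can’t use these estimates directly. However, we can store the results in a Stata matrix with the matcell() option, convert that matrix to a data set with svmat, and return the data set to R by setting data.out = TRUE. See this SO post for more discussion.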

# Create a temp file to save the results to.
# I don't want to create a dta file on my desktop every time I build this book.
tmpdir <- tempfile()
dir.create(tmpdir, showWarnings = FALSE, recursive = TRUE)
tempfile <- file.path(tmpdir, "stata_out.dta")
stata_tabulate_results <- stata(
  paste0( # Using paste0() to inject the temp file path. Normally this isn't necessary
    '
    tabulate cyl, matcell(x)
    clear
    svmat x
    ',
    'save ', tempfile
  ),
  data.in = mtcars,
  data.out = TRUE
)
. 
.     tabulate cyl, matcell(x)

        cyl |      Freq.     Percent        Cum.
------------+-----------------------------------
          4 |         11       34.38       34.38
          6 |          7       21.88       56.25
          8 |         14       43.75      100.00
------------+-----------------------------------
      Total |         32      100.00
.     clear
.     svmat x
number of observations will be reset to 3
Press any key to continue, or Break to abort
Number of observations (_N) was 0, now 3.
.     save /var/folders/_6/rzs6b7fd09d1_4sw4vyrw8900000gs/T//Rtmp1JXCEK/file144
> d11c6609ce/stata_out.dta
file
    /var/folders/_6/rzs6b7fd09d1_4sw4vyrw8900000gs/T//Rtmp1JXCEK/file144d11c6
    > 609ce/stata_out.dta saved

Without the tempfile code, it would look like this.

stata_tabulate_results <- stata(
  '
  tabulate cyl, matcell(x)
  clear
  svmat x
  save "/Users/bradcannell/Desktop/t.dta"
  ',
  data.in = mtcars,
  data.out = TRUE
)
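
With data.out = TRUE, stata() returns the data that is in Stata's memory at the end of the run, so stata_tabulate_results should be a data frame holding the matcell() counts. The sketch below shows one way to use it. So that it runs without Stata, it uses a stand-in data frame called `results_standin` (a hypothetical name) built from the counts in the output above. The column name `x1` assumes svmat's default of naming columns after the matrix plus a column number.

```r
# Stand-in for the data frame returned by stata(); in practice, use
# stata_tabulate_results instead. Counts are copied from the table above.
results_standin <- data.frame(x1 = c(11, 7, 14))

# tabulate lists values in ascending order, so the rows should line up
# with the sorted unique values of cyl.
freq_tbl <- data.frame(
  cyl  = sort(unique(mtcars$cyl)),
  freq = results_standin$x1
)

# Recompute the percentages from the saved counts.
freq_tbl$percent <- 100 * freq_tbl$freq / sum(freq_tbl$freq)

freq_tbl
```

Matching rows to cyl values by position is fragile. It would be worth checking whether the row labels can be saved from Stata as well.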

I still have a lot to learn about saving the results from Stata commands. In the future, we may want to add notes from “/Users/bradcannell/Library/CloudStorage/Dropbox/Stata/Notes/Example Do Files/Matrix and Saved Results.do”.