17 Exporting Data
The data frames we’ve created so far don’t persist from one programming session to the next because we haven’t yet learned how to store our data long-term. This limitation makes it difficult to share our data with others or even to come back later to modify or analyze our data ourselves. In this chapter, you will learn to export data from R’s memory to a file on your hard drive so that you can store it or share it with others. In the examples that follow, we’re going to use this simulated data.
demo <- data.frame(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)
👆 Here’s what we did above:
We created a data frame that is meant to simulate some demographic information about 4 hypothetical study participants.
The first variable (id) is the participant’s study id.
The second variable (age) is the participant’s age at enrollment in the study.
The third variable (edu) is the highest level of formal education the participant completed. Where:
1 = Less than high school
2 = High school graduate
3 = Some college
4 = College graduate
17.1 Plain text files
Most of readr’s read_ functions that were introduced in the importing plain text files chapter have a write_ counterpart that allows you to export data from R into a plain text file.
Additionally, all of haven’s read_ functions that were introduced in the importing binary files chapter have a write_ counterpart that allows you to export data from R into SAS, Stata, and SPSS binary file formats.
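As a minimal sketch of what those write_ counterparts look like in practice, the code below exports the simulated data to Stata and SPSS files. It assumes the haven package is installed, and it writes to a temporary folder (our choice for illustration, not something the chapter prescribes).

```r
# Assumption: the haven package is installed.
demo <- data.frame(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

# Write to a temporary folder so the sketch does not clutter your project.
dta_path <- file.path(tempdir(), "demo.dta")  # Stata format
sav_path <- file.path(tempdir(), "demo.sav")  # SPSS format

haven::write_dta(demo, dta_path)
haven::write_sav(demo, sav_path)

# Read one of the files back in to confirm that the export worked.
demo_dta <- haven::read_dta(dta_path)
nrow(demo_dta)
```

In a real project, you would replace the temporary paths with the location where you actually want the files to live.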
Interestingly, readxl does not have a write_excel() function for exporting R data frames as .xls or .xlsx files. However, the importance of this is mitigated by the fact that Excel can open .csv files and readr contains a function (write_csv()) for exporting data frames in the .csv file format. If you absolutely have to export your data frame as a .xls or .xlsx file, there are other R packages capable of doing so (e.g., xlsx).
So, with all these options, what format should you choose? Our answer depends on the answers to two questions. First, will this data be shared with anyone else? Second, will we need any of the metadata that would be lost if we export this data to a plain text file?
Unless you have a compelling reason to do otherwise, we’re going to suggest that you always export your R data frames as csv files if you plan to share your data with others. The reason is simple. They just work. We can think of many times when someone sent us a SAS or Stata data set and we weren’t able to import it for some reason, or the data didn’t import in the way that we expected it to. We don’t recall ever having that experience with a csv file. Further, every operating system and statistical analysis software application that we’re aware of is able to accept csv files. Perhaps for that reason, they have become the closest thing to a standard for data sharing that exists, at least that we’re aware of.
Exporting an R data frame to a csv file is really easy. The example below shows how to export our simulated demographic data to a csv file on our computer’s desktop:
readr::write_csv(demo, "demo.csv")
👆 Here’s what we did above:
We used readr’s write_csv() function to export a data frame called demo in our global environment to a csv file on our desktop called demo.csv.
You can type ?write_csv into your R console to view the help documentation for this function and follow along with the explanation below.
The first argument to the write_csv() function is the x argument. The value passed to the x argument should be a data frame that is currently in our global environment.
The second argument to the write_csv() function is the file argument (named path in versions of readr before 1.4.0). The value passed to it should be a file path telling R where to create the new csv file. A bare file name like "demo.csv" is a relative path, so R creates the file in the current working directory, which is the desktop in this example.
You name the csv file directly in the file path. Whatever name you write after the final slash in the file path is what the csv file will be named.
As always, make sure you remember to include the file extension in the file path.
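If you’d like to confirm what actually ends up in the file, the sketch below writes the simulated data to a csv file and then prints the raw lines of text. It assumes readr is installed. The temporary folder and the csv_path name are our own illustrative choices. Passing the path by position sidesteps the file versus path naming difference between readr versions.

```r
# Assumption: the readr package is installed.
demo <- data.frame(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

# Build a full file path. The name after the final slash, including the
# .csv extension, becomes the name of the new file.
csv_path <- file.path(tempdir(), "demo.csv")

# First argument: the data frame. Second argument: where to write it.
readr::write_csv(demo, csv_path)

# A csv file is just plain text, so we can read it back line by line.
csv_lines <- readLines(csv_path)
print(csv_lines)
```

The first line of the file holds the column names, and each remaining line holds one row of the data frame.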
Even if you don’t plan on sharing your data, there is another benefit to saving your data as a csv file. That is, it’s easy to open the file and take a quick peek if you need to for some reason. You don’t have to open R and load the file. You can just find the file on your computer, double-click it, and quickly view it in your text editor or spreadsheet application of choice.
However, there is a downside to saving your data frames to a csv file. In general, csv files don’t store any metadata, which can sometimes be a problem (or at least a pain). For example, if you’ve coerced several variables to factors, that information would not be preserved in the csv file. Instead, the factors will be converted to character strings. If you need to preserve metadata, then you may want to save your data frames in a binary format.
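To see the loss of factor metadata for yourself, here is a small sketch. It assumes readr is installed. The demo_fct data frame and the temporary file are hypothetical names we introduce only for this illustration. We coerce edu to a factor, export it to a csv file, and then import it again.

```r
# Assumption: the readr package is installed.
# demo_fct is a hypothetical version of our data with edu stored as a factor.
demo_fct <- data.frame(
  id  = c("001", "002", "003", "004"),
  edu = factor(
    c(3, 1, 4, 2),
    levels = 1:4,
    labels = c(
      "Less than high school", "High school graduate",
      "Some college", "College graduate"
    )
  )
)

csv_path <- tempfile(fileext = ".csv")
readr::write_csv(demo_fct, csv_path)

# suppressMessages() hides readr's column specification message.
demo_csv <- suppressMessages(readr::read_csv(csv_path))

class(demo_fct$edu)  # factor before exporting
class(demo_csv$edu)  # character after the csv round trip
```

The labels survive as text, but the fact that edu was a factor, and the order of its levels, does not.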
17.2 R binary files
In the chapter on importing binary files, we mentioned that most statistical analysis software allows you to save your data in a binary file format. The primary advantage to doing so is that potentially useful metadata is stored alongside your analysis data. We were first introduced to factor vectors in the Let’s Get Programming chapter. There, we saw how coercing some of your variables to factors can be useful. However, doing so requires R to store metadata along with the analysis data. That metadata would be lost if you were to export your data frame to a plain text file. This is an example of a time when we may want to consider exporting our data to a binary file format.
R actually allows you to save your data in multiple different binary file formats. The two most popular are the .Rdata format and the .Rds format. We’re going to suggest that you use the .Rds format to save your R data frames. Exporting to this format is really easy with the readr package.
The example below shows how to export our simulated demographic data to an .Rds file on our computer’s desktop:
readr::write_rds(demo, "demo.rds")
👆 Here’s what we did above:
We used readr’s write_rds() function to export a data frame called demo in our global environment to an .Rds file on our desktop called demo.rds.
You can type ?write_rds into your R console to view the help documentation for this function and follow along with the explanation below.
The first argument to the write_rds() function is the x argument. The value passed to the x argument should be a data frame that is currently in our global environment.
The second argument to the write_rds() function is the file argument (named path in versions of readr before 1.4.0). The value passed to it should be a file path telling R where to create the new .Rds file. A bare file name like "demo.rds" is a relative path, so R creates the file in the current working directory, which is the desktop in this example.
You name the .Rds file directly in the file path. Whatever name you write after the final slash in the file path is what the .Rds file will be named.
As always, make sure you remember to include the file extension in the file path.
To load the .Rds data back into your global environment, simply pass the path to the .Rds file to readr’s read_rds() function:
demo <- readr::read_rds("demo.rds")
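The sketch below illustrates why the .Rds format is useful when metadata matters. It assumes readr is installed, and demo_fct is a hypothetical version of our data in which edu has been coerced to a factor. After a round trip through an .Rds file, edu is still a factor with its levels intact.

```r
# Assumption: the readr package is installed.
# demo_fct is a hypothetical version of our data with edu stored as a factor.
demo_fct <- data.frame(
  id  = c("001", "002", "003", "004"),
  edu = factor(
    c(3, 1, 4, 2),
    levels = 1:4,
    labels = c(
      "Less than high school", "High school graduate",
      "Some college", "College graduate"
    )
  )
)

rds_path <- tempfile(fileext = ".rds")

# Export to an .Rds file, then import it again.
readr::write_rds(demo_fct, rds_path)
demo_rds <- readr::read_rds(rds_path)

# The factor metadata is still there after the round trip.
is.factor(demo_rds$edu)
levels(demo_rds$edu)
```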
There is a final thought we want to share on exporting data frames. When we got to the end of this chapter, it occurred to us that the way we wrote it may give the impression that you must choose to export data frames as plain text files or binary files, but not both. That isn’t the case. We frequently export our data as a csv file that we can easily open and view and/or share with others, but also export it to an .Rds file that retains useful metadata we might need the next time we return to our analysis. We suppose there could be times that your files are so large that this is not an efficient strategy, but that is generally not the case in our projects.
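One way to make exporting both formats a habit is to wrap the two calls in a small helper. The export_both() function below is purely hypothetical. It is not part of readr, and it is only one of many ways you might organize this. The sketch assumes readr is installed.

```r
# Assumption: the readr package is installed.
demo <- data.frame(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

# Hypothetical helper: writes the same data frame to a csv file and an
# .Rds file that share a folder and a base name.
export_both <- function(df, dir, name) {
  csv_path <- file.path(dir, paste0(name, ".csv"))
  rds_path <- file.path(dir, paste0(name, ".rds"))
  readr::write_csv(df, csv_path)  # easy to open, view, and share
  readr::write_rds(df, rds_path)  # retains R metadata, such as factors
  invisible(c(csv = csv_path, rds = rds_path))
}

# Write both files to a temporary folder and confirm that they exist.
paths <- export_both(demo, tempdir(), "demo")
file.exists(paths)
```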