Getting Started

Derailment (Ray's Academic Blog)

Derailment has posts on how to set up a Mac:

There are also posts on how to set up a Windows machine:

There are posts on configuring and updating:

Why Bother?

The Gold Standard

  • Data scientists can archive a compendium of work that bundles the raw data, all of the data processing/analysis code, the analysis software itself, and the resulting manuscript.

    This perfectly reproducible workflow requires extra R packages like renv and extra software like Docker.


  • This also requires extraordinary care to deal with Protected Health Information.

Good Work

  • Non-experts can easily build a project which will knit together all of the data processing and the text, tables and figures for a paper into a single document and share that information with the world.

    This workflow, which excludes the raw data and the analysis software itself, goes a long way toward allowing reproduction of findings.

Where We Are Going

  • Use modern R and RStudio to quickly and easily set up a paper outline/shell for a biomedical research article.
  • Easily organize a research project and embed analysis, figures, and tables directly into your article.
  • Stop copying and pasting analysis/tables out of Excel into Word.
  • You get the shell for a paper in just a couple clicks.

Just how quick and easy? Just add rUM...

  1. Click to install a package.
  2. Type rUM. (capitalization matters)
  3. Restart RStudio.
  4. Click New Project... from the File menu.
  5. Click New Directory.
  6. Click Research Project Template.
  7. Choose where to save the project.
  8. Click Knit.
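
Steps 1 and 2 above can also be done from the R console. The following is a minimal sketch, assuming rUM is installed from CRAN; the helper name needs_install and the interactive() guard are additions for illustration and are not part of rUM.

```r
# TRUE when a package is not yet installed on this machine.
# (needs_install is a hypothetical helper name used only in this sketch.)
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Steps 1 and 2: install rUM. Capitalization matters: "rUM", not "rum".
# The interactive() guard is an assumption added here so that
# non-interactive scripts never install anything by surprise.
if (interactive() && needs_install("rUM")) {
  install.packages("rUM")
}
```

After installing, you still need to restart RStudio (step 3) before the Research Project Template shows up in the New Project dialog.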

Demo

What does that hard work get you?

The rUM project template creates a folder that contains:

  1. The paper shell, called reproducible.Rmd (you can rename it)
  2. An empty folder, called data, to hold your data
  3. An RStudio project file, with the name you chose
  4. Two bibliography files: references.bib, to hold references to articles, and packages.bib, an automatically generated file with references to all the R packages used
  5. A bibliography style file, named the-new-england-journal-of-medicine.csl, to format your references
  6. A file, named .gitignore, to help protect you from accidentally sharing data
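
If you want to confirm that a new project folder has everything in the list above, a short check like this one may help. It is only a sketch: the function name check_rum_project is made up for this example, and the file names are taken from the list, so a renamed paper shell will be reported as missing.

```r
# Report which of the files listed above exist in a project folder.
# check_rum_project is a hypothetical helper, not a function from rUM.
check_rum_project <- function(project_dir) {
  expected <- c(
    "reproducible.Rmd", "data", "references.bib", "packages.bib",
    "the-new-england-journal-of-medicine.csl", ".gitignore"
  )
  found <- file.exists(file.path(project_dir, expected))
  names(found) <- expected

  # The RStudio project file carries the name you chose, so look for any .Rproj.
  has_rproj <- length(list.files(project_dir, pattern = "\\.Rproj$")) > 0

  c(found, "*.Rproj" = has_rproj)
}
```

Calling check_rum_project(".") from inside the project returns a named TRUE/FALSE vector, one entry per item.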

Authoring

How Do I Write/Edit in RStudio?

The document is in a format called R Markdown. It saves with a suffix of .Rmd. You can use point-and-click tools to edit it.

Click the Visual Markdown Editor button:

Editing Tools - Formatting Text

It works like a word processor. Highlight and click a button.

Editing Tools - Spelling

Right click to fix spelling or add to the dictionary:

Navigation

You can jump to sections using the document outline

or the navigation control

Cross Referencing (1)

You can add IDs to sections of your paper (as well as figures/tables) and then put in hyperlinks. To add a section identifier:

Cross Referencing (2)

If your paper is using section numbering, you can use the menu to add links to sections:

Cross Referencing (3)

If your paper is not using section numbering, you need to type some markdown code. The syntax to cross reference a section is:

[label](#link)

where the label is the text for the reader and #link is the section ID.

Doing Calculations in Text

You can nest a calculation directly into the narrative by clicking the code button.

Then type a lowercase letter r, followed by your formula.

Preprocessing Code Blocks

If you need to do something complicated, like calculating how many subjects were enrolled, you can add a code block. You can do some calculating, then reference the results.

Using a Code Block

Write your R code below the line with {r}. You can later reference any object you have created.

And you can use functions in inline code.
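
As a minimal sketch of the code block and inline code ideas, assuming a small made-up data frame (the object and column names here are hypothetical), a code block can do the counting and inline code can then drop the results into a sentence:

```r
# Hypothetical enrollment data; in a real paper this would come from
# a file in the data folder.
subjects <- data.frame(
  id       = 1:6,
  enrolled = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Objects created in a code block can be referenced later in the text.
n_enrolled   <- sum(subjects$enrolled)
pct_enrolled <- round(100 * mean(subjects$enrolled), 1)

# In the narrative you could then write inline code such as
#   We enrolled `r n_enrolled` subjects (`r pct_enrolled`%).
# or call a function directly inline, for example `r nrow(subjects)`.
```

When the document is knit, each inline expression is replaced by its value, so the numbers in the text always match the data.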

Adding References (1)

Adding references/citations is simple:

Adding References (2)

There are many search engines to help you find your article:

If you use Zotero, your complete reference library will be listed.

Adding References - NEJM Style

New England Journal citations go at the end of a sentence.

The output appears as:


To Use a Different Bibliography Style

Search the Zotero Style Repository website and save the ".csl" file into your project folder.

Then change the YAML header in the ".Rmd" file to match the file you downloaded:

---
title: "your_title_goes_here"
author: "your_name_goes_here"
date: "`r Sys.Date()`"
output:
  bookdown::html_document2:
    number_sections: false
bibliography: [references.bib, packages.bib]
csl: the-new-england-journal-of-medicine.csl
---

Using American Psychological Association (APA) Style

If you download the apa.csl file into the project folder and change the csl entry in the YAML header to apa.csl, your references will appear in APA format.

End of Sentence References - use [ ]

Inline References - without [ ]

Using GitHub

Why Bother?

  1. To get a job...
    • Anybody who claims to do data science or who wants a job in industry needs to know how to use git/GitHub.
    • If somebody does not have GitHub at the top of their resume, I stop reading.
    • I look at GitHub usage to judge if I want to hire somebody.
  2. Most people in the R world post code on GitHub. You can see how other people solve problems.
  3. If you do something useful, you can share it.
    • It is nearly no effort to share your algorithms with the world.
  4. You can also do pretty hardcore project management with GitHub.

Using git and GitHub

  • GitHub is a website that lets you share any file with the world.
  • git is software that helps you keep files synchronized and helps resolve conflicts if more than one person has edited the same part of a file.
  • Derailment has a post to help you set up git and GitHub.



  • Obviously you need to be careful that you do not put sensitive data like Protected Health Information (PHI) and Personally Identifiable Information (PII) onto GitHub.
  • If you build a project using rUM, it automatically puts a file into the project folder, called .gitignore, which helps block the sharing of files that it recognizes as data.

How to Make a Project on GitHub

  1. Log into your account.
  2. Create a new project.
  3. Make a "Repository".

Get the URL so you can sync your local machine.

To Create a Local Copy of the Files that are on GitHub

  1. Open RStudio
  2. Choose File > New Project...
  3. Choose "Version Control"
  4. Choose "Git"
  5. Add your project URL:

The Git Enabled Project

Notice the Files and the Git Tab

If You Have a Project Created with rUM (1)

You can copy the contents of the rUM folder into this GitHub enabled project folder, but this requires a little care.

  • Using RStudio, delete the file called .gitignore that was automatically created in the GitHub enabled project.

If You Have a Project Created with rUM (2)

Move the content of your rUM project folder into the GitHub enabled project folder.

  • Windows
    • Using the Windows File Explorer, select all files in the rUM project, then copy and paste the files into the GitHub enabled project. Tell it to overwrite the README file.
  • Mac
    • You need to move the .gitignore file (which is hidden) along with everything else.
    • Show hidden files before you move files. To show hidden files in the Mac Finder, hold down the command key (it looks like ⌘) and the shift key and then tap the . (period/decimal) key.
    • After you move the .gitignore file, you can press ⌘ shift . again to hide the hidden files.

Dealing with Hidden Files on Mac

  • Files whose names start with a . (period/decimal) are hidden on Mac.

    By default, the RStudio IDE shows some hidden files:

    By default, the Finder hides all hidden files:

Show Hidden Files on Mac

  • Show hidden files with ⌘ shift .

    The Finder hiding hidden files:

    The Finder showing hidden files:

After Moving Files to the GitHub Enabled Project

What is the big deal about .gitignore?

  • Notice the set of files that are listed in the Git windowpane.
  • These are files that can be tracked/backed up/sent to GitHub.
  • The .gitignore file specifies files/folders that should not be tracked/backed up/sent to GitHub.



  • My .gitignore file lists data files that I work with. If you work with other kinds of files that contain data, tell me and I will add them to my list.
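
To get a feel for how wildcard patterns in a .gitignore pick out data files, here is a rough sketch in R. The three patterns are hypothetical examples chosen for illustration (they are not copied from the rUM .gitignore), the function name is_ignored is made up, and the matching is deliberately simplified compared with the real git rules.

```r
# Hypothetical patterns for illustration only; the real list lives in the
# .gitignore file that rUM writes into the project folder.
ignore_patterns <- c("*.csv", "*.xlsx", "*.sas7bdat")

# Simplified check: does a file name match any of the wildcard patterns?
# (Real .gitignore rules are richer, e.g. folder rules and negation with "!".)
is_ignored <- function(files, patterns = ignore_patterns) {
  # Translate each wildcard pattern into a regular expression.
  regexes <- vapply(patterns, utils::glob2rx, character(1))
  vapply(
    basename(files),
    function(f) any(vapply(regexes, grepl, logical(1), x = f)),
    logical(1)
  )
}

is_ignored(c("data/patients.csv", "reproducible.Rmd"))
```

Files flagged TRUE by a check like this would stay out of the Git windowpane, which is what keeps them from being sent to GitHub.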

Ray's .gitignore

How do I move a file to GitHub? (1)

  1. You choose which files.
  2. Add a message to describe the work.
  3. Send it.

How do I move several related files to GitHub? (2)

  1. You choose which files.
  2. Add a message to describe the work.
  3. Send it.

If You Want to Intimidate People

  1. You choose which files. Call this "staging files."
  2. Add a message to describe the work. Call this "committing files."
  3. Send it. Call this "pushing files."

Call your git/GitHub project back-up/repository/web-page a repo.
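
For the curious, the three button clicks map onto three git commands: git add, git commit, and git push. The sketch below shows that mapping from R. The helper run_git, the file name, and the commit message are assumptions made for this example, and by default it only prints the commands (a dry run) instead of running them.

```r
# Print a git command (dry run) or actually run it with system2().
# run_git is a hypothetical helper written for this sketch.
run_git <- function(args, dry_run = TRUE) {
  if (dry_run) {
    message("git ", paste(args, collapse = " "))
    return(invisible(args))
  }
  system2("git", args)
}

steps <- list(
  stage  = c("add", "reproducible.Rmd"),                          # choose which files
  commit = c("commit", "-m", shQuote("Add enrollment summary")),  # describe the work
  push   = "push"                                                 # send it
)

# Show the three commands in order without touching the repo.
for (step in steps) run_git(step)
```

Setting dry_run = FALSE would run the commands for real, which assumes git is installed and the working directory is the Git enabled project.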

Modified Analysis File

  • When you save and commit a file, it disappears from the git windowpane.
  • As soon as you modify a file that you previously committed, it will be listed again in the git windowpane.
    • It will appear with a blue M to indicate the file was modified.
  • When you are happy with your modifications, upload it to GitHub using the same workflow.

To use the git terms:

  1. Save
  2. Stage (check off the box)
  3. Commit with a message (type a summary)
  4. Push (upload)

Dealing with Changes

To use the git terms:

  1. Save
  2. Stage (check off the box)
  3. Commit with a message (type a summary)
  4. Push (upload)

When you are on the commit screen you will see the sections that are changed. Deleted code will be in red. New code will be in green.

The GitHub Repo After Pushing Commits

The GitHub Repo after Pushing Three Commits

Commit History

How do I get my project onto a new computer?

  1. Follow the earlier instructions to create a local copy of the files that are on GitHub. That will give you the current state of the project.
  2. Click the download button whenever you want to update the local files.
  3. Copy the data into the project folder. The data files are not on GitHub because .gitignore blocks them.

If You Want to Continue to Intimidate People

  1. Follow the earlier instructions to create a local copy of the files that are on GitHub. That will give you the current state of the project. Call this "cloning a project."
  2. Click the download button whenever you want to update the local information. Call this "pulling files."
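
On the command line, cloning and pulling are git clone and git pull. A minimal sketch, assuming a placeholder repository URL (replace it with the URL you copied from your own GitHub repo); the commands are only built and printed here, not run:

```r
# Placeholder URL for illustration; use the URL from your own GitHub repo.
repo_url <- "https://github.com/your-user-name/your-project.git"

clone_cmd <- paste("git clone", repo_url)  # "cloning a project"
pull_cmd  <- "git pull"                    # "pulling files"

# Print the commands; they could be pasted into a terminal as written.
cat(clone_cmd, pull_cmd, sep = "\n")
```

The clone command is run once per computer, while the pull command is run inside the project folder each time you want the latest files.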

Want to learn more about R Markdown?

Want to learn more about git/GitHub?

The most popular reference for R git/GitHub integration is Happy Git and GitHub for the useR by Jenny Bryan and Jim Hester: https://happygitwithr.com/

Beyond the Basics

Articles on Reproducible Research

Start here:

Incredibly influential:

Important:

Coming in 2021

  • I will "live code" a couple research projects using R and R Markdown and GitHub for CHARM in September 2021.
