R Graphics

Generating graphics for statistical analysis using R

Bo Werth
Statistician STI/EAS

Graphics in statistical software

programming

  • quality: reproducibility, calculations

  • efficiency gains: scaling

  • output flexibility: PDF, Word, HTML

interactive creation (e.g. Excel, Tableau)

  • development time

  • changing requirements

  • maintenance and transparency

R Graphics Systems

google trends: python, r, ggplot

traditional graphics

  • graphics facilities of the S language
  • fast, intuitive
  • create plot with high-level function
  • add elements with low-level functions

grid graphics system

  • produce complete plots
  • ideal proportions
  • faceting or multi-panel conditioning
  • trellis, lattice, ggplot

The organization of R graphics

R graphics system

Publication quality graphics with ggplot

  • implementation of Leland Wilkinson's "grammar of graphics" (2005)
  • independent components that can be composed in different ways
  • not limited to a set of pre-specified graphics
  • create new graphics that are precisely tailored

Interactive Graphics using JavaScript

  • ggvis: ggplot for dynamic charts based on vega.js
  • rCharts: interface to high-level js libraries building e.g. on d3.js
    • highcharts, polycharts, nvd3, rickshaw for statistical charts
    • crosslet, datamaps, leaflet etc. for map visualisations

traditional S-PLUS graphics

  • pen on paper model:
    • can only draw on top of the plot
    • cannot modify or delete existing content
  • no (user accessible) representation of the graphics
  • includes both tools for drawing primitives and entire plots
  • generally fast, but have limited scope
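
A minimal sketch of the pen-on-paper model. The built-in cars dataset is an assumption chosen for illustration; it does not appear on the slides. A high-level call starts the plot, and the low-level calls can only paint on top of it.

```r
# Open a null PDF device so the sketch also runs without a display;
# drop this line (and dev.off()) to draw on the default device instead.
pdf(NULL)

# High-level function: starts a new plot (axes, box, labels, points).
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Low-level functions: add ink on top of what is already drawn.
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red")                       # regression line
points(mean(cars$speed), mean(cars$dist),      # highlight the centroid
       pch = 19, col = "blue")
legend("topleft", legend = c("linear fit", "centroid"),
       col = c("red", "blue"), lty = c(1, NA), pch = c(NA, 19))

# There is no plot object to edit afterwards: a change means redrawing.
n_devices_open <- length(dev.list())
dev.off()
```

Removing the regression line, for instance, would require calling plot() again and repeating every other step.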

traditional S-PLUS graphics: example(plot)

plot of chunk plotexample

traditional S-PLUS graphics: plot regions

Single plot regions

Multiple plot regions

traditional S-PLUS graphics: plot regions

op <- par(mfrow = c(2, 2),
          mar = c(3, 0, 0, 0))
plot(...); plot(...); plot(...); plot(...)
## At end of plotting, reset to previous settings:
par(op)
  • the documentation can be looked up with ?par
  • margins are measured in multiples of lines of text
  • modifying traditional graphics state settings via par() has a persistent effect
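
The persistence of par() settings can be checked directly. The sketch below is illustrative only: it records the margins before, during and after the change, using a null PDF device so that nothing is written to disk.

```r
pdf(NULL)                               # null device, no file is created

default_mar <- par("mar")               # margins before any change
op <- par(mar = c(3, 0, 0, 0))          # par() returns the *previous* values
plot(1:10)
changed_mar <- par("mar")               # the new margins survive the plot() call
par(op)                                 # restore the saved settings
restored_mar <- par("mar")

dev.off()
```

Because the change outlives the plotting call, saving the return value of par() and restoring it at the end is the usual safeguard.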

traditional S-PLUS graphics: plot regions

op <- par(mfrow = c(2, 2),
          mar = c(3, 0, 0, 0))
plot(...); plot(...); plot(...); plot(...)
## At end of plotting, reset to previous settings:
par(op)
  • mfrow and mfcol control the number of figure regions on a page

par mfrow

par mfcol
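
The difference between mfrow and mfcol is the fill order. As a small sketch, the helper second_cell() below (a hypothetical name, introduced only for this illustration) draws two empty plots in a 2 x 2 layout and reports the row and column of the second one via par("mfg").

```r
pdf(NULL)                   # null device, no file is created

# Hypothetical helper: returns the (row, column) cell of the second figure.
second_cell <- function(...) {
  op <- par(...)            # set mfrow or mfcol, remember the old layout
  on.exit(par(op))          # restore the old layout when done
  plot.new()                # first figure: row 1, column 1
  plot.new()                # second figure: position depends on fill order
  par("mfg")[1:2]           # cell of the figure currently being drawn
}

cell_mfrow <- second_cell(mfrow = c(2, 2))   # filled row by row
cell_mfcol <- second_cell(mfcol = c(2, 2))   # filled column by column

dev.off()
```

With mfrow the second plot lands to the right of the first; with mfcol it lands below it.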

traditional S-PLUS graphics: controlling plot regions

Graphics state settings controlling plot regions

  • diagram for controlling widths and horizontal locations
  • plot region = figure region - figure margins
  • plt: location of the plot region (l, r, b, t)
  • pin: size of the plot region, (width, height)
  • pty: m: use all available space, s: preserve square format
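
The relation "plot region = figure region - figure margins" can be verified from the graphics state. This is a sketch under the assumption of a single 7 x 7 inch figure; fin and mai are the standard par() settings for the figure size and the margins in inches.

```r
pdf(NULL, width = 7, height = 7)   # null device, no file is created

plot.new()
fin <- par("fin")   # figure region size in inches (width, height)
mai <- par("mai")   # figure margins in inches (bottom, left, top, right)
pin <- par("pin")   # plot region size in inches (width, height)

# plot region = figure region - figure margins
pin_expected <- fin - c(mai[2] + mai[4], mai[1] + mai[3])

# pty = "s" keeps the plot region square even with unequal margins
op <- par(pty = "s", mar = c(5, 4, 1, 8))
plot.new()
pin_square <- par("pin")
par(op)

dev.off()
```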

traditional S-PLUS graphics: colors and colours

colours()[1:4] # 657 color names
## [1] "white"         "aliceblue"     "antiquewhite"  "antiquewhite1"
col2rgb("transparent") # see the RGB values for a particular color name
##       [,1]
## red    255
## green  255
## blue   255
rgb(1, 0, 0) # Red-Green-Blue triplet of intensities, format #RRGGBB, FF = 255
## [1] "#FF0000"
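
A few further conversions, as a sketch using only base R. It also shows why "transparent" appears as white above: col2rgb() drops the alpha channel unless alpha = TRUE is requested.

```r
# Three specifications of pure red
red_name <- col2rgb("red")                        # name -> 3 x 1 matrix of 0-255 values
red_hex  <- rgb(255, 0, 0, maxColorValue = 255)   # 0-255 intensities -> "#RRGGBB"
red_unit <- rgb(1, 0, 0)                          # 0-1 intensities   -> "#RRGGBB"

# With alpha = TRUE a fourth row "alpha" is returned;
# for "transparent" it is 0, i.e. fully transparent.
transparent_rgba <- col2rgb("transparent", alpha = TRUE)

# A fourth argument to rgb() adds an alpha channel ("#RRGGBBAA")
red_opaque <- rgb(1, 0, 0, 1)
```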

traditional S-PLUS graphics: pch point symbols

traditional S-PLUS graphics

example(barplot)

plot of chunk barplot

example(boxplot)

plot of chunk boxplotex

traditional S-PLUS graphics

example(pairs)

plot of chunk pairplot

example(persp)

plot of chunk perspplot

traditional S-PLUS graphics: example(stars)

plot of chunk starsegment

plot of chunk starradar

traditional S-PLUS graphics: example(mosaicplot)

plot of chunk mosaic1

plot of chunk mosaic2

traditional S-PLUS graphics: conditioning plot, example(coplot)

plot of chunk coplot1

plot of chunk coplot2

traditional S-PLUS graphics: lm example

plot of chunk lmplot

traditional S-PLUS graphics: Agglomerative Nesting (Hierarchical Clustering)

plot of chunk agnesplot

traditional S-PLUS graphics

plot(hclust(d = dist(USArrests), method = "average"), main=title)

plot of chunk unnamed-chunk-2

lattice / grid graphics: pre and post drawing

plot of chunk oztemp

  • draw map of Australia
  • draw average monthly temperatures for six cities

lattice / grid graphics: embedding plots in grid viewports

plot of chunk viewport

  • create dendrogram object and cut it into four subtrees
  • define lattice panel function to draw the dendrograms
  • make base plot region correspond to the created viewport
  • use traditional plot() function to draw the dendrogram

Layered Grammar of Graphics

A statistical graphic is a mapping from data to

  • geometric objects (points, lines, bars)
  • with aesthetic attributes (colour, shape, size)
  • in a coordinate system (cartesian, polar, map projection)

and optionally entails

  • statistical transformations of the data (binning, counting)
  • faceting to generate the same graphic for different subsets of the dataset

ggplot2 attempts to produce any kind of statistical graphic using

  • a compact syntax and independent components to facilitate extensions
  • the grid package to exercise low-level control over the appearance of the plot

Wickham, H. (2009). ggplot2. doi:10.1007/978-0-387-98141-3
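
The components listed above map onto ggplot2 functions one by one. The following is a minimal sketch using the built-in mtcars data; the choice of variables is an assumption made for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +          # data and aesthetic mapping
  geom_point(aes(colour = factor(gear))) +          # geometric object with a colour aesthetic
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                          # statistical transformation
  facet_wrap(~ am) +                                 # same graphic for each subset
  coord_cartesian(xlim = c(1, 6)) +                  # coordinate system
  labs(colour = "gear")

# Nothing is drawn until the object is printed, e.g. with print(p).
```

Each component can be swapped independently, for example coord_polar() for coord_cartesian(), without touching the rest of the specification.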

Comparison plot() and qplot()

plot(x, y)

plot of chunk plot1

qplot(x, y)

plot of chunk qplot1

Comparison plot() and qplot()

plot(x, y, type = "l")

plot of chunk plot2

qplot(x, y, geom = "line")

plot of chunk qplot2

Comparison plot() and qplot()

plot(x, y, type = "s")

plot of chunk plot3

qplot(x, y, geom = "step")

plot of chunk qplot3

Comparison plot() and qplot()

plot(x, y, type = "b")

plot of chunk plot4

qplot(x, y, geom = c("point", "line"))

plot of chunk qplot4
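
qplot() has been deprecated since ggplot2 3.4.0, so the four comparisons above may be easier to maintain in ggplot() form. The sketch below assumes hypothetical data, since x and y are not defined on the slides.

```r
library(ggplot2)

# Hypothetical data: x and y are not defined on the slides.
set.seed(1)
d <- data.frame(x = 1:20, y = cumsum(rnorm(20)))

p_point <- ggplot(d, aes(x, y)) + geom_point()                # qplot(x, y)
p_line  <- ggplot(d, aes(x, y)) + geom_line()                 # geom = "line"
p_step  <- ggplot(d, aes(x, y)) + geom_step()                 # geom = "step"
p_both  <- ggplot(d, aes(x, y)) + geom_point() + geom_line()  # geom = c("point", "line")
```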

mtcars dataset

Data from the 1974 Motor Trend US magazine for 32 automobiles (1973-74 models). The variables are the following:

  • mpg Miles/(US) gallon
  • cyl Number of cylinders
  • disp Displacement (cu.in.)
  • hp Gross horsepower
  • drat Rear axle ratio
  • wt Weight (lb/1000)
  • qsec 1/4 mile time
  • vs Engine (0 = V-shaped, 1 = straight)
  • am Transmission (0 = automatic, 1 = manual)
  • gear Number of forward gears
  • carb Number of carburetors

Comparison plot() and qplot()

boxplot(wt~cyl,
        data=mtcars, col="lightgray")

plot of chunk boxplotcomp

qplot(factor(cyl), wt,
      data=mtcars, geom=c("boxplot", "jitter"))

plot of chunk boxqplotcomp

Comparison plot() and qplot()

hist(mtcars$wt)

plot of chunk histplotcomp

qplot(mtcars$wt, geom = "histogram",
      binwidth = 0.5, color = factor(0))

plot of chunk histqplotcomp

Comparison plot() and qplot()

cdplot(mtcars$wt, factor(mtcars$cyl))

plot of chunk cdplotcomp

qplot(mtcars$wt, fill=factor(mtcars$cyl),
      geom="density")

plot of chunk cdqplotcomp

diamonds dataset

A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:

  • price price in US dollars ($326-$18,823)
  • carat weight of the diamond (0.2-5.01)
  • cut quality of the cut (Fair, Good, Very Good, Premium, Ideal)
  • color diamond colour, from J (worst) to D (best)
  • clarity a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
  • x length in mm (0-10.74)
  • y width in mm (0-58.9)
  • z depth in mm (0-31.8)
  • depth total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43-79)
  • table width of top of diamond relative to widest point (43-95)
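
As a worked check of the depth definition, the percentage can be recomputed from x, y and z. This is a sketch: dropping rows with a zero dimension is an assumption made here to avoid meaningless ratios, and the result is scaled by 100 because depth is stored as a percentage.

```r
library(ggplot2)   # provides the diamonds dataset

# Drop rows with a zero dimension, which would give a meaningless ratio.
d <- subset(diamonds, x > 0 & y > 0 & z > 0)

# depth = 2 * z / (x + y), expressed as a percentage
depth_calc <- 100 * 2 * d$z / (d$x + d$y)

# Typical absolute gap between the recomputed and the recorded value
median_gap <- median(abs(depth_calc - d$depth))
```

A small median gap suggests the recorded column follows the formula up to rounding; individual rows may still deviate.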

ggplot2 qplot(): single line of code

qplot(x = carat, y = price, colour = clarity,
  data = diamonds, geom = "point")

plot of chunk qplotex1

qplot(carat, price,
  data = diamonds, geom = c("point", "smooth"))

plot of chunk qplotex2

ggplot2 ggplot(): add layers for more control using "+"

ggplot(data=diamonds) +
    geom_point(aes(x=price, y=carat, colour=color)) +
    facet_grid(. ~ clarity)

plot of chunk ggplotex

ggplot2 ggplot(): add layers for more control using "+"

ggplot(data = df,aes(x = x,y = y)) + # source: https://plot.ly/ggplot2/geom_errorbar/
    geom_errorbar(aes(ymin = ymin,ymax = ymax), colour = 'steelblue', width = 0.2) +
    geom_errorbarh(aes(xmin = xmin,xmax = xmax), colour = 'steelblue', height = 0.4) +
    geom_point(color = "black", size = 3)

plot of chunk ggplotlayer
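
The data frame df is not defined on the slide. A minimal sketch with hypothetical values shows the columns the three layers expect; recent ggplot2 releases may print a deprecation message for geom_errorbarh().

```r
library(ggplot2)

# Hypothetical data: df is not defined on the slide.
df <- data.frame(x = c(1, 2, 3), y = c(2.0, 3.5, 3.0))
df$ymin <- df$y - 0.5; df$ymax <- df$y + 0.5   # vertical uncertainty
df$xmin <- df$x - 0.2; df$xmax <- df$x + 0.2   # horizontal uncertainty

# Each "+" adds one layer; layers are drawn in the order they are added.
p <- ggplot(df, aes(x = x, y = y)) +
  geom_errorbar(aes(ymin = ymin, ymax = ymax), colour = "steelblue", width = 0.2) +
  geom_errorbarh(aes(xmin = xmin, xmax = xmax), colour = "steelblue", height = 0.4) +
  geom_point(colour = "black", size = 3)
```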

ggplot Themes

theme_stata()

plot of chunk ggthemesstata

theme_economist()

plot of chunk ggthemeseconomist

ggplot Themes

theme_fivethirtyeight()

plot of chunk ggthemesfivethirtyeight

Tableau theme_igray()

plot of chunk ggthemesigray

ggplot Themes: ggthemr

pal.crN <- c('#95B3D7','#F79646','#8064A2','#4BACC6','#9BBB59','#C0504D')
ugly <- define_palette(
  swatch = pal.crN, gradient = c(lower = pal.crN[1L], upper = pal.crN[2L]))
ggthemr(ugly)
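# dsamp is not defined on the slide (presumably a sample of diamonds);
# geom_bar() no longer accepts binwidth, so geom_histogram() is used instead
ggplot(dsamp, aes(x = price, fill = cut)) + geom_histogram(binwidth = 500)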

plot of chunk ggthemr

ggthemr_reset()

gridSVG: gapminder

gapminder animation

  • gridSVG animates a ggplot object before the output is flattened for export to a graphics device
  • the animation is obtained from mapping to the time dimension (annual since 1950)
  • size = population
  • color = continent
    • blue: Europe
    • red: Asia
    • green: Africa
    • yellow: America

gridSVG: R&D Expenditures

rnd animation

ggvis: /demo/dynamic.r

dynamic stacked bars

ggvis stacked bars

moving data points

ggvis moving points

rCharts: nvd3 Sparklines

p2 <- nPlot(uempmed ~ date, data = economics, type = 'sparklinePlus')
p2$chart(xTickFormat="#!function(d) {return d3.time.format('%b %Y')(new Date( d * 86400000 ));}!#")
p2$print('chart2')

rCharts: highcharts

p3 <- hPlot(Pulse ~ Height, data = MASS::survey, type = "bubble", title = "Zoom demo",
            subtitle = "bubble chart", size = "Age", group = "Exer")
p3$chart(zoomType = "xy")
p3$exporting(enabled = TRUE)
p3$print('chart3')

Reporting with knitr

  • use rmarkdown syntax
  • generate charts from data, e.g. using ggplot
  • include key figures in narrative, e.g. descriptive statistics
  • convert to various output formats (Word, PDF, HTML)
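
A minimal sketch of the workflow, assuming only that knitr is installed. It writes a small R Markdown file with inline key figures to a temporary directory and knits it to Markdown. The file name report.Rmd and the report text are hypothetical.

```r
library(knitr)

fence <- strrep("`", 3)   # three backticks: the R Markdown chunk delimiter
rmd <- c(
  "# Fuel economy",
  "",
  "The data cover `r nrow(mtcars)` cars with a mean of",
  "`r round(mean(mtcars$mpg), 1)` miles per gallon.",
  "",
  paste0(fence, "{r summary, echo = FALSE}"),
  "summary(mtcars$mpg)",
  fence
)

# Work in a temporary directory so no files are left in the project.
work_dir <- tempfile("knitr-demo-")
dir.create(work_dir)
old_wd <- setwd(work_dir)
writeLines(rmd, "report.Rmd")
md_file <- knit("report.Rmd", output = "report.md", quiet = TRUE)  # Rmd -> md
md_text <- readLines(md_file)
setwd(old_wd)
```

The inline expressions are replaced by their values in md_text. Conversion to Word, PDF or HTML is typically a separate step, for example with rmarkdown::render(), which requires pandoc.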

knitr logo

Reporting with knitr

knitr output

Reporting with knitr

# A Minimal Example for Markdown
This is a minimal example of using **knitr** to produce an _HTML_ page from _Markdown_.
## R code chunks
Now we write some code chunks in this markdown file:
```{r computing}
x <- 1+1 # a simple calculator
set.seed(123)
rnorm(5)  # boring random numbers
```
We can also produce plots:
```{r graphics}
par(mar = c(4, 4, .1, .1))
with(mtcars, {
  plot(mpg~hp, pch=20, col='darkgray')
  lines(lowess(hp, mpg))
})
```

Report Templates

brew: generate input files for knitr

apply function to country vector

library(brew)   # brew(): fill in the template
library(knitr)  # knit(): run the R chunks
# path, path.Rmd and path.rmd are assumed to be defined earlier in the session
create.report <- function(x, prepend = "report_icio_tiva_") {
  Rmd.file <- file.path(path.Rmd, paste0(prepend, x, ".Rmd"))
  rmd.file <- file.path(path.rmd, paste0(prepend, x, ".rmd")) # .md doesn't convert hash tags
  brew(file = file.path(path, "report_icio_tiva.brew"), output = Rmd.file)
  knit(input = Rmd.file, output = rmd.file)
  out.file <- paste0(prepend, x, ".rmd")
  return(out.file)
}
coulist <- c("AUT", "DEU", "ESP", "IRL", "USA")
results <- sapply(as.character(coulist), create.report)

brew template example

```{r preamble, echo = FALSE}
cou <<- '<%= x %>'
country <- as.character(namereg$country[match(cou, namereg$cou)])
natnlty <- as.character(namereg$coupron[match(cou, namereg$cou)])
customtext <- cntext[,colnames(cntext)==cou]
```
# Trade in Value-added: `r country`
## EXGRDVA\_EX {#exgrdvaex}
### Domestic value added content of gross exports, `r year`, %
```{r fig1, fig.path="figures/report_icio_tiva/<%= x %>/", fig.height=5, fig.width=10, echo=FALSE, message=FALSE}
    source(file.path(path, "code", "figure1.R"))
```
`r country` domestic value-added content of its exports is, at `r .perc1`%, `r .rel1`
the OECD average in `r year`.
`r if(!is.na(customtext[1])) customtext[1]`
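
The substitution step can be tried in isolation. The sketch below uses a hypothetical one-line template and the helper fill_template() (a name introduced here for illustration): brew() replaces <%= x %> with the value of x found in the calling environment and writes the result to a file that knitr could then process.

```r
library(brew)

# Hypothetical one-line template; <%= x %> is replaced by the value of x.
template <- "# Trade in Value-added: <%= x %>\n"

fill_template <- function(x) {
  out_file <- tempfile(fileext = ".Rmd")
  # envir defaults to the calling frame, so brew() sees this function's x
  brew(text = template, output = out_file)
  readLines(out_file)
}

headings <- vapply(c("AUT", "DEU"), fill_template, character(1))
```

Applying the function over a country vector yields one filled-in document per country, which is the pattern create.report() follows.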

Reporting Tools and Platforms
