Data Analysis, Simulation, and Programming in R

For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt…. All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data. [Tukey, J., 1962. The future of data analysis. The Annals of Mathematical Statistics, 33(1), 1–67.]

This course provides training in R, an open-source statistical programming environment, for two purposes: 1) carrying out real-world, reproducible data analysis and 2) designing and implementing statistical simulations, which are an important tool for evaluating the performance of statistical estimation and inference procedures. Topics covered include:

  • the logic of R’s primary data structures and how to work with functions
  • tools and best practices for accessing, cleaning, and manipulating data
  • reproducibility as a fundamental tenet of high-quality data analysis
  • data visualization techniques
  • selected statistical models and methods that are useful for data analysis, including linear regression models and generalized linear models

Content relevant to designing and implementing Monte Carlo simulation studies is interwoven throughout the course.
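
To give a flavor of what a Monte Carlo simulation study looks like, here is a minimal sketch in base R. It is an illustration only, not an excerpt from the course materials: the scenario (coverage of a 95% t-interval for a normal mean), the function name one_rep, and all parameter values are assumptions chosen for the example.

```r
# Illustrative sketch only: estimate how often a nominal 95% t-interval
# covers the true mean when the data really are normal.
set.seed(20240101)  # arbitrary seed so the run is reproducible

# One replication: simulate a sample, build the interval, check coverage.
# one_rep is a hypothetical helper name, not part of any package.
one_rep <- function(n, mu, sigma) {
  x <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= mu && mu <= ci[2]
}

n_reps <- 2000  # number of simulated datasets (assumed value)
covered <- replicate(n_reps, one_rep(n = 30, mu = 1, sigma = 2))

coverage <- mean(covered)                         # estimated coverage rate
mcse <- sqrt(coverage * (1 - coverage) / n_reps)  # Monte Carlo standard error

cat("Estimated coverage:", coverage, " MCSE:", round(mcse, 4), "\n")
```

Because the model assumptions hold in this setup, the estimated coverage should land close to the nominal 0.95; changing the data-generating step (for example, to a skewed distribution) is the usual way to probe how a procedure performs when its assumptions fail.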