R Independent Study (Archive)



This introductory R course (formerly known as Stat 390EF) is now open, free of charge, to anyone who wants to learn R as a non-credit course. Its exercises mark your answers right or wrong and provide explanations.
No need to register or pay, and no certificate of any sort is granted upon completion.
Learning R is its own reward and we wanted to make the course available to anyone in the world who wants to learn!

Syllabus


This course was designed for Stat 200 students in sections L1, L2 or ONL (before Fall 2018) who were interested in learning R. Students who had a solid background in computer programming and had taken STAT 200 or STAT 212 were encouraged to consider taking STAT 385 instead.

This was a 2-credit online independent study. Students were expected to learn the material on their own. There were no class meetings or face-to-face discussion groups. Students were expected to spend 6 to 8 hours a week on the course. Not everyone learns well in this setting, so this was not a course for everyone. All the required course work was posted on the lesson page. The grade was determined by the weekly Lon Capa assignments, due every Sunday at 11:59 pm.

Credit Hours: This was a 2-credit-hour course.

Prerequisite: Students were expected to have taken a basic one-semester statistics course such as Stat 100, and to be taking concurrently, or to have already taken, Stat 200 in section L1, L2 or ONL.


Course Staff

Instructor: Ellen Fireman

Course Developer: Yuk Tung Liu

Course Assistants: David Collier and Karle Flanagan

TAs/Graders


Course Goal and Philosophy

Upon completion of this course, students were expected to be able to use R to perform the various statistical analyses they had learned in Stat 100 and Stat 200.

R is more than a set of separate little calculator-like commands. It is a full programming language with an internal logic. As with any language, acquiring fluency requires real practice. Our exercises are mostly not cut-and-paste phrases. Instead we build toward real-world use.


Materials

Fortunately there are already many high-quality free resources available for R learners and users. We structure the course around those materials. We use mainly three resources in this course.

Textbook: R Programming for Data Science by Roger D. Peng.
This is an ebook. The suggested price is $20, but you can get it free if you want. More information on the book and how to purchase it can be found in Week 1's notes.

swirl: A software package written in R. It provides interactive lessons for beginners to learn R. Instructions for installing swirl are given in Week 1's notes.

Weekly Reading Assignment: There is an HTML reading assignment every week. The links to the notes are posted on our lesson page. In these notes, we demonstrate how R can be used to tackle problems encountered in Stat 100 and Stat 200.

Lon Capa Assignments: We integrated these materials into a set of lesson plans with weekly Lon Capa homework assignments, most of which used individually randomized data sets and were graded automatically. Most of these problems are not very hard, but since each student had a slightly different data set, copying answers would not work! Starting in Week 7, students were given problems that required them to write code and explain the process of analyzing the data. These problems were hand-graded by the TAs.

There were two types of assignments: a weekly regular homework assignment and a weekly quiz. The purpose of the regular homework assignments was to give students practice with the R commands and with using R to solve problems in statistics. Students got instant feedback on whether or not their answers were correct. The quizzes were designed to test students' knowledge of R commands and their skill in using R to solve problems. Unlike the regular homework assignments, the quizzes gave no feedback on submitted answers until after the due dates, but students could change their answers as many times as they liked before the due dates. Students did not get any help from the TAs on the quizzes, except for clarification of questions.

Late assignments were NOT accepted on Lon-Capa. However, Lon-Capa graded each problem in the assignment separately, so if a student did 70% of the homework correctly before the due date, he/she got credit for that 70%. The lowest regular homework score and lowest quiz score from Weeks 2-12 were dropped at the end of the semester (all assignments in Weeks 13 and 14 were counted).

All the Lon-Capa assignments are now open. They can be accessed on the Lon Capa Exercises page.


Grading

The grades were based on the weekly Lon Capa assignments due every Sunday at 11:59 pm. The regular homework assignments counted for 90% of the total grade and the quizzes for 10%. There were no exams or projects, so everyone could do well through hard work alone.

The overall grade was translated into a letter grade as follows:

A+  97-100      A  93-96.99     A-  90-92.99
B+  87-89.99    B  83-86.99     B-  80-82.99
C+  77-79.99    C  73-76.99     C-  70-72.99
D+  67-69.99    D  63-66.99     D-  60-62.99
F   below 60
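
The cutoffs in the table above can be looked up in R as a small exercise. This is only a sketch: the function name letter_grade is hypothetical, and it assumes that a score falling between two listed ranges (for example 96.995) takes the lower grade, since the syllabus does not say how scores were rounded.

```r
# Lower cutoff of each passing grade, copied from the table above.
cutoffs <- c(60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97)
grades  <- c("F", "D-", "D", "D+", "C-", "C", "C+",
             "B-", "B", "B+", "A-", "A", "A+")

# Hypothetical helper: findInterval() counts how many cutoffs are <= pct,
# so 0 means "below 60" and maps to the first label, "F".
letter_grade <- function(pct) {
  grades[findInterval(pct, cutoffs) + 1]
}

letter_grade(c(59.9, 83.67, 92.99, 97))
# "F"  "B"  "A-" "A+"
```

Because findInterval() is vectorized, the same call works for a single score or for a whole vector of scores.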

Bonus Points: There were several bonus assignments. The first was the syllabus quiz, given in the first week and due by the end of the second week. The last was a survey on this course, which was given at the end of the semester. Four additional sets of bonus R problems were posted in the middle of the semester. All these bonus problems are now open and are mixed with the regular HW assignments on the Lon Capa Exercises page. Bonus points could only help students. Students could still get 100% without doing any bonus work. Bonus points were figured into the grade as follows:

Course Total = [(Percentage on Required Work) + 0.25×(Percentage on Bonus Points)] / [100 + 0.25×(Percentage on Bonus Points)], expressed as a percentage

Suppose at the end of the semester a student had an 80% average on the required work and earned 90% on the bonus work. The course total would be (80 + 0.25×90)/(100 + 0.25×90) = 102.5/122.5 = 83.67%. So the grade would be raised from a B- to a B.
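
The bonus formula can also be checked directly in R. This is a minimal sketch, not the code used for the actual grade book: the function name course_total and its argument names are hypothetical, and both arguments are assumed to be percentages between 0 and 100.

```r
# Hypothetical helper implementing the bonus formula stated above.
# required_pct: percentage earned on the required work (0 to 100)
# bonus_pct:    percentage earned on the bonus work (0 to 100)
course_total <- function(required_pct, bonus_pct = 0) {
  100 * (required_pct + 0.25 * bonus_pct) / (100 + 0.25 * bonus_pct)
}

round(course_total(80, 90), 2)  # 83.67, the worked example above
course_total(80)                # 80: skipping the bonus work changes nothing
course_total(100)               # 100: full marks are possible without bonus work
```

Since the same bonus term is added to both the numerator and the denominator, the total can never fall below the required-work percentage, which is why bonus points could only help.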



Course Schedule

Lessons were posted here for the entire semester.

Week Topics
1 Introduction, installation of R
2 Data types, missing values, vectorized operations
3 Loading data files, subsetting, statistical functions
4 Control functions and logical operations
5 Simple data manipulations
6 Writing functions, plotting
7 R Markdown, simple linear regression
8 Loop functions, regression with factor variables
9 Simulations, multivariable regressions
10 Date and time in R, Introduction to Monte Carlo simulations
11 Statistical Tests, optional: regular expressions
12 Transformation of variables
13* Logistic regression
14 Nonparametric statistics

* Note that Week 13 is 1.5 weeks long.