SCI 2000 - Winter 2021

General Information

This is the course website for SCI 2000: Introduction to Data Science. The course introduces students to the tools used to analyse data and gives them hands-on experience applying those tools. By the end of the course, students will:

  • Become proficient in R, to the level that they can analyse data using the tools from this class.
  • Be able to describe and analyse data through visualization and simple statistical procedures.
  • Be introduced to statistical thinking and be able to think critically about variation and biases.

Course Details

The course outline can be downloaded here.

Prerequisites

Instructor approval.

Textbook

There is no textbook for this course. Notes will be provided to students through UM Learn, along with additional resources.

Assessments

The assessments for this course include:

  • Four (4) assignments.
  • Three (3) data analysis summaries.
  • One (1) final project.

Outline of Topics

The course is expected to cover the following topics:

  • Data visualization
  • Data wrangling
  • Relational data
  • Web scraping
  • Introduction to regular expressions
  • (If time permits) Automation and version control

Throughout the course, the applied topics above will be complemented with an introduction to statistical thinking: how to think about variability, what biases can occur in the data, and how to perform simple statistical procedures (e.g. comparing means, comparing proportions, and fitting a linear regression).

Statistical Software

The course requires you to make extensive use of the R statistical software for your assignments and final project. Sample code will be provided to students.

You can download R for free (for Windows, Mac, Linux, and Solaris) from the Comprehensive R Archive Network at: https://cran.r-project.org/

For additional resources on R, see here.