From presidential campaigns to medical trials to Netflix recommendations, data analysis is at the heart of the 21st-century world. Fundamentals of Data Science is a six-week summer course that will teach you how to collect, analyze, and visualize data. Students will conduct an independent research project on any topic of their choice, culminating in a final analysis that can be used as a writing sample for college, internship, and job applications.
Students will learn:
First, we'll set up and get familiar with the tools you'll use throughout this course: a programming language called R, and RStudio, a free application that makes working with R easier.
What is data? What are the different types of data? How do we store and interact with data?
Here, we dive deeper into data manipulation, learning how to work with data to answer the questions that interest us.
We enter the world of data visualization in R. How can we use data visualization to communicate ideas and analyses? What makes a successful graphic?
How do we learn from data? We improve our data manipulation and visualization skills by calculating statistics and visualizing trends.
So far, we have worked exclusively with a single dataset in a standard spreadsheet format. How do we work with more complex data, like geographic data for maps? How can we combine information from multiple data sources?
We have already covered an astounding amount of material. Here, we take a step back and review what we have learned so far. We also cover a few advanced plotting techniques.
Many interesting research questions involve text: song lyrics, books, campaign speeches, tweets, and more. Here, we'll learn how to work with text data in more detail.
A great deal of scientific research explores the relationships among multiple variables. Does increased school funding improve student outcomes? Does smoking cause lung cancer? These are fundamentally questions of causal inference, which we will discuss here.
The purpose of data science is to gain knowledge and insights from data. The ultimate goal of this course is to prepare you to use the tools of data science to conduct an independent analysis of your own. This is your opportunity to apply everything you've learned in the class to a topic that you are passionate about.
The final project instructions are found here.