Real health data is complex and often unstructured; it can be inaccurate or inconsistent, frequently contains missing values, and is organized for clinical care rather than for analytic needs. Learning from health data requires a solid grasp of data operations, data visualization, statistics, and machine learning, as well as an understanding of the ethical and legal frameworks that govern health data privacy and security. Students in this course will learn foundational topics in data science with a focus on health data and will apply this knowledge to real health datasets through hands-on labs integrated into lectures. The course is organized around two broad themes: (a) understanding health data and (b) making inferences from data. Students will develop a systematic working understanding of R, one of the most widely used languages for data science, and an introductory understanding of several packages useful for analyzing health data. They will also participate in a group project focused on answering a health-related question. After completing this course, students should be able to securely store a health dataset, summarize its structure, merge tables, visualize relationships, reshape and subset it to meet analytic needs, handle missing values, apply statistical and machine learning methods to build prediction models, and evaluate the performance of those models.

Program Year
Course Number

LHS 610

Course Name

Exploratory Data Analysis for Health

Practice Area / Category