Project

Your main graded task this semester is a data analysis project.

You will select a dataset. On this dataset you will perform at least two of the following four types of analysis:

  1. Association Rules
  2. Clustering
  3. Dimensionality Reduction
  4. Classification

You will compile your results into a written report and a short in-class presentation.

Grading methodology

Your report and presentation will be evaluated on four distinct tasks, listed below and equally weighted. Each of the four tasks is scored as follows:

Acceptable is awarded for a completed report, with no more than 4 minor flaws. Any submission that misses the course deadline will be awarded this grade.

Good is awarded for a completed report without any flaws.

Excellent is awarded for a completed report without any flaws that also contains a description and evaluation of options for alternative analyses, and what characterizes the distinction between them.

Minor flaws include:

  • Significant language or readability issues, making it hard to understand the text
  • Arithmetic or coding errors

Major flaws that result in a failing grade for the paper, and thereby for the course, include:

  • Missing a deadline
  • Not covering some of the four requested topics at all
  • Readability issues to the point of our not being able to understand the text

Task

Full description of the dataset you have chosen, including a basic exploratory data analysis.

Good

Description and visualizations of the provenance of the data set (i.e. who compiled it, when, where, why) and its features (how many variables, what types, what values they take, how they relate to each other).

Excellent

Suggestions for what the data set could be used for beyond the plans of the original creators.
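
As an illustration of the feature description asked for in this task (how many variables, what types, what values they take), here is a minimal sketch using only the Python standard library. The toy data, the column names and the summarize helper are hypothetical stand-ins and not part of the assignment. It assumes your dataset can be read as CSV, and that a column whose non-missing values all parse as numbers can be treated as numeric.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical toy data standing in for your chosen dataset.
SAMPLE_CSV = """species,length_cm,weight_g
cat,46.0,4100
dog,61.5,9800
cat,44.2,3900
dog,,10200
"""


def summarize(rows):
    """Return a per-column summary: inferred type, missing count, value overview."""
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]  # empty string = missing
        missing = len(values) - len(present)
        try:
            # Treat the column as numeric if every present value parses.
            numbers = [float(v) for v in present]
            summary[column] = {
                "type": "numeric",
                "missing": missing,
                "min": min(numbers),
                "max": max(numbers),
                "mean": mean(numbers),
            }
        except ValueError:
            # Otherwise report how often each distinct value occurs.
            summary[column] = {
                "type": "categorical",
                "missing": missing,
                "counts": dict(Counter(present)),
            }
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = summarize(rows)
print(f"{len(rows)} rows, {len(report)} variables")
for name, info in report.items():
    print(name, info)
```

A summary like this covers only the "how many, what types, what values" part of the description. Provenance and the relationships between variables still need their own text and visualizations.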

Task

Full description of your first chosen analysis method, including analysis results and assumptions or conditions for its applicability.

Good

A comprehensive and visualized report on the structure in the data as well as descriptions of your findings. Explicit listing of necessary assumptions and conditions, as well as checks where applicable.

Excellent

Interpretations of your findings, and method comparisons between your chosen methods and other options in the same class.

Task

Full description of your second chosen analysis method, including analysis results and assumptions or conditions for its applicability. Explicit listing of necessary assumptions and conditions, as well as checks where applicable.

Good

A comprehensive and visualized report on the structure in the data as well as descriptions of your findings.

Excellent

Interpretations of your findings, and method comparisons between your chosen methods and other options in the same class.

Task

You will give a 10-minute presentation of your findings in class.

Good

A well-illustrated presentation that keeps to the time limit and describes the key findings in your report.

Excellent

A presentation that also explains how you selected your methods and which options you were choosing from.