14  Assessment 4

A project showing power simulation

14.1 General Instructions

Your task is to write code that replicates this html report, using the same datasets as assessment 3. The start of the assessment is identical to assessment 3; the new parts are custom function creation, iteration, analysis, and power simulation.

As for assessments 2 and 3, the submission must contain the following:

As for assessment 3, your code should demonstrate the following skills:

Specific to assessment 4, you must demonstrate the following skills:

14.2 Specific Instructions

14.2.1 Table 2

Add standard deviations to Table 2.
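
As a rough illustration of this step, a mean and standard deviation per group can be computed in a single `summarise()` call. This is only a sketch: the `ratings` table, its column names (`region`, `face_gender`, `rating`), and its values are invented for the example and are not the assessment data.

```r
library(dplyr)

# Hypothetical toy data; the real assessment 3 data will look different
ratings <- tibble(
  region      = rep(c("Region A", "Region B"), each = 4),
  face_gender = rep(c("female", "male"), times = 4),
  rating      = c(3.1, 2.8, 3.5, 2.6, 4.0, 4.2, 3.8, 4.4)
)

# One row per region and face gender, with both mean and SD
table2 <- ratings |>
  group_by(region, face_gender) |>
  summarise(
    mean = mean(rating),
    sd   = sd(rating),
    .groups = "drop"
  )

table2
```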

14.2.2 Simulation

Create a custom function that takes the means and standard deviations for male and female faces for a specified region and creates a new dataset of 60 male and 60 female faces, with simulated average ratings drawn from those distributions.
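
One possible shape for such a function is sketched below, assuming the ratings can be treated as normally distributed. The function name `sim_faces`, its argument names, and the example means and standard deviations are all hypothetical; in your report the values would come from Table 2 for the chosen region.

```r
# Hypothetical sketch: simulate n male and n female face ratings
# from normal distributions with the supplied means and SDs
sim_faces <- function(m_mean, m_sd, f_mean, f_sd, n = 60) {
  data.frame(
    face_gender = rep(c("male", "female"), each = n),
    rating = c(
      rnorm(n, mean = m_mean, sd = m_sd),  # male faces
      rnorm(n, mean = f_mean, sd = f_sd)   # female faces
    )
  )
}

set.seed(8675309)  # makes the simulated values reproducible

# Made-up descriptive statistics, for illustration only
sim_dat <- sim_faces(m_mean = 3.2, m_sd = 0.8, f_mean = 3.6, f_sd = 0.7)

head(sim_dat)
```

Setting a seed before simulating means the same "random" dataset is produced every time the report is rendered.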

14.2.3 Analysis

Run a t-test on the simulated data to see if there is a gender difference in the characteristic. Use a two-tailed test with an alpha value of 0.05. Report the outcome of the test using inline code.
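
A minimal sketch of this step is shown below, using a toy simulated dataset with invented means and standard deviations. Note that `t.test()` has no alpha argument; the alpha of 0.05 corresponds to `conf.level = 0.95`, and the two-tailed test is `alternative = "two.sided"` (both are the defaults, written out here for clarity). By default, `t.test()` runs a Welch test, which does not assume equal variances.

```r
set.seed(1)

# Toy simulated data with made-up parameters (not the assessment values)
sim_dat <- data.frame(
  face_gender = rep(c("male", "female"), each = 60),
  rating = c(rnorm(60, 3.2, 0.8), rnorm(60, 3.6, 0.7))
)

# Two-tailed Welch t-test comparing ratings by face gender
tt <- t.test(rating ~ face_gender, data = sim_dat,
             alternative = "two.sided", conf.level = 0.95)

tt$statistic  # t-value
tt$parameter  # degrees of freedom
tt$p.value    # p-value

# In the text of the .qmd file, these values can be reported inline, e.g.:
# t(`r round(tt$parameter, 1)`) = `r round(tt$statistic, 2)`, p = `r round(tt$p.value, 3)`
```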

14.2.4 Iteration and Power

Create a new custom function that simulates the new dataset using your previous function. Add the t-test to this new custom function and return only the p-value.

Run this custom function 100 times for each region and report what percent of the runs give a significant p-value with a critical alpha of 0.05/11 (i.e., the Bonferroni-corrected alpha for running tests on all 11 regions).
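
The iteration and power calculation might be structured roughly as follows. This is a sketch in base R under stated assumptions: `sim_faces`, `sim_p`, the two-row `region_stats` table, and all of its values are hypothetical stand-ins, and the same iteration could equally be written with the purrr functions covered in the course.

```r
# Hypothetical simulation function (see 14.2.2)
sim_faces <- function(m_mean, m_sd, f_mean, f_sd, n = 60) {
  data.frame(
    face_gender = rep(c("male", "female"), each = n),
    rating = c(rnorm(n, m_mean, m_sd), rnorm(n, f_mean, f_sd))
  )
}

# Simulate one dataset, run the t-test, and return only the p-value
sim_p <- function(m_mean, m_sd, f_mean, f_sd, n = 60) {
  dat <- sim_faces(m_mean, m_sd, f_mean, f_sd, n)
  t.test(rating ~ face_gender, data = dat)$p.value
}

# Made-up descriptive statistics for two regions (the real table has 11)
region_stats <- data.frame(
  region = c("Region A", "Region B"),
  m_mean = c(3.2, 3.0), m_sd = c(0.8, 0.9),
  f_mean = c(3.6, 3.1), f_sd = c(0.7, 0.9)
)

alpha <- 0.05 / 11  # Bonferroni-corrected critical alpha

set.seed(2)

# For each region: 100 simulated p-values, then the percent below alpha
region_stats$power_pct <- mapply(
  function(m_mean, m_sd, f_mean, f_sd) {
    p_values <- replicate(100, sim_p(m_mean, m_sd, f_mean, f_sd))
    mean(p_values < alpha) * 100
  },
  region_stats$m_mean, region_stats$m_sd,
  region_stats$f_mean, region_stats$f_sd
)

region_stats[, c("region", "power_pct")]
```

With only 100 runs per region, the power estimates are fairly coarse and will change noticeably with a different seed.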

14.2.5 Reflection

Reflect on the areas of this report where you used techniques that improve reproducibility and reusability. Include a list of 5 things you did to ensure reproducibility and explain what aspect of reproducibility each one improves. Each explanation should be about 100-200 words.

14.3 Hints and Tips

  • The html theme used by the demo report is flatly
  • The ggplot theme is theme_bw()
  • You will not be asked to do anything ‘tricky’ in this assessment, so if you find yourself needing 20 lines of code just to customise the labels in a plot, try looking for more efficient alternatives.
  • You will definitely need to join data, and convert between wide and long. Make sure you do this efficiently, creating the minimum number of extra data tables needed, and not re-creating the same table for different tasks.
  • If you find yourself doing the same thing to many columns, you could probably do it more efficiently by reshaping the data longer first
  • Use pipes to avoid creating many single-use tables.
  • It is fine if your plot fonts are a slightly different size or there are small differences in visual presentation
  • Set argument defaults in custom functions if there is a sensible default value, and leave them without defaults if a user will always have to set that value to use the function.
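
The last tip above, on argument defaults, can be illustrated with a small sketch. The helper `pct_significant` is hypothetical and not part of the demo report: the p-values must always be supplied, so that argument has no default, while `alpha` has the conventional default of 0.05 that a user can override.

```r
# p_values: no default, because the user must always supply it
# alpha: sensible default that can be overridden when needed
pct_significant <- function(p_values, alpha = 0.05) {
  mean(p_values < alpha) * 100
}

p <- c(0.001, 0.2, 0.04)  # made-up p-values

default_pct   <- pct_significant(p)                   # uses alpha = 0.05
corrected_pct <- pct_significant(p, alpha = 0.05/11)  # overrides the default
```

Here two of the three p-values fall below 0.05, but only one falls below the corrected alpha of 0.05/11.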

14.4 Submission

  • Covers: chapters 1-10, emphasising 8-9
  • Worth: 40%
  • Do not put your name in your report; use your student ID as the author.
  • Please submit a zip file containing:
    1. the .rproj file
    2. your reproducible script, named report4_studentID.qmd
    3. any additional files necessary to reproduce your report (e.g., images or bibliography files)
    4. the rendered html report, named report4_studentID.html.

14.5 Marking Rubric

ILO A: Excellent B: Very Good C: Good D: Satisfactory E: Poor
Research & Knowledge
Skills from Chapters 1-3: You demonstrate skills to create reproducible reports and visualise data
Skills from Chapters 4-7: You demonstrate skills to wrangle data
Custom Functions: You create custom functions that correctly implement similar code with small differences.
Iteration: You can repeat the same code efficiently over a data structure
Data Simulation: You can simulate a simple dataset from descriptive statistics
Evaluation
Reflection: Your reflections note valid areas where reproducibility is important, and correctly identify and explain techniques you used to ensure reproducibility
Communication
Code Clarity: Your code is organised cleanly and clearly in the quarto script, using separate code chunks to intersperse text and relevant code. Your code chunks contain comments that clarify the purpose of the code without over-explaining each step. The names you use for objects are clear, consistent, and concise.
Code Efficiency: While there are many ways to do the same things in R, some ways are more efficient than others. Efficient code avoids unnecessary code (e.g., does not load packages you do not use) and redundancy (e.g., does not load or process the data the same way in several places).