Session 13: Project work I
After five weeks of statistics training (already!), we would like to switch gears a bit. Instead of introducing new topics, we ask you to apply what you have learned so far by conducting analyses on data you are familiar with.
We are aware that many of you have already worked with real Marcus data for some time and have likely implemented statistical methods that go beyond what we have covered so far in the training. This week’s assignment is therefore not necessarily intended to help you make direct progress on your project work. Rather, it should be viewed as an opportunity to practice the specific methods we have introduced, with a focus on simulation, bootstrapping, and permutation. We believe that these methods not only provide a general statistical framework that supports robust workflows and reliable results, but also help strengthen statistical reasoning. In turn, this can deepen your analytic understanding and help you identify appropriate methods for future data analysis challenges.
We advise you to work with your own project data as much as possible. However, if some of the exercises described below cannot be carried out using your data, for example because your dataset contains too few variables, lacks suitable variation, or does not include the types of variables required for a particular method, please feel free to use datasets available in R packages. Examples of such datasets include `sleep` (effects of sleep medication) and `PlantGrowth` (group comparisons).
Use Quarto to organize your analyses, combining code, results, and reflections in a single reproducible document. Please prepare a clear final report with figures and key conclusions, and be ready to present your work next Wednesday.
- For a numeric variable in your data, calculate its mean and standard deviation, and then use simulation to estimate a 95% confidence interval for the mean.
- Investigate how the confidence interval changes if your sample size is doubled or quadrupled. To help visualize the effect of sample size on uncertainty, plot the confidence intervals for the three sample sizes (original, doubled, and quadrupled) on the same graph. Show each mean and confidence interval as a point range using `geom_pointrange`.
- Reflect on how increasing the sample size influences the width of the interval and why the interval becomes narrower as the sample size grows.
- With reference to your plot and perhaps the RPsychologist visualization we looked at in Session 10, how would you define the confidence interval?
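The steps above can be sketched as follows. This is a minimal example using simulated stand-in data; the variable `x` and the chosen mean and SD are placeholders for a numeric variable from your own dataset.

```r
library(tidyverse)
set.seed(13)

# Illustrative stand-in for a numeric variable from your own data
x <- rnorm(40, mean = 50, sd = 10)
m <- mean(x); s <- sd(x); n <- length(x)

# Simulate the sampling distribution of the mean at a given sample size
# and take the 2.5th and 97.5th percentiles as a 95% CI
sim_ci <- function(n_sim, reps = 10000) {
  means <- map_dbl(seq_len(reps), ~ mean(rnorm(n_sim, m, s)))
  tibble(n     = n_sim,
         mean  = mean(means),
         lower = quantile(means, 0.025),
         upper = quantile(means, 0.975))
}

ci_df <- map_dfr(c(n, 2 * n, 4 * n), sim_ci)

ggplot(ci_df, aes(x = factor(n), y = mean, ymin = lower, ymax = upper)) +
  geom_pointrange() +
  labs(x = "Sample size", y = "Mean with simulated 95% CI")
```

Because the standard error of the mean shrinks with the square root of n, doubling the sample size narrows the interval by a factor of roughly 1.4, not 2.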
- Many classical parametric statistical tests (e.g. t-tests) make certain assumptions about the data. One key assumption is that observations are drawn from a particular distribution, commonly a normal distribution. When data violate this assumption, inferences drawn from the test statistic may be invalid. As you inspect your dataset, perhaps plotting histograms to gauge the normality of continuous variables, you may observe a left- or right-tailed skew. How much skewness is acceptable (i.e. still ‘normalish’) before the assumption is violated? (Other aspects of normality, such as tail density, are also relevant.) This parametric assumption is one of the reasons why this training has encouraged non-parametric resampling techniques (bootstrapping and permutation), which impose no such requirement on the data.
- We would like you to take the same numeric variable in your data and use bootstrapping to test its skewness. Specifically, you will need to compute the difference between the mean and median, a commonly used simple indicator of skewness, for each bootstrapped sample you take.
- If the data were perfectly normally distributed, what would the difference between mean and median be? What sign would you expect for increasingly left- or increasingly right-skewed data?
- Calculate the 95% confidence interval around the mean–median difference from your bootstrapped samples, using approaches covered in the previous sessions. How would you interpret a CI that overlapped with 0?
- Compare your findings with a common test of normality, the Shapiro–Wilk test using `shapiro.test()`, where lower W values (and lower p-values) indicate a decreasing likelihood that the sample follows a normal distribution.
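A minimal sketch of the bootstrap skewness check, using simulated right-skewed data as a placeholder for your own numeric variable:

```r
set.seed(13)

# Illustrative right-skewed data; swap in your own numeric variable
x <- rexp(100, rate = 0.5)

# Bootstrap the mean - median difference, a simple indicator of skewness
boot_diff <- replicate(10000, {
  xb <- sample(x, replace = TRUE)
  mean(xb) - median(xb)
})

# Percentile 95% CI: an interval excluding 0 suggests noticeable skew
ci <- quantile(boot_diff, c(0.025, 0.975))

# Compare with a classical normality test
shapiro.test(x)
```

For right-skewed data like this, the mean exceeds the median, so the bootstrapped differences should be mostly positive; for a symmetric distribution the CI would straddle 0.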
- Select two numeric variables in your data and calculate their covariance.
- What is the maximum possible covariance of the two variables, assuming a perfect positive relationship? Divide the observed covariance by the maximum possible covariance. What have you just calculated? As a quick refresher, what other method could you use to standardize covariance?
- Next, use a permutation test by randomly shuffling one variable many times and calculating the correlation for each permutation. Compare your observed correlation to the distribution obtained from the permutations and visualize the permutation distribution, marking your observed correlation to illustrate the concept of statistical significance.
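The covariance standardization and permutation test above might look like this; `x` and `y` are simulated placeholder variables with a built-in association:

```r
library(tidyverse)
set.seed(13)

# Two illustrative correlated variables (replace with your own)
x <- rnorm(80)
y <- 0.5 * x + rnorm(80, sd = 0.8)

obs_cor <- cor(x, y)

# Dividing covariance by the product of the SDs standardizes it:
# this recovers the Pearson correlation
cov(x, y) / (sd(x) * sd(y))

# Permutation null: shuffling one variable breaks any real association
perm_cor <- replicate(10000, cor(x, sample(y)))
p_val <- mean(abs(perm_cor) >= abs(obs_cor))

ggplot(tibble(r = perm_cor), aes(r)) +
  geom_histogram(bins = 40) +
  geom_vline(xintercept = obs_cor, colour = "red")
```

The observed correlation sitting far in the tail of the permutation distribution is exactly what "statistically significant" means in this framework.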
- Select a binary grouping variable and two numeric variables in your data. If you have a grouping variable with three or more levels, please filter your data to two of the levels. Likewise, if you don’t have a grouping variable, you could discretize another variable (e.g. a median split, with values at or below the median assigned to Group 1 and values above it to Group 2).
- First, calculate the correlation between the two numeric variables separately within each group, and use bootstrapping to generate a 95% confidence interval for each group’s correlation. Examine these intervals to understand the uncertainty around each correlation.
- Then, for each bootstrap iteration, compute the difference between the correlation in group 1 and the correlation in group 2. Use these differences to construct a 95% confidence interval for the difference between groups. Based on this interval, evaluate whether the difference is likely to be meaningful.
- Reflect on how comparing both the group-specific intervals and the interval for the difference helps you understand uncertainty in the results.
- Visualize the correlations and 95% CIs with `geom_pointrange`.
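One way to sketch the group-wise bootstrap, assuming illustrative simulated data with groups "A" and "B" (all names are placeholders):

```r
library(tidyverse)
set.seed(13)

# Illustrative data: two numeric variables, binary group, different
# x-y relationships per group
n <- 60
df <- tibble(group = rep(c("A", "B"), each = n),
             x = rnorm(2 * n)) %>%
  mutate(y = ifelse(group == "A", 0.6 * x, 0.1 * x) + rnorm(2 * n, sd = 0.8))

# One bootstrap iteration: resample within each group, correlate
boots <- map_dfr(1:1000, function(i) {
  rs <- df %>%
    group_by(group) %>%
    slice_sample(prop = 1, replace = TRUE) %>%
    summarise(r = cor(x, y), .groups = "drop")
  tibble(rA = rs$r[rs$group == "A"], rB = rs$r[rs$group == "B"])
})

# Percentile CIs for each group and for the difference
ci_tbl <- tibble(
  quantity = c("r (A)", "r (B)", "difference"),
  est = c(mean(boots$rA), mean(boots$rB), mean(boots$rA - boots$rB)),
  lo  = c(quantile(boots$rA, 0.025), quantile(boots$rB, 0.025),
          quantile(boots$rA - boots$rB, 0.025)),
  hi  = c(quantile(boots$rA, 0.975), quantile(boots$rB, 0.975),
          quantile(boots$rA - boots$rB, 0.975))
)

ggplot(ci_tbl, aes(quantity, est, ymin = lo, ymax = hi)) +
  geom_pointrange()
```

Note that two group-specific CIs can overlap even when the CI for the difference excludes 0, which is why the interval for the difference is the one to evaluate.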
- Select a binary grouping variable in your data and a numeric outcome variable. Calculate the difference in means between the two groups. Next, perform a permutation test by randomly shuffling the group labels many times and recalculating the mean difference for each shuffled dataset.
- Compare your observed difference to the distribution of differences obtained from the permutations and determine how unusual your observed value is. Use this to evaluate whether the observed difference is likely to arise by chance.
- Use the `infer` package to visualize the observed value and the probability density for differences equal to and more extreme than the observed difference.
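With the `infer` package, the permutation test and its visualization take only a few pipelines. The data here are simulated placeholders; substitute your own grouping and outcome variables:

```r
library(tidyverse)
library(infer)
set.seed(13)

# Illustrative data: binary group, numeric outcome with a group shift
df <- tibble(group = rep(c("A", "B"), each = 50),
             y = c(rnorm(50, mean = 10), rnorm(50, mean = 11)))

# Observed difference in means
obs_diff <- df %>%
  specify(y ~ group) %>%
  calculate(stat = "diff in means", order = c("A", "B"))

# Permutation null distribution: shuffle group labels
null_dist <- df %>%
  specify(y ~ group) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("A", "B"))

# Shade the region as or more extreme than the observed difference
null_dist %>%
  visualize() +
  shade_p_value(obs_stat = obs_diff, direction = "two-sided")

get_p_value(null_dist, obs_stat = obs_diff, direction = "two-sided")
</imports>
```

The shaded tail area corresponds directly to the permutation p-value returned by `get_p_value()`.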
- To highlight the ability to compute different test statistics via resampling techniques, we’d like to explore a test of differences in proportions. The classical counterpart to this test is a one- or two-sample z-test of proportions. Choose a binary grouping variable and a binary response variable (variables that take values of 0 and 1). If you don’t have variables of this type in your data, either try discretizing a continuous numeric variable or switch over to an R dataset of your choosing.
- Compute the proportion of ‘successes’ (where the response variable == 1 or simply one of the two named factor levels) within each group. Across 10,000 permutations, shuffle the response variable across groups and recalculate the proportion of successes in each shuffled group.
- Plot a histogram or a `geom_density` plot of the null distribution of proportional differences between the two groups, and annotate it with the observed group difference in the proportion of successes. What is the probability of finding a proportional difference equal to or more extreme than the one observed?
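A sketch of the proportion-difference permutation test, using simulated 0/1 data as a stand-in for your own binary variables:

```r
library(tidyverse)
set.seed(13)

# Illustrative binary data: group and 0/1 response with different
# underlying success probabilities
df <- tibble(group = rep(c("A", "B"), each = 80),
             resp  = c(rbinom(80, 1, 0.6), rbinom(80, 1, 0.4)))

# Difference in the proportion of successes between groups
prop_diff <- function(g, r) mean(r[g == "A"]) - mean(r[g == "B"])
obs <- prop_diff(df$group, df$resp)

# Null distribution: shuffle the response across groups 10,000 times
null_diffs <- replicate(10000, prop_diff(df$group, sample(df$resp)))
p_val <- mean(abs(null_diffs) >= abs(obs))

ggplot(tibble(d = null_diffs), aes(d)) +
  geom_density() +
  geom_vline(xintercept = obs, colour = "red")
```

The same permutation machinery from the mean-difference exercise carries over; only the test statistic has changed.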
- Select the same binary grouping variable and a numeric outcome variable in your data.
- First, calculate Cohen’s d to quantify the effect size between the two groups. Then, using the sample sizes of your groups as fixed values, simulate how statistical power changes for different hypothetical values of Cohen’s d (this simulation will require two nested map functions).
- Generate a power curve that shows the probability of detecting a statistically significant difference as a function of the effect size.
- Reflect on how effect size and sample size influence power, and consider what this means for designing studies and interpreting non-significant results in your own data.
- Brainstorm how you might determine a reasonable effect size within your field.
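The power simulation with two nested map functions could be sketched as below; the group sizes and the grid of hypothetical effect sizes are illustrative choices:

```r
library(tidyverse)
set.seed(13)

n1 <- 40; n2 <- 40  # fixed group sizes, taken from your data

# Simulated power at a given true Cohen's d: the proportion of
# simulated experiments whose t-test is significant at alpha = 0.05
power_at_d <- function(d, reps = 1000) {
  pvals <- map_dbl(seq_len(reps),
                   ~ t.test(rnorm(n1, mean = 0), rnorm(n2, mean = d))$p.value)
  mean(pvals < 0.05)
}

# Outer map over hypothetical effect sizes, inner map over simulations
power_df <- tibble(d = seq(0, 1.2, by = 0.1)) %>%
  mutate(power = map_dbl(d, power_at_d))

ggplot(power_df, aes(d, power)) +
  geom_line() +
  geom_point() +
  labs(x = "Cohen's d", y = "Simulated power")
```

At d = 0 the "power" is just the false-positive rate (about 0.05), and the curve rises toward 1 as the effect size grows, which is the shape your power curve should show.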
- Using the same binary grouping variable and numeric outcome as in the previous exercise, generate a bootstrapped sampling distribution of Cohen’s d by repeatedly resampling within each group and recalculating the effect size for each iteration. Examine the resulting distribution and calculate the median as well as the 25th and 75th percentiles, which represent the interquartile range of plausible effect sizes.
- Use these three values to calculate the corresponding statistical power (using `pwr`) for detecting a difference given your group sample sizes. You should end up with an interquartile range of power.
- Reflect on how the interquartile range captures the most likely values of the effect size and how this uncertainty influences estimated power. What are the pitfalls of performing post hoc power calculations, and why might they bias our inference about the true power to detect an effect? If you’re unsure, try skimming *Post hoc Power is Not Informative* for some ideas. Discuss the implications for interpreting results or planning future studies, keeping in mind that extreme bootstrap values can overstate uncertainty, while the 25th–75th percentile range provides a more robust estimate of typical effect sizes.
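A sketch of the bootstrap-plus-`pwr` workflow, on simulated placeholder groups (`g1`, `g2` stand in for your outcome split by group):

```r
library(tidyverse)
library(pwr)
set.seed(13)

# Illustrative groups; replace with your outcome split by group
g1 <- rnorm(40, mean = 10, sd = 2)
g2 <- rnorm(40, mean = 11, sd = 2)

# Cohen's d with a pooled standard deviation
cohens_d <- function(a, b) {
  sp <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
               (length(a) + length(b) - 2))
  (mean(b) - mean(a)) / sp
}

# Bootstrap the effect size by resampling within each group
boot_d <- replicate(5000,
  cohens_d(sample(g1, replace = TRUE), sample(g2, replace = TRUE)))

# Median and interquartile range of plausible effect sizes
d_quartiles <- quantile(boot_d, c(0.25, 0.50, 0.75))

# Power at each of the three effect sizes, given the fixed group sizes
powers <- map_dbl(d_quartiles,
  ~ pwr.t2n.test(n1 = length(g1), n2 = length(g2), d = .x)$power)
```

The result is not one power estimate but a range, which makes the uncertainty in any post hoc power claim explicit.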
- Let’s revisit the idea of the Winner’s curse we discussed last week.
- We will use the same binary grouping variable and numeric outcome. Please use permutation to generate a distribution of Cohen’s d values by shuffling the numeric outcome variable multiple times across groups. What would you expect the mean value of this distribution to be?
- What does this distribution represent? You may find it helpful to revisit the RPsychologist visualization.
- Plot both the bootstrapped sampling distribution of Cohen’s d you generated in Q8 and the distribution of Cohen’s d on the permuted data on the same axes using two `geom_density` plots. Add a vertical line to the plot marking the mean Cohen’s d in the distribution of bootstrapped samples. Now add lines at the 2.5th and 97.5th percentiles of the distribution of Cohen’s d from the permuted data. What do these two lines represent?
- Let’s now explore what happens when you halve your sample. Take the first 50% of the samples in each of your groups and repeat the process of using permutation and bootstrapping to generate two distributions. Plot the distributions and compare the results with the full sample.
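The permutation-versus-bootstrap comparison could be sketched like this, again on simulated placeholder groups; the permutation distribution is centred on zero because shuffling destroys the group effect:

```r
library(tidyverse)
set.seed(13)

# Illustrative groups (stand-ins for your outcome split by group)
g1 <- rnorm(40, mean = 10, sd = 2)
g2 <- rnorm(40, mean = 11, sd = 2)
y   <- c(g1, g2)
grp <- rep(c(1, 2), each = 40)

cohens_d <- function(a, b) {
  sp <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
               (length(a) + length(b) - 2))
  (mean(b) - mean(a)) / sp
}

# Permutation null: shuffle outcomes across groups, so the true d is 0
perm_d <- replicate(5000, {
  ys <- sample(y)
  cohens_d(ys[grp == 1], ys[grp == 2])
})

# Bootstrap sampling distribution around the observed d
boot_d <- replicate(5000,
  cohens_d(sample(g1, replace = TRUE), sample(g2, replace = TRUE)))

# Two-sided significance cut-offs from the permutation null
cuts <- quantile(perm_d, c(0.025, 0.975))

ggplot() +
  geom_density(aes(perm_d), colour = "grey40") +
  geom_density(aes(boot_d), colour = "steelblue") +
  geom_vline(xintercept = mean(boot_d)) +
  geom_vline(xintercept = cuts, linetype = "dashed")
```

Only the part of the bootstrap distribution beyond the dashed cut-offs would be declared significant, which is the mechanism behind the winner’s curse: with a halved sample both distributions widen, and significant estimates are increasingly those that happen to overshoot the true effect.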