Making Sense of my Statistics Course

Here are some general notes.

Difficult words

Degrees of freedom describes how many values can vary independently once the relationships between them are taken into account. If we have data for a rectangle containing width, height and area, we have 3 variables but only 2 degrees of freedom, since area is determined by $width*height$. A square, on the other hand, has one degree of freedom. By analogy, an elbow has one degree of freedom and a shoulder has two. In the common case of a sample of $n$ observations whose mean has been estimated, degrees of freedom is $n-1$. So with two observations we have $n=2 \implies n-1 = 1$.

From a statistics point of view, degrees of freedom describes how many of your data points are free to vary and how many are determined. Suppose you have three data points with a mean of 21. What three numbers averaged together give a mean of 21? You can choose anything for the first two, but the third is determined, since the three must sum to $3 \cdot 21 = 63$. For example, if you choose 15 and 5, the third can only be 43: $15 + 5 + 43 = 63$ and $63 / 3 = 21$. In this sample, we have two degrees of freedom. The more data points in your sample, the more degrees of freedom you have.
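The arithmetic above can be checked with a small R sketch. The variable names (`target_mean`, `free_values`, `third`) are only illustrative and not from the course material.

```r
# Two values are chosen freely; the third is forced by the fixed mean.
target_mean <- 21
n <- 3
free_values <- c(15, 5)

# The three values must sum to n * target_mean, so the last one is determined.
third <- n * target_mean - sum(free_values)
third                        # 43
mean(c(free_values, third))  # 21
```

Changing `free_values` to any other pair still gives a mean of 21, which is why only two of the three values count as free.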

t observed and p value

Code in R:

# sample set
l <- c(3003,3005,2997,3006,2999,2998,3007,3005,3001)
# standard error of the mean:
# how precisely the sample mean estimates the population mean.
sem <- sd(l)/sqrt(length(l))
# The expected mean under the null hypothesis.
mu <- 3000
x_bar <- mean(l)
# divide the difference between the sample mean and the expected mean by the standard error to get observed t.
tobs <- (x_bar-mu)/(sem)
# length(l) - 1 because we have n-1 degrees of freedom.
# p_value is the probability of a result at least this extreme if the true mean really is mu.
# abs() keeps the two-sided p-value correct when tobs is negative.
p_value <- (1 - pt(abs(tobs), df=length(l)-1)) * 2
# level of significance.
# This is the acceptable risk of rejecting the null hypothesis when it is actually true.
alpha <- 0.05
1-p_value
p_value
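As a cross-check, the manual calculation can be compared with R's built-in `t.test()`. This is a minimal sketch that repeats the sample so it runs on its own; the name `res` is just an illustrative choice.

```r
# Same sample as in the manual calculation.
l <- c(3003, 3005, 2997, 3006, 2999, 2998, 3007, 3005, 3001)

# One-sample t-test against mu = 3000; two-sided by default.
res <- t.test(l, mu = 3000)

res$statistic  # observed t
res$parameter  # degrees of freedom, length(l) - 1
res$p.value    # two-sided p-value, to be compared with alpha
```

For this sample the observed t is roughly 1.89 and the p-value is larger than 0.05, so the null hypothesis would not be rejected at that level.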

Tricky stuff

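When we take the mean of log-transformed data and convert it back to the original scale through exp(x), we get the geometric mean, which estimates the median of the original (log-normal) data, NOT its arithmetic mean.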
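A small simulation can make this visible. The sketch below assumes log-normally distributed data generated with `rlnorm()`; the seed, sample size and parameters are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Simulated right-skewed data: log(x) is normal with mean 0 and sd 1.
x <- rlnorm(10000, meanlog = 0, sdlog = 1)

# Average on the log scale, then transform back.
back_transformed <- exp(mean(log(x)))

back_transformed  # close to the median
median(x)
mean(x)           # noticeably larger, pulled up by the long right tail
```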

Two samples

It is almost always safe to use the Welch version of the two-sample t-test, which does not assume equal variances. The pooled version assumes that both groups have similar variance. The pooled approach is sometimes useful when the sample size is very small.
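In R, `t.test()` uses the Welch version unless `var.equal = TRUE` is set. The sketch below uses two made-up groups (`a` and `b` are not course data) to show how the two versions are called and how their degrees of freedom differ.

```r
# Made-up measurements: group b has a much larger spread than group a.
a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
b <- c(4.2, 7.9, 3.1, 8.4, 5.0, 6.6)

welch  <- t.test(a, b)                    # var.equal = FALSE is the default
pooled <- t.test(a, b, var.equal = TRUE)  # assumes equal variances

welch$parameter   # Welch degrees of freedom, usually not an integer
pooled$parameter  # pooled degrees of freedom: length(a) + length(b) - 2
welch$p.value
pooled$p.value
```

When the variances differ, the Welch test works with fewer degrees of freedom than the pooled test, which makes it the more cautious of the two.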

Exam questions

A common exam question asks us to calculate n or power.

# calculate power (n is the number of observations per group; two-sample and two-sided by default)
power.t.test(n = 10, delta = 2, sd = 1, sig.level = 0.05)
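The same function can solve for n instead: leave `n` out and supply `power`. The target power of 0.8 below is an assumed example value, not one taken from the course.

```r
# Solve for the sample size per group needed to reach the target power.
res <- power.t.test(delta = 2, sd = 1, sig.level = 0.05, power = 0.8)

res$n           # required n per group (not an integer)
ceiling(res$n)  # round up to get a usable sample size
```

`power.t.test()` expects exactly one of `n`, `delta`, `sd`, `sig.level` and `power` to be left unspecified, and it computes that one from the others.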

Henrik Zenkert - 2023-09-29