The Wise son:
I’m glad to see that you are taking a further step in understanding sample size. A sample size and power analysis for multivariable models can be challenging because it must account for the correlations among all the variables. Moreover, many hypotheses are being tested at once, concerning the effect size of each variable and the interactions between variables.
The Simple son:
Well, that’s exactly why we have “workable” rules of thumb for choosing the sample size in multivariable regression analysis: have at least 30 cases (or events) per explanatory variable.
If 30 per variable is too many, some would use 10 cases per variable. I have no idea who invented this rule or how it is justified, but it is simple, and I like KISS: keep it simple and short.
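Here is a minimal sketch of how the rule of thumb turns into an actual number. It assumes the rule is read as events per variable and that you can guess the proportion of subjects who will experience the event. The function `epv_sample_size()` and its arguments are hypothetical and exist only for illustration. Setting `event_rate = 1` gives the plain cases-per-variable reading.

```r
# Hypothetical helper: minimum total sample size implied by an
# events-per-variable (EPV) rule of thumb.
#   k          number of candidate explanatory variables
#   event_rate anticipated proportion of subjects who have the event
#   epv        events required per variable (10, or 30 for the stricter rule)
epv_sample_size <- function(k, event_rate, epv = 10) {
  stopifnot(k >= 1, event_rate > 0, event_rate <= 1, epv > 0)
  events_needed <- epv * k
  # Total n whose expected number of events reaches events_needed,
  # rounded up to a whole subject
  ceiling(events_needed / event_rate)
}

epv_sample_size(k = 5, event_rate = 0.25, epv = 10)  # 200
epv_sample_size(k = 5, event_rate = 0.25, epv = 30)  # 600
```

With 5 candidate variables and an anticipated event rate of 25%, the 10-per-variable rule asks for 200 subjects and the 30-per-variable rule asks for 600. The rule ignores effect size, power and correlation among covariates, which is exactly what the other sons complain about.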
The Wicked son:
Let’s test you guys. What happens to the required sample size when the correlation among the covariates increases? I’ll give you the answer, since I’m having a good hair day. Well, it depends on the type of regression: in linear multiple regression, the required sample size gets smaller, while in logistic or Cox multiple regression, it gets larger. Why? Check out this paper. I’m even willing to give you the code to calculate the sample size for logistic regression:
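NLR <- function(alphaw = 0.05, Powerw = 0.80, Roh = 0, OR = 1.5, P = 0.5) {
  # alphaw: significance level (one-sided; pass alphaw/2 for a two-sided test)
  # Powerw: desired power
  # Roh:    multiple correlation between the covariate of interest and the other covariates
  # OR:     odds ratio for a one-SD increase in a normally distributed covariate
  # P:      event probability at the mean of the covariate
  theta  <- log(OR)
  theta2 <- theta^2
  Zalpha <- qnorm(1 - alphaw)
  Zbeta  <- qnorm(Powerw)   # same as -qnorm(1 - Powerw)
  lam <- (1 + (1 + theta2) * exp(5 * theta2 / 4)) / (1 + exp(-theta2 / 4))
  # sample size for a single covariate
  n <- (Zalpha + exp(-theta2 / 4) * Zbeta)^2 * (1 + 2 * P * lam) / (P * theta2)
  # inflate by 1 / (1 - Roh^2) for correlated covariates; round up so the target power is met
  NM <- ceiling(n / (1 - Roh^2))
  return(NM)
}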
That’s it. Try it yourself, at your own risk.
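Correlation enters `NLR()` only in its last step, where the single-covariate sample size is divided by `1 - Roh^2`. The minimal sketch below isolates that inflation step. It assumes that `Roh` is the multiple correlation between the covariate of interest and the remaining covariates. `inflate_n()` is a hypothetical helper, and 164 is an illustrative single-covariate requirement (by our calculation, it is what `NLR()` returns with its default arguments).

```r
# Hypothetical helper: adjust a single-covariate sample size for correlation
# with the other covariates, n_multi = n_single / (1 - rho^2),
# rounded up to a whole subject.
inflate_n <- function(n_single, rho) {
  stopifnot(all(abs(rho) < 1))
  ceiling(n_single / (1 - rho^2))
}

rho <- c(0, 0.3, 0.5, 0.7, 0.9)
# Inflation factor and adjusted sample size for each correlation;
# the n column comes out as 164, 181, 219, 322, 864
data.frame(rho = rho,
           factor = round(1 / (1 - rho^2), 2),
           n = inflate_n(164, rho))
```

The factor grows slowly at first and then sharply: at a correlation of 0.9 you need more than five times the uncorrelated sample size.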
He who couldn’t ask:
I get anxiety attacks whenever you start with your code, Wicked. But I’m on good terms with Cohen (1988), Kelley et al., Peduzzi et al., and all the other friends cited in that paper. I make them dinner every Wednesday, and they send you their warm regards.