Brett Klamer
Table Of Contents

Q&A

Concise questions and answers on a variety of topics: things you should probably know off the top of your head. Most of this summarizes texts, papers, Stack Exchange posts, or other notes. However, I’ve lost the direct references for many of them.

Basic Stats

  1. What is statistics?
  2. Explain the law of large numbers.
  3. Explain the central limit theorem.
  4. What is the standard error?
  5. What is a p-value?
  6. If p>0.05 should you fail to reject the null hypothesis?
  7. If p≤0.05 should you reject the null hypothesis?
  8. If p>0.05 does it show evidence for the null hypothesis?
  9. If p≤0.05 does this mean the chance you’ve made a false positive conclusion is 5%?
  10. Statistical significance is a property of the phenomenon being studied, and thus statistical tests detect significance.
  11. Explain the 95% confidence interval.
  12. The observed 95% CI has a 95% chance of containing the true effect size.
  13. If two CIs overlap, the difference between two estimates or studies is not significant.
  14. What is type 1 error?
  15. What is type 2 error?
  16. When do you need to adjust for multiple comparisons?
  17. What is the probability of a false positive among multiple independent hypothesis tests?
  18. What is power?
  19. What information is needed for power analysis and sample size calculation?
  20. Is it ok to calculate retrospective power?
  21. What is Pearson’s correlation coefficient?
  22. What is selection bias?

Paradoxes, Phenomena, and Fallacies

  1. What is Simpson’s paradox?
  2. What is Lord’s paradox?
  3. What is Suppression?
  4. What is Berkson’s paradox?
  5. What is the Will Rogers phenomenon?
  6. What is regression to the mean?
  7. What is Stein’s paradox?
  8. What is the Jeffreys-Lindley paradox?
  9. What is the absence of evidence fallacy?
  10. What is the ecological fallacy?
  11. What is the prosecutor’s fallacy?
  12. What is the gambler’s fallacy?
  13. What is the low birth weight paradox?
  14. What is the table 2 fallacy?
  15. Other methods and ideas to avoid.

Models

Theory

  1. What is ordinary least squares (OLS)?
  2. What is maximum likelihood estimation (MLE)?
  3. What is Bayesian inference?
  4. What are common algorithms for optimization?
  5. What are loss/cost/objective functions?
  6. What is the likelihood function?
  7. What is the difference between a linear and nonlinear model?
  8. What is the difference between linear and generalized linear models?
  9. What is the bias-variance tradeoff?
  10. What is regularization?
  11. What is sparsity?
  12. What’s the difference between likelihood ratio, Wald, and score tests?

Assumptions

  1. What are the assumptions of linear regression?
  2. How can you validate the validity assumption of linear regression?
  3. How can you validate the linearity assumption of linear regression?
  4. How can you validate the multicollinearity assumption of linear regression?
  5. How can you validate the independence assumption of linear regression?
  6. How can you validate the equal variance assumption of linear regression?
  7. How can you validate the normality of errors assumption of linear regression?
  8. How can you fix the equal variance assumption of linear regression?
  9. How can you fix the independence assumption of linear regression?
  10. How can you fix the linearity assumption of linear regression?
  11. How can you fix the normality of errors assumption of linear regression?
  12. What are the assumptions of logistic regression?
  13. What are the assumptions of Cox proportional hazards regression?

Specification

  1. What conditions are required for causal interpretations in observational studies?
  2. What are DAGs?
  3. What is the backdoor criterion?
  4. What is a backdoor path?
  5. What is a confounding variable?
  6. What is a collider variable?
  7. What is M-bias?
  8. What is an instrumental variable?
  9. What is mediation?
  10. What is moderation?
  11. What is interaction/effect modification?
  12. How would you perform exploratory model selection?
  13. How would you perform Bayesian model selection?

Performance

  1. What is cross validation?
  2. What is K-fold cross validation?
  3. What is bootstrap cross validation?
  4. What cross validation method applies to time series data?
  5. What are the differences between training, validation, and test sets?
  6. What is internal model validation?
  7. What is external model validation?
  8. How would you assess performance of Bayesian models?
  9. What are some model fit summaries for linear regression?
  10. What are some model fit summaries for logistic regression?
  11. When is a model underfit?
  12. When is a model overfit?

Other

  1. What is logistic regression?
  2. How would you interpret the coefficient for a continuous predictor in linear regression?
  3. How would you interpret the coefficient for a categorical predictor in linear regression?
  4. How would you interpret the coefficient for a continuous predictor in logistic regression?
  5. How would you interpret the coefficient for a categorical predictor in logistic regression?
  6. What is the risk ratio?
  7. What is the odds ratio?
  8. What are collapsible vs non-collapsible estimates?
  9. What is supervised learning?
  10. What is unsupervised learning?
  11. What model would you use for an ordered response variable?
  12. What is best subset regression?
  13. What is Lasso regression?
  14. What is Ridge regression?
  15. What is ensembling?
  16. What is bagging?
  17. What is boosting?
  18. What is a random forest?
  19. What is gradient boosting?
  20. What is pruning a decision tree?
  21. What is the class imbalance problem?
  22. What are precision, recall, sensitivity, and specificity?
  23. What is an ROC curve?
  24. When should you use an ROC or PR curve?
  25. What is the ROC-AUC or c-index?
  26. What is the confusion matrix?
  27. What are the common binary diagnostic summaries?
  28. What is deep learning?
  29. Why should a linear model include an intercept term?
  30. If two predictor variables have zero correlation, can we consider them statistically independent?
  31. What is a principled way to handle missing data?
  32. What are two methods for analyzing pre-post data?
  33. How would you determine the sample size required for statistical modeling?
  34. How should you model nested data?
  35. What is principal component analysis (PCA)?
  36. Explain the concepts of centering, scaling, and standardizing numeric variables.

Math Stat

  1. What are the basic properties of variance?
  2. What are the basic properties of covariance?
  3. What are the basic properties of expectation?

Probability

Basics

  1. What are the basic rules of probability?
  2. What is permutation?
  3. What is combination?

Distributions

  1. What is the normal distribution?
  2. What is the binomial distribution?
  3. What is the uniform distribution?
  4. What is the discrete uniform distribution?
  5. What is the geometric distribution?
  6. What is the Poisson distribution?
  7. What is the exponential distribution?
  8. What is the gamma distribution?
  9. What is the negative binomial distribution?
  10. What is the Bernoulli distribution?
  11. Review the probability distribution flowchart.

Last updated 2021-09-05