Brett Klamer

Q&A

Concise questions and answers on a variety of topics: things you should probably know off the top of your head. Unfortunately, I’ve lost the references for most of them, but they are generally short summaries taken from texts, papers, Stack Exchange, or other notes.

Basic Stats

  1. What is statistics?
  2. Explain the law of large numbers.
  3. Explain the central limit theorem.
  4. What is the standard error?
  5. What is a p-value?
  6. If p>0.05 should you fail to reject the null hypothesis?
  7. If p≤0.05 should you reject the null hypothesis?
  8. If p>0.05 does it show evidence for the null hypothesis?
  9. If p≤0.05 does this mean the chance you’ve made a false positive conclusion is 5%?
  10. Statistical significance is a property of the phenomenon being studied, and thus statistical tests detect significance.
  11. Explain the 95% confidence interval.
  12. The observed 95% CI has a 95% chance of containing the true effect size.
  13. If two CIs overlap, the difference between two estimates or studies is not significant.
  14. What is type 1 error?
  15. What is type 2 error?
  16. When do you need to adjust for multiple comparisons?
  17. What is the probability of a false positive among multiple independent hypothesis tests?
  18. What is power?
  19. What information is needed for power analysis and sample size calculation?
  20. Is it ok to calculate retrospective power?
  21. What is Pearson’s correlation coefficient?
  22. What is selection bias?

Models

Theory

  1. What is ordinary least squares (OLS)?
  2. What is maximum likelihood estimation (MLE)?
  3. What is the likelihood function?
  4. What is the difference between a linear and nonlinear model?
  5. What is the difference between linear and generalized linear models?
  6. What is the bias-variance tradeoff?
  7. What is regularization?
  8. What is sparsity?
  9. Explain Bayesian inference.
  10. What’s the difference between likelihood ratio, Wald, and score tests?

Assumptions

  1. What are the assumptions of linear regression?
  2. How can you validate the validity assumption of linear regression?
  3. How can you validate the linearity assumption of linear regression?
  4. How can you validate the multicollinearity assumption of linear regression?
  5. How can you validate the independence assumption of linear regression?
  6. How can you validate the equal variance assumption of linear regression?
  7. How can you validate the normality of errors assumption of linear regression?
  8. How can you fix violations of the equal variance assumption of linear regression?
  9. How can you fix violations of the independence assumption of linear regression?
  10. How can you fix violations of the linearity assumption of linear regression?
  11. How can you fix violations of the normality of errors assumption of linear regression?
  12. What are the assumptions of logistic regression?
  13. What are the assumptions of Cox proportional hazards regression?

Paradoxes and other issues

  1. What is Simpson’s paradox?
  2. What is Lord’s paradox?
  3. What is suppression?
  4. What is Berkson’s paradox?
  5. What is the Will Rogers phenomenon?
  6. What is regression to the mean?

Other

  1. What is logistic regression?
  2. How would you interpret the coefficient for a continuous predictor in linear regression?
  3. How would you interpret the coefficient for a categorical predictor in linear regression?
  4. How would you interpret the coefficient for a continuous predictor in logistic regression?
  5. How would you interpret the coefficient for a categorical predictor in logistic regression?
  6. What is the risk ratio?
  7. What is the odds ratio?
  8. What are collapsible vs. non-collapsible estimates?
  9. When is a model underfit?
  10. When is a model overfit?
  11. What is supervised learning?
  12. What is unsupervised learning?
  13. What is cross validation?
  14. What is K-fold cross validation?
  15. What is bootstrap cross validation?
  16. What cross validation method applies to time series data?
  17. What model would you use for an ordered response variable?
  18. How would you perform model selection?
  19. What is Lasso regression?
  20. What is Ridge regression?
  21. What is the class imbalance problem?
  22. What is an interaction effect?
  23. What are precision, recall, sensitivity, and specificity?
  24. What is an ROC curve?
  25. When should you use an ROC or PR curve?
  26. What is the ROC-AUC?
  27. What is the confusion matrix?
  28. What are the common binary diagnostic summaries?
  29. What is pruning a decision tree?
  30. What are some model fit summaries for linear regression?
  31. What are some model fit summaries for logistic regression?
  32. Why should a linear model include an intercept term?
  33. If two predictor variables have zero correlation, can we consider them statistically independent?
  34. What is a principled way to handle missing data?
  35. What are two methods for analyzing pre-post data?
  36. How would you determine the sample size required for a prediction model?
  37. What is internal model validation?
  38. What is external model validation?
  39. What is a confounding variable?
  40. How should you model nested data?
  41. What is principal component analysis (PCA)?

Math Stat

  1. What are the basic properties of variance?
  2. What are the basic properties of covariance?
  3. What are the basic properties of expectation?

Probability

Basics

  1. What are the basic rules of probability?
  2. What is permutation?
  3. What is combination?

Distributions

  1. What is the normal distribution?
  2. What is the binomial distribution?
  3. What is the uniform distribution?
  4. What is the geometric distribution?
  5. What is the Poisson distribution?
  6. What is the exponential distribution?
  7. What is the gamma distribution?
  8. What is the negative binomial distribution?
  9. What is the Bernoulli distribution?
  10. Review the probability distribution flowchart.