It has been a while since I last maintained this page. Use at your own peril.
Here is a machine-learning-to-statistics translator I try to maintain. Most concepts are not exactly identical, so the “translations” should be taken as approximate. It is based on the same idea in “All of Statistics” by Larry Wasserman [p. 15]. Many thanks to Ohad Shamir and Saharon Rosset for helping populate the table. Despite my best efforts, it probably still contains errors; drop me a note if you find any.
| Concept | Statistics | Machine Learning |
|---|---|---|
| assumed model | parameter space | hypothesis class |
| | model | hypothesis |
| | misspecified model | agnostic learning |
| | deterministic outcome | realizable learning |
| | sampling distribution | generative model |
| output range | – | improper learning |
| model types | CART | decision tree, axis-parallel rectangles |
| | piecewise constant function | decision list |
| | Bayes net | directed acyclic graph (of conditional probabilities) |
| | latent variable model | model-based collaborative filtering |
| | neighborhood methods | memory-based collaborative filtering |
| | multivariate distribution | graphical model |
| | – | kCNF |
| | – | kDNF |
| | – | k-clause CNF |
| | – | k-term DNF |
| | – | CNF |
| | – | DNF |
| | – | Boolean formula |
| | – | Boolean threshold function |
| | – | Boolean circuit |
| | – | threshold circuit |
| | – | acyclic finite automata |
| tasks / problem setup | estimation | learning |
| | classification | supervised learning |
| | clustering | unsupervised learning |
| | – | transductive learning |
| | frequentist inference | – |
| | – | semi-supervised learning |
| | support estimation | manifold learning |
| | hypothesis | – |
| | fixed design | conditional model, discriminative model |
| | random design | generative model |
| | adaptive design of experiments | active learning |
| | MANOVA, vector regression | structured learning |
| | basis augmentation | feature creation |
| | missing data imputation | collaborative filtering |
| | statistical process control | semi-supervised novelty detection |
| data | data, sample, observations | examples, training sample, instances |
| | – | validation sample |
| | – | test sample |
| | covariates, design, \(X\)-matrix | features, attributes |
| methods | M-estimation | empirical risk minimization (see the sketch after the table) |
| | R-estimation | – |
| | L-estimation | – |
| | moment matching | – |
| | quantile matching | – |
| | U-estimation, V-estimation | generative unsupervised RKHS learning |
| | K-estimation | – |
| | Fisher’s LDA (assuming independence) | Gaussian naive Bayes |
| interval methods | confidence intervals | PAC learnable |
| | credible interval | PAC-Bayes learnable |
| | fiducial interval | – |
| | prediction interval | – |
| error decomposition (see the note after the table) | misspecification error | approximation error |
| | risk | estimation error, expected prediction error, test error |
| | – | optimization error |
| | optimism | test error minus training error |
| | RSS | empirical risk, training error |
| | Jackknife | hypothesis stability |
| | model selection | structural learning |
| problem complexity measures | generalized degrees of freedom | Rademacher complexity |
| | sample size | sample complexity |
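
To make the “M-estimation ↔ empirical risk minimization” and “RSS ↔ empirical risk, training error” rows a bit more concrete, here is a minimal Python sketch. It assumes a linear hypothesis class and squared loss purely for illustration, and the function names are mine rather than standard terminology.

```python
import numpy as np

def empirical_risk(beta, X, y):
    """Average squared loss of the linear predictor X @ beta on the training sample.
    In statistics-speak this is RSS / n; in ML-speak, the training error."""
    return np.mean((y - X @ beta) ** 2)

def erm(X, y):
    """Empirical risk minimization over the class of linear predictors.
    With squared loss this is ordinary least squares, i.e. an M-estimator."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# Toy data: n observations ("examples", "instances"), p covariates ("features").
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

beta_hat = erm(X, y)
print("training error (empirical risk):", empirical_risk(beta_hat, X, y))
```

Swapping the loss and the hypothesis class for other choices turns the same M-estimation template into other empirical risk minimizers.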
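
The “error decomposition” rows can also be written out in the usual learning-theoretic notation. This is only a sketch, and the symbols below are my own shorthand rather than anything in the table: write \(R(\cdot)\) for the risk, \(f^*\) for the best possible predictor, \(f_{\mathcal{H}}\) for the best predictor in the hypothesis class \(\mathcal{H}\), \(\hat{f}\) for the empirical risk minimizer, and \(\tilde{f}\) for the hypothesis the optimizer actually returns. Then

\[
R(\tilde{f}) - R(f^*) =
\underbrace{R(f_{\mathcal{H}}) - R(f^*)}_{\text{approximation (misspecification) error}} +
\underbrace{R(\hat{f}) - R(f_{\mathcal{H}})}_{\text{estimation error}} +
\underbrace{R(\tilde{f}) - R(\hat{f})}_{\text{optimization error}},
\]

and the table’s “optimism” is the gap between the test error and the training error of \(\hat{f}\).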