A minimal benchmark for scalability, speed, and accuracy of commonly used open-source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib, etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks, etc.).
created at March 28, 2015, 12:34 a.m.
Materials for STATS 418 (Tools in Data Science), a course taught in the Master of Applied Statistics program at UCLA.
created at Feb. 10, 2017, 10:12 a.m.