General natural language facilities for Node.js
created at May 7, 2011, 2:35 a.m.
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
created at Nov. 19, 2013, 11:47 p.m.
Distributed, masterless, high-performance, fault-tolerant data processing
created at Dec. 2, 2013, 1:21 a.m.
A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks
created at March 30, 2014, 5:34 a.m.
The best machine learning tutorials on the web
created at April 28, 2011, 1:01 a.m.
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
created at Jan. 14, 2013, 3:46 p.m.
Streaming MapReduce with Scalding and Storm
created at Sept. 25, 2012, 10:38 p.m.
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
created at July 25, 2014, 8:08 p.m.
Sparkling Water provides H2O functionality inside a Spark cluster
created at Oct. 13, 2014, 11:06 p.m.
An Open Source Machine Learning Framework for Everyone
created at Nov. 7, 2015, 1:19 a.m.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
created at Jan. 23, 2015, 7:38 p.m.