XML data source for Spark SQL and DataFrames
created at Nov. 26, 2015, 2:46 a.m.
PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)
created at Nov. 29, 2015, 10:03 a.m.
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
created at March 3, 2016, 4:01 p.m.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
created at Aug. 29, 2016, 7:59 a.m.
Essential Spark extensions and helper methods ✨😲
created at Feb. 16, 2017, 3:41 p.m.
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
created at April 6, 2017, 9:40 p.m.
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
created at June 25, 2017, 7 a.m.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
created at July 6, 2017, 10:13 a.m.
PySpark methods to enhance developer productivity 📣 👯 🎉
created at Sept. 15, 2017, 1:02 p.m.
State-of-the-art natural language processing
created at Sept. 24, 2017, 7:36 p.m.