scio

A Scala API for Apache Beam and Google Cloud Dataflow.

created at March 26, 2015, 7:07 p.m.

Scala

115

2,520

507

GitHub
voyager

🛰️ Voyager is an approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

created at April 13, 2023, 1:07 p.m.

C++

8

863

25

GitHub
sparkey

Simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.

created at Aug. 30, 2013, 2:52 p.m.

C

146

1,129

79

GitHub
featran

A Scala feature transformation library for data science and machine learning

created at May 8, 2017, 5:20 p.m.

Scala

34

447

73

GitHub
dbeam

DBeam extracts SQL tables using JDBC and Apache Beam

created at Nov. 9, 2017, 1:21 p.m.

Scala

20

40

12

GitHub
ratatool

A tool for data sampling, data generation, and data diffing

created at Aug. 1, 2016, 5:33 p.m.

Scala

31

335

61

GitHub
homebrew-public

Homebrew formula for open-source software developed by Spotify

created at Nov. 10, 2014, 4:33 p.m.

Ruby

26

27

18

GitHub
docker-gc

Docker garbage collection of containers and images

created at June 23, 2014, 12:52 a.m.

Shell

110

4,315

402

GitHub
luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, and more. It also comes with Hadoop support built in.

created at Sept. 20, 2012, 3:06 p.m.

Python

509

13,976

2,203

GitHub
annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

created at April 1, 2013, 8:29 p.m.

C++

319

11,774

1,119

GitHub
noether

Scala Aggregators used for ML Model metrics monitoring

created at March 30, 2018, 5:25 p.m.

Scala

24

36

7

GitHub
dockerfile-maven

A set of Maven tools for dealing with Dockerfiles

created at March 16, 2016, 12:42 p.m.

Java

124

1,079

185

GitHub
chartify

Python library that makes it easy for data scientists to create charts.

created at Sept. 17, 2018, 2:12 p.m.

Python

55

1,354

83

GitHub
big-data-rosetta-code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

created at Feb. 18, 2016, 10:29 p.m.

Scala

41

170

21

GitHub
spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames

created at April 22, 2016, 8:17 p.m.

Scala

33

129

37

GitHub
Mobius.swift

A functional reactive framework for managing state evolution and side-effects [Swift implementation]

created at March 4, 2019, 1:52 p.m.

Swift

16

207

4

GitHub
NFHTTP

A cross platform C++ HTTP library that interfaces natively to other platforms.

created at Aug. 10, 2018, 5:25 p.m.

C

26

385

17

GitHub
XCLogParser

Tool to parse Xcode and xcodebuild logs stored in the xcactivitylog format

created at May 22, 2019, 8:40 p.m.

Swift

26

795

45

GitHub
backstage

Backstage is an open platform for building developer portals

created at Jan. 24, 2020, 10:39 p.m.

TypeScript

160

8,469

671

GitHub
XCMetrics

XCMetrics is the easiest way to collect Xcode build metrics and improve developer productivity.

created at Jan. 19, 2021, 2:16 p.m.

Swift

23

523

9

GitHub
pedalboard

🎛 🔊 A Python library for adding effects to audio.

created at July 6, 2021, 1:04 p.m.

C++

45

3,028

87

GitHub
XCRemoteCache

created at Sept. 3, 2021, 3:55 p.m.

Swift

22

471

11

GitHub
ruler

Gradle plugin which helps you analyze the size of your Android apps.

created at July 28, 2021, 8:33 p.m.

Kotlin

15

540

23

GitHub
basic-pitch

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

created at May 3, 2022, 9:10 a.m.

Python

13

548

17

GitHub
tfreader

TensorFlow TFRecord reader CLI tool

created at Jan. 29, 2020, 10:08 p.m.

Scala

21

52

16

GitHub