This is a collection of medium-data and out-of-core software projects and tools.
Python pandas-API compatible out-of-core computation library.
Some users report that it is not very stable and does not integrate well with the Kubernetes ecosystem, which is why Spark was chosen over it.
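To make the term "out-of-core" concrete, here is a minimal sketch in plain Python, assuming a CSV file with one numeric column. It is not the API of the library described above; `chunked_mean` and `write_demo_csv` are hypothetical helper names used only for illustration. The idea is that the file is streamed in fixed-size chunks, so memory use is bounded by `chunk_size` rather than by the size of the file.

```python
import csv
import os
import tempfile


def chunked_mean(path, column, chunk_size=1000):
    """Mean of one numeric CSV column, holding at most chunk_size values in memory."""
    total = 0.0
    count = 0
    chunk = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            chunk.append(float(row[column]))
            if len(chunk) == chunk_size:
                # Reduce the full chunk to two running numbers, then discard it
                total += sum(chunk)
                count += len(chunk)
                chunk = []
    # Fold in the final, possibly partial chunk
    total += sum(chunk)
    count += len(chunk)
    return total / count if count else float("nan")


def write_demo_csv(n_rows):
    """Write a demo file with a single 'value' column holding 0..n_rows-1."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        for i in range(n_rows):
            writer.writerow([i])
    return path
```

Calling `chunked_mean(write_demo_csv(2500), "value")` processes the file in three chunks. Out-of-core libraries apply the same chunk-and-combine pattern behind a dataframe-style interface, typically with parallel scheduling on top.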
File formats for processing data
Data storage format. Heavily R-based, but cross-language support is planned. Built in C++.
A straight dump of the Apache Arrow in-memory format onto disk. Supported in R, Python, and Julia.
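The phrase "straight dump of an in-memory format" can be illustrated with a rough analogy using only the Python standard library. This sketch does not read or write Arrow or Feather files, which also carry schema and metadata. It only shows the underlying idea: a typed column is written as its raw bytes and read back without any text parsing. The helper names `dump_column` and `load_column` are hypothetical.

```python
import array
import os
import tempfile


def dump_column(values, path):
    """Write a float64 column to disk as its raw in-memory bytes."""
    column = array.array("d", values)
    with open(path, "wb") as handle:
        column.tofile(handle)
    return len(column)


def load_column(path, length):
    """Read the raw bytes back into a float64 array; no text parsing involved."""
    column = array.array("d")
    with open(path, "rb") as handle:
        column.fromfile(handle, length)
    return column
```

Because the on-disk bytes match the in-memory layout, loading is essentially a copy rather than a parse, which is the main reason formats of this kind are fast to read and write.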
It explains some interesting points about what it takes to build an outstanding distributed data processing platform.