DataFu works only with Apache Hadoop and Pig.
Each function is unit tested, and code coverage is tracked for the entire library.
DataFu was developed at LinkedIn and is written in Java.
Here are some key features of "DataFu":
Functions for:
· PageRank
· Quantiles (median), variance, etc.
· Sessionization
· Convenience bag functions (e.g., set operations, enumerating bags)
· Convenience utility functions (e.g., assertions, easier writing of EvalFuncs)
· Bag operations
· Link analysis, and many more
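
To make the sessionization feature listed above more concrete, here is a minimal Java sketch of the general idea: events sorted by time are grouped into sessions, and a new session begins whenever the gap since the previous event exceeds a timeout. This is only an illustration of the concept, not DataFu's implementation or API; the class name SessionizeSketch, the method sessionize, and the 10-minute timeout are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of sessionization; not part of DataFu.
public class SessionizeSketch {

    /**
     * Assigns a session index to each timestamp in a list that is already
     * sorted in ascending order. A new session starts whenever the gap to
     * the previous event is strictly greater than timeoutMillis.
     */
    public static List<Integer> sessionize(List<Long> sortedTimestamps, long timeoutMillis) {
        List<Integer> sessionIds = new ArrayList<>();
        int session = 0;
        Long previous = null;
        for (Long ts : sortedTimestamps) {
            // Start a new session when the pause since the last event is too long.
            if (previous != null && ts - previous > timeoutMillis) {
                session++;
            }
            sessionIds.add(session);
            previous = ts;
        }
        return sessionIds;
    }

    public static void main(String[] args) {
        // Event times in milliseconds (assumed sample data).
        List<Long> events = Arrays.asList(0L, 60_000L, 120_000L, 2_000_000L, 2_030_000L);
        // With a 10-minute timeout, the first three events share one session
        // and the last two fall into a second one.
        System.out.println(sessionize(events, 600_000L)); // prints [0, 0, 0, 1, 1]
    }
}
```

In a Pig job the same grouping would typically be applied per user after ordering each user's events by time; the sketch leaves that grouping step out to keep the core rule visible.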