Apache Pig 0.12.1

Apache Pig is a platform for analyzing large data sets. It consists of a high-level data-flow language for expressing data analysis programs, coupled with an execution framework that evaluates those programs in parallel.

The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

Main features:

  • Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
  • Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
  • Extensibility. Users can create their own functions to do special-purpose processing.
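
The idea of encoding a task as an explicit data flow sequence, with a user-supplied function plugged into one step, can be illustrated with a small sketch. This is plain Python, not Pig Latin, and it runs on a single machine; the records, the step functions (filter_step, group_step, aggregate_step) and the predicate is_long_visit are all hypothetical names chosen for illustration and are not part of Pig.

```python
from collections import defaultdict

# Hypothetical input records: (user, url, seconds spent on the page).
visits = [
    ("alice", "example.org/a", 30),
    ("bob", "example.org/b", 5),
    ("alice", "example.org/c", 12),
    ("carol", "example.org/a", 45),
    ("bob", "example.org/a", 20),
]


def is_long_visit(record, threshold=10):
    """User-defined predicate, standing in for a special-purpose function."""
    _user, _url, seconds = record
    return seconds >= threshold


def filter_step(records, predicate):
    """Keep only the records for which the predicate returns True."""
    return [r for r in records if predicate(r)]


def group_step(records, key_index):
    """Group records by the field at position key_index."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key_index]].append(r)
    return dict(groups)


def aggregate_step(groups):
    """Sum the seconds field (index 2) within each group."""
    return {key: sum(r[2] for r in rows) for key, rows in groups.items()}


# The task is written as an explicit sequence of transformations,
# each step consuming the output of the previous one.
long_visits = filter_step(visits, is_long_visit)
by_user = group_step(long_visits, key_index=0)
total_seconds = aggregate_step(by_user)

print(total_seconds)
```

Because each step depends only on the output of the step before it, a system that understands the sequence is free to reorder, combine, or distribute the steps, which is the kind of optimization opportunity described in the list above.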

Last updated on: April 19th, 2014, 17:59 GMT
Developed by: Apache Software Foundation
License type: Apache License
Operating system(s): Windows / Linux / Mac OS / BSD / Solaris


What's New in version 0.11.0
  • This release includes the DateTime datatype, the RANK, CUBE and ROLLUP operators, Groovy UDFs, custom reducer estimation, schema-based tuples and HCatalog DDL integration.