· Tools to enable easy data extract/transform/load (ETL).
· A mechanism to impose structure on a variety of data formats.
· Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM).
· Query execution via MapReduce.
Bugs:
· Exception on Windows when using the JDBC driver: "IOException: The system cannot find the path specified".
· Schema creation scripts are incomplete since they leave out tables that are specific to DataNucleus.
Improvements:
· Improve miscellaneous error messages.
· Return correct Major / Minor version numbers for JDBC Hive Driver.
· Add a HivePreparedStatement implementation based on the data types Hive currently supports.
Tasks:
· Hive in Maven.
· Provide Metastore upgrade scripts and default schemas for PostgreSQL.
New Features:
· Authorization infrastructure for Hive
· Implement Indexing in Hive
· Add reflect() UDF for reflective invocation of Java methods
· Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
· Implement GenericUDF str_to_map
· Patch to support HAVING clause in Hive
· Track the joins which are being converted to map-join automatically
· Call frequency and duration metrics for HiveMetaStore via JMX
· Maintain lastAccessTime in the metastore
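
A few of the features listed above can be sketched in HiveQL. This is an illustrative sketch only and is not taken from the release notes: the table src and its key column are hypothetical, and the argument values are arbitrary.

```sql
-- Hypothetical table used for illustration: src(key STRING, value STRING)

-- reflect(): invoke a Java method, named by class and method, on the given arguments.
SELECT reflect('java.lang.String', 'valueOf', 1) FROM src LIMIT 1;

-- str_to_map(): split text into key/value pairs, using ',' between pairs
-- and ':' between each key and its value.
SELECT str_to_map('a:1,b:2', ',', ':') FROM src LIMIT 1;

-- HAVING: filter groups after aggregation.
SELECT key, count(*) FROM src GROUP BY key HAVING count(*) > 1;
```
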
Improvements:
· Provide option to export a HEADER
· Support for distinct selection on two or more columns
· Describe extended table/partition output is cryptic
· Missing some JDBC functionality, such as getTables, getColumns, and HiveResultSet.get* methods based on column name
· Tapping logs from child processes
· Support filter pushdown against non-native tables
· Replace dependencies on HBase deprecated API
· Add queryid while locking
· Update transient_lastDdlTime only if not specified
· Add more debug information for hive locking
· HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice
· Show the time the local task takes
· Create a new ZooKeeper instance when retrying lock, and more info for debug
· Add an option to run a task to check map-join possibility in non-local mode
· More debugging for locking
· Add an option in dynamic partition inserts to throw an error if 0 partitions are created
Bugs:
· "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name
· NULL is not handled correctly in join
· HiveInputFormat.getInputFormatFromCache "swallows" the cause exception when throwing IOException
· Add progress in join and groupby
· Simple UDAFs with more than 1 parameter crash on empty row query
· UDF field() doesn't work
· Dynamic partition inserts left empty files uncleaned in Hadoop 0.17 local mode
· Skip counter update when RunningJob.getCounters() returns null
· Let user specify serde for custom scripts.
· Add UDF unhex.
· Remove lzocodec import from FileSinkOperator.
· Driver NullPointerException when calling getResults without first compiling.
· Performance improvement for RCFile and ColumnarSerDe in Hive.