Oct 4 11

Additional Features of Hive in Amazon Elastic MapReduce

by Vaibhav Aggarwal

Here is a list of the additional Hive features available in Amazon Elastic MapReduce. Quite a few of them have been contributed back to the Hive community.

http://aws.amazon.com/articles/2856

Aug 9 11

Hive on Amazon Elastic MapReduce

by Vaibhav Aggarwal

https://cwiki.apache.org/confluence/display/Hive/HiveAwsEmr

Sep 24 10

Profiling Hadoop

by Vaibhav Aggarwal

I was looking for a way to profile Hadoop jobs in order to optimize the performance of certain queries. Here is a cool link that demonstrates how to do that:

http://williamlouth.wordpress.com/2010/02/05/metering-profiling-apache-hadoop-jobs/

Aug 13 10

How to mount an LVM (Logical Volume Manager) volume

by Vaibhav Aggarwal

It all started with me trying to mount an LVM volume, so I will start with how to do that:

1. The first step is to scan all disks for physical volumes

>pvscan
PV /dev/sde2   VG VolGroup00   lvm2 [21.88 GB / 0    free]

2. The second step is to activate the volume group, which makes its logical volumes available

>vgchange -a y VolGroup00
1 logical volume(s) in volume group "VolGroup00" now active

3. The third step is to display attributes of a logical volume

>lvdisplay
--- Logical volume ---
LV Name                /dev/VolGroup00/LogVol01
VG Name                VolGroup00
LV Write Access        read/write
LV Status              available

4. The final step is to mount the logical volume, using the LV Name reported by lvdisplay (the mount point must already exist)

>mount /dev/VolGroup00/LogVol01 /mnt/disk00
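The four steps above can be strung together in a small script. This is only a rough sketch: the `run` and `mount_lvm_volume` helpers and the `DRY_RUN` switch are names I made up for illustration, not part of LVM. By default it only prints the commands, since actually running them needs root and a real volume group.

```shell
#!/bin/sh
# Sketch only: wraps the four steps from this post.
# DRY_RUN, run and mount_lvm_volume are hypothetical names, not LVM tools.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = execute them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: mount_lvm_volume VG LV MOUNTPOINT
mount_lvm_volume() {
    vg="$1"; lv="$2"; mnt="$3"
    if [ -z "$vg" ] || [ -z "$lv" ] || [ -z "$mnt" ]; then
        echo "usage: mount_lvm_volume VG LV MOUNTPOINT" >&2
        return 2
    fi
    run pvscan || return 1                    # 1. scan disks for physical volumes
    run vgchange -a y "$vg" || return 1       # 2. activate the volume group
    run lvdisplay "/dev/$vg/$lv" || return 1  # 3. show the logical volume
    run mkdir -p "$mnt" || return 1           #    make sure the mount point exists
    run mount "/dev/$vg/$lv" "$mnt"           # 4. mount it
}
```

Call it with the volume group and the LV name that lvdisplay reports on your machine, for example `mount_lvm_volume VolGroup00 LogVol01 /mnt/disk00`, and set `DRY_RUN=0` once the printed commands look right.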

Here is a cool link that describes the advantages of LVM in detail:

http://www.faqs.org/docs/Linux-HOWTO/LVM-HOWTO.html#AEN68

Aug 9 10

P is not equal to NP, disappointed?

by Vaibhav Aggarwal

It seems that an IITian from HP recently claimed to have proved that P is not equal to NP.

Here is the link:

Aug 7 10

Hive introduces dynamic partitions

by admin

From version 0.6 onwards, Hive supports dynamic partitions.

From Hive wiki:

http://wiki.apache.org/hadoop/Hive/Tutorial#Dynamic-partition_Insert

Dynamic-partition insert (or multi-partition insert) is designed to solve the problem of needing a separate insert statement for each partition, by dynamically determining which partitions should be created and populated while scanning the input table. This is a newly added feature that is only available from version 0.6.0 (trunk now). In a dynamic-partition insert, the input column values are evaluated to determine which partition each row should be inserted into. If that partition has not been created, it is created automatically. Using this feature you need only one insert statement to create and populate all necessary partitions. In addition, since there is only one insert statement, there is only one corresponding MapReduce job. This significantly improves performance and reduces the Hadoop cluster workload compared to the multiple-insert case.

Below is an example of loading data to all country partitions using one insert statement:

    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
           SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country
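One thing worth noting when trying this out: dynamic partitioning is controlled by the `hive.exec.dynamic.partition` setting, which as far as I know is off by default in these early versions. Here is a rough sketch of a shell helper that emits the statement above with that setting turned on. The function name `dynamic_partition_insert_sql` is made up for illustration; the table and column names are the ones from the wiki example.

```shell
#!/bin/sh
# Sketch only: dynamic_partition_insert_sql is a hypothetical helper name.
# It prints HiveQL for the insert above; dt stays a static partition value
# and country is resolved dynamically from the last SELECT column.
dynamic_partition_insert_sql() {
    dt="$1"   # static partition value, e.g. 2008-06-08
    cat <<EOF
SET hive.exec.dynamic.partition=true;
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='$dt', country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country;
EOF
}

# Possible usage (assumes the hive CLI is installed and the tables exist):
#   dynamic_partition_insert_sql 2008-06-08 > insert.hql
#   hive -f insert.hql
```

Because `dt` is given a fixed value and only `country` is dynamic, this should also work in the default strict mode, which expects at least one static partition column.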