If you want to write a HQL hive query and run it mulitple times from a shell script, each time passing it different data for the query, here is a quick example that should get you started.
The first thing to know is that by specifying n number of -hivevar key value pairs when invoking hive on the command line will allow you to pass that data into the hive process.
For example, if you do the following
$ hive -e 'SELECT * FROM some_table' -hivevar FOO=blah
You will have passed in a key of FOO with the value of ‘blah’ to the hive process.
A more practical example would be wanting to run the same hive query over multiple data partitions.
In this example, I’ve got a hive database that has a ‘packets’ table partitioned by hours which looks like 2014032601.
The hive query file (dest_ip_hive.sql) would look like:
SELECT packets.sourceip FROM packets
WHERE packets.destip = "${hivevar:DEST_IP}"
AND packets.hour = ${hivevar:HOUR}
GROUP BY packets.sourceip
And a shell script that would dynamically set those values for each invocation of hive would look like:
#!/bin/bash
#
# Destination IP that we are using to determine which
# packets we will examine.
#
DEST_IP="10.0.1.10"
for HOUR in 2014032209 2014032210 2014032211 2014032212
do
echo "Running hive query for HOUR $HOUR"
# Run a hive query from the command line setting variables that will be
# expaned in the .sql file.
hive -hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP \
-f dest_ip_hive.sql > ${DEST_IP}-{$HOUR}.out
done
For each hour defined in the for loop, we will execute a hive command telling it to run the query contained in the file dest_ip_hive.sql. The DEST_IP and HOUR variables that will be expanded in the query are passed to hive via the
-hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP
part of the hive command. And the output for each query will be written to a different file for each query.