This article compares BASH and JAQL for basic data management and manipulation tasks.
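The BASH examples below assume a fixed-width text file (mydata.dat, plus a lookup file mydata_lkup.dat), while the JAQL examples assume JSON records (Customers.dat). The layouts sketched here are hypothetical, chosen only so that the column offsets and field names used later are easy to follow:

# mydata.dat (hypothetical fixed-width layout: name in columns 1-12, customer id in columns 13-22, then age)
John Smith  2786586641 18
Jane Doe    2786586642 25

# Customers.dat (hypothetical JSON records read by JAQL)
[
  { "name": "John Smith", "CustomerId": 2786586641, "age": 18 },
  { "name": "Jane Doe", "CustomerId": 2786586642, "age": 25 }
]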
Selecting - All Fields
BASH
more mydata.dat
JAQL
read(hdfs("Customers.dat"));
Selecting - One Field
BASH
cut -c 13-22 mydata.dat
JAQL
$Customers = read(hdfs("Customers.dat")); $Customers -> transform { name: $.name };
Selecting - Subset
BASH
awk '{ if (substr($0,13,10) == "2786586641") print $0 }' mydata.dat
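If mydata.dat were delimited rather than fixed-width, the same subset could also be expressed with awk field variables instead of column offsets (a minimal sketch, assuming whitespace-separated fields with the customer id in field 2):

awk '$2 == "2786586641"' mydata.dat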
JAQL
$Customers = read(hdfs("Customers.dat")); $Customers -> filter $.age == 18 -> transform { name: $.name };
Sorting
BASH
sort -n -k 2 mydata.dat
JAQL
$Customers = read(hdfs("Customers.dat")); $Customers -> sort by [$.age] -> transform { name: $.name };
Join
BASH
join -1 2 -2 1 mydata_2.dat mydata_lkup.dat | less
or, if both files are already sorted and contain no unmatched values:
paste mydata_2.dat mydata_lkup.dat
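Note that join expects both inputs to be sorted on their join fields; a minimal sketch of pre-sorting before joining (the field numbers follow the command above):

sort -k 2 mydata_2.dat > mydata_2.sorted
sort -k 1 mydata_lkup.dat > mydata_lkup.sorted
join -1 2 -2 1 mydata_2.sorted mydata_lkup.sorted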
JAQL
$CustomersPurchases = join $Customers, $PurchaseOrder where $Customers.CustomerId == $PurchaseOrder.CustomerId into { $Customers.name, $PurchaseOrder.* };
Sample
BASH
head -10 mydata.dat
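head returns the first rows rather than a random sample; with GNU coreutils, shuf can draw a random sample instead (a minimal sketch):

shuf -n 10 mydata.dat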
JAQL
$Customers = read(hdfs("Customers.dat")); $Customers -> top(2);
Aggregate Analysis
BASH
awk 'BEGIN { FS = OFS = SUBSEP = " " } { arr[$2,$3]++ } END { for (i in arr) print i, arr[i] }' mydata_lkup.dat
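The same count-per-key analysis can also be written with sort and uniq -c (a minimal sketch, assuming the grouping key is whitespace-separated field 2 of mydata_lkup.dat):

awk '{ print $2 }' mydata_lkup.dat | sort | uniq -c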
JAQL
$CustomersPurchases -> group by $Customers_group = $.CustomerId into { $Customers_group, total: count($[*].POID) };
Unique
BASH
cut -c 12 mydata_lkup.dat | sort | uniq
JAQL
$CustomersPurchases -> group by $Customers_group = $.CustomerId into { $Customers_group };