Data Management Quick Comparison 2

This article compares BASH and JAQL for basic data management and manipulation tasks.

 

Selecting - All Fields

BASH

more mydata.dat

JAQL

read(hdfs("Customers.dat"));

- One Field

BASH

cut -c 13-22 mydata.dat 

JAQL

$Customers = read(hdfs("Customers.dat"));
$Customers -> transform { name: $.name };

- Subset

BASH

awk '{if (substr($0,13,10) == 2786586641) print $0}' mydata.dat

JAQL

$Customers = read(hdfs("Customers.dat"));
$Customers -> filter $.age == 18 -> transform { name: $.name };
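
As a rough illustration of the two BASH selections above, the sketch below builds a small fixed-width file and applies both. The file contents and the layout (a 10-digit number in characters 13-22) are assumptions made up for this example, not taken from the original data.

```shell
#!/usr/bin/env bash
# Hypothetical fixed-width sample: characters 13-22 hold a 10-digit number.
tmp=$(mktemp)
printf '%s\n' \
  'CUST0001    2786586641 Alice' \
  'CUST0002    1234567890 Bob' \
  'CUST0003    2786586641 Carol' > "$tmp"

# One field: characters 13-22 of every record.
one_field=$(cut -c 13-22 "$tmp")

# Subset: keep only the records whose characters 13-22 equal the wanted value.
subset=$(awk 'substr($0, 13, 10) == "2786586641"' "$tmp")

printf '%s\n' "$one_field" "$subset"
rm -f "$tmp"
```

An awk pattern with no action prints the matching record, so the explicit if/print form and this shorter form should select the same lines.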

Sorting

BASH

sort -n -t ' ' -k 3 mydata.dat

JAQL

$Customers = read(hdfs("Customers.dat"));
$Customers -> sort by [$.age] -> transform { name: $.name };
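
A minimal sketch of a numeric sort on one field, assuming a space-delimited file whose third field is an age. The sample records and the choice of delimiter are hypothetical.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed collation so the result does not depend on the locale

# Hypothetical space-delimited sample: id, name, age.
tmp=$(mktemp)
printf '%s\n' 'C1 Alice 34' 'C2 Bob 9' 'C3 Carol 18' > "$tmp"

# -t ' ' sets the field delimiter; -k 3,3n sorts numerically on field 3 only.
sorted=$(sort -t ' ' -k 3,3n "$tmp")

printf '%s\n' "$sorted"
rm -f "$tmp"
```

Without the n modifier the key would be compared as text, and 18 would sort before 9.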

Join

BASH

join -1 2 -2 1 mydata_2.dat mydata_lkup.dat | less
or (if no unmatched values and both files are sorted)
paste mydata_2.dat mydata_lkup.dat

JAQL

$CustomersPurchases = join $Customers, $PurchaseOrder
  where $Customers.CustomerId == $PurchaseOrder.CustomerId
  into {$Customers.name, $PurchaseOrder.*};
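
The join command expects both inputs to be sorted on their join fields, so a worked run usually sorts first. The sketch below assumes two small whitespace-delimited files invented for illustration: orders with the customer id in field 2, and a lookup with the customer id in field 1.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # sort and join must agree on collation order

dir=$(mktemp -d)
# Hypothetical inputs: orders (po, customer id, amount) and a lookup (customer id, name).
printf '%s\n' 'PO3 C2 75' 'PO1 C1 100' 'PO4 C9 10' 'PO2 C1 250' > "$dir/orders.dat"
printf '%s\n' 'C2 Bob' 'C1 Alice' 'C3 Carol' > "$dir/lookup.dat"

# Sort each file on its own join field.
sort -k 2,2 "$dir/orders.dat" > "$dir/orders.sorted"
sort -k 1,1 "$dir/lookup.dat" > "$dir/lookup.sorted"

# Join field 2 of the first file with field 1 of the second.
joined=$(join -1 2 -2 1 "$dir/orders.sorted" "$dir/lookup.sorted")

printf '%s\n' "$joined"
rm -rf "$dir"
```

Each output line starts with the join field, followed by the remaining fields of the first file and then of the second. Records without a match on the other side (C9 and C3 here) are dropped, which corresponds to an inner join.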

Sample

BASH

head -n 10 mydata.dat

JAQL

$Customers = read(hdfs("Customers.dat"));
$Customers -> top(2);

Aggregate Analysis

BASH

awk 'BEGIN { FS=OFS=SUBSEP=" " }{arr[$2,$3]++ }END {for (i in arr) print i,arr[i]}' mydata_lkup.dat

JAQL

$CustomersPurchases -> group by $Customers_group = $.CustomerId into
{$Customers_group, total: count($[*].POID)};
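
A small sketch of the awk counting idiom, assuming a space-delimited file in which fields 2 and 3 form the grouping key. The sample records are hypothetical, and the final sort is added only because awk's for-in loop visits keys in an unspecified order.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical space-delimited sample: record id, region, product.
tmp=$(mktemp)
printf '%s\n' 'r1 east widget' 'r2 east widget' 'r3 west gadget' 'r4 east gadget' > "$tmp"

# Count records per (field 2, field 3) pair. Setting SUBSEP to the output
# separator makes the composite key print as two ordinary fields.
counts=$(awk 'BEGIN { FS = OFS = SUBSEP = " " }
              { arr[$2, $3]++ }
              END { for (key in arr) print key, arr[key] }' "$tmp" | sort)

printf '%s\n' "$counts"
rm -f "$tmp"
```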

Unique

BASH

cut -c 12 mydata_lkup.dat | sort | uniq

JAQL

$CustomersPurchases -> group by $Customers_group = $.CustomerId into
{$Customers_group};
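
One detail worth illustrating on the BASH side: uniq only collapses adjacent duplicate lines, so unsorted input generally needs a sort first. The sketch below assumes a hypothetical fixed-width file whose 12th character is a one-letter code.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical fixed-width sample: character 12 is a category code.
tmp=$(mktemp)
printf '%s\n' 'ID000000001A' 'ID000000002B' 'ID000000003A' \
              'ID000000004C' 'ID000000005B' > "$tmp"

# uniq alone leaves non-adjacent repeats in place.
uniq_only=$(cut -c 12 "$tmp" | uniq)

# Sorting first (sort -u is shorthand for sort | uniq) yields the distinct values.
sorted_unique=$(cut -c 12 "$tmp" | sort -u)

printf 'uniq only:\n%s\nsort -u:\n%s\n' "$uniq_only" "$sorted_unique"
rm -f "$tmp"
```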