Unix/Linux Data Management

Data management using Unix/Linux is easy but it does have a few quirks. First, typical the header (the first row which contains the field names) is not contained in the data file requiring a scheme or layout of the data files. That is because most of the file operation does not recognize the first line as different from the rest of the file.  Also, complex multi table queries can only be achieved in multiple steps unlike SQL and SAS.  But the speed and efficiency of code make Unix/Linux a strong data management tool.

 
Example:
Customer.dat
1 Joe 23
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18
PurchaseOrder.dat
1 3 Fiction
2 1 Biography
3 1 Fiction
4 2 Biography
5 3 Fiction
6 4 Fiction
 
     

  This data would be stored without column names.

SELECT

less customer.dat
1 Joe 23
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18

ORDER BY

less customer.dat |sort -k 1.10-,1.12
5 Susan 18
1 Joe 23
3 Lin 34
2 Mika 45
4 Sara 56

WHERE

less customer.dat |awk {if (substr($0,3,5) == Susan )  print $0}
5 Susan 18

INNER JOIN

sort -k 2 purchaseorder.dat > srt_purchaseorder.dat
join -1 1 -2 2 customer.dat srt_purchaseorder.dat
1 Joe 23 2 Biography
1 Joe 23 3 Fiction
2 Mika 45 4 Biography
3 Lin 34 1 Fiction
3 Lin 34 5 Fiction
4 Sara 56 6 Fiction

LEFT OUTER JOIN

sort -k 2 purchaseorder.dat > srt_purchaseorder.dat
join -a1 -1 1 -2 2 customer.dat srt_purchaseorder.dat
1 Joe 23 2 Biography
1 Joe 23 3 Fiction
2 Mika 45 4 Biography
3 Lin 34 1 Fiction
3 Lin 34 5 Fiction
4 Sara 56 6 Fiction
5 Susan 18 NULL NULL

GROUP BY

join  -1 1 -2 2 customer.dat srt_purchaseorder.dat > sum.dat
awk BEGIN { FS=OFS=SUBSEP= }{arr[$2,$3]++ }END {for (i in arr) print i,arr[i]}   sum.dat
Joe 23 1
Mika 45 2
Lin 34 2
Sara 56 1

UPDATE

less customer.dat |awk {if (substr($0,3,5) == Susan )  print substr($0,0,9) 59  ; else print $0 }
1 Joe 26
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18

INSERT

cat customer.dat new_cust.dat
1 Joe 23
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18
6 Terry 50

DELETE

less customer.dat |awk {if (substr($0,1,1) != 1 )  print $0
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18

 

Leave a Reply

Your email address will not be published. Required fields are marked *