Data Management Quick Comparison 1

SAS

Selecting -All Fields

SQL

Select * from mydata;

SAS

Proc print data = mydata;

– One Field

SQL

Select col1 from mydata
Proc print data = mydata (keep = col1);

– Subset

SQL

Select * from mydata where col2 = 2786586641

SAS

Proc print data = mydata (where = (col2 =2786586641);

Sorting

SQL

select * from mydata sort by col2

SAS

Proc sort data = mydata; by col2;

Join

SQL

Select * from mydata_2 as mydt inner join mydata_lkup lkup on lkup.col1 = mydt.col2

SAS

Proc sort data = mydata; by col2;
Proc sort data = mydata_lkup; 
    by col1; 

data tmp;
     merge mydata (in = _mydt rename col2 _lnk) 
        mydata_lkup( in = _lkup rename col1 = _lnk);
             by _lnk;
run; 

Sample

SQL

Select top 10 from mydata

SAS

Proc print data = mydata (obs =10);

Aggregate Analysis

SQL

Select col2, count(*) from mydata_lkup group by col2

SAS

proc freq data = mydata_lkup;

tables col2;
run;

______or_________________

Proc sort data = mydata_lkup;
By col2;

Data agg;
    set mydata;
       by col2;
         retain count;
     If first.col2 eq 1 then do;
         Count = 0;
    end;

Count = count +1;

If last.col2;
     keep col2 count;
quit;
run;

Unique

SQL

Select distinct col2 from mydata_lkup;

SAS

Proc sort data = mydata_lkup nodupkey;
By col2;

SAS

As a statistical package SAS has few equals. It has a rich set of statistical procedures and excels at data manipulation. The examples below are for Base SAS. The EDI is ok and there is a learning version available. BASE SAS is a procedural scripting language.

Tip1

Proc SQL SAS allows you to code using both their internal proprietary data manipulation method, data steps and SQL.

Tip2

Dictionary

proc sql noprint NoErrorStop;
create table TmpVarNames as
(select OldCol.name
from dictionary.Columns as OldCol
where oldcol.libname = upper( Work )
and OldCol.memname =  upper( tmpBase )
and upper(OldCol.name) ne  upper( &DependVarible. )
and OldCol.type ne  char
and OldCol.format not in  ( DATETIME.       ) )
order by  OldCol.name ;
Tip3

Macro Variables

Variables that can be used through out your script. %let &MyMacroVar = Test;
&MyMacroVar.;
Tip4

Build Dynamic Commands

proc sql noprint NoErrorStop;
select %quote(%)AddWieghts( || name || ,
&WeightVariable. , tmpBase )   Into :RunTest0 separated by ;

from  TmpVarNames;
quit;
&RunTest0.;

Tip5

Macros You can create functions or macros of operations  often repeated in your code.

%macro MyMacro();
%Put ‘Hello’;
%end;
%MyMacro;
%MyMacro;
%MyMacro;
%MyMacro;
Tip6

Options There are a number of system options you can control in your code.

Option obs = 100;

Sets the max observation for all datasets to 100;

Tip7 SasAutos One of the most useful options is to set  the path for a common macro library.  Now when searching for macros called within you script SAS will include this path.

Options SasAutos = ( c:\StatMacros\ ) ;
Tip8

View Macros in Memory

proc sql noprint;
select %let ||compress(name)|| =
into : resetmv separated by ;
from dictionary.macros
where scope= GLOBAL ;
quit;
Tip9

ODS ODS allows you to out put all results as HTML.

ods html;
<<My code here>>
odo html close;
Tip10

Display a Message

%window start
#5  @28  &TxtMessage ;
%display start;
Tip11

Find Error

%if &syserr > 4 %then %do;
%end;
Tip12

Halt Code The code below puts a %macro into the code stream therefore ignoring all subsequent code until %mend is typed in.

%if &syserr > 4 %then %do;
%unquote(%nrstr(%macro noexecdummy;));
%end;

SAS Data

It is a useful exercise to solve the same problems in SAS as we did in SQL using the dataset procedure.

Below is an example is the same as in the SQL section. It is a database containing two tables, customers and purchase order. The Customer table has customerid (the unique identifier for the customer), name (The customer’s name) and age (the customer’s age). The PurchaseOrder table has POID (the unique id for the purchase order), customerid (which refers back to the customer who made the purchase) and Purchase (what the purchase was).

Example:

 

Customer
CustomerID Name Age
1 Joe 23
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18
PurchaseOrder
POID CustomerID Purchase
1 3 Fiction
2 1 Biography
3 1 Fiction
4 2 Biography
5 3 Fiction
6 4 Fiction

 

SET

 

DATA output

SET MyDataBase.Customer;

RUN:

 

 

CustomerID Name Age
1 Joe 23
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18

 

PROC SORT

 

DATA output

SET MyDataBase.Customer;

RUN:

PROC SORT DATA = output

BY age;

RUN;

 

CustomerID Name Age
5 Susan 18
1 Joe 23
3 Lin 34
2 Mika 45
4 Sara 56

 

IF

 

DATA output

SET MyDataBase.Customer;

IF age eq 18;

RUN:

CustomerID Name Age
5 Susan 18

 

MERGE

 

PROC SORT DATA = MyDataBase.Customer

BY CustomerID;

RUN;

 

PROC SORT DATA = MyDataBase.PurchaseOrder

BY CustomerID;

RUN;

 

DATA output

MERGE MyDataBase.Customer (in = a)

MyDataBase.PurchaseOrder (in = b);

By CustomerID;

if a and b;

RUN:

CustomerID Name Age POID Purchase
1 Joe 23 2 Biography
1 Joe 23 3 Fiction
2 Mika 45 4 Biography
3 Lin 34 1 Fiction
3 Lin 34 5 Fiction
4 Sara 56 6 Fiction

 

MERGE with a condition

 

PROC SORT DATA = MyDataBase.Customer

BY CustomerID;

RUN;

 

PROC SORT DATA = MyDataBase.PurchaseOrder

BY CustomerID;

RUN;

 

DATA output

MERGE MyDataBase.Customer (in = a)

MyDataBase.PurchaseOrder (in = b);

By CustomerID;

if a;

RUN:

CustomerID Name Age POID Purchase
1 Joe 23 2 Biography
1 Joe 23 3 Fiction
2 Mika 45 4 Biography
3 Lin 34 1 Fiction
3 Lin 34 5 Fiction
4 Sara 56 6 Fiction
5 Susan 18 NULL NULL

 

FIRST (Group BY)

 

proc freq data = mydata_lkup;

tables col2;

______________or__________________

PROC SORT DATA = output;

BY CustomerID;

RUN;

 

DATA outputSummary;

Do UNTIL (LAST.CustomerID);

SET output ;

BY CustomerID;

orders = orders + 1;

IF LAST.CustomerID THEN OUTPUT;

END;

KEEP Name Age Orders;

RUN;

Name Age Orders
Joe 23 1
Mika 45 2
Lin 34 2
Sara 56 1

 

UPDATE

 

DATA Customer;SET Customer;

If CustomerID = 1

Age = 26;

Run;

Customer
CustomerID Name Age
1 Joe 26
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18

 

INSERT

 

DATA temp;

DO i = 1 to 1;

CustomerID = 6;

Name = Terry ;

Age = 50;

OUTPUT;

END;

RUN;

 

DATA Customer;

SET Customer temp;

RUN;

Customer
CustomerID Name Age
1 Joe 23
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18
6 Terry 50

 

DELETE

 

DATA Customer;SET Customer;

If CustomerID = 1 DELETE;

Run;

Customer
CustomerID Name Age
2 Mika 45
3 Lin 34
4 Sara 56
5 Susan 18