Scalability of Table Lookup Techniques(摘录)

author:Rick Langston, SAS Institute Inc., Cary, NC

 

 

THE FIVE TECHNIQUES
For the purposes of comparing resource utilization, we created a SAS data set of key/label pairs that consists of
incrementing numbers from 1 to n, with labels being the number using the Z8. format. For example, the key value of 1
had the label 00000001. The base table is the SAS data set that contains the keys to look up, and the lookup table is
the SAS data set that contains the key/label pairs. For simplicity, we used the same table for the base table and the
lookup table. The base table was sorted using a random value to properly exemplify real-world scenarios where the
key is not in sorted order in the base table. For further simplicity, the base table used start and label as variable
names so that it could be treated as a CNTLIN= data set.


Using the base table named temp, we tried five standard techniques for table lookup. The following SAS code was
used for each technique:

 

/*-----PROC FORMAT and PUT function-----*/
proc format cntlin=temp; run;
data _null_; set temp(rename=(label=shouldbe)) end=eof;
    label=put(start,$testfmt.);
    if label=shouldbe then matched+1;
    if eof;
    if matched=_n_ then put 'all matched';
    else put 'not all matched';
run;

/*-----hash object-----*/
data _null_; set temp(rename=(label=shouldbe)) end=eof;
      length label $20;
      retain label ' ';

      if _n_=1 then do;
          declare hash ht(dataset:"temp");
          ht.defineKey("start");
          ht.defineData("label");
          ht.defineDone();
          end;

     rc = ht.find();
     if rc = 0 then do;
         if label=shouldbe then matched+1;
         end;
     if eof;
     if matched=_n_ then put 'all matched';
     else put 'not all matched';
run;

/*-----merge-----*/
proc sort data=temp out=temp2(drop=random rename=(label=shouldbe)); by start;
run;
proc sort data=temp out=lookup(drop=random); by start;
run;
data _null_; merge temp2(in=want) lookup end=eof; by start;
       if label=shouldbe then matched+1;
       if eof;
       if matched=_n_ then put 'all matched';
       else put 'not all matched';
run;

/*-----key= usage-----*/
data lookup(index=(start)); set temp(keep=start label);
      run;
data _null_; set temp(keep=start label rename=(label=shouldbe)) end=eof;
      set lookup key=start;
      if label=shouldbe then matched+1;
      if eof;
      if matched=_n_ then put 'all matched';
      else put 'not all matched';
run;


/*-----SQL inner join-----*/
proc sql;
      create table merged as
            select label,shouldbe from temp a inner join temp(rename=(label=shouldbe)) b
                 on a.start = b.start;
quit;

data _null_; set merged end=eof;
     if label=shouldbe then matched+1;
     if eof;
     if matched=_n_ then put 'all matched';
     else put 'not all matched';
run;

 

 

后记

还有几篇文章对此作了阐述,只有细微的不同:

 

http://www2.sas.com/proceedings/forum2008/095-2008.pdf

http://www2.sas.com/proceedings/sugi27/p011-27.pdf

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值