Topic
Creating a character variable from another character variable using the function put: Avoiding truncation due to values not available in the format
Keywords
*function #put *format #proc format #(default=)
#other
Data
data lb;
   length lbtest $40;
   lbtest='Hematocrit'; output;
   lbtest='Leukocytes'; output;
   lbtest='HDL Cholesterol/Total Cholesterol'; output;
run;
Program
proc format;
   value $lbtest   'Hematocrit'     ='HCT'
                   'Leukocytes'     ='WBC'
                   'HDL Cholesterol'='HDL';
   value $lbtestcd 'HCT'='Hematocrit'
                   'WBC'='Leukocytes'
                   'HDL'='HDL Cholesterol';
run;

data lb_new;
   set lb;
   length lbtestcd $40;
   lbtestcd=put(lbtest,$lbtest.);
run;
The variable lbtestcd is created using the variable lbtest and the format $lbtest.
The goal is to get the short name of the laboratory test from the long one.
- Hematocrit becomes HCT
- Leukocytes becomes WBC
- The value "HDL Cholesterol/Total Cholesterol" is not in the format $lbtest. It becomes "HDL", which has a different meaning: according to the $lbtestcd. format, HDL means HDL Cholesterol.
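The change of meaning described above can be reproduced in isolation. The following is a minimal sketch, assuming the two formats from the Program section; the variable names code and roundtrip are illustrative and do not appear in the original program.

```sas
/* Same two formats as in the Program section */
proc format;
   value $lbtest   'Hematocrit'     ='HCT'
                   'Leukocytes'     ='WBC'
                   'HDL Cholesterol'='HDL';
   value $lbtestcd 'HCT'='Hematocrit'
                   'WBC'='Leukocytes'
                   'HDL'='HDL Cholesterol';
run;

data _null_;
   length lbtest $40;
   lbtest = 'HDL Cholesterol/Total Cholesterol';
   /* No match in $lbtest.: the value is copied and cut to the default width */
   code = put(lbtest, $lbtest.);
   /* The truncated value now matches the entry 'HDL' in $lbtestcd. */
   roundtrip = put(code, $lbtestcd.);
   /* Write the three values to the log for comparison */
   put lbtest= / code= / roundtrip=;
run;
```

In the log, roundtrip should read HDL Cholesterol rather than the original ratio, which is the silent change of meaning the exercise is about.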
a) Task
Display the label width and the default label width of the format $lbtest.
- Using the fmtlib option
- By saving format details in a dataset.
Possible Solution 1
proc format fmtlib; select $lbtest; run;
Possible Solution 2
proc format cntlout=lbtest;
   select $lbtest;
run;

proc print data=lbtest noobs;
   var fmtname start label min max length default;
run;
The format label width is the maximum number of characters observed in the LABEL variable.
The default format label width is equal to the format label width unless another value is specified when creating the format. In this example, the default format label width is 3.
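As a cross-check of the two widths, the following sketch recomputes the longest label from the CNTLOUT dataset and writes it to the log next to the stored DEFAULT value. The dataset name lbtest_fmt and the variable observed are illustrative choices, not part of the original program; the format is re-created at the top only so that the steps can run on their own.

```sas
/* Re-create the format so that the example runs on its own */
proc format;
   value $lbtest 'Hematocrit'     ='HCT'
                 'Leukocytes'     ='WBC'
                 'HDL Cholesterol'='HDL';
run;

/* Save the format details (lbtest_fmt is a hypothetical dataset name) */
proc format cntlout=lbtest_fmt;
   select $lbtest;
run;

data _null_;
   set lbtest_fmt end=last;
   retain observed 0;
   /* Longest label seen so far, ignoring trailing blanks */
   observed = max(observed, length(label));
   /* On the last row, compare it with the default width stored with the format */
   if last then put fmtname= observed= default=;
run;
```

For this format both numbers should be 3, since no (default=) value was given when the format was created.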
b) Question
Why does "HDL Cholesterol/Total Cholesterol" become "HDL"?
Possible Solution
The default format label width, 3, is used in the put function because no other width is set to override the default value.
Given that no entry in the format matches the value "HDL Cholesterol/Total Cholesterol", the whole text is copied over, but only the first three characters are kept due to the format label width.
Declaring lbtestcd with a length of 40 does not solve the issue, because the truncation happens inside the put function, before the assignment. The variable length could only shorten the string further, if it were smaller than the format label width used.
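The point about the variable length can be checked with a short test. This is a sketch under the assumption that the format is defined as in the Program section; the variable names raw, len_lbtestcd and len_raw are introduced here for illustration only.

```sas
/* Same format as in the Program section, no (default=) value */
proc format;
   value $lbtest 'Hematocrit'     ='HCT'
                 'Leukocytes'     ='WBC'
                 'HDL Cholesterol'='HDL';
run;

data _null_;
   length lbtest lbtestcd $40;
   lbtest = 'HDL Cholesterol/Total Cholesterol';
   /* The result of put is already cut to 3 characters before it is stored */
   lbtestcd = put(lbtest, $lbtest.);
   /* Without a length statement, raw takes its length from the format width */
   raw = put(lbtest, $lbtest.);
   /* vlength returns the declared length of each variable */
   len_lbtestcd = vlength(lbtestcd);
   len_raw      = vlength(raw);
   put lbtestcd= len_lbtestcd=;
   put raw= len_raw=;
run;
```

Both variables should contain HDL. Only their declared lengths differ (40 versus 3), which shows that the length statement has no influence on what put returns.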
c) Task
Update the program in order to display the full test name in the variable lbtestcd when no match is available in the format.
- Change the default format label width when creating the format, or
- Change the format label width given in the put function
Possible Solution 1
proc format;
   value $lbtest (default=40)
                 'Hematocrit'     ='HCT'
                 'Leukocytes'     ='WBC'
                 'HDL Cholesterol'='HDL';
run;

data lb_new;
   set lb;
   *length lbtestcd $40;
   lbtestcd=put(lbtest,$lbtest.);
run;
Possible Solution 2
The default width of the format label can be overridden by specifying a width when using the format in the put function.
data lb_new;
   set lb;
   *length lbtestcd $40;
   lbtestcd=put(lbtest,$lbtest40.);
run;
d) Task
Update the program in order to display the text "To Check" when no match is available in the format.
Possible Solution
proc format;
   value $lbtest 'Hematocrit'     ='HCT'
                 'Leukocytes'     ='WBC'
                 'HDL Cholesterol'='HDL'
                 other            ='To Check';
run;

data lb_new;
   set lb;
   length lbtestcd $40;
   lbtestcd=put(lbtest,$lbtest.);
run;
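Once unmatched values are labelled "To Check", they can be listed for review. The following is a small sketch of that idea, not part of the original solution; the message text "Needs review:" and the two test values in the loop are illustrative.

```sas
/* Format from the solution above, with a catch-all label */
proc format;
   value $lbtest 'Hematocrit'     ='HCT'
                 'Leukocytes'     ='WBC'
                 'HDL Cholesterol'='HDL'
                 other            ='To Check';
run;

data _null_;
   length lbtest $40;
   /* One value with a match and one without */
   do lbtest = 'Hematocrit', 'HDL Cholesterol/Total Cholesterol';
      lbtestcd = put(lbtest, $lbtest.);
      /* Report only the values that fell into the other category */
      if lbtestcd = 'To Check' then put 'Needs review: ' lbtest;
   end;
run;
```

Only the ratio should be reported in the log. Note that the longest label is now "To Check", so the default format label width becomes 8 instead of 3.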