Wednesday, January 9, 2008

Missing Values in SAS


Numeric missing values are represented by a single period (.).
Character missing values are represented by a single blank enclosed in quotes (' ').
Special numeric missing values are represented by a single period followed by a single letter or an underscore (for example .A, .S, .Z, ._).
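As a minimal sketch of the three representations above, the following data step assigns each kind of missing value. The data set name (example) and the variable names are made up for illustration.

```sas
data example;
  length name $ 8;   /* declare name as a character variable */
  score  = .;        /* ordinary numeric missing value */
  reason = .R;       /* special numeric missing value */
  name   = ' ';      /* character missing value */
run;

proc print data=example;
run;
```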
Special missing values
These are only available for numeric variables and are used for distinguishing between different types of missing values.
Responses to a questionnaire, for example, could be missing for one of several reasons (refused, illness, dead, not home). By using special missing values, each of these can be tabulated separately, but the variables are still treated as missing by SAS in data analysis.
data survey;
  /* read the letters A, I and R in the data lines as .A, .I and .R */
  missing A I R;
  input id q1;
  cards;
8401 2
8402 A
8403 1
8404 1
8405 2
8406 3
8407 A
8408 1
8409 R
8410 2
;
proc format;
  value q1f
    .A='Not home'
    .R='Refused'
  ;
run;
proc freq data=survey;
  /* missprint shows the missing categories in the frequency table */
  table q1 / missprint;
  format q1 q1f.;
run;
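Sort order for missing values
There is a serious logic error in the following code:
if age < 20 then agecat=1;
else if age < 50 then agecat=2;
else if age ge 50 then agecat=3;
else if age=. then agecat=9;
SAS treats every missing value as smaller than any number, so a missing age satisfies the first test (age < 20) and is assigned agecat=1. The final test, age=., is never reached.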
Sort order   Symbol   Description
smallest     ._       underscore
             .        period
             .A-.Z    special missing values A (smallest) through Z (largest)
             -n       negative numbers
             0        zero
largest      +n       positive numbers
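One way to check this ordering yourself is to sort a small data set that contains each kind of value. This is only a sketch: the data set name (sortdemo) and the variable x are hypothetical. After the sort, the printed rows should follow the order given in the sort order table.

```sas
data sortdemo;
  /* one observation per kind of value, deliberately out of order */
  x = 1;  output;
  x = .A; output;
  x = 0;  output;
  x = ._; output;
  x = -1; output;
  x = .;  output;
  x = .Z; output;
run;

proc sort data=sortdemo;
  by x;   /* ascending: missing values come before all numbers */
run;

proc print data=sortdemo;
run;
```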
Working with missing values
When transforming or creating SAS variables, the first part of the code should deal with the case where variables are missing.
if age=. then agecat=.;
else if age < 20 then agecat=1;
else if age < 50 then agecat=2;
else agecat=3;
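Note that if you use special missing values, 'if age=.' matches only the ordinary missing value (.), so it cannot be used to catch every missing value. Use 'if age le .Z' instead: .Z is the largest missing value, so this test is true for all of them.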
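Here is a sketch of the age recode written so that it also copes with special missing values. It uses the MISSING function, which is true for ordinary and special missing values alike and is equivalent here to 'age le .Z'. The data set names (people, agecats) and the test values are made up for illustration.

```sas
data people;               /* hypothetical test data */
  age = 15; output;
  age = .;  output;
  age = .R; output;
  age = 62; output;
run;

data agecats;
  set people;
  /* handle every kind of missing value first */
  if missing(age) then agecat = .;     /* same as: if age le .Z */
  else if age < 20 then agecat = 1;
  else if age < 50 then agecat = 2;
  else agecat = 3;
run;

proc print data=agecats;
run;
```

With the missing case handled first, the observations with age=. and age=.R both end up with a missing agecat rather than falling into category 1.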
Note that any arithmetic operation involving a missing value returns a missing value. In the following example, the variable total will be missing if any one of q1-q6 is missing.
total=q1+q2+q3+q4+q5+q6;
An alternative is to use:
total=sum(of q1-q6);
The SUM function ignores missing values, which has the same effect as treating them as zero. The result is missing only if all of q1-q6 are missing.
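The difference between the two approaches can be seen in a small sketch like the one below, which uses three made-up variables instead of six. The NMISS function is included because it counts how many of the arguments are missing, which helps you decide whether a total based on partial data is acceptable.

```sas
data _null_;
  q1 = 1;
  q2 = .;
  q3 = 3;
  total_plus = q1 + q2 + q3;        /* missing, because q2 is missing */
  total_sum  = sum(of q1-q3);       /* 4, q2 is ignored */
  n_missing  = nmiss(of q1-q3);     /* 1 */
  put total_plus= total_sum= n_missing=;
run;
```

The log should show total_plus=. total_sum=4 n_missing=1, along with a note that missing values were generated by the addition.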
Even if you think that a variable should not contain any missing values, you should always write your code under the assumption that there may be missing values.
