Scripta Scientifica Pharmaceutica

FUZZIFIED DUMMY VARIABLES HAVE A BETTER FIT THAN REGULAR DUMMY VARIABLES IN STATISTICAL MODELS – AN EXEMPLARY APPROACH BASED ON DATA FROM A GERMAN NUTRITION SURVEY

Kurt Gedrich

Abstract

Introduction: In statistical analyses, classifying quantitative independent (explanatory) variables gives substantial flexibility when modelling their effect on an outcome variable. An observation's membership in a specific class (e.g. a subject's age group) is traditionally coded by dummy variables that take only the values zero or one. This, however, produces undesired breaks at the boundary between two classes (a staircase function).

Materials and Methods: With regular dummy variables (rDVs), an observation can belong to only one class (e.g. a subject can be assigned to only one age group). With fuzzified dummy variables (fDVs), in contrast, an observation is assigned simultaneously to two neighboring classes. The closer the value of the quantitative variable is to the center of a class, the higher its degree of membership in that class (maximum value: one). At the boundary between two classes, the degree of membership in each of them is 0.5.
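
The abstract does not state the exact shape of the membership function. One shape that satisfies the properties described (membership 1 at a class center, 0.5 at the boundary between two neighboring classes, memberships summing to one) is piecewise linear between adjacent class centers $c_1 < c_2 < \dots < c_K$. The following sketch assumes this shape and assumes that each class boundary lies midway between the two neighboring centers (as with equal-width age groups):

$$
\mu_k(x) =
\begin{cases}
\dfrac{x - c_{k-1}}{c_k - c_{k-1}}, & c_{k-1} \le x \le c_k,\\[6pt]
\dfrac{c_{k+1} - x}{c_{k+1} - c_k}, & c_k \le x \le c_{k+1},\\[6pt]
0, & \text{otherwise},
\end{cases}
\qquad \mu_1(x) = 1 \ \text{for } x \le c_1, \quad \mu_K(x) = 1 \ \text{for } x \ge c_K .
$$

For $c_k \le x \le c_{k+1}$ only $\mu_k$ and $\mu_{k+1}$ are non-zero, and $\mu_k(x) + \mu_{k+1}(x) = 1$; at the midpoint $x = (c_k + c_{k+1})/2$ both equal $1/2$. A regular dummy, by contrast, is $d_k(x) = 1$ if $x$ falls in class $k$ and $d_k(x) = 0$ otherwise.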

As with rDVs, the fDVs of an observation sum to exactly one. Because fuzzification does not change the number of dummy variables needed to classify a quantitative variable, the degrees of freedom of the statistical analysis remain unchanged.
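
Both properties can be checked with a minimal NumPy sketch. It assumes piecewise-linear memberships between class centers, class boundaries midway between neighboring centers, and hypothetical age-group centers that are not taken from the survey; the helper names `regular_dummies` and `fuzzy_dummies` are illustrative only. Each fDV row sums to one, and the fDV and rDV design matrices have the same number of columns, so a regression on either set estimates the same number of coefficients.

```python
import numpy as np


def regular_dummies(x, centers):
    """Hypothetical helper: 0/1 class indicators.

    Assumes class boundaries lie midway between neighboring centers.
    """
    centers = np.asarray(centers, dtype=float)
    edges = (centers[:-1] + centers[1:]) / 2
    idx = np.digitize(x, edges)  # class index 0..K-1 for each observation
    return np.eye(len(centers))[idx]


def fuzzy_dummies(x, centers):
    """Hypothetical helper: piecewise-linear memberships.

    Membership is 1 at a class center and 0.5 midway between two centers.
    np.interp interpolates the k-th unit vector over the centers and clamps
    to the end values outside [centers[0], centers[-1]].
    """
    centers = np.asarray(centers, dtype=float)
    K = len(centers)
    return np.column_stack([np.interp(x, centers, np.eye(K)[k]) for k in range(K)])


# Hypothetical age-group centers (years), chosen for illustration only.
centers = [25, 35, 45, 55, 65]
age = np.array([20, 25, 30, 37.5, 50, 70])

R = regular_dummies(age, centers)
F = fuzzy_dummies(age, centers)
print(F.round(2))
print(R.shape == F.shape)             # True: same number of dummy columns
print(np.allclose(F.sum(axis=1), 1))  # True: memberships sum to one per observation
```

At age 30, midway between the hypothetical centers 25 and 35, the fDV row is [0.5, 0.5, 0, 0, 0], whereas the rDV row places the subject entirely in the second group.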

The effect of fuzzifying dummy variables is analyzed, by way of example, for the association between age and body mass index (BMI) in a subsample (n=1500) of a German nutrition survey.

Results: Models with fDVs consistently show a better fit (i.e. higher R² values) than those with rDVs. The difference is more pronounced the fewer age groups are used.
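
The direction of this result can be illustrated on simulated data. The sketch is not a reproduction of the survey analysis: it uses a hypothetical, noise-free linear age trend instead of real BMI values, equal-width age groups, and an assumed piecewise-linear membership shape (the abstract does not specify one). Both models are fitted by ordinary least squares without a separate intercept, since each set of dummies already sums to one per row.

```python
import numpy as np


def regular_dummies(x, centers):
    # Assumption: class boundaries lie midway between neighboring centers.
    edges = (centers[:-1] + centers[1:]) / 2
    return np.eye(len(centers))[np.digitize(x, edges)]


def fuzzy_dummies(x, centers):
    # Assumption: piecewise-linear membership, 1 at each center, clamped at the ends.
    K = len(centers)
    return np.column_stack([np.interp(x, centers, np.eye(K)[k]) for k in range(K)])


def r_squared(X, y):
    # No separate intercept: the dummy columns sum to one per row,
    # so the constant is already in the column space of X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Simulated toy data (NOT the survey data): a smooth, noise-free age trend.
age = np.linspace(20, 70, 501)
bmi = 22 + 0.08 * (age - 20)

results = {}
for K in (3, 5, 10):
    width = (age.max() - age.min()) / K
    centers = age.min() + width * (np.arange(K) + 0.5)  # equal-width age groups
    r2_r = r_squared(regular_dummies(age, centers), bmi)
    r2_f = r_squared(fuzzy_dummies(age, centers), bmi)
    results[K] = (r2_r, r2_f)
    print(f"K={K:2d}  R2 rDV={r2_r:.4f}  R2 fDV={r2_f:.4f}")
```

In this toy setting the fDV model attains the higher R² for every number of groups, and the gap narrows as more groups are used, which mirrors the pattern reported for the survey data.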

In scatterplots of mean predicted BMI against age, fDVs do not show the staircase functions seen with rDVs but rather smooth transitions across age groups.

Conclusion: Statistical modelling with fDVs appears to yield a better fit than modelling with rDVs. fDVs might help to avoid artifacts in data analyses that can be caused by inappropriate staircase functions.


Keywords

statistical modelling, dummy variables, fuzzy logic




DOI: http://dx.doi.org/10.14748/ssp.v4i1.3978
