Mixed Variable Discriminant Analysis With Zero Cell Frequency



When the data available for discriminant analysis are of the mixed-variable type, the common procedure assigns codes to the possible states of the discrete variables and proceeds with the analysis as if all the data were continuous. This may lead to loss of information. The Location Model (LM), proposed by Olkin and Tate and developed by Krzanowski, combines these two data types. Its disadvantages include the large number of parameters to be estimated when there are many discrete variables, the inability to perform the analysis when one or more cells have zero frequency, and a limit on the number of discrete variables that can be handled. Therefore, the aim of this study is to propose a modification of the LM, called the Modified Location Model (MLM), particularly for the case when one or more cells of the resulting contingency table have zero frequency. The objectives were to: (i) propose a model, the MLM, for developing a discrimination procedure for mixed variables; (ii) derive an estimator of the variance-covariance matrix for the mixed-variable case; (iii) derive an estimator of the vector of means and of the cell probabilities in the presence of empty cell(s); and (iv) compare the proposed procedure with two existing methods, namely the LM and the Fisher Linear Discriminant Function (FLDF).

The MLM procedure was obtained by first estimating the variance-covariance matrices of each cell of the two groups. When one or more cells have zero frequency, it uses the Independent Binary Model (IBM) to estimate the cell probabilities, the vector of means and the variance-covariance matrix for the empty cell(s). Simulated data were analyzed for several combinations of numbers of discrete and continuous variables, including states within variables that were cross-classified, yielding some empty cells. Error rates, sensitivity and specificity were used as performance criteria. Three sets of real-life data were used to validate the results obtained from the simulation study.

Findings from this study were that:
i. the MLM procedure for mixed variables in discriminant analysis was obtained;
ii. a procedure for estimating the variance-covariance matrix when there are empty cells was obtained;
iii. the IBM, for estimating the cell probabilities and the mean vector when there are empty cells, was obtained;
iv. the proposed MLM gave higher classification accuracy than the LM over all cases considered;
v. the MLM was better than both the FLDF procedure and the LM for small sample sizes in terms of classification accuracy;
vi. the MLM and the FLDF procedure performed closely, and better than the LM, in terms of specificity and sensitivity; and
vii. the MLM performed better than both the LM and the FLDF procedure when validated with real-life data.

The study concluded that the proposed Modified Location Model was feasible, applicable and performed better than both the FLDF and LM procedures based on error rate, when the data available for discriminant analysis are of the mixed-variable type. The MLM is useful when one or more unique response patterns of the discrete variables have zero frequency. It is therefore recommended for discriminant analysis of mixed variables, especially when there are many discrete variables.
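The empty-cell step in the method description can be illustrated with a small sketch. The abstract does not give the exact form of the Independent Binary Model, so the code below assumes the simplest reading: the binary variables are treated as independent within a group, and the probability of a cell is the product of the marginal proportions of its component values. The function names (`cell_index`, `cell_counts`, `independence_cell_probs`) and the toy data are hypothetical, and the sketch covers only the cell probabilities, not the mean vector or variance-covariance matrix of an empty cell.

```python
from itertools import product


def cell_index(x):
    """Map a tuple of binary values to a cell number, e.g. (1, 0) -> 2."""
    idx = 0
    for bit in x:
        idx = 2 * idx + bit
    return idx


def cell_counts(binary_rows, q):
    """Count the observations falling in each of the 2**q cells."""
    counts = [0] * (2 ** q)
    for row in binary_rows:
        counts[cell_index(row)] += 1
    return counts


def independence_cell_probs(binary_rows, q):
    """Cell probabilities as products of per-variable marginal proportions.

    Assumption (not taken from the source): the binary variables are
    independent within the group, so an empty cell still receives a
    non-zero probability whenever each of its component values was
    observed somewhere in the sample.
    """
    n = len(binary_rows)
    marginals = [sum(row[j] for row in binary_rows) / n for j in range(q)]
    probs = []
    # product() enumerates cells in the same order as cell_index().
    for cell in product((0, 1), repeat=q):
        p = 1.0
        for j, bit in enumerate(cell):
            p *= marginals[j] if bit == 1 else 1.0 - marginals[j]
        probs.append(p)
    return probs


# Toy group with two binary variables; the cell (1, 1) is never observed.
rows = [(0, 0), (0, 0), (0, 1), (1, 0)]
counts = cell_counts(rows, q=2)
probs = independence_cell_probs(rows, q=2)
empty_cells = [i for i, c in enumerate(counts) if c == 0]
```

In this toy group the raw frequency estimate for cell (1, 1) would be zero, which is the situation the abstract says prevents the LM analysis. The independence estimate is positive and the four cell probabilities still sum to one.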
