Application Of Data Mining Technology To Identify Significant Patterns In Census Or Survey Data The Case Of 2001 Child Labor Survey In Ethiopia



Knowledge and understanding of a problem is always the first step in identifying effective solutions. Child labor is both a sign and a cause of poverty and should be eliminated as soon as possible. In Ethiopia, there is not much statistical data on child labor practice. To fill this data gap, the FDRE CSA carried out a countrywide child labor survey in 2001. This organization uses very simple statistical tools to show summary figures for the different variables in the 2001 child labor survey database. However, traditional statistical methods are not good enough to discover complex relationships in large databases. The inefficiency of these tools necessitated the development of more powerful methods and techniques that can be used to study relationships and patterns in the large volumes of data collected, for example, for census and survey purposes. In the developed world, government and non-government organizations which have access to censuses and surveys are making use of a relatively new and modern technology, data mining, to identify important patterns and relationships within the data accumulated in large databases.

The application of data mining techniques to official data such as the 2001 child labor survey has great potential in supporting good public policy. This research focused on identifying relationships between attributes within the 2001 child labor survey database that can be used to clearly understand the nature of the child labor problem in Ethiopia. The goal of the data mining process in this research was therefore to identify interesting patterns and relationships in the 2001 child labor database.

After the identification and understanding of the problem domain and the research objectives, the remaining stages of the research project focused on the following three major phases of the data mining process. During the first phase, the major tasks were selecting an appropriate data mining tool that can be used to attain the defined data mining goal and selecting the target dataset used in model building. The next phase, data cleaning and preparation, involved identifying and correcting mis-transmitted information, consolidating and combining records, transforming data from one form to another suitable for the selected data mining tool, handling missing attributes, and selecting relevant attributes for generating meaningful association rules. As a final step of data preparation, the selected dataset was categorized into five classes using the expectation maximization clustering algorithm implemented in Knowledge Studio version 3.0. A dataset of 2398 records with 63 attributes was used for clustering.

Apriori is an association rule algorithm which is implemented in the Weka software. In the third phase, model building and evaluation, the Apriori algorithm was used to generate association rules from the clustered as well as the non-clustered selected dataset. Different attributes were given to Apriori in an effort to generate meaningful rules.

The results from this study were encouraging, which strengthened the hypothesis that interesting patterns can be generated from census and survey databases by applying one of the data mining techniques: association rule mining.

Key words: Data mining, knowledge discovery, association rule, apriori algorithm
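
To illustrate the association rule mining mentioned in the abstract, the following is a minimal sketch of the Apriori idea in plain Python. It is not the Weka implementation used in the study, and the records, attribute names, and thresholds are hypothetical values chosen only for illustration; they are not taken from the 2001 child labor survey.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: build (k+1)-itemsets from pairs of frequent k-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always available in `frequent`.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical toy records: each record is a set of attribute=value items.
records = [
    {"residence=rural", "school=no", "work=farming"},
    {"residence=rural", "school=no", "work=farming"},
    {"residence=rural", "school=yes", "work=farming"},
    {"residence=urban", "school=yes", "work=domestic"},
    {"residence=urban", "school=yes", "work=none"},
]

frequent = apriori(records, min_support=0.4)
found = association_rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence in found:
    print(sorted(antecedent), "=>", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f}")
```

Encoding each survey answer as an attribute=value item is one common way to feed categorical records to Apriori. A real run on a dataset of the size described above would normally rely on an existing implementation rather than this sketch.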
