PCST Network

Public Communication of Science and Technology

 

A new data mining model for cancer classification with minimum gene features

R Mallika, Sri Ramakrishna College of Arts and Science for Women

Over the past decade, bioinformatics has become a fast-growing research field in the health sector, driven by the advent of microarray technology. Among the many active areas of microarray research, gene expression classification has been a hot topic in recent years and has attracted researchers from fields such as data mining, machine learning, and statistics.

Gene expression data analysis plays a vital role in medical diagnosis and drug discovery. Given the large volume of gene expression data now available, its potential for cancer classification needs to be explored. Many methods have been proposed with promising results. Earlier studies have combined statistical gene selection techniques, an integral pre-processing step for classification, with a small number of supervised classification methods. Work on efficient classification algorithms for cancer gene expression data has grown rapidly in the health sector in recent years. One particular application of data mining algorithms to microarray data is cancer research, with the goal of early diagnosis. In the machine learning community, supervised learning means building predictive models from gene expression measurements of individuals whose class membership is known.

This paper presents a novel supervised method for cancer classification and prediction. The proposed framework classifies samples and predicts future outcomes in four stages:
1. In the first stage, the database is pre-processed: it is randomly divided into training and test sets, noise is removed, missing data are estimated, and individual features (genes) are ranked.
2. In the second stage, all possible subsets of features are generated and the features are ranked pairwise.
3. In the third stage, all important gene pairs that achieve zero training error with the best classifier are extracted.
4. In the fourth stage, classification and prediction are performed.

The method was found to be efficient in reducing the number of genes needed to best predict the type of cancer, with reduced complexity and computational burden.
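
The four stages above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. The abstract does not name the ranking criterion, the missing-data estimator, or the "best classifier", so the signal-to-noise score, mean imputation, nearest-centroid classifier, shortlist of four top-ranked genes, and majority vote used here are all stand-ins. The function names and the toy data are hypothetical, and noise removal is omitted.

```python
import itertools
import random
import statistics


def split(rows, labels, test_fraction, rng):
    """Stage 1a: random division into training and test sets."""
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)

    def take(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return take(idx[n_test:]), take(idx[:n_test])


def gene_means(rows):
    """Per-gene mean over the observed (non-None) values."""
    return [statistics.fmean([v for v in col if v is not None]) for col in zip(*rows)]


def impute(rows, means):
    """Stage 1b: missing data estimation by mean imputation (assumed method)."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def rank_genes(rows, labels):
    """Stage 1c: rank genes by |mean0 - mean1| / (sd0 + sd1), best first (assumed score)."""
    scored = []
    for g, col in enumerate(zip(*rows)):
        a = [v for v, y in zip(col, labels) if y == 0]
        b = [v for v, y in zip(col, labels) if y == 1]
        spread = (statistics.pstdev(a) + statistics.pstdev(b)) or 1e-12
        scored.append((abs(statistics.fmean(a) - statistics.fmean(b)) / spread, g))
    return [g for _, g in sorted(scored, reverse=True)]


def fit_centroids(rows, labels, pair):
    """Class centroids in the 2-D space spanned by one gene pair."""
    cents = {}
    for cls in sorted(set(labels)):
        pts = [[r[g] for g in pair] for r, y in zip(rows, labels) if y == cls]
        cents[cls] = [statistics.fmean(c) for c in zip(*pts)]
    return cents


def predict(cents, row, pair):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    point = [row[g] for g in pair]
    return min(cents, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, cents[c])))


def zero_error_pairs(rows, labels, genes):
    """Stages 2-3: try every pair of shortlisted genes, keep those with zero training error."""
    kept = []
    for pair in itertools.combinations(genes, 2):
        cents = fit_centroids(rows, labels, pair)
        if all(predict(cents, r, pair) == y for r, y in zip(rows, labels)):
            kept.append((pair, cents))
    return kept


def vote(models, row):
    """Stage 4: majority vote over the retained gene-pair classifiers (assumed rule)."""
    votes = [predict(cents, row, pair) for pair, cents in models]
    return max(sorted(set(votes)), key=votes.count)


def run_demo(seed=0):
    rng = random.Random(seed)
    labels = [i % 2 for i in range(40)]
    # Toy data (hypothetical): genes 0 and 1 separate the classes, genes 2-5 are noise.
    rows = [[10 * y + rng.random(), 10 * y + rng.random()] + [rng.random() for _ in range(4)]
            for y in labels]
    rows[3][2] = None   # simulate two missing measurements
    rows[17][4] = None
    (train_x, train_y), (test_x, test_y) = split(rows, labels, 0.25, rng)
    means = gene_means(train_x)  # estimated on the training set only
    train_x, test_x = impute(train_x, means), impute(test_x, means)
    top = rank_genes(train_x, train_y)[:4]
    models = zero_error_pairs(train_x, train_y, top)
    preds = [vote(models, r) for r in test_x]
    return top, models, preds, test_y


if __name__ == "__main__":
    top, models, preds, truth = run_demo()
    print("shortlisted genes:", top)
    print("zero-error pairs:", [pair for pair, _ in models])
    print("test accuracy:", sum(p == t for p, t in zip(preds, truth)) / len(truth))
```

Restricting the pair search to a short list of top-ranked genes keeps the number of candidate pairs small. With thousands of genes, enumerating every pair grows quadratically, which is presumably part of the computational burden the abstract refers to.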

A copy of the full paper has not yet been submitted.
