CAMUR: Classifier with Alternative and MUltiple Rule-based models

Next Generation Sequencing (NGS) techniques are spreading rapidly and produce huge amounts of genomic data, so methods for extracting knowledge from NGS data are in high demand. In this work, we focus on RNA-seq gene expression analysis, specifically on case-control studies for cancer, using rule-based classification algorithms that build models to discriminate cases from controls. State-of-the-art algorithms typically extract a single classification model that contains few features (genes). In contrast, our goal is to elicit more knowledge by computing multiple alternative classification models and thereby discovering several features that are related to the predicted class.

We propose CAMUR (Classifier with Alternative and MUltiple Rule-based models), a new method and software package that extracts multiple, alternative, and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set (or a partial combination) of the features present in the rules, iteratively eliminates those combinations from the data set, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad hoc knowledge repository (database) and a complete querying tool.
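The iterative procedure described above can be illustrated with a minimal Python sketch. This is one possible reading of the description, not the CAMUR implementation: the function names (`power_set`, `learn_rule`, `alternative_models`), the greedy threshold-rule learner used as a stand-in for a real rule-based classifier, the `min_accuracy` and `max_iterations` stopping criteria, and the toy gene names and expression values are all assumptions made for illustration.

```python
from collections import deque
from itertools import chain, combinations


def power_set(items):
    """Every non-empty subset of `items`, as tuples, smallest first."""
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1))


def accuracy(conds, rows, labels):
    """Fraction of samples where 'all conditions hold' agrees with label 1 (case)."""
    hits = 0
    for row, label in zip(rows, labels):
        fired = all((row[f] > t) == above for f, t, above in conds)
        hits += int(fired) == label
    return hits / len(rows)


def learn_rule(rows, labels, features, max_conditions=2):
    """Toy stand-in for a rule learner: greedily grow one conjunctive rule."""
    conds, best = [], 0.0
    for _ in range(max_conditions):
        candidate = None
        used = {c[0] for c in conds}
        for f in sorted(features - used):
            for t in sorted({row[f] for row in rows}):
                for above in (True, False):
                    acc = accuracy(conds + [(f, t, above)], rows, labels)
                    if acc > best:
                        best, candidate = acc, (f, t, above)
        if candidate is None:  # no condition improves the rule any further
            break
        conds.append(candidate)
    return conds, best


def alternative_models(rows, labels, min_accuracy=0.9, max_iterations=50):
    """Collect alternative models by excluding feature combinations and re-learning."""
    all_features = set(rows[0])
    queue, seen, models = deque([frozenset()]), {frozenset()}, []
    iterations = 0
    while queue and iterations < max_iterations:  # assumed stopping criterion
        excluded = queue.popleft()
        iterations += 1
        remaining = all_features - excluded
        if not remaining:
            continue
        conds, acc = learn_rule(rows, labels, remaining)
        if acc < min_accuracy:
            continue  # this exclusion no longer yields a reliable model
        if all(conds != known for known, _ in models):
            models.append((conds, acc))
        # Exclude every combination of the features used by this model.
        for subset in power_set(c[0] for c in conds):
            nxt = excluded | frozenset(subset)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return models


# Toy data: placeholder gene names and made-up expression values.
rows = [
    {"GENE_A": 9.0, "GENE_B": 1.0, "GENE_C": 5.0},
    {"GENE_A": 8.5, "GENE_B": 1.2, "GENE_C": 2.0},
    {"GENE_A": 9.2, "GENE_B": 0.8, "GENE_C": 4.0},
    {"GENE_A": 2.0, "GENE_B": 6.0, "GENE_C": 5.0},
    {"GENE_A": 2.5, "GENE_B": 6.5, "GENE_C": 2.0},
    {"GENE_A": 1.8, "GENE_B": 7.0, "GENE_C": 4.0},
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control

models = alternative_models(rows, labels)
for conds, acc in models:
    print(conds, acc)
```

On this toy data the sketch finds two equivalent single-gene models, one based on GENE_A and one on GENE_B. A single run of the learner would report only the first. Excluding whole combinations of features, rather than one feature at a time, matters once a model uses several genes, because each subset can lead to a different alternative model.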

We downloaded, processed, and analyzed three RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA). Our experimental results show the efficacy of CAMUR: we obtain several highly reliable, equivalent classification models, from which we deduce the most frequent genes, their relationships, and their relation to a particular cancer.
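As a rough illustration of how the most frequent genes and their co-occurrences might be tallied across a collection of extracted models, consider the short Python sketch below. It does not use CAMUR's knowledge repository or querying tool; the `models` list and its gene names are hypothetical placeholders, not results from the TCGA data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gene sets, one per extracted model (placeholder names, not real results).
models = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_B", "GENE_C"},
    {"GENE_A"},
]

# How often each gene appears across all models.
gene_counts = Counter(gene for model in models for gene in model)

# How often each pair of genes appears together in the same model.
pair_counts = Counter(
    pair for model in models for pair in combinations(sorted(model), 2))

print(gene_counts.most_common())
print(pair_counts.most_common())
```

Genes that recur in many alternative models are natural candidates for closer inspection, and pairs that recur together hint at relationships between genes.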

CAMUR DOWNLOADS

CAMUR_Poster.pdf
CAMUR_Software_Package.zip
Camur_User_Guide.pdf
Supplementary Data.zip
Tcga2Camur.zip

Data sets

breast_invasive_carcinoma.zip
breast_invasive_carcinoma_rsem.zip
example_breast.csv
head_and_neck_squamous_cell_carcinoma.zip
stomach_adenocarcinoma.zip
wils_tumor.zip