Latest News

MISSEL: Multiple Sub Sequences Extractor for cLassification

November 2015

A new feature selection algorithm that is able to extract alternative and equivalent classification models has been released.

CAMUR: Classifier with Alternative and MUltiple Rule-based models

October 2015

A new classification technique that is able to compute multiple rule-based models has been developed.

LAF: Logic Alignment Free

April 2015

A new section and software for classifying biological sequences with alignment-free techniques and rule-based supervised methods.

Software

BLOG - DNA Barcoding with LOGic formulas

BLOG is data mining software designed specifically for DNA Barcode analysis. The aim of the system is to identify logic rules that recognize the species (also referred to as the class) of a specimen by analyzing its barcode sequence. The standard input of the program is a FASTA format file of barcode sequences containing the training and the testing set. The FASTA format is an internationally agreed-upon format for nucleotide sequences.
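
To make the input and the idea of a logic rule concrete, here is a minimal Python sketch, not BLOG's actual code. It reads FASTA records and checks a rule written as a conjunction of (position, nucleotide) conditions. The header layout `specimen|species`, the rule encoding and the function names `read_fasta` and `rule_matches` are assumptions made for illustration only.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def rule_matches(sequence, literals):
    """Return True if every (0-based position, nucleotide) literal holds.

    This encoding of a logic rule is hypothetical, chosen for illustration.
    """
    return all(
        pos < len(sequence) and sequence[pos] == nuc for pos, nuc in literals
    )


# Two toy barcode records; the "specimen|species" header layout is assumed.
example = StringIO(
    ">specimen1|speciesA\nACGTAC\nGTTA\n>specimen2|speciesB\nACCTACGTTA\n"
)
records = list(read_fasta(example))

# Hypothetical rule for speciesA: position 2 is G AND position 6 is G.
rule_species_a = [(2, "G"), (6, "G")]
matches = {header: rule_matches(seq, rule_species_a) for header, seq in records}
```

In this toy input the rule holds for the first specimen and fails for the second, which is the kind of separation a learned rule is meant to provide.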

DNA Barcodes classification with supervised machine learning techniques

The classification of DNA Barcode sequences may be approached as a supervised machine learning problem in the following way: given a reference library composed of DNA Barcode specimen sequences of known species and a collection of unknown DNA Barcode sequences (the query set), assign each query sequence to one of the species present in the library. The software procedure presented in this section addresses this problem.

LAF: Logic Alignment Free

LAF combines alignment-free sequence representations, based on k-mer frequency counts, with logic data mining. It therefore allows the analysis of biological sequences without the strict requirement of an alignment or of an overlapping DNA gene region. This makes it possible to classify non-coding DNA, which is not alignable, and whole genomes, whose alignment is computationally hard.
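
The k-mer representation can be sketched in Python as follows. This is an illustrative sketch rather than LAF's implementation: the function name `kmer_frequency_vector`, the default `k=2` and the choice to skip windows containing symbols outside `ACGT` are assumptions.

```python
from itertools import product


def kmer_frequency_vector(sequence, k=2, alphabet="ACGT"):
    """Return (kmers, frequencies) with one entry per possible k-mer.

    The k-mers are listed in a fixed order, so vectors computed for
    different sequences are directly comparable without any alignment.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous symbols (e.g. N) are skipped.
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return kmers, [0.0] * len(kmers)
    return kmers, [counts[m] / total for m in kmers]


kmers, freqs = kmer_frequency_vector("ACGTACGT", k=2)
```

Every sequence, whatever its length, maps to a vector of 4^k frequencies, and these values can serve as the input features of a rule-based classifier.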

GELA - Gene Expression Logic Analyzer

GELA (Gene Expression Logic Analyzer) is a novel tool for knowledge discovery in gene expression profile data produced by RNA-Seq technologies. GELA and our knowledge extraction algorithm have been tested on the public RNA-Seq dataset of The Cancer Genome Atlas (TCGA), obtaining promising results.

NGS Read Comparisons

The software implements a method to evaluate the similarity between next generation sequencing (NGS) reads. The method does not rely on the alignment of the reads; it is based on the distance between the frequencies of their fixed-length substrings (k-mers). We compare this alignment-free distance with the similarity measures derived from two alignment methods: Needleman-Wunsch and BLAST.
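
A minimal Python sketch of such an alignment-free comparison is given below. The text above does not say which distance the software uses, so the Euclidean distance here is an assumption, as are the function names `kmer_profile` and `kmer_distance` and the default `k=3`.

```python
from collections import Counter
from math import sqrt


def kmer_profile(read, k):
    """Map each k-mer occurring in the read to its relative frequency."""
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}  # read shorter than k: empty profile
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_distance(read_a, read_b, k=3):
    """Euclidean distance between two k-mer frequency profiles (assumed metric)."""
    pa, pb = kmer_profile(read_a, k), kmer_profile(read_b, k)
    all_kmers = set(pa) | set(pb)
    return sqrt(sum((pa.get(m, 0.0) - pb.get(m, 0.0)) ** 2 for m in all_kmers))


d_same = kmer_distance("ACGTACGTAC", "ACGTACGTAC")
d_diff = kmer_distance("ACGTACGTAC", "TTTTTTTTTT")
```

Identical reads are at distance zero, and reads that share no k-mers are far apart. No alignment is computed at any point.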

MALA - MicroArray Logic Analyzer

MALA is specifically designed for the analysis of Microarray data. The rational-valued gene expression data are discretized into a limited number of intervals for each cell of the array; the discrete variables so obtained are then used to select a small subset of genes with strong discriminating power for the classes considered. The usual DMB algorithms for feature selection and logic formula extraction are then used to identify networks of genes, and related thresholds on their expression levels, that characterize the classes.
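
The discretization step can be illustrated with a short Python sketch. MALA's actual discretization procedure is not described here, so equal-width intervals are used as a stand-in assumption; the function names and the toy expression values are hypothetical.

```python
from bisect import bisect_right


def equal_width_cutpoints(values, n_intervals):
    """Return the n_intervals - 1 thresholds that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + width * i for i in range(1, n_intervals)]


def discretize(values, cutpoints):
    """Replace each value by the index of the interval it falls into."""
    return [bisect_right(cutpoints, v) for v in values]


# Toy expression levels of one gene across five samples (made-up numbers).
expression = [0.0, 1.5, 3.0, 4.5, 6.0]
cuts = equal_width_cutpoints(expression, 3)
levels = discretize(expression, cuts)
```

The cut points play the role of the thresholds on expression level mentioned above: a logic formula can then state conditions such as "gene X falls in the highest interval".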

DMIB - Data Mining in Big

DMIB is a general tool for the deployment of our software for logic data analysis. It is not designed for a specific type of application (as is the case for MALA and BLOG). Its configuration is a little more complex but, as usual, it requires a training set of tagged elements, an indication of the type of input features, and some details on running time and solution dimensions.

BiNAT - Biological Networks Analysis Tool

The increasing availability of large network datasets, along with progress in experimental high-throughput technologies, has increased the need for tools that allow easy integration of experimental data with data derived from computational network analysis. To enrich experimental data with network topological parameters, the Cytoscape plugin BiNAT (Biological Networks Analysis Tool) has been developed by Fabio Cumbo.
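
As an illustration of what "network topological parameters" can mean, the following Python sketch computes two common ones, node degree and local clustering coefficient, for a small undirected network. It is not BiNAT's code, and the text above does not state which parameters the plugin computes; the edge list and function names are made up for the example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of neighbours of the node."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Toy interaction network (hypothetical node names).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)
params = {n: (degree(adj, n), clustering_coefficient(adj, n)) for n in adj}
```

Values of this kind can be attached to each node as extra attributes next to its experimental measurements.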