FAQ
- What is the DMB System?
- Why should I use it?
- What is the DMBCSV format?
- What is the input format of BLOG?
- What is the output of BLOG?
- What is the input format of MALA?
- What is the output of MALA?
- How big can my datasets be?
- Can I customize the DMB methods in more detail?
- Why does it take so long to obtain results?
- I don't understand the algorithms!
- Can I download the source code?
What is the DMB System?
-
DMB stands for Data Mining Big.
The DMB System is a collection of data analysis tools.
It is an implementation of a set of algorithms for automatic classification.
Why should I use it?
-
Because our applications are particularly efficient and accurate at classification.
What is the DMBCSV format?
-
The DMBCSV format uses the standard CSV style (Comma-Separated Values).
The first row of the file contains the names of the samples (they must all be different). The second row contains the name of the class to which each sample belongs.
The first column contains the feature name (the variable of the experiments). The second column describes the variable type: NUM, if it takes numerical values, or ORD, if it takes values from a finite set of elements.
Here is an example:
       |     | Exp 1 | Exp 2 | Exp 3 | Exp 4 |
class  |     | A     | B     | A     | B     |
Gene 1 | NUM | 1.50  | 0.42  | 0.70  | 1.05  |
Gene 2 | NUM | 1.00  | 1.40  | 0.70  | 0.65  |
Pos 1  | ORD | A     | T     | G     | T     |
Pos 2  | ORD | A     | A     | C     | A     |
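The layout described above can be read with a few lines of Python. This is only a minimal sketch, not the DMB System's own parser: the function name parse_dmbcsv is hypothetical, and it assumes that the first two cells of the sample-name row and of the class row are placeholders (empty or a label such as "class") that can be skipped, which the description does not state explicitly.

```python
import csv
import io


def parse_dmbcsv(text):
    """Parse DMBCSV text into (samples, classes, features).

    Hypothetical helper: assumes the first two cells of rows 1 and 2
    are placeholders and that data starts in the third column.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2:
        raise ValueError("expected a sample-name row and a class row")

    # Row 1: sample names, which must all be different.
    samples = [cell.strip() for cell in rows[0][2:]]
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must all be different")

    # Row 2: the class of each sample, in the same column order.
    labels = [cell.strip() for cell in rows[1][2:]]
    if len(labels) != len(samples):
        raise ValueError("class row does not match the number of samples")
    classes = dict(zip(samples, labels))

    # Remaining rows: feature name, type (NUM or ORD), one value per sample.
    features = {}
    for row in rows[2:]:
        name, kind = row[0].strip(), row[1].strip().upper()
        values = [cell.strip() for cell in row[2:]]
        if len(values) != len(samples):
            raise ValueError(f"feature {name!r} has the wrong number of values")
        if kind == "NUM":
            values = [float(v) for v in values]
        elif kind != "ORD":
            raise ValueError(f"feature {name!r} has unknown type {kind!r}")
        features[name] = (kind, values)
    return samples, classes, features


# The example table above, written as comma-separated text.
example = """\
,,Exp 1,Exp 2,Exp 3,Exp 4
class,,A,B,A,B
Gene 1,NUM,1.50,0.42,0.70,1.05
Gene 2,NUM,1.00,1.40,0.70,0.65
Pos 1,ORD,A,T,G,T
Pos 2,ORD,A,A,C,A
"""

samples, classes, features = parse_dmbcsv(example)
```

Running a check like this before uploading can catch duplicate sample names or a feature row with a missing value.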
What is the input format of BLOG?
-
The input to BLOG is a standard FASTA file containing the barcode sequences.
The second field of the description line (fields are separated by |) should contain the specimen class. For instance:
>EM1232|squalus edmundsi|..|...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>EM1237|squalus mitsukuri|..|...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
You can download an input example file here.
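To check that every record in a FASTA file carries a class in the second field, something like the following Python sketch may help. It is not part of BLOG: the function name parse_barcode_fasta is hypothetical, the sequences in the sample text are shortened for illustration, and splitting the description line on "|" is an assumption based on the example above.

```python
def parse_barcode_fasta(text):
    """Return a list of (specimen_id, specimen_class, sequence) tuples.

    Hypothetical helper: assumes description-line fields are separated
    by '|' and that the second field holds the specimen class.
    """
    records = []    # one [id, class, sequence_lines] entry per record
    current = None  # the record currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split("|")
            if len(fields) < 2 or not fields[1].strip():
                raise ValueError(f"no class field in description line: {line!r}")
            current = [fields[0].strip(), fields[1].strip(), []]
            records.append(current)
        elif current is None:
            raise ValueError("sequence data before the first description line")
        else:
            # Sequences may span several lines; collect them for joining.
            current[2].append(line)
    return [(sid, cls, "".join(parts)) for sid, cls, parts in records]


# Shortened illustrative input in the layout shown above.
example = """\
>EM1232|squalus edmundsi|..|...
ACGTACGT
ACGT
>EM1237|squalus mitsukuri|..|...
ACGTACG
"""

for specimen_id, specimen_class, sequence in parse_barcode_fasta(example):
    print(specimen_id, specimen_class, len(sequence))
```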
What is the output of BLOG?
-
Output files are:
testfile.bdp1.txt | #bdp1 (see BMC article) |
testfile.bdp2.txt | #bdp2 (see BMC article) |
testfile.bdp3.txt | #bdp3 (see BMC article) |
testfile.bdp4.html | #bdp4 (a summary of the outputs) |
testfile.stats.html | #the test set classification statistics |
trainfile.stats.html | #the train set classification statistics |
testfile.confmatrix.html | #the test set confusion matrix |
trainfile.confmatrix.html | #the train set confusion matrix |
trainfile.formulas.csv | #the logic formulas in csv format |
What is the input format of MALA?
-
The input format of MALA is the DMBCSV format.
You can download an input example file here.
What is the output of MALA?
-
Output files are:
The logic classification formulas
The classification statistics
How big can my datasets be?
-
The upload limit is set to 6 MB. For larger files, please contact us.
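A file can be checked against the limit before uploading. The Python sketch below is only an illustration: the helper names are hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes, which the server may count differently.

```python
import os

# Assumption: the 6 MB limit is counted in units of 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 6 * 1024 * 1024


def size_within_limit(n_bytes, limit=UPLOAD_LIMIT_BYTES):
    """Return True if a file of n_bytes fits within the upload limit."""
    return n_bytes <= limit


def file_within_limit(path, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the file at path fits within the upload limit."""
    return size_within_limit(os.path.getsize(path), limit)
```

For example, file_within_limit("mydata.csv") returns False for a file that should be sent by contacting us instead (the file name here is illustrative).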
Can I customize the DMB methods in more detail?
-
Sure. Feel free to contact us at DMB mail for customized data analysis.
Why does it take so long to obtain results?
-
Our servers may be busy. Please be patient. Have a coffee; the results will arrive soon.
I don't understand the algorithms!
-
Read the system description carefully and check your input.
If you still have questions, contact us at DMB mail.
Can I download the source code?
-
Contact us at DMB mail.