DMIB

Data Mining in Big (DMIB) is a general tool for the deployment of our software for logic data analysis. It is not designed for a specific type of application (as is the case for MALA and BLOG). Its configuration is therefore a little more complex, but as usual it requires a training set of tagged elements, an indication of the type of input features, and some details on running time and solution dimensions. For the input file, follow the description below. The parameters needed to run DMIB correctly are:

  1. DMBCSV File - The input file using the DMBCSV format
  2. Clustering - Should we perform feature clustering? This reduces the number of features to be examined
  3. Number of Features - The maximum number of selectable features
  4. GRASP Execution Time / GRASP Number of Iterations - The GRASP algorithm runs for up to the selected number of iterations, stopping early if the selected running time is exceeded
  5. Sampling Type - SLICING lets you set the train and test percentages. CROSS VALIDATION lets you set the number of subsets used to test the goodness of the classification
  6. SLICING
    1. Test Percentage - The percentage of elements in the test set
    2. Train Percentage - The percentage of elements in the training set
  7. CROSS VALIDATION
    1. Number of Subsets - The number of subsets into which the set is divided
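
To make the sampling and stopping parameters more concrete, here is a minimal Python sketch of what they could mean in practice. It is not DMIB's code: the function names (`slicing_split`, `cross_validation_splits`, `run_with_budget`), the `seed` argument, and the shuffling strategy are all assumptions made for illustration. It also assumes that the train and test percentages sum to 100 and that GRASP stops at whichever limit (iterations or time) is reached first.

```python
import random
import time


def slicing_split(items, train_percentage, seed=0):
    """SLICING (assumed behaviour): shuffle, then split by percentage."""
    if not 0 < train_percentage < 100:
        raise ValueError("train_percentage must be between 0 and 100")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percentage / 100)
    # The remaining elements form the test set (test % = 100 - train %).
    return shuffled[:cut], shuffled[cut:]


def cross_validation_splits(items, number_of_subsets, seed=0):
    """CROSS VALIDATION (assumed behaviour): each subset is the test set once."""
    shuffled = list(items)
    if not 2 <= number_of_subsets <= len(shuffled):
        raise ValueError("number_of_subsets must be between 2 and len(items)")
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled elements round-robin into the subsets.
    folds = [shuffled[i::number_of_subsets] for i in range(number_of_subsets)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def run_with_budget(step, max_iterations, max_seconds):
    """Call step() up to max_iterations times, stopping early on timeout."""
    start = time.monotonic()
    done = 0
    while done < max_iterations and time.monotonic() - start < max_seconds:
        step()
        done += 1
    return done  # number of iterations actually completed
```

For example, `slicing_split(range(10), 70)` returns 7 training elements and 3 test elements, while `cross_validation_splits(range(10), 5)` yields five train/test pairs of sizes 8 and 2. The time limit is only checked between iterations, so a single long iteration can overrun it in this sketch.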