Friday, February 11, 2011

Embedding Weka in a Java Application

I discovered a book, Practical Artificial Intelligence Programming in Java, here:
http://www.markwatson.com/.  There is a lot of good material in it, but the part that really got my attention was its clear, simple explanation of how to embed Weka in a Java application.

If you are not familiar with Weka, I highly recommend that you go here to learn more about it: http://www.cs.waikato.ac.nz/ml/weka/

In summary, the steps to use and modify the example were as follows:
  1. Create a project
  2. Add weka.jar to the build path
  3. Copy and tweak the code from the example
  4. Examine results

Here is a screenshot of adding weka.jar to the project:


Here is the Java code (modified only slightly from the example referenced above):

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.ADTree;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.Remove;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {

    // Load an ARFF file and mark its last attribute as the class to predict.
    private static Instances load(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            Instances data = new Instances(reader);
            data.setClassIndex(data.numAttributes() - 1);
            return data;
        }
    }

    public static void main(String[] args) throws Exception {
        // The same file is used for training and testing, so the output
        // shows how well the model fits the data, not how well it generalizes.
        Instances trainingData = load("test_data/weather.arff");
        Instances testingData = load("test_data/weather.arff");

        System.out.println("Number of attributes in model = "
                + trainingData.numAttributes());
        System.out.println("Number of samples = " + trainingData.numInstances());
        System.out.println("Summary: " + trainingData.toSummaryString());
        System.out.println();

        // To try a J48 decision tree instead, pass new J48() to setClassifier.
        ADTree adt = new ADTree();

        // Drop the first attribute before the data reaches the classifier.
        Remove rm = new Remove();
        rm.setAttributeIndices("1");

        // FilteredClassifier applies the filter during training and prediction.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(rm);
        fc.setClassifier(adt);

        fc.buildClassifier(trainingData);

        // Print the actual and predicted class label for every sample.
        for (int i = 0; i < testingData.numInstances(); i++) {
            Instance sample = testingData.instance(i);
            double pred = fc.classifyInstance(sample);
            System.out.print("given value: "
                    + testingData.classAttribute().value((int) sample.classValue()));
            System.out.println(". predicted value: "
                    + testingData.classAttribute().value((int) pred));
        }
    }
}
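
The loop at the end of the listing prints a given and a predicted label for each sample. A quick way to turn that pair of columns into a single number is to count how often the two agree. The sketch below is plain Java with no Weka dependency; the class name PredictionTally and the sample labels are made up for illustration and are not output from the run above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka or of the book's example.
class PredictionTally {

    // Fraction of positions where the predicted label equals the given label.
    static double accuracy(List<String> given, List<String> predicted) {
        if (given.size() != predicted.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        if (given.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < given.size(); i++) {
            if (given.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / given.size();
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only.
        List<String> given = Arrays.asList("yes", "no", "yes", "yes");
        List<String> predicted = Arrays.asList("yes", "no", "no", "yes");
        System.out.println("accuracy = " + accuracy(given, predicted));
    }
}
```

In the Weka program, the two lists could be filled inside the prediction loop instead of printing each pair. Weka also ships its own weka.classifiers.Evaluation class for this kind of reporting, which is probably the better choice for anything beyond a quick check.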

Here are the results:

I used the ADTree classifier on the weather.arff demo data.  This is a very simple example, and I hope to go into more detail in the future about how machine learning tools like Weka can be used as part of an agent-based programming approach.
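
For anyone who has not opened an ARFF file before, here is a rough sketch of why the code calls setClassIndex with numAttributes() - 1: the header lists the attributes in order, and by convention the last one is the value to predict. The snippet is plain Java, not Weka's own parser. The class name ArffHeaderSketch is hypothetical, and the sample text is hand-written in the style of the weather data, so the actual weather.arff may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the ARFF header layout; Weka does the real parsing.
class ArffHeaderSketch {

    // Collect attribute names from the @attribute lines that precede @data.
    // Simplification: quoted attribute names containing spaces are not handled.
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\\r?\\n")) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase();
            if (lower.startsWith("@data")) {
                break; // the header ends where the data rows begin
            }
            if (lower.startsWith("@attribute")) {
                String[] parts = trimmed.split("\\s+", 3);
                if (parts.length >= 2) {
                    names.add(parts[1]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hand-written sample, assumed to resemble the demo file.
        String sample = String.join("\n",
                "@relation weather",
                "@attribute outlook {sunny, overcast, rainy}",
                "@attribute temperature real",
                "@attribute humidity real",
                "@attribute windy {TRUE, FALSE}",
                "@attribute play {yes, no}",
                "@data",
                "sunny,85,85,FALSE,no",
                "overcast,83,86,FALSE,yes");
        List<String> names = attributeNames(sample);
        System.out.println("attributes: " + names);
        System.out.println("class attribute (last): " + names.get(names.size() - 1));
    }
}
```

With a header like this one, removing attribute index "1" in the Weka program would drop the first column before the classifier sees the data.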