Relational Data Mining

1. Intro

Much of the data and many of the processes we try to model are relational in nature. The data tables often relate in a one-to-many fashion; for example, one person can own multiple books. This is troublesome for most statistical and data mining techniques, which require a flat file in which each row contains all the information needed to process that row. In relational data this is not the case. Relational data mining holds the promise of improved pattern discovery in relational data.
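The flat-file problem can be made concrete with a small sketch. The people, books, and field names below are invented for illustration. The sketch shows two common ways to flatten a one-to-many relation: joining, which repeats the person on several rows, and aggregating, which keeps one row per person but discards per-book detail.

```python
# Hypothetical one-to-many data: each person can own several books.
people = [
    {"person_id": 1, "name": "Ann"},
    {"person_id": 2, "name": "Bob"},
]
books = [
    {"person_id": 1, "title": "Dune", "genre": "fiction"},
    {"person_id": 1, "title": "Cosmos", "genre": "science"},
    {"person_id": 2, "title": "Emma", "genre": "fiction"},
]


def join_rows(people, books):
    """Flatten by joining: one output row per (person, book) pair."""
    by_id = {p["person_id"]: p for p in people}
    return [{**by_id[b["person_id"]], **b} for b in books]


def aggregate_rows(people, books):
    """Flatten by aggregating: one row per person, book details summarised."""
    rows = []
    for p in people:
        owned = [b for b in books if b["person_id"] == p["person_id"]]
        rows.append({
            **p,
            "n_books": len(owned),
            "owns_fiction": any(b["genre"] == "fiction" for b in owned),
        })
    return rows


joined = join_rows(people, books)            # Ann appears on two rows
aggregated = aggregate_rows(people, books)   # titles are no longer visible
```

Neither flat form is a faithful copy of the relational data: the join no longer has one row per person, and the aggregate only answers the questions that were anticipated when the summary columns were chosen.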

2. East-West Train

In 1980 Ryszard Michalski helped bring relational data mining to the attention of data miners with his East-West Train challenge. He presented ten trains, each pulling a diverse set of cars. The challenge was to design an algorithm that predicts which trains are traveling east from the type of car(s) they pull. This is a relational problem because each train pulls many cars, each of which has many attributes. With traditional pattern discovery methods, the problem would take an enormous amount of computational time, because every permutation of the data would need a corresponding flat file. Many new techniques grew out of the challenge.

Figure 1. Michalski's Original Ten Trains

Solution

One rule was simple: if the train is pulling a small, enclosed car, it is traveling east.
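As a minimal sketch, the rule can be written as an existential test over a train's cars. The three trains and the attribute names (`size`, `roof`) are made up for illustration and are not Michalski's original data. Predicting "west" when the rule does not fire is also an assumption, since the rule as stated only speaks about eastbound trains.

```python
# Made-up trains for illustration; these are NOT Michalski's original ten.
trains = [
    {"name": "t1", "cars": [{"size": "small", "roof": "enclosed"},
                            {"size": "large", "roof": "open"}]},
    {"name": "t2", "cars": [{"size": "large", "roof": "enclosed"}]},
    {"name": "t3", "cars": [{"size": "small", "roof": "open"},
                            {"size": "small", "roof": "enclosed"}]},
]


def has_small_enclosed_car(train):
    """True if at least one car is both small and enclosed."""
    return any(
        car["size"] == "small" and car["roof"] == "enclosed"
        for car in train["cars"]
    )


def predict_direction(train):
    """Apply the rule; the 'west' default is an assumption of this sketch."""
    return "east" if has_small_enclosed_car(train) else "west"


predictions = {t["name"]: predict_direction(t) for t in trains}
```

The rule tests whether some car has both properties at once, so it cannot be evaluated from per-train summaries such as "has a small car" and "has an enclosed car" taken separately.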

3. ILP

Inductive Logic Programming (ILP) is one way of answering Michalski's challenge. ILP combines logic with machine learning algorithms in the hope of greatly reducing the number of searches required to explore the information space intelligently.
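To give a feel for the search involved, the sketch below enumerates candidate rules of the form "some car satisfies all of these attribute tests", from shortest to longest, and returns the first one that covers every eastbound example and no westbound example. It is a brute-force illustration on invented data, not an implementation of any ILP system; real systems work with first-order clauses and guide the search with heuristics rather than enumerating every candidate. All names here are hypothetical.

```python
from itertools import combinations

# Invented examples: each train is a list of cars, each car a dict of attributes.
east = [
    [{"size": "small", "roof": "enclosed"}, {"size": "large", "roof": "open"}],
    [{"size": "small", "roof": "enclosed"}],
]
west = [
    [{"size": "large", "roof": "enclosed"}],
    [{"size": "small", "roof": "open"}, {"size": "large", "roof": "open"}],
]


def covers(hypothesis, train):
    """A hypothesis covers a train if one car passes every (attr, value) test."""
    return any(
        all(car.get(attr) == value for attr, value in hypothesis)
        for car in train
    )


def find_rule(east, west, max_literals=2):
    """Return the shortest consistent conjunction of tests, or None."""
    literals = sorted(
        {(a, v) for train in east + west for car in train for a, v in car.items()}
    )
    for k in range(1, max_literals + 1):          # general rules first
        for hypothesis in combinations(literals, k):
            covers_all_east = all(covers(hypothesis, t) for t in east)
            covers_no_west = not any(covers(hypothesis, t) for t in west)
            if covers_all_east and covers_no_west:
                return hypothesis
    return None


rule = find_rule(east, west)
```

On this toy data no single attribute test separates the two groups, so the search has to move on to two-test rules. The number of candidates grows quickly with the number of attributes and the rule length, which is the cost that ILP methods aim to reduce.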

a) FOIL

First Order Inductive Learner (FOIL) is a popular ILP system.

b) LINUS

c) Progol

Progol is comparable in performance to FOIL.