The focusing algorithm

The focusing algorithm (Young, Plotkin and Linz, 1977) is considered by researchers to be one of the most powerful techniques to learn concepts. The algorithm aims to produce a definition that is consistent with all given positive data, but none of the negative. The focusing algorithm uses a version space search though the concept space. The concept space covers all the possible concept descriptions. The version space only covers concept descriptions which are consistent with the given training instances. The focusing algorithm is similar to Mitchell’s candidate elimination algorithm (Mitchell, Keller & Kedar-Cabelli, 1986)

In the focusing algorithm the concept being learned is represented by a set of trees. There is one tree for each attribute used to describe the concept.

Suppose we are concerned with the sizes of vehicles used to reach a particular destination. Attributes in this example are ‘vehicle type’ and ‘size’. For each of these attributes a tree is required covering all possible instances that can occur. These may be as shown in figure 3.01. These two trees represent the concept space for the example.

 

 

Figure 3.01 : The concept space for the vehicle size example

 

An upper boundary (U) is initially placed above each tree and a lower boundary (L) is placed below. The version space lies between the U and L boundaries and constitutes the search space for that concept. Concepts are learned my moving the L and U boundaries upward and downwards respectively. Initially the version space covers the entire concept space.

The steps of the focussing algorithm are as follows :

 

  • For positive instances:
    • For each tree move L to be coincident with the instance node or to a node above the new and previous instance nodes
  • For negative instances:
    • Select a tree in which U is above the instance-node
     
    • Move U towards L to a node above or coincident with L which is not above the instance node.
     
    • If there is more than one way to do this create a set of version spaces to explore in parallel. This is called a far miss otherwise it is called a near miss.

     

    Example:

    Let a concept space be as described above and let the training instances be as follows :

     

    [med moped]

    -

    positive example of the concept
    [tiny car]

    -

    positive example of the concept
    [big jet]

    -

    negative example of the concept
    [med car]

    -

    positive example of the concept
    [med jet]

    -

    negative example of the concept

     

    The first training example [med moped] is a positive example and so the lower bound moved to coincident with med and moped. The have been no previous training examples at this stage. The version space after the first training example is shown in figure 3.02.

     

     

    Figure 3.02 : The version space after the first training example

     

    The second training example [tiny car] is again a positive example of the concept. As the modes of transport mentioned in training examples so far are car and moped the lower bound moves to vehicle so as to cover both of these. Similarly the lower bound in the size tree moves to small. The resulting version space is shown in figure 3.03.

     

     

    Figure 3.03 : The version space after the second training example

     

    The third training example [big jet] is a negative example. A tree is selected in this case the size tree and U moved to small node. It cannot move to the large node as this is above the training instance. Figure 3.04 shows the version space after the third training example.

     

     

    Figure 3.04 : The version space after the third training example

     

    The fourth training example [med car] is again positive but does not result in any changes to the bounds as the lower bound already covers this instance.

    The final training example [med jet] is a negative example. A tree is selected in this case the transport tree and the upper bound moved to vehicle. It cannot move to plane as this is directly above the instance. The version space after all the training examples have been processed is shown in figure 3.05.

     

     

    Figure 3.05 : The version space after the final training example

     

    There are no more training examples to be processed. As can seen in both trees the upper boundary U and the lower boundary L have converged at a node. Finally we can describe the concept being learned here as [small vehicle]. This description covers all the positive and none of the negative training data.

    Table of Contents

    References



    Authored by Serengul Smith

    E-mail to: serengul1@mdx.ac.uk
    School of Computing Science Middlesex University
    Revised: September 1998