
Statistical clustering is limited in scope because object descriptions are restricted to numeric values. The conceptual clustering algorithm developed by Michalski (1980) overcomes this restriction and allows contextual information to be taken into account.
Conceptual clustering allows object descriptions of the following form to be clustered:
| Attribute | Value |
| --- | --- |
| Height (m) | Tall |
| Weight (kg) | Heavy |
| IQ | Average |
Objects can then be represented by attribute description vectors of the following form:
| Object | Attribute description vector |
| --- | --- |
| Object 1 | (Tall, Heavy, Average) |
| Object 2 | (Tall, Heavy, Low) |
| Object 3 | (Short, Light, High) |
| ... | ... |
Because the attribute values are symbolic rather than numeric, the Euclidean and city block distance metrics are no longer appropriate. An alternative distance measure is the number of attributes on which two objects differ. In the above example the distance between object 1 and object 2 is 1, since they differ only in IQ. The distance between object 2 and object 3 is 3, since they differ in every attribute. With this distance measure, the statistical clustering algorithms can be applied to symbolic object descriptions.
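The mismatch count described above can be sketched in a few lines of Python. This is a minimal illustration and not Michalski's algorithm itself. The function name `mismatch_distance` and the `objects` dictionary are hypothetical names chosen for this example. Only the three objects listed above are used.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Count the attribute positions at which two objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


# Attribute description vectors (Height, Weight, IQ) from the example above.
objects = {
    "Object 1": ("Tall", "Heavy", "Average"),
    "Object 2": ("Tall", "Heavy", "Low"),
    "Object 3": ("Short", "Light", "High"),
}

# Pairwise distances between all objects, keyed by the pair of names.
distances = {
    (p, q): mismatch_distance(objects[p], objects[q])
    for p, q in combinations(objects, 2)
}

for (p, q), d in distances.items():
    print(f"{p} - {q}: {d}")
```

The resulting pairwise distances could then be passed to any clustering procedure that works from a distance matrix rather than from raw numeric coordinates.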
Authored by Serengul Smith
E-mail to:
serengul1@mdx.ac.uk
School of Computing Science Middlesex University
Revised: September 1998