The PNC2 Cluster Algorithm is a supervised hierarchical agglomerative cluster algorithm. Such algorithms start treating each learn data tuple as an own cluster and then merge iteratively two clusters with each other until an abortion criterion is met or until all tuples are merged within one single cluster. In most hierarchical algorithms the clusters to be merged are chosen based on a distance or point density criterion evaluated in the (input) space. Thus these algorithms are capable to capture the structure of the data in the (input) space, but they do not consider, if the merged tuples have the same or at least a similar output value. The PNC2 Cluster Algorithm in contrast executes a merge test, which evaluates, if the merged cluster is a valid rule.
Each cluster is represented by an axis-parallel cuboid given in the input space and by an associated output value. The cuboid is build to include all the input vectors of the data tuples merged within the cluster. A cluster can be transformed to an If-Then rule: The cuboid corresponds to the premise and the associated output value is taken as the conclusion.
Note: The term cuboid fits for continuous inputs only. How symbolic inputs are represented is explained in Representation and distance measure.
The agglomerative clustering is done as follows: First of all, each data tuple is considered as an own cluster by taking the data
tuple's output as the cluster's output value and by building a trivial cuboid, that includes only the data tuple's input vectors. Only clusters with the same output value can be merged. Thus the clusters are assorted in groups according to their output values and it is tried to merge as many clusters as possible within each group. Therefore the two clusters closest to each other and not previously tested are chosen and it is evaluated by the means of the merge test, if the merged cluster is a valid rule. The clustering within a group is finished if there are no possible pairs of two clusters not tested anymore.
A prediction of the output value given an input position is made by using a weighted average of the output values of the clusters, whose cuboids are nearest to the given input position. Thus the PNC2 Cluster Algorithm behaves like a rule-based system if the input position is inside of a cuboid. If no rule is active, i.e. if the input position is outside of any cuboid, this procedure acts like a k-nearest-neighbor approach with the difference, that not distances between an input position and learn data tuples are evaluated, but the distances are determined from the input position to a cuboid. Thus the clusters can be viewed as generalized data tuples.