What is it all about?

Data-based modelling
The following figure illustrates the basic task of data-based modelling. Consider a system with an input vector x and a corresponding output value y. The input vector consists of one or more input variables. The aim is to build a model that predicts the unknown output value from the input vector. A tuple P=(x,y) consisting of an input vector and its output value is also called a data tuple.
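As an illustration, a data tuple and a model could be represented in Python as follows. The names `DataTuple`, `Model` and `constant_model` are hypothetical and not part of the PNC2 Rule Induction System; a model is simply assumed to be any function that maps an input vector to a predicted output value.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

# A single input or output value: a symbol (nominal/ordinal) or a real number.
Value = Union[str, float]


@dataclass(frozen=True)
class DataTuple:
    """A data tuple P = (x, y): an input vector x and its output value y."""
    x: tuple[Value, ...]
    y: Value


# Hypothetical model type: any function predicting y from an input vector x.
Model = Callable[[Sequence[Value]], Value]

# Example tuple with two input variables (color, temperature) and a class label.
p = DataTuple(x=("red", 21.5), y="ok")


def constant_model(x: Sequence[Value]) -> Value:
    """A trivial (hypothetical) model that ignores its input and predicts 'ok'."""
    return "ok"


prediction = constant_model(p.x)
```

A learned model would of course use the input vector; the constant model only shows the interface.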




Nominal, Ordinal and Continuous Variables
Variables can be divided into three types according to the values they can take.
 
·Nominal variables can only have symbolic values that cannot be ordered by a greater-less relation. An example of a nominal variable is the color of an object, which can take the symbols red, green or blue.
 
·Ordinal variables also have only symbolic values, but, in contrast to nominal variables, these values can be ordered by a greater-less relation. An example of an ordinal variable is a temperature measured with the qualitative terms cold, normal, warm and hot. Another example is the age of a person measured in years. Within the PNC2 Rule Induction System, ordinal variables with only a few different symbols, such as the temperature example above, are treated as nominal, whereas ordinal variables with many different symbols, such as the age example, are treated as continuous.
 
·Continuous variables can take arbitrary real values, limited only by the precision of the measuring device. An example of a continuous variable is a temperature measured in degrees centigrade.
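The way ordinal variables are mapped to nominal or continuous ones can be sketched as a small decision rule. The text does not say where the boundary between "a few" and "many" symbols lies, so the threshold `MAX_NOMINAL_SYMBOLS` below is a hypothetical value chosen for illustration, as are the names `VarType` and `effective_type`.

```python
from enum import Enum


class VarType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    CONTINUOUS = "continuous"


# Hypothetical threshold: the documentation only distinguishes "a few" from "many".
MAX_NOMINAL_SYMBOLS = 10


def effective_type(var_type: VarType, n_symbols: int = 0) -> VarType:
    """Return the type a variable is treated as during rule induction.

    Ordinal variables with few distinct symbols are treated as nominal,
    ordinal variables with many distinct symbols as continuous.
    Nominal and continuous variables keep their own type.
    """
    if var_type is VarType.ORDINAL:
        if n_symbols <= MAX_NOMINAL_SYMBOLS:
            return VarType.NOMINAL
        return VarType.CONTINUOUS
    return var_type


# Temperature as cold, normal, warm, hot: 4 symbols.
print(effective_type(VarType.ORDINAL, 4))    # -> VarType.NOMINAL
# Age in whole years from 0 to 100: 101 symbols.
print(effective_type(VarType.ORDINAL, 101))  # -> VarType.CONTINUOUS
```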


Classification and regression tasks

Depending on the type of the output variable, there are two fundamentally different types of learning task. If the output is nominal, the task is a classification task; if the output is continuous, it is a regression task. The PNC2 Rule Induction System is primarily intended for classification tasks.
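A minimal sketch of this distinction, assuming the output type is guessed from the observed output values: symbolic (string) outputs are taken as nominal, real-valued outputs as continuous. The function name `learning_task` and this heuristic are hypothetical; in particular, classes encoded as numbers (e.g. 0 and 1) would be misread as a regression task, so in practice the output type should be declared rather than guessed.

```python
from numbers import Real
from typing import Sequence, Union


def learning_task(outputs: Sequence[Union[str, float]]) -> str:
    """Guess the learning task from observed output values.

    String outputs are treated as nominal -> classification;
    real-valued outputs as continuous -> regression.
    """
    if not outputs:
        raise ValueError("no output values given")
    if all(isinstance(y, str) for y in outputs):
        return "classification"
    # bool is a subclass of int (and thus Real), so exclude it explicitly.
    if all(isinstance(y, Real) and not isinstance(y, bool) for y in outputs):
        return "regression"
    raise ValueError("outputs mix symbolic and numeric values")


print(learning_task(["red", "green", "blue"]))  # -> classification
print(learning_task([20.5, 22.0, 18.5]))        # -> regression
```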



Prediction accuracy
Usually, the prediction accuracy of a learned model is evaluated on a new, unseen test data sample. For each test data tuple, the model predicts the output value from the tuple's input vector. The differences between the real and the predicted output values are then summarized into a single loss function value as follows:
 
·Classification tasks  
Mean classification error (MCE), i.e. the fraction of test data tuples that are misclassified
 
·Regression tasks  
Mean absolute error (MAE), i.e. the mean absolute difference between the real and the predicted output values
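Both loss functions can be computed directly from the real output values y_i and the predicted values ŷ_i of n test data tuples: MCE = (1/n) · Σ [y_i ≠ ŷ_i] and MAE = (1/n) · Σ |y_i − ŷ_i|. The following is a minimal sketch; the function names are hypothetical, and MCE is returned as a fraction (multiply by 100 for a percentage).

```python
from typing import Hashable, Sequence


def mean_classification_error(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable]) -> float:
    """MCE: fraction of test tuples whose predicted class differs from the real one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def mean_absolute_error(y_true: Sequence[float],
                        y_pred: Sequence[float]) -> float:
    """MAE: mean of |y - y_hat| over all test tuples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: 1 of 4 test tuples is misclassified.
print(mean_classification_error(["red", "green", "blue", "red"],
                                ["red", "green", "red", "red"]))  # -> 0.25
# Regression: absolute errors 0.5, 0.0 and 1.5, so the MAE is 2.0 / 3.
print(mean_absolute_error([20.0, 22.0, 18.5], [20.5, 22.0, 20.0]))
```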