Norbert Jankowski, Krzysztof Grąbczewski, Rafał Adamczak

Department of Informatics,

Nicolaus Copernicus University,

- Rules and problems with understanding of data

- Application and optimization of rule-based classifiers

- Confidence intervals and probabilistic confidence intervals

- Real-life example - psychometric data

- Discussion

- ML camp: neural networks are no good! Black boxes making decisions.
- Knowledge in neural networks: opaque, hidden, incomprehensible.
- Rules forever!

Are rules indeed the only way to understand the data?

What type of explanation is satisfactory? Interesting cognitive psychology problem.

Knowledge accessible to humans: symbols, similarity to prototypes, visualization.

Psychology: exemplar and prototype theories of categorization; rules are used only when the logic is simple.

- IF the number of rules is relatively small and
- IF the accuracy is sufficiently high.
- THEN rules may be an optimal choice.

Crisp logical rules are most desirable but ...

- only one class is predicted - black-and-white picture
- reliable crisp rules may reject some cases as unclassified
- discontinuous cost function, only non-gradient optimization

Fuzzy rules - continuous membership functions.

- not as comprehensible as crisp rules
- involve additional position/shape parameters - danger of overparameterization

Fixed set of membership functions with predetermined shapes - bad idea.

Curse of dimensionality: *k* linguistic variables in *d* dimensions give *k^d* areas.
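The exponential growth is easy to check numerically; a minimal sketch (function name is illustrative):

```python
# Partitioning each of d input dimensions into k linguistic values
# creates a k x k x ... x k grid, i.e. k**d distinct regions.
def num_regions(k: int, d: int) -> int:
    return k ** d

# Even modest settings explode quickly:
print(num_regions(3, 4))   # 81 regions
print(num_regions(3, 10))  # 59049 regions
```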

Context-dependent linguistic variables - adapt membership functions in each rule.

Interpretation of crisp rules may be misleading.

Crisp rules may be unstable against small perturbations of input values.

Statisticians: rule-based classifiers are unstable.

Probabilities estimated using fuzzy rules change smoothly.
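One way to see this smoothness: replace the crisp interval test with a soft window built from two logistic sigmoids. A minimal sketch (interval bounds and slope are made-up illustrative values):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def soft_window(x: float, lo: float, hi: float, slope: float) -> float:
    """Soft window membership for 'x in [lo, hi]'.
    As slope grows this approaches the crisp 0/1 interval test."""
    return sigmoid(slope * (x - lo)) - sigmoid(slope * (x - hi))

# Crisp rule: fire IF x in [2, 5]. Near the boundary x = 5 the crisp
# decision flips abruptly, while the soft membership changes smoothly.
for x in (4.8, 4.9, 5.0, 5.1, 5.2):
    crisp = 1.0 if 2.0 <= x <= 5.0 else 0.0
    soft = soft_window(x, 2.0, 5.0, slope=10.0)
    print(f"x={x:.1f}  crisp={crisp:.0f}  soft={soft:.2f}")
```

At the boundary the soft membership passes through 0.5 instead of jumping between 0 and 1, which is what makes the resulting class probabilities vary smoothly with the input.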

How to find the best fuzziness/precision tradeoff?

How to understand what the best classifier is doing?

Application and optimization of rule-based classifiers

Methodology of rule extraction:

- Select linguistic variables. For continuous *x* use *s_k(x)*, true if *x_k* in [*X_k, X'_k*].
- Extract rules from data using neural, machine learning or statistical techniques; explore the simplicity/accuracy tradeoff.
- Optimize rules and linguistic variables (the [*X_k, X'_k*] intervals) using the extracted rules; explore the reliability/rejection rate tradeoff.
- Explore the uncertainty of the input values.
- Repeat the procedure until a stable set of rules is found.
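Once crisp interval tests are softened, optimizing the [*X_k, X'_k*] intervals becomes an ordinary gradient problem. A minimal 1-D sketch with made-up toy data (a plain squared-error gradient descent, not necessarily the authors' exact procedure):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data: (x, label), label 1 = the rule should fire.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1), (6.0, 0), (7.0, 0)]

slope = 4.0          # fixed fuzziness of the soft interval
lo, hi = 2.0, 7.0    # initial guess for the [X_k, X'_k] interval
lr = 0.1             # learning rate

def member(x: float, lo: float, hi: float) -> float:
    return sigmoid(slope * (x - lo)) - sigmoid(slope * (x - hi))

for _ in range(500):
    # Gradient of the squared error with respect to the interval bounds.
    g_lo = g_hi = 0.0
    for x, y in data:
        err = member(x, lo, hi) - y
        s_lo = sigmoid(slope * (x - lo))
        s_hi = sigmoid(slope * (x - hi))
        g_lo += err * (-slope * s_lo * (1.0 - s_lo))
        g_hi += err * (slope * s_hi * (1.0 - s_hi))
    lo -= lr * g_lo
    hi -= lr * g_hi

# The interval contracts toward the region occupied by class-1 points.
print(f"optimized interval: [{lo:.2f}, {hi:.2f}]")
```

The same update works for rule sets with many intervals, which is what makes gradient optimization of very large sets of rules inexpensive compared with non-gradient search over crisp boundaries.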

This approach leads to the following important improvements for any rule-based system:

- Crisp logical rules are preserved, giving maximal comprehensibility.
- Instead of 0/1 decisions, "probabilities" of classes *p(C_i | X; M)* are obtained.
- Uncertainties of inputs *s_i* provide additional adaptive parameters.
- Inexpensive gradient methods are used, allowing optimization of very large sets of rules.
- Rules with wider classification margins are obtained, overcoming the brittleness problem.
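Assuming Gaussian measurement noise on the input, the class probabilities *p(C_i | X; M)* for a crisp rule can be estimated by Monte Carlo sampling; a sketch with an illustrative one-feature rule and made-up numbers:

```python
import random

def crisp_rule(x: float) -> str:
    # Illustrative crisp rule: class A IF x < 5.0, ELSE class B.
    return "A" if x < 5.0 else "B"

def class_probabilities(x: float, s_x: float, n: int = 10000) -> dict:
    """Estimate p(C | x; s_x) by sampling the assumed Gaussian input noise."""
    random.seed(0)  # fixed seed for reproducibility of the sketch
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        counts[crisp_rule(random.gauss(x, s_x))] += 1
    return {c: k / n for c, k in counts.items()}

# Far from the decision boundary the crisp answer stays certain;
# near x = 5 the probabilities split smoothly instead of flipping 0/1.
print(class_probabilities(3.0, 0.5))
print(class_probabilities(4.9, 0.5))
```

The input uncertainty *s_x* acts as an adaptive parameter: increasing it widens the region around the rule boundary where several classes receive non-negligible probability.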

IF the probability of new classes grows quickly (here from 0 to 33%) with the assumed uncertainty of the measurement (here between 0 and 3%)

THEN analyze probabilistic confidence levels.

Probabilities of different diagnoses may be interpolated to show change of the mental health over time.

Probabilistic confidence levels make it possible to see detailed changes.

There are many ways to understand the data: rules, prototypes, visualization.

Only reliable, accurate, stable and sufficiently simple rules are useful.

Unstable sets of rules contain little useful information and may be misleading.

Simplicity/accuracy rate tradeoff should be explored.

Optimization of sets of rules allows one to explore the reliability/rejection rate tradeoff.

Classification probabilities are important, rules are not sufficient.

The neighborhood of the unknown input should always be explored.

Probabilities of classification should be parametrized by uncertainties of inputs.

Probabilistic confidence intervals enable detailed interpretation of cases.

Exploratory data analysis (visualization) is always worth using.

These methods may be used with any classifier, so why not use the best one?