Extraction of logical rules and selection of features using neural networks

Włodzisław Duch

Computational Intelligence Laboratory,
Department of Informatics,
Nicolaus Copernicus University,

Grudziądzka 5, 87-100 Toruń, Poland.

e-mail: id: wduch, on the server fizyka.umk.pl.

WWW: https://www.fizyka.umk.pl/~duch

 


Mushrooms dataset:

Real data from samples of 23 species of mushrooms.

22 attributes and 3 class labels: edible, poisonous, not recommended. 4208 (51.8%) edible, 3916 (48.2%) not edible; 8124 samples in total.

Attributes (number of values in parentheses): cap shape (6, e.g. bell, conical, flat, ...), cap surface (4), cap color (10), bruises (2), odor (9), gill attachment (4), gill spacing (3), gill size (2), gill color (12), stalk shape (2), stalk root (7, many missing values), surface above the ring (4), surface below the ring (4), color above the ring (9), color below the ring (9), veil type (2), veil color (4), ring number (3), spore print color (9), population (6), habitat (7).

Task: identify edible mushrooms and find the relevant features. The book says there is no rule ...

Data sample:

Mushroom-1 is: edible,convex,fibrous,yellow,bruises,anise,free,crowded,narrow,brown,tapering,bulbous,smooth,smooth,white,white,partial,white,one,pendant,purple,several,woods

Mushroom-2 is: edible,flat,smooth,white,bruises,almond,free,crowded,narrow,pink,tapering,bulbous,smooth,smooth,white,white,partial,white,one,pendant,purple,several,woods

Mushroom-3 is: edible,bell,smooth,white,bruises,almond,free,close,broad,white,enlarging,club,smooth,smooth,white,white,partial,white,one,pendant,black,scattered,meadows

Mushroom-4 is: poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,white,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,black,scattered,urban

Mushroom-5 is: poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,pink,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,black,several,urban

Mushroom-8000 is: poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,pink,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,brown,scattered,urban

Rule for edible:

IF odor=(almond.or.anise.or.none).and.spore_print_color=not.green THEN edible
48 errors, 99.41% correct
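
As a minimal sketch, the rule for edible mushrooms can be written as a predicate over one comma-separated record. The field positions (class label first, odor at index 5, spore print color at index 20) are read off the data sample above and are an assumption about the file layout; the name is_edible is hypothetical.

```python
# Field positions inferred from the data sample (class label is field 0);
# this layout is an assumption, not something the source states.
ODOR = 5
SPORE_PRINT_COLOR = 20

def is_edible(record: str) -> bool:
    """IF odor in (almond, anise, none) AND spore_print_color != green THEN edible."""
    fields = [f.strip() for f in record.split(",")]
    return (fields[ODOR] in {"almond", "anise", "none"}
            and fields[SPORE_PRINT_COLOR] != "green")

# Two records copied from the data sample.
mushroom_1 = ("edible,convex,fibrous,yellow,bruises,anise,free,crowded,"
              "narrow,brown,tapering,bulbous,smooth,smooth,white,white,"
              "partial,white,one,pendant,purple,several,woods")
mushroom_4 = ("poisonous,convex,smooth,white,bruises,pungent,free,close,"
              "narrow,white,enlarging,equal,smooth,smooth,white,white,"
              "partial,white,one,pendant,black,scattered,urban")

for name, rec in [("Mushroom-1", mushroom_1), ("Mushroom-4", mushroom_4)]:
    print(name, "->", "edible" if is_edible(rec) else "not edible")
```

The rule reads only two of the 22 attributes, so the remaining fields are parsed but never inspected.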

Rules for poisonous - 6 attributes only:

R1) IF odor=not(almond.or.anise.or.none) THEN poisonous
120 errors, 98.52% correct

R2) IF spore_print_color=green THEN poisonous
48 errors, 99.41% correct

R3) IF odor=none.and.stalk_surface_below_ring=scaly.and.stalk_color_above_ring=not.brown THEN poisonous
8 errors, 99.90% correct

R4) IF habitat=leaves.and.cap_color=white THEN poisonous
no errors!
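
The four rules can be read as a disjunction: a mushroom is called poisonous if any rule fires, and edible otherwise. The sketch below assumes each record is a dict keyed by attribute name (the key names and the helper names fired_rule and percent_correct are hypothetical). It also assumes that the error count quoted after each rule is cumulative, that is, the errors remaining once R1 up to that rule are in use; the source does not state this explicitly. The total of 8124 samples is 4208 + 3916 from the class counts.

```python
SAFE_ODORS = {"almond", "anise", "none"}

# Each rule maps a record (dict of attribute -> value) to True if it fires.
RULES = [
    ("R1", lambda m: m["odor"] not in SAFE_ODORS),
    ("R2", lambda m: m["spore_print_color"] == "green"),
    ("R3", lambda m: m["odor"] == "none"
                     and m["stalk_surface_below_ring"] == "scaly"
                     and m["stalk_color_above_ring"] != "brown"),
    ("R4", lambda m: m["habitat"] == "leaves" and m["cap_color"] == "white"),
]

def fired_rule(m):
    """Return the name of the first rule that marks m as poisonous, else None."""
    for name, rule in RULES:
        if rule(m):
            return name
    return None

def percent_correct(errors, total=4208 + 3916):
    """Convert an error count into percent correct over all samples."""
    return round(100.0 * (1 - errors / total), 2)

# Mushroom-4 from the data sample, restricted to the six attributes used.
mushroom_4 = {"odor": "pungent", "spore_print_color": "black",
              "stalk_surface_below_ring": "smooth",
              "stalk_color_above_ring": "white",
              "habitat": "urban", "cap_color": "white"}

print(fired_rule(mushroom_4))
print([percent_correct(e) for e in (120, 48, 8, 0)])
```

The error counts 120, 48 and 8 reproduce the quoted 98.52%, 99.41% and 99.90% when divided by 8124 samples.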


Iris dataset: 150 Iris flowers of 3 kinds; sepal and petal length and width in cm (x1 = sepal length, x2 = sepal width, x3 = petal length, x4 = petal width).

5.1,3.5,1.4,0.2, Iris-setosa
4.9,3.0,1.4,0.2, Iris-setosa
4.7,3.2,1.3,0.2, Iris-setosa
4.6,3.1,1.5,0.2, Iris-setosa
5.0,3.6,1.4,0.2, Iris-setosa
5.4,3.9,1.7,0.4, Iris-setosa
4.6,3.4,1.4,0.3, Iris-setosa
5.0,3.4,1.5,0.2, Iris-setosa
4.4,2.9,1.4,0.2, Iris-setosa
4.9,3.1,1.5,0.1, Iris-setosa
6.3,3.3,4.7,1.6, Iris-versicolor
4.9,2.4,3.3,1.0, Iris-versicolor
6.6,2.9,4.6,1.3, Iris-versicolor
5.2,2.7,3.9,1.4, Iris-versicolor
5.0,2.0,3.5,1.0, Iris-versicolor
5.9,3.0,4.2,1.5, Iris-versicolor
6.0,2.2,4.0,1.0, Iris-versicolor
6.1,2.9,4.7,1.4, Iris-versicolor
5.6,2.9,3.6,1.3, Iris-versicolor
6.7,3.1,4.4,1.4, Iris-versicolor
5.6,3.0,4.5,1.5, Iris-versicolor
5.8,2.7,4.1,1.0, Iris-versicolor
6.2,2.2,4.5,1.5, Iris-versicolor
5.6,2.5,3.9,1.1, Iris-versicolor
6.3,2.9,5.6,1.8, Iris-virginica
6.5,3.0,5.8,2.2, Iris-virginica
7.6,3.0,6.6,2.1, Iris-virginica
4.9,2.5,4.5,1.7, Iris-virginica
7.3,2.9,6.3,1.8, Iris-virginica
6.7,2.5,5.8,1.8, Iris-virginica
7.2,3.6,6.1,2.5, Iris-virginica
6.5,3.2,5.1,2.0, Iris-virginica
6.4,2.7,5.3,1.9, Iris-virginica
6.8,3.0,5.5,2.1, Iris-virginica
5.7,2.5,5.0,2.0, Iris-virginica
5.8,2.8,5.1,2.4, Iris-virginica
6.4,3.2,5.3,2.3, Iris-virginica
6.5,3.0,5.5,1.8, Iris-virginica

What can we say about such data?

IF (x3 < 2.5) iris-setosa;
IF (x3 > 4.8) iris-virginica
ELSE iris-versicolor
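
A minimal sketch of the threshold rule above, applied to a few rows copied from the listing. It assumes x3 is the third value in each row; the function name classify_by_x3 is hypothetical.

```python
def classify_by_x3(x3: float) -> str:
    """Two thresholds on x3 split the three Iris classes."""
    if x3 < 2.5:
        return "Iris-setosa"
    if x3 > 4.8:
        return "Iris-virginica"
    return "Iris-versicolor"

# Six rows copied from the listing: (x1, x2, x3, x4, class).
rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (6.3, 3.3, 4.7, 1.6, "Iris-versicolor"),
    (5.0, 2.0, 3.5, 1.0, "Iris-versicolor"),
    (6.3, 2.9, 5.6, 1.8, "Iris-virginica"),
    (4.9, 2.5, 4.5, 1.7, "Iris-virginica"),
    (5.7, 2.5, 5.0, 2.0, "Iris-virginica"),
]

# Collect the rows where the rule disagrees with the recorded class.
errors = [r for r in rows if classify_by_x3(r[2]) != r[4]]
print(len(errors), "of", len(rows), "rows misclassified:", errors)
```

On these six rows the only miss is the virginica flower with x3 = 4.5, which falls inside the versicolor range.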

The very simple rules for the Iris dataset (3 errors, 98.0%):

IF (x3=small) iris-setosa;
IF (x3=large.or.x4=large) iris-virginica
ELSE iris-versicolor
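
The linguistic rule can be sketched by mapping "small" and "large" to intervals. The source gives no cut points for these labels, so the values below are illustrative assumptions only: the x3 cut points are borrowed from the crisp rule (2.5 and 4.8), and the x4 cut point of 1.6 is hypothetical, chosen merely to be consistent with the rows listed above. The quoted 98.0% is not claimed for these particular values.

```python
def classify_iris_linguistic(x3: float, x4: float,
                             x3_small: float = 2.5,   # assumed: x3 is "small" below this
                             x3_large: float = 4.8,   # assumed: x3 is "large" above this
                             x4_large: float = 1.6    # hypothetical: x4 is "large" above this
                             ) -> str:
    """IF x3=small THEN setosa; IF x3=large OR x4=large THEN virginica; ELSE versicolor."""
    if x3 < x3_small:
        return "Iris-setosa"
    if x3 > x3_large or x4 > x4_large:
        return "Iris-virginica"
    return "Iris-versicolor"

# (x3, x4) pairs taken from rows of the listing.
print(classify_iris_linguistic(1.4, 0.2))  # a setosa row
print(classify_iris_linguistic(4.7, 1.6))  # a versicolor row
print(classify_iris_linguistic(4.5, 1.7))  # the virginica row with short petals
```

With these assumed cut points, the virginica flower with x3 = 4.5 is caught by the x4 condition, which the single-attribute rule cannot do.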