Using AIAI’s CBR Tool
- Generally, you need 3 files to do anything:
- A .key file that describes the data
- A .cbr file that has comma-delimited data from which previous
cases are retrieved (“training” data)
- A .tst file that has comma-delimited data to use as test
data
- Each of these files is loaded using its own button on the
toolbar, and they should be loaded in the order above
- The .key file has a specific format to be followed:
- First line in every example key file I’ve seen merely has
a zero
- Second line in every example key file I’ve seen has: "(None)"
- Third line has the CBR Threshold (in quotes). Instead of
using a specified number of neighbors K, AIAI’s version uses all cases
whose measured similarity is greater than this threshold. E.g.,
“88” means a better than 88 percent match
- One line for each attribute in the data file. Info for
each attribute includes (in order):
- The attribute name (in quotes),
- The type of data / matching. Most common possibilities
are:
- String Exact – value either matches or not
- String Fuzzy – some form of partial matching is done
- Number Exact – value either matches or not – only
appropriate if values come from a small set of integers
- Number Fuzzy – some form of partial matching is done
- Weighting – This should always be 0 for the ID attribute
and for the “Goal” attribute (attribute to be predicted). For a first
go, this should probably be 1 for all other attributes
- Whether the attribute is a goal (“True”) or not
(“False”)
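To make the threshold and weighting ideas above concrete, here is a minimal Python sketch of threshold-based retrieval. It is not AIAI's actual algorithm, which these notes do not document. The similarity formula (a weighted average of per-attribute scores), the linear "fuzzy" rule for numbers, the function names, and the sample data are all assumptions made for illustration.

```python
def attribute_score(a, b, kind, value_range=1.0):
    """Score one attribute pair in [0, 1]."""
    if kind == "exact":
        return 1.0 if a == b else 0.0
    # Assumed fuzzy rule for numbers: linear falloff over value_range.
    return max(0.0, 1.0 - abs(a - b) / value_range)


def similarity(case, query, schema):
    """Weighted average of attribute scores, as a percentage.

    schema maps attribute name -> (kind, weight, value_range).
    Attributes with weight 0 (ID, goal) are skipped entirely.
    """
    used = {n: s for n, s in schema.items() if s[1] > 0}
    total_weight = sum(w for _, w, _ in used.values())
    if total_weight == 0:
        return 0.0
    score = sum(
        w * attribute_score(case[n], query[n], kind, rng)
        for n, (kind, w, rng) in used.items()
    )
    return 100.0 * score / total_weight


def retrieve(cases, query, schema, threshold):
    """Return (similarity, case) for every case above the threshold,
    rather than a fixed number K of nearest neighbours."""
    scored = [(similarity(c, query, schema), c) for c in cases]
    return [(s, c) for s, c in scored if s > threshold]


# Hypothetical data: the ID and the goal ("class") carry weight 0.
schema = {
    "id": ("exact", 0, 1.0),
    "colour": ("exact", 1, 1.0),
    "size": ("fuzzy", 1, 10.0),
    "class": ("exact", 0, 1.0),
}
cases = [
    {"id": 1, "colour": "red", "size": 5.0, "class": "A"},
    {"id": 2, "colour": "red", "size": 9.0, "class": "B"},
    {"id": 3, "colour": "blue", "size": 5.0, "class": "B"},
]
query = {"id": 99, "colour": "red", "size": 5.5, "class": "?"}

matches = retrieve(cases, query, schema, threshold=88)
print([c["id"] for _, c in matches])  # prints [1]
```

With a threshold of 88, only the first case (97.5 percent under this assumed formula) is retrieved. Lowering the threshold admits more cases, which is the role K plays in a conventional nearest-neighbour setup.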
- The .key file can be created semi-automatically from a
comma-delimited file.
- From AIAI CBR, choose Options
- Select the Keys/Files/Stats tab
- In the File Management group, to the left of Templates,
click on the 3rd icon “Create a Template”
- Find the file
- The software will guess at a description of the data;
you will need to correct it (double-click on a value to
change it):
- Operators – make sure that they are appropriate. If
numbers should match by degree instead of all-or-nothing,
make sure that “Number Fuzzy” is specified. Decide whether character
strings should have partial matching, and specify String Exact or
String Fuzzy accordingly
- Weighting – make sure that the “ID” attribute and the
attribute that is the “answer” have weights set to 0. Any attributes
that shouldn’t be used to judge similarity should have weights set to 0.
Initially, other weights should be 1
- Goals – make sure that the “answer” attribute has Goals
= True; all others should be False
- Make sure you set the ID attribute using the button
provided
- Save using the diskette icon
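If you would rather draft a key file outside the tool, the steps above can be approximated in a few lines of Python. This is only a sketch under several assumptions: that the comma-delimited file has a header row of attribute names, that the per-attribute fields are separated by commas and quoted as shown, and that numeric columns should default to Number Fuzzy. None of these are confirmed by the tool's documentation, so compare the output against a key file saved by AIAI CBR before relying on it. The function names are hypothetical.

```python
import csv
import io


def guess_type(values):
    """Assumed guessing rule: all-numeric columns become
    'Number Fuzzy'; everything else becomes 'String Exact'."""
    try:
        for v in values:
            float(v)
        return "Number Fuzzy"
    except ValueError:
        return "String Exact"


def build_key_lines(csv_text, id_attr, goal_attr, threshold=88):
    """Return the lines of a draft key file as a list of strings."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]  # assumes a header row
    # First three lines as described in these notes.
    lines = ["0", '"(None)"', f'"{threshold}"']
    for col, name in enumerate(header):
        kind = guess_type([r[col] for r in data])
        # ID and goal attributes get weight 0; all others start at 1.
        weight = 0 if name in (id_attr, goal_attr) else 1
        is_goal = "True" if name == goal_attr else "False"
        # Field separator and quoting here are assumptions.
        lines.append(f'"{name}","{kind}",{weight},"{is_goal}"')
    return lines


csv_text = "id,colour,size,class\n1,red,5,A\n2,blue,9,B\n"
lines = build_key_lines(csv_text, id_attr="id", goal_attr="class")
print("\n".join(lines))
```

As with the tool's own template, the guessed types are only a starting point: review each attribute's matching type and weight by hand.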
- The CBR Monitor toolbar button allows you to see progress
while experiments run (but only shows the most recent test)
- The CBR Summary toolbar button allows you to see summary
results during an experiment (shows test case answer and prediction, no
details of which training cases match)
- The Run Batch CBR toolbar button allows you to run an
experiment, automatically prompting you for where to save a
non-detailed trace of the experiment.
- The Run CBR toolbar button allows you to run one test case
at a time (but doesn’t provide a trace of what happened)
- Options – some options are turned off in the downloadable
version
- CBR Threshold can be changed here using the “spinner” control
under Thresholding/ K Neighbourhood
- ZoomHelp On/Off under System Options provides some pop-up
help on some features of the software as you hover over them
- Diagnosis Type allows various ways of voting for an
answer – the default “Probabilistic Curve” appears to do a weighted vote.
“Best Match” uses only the best match. “One Case One Vote” does
a non-weighted vote. I believe that “Identify Outliers” attempts to
discredit some cases.
- Adaptive Modifiers attempt to do adaptation instead of
merely voting for the answer. I haven’t played with them enough to know
how well they do, and I don’t know the theory behind them.
- Keys/Files/Stats tab allows you to change info from the
key file (and save the changes if you like), specify files (as from the
toolbar) and see statistics about an experiment.
- Learning tab – all capabilities turned off in the
downloaded version
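The Diagnosis Type choices described under Options can be illustrated with a short Python sketch. It follows my tentative reading above, so treat it as a guess at the behaviour rather than the tool's implementation: in particular, “Probabilistic Curve” is approximated here by a plain similarity-weighted vote, and the real curve is unknown to me. The function names and sample matches are hypothetical.

```python
from collections import defaultdict


def best_match(matches):
    """Use only the single most similar case.
    matches is a list of (similarity, goal) pairs."""
    return max(matches, key=lambda m: m[0])[1]


def one_case_one_vote(matches):
    """Each retrieved case gets one vote, regardless of similarity."""
    votes = defaultdict(int)
    for _, goal in matches:
        votes[goal] += 1
    return max(votes, key=votes.get)


def weighted_vote(matches):
    """Each retrieved case votes with its similarity as the weight
    (an assumed stand-in for 'Probabilistic Curve')."""
    votes = defaultdict(float)
    for sim, goal in matches:
        votes[goal] += sim
    return max(votes, key=votes.get)


# Hypothetical retrieved cases: two strong "A" matches, three weaker "B".
matches = [(95, "A"), (94, "A"), (60, "B"), (60, "B"), (60, "B")]

print(best_match(matches))         # prints A
print(one_case_one_vote(matches))  # prints B
print(weighted_vote(matches))      # prints A
```

The example is chosen so the schemes disagree: three weak matches outvote two strong ones under a non-weighted vote, but not under a weighted one.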
If the .cbr and .tst files contain the same cases, the test
is a “leave one out” test – the case being tested on is temporarily left out of
the training data so that it cannot be used (unfairly) to get the correct
answer.
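The leave-one-out idea can be sketched in Python as follows. The sketch assumes the held-out case is identified by its ID attribute, which these notes do not confirm, and it uses a deliberately simple stand-in predictor (nearest case by a single numeric attribute) instead of the tool's retrieval and voting. All names and data are hypothetical.

```python
def leave_one_out(cases, predict, id_attr="id", goal_attr="class"):
    """Return the fraction of cases predicted correctly when each
    case in turn is removed from the training data before testing."""
    correct = 0
    for test in cases:
        # Hold out the case being tested (matched by ID, an assumption).
        training = [c for c in cases if c[id_attr] != test[id_attr]]
        if predict(training, test) == test[goal_attr]:
            correct += 1
    return correct / len(cases)


def nearest_by_size(training, test):
    """Stand-in predictor: goal of the case with the closest 'size'."""
    nearest = min(training, key=lambda c: abs(c["size"] - test["size"]))
    return nearest["class"]


cases = [
    {"id": 1, "size": 1.0, "class": "A"},
    {"id": 2, "size": 1.2, "class": "A"},
    {"id": 3, "size": 5.0, "class": "B"},
    {"id": 4, "size": 5.5, "class": "B"},
    {"id": 5, "size": 4.0, "class": "A"},
]

print(leave_one_out(cases, nearest_by_size))  # prints 0.8
```

Without the hold-out step, every case would find itself as a perfect match and the accuracy would be a misleading 100 percent.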
You could use the software to actually advise problem-solving
instead of just running experiments. Set the goal in the .tst file to
“unknown” or a similar placeholder. The experiment will say you got them all
wrong (since “unknown” will not be predicted), but look at the predicted goal
for the CBR recommendation.
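A small Python sketch of preparing such a file is shown here. It assumes the .tst rows use the same attribute order as the key file and have no header row; the function name, attribute names, and sample query are hypothetical.

```python
import csv
import io


def make_query_rows(queries, attributes, goal_attr, placeholder="unknown"):
    """Return comma-delimited text with the goal column set to a
    placeholder, one row per query."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for q in queries:
        writer.writerow(
            [placeholder if a == goal_attr else q[a] for a in attributes]
        )
    return buffer.getvalue()


attributes = ["id", "colour", "size", "class"]  # assumed key file order
queries = [{"id": 101, "colour": "red", "size": 5.5}]

text = make_query_rows(queries, attributes, goal_attr="class")
print(text, end="")  # prints 101,red,5.5,unknown
```

Save the resulting text with a .tst extension and load it as the test file; the predicted goal for each row is then the recommendation.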