Using AIAI’s CBR Tool

Generally, you need 3 files to do anything:

A .key file that describes the data
A .cbr file that has comma delimited data to retrieve previous cases from (“training” data)
A .tst file that has comma delimited data to use as test data
Each of these files is loaded using its own button on the toolbar, and they should be loaded in the order listed above.
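
If you start from a single comma-delimited file of cases, the .cbr and .tst files can be produced with a short script. The following is a minimal sketch, not part of the AIAI tool: the function names, the 25 percent test fraction and the file names in the usage note are all assumptions, and rows are passed through unchanged (whether the tool expects a header row is not covered here).

```python
import csv
import random


def split_cases(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one test case.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def write_cases(path, rows):
    """Write rows as comma-delimited text, one case per line."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

For example, after reading the rows of a hypothetical cases.csv with csv.reader, call split_cases(rows) and write the two results to mydata.cbr and mydata.tst with write_cases.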

The .key file has a specific format to be followed:

First line in every example key file I’ve seen merely has a zero
Second line in every example key file I’ve seen has: "(None)"
Third line has the CBR Threshold (in quotes). Instead of using a specified number of neighbors K, AIAI’s version uses all cases whose similarity is measured to be greater than this threshold. E.g.:

“88” means a match better than 88 percent

One line for each attribute in the data file. Info for each attribute includes (in order):

The attribute name (in quotes),
The type of data / matching. Most common possibilities are:

String Exact – value either matches or not
String Fuzzy – some form of partial matching is done
Number Exact – value either matches or not – only appropriate if values can be from among a small number of integers
Number Fuzzy – some form of partial matching is done

Weighting – This should always be 0 for the ID attribute and for the “Goal” attribute (attribute to be predicted). For a first go, this should probably be 1 for all other attributes
Whether the attribute is a goal (“True”) or not (“False”)
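
To make the roles of the threshold, match types and weights concrete, here is a minimal sketch of threshold-based retrieval. It is not AIAI's actual algorithm: the similarity formulas, the attribute names, the ATTRIBUTES table and the SIZE_RANGE constant are all assumptions chosen for illustration. Attributes with weight 0 (the ID and the goal) are simply ignored when judging similarity.

```python
from difflib import SequenceMatcher

# Hypothetical attribute descriptions mirroring the key-file fields:
# (name, match type, weight, is_goal)
ATTRIBUTES = [
    ("ID", "String Exact", 0, False),
    ("Colour", "String Exact", 1, False),
    ("Size", "Number Fuzzy", 1, False),
    ("Class", "String Exact", 0, True),
]
SIZE_RANGE = 10.0  # assumed spread of the numeric attribute


def attribute_similarity(match_type, a, b):
    """Return a similarity between 0.0 and 1.0 for one attribute (assumed formulas)."""
    if match_type in ("String Exact", "Number Exact"):
        return 1.0 if a == b else 0.0
    if match_type == "String Fuzzy":
        return SequenceMatcher(None, str(a), str(b)).ratio()
    if match_type == "Number Fuzzy":
        return max(0.0, 1.0 - abs(float(a) - float(b)) / SIZE_RANGE)
    raise ValueError("unknown match type: " + match_type)


def similarity_percent(case_a, case_b):
    """Weighted average of attribute similarities, as a percentage."""
    used = [(n, t, w) for n, t, w, _ in ATTRIBUTES if w > 0]
    total_weight = sum(w for _, _, w in used)
    score = sum(w * attribute_similarity(t, case_a[n], case_b[n]) for n, t, w in used)
    return 100.0 * score / total_weight


def retrieve(query, case_base, threshold):
    """Keep every case whose similarity is greater than the threshold (no fixed K)."""
    return [c for c in case_base if similarity_percent(query, c) > threshold]
```

With a threshold of 88, a query is matched against however many stored cases score above 88 percent, which may be many, one, or none.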

The .key file can be created semi-automatically from a comma-delimited file.

From AIAI CBR, choose Options
Select the Keys/Files/Stats tab
In the File Management group, to the left of Templates, click on the 3rd icon “Create a Template”
Find the file
Description of the data will be guessed at by the software; you will need to make changes (double click on a value to change):

Operators – make sure that they are appropriate. If numbers can match over a range instead of all-or-nothing, make sure that “Number Fuzzy” is specified. Decide whether character strings should have partial matching, and specify String Exact or String Fuzzy accordingly.
Weighting – make sure that the “ID” attribute and the attribute that is the “answer” have weights set to 0. Any attributes that shouldn’t be used to judge similarity should have weights set to 0. Initially, other weights should be 1
Goals – make sure that the “answer” attribute has Goals = True; all others should be False
Make sure you set the ID attribute using the button provided

Save using the diskette icon

The CBR Monitor toolbar button allows you to see progress while experiments run (but only shows the most recent test)
The CBR Summary toolbar button allows you to see summary results during an experiment (shows test case answer and prediction, no details of which training cases match)
The Run Batch CBR toolbar button allows you to run an experiment (automatically prompting you for where to save a (non-detailed) trace of the experiment).
The Run CBR toolbar button allows you to run one test case at a time (but doesn’t provide a trace of what happened)
Options – some options are turned off in the downloadable version

CBR Threshold can be changed here using “spinner” control under Thresholding/ K Neighbourhood
ZoomHelp On/Off under System Options provides some pop-up help on some features of the software as you hover over them
Diagnosis Type allows various ways of voting for an answer. The default, “Probabilistic Curve”, appears to do a weighted vote; “Best Match” uses only the best match; “One Case One Vote” does a non-weighted vote. I believe that “Identify Outliers” attempts to discredit some cases.
Adaptive Modifiers attempt to do adaptation instead of merely voting for the answer. I haven’t played with them enough to know how well they do, and I don’t know the theory behind them.

Keys/Files/Stats tab allows you to change info from the key file (and save the changes if you like), specify files (as from the toolbar) and see statistics about an experiment.
Learning tab – all capabilities turned off in the downloaded version
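
The difference between the Diagnosis Type voting schemes can be pictured with a small sketch. This is an illustration only: the function names are hypothetical, and summing similarities is just one plausible reading of a "weighted vote", not the tool's actual “Probabilistic Curve” formula.

```python
from collections import defaultdict


def best_match(matches):
    """matches: list of (similarity, goal_value). Use only the single best case."""
    return max(matches, key=lambda m: m[0])[1]


def one_case_one_vote(matches):
    """Every retrieved case counts equally, regardless of similarity."""
    votes = defaultdict(int)
    for _, goal in matches:
        votes[goal] += 1
    return max(votes, key=votes.get)


def weighted_vote(matches):
    """Each case votes with its similarity (assumed weighting scheme)."""
    votes = defaultdict(float)
    for sim, goal in matches:
        votes[goal] += sim
    return max(votes, key=votes.get)


# Two close matches for "A", three weaker matches for "B".
matches = [(99, "A"), (98, "A"), (60, "B"), (60, "B"), (60, "B")]
print(best_match(matches), one_case_one_vote(matches), weighted_vote(matches))
```

In this example the non-weighted vote picks “B” (three cases against two), while the best match and the similarity-weighted vote both pick “A”. Ties are broken by whichever goal value was seen first, which is an arbitrary choice in this sketch.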

If the .cbr and .tst files contain the same cases, the test is a “leave one out” test – the case being tested on is temporarily left out of the training data so that it cannot be used (unfairly) to get the correct answer.
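
The leave-one-out idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the tool's implementation: similarity is just the percentage of exactly matching attributes, the vote is a similarity-weighted sum, and all function and attribute names are hypothetical.

```python
def simple_similarity(a, b, attributes):
    """Percentage of attributes on which two cases agree exactly (assumed measure)."""
    matching = sum(1 for name in attributes if a[name] == b[name])
    return 100.0 * matching / len(attributes)


def predict(query, training, attributes, goal, threshold):
    """Similarity-weighted vote among training cases above the threshold."""
    votes = {}
    for case in training:
        sim = simple_similarity(query, case, attributes)
        if sim > threshold:
            votes[case[goal]] = votes.get(case[goal], 0.0) + sim
    return max(votes, key=votes.get) if votes else None


def leave_one_out_accuracy(cases, attributes, goal, threshold):
    """Test each case against all the others, never against itself."""
    correct = 0
    for i, test_case in enumerate(cases):
        training = cases[:i] + cases[i + 1:]  # temporarily leave the test case out
        if predict(test_case, training, attributes, goal, threshold) == test_case[goal]:
            correct += 1
    return correct / len(cases)
```

A case that has no similar neighbours once it is removed from the training data will be predicted wrongly (or not at all), which is exactly the effect the leave-one-out test is meant to expose.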

You could use the software to actually advise problem-solving instead of just running experiments. Have the goal in the .tst file be “unknown” or something like that. The experiment will say you got them all wrong (since “unknown” will not be predicted), but look at the predicted goal for the CBR recommendation.
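
Preparing such a .tst file can be scripted. The sketch below is an assumption-laden illustration: the function names, the position of the goal column and the file name in the usage note are hypothetical, and the placeholder text only needs to be a value that never occurs as a real goal.

```python
import csv
import io


def make_advice_rows(new_cases, goal_index, placeholder="unknown"):
    """Insert a placeholder goal value into each new case at the goal column."""
    rows = []
    for case in new_cases:
        row = list(case)
        row.insert(goal_index, placeholder)
        rows.append(row)
    return rows


def to_csv_text(rows):
    """Render rows as comma-delimited text, one case per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Writing the result of to_csv_text to a file such as advice.tst gives a test file whose "wrong" predictions are in fact the CBR recommendations for the new cases.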