KNN -- K-Nearest Neighbour Supervised Classifier

Performs supervised classification using the K-Nearest Neighbour method under either resubstitution or independent classification paradigms.

See Also: FUZCLUS, KCLUS, ISOCLUS, NGCLUS

PARAMETERS

KNN is controlled by the following global parameters:

Name     Prompt                                  Count     Type
FILE     Database File Name                      1-64      Char
DBSA     Database Subarea Channel                1-16      Int
DBIC     Database Input Channel List             1-16      Int
DBBS     Database Class Bitmap Segments          1-16      Int
DBOC     Database Output Channel List            1         Int
MASK     Area Mask (Window or Bitmap)            0-4       Int
KVALUE   Number of Nearest Neighbours            1         Int
MAXSAM   Maximum number of samples per class     0-1       Int
REPORT   Report Mode: TERM/OFF/filename          0-64      Char

FILE

Specifies the name of the PCIDISK file in which the training set signature data, the validation set data, and the class bitmaps reside.

 EASI>FILE="filespec"

DBSA

Specifies the database channels containing the classified pixels for the training set data. These channels can either be classified channels created by one of the classification programs (e.g. ISOCLUS) or multispectral images.

 EASI>DBSA=i,j,...,p

DBIC

Specifies the database channels that need to be classified. DBIC must specify the same number of channels as were specified in DBSA.

 EASI>DBIC=i,j,...,p

DBBS

Specifies the bitmaps (type 101) containing training sites to use in the classification.

 EASI>DBBS=i,j,...,p

The DBBS values are segment numbers of the database bitmap segments. A range of bitmap segments can be specified with negative values. For example: {1,-4,10} is internally expanded to {1,2,3,4,10}.
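
The range expansion described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the code KNN itself uses; the function name expand_segments is hypothetical, and the sketch assumes that a negative value always closes a range opened by the value immediately before it.

```python
def expand_segments(values):
    """Expand a DBBS-style list in which a negative value closes a range.

    Assumption: a negative value -n means "continue from the previous
    value up to and including n".
    """
    expanded = []
    for v in values:
        if v < 0:
            if not expanded:
                raise ValueError("a range end needs a preceding start value")
            # Continue from the value after the previous entry up to |v|.
            expanded.extend(range(expanded[-1] + 1, -v + 1))
        else:
            expanded.append(v)
    return expanded


print(expand_segments([1, -4, 10]))   # [1, 2, 3, 4, 10]
print(expand_segments([9, 11, -16]))  # [9, 11, 12, 13, 14, 15, 16]
```

The second call mirrors the DBBS setting used in the EXAMPLE section, which selects seven class bitmaps.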

DBOC

Specifies the database channel to hold the resulting theme map. Only one channel may be specified. The theme map will contain as many theme classes as there are DBBS values.

 EASI>DBOC=i

MASK

Specifies the area of the unclassified set channels which will be classified. This can be one of the following:

 EASI>MASK=xoff,yoff,xsize,ysize      | use rectangular window
 EASI>MASK=b                          | process only under bitmap b
 EASI>MASK=                           | use entire area
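
As a rough illustration of the three MASK forms, the Python sketch below lists the (line, pixel) positions that would be classified in each case. The function masked_coordinates and its arguments are hypothetical. The sketch assumes zero-based offsets, which the documentation above does not confirm, and it takes the bitmap as an in-memory 2-D list rather than as a segment number.

```python
def masked_coordinates(n_lines, n_pixels, window=None, bitmap=None):
    """Return the (line, pixel) pairs selected by a MASK-like setting.

    window: (xoff, yoff, xsize, ysize); offsets assumed zero-based here.
    bitmap: 2-D list of 0/1 values with the same size as the image.
    With neither given, the entire image area is selected.
    """
    if window is not None:
        xoff, yoff, xsize, ysize = window
        # Clip the window to the image so it never runs off the edge.
        lines = range(yoff, min(yoff + ysize, n_lines))
        pixels = range(xoff, min(xoff + xsize, n_pixels))
    else:
        lines, pixels = range(n_lines), range(n_pixels)
    return [(y, x) for y in lines for x in pixels
            if bitmap is None or bitmap[y][x]]


# Rectangular window: 100 pixels by 100 lines on a 512 x 512 image.
coords = masked_coordinates(512, 512, window=(10, 20, 100, 100))
print(len(coords))  # 10000

# Bitmap mask on a tiny 2 x 2 image: only set positions are selected.
print(masked_coordinates(2, 2, bitmap=[[0, 1], [1, 1]]))
```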

KVALUE

Specifies the number of neighbours (k) to be used. A k value between 1 and 10 is usually effective. KVALUE must be a positive integer.

 EASI>KVALUE=i

MAXSAM

Specifies the maximum number of samples per training class.

 EASI>MAXSAM=i
 EASI>MAXSAM=                   | defaults to 200

REPORT

Specifies the file to which the generated report should be appended.

 EASI>REPORT="filename"

Note: The following names have special meaning:

 EASI>REPORT="TERM"     | generates reports on your terminal
 EASI>REPORT="DISK"     | generates reports on file "IMPRPT.LST"
 EASI>REPORT="OFF"      | usually cancels report generation, but
                        | KNN forces REPORT to terminal output
 EASI>REPORT=           | defaults to generate reports on your terminal

DETAILS

KNN performs non-parametric supervised classification using the K-Nearest Neighbour (k-NN) algorithm. Both training and unclassified data sets must be furnished as image channels and not as class signature segments. Up to 16 image channels can be analysed (i.e. 16 features) and 64 classes found using this program.

The training set is created by reading in all image data from the DBSA channels that lie under the specified database bitmap segments. Each database bitmap corresponds to one class, which is labelled using the bitmap segment number.

Samples from the unclassified set channels (DBIC) that lie under the area specified by MASK are classified. Classification is performed by computing the Euclidean distance between the unclassified sample's feature vector and each training set sample's feature vector. The labels of the k (specified by KVALUE) closest training samples are found. The unclassified sample is assigned to the class which has the majority of the k labels. In the event of a tie, the algorithm chooses the tied class whose training sample lies nearest to the unclassified sample. Typical k values range from 1 to 10, with larger values necessary for noisy or high-dimensional data.
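
The classification rule in the preceding paragraph can be sketched in Python as follows. This is a minimal brute-force illustration, not the KNN program's implementation. The function knn_classify and the sample data are hypothetical, and ties are resolved under the assumption that the tied class with the single nearest training sample wins.

```python
import math
from collections import Counter


def knn_classify(sample, training, k):
    """Label `sample` by majority vote among its k nearest training samples.

    training: list of (feature_vector, class_label) pairs.
    """
    # Rank all training samples by Euclidean distance to the sample.
    ranked = sorted(training, key=lambda t: math.dist(sample, t[0]))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    # `nearest` is ordered by distance, so the first label holding the
    # top vote count belongs to the closest of the tied classes.
    for _, label in nearest:
        if votes[label] == best:
            return label


# Two features per sample; labels are bitmap segment numbers.
training = [((10, 12), 9), ((11, 13), 9),
            ((40, 42), 11), ((41, 40), 11), ((42, 44), 11)]

print(knn_classify((12, 14), training, k=3))  # 9
print(knn_classify((25, 27), training, k=2))  # 9 (1-1 tie, class 9 is nearer)
```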

It is possible to use the same data for both training and unclassified sets. This is considered classification by resubstitution. The sample being classified is automatically excluded from the list of potential k-NNs during resubstitution.
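
The exclusion step matters most for small k: with k=1 and no exclusion, every training sample would simply match itself. The Python sketch below illustrates resubstitution with the sample left out of its own neighbour list. The function classify_excluding and the data are hypothetical, and the tie rule is the same assumption as for ordinary classification.

```python
import math
from collections import Counter


def classify_excluding(index, samples, k):
    """Classify samples[index] using every other sample as training data."""
    target, _ = samples[index]
    # Leave the sample itself out of the candidate neighbours.
    others = [s for i, s in enumerate(samples) if i != index]
    nearest = sorted(others, key=lambda s: math.dist(target, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    # Ties go to the nearest neighbour among the tied classes.
    return next(label for _, label in nearest if votes[label] == best)


samples = [((10, 12), 9), ((11, 13), 9), ((12, 11), 9),
           ((40, 42), 11), ((41, 40), 11), ((42, 44), 11),
           ((30, 30), 9)]  # an isolated class-9 sample

predicted = [classify_excluding(i, samples, k=1) for i in range(len(samples))]
print(predicted)  # [9, 9, 9, 11, 11, 11, 11]
```

The last sample is labelled 11 because its nearest other sample belongs to class 11. Without the exclusion it would trivially have been assigned its own label.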

The k-NN classifier can involve a large amount of computation as each unclassified pixel is compared to each training pixel. Users should take appropriate care in creating the database signature bitmaps so that they are representative of each cover class. The user can also specify a maximum population size for any class training set with the MAXSAM parameter. A default value of MAXSAM=200 is used.
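
To get a feel for the cost, the short Python sketch below computes an upper bound on the number of distance evaluations. It assumes a brute-force search in which every pixel under MASK is compared with every training sample, and that each class contributes at most MAXSAM samples. The function name max_distance_evaluations is hypothetical.

```python
def max_distance_evaluations(mask_pixels, n_classes, maxsam=200):
    """Upper bound on distance computations for a brute-force k-NN pass.

    Assumes each of n_classes training sets holds at most maxsam samples.
    """
    return mask_pixels * n_classes * maxsam


# A 100 x 100 window, 7 training classes, MAXSAM=220.
print(max_distance_evaluations(100 * 100, 7, 220))  # 15400000
```

Under these assumptions the bound grows linearly with MAXSAM, so lowering MAXSAM shortens the run roughly in proportion, at the price of a sparser training set.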

The k-NN classifier has been shown to approach the Bayes optimal error, the lower bound on the achievable classification error, asymptotically as the training set grows. This property holds whether the class conditional probability density functions are parametric or non-parametric. In addition, the k-NN classifier does not demand global dimensionality reduction of the training feature space to ensure accurate and precise results. Refer to texts such as Fukunaga for specific information on the appropriate design of a k-NN classifier, especially the choice of k and MAXSAM.

Reference:

 K. Fukunaga. Introduction to Statistical Pattern Recognition.
 1990. Academic Press, Boston.

REPORT

An example output listing produced by KNN is shown below. This listing can be directed to any report device (REPORT). In this example, we have produced a 7-class thematic map using as input 7 database bitmap segments, two input database subarea channels, and two database input channels. The k=3 nearest neighbour classifier was used. Each row in the table pertains to a particular class. Each column contains the following information:

 Seg    Segment number of class bitmap
 Name   Name of class bitmap segment
 Code   Identification value (code) of class bitmap
        (pixel value used to encode theme map)
 Pixels Number of pixels in class
 %Image Percent of image covered by class


 KNN  K-Nearest Neighbour Classifier  V5.3 EASI/PACE   12:11 30-Nov-93

 irvine.pix    [11 channels     512P 512L] 

 Seg   Name    Code    Pixels    %Image

 9    Water1    0         553      5.53
 11   Urban     1        2342     23.42
 12   Range     2        4033     40.33
 13   Crop1     3        2532     25.32
 14   Crop2     4         169      1.69
 15   Crop3     5         129      1.29
 16   Forest    6         242      2.42

      Total             10000    100.00
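
The %Image column can be reproduced from the Pixels column. In the listing the percentages sum to 100 over the 10000 classified pixels, so the Python sketch below assumes the denominator is the total number of classified pixels rather than the full 512 x 512 image. The function name class_percentages is hypothetical.

```python
def class_percentages(pixel_counts):
    """Return each class's share of the classified pixels, in percent."""
    total = sum(pixel_counts.values())
    return {name: round(100.0 * n / total, 2)
            for name, n in pixel_counts.items()}


# Pixel counts copied from the listing above.
counts = {"Water1": 553, "Urban": 2342, "Range": 4033, "Crop1": 2532,
          "Crop2": 169, "Crop3": 129, "Forest": 242}

for name, pct in class_percentages(counts).items():
    print(f"{name:8s}{counts[name]:8d}{pct:10.2f}")
```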

MONITOR

Program progress can be monitored by printing the percentage of pixels under MASK that are already classified, in odometer fashion. A system parameter, MONITOR, controls this activity:

 EASI>MONITOR="ON"      | turn monitor ON (default)
 EASI>MONITOR="OFF"     | turn monitor OFF (recommended if 
                        | running in batch/background mode)

EXAMPLE

Label a portion of irvine.pix using the k-NN classifier. The top left corner of the region to be labelled is at pixel 10, line 20. The region is rectangular, with a size of 100 pixels by 100 lines. Use Landsat MSS channels 2 and 4 as features for the classifier.

Use the bitmaps in segments 9,11,12,13,14,15, and 16 to indicate training subareas for the classifier. Set the value of k to 2 and limit the number of samples per class to 220. Save the output theme map to channel 8.

 FILE= "irvine.pix"     | Database file name is irvine.pix
 DBSA= 2,4              | Select image channels for training. 
 DBIC= 2,4              | Select image channels to be classified 
 DBBS= 9,11,-16         | Identify the training set classes.
 DBOC= 8                | Select channel for output theme map
 MASK= 10,20,100,100    | Select the window to be classified
 KVALUE= 2              | Select 2 nearest neighbour classifier
 MAXSAM=220             | Specify the maximum training class size
 REPORT= ""             | Send report to terminal

The following report should appear upon completion:

 KNN  K-Nearest Neighbour Classifier  V5.3 EASI/PACE   12:11 30-Nov-93

 irvine.pix    [11 Channels     512P 512L] 

 Seg   Name    Code    Pixels    %Image

 9    Water1    0         553      5.53
 11   Urban     1        2342     23.42
 12   Range     2        4033     40.33
 13   Crop1     3        2532     25.32
 14   Crop2     4         169      1.69
 15   Crop3     5         129      1.29
 16   Forest    6         242      2.42

      Total             10000    100.00

Note that the user could have used a bitmap segment to specify the region to be classified rather than a window. The bitmap could be created by first using the DCP program to trace its outline on a graphic plane while the image is displayed. The graphic plane could then be saved as a bitmap segment using the VIB program. The use of a bitmap segment as a mask is necessary when classifying a non-rectangular region. This mask could also be created using ImageWorks.

