NGCLUS2 -- Multi-bit Narendra-Goldberg Clustering

Performs non-parametric multi-dimensional clustering, based on the algorithm developed by Narendra and Goldberg. Up to 16 multi-bit input channels can be used. The output is a theme map directed to a database image channel.

See Also: NGCLUS, ISOCLUS, KCLUS, MLC, AGGREG

PARAMETERS

NGCLUS2 is controlled by the following global parameters:

Name     Prompt                                  Count     Type
FILE     Database File Name                      1-64      Char
DBIC     Database Input Channel List             1-16      Int
DBOC     Clustering Result Output Channel        0-1       Int
MASK     Area Mask (Window or Bitmap)            0-4       Int
CLTHRS   Cluster Neighbour Threshold             0-1       Int
SAMPRM   Minimum Sample Threshold                0-1       Int
SMOOTH   Number of Smoothing                     0-1       Int
SIGGEN   Generate Signatures: YES/NO             1-4       Char
RES      Resolution                              0-1       Int
REPORT   Report Mode: TERM/OFF/filename          0-64      Char

FILE

Specifies the name of the PCIDSK image file containing the image channels to be clustered.

 EASI>FILE="filespec"

DBIC

Specifies the input image channels on FILE to be clustered. Up to 16 channels can be specified.

 EASI>DBIC=i,...j,k

DBOC

Specifies the output channel for the clustering results. If no value is specified, the clustering results will not be saved.

 EASI>DBOC=i                      | results saved to channel i
 EASI>DBOC=                       | no results saved
DBOC can be the same as one of the DBIC channels. Only the area under MASK is written to DBOC.

MASK

Specifies the area in the input channel which should be processed. This can be one of the following:

 EASI>MASK=xoff,yoff,xsize,ysize  | process window
 EASI>MASK=b                      | process only under bitmap
                                  | stored in segment b
 EASI>MASK=                       | process entire channel

REPORT

Specifies the file to which the generated report should be appended.

 EASI>REPORT="filename"
The following names have special meaning:

 EASI>REPORT="TERM"      | generates reports on your terminal
 EASI>REPORT="DISK"      | generates reports on file "IMPRPT.LST"
 EASI>REPORT="OFF"       | switches off report generation
 EASI>REPORT=            | defaults to terminal output
NGCLUS2 generates a report of the total number of clusters and pixels.

CLTHRS

  Valid Values:   0 <= x <= 255
  Default:        computed from the data (see below)
Specifies a cluster threshold distance in grey levels.

 EASI>CLTHRS=n
Two vectors are considered neighbours when their difference in every channel is less than CLTHRS. If CLTHRS is not specified, it defaults to the difference between the maximum and minimum grey level values over all channels, divided by 64.
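
The neighbour test and the default threshold described above can be sketched in Python as follows. This is only an illustration, not the NGCLUS2 implementation: the function names are hypothetical, integer division for the default is an assumption, and the strict "less than" comparison follows the wording of the text.

```python
def default_clthrs(pixels):
    """Default threshold: (max - min grey level over all channels) / 64.

    Integer division is an assumption; the text does not state the rounding.
    """
    values = [v for vector in pixels for v in vector]
    return (max(values) - min(values)) // 64


def are_neighbours(a, b, clthrs):
    """True when the difference in every channel is strictly less than clthrs."""
    return all(abs(x - y) < clthrs for x, y in zip(a, b))


# Two 2-channel pixel vectors with grey levels ranging from 0 to 640.
pixels = [(0, 100), (640, 320)]
threshold = default_clthrs(pixels)
print(threshold)                                      # 10
print(are_neighbours((10, 20), (15, 29), threshold))  # True
print(are_neighbours((10, 20), (15, 30), threshold))  # False
```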

SAMPRM

  Valid Values:   x >= 0
  Default:        5
Specifies the minimum number of samples allowed in a cluster, so that clusters with very few samples can be eliminated.

 EASI>SAMPRM=n
If a cluster contains fewer than SAMPRM samples, each of its samples is merged into a neighbouring cluster.

SMOOTH

  Valid Values:   x > 0
  Default:        5
Specifies the histogram threshold below which smoothing is applied.

 EASI>SMOOTH=n
If the histogram value of a vector is less than SMOOTH, it is replaced by the average of the histogram values of the vector and its neighbours.
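
A minimal Python sketch of this adaptive smoothing rule is given below. It is not the NGCLUS2 code. It assumes the histogram is held as a dictionary from grey level vectors to counts, that only occupied histogram cells take part in the average, that neighbours are found with the CLTHRS test, and that all averages are computed from the unsmoothed values. The function name is hypothetical.

```python
def adaptive_smooth(hist, smooth, clthrs):
    """Smooth only the histogram cells whose value is below `smooth`."""
    smoothed = {}
    for vec, count in hist.items():
        if count >= smooth:
            # Dense cells are left untouched.
            smoothed[vec] = count
            continue
        # Average the cell with its occupied neighbours (original values).
        group = [count] + [
            c for other, c in hist.items()
            if other != vec
            and all(abs(x - y) < clthrs for x, y in zip(vec, other))
        ]
        smoothed[vec] = sum(group) / len(group)
    return smoothed


hist = {(0,): 2, (1,): 10, (5,): 1}
print(adaptive_smooth(hist, smooth=5, clthrs=2))
# {(0,): 6.0, (1,): 10, (5,): 1.0}
```

In this toy histogram the cell (0,) is averaged with its neighbour (1,), the dense cell (1,) is kept, and the isolated cell (5,) has no neighbours and so keeps its value.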

SIGGEN

Specifies whether a signature is generated for each cluster. The signatures can be used as input to the MLC (Maximum Likelihood Classification) program to classify other images.

 EASI>SIGGEN="YES"      | generate signatures
 EASI>SIGGEN=           | defaults to NO
A maximum of 1000 signatures can be created by this program. Therefore, signatures are not generated for class values greater than 1000.

RES

  Valid Values:   x >= 0
  Default:        1
Specifies the power of 2 by which the grey level value of each pixel is scaled down during histogram generation.

 EASI>RES=n
Each pixel's grey level value is divided by 2 to the power of RES. For example, if the input grey level value is 256 and RES is set to 2, the resulting grey level value is 64.
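
The scaling can be written as a one-line Python function. This is a sketch only: the function name is hypothetical, and dropping the fractional part of the result is an assumption, since the text only states the division.

```python
def scale_grey(value, res):
    """Divide a grey level by 2**res, dropping any fractional part."""
    return value >> res


print(scale_grey(256, 2))  # 64, the example given in the text
print(scale_grey(255, 1))  # 127
print(scale_grey(200, 0))  # 200, RES=0 leaves the value unchanged
```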

DETAILS

This is a generalized version of the NGCLUS (Narendra and Goldberg) algorithm. It avoids the limit of four 8-bit channels that the use of a hash table imposes. To allow multi-bit data and more input channels, this program does not use a hash table, at the expense of computation time. It is intended mainly for users who need Narendra and Goldberg's algorithm for more than four channels or for multi-bit data. The user is warned that this program requires a lot of computation time. A maximum of 1000 classes can be generated.

This program provides an alternative for unsupervised classification. It is based on the algorithm developed by P.M. Narendra and M. Goldberg. The clustering algorithm operates upon the histogram and isolates the vectors into clusters that are unimodal in the histogram, with the boundaries between clusters running through the valleys in the histogram. This is a reasonable way to characterize the clusters, which can be of any shape. The number of clusters need not be specified a priori and, moreover, the algorithm is noniterative.
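
The idea of unimodal clusters separated by histogram valleys can be illustrated with a simplified "climb to the nearest peak" sketch in Python. This is not the published Narendra and Goldberg procedure and not the NGCLUS2 code. It assumes a dictionary histogram, the CLTHRS neighbour test, and a hypothetical function name. Each occupied cell is linked to its densest neighbour, and all cells that climb to the same peak share one cluster label, in a single non-iterative pass.

```python
def cluster_histogram(hist, clthrs):
    """Label each occupied histogram cell by the local peak it climbs to."""
    def neighbours(vec):
        return [o for o in hist
                if o != vec
                and all(abs(x - y) < clthrs for x, y in zip(vec, o))]

    # Link every cell to its densest neighbour, if that neighbour is denser.
    # Ties are broken by vector order so that links can never form a cycle.
    parent = {}
    for vec in hist:
        best = max(neighbours(vec), key=lambda o: (hist[o], o), default=vec)
        parent[vec] = best if (hist[best], best) > (hist[vec], vec) else vec

    # Follow the links uphill; cells that link to themselves are peaks.
    labels, peaks = {}, {}
    for vec in sorted(hist):
        root = vec
        while parent[root] != root:
            root = parent[root]
        # Labels start at 1, leaving 0 free for unclassified pixels.
        labels[vec] = peaks.setdefault(root, len(peaks) + 1)
    return labels


# One-channel histogram with peaks at 1 and 5 and a valley at 3.
hist = {(0,): 5, (1,): 9, (2,): 4, (3,): 1, (4,): 6, (5,): 8, (6,): 3}
print(cluster_histogram(hist, clthrs=2))
# {(0,): 1, (1,): 1, (2,): 1, (3,): 2, (4,): 2, (5,): 2, (6,): 2}
```

The number of clusters falls out of the number of peaks found, so it does not have to be supplied in advance.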

Histogram generation is the first step in the histogram clustering procedure. Histogram clustering uses one of several non-parametric histogram-based algorithms for unsupervised classification of image data.

NGCLUS2 creates a new histogram based on image data stored in up to 16 database image channels (DBIC) on a specified database file (FILE).

The SMOOTH parameter can be used to smooth the histogram by averaging the histogram values over the neighbourhood of each vector. The histogram value of each vector is replaced by the new smoothed value. This program uses "adaptive smoothing": only vectors with histogram values less than SMOOTH are smoothed. This tends to smooth the histogram only over the low density areas that are prone to noise.

The MASK parameter specifies the area within the input channel which will be processed. Only the area under MASK is classified; the rest of the image is not processed. If a single value is specified, it refers to a bitmap segment which defines the area to be classified. If four values are specified, they define the x,y offsets and x,y dimensions of a rectangular window within the image to be classified. If defaulted, the entire database is processed.

It is quite common for satellite images to have large black-filled areas (with zero grey levels) which should not be included in the classification. To exclude them, the user can first run the program THR with the TVAL minimum and maximum values set to 1 and 255, respectively. This creates a bitmap mask covering only the image area. The user then specifies this bitmap as the MASK parameter in this program.
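
The effect of such a threshold mask can be pictured with a short Python sketch. It does not call or reproduce the THR program. It assumes a single channel held as a list of rows, and the function name is hypothetical.

```python
def nonzero_mask(channel, low=1, high=255):
    """Return a bitmap with 1 where low <= grey level <= high, else 0."""
    return [[1 if low <= v <= high else 0 for v in row] for row in channel]


channel = [[0, 12],
           [255, 0]]
print(nonzero_mask(channel))  # [[0, 1], [1, 0]]
```

Pixels with grey level 0 fall outside the 1 to 255 range, so they are left out of the bitmap and would not be classified.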

The result of the clustering is a theme map directed to a specified database image channel (DBOC). A theme map encodes each cluster with a unique grey level. For example, cluster 1 is assigned grey level 1, and cluster 2 is assigned grey level 2. Grey level 0 represents unclassified pixels. Therefore, if the theme map is later directed to the display, a pseudo-colour table should be loaded so that each cluster is represented by a different colour. If no value is specified for DBOC, the clustering results will not be saved.

NGCLUS2 generates a report of the total number of clusters and samples.

More details of Narendra & Goldberg's algorithm can be found in NGCLUS and in the following paper:

 Narendra & Goldberg.  1977.  "A Non-parametric clustering scheme 
 for Landsat".  Pattern Recognition, vol.9.  pp. 207-215.

EXAMPLE

This example generates clusters based on four input channels of the file irvine.pix.

 EASI>FILE="IRVINE.PIX"
 EASI>DBIC=1,2,3,4          | input channels
 EASI>DBOC=7                | output channel
 EASI>MASK=                 | process entire image
 EASI>CLTHRS=1              | maximum neighbour vector difference
 EASI>SAMPRM=10             | at least 10 samples per cluster
 EASI>SMOOTH=3              | smooth histogram values below 3
 EASI>SIGGEN="NO"           | no signature generation
 EASI>RES=1                 | divide each grey level by 2
 EASI>REPORT=               | send report to terminal
 EASI>R NGCLUS2
The following is a sample report produced by NGCLUS2.

RESULTS
-------

 
 Final Results :
 No. of Clusters : 72
 Cluster   Samples  :
 (  1)      6259
 (  2)       973
 (  3)      2603
 (  4)      9016
 (  5)     16281
 (  6)     26444
 (  7)        39
 (  8)    102557
 (  9)        19
 ( 10)       690
 ( 11)      1198
 ( 12)     12435
 ( 13)      3213
 ( 14)       786
 ( 15)     24540
 ( 16)       771
 ( 17)     20122
 ( 18)       259
 ( 19)      3268
 ( 20)       198
 ( 21)       980
 ( 22)       194
 ( 23)      2179
 ( 24)      2189
 ( 25)      1249
 ( 26)       924
 ( 27)       595
 ( 28)       254
 ( 29)       834
 ( 30)       294
 ( 31)       182
 ( 32)       606
 ( 33)      2854
 ( 34)      1744
 ( 35)       233
 ( 36)      1606
 ( 37)      1412
 ( 38)       157
 ( 39)       349
 ( 40)        23
 ( 41)        58
 ( 42)      3909
 ( 43)        23
 ( 44)        36
 ( 45)      1052
 ( 46)        53
 ( 47)       408
 ( 48)      1146
 ( 49)       401
 ( 50)        41
 ( 51)       111
 ( 52)        35
 ( 53)       586
 ( 54)        69
 ( 55)       164
 ( 56)        75
 ( 57)       372
 ( 58)       125
 ( 59)        69
 ( 60)       259
 ( 61)       369
 ( 62)       556
 ( 63)       200
 ( 64)       101
 ( 65)       184
 ( 66)        50
 ( 67)       312
 ( 68)       360
 ( 69)       166
 ( 70)       192
 ( 71)        54
 ( 72)        79
        --------
          262144

 Unclassified :          0
