Parent Topic: ISOCLUS

ALGORITHM

The ISOCLUS program is based (with minor modifications) on the ISODATA method described in the following publication:

 Tou, Julius T. and Rafael C. Gonzalez.  1974.  Pattern 
 Recognition Principles.  Addison-Wesley Publishing Co.
The ISODATA method is similar in principle to the K-means procedure in the sense that cluster centers are iteratively determined sampled means. Unlike the latter algorithm, however, ISODATA represents a fairly comprehensive set of additional heuristic procedures which have been incorporated into an interactive scheme. Up to 16 image channels and 255 clusters can be found in this program. The program reads in image data from up to 16 specified channels on a specified file.

Due to the large amount of memory required, ISOCLUS will sample a subset of the image data during cluster mean calculations to generate a histogram. The amount of sampling depends on the amount of image data. For example, for a 1024x1024 image data, ISOCLUS will sample every other pixel during calculation of cluster means. However, when writing results to an output channel, all pixels will be classified.

The ISOCLUS algorithm is described as follows:

Before the algorithm is executed, it is necessary to specify a set Nc of initial centres, Z1, Z2, ..., ZN. This set, which need not necessarily be equal in number to the desired cluster centres, can be formed by selecting samples from the given set of data.

For a set of N samples, {X1, X2, ..., Xn}, ISODATA consists of the following principal steps:

STEP 1: Specify the process parameters.

 NUMCLUS = number of cluster centres desired;
 SAMPRM  = a parameter against which the number of samples
             in a cluster domain is compared;
 STDV    = Standard Deviation parameter;
 LUMP    = Lumping Parameter;
 MAXPAIR = Maximum number of pairs of cluster centres which
             can be lumped;
 MAXITER = Maximum number of iterations allowed.
STEP 2: Distribute the N samples among the present cluster centres, using the following relation:

          XinSj, if ||X-Zj|| < ||X-Zi||,
i=1,2,...,Nc; i not equal j for all X in the sample set. In this notation, Sj represents the subset of samples assigned to cluster center Zj.

STEP 3: Discard sample subsets with fewer than SAMPRM members; that is,

 if for any j, Nj < SAMPRM, then discard Sj and reduce Nc by 1.
STEP 4: Update each cluster center Zj, j=1,2,...,Nc, by setting it equal to the sample mean of its corresponding set Sj; that is,

          Zj = 1/Nj  SUM X,  j=1,2,...Nc
                    XinSj
where Nj is the number of samples in Sj. If samples were deleted in step 3, go to step 2.

STEP 5: Compute the average distance Dj of samples in the cluster domain Sj from their corresponding cluster centre, using the following relation:

          Dj = 1/Nj  SUM ||X-Zj||, j=1,2,..Nc
                    XinSj
STEP 6: Compute the overall average distance of the samples from their respective cluster centres, using the following relation:

                      Nc
           Do = 1/N  SUM Nj Dj
                     j=1
STEP 7: If this is the last iteration, set LUMP = 0 and go to step 11; if Nc is less than or equal to NUMCLUS / 2, go to step 8; if this is an even-numbered iteration, or if Nc is greater than or equal to 2 (NUMCLUS), go to Step 11; otherwise, continue.

STEP 8: Find the standard deviation vector Vj = (V1j, V2j,.. Vnj)' for each sample subset, using the relation

        Vij = sqrt( 1/Nj SUM (Xik-Zij)**2) i=1,2,..n;
                                           j=1,2,..Nc
where n is the sample dimensionality, Xik is the ith component of the kth sample in Sj, Zij is the ith component of Zj, and Nj is the number of samples in Sj. Each component of Vj represents the standard deviation of the samples in Sj along a principal coordinate axis.

STEP 9: Find the maximum component of each Vj, j=1,2,..Nc, and denote it by Vjmax.

STEP 10: If for any Vjmax, j=1,2,...,Nc, there is Vjmax > STDV and (a) Dj > Do and Nj > 2(SAMPRM+1) or (b) Nc is less than or equal to NUMCLUS/2, then split Zj into two new cluster centres Zj+ and Zj-, delete Zj, and increase Nc by 1. Cluster centre Zj+ is formed by adding a given quantity Gj to the component of Zj which corresponds to the maximum component of Vj; Zj- is formed by subtracting Gj from the same component of Zj. One way of specifying Gj is to let it be equal to some fraction of Vjmax. That is, Gj = k Vjmax, where k is greater than 0 and less than or equal to 1. In this program, k=0.5. If splitting took place in this step, go to Step 2; otherwise continue.

STEP 11: Compute the pairwise distances Dij between all cluster centres.

        Dij=||Zi-Zj||, i = 1,2,..,Nc-1; j=i+1,...,Nc
STEP 12: Compare the distance Dij against the parameter LUMP. Arrange the L smallest distances, which are less than LUMP, in ascending order:

                [Di1j1, Di2j2, ...,DiLjL]
where Di1j1 < Di2j2 < ... DiLjL and L is the maximum number of pairs of cluster centres which can be lumped together. The lumping process is discussed in the next step.

STEP 13: With each distance Diljl there is an associated pair of cluster centres, Zil and Zjl. Starting with the smallest of these distances, perform a pairwise lumping operation according to the following rule:

For l = 1,2,....,L, if neither Zil nor Zjl has been used in lumping in this iteration, merge these two cluster centres using the following relation:

        Zl* = [1/(Nil+Njl)] * [ Nil(Zil) + Njl(Zjl)]
Delete Zil and Zjl and reduce Nc by 1.

It is noted that only pairwise lumping is allowed and that a lumped cluster centre is obtained by weighting each old center by the number of samples in its domain. Experimental evidence indicates that more complex lumping can produce unsatisfactory results. It is also important to note that, since a cluster centre can be lumped only once, this step will not always result in L lumped centers.

STEP 14: If this is the last iteration, the algorithm terminates. Otherwise go to Step 1 if any of the process parameters require changing at the user's discretion, or go to Step 2 if the parameters are to remain the same for the next iteration.

Please refer to Tou for more details.


Parent Topic: ISOCLUS
About PCI Help Gateway