## 1.

## INTRODUCTION

Important development has been made on the field of signal and image processing in the last years. Owing to the enormous achievements of the electronic computers, the digital techniques play each time a more important role. Although the limit of capabilities of computers has been shifted far beyond expectations, electronics presents physical limits resulting from interactions between electrons and from the necessity of having permanent conducting lines. Optics offers alternative solutions for interconnections. Moreover, the possibility of realization of operations with the speed of light and the inherent parallelism of optical processors are additional advantages of Optics. Convolution is a well known example of the type of operations which can be ideally performed in simple optical systems. This is the reason for our interest in hybrid optoelectronic systems which are expected to deliver solutions to both general and highly specialized problems in image processing.

In the two last decades, nonlinear image processing has become one of the most fruitful subfields of image processing^{1-12}. Nonlinear filters are inherently locally adaptative, because they are based on the local characteristics of the processed image. As a consequence, these nonlinear filters have very useful properties as, for instance, they simultaneously remove noise and preserve edges in contrast to linear filters. The important issue in nonlinear processing is the search for effective nonlinear processors. The processing can be performed in different ways: digitally, optically, and in hybrid optoelectronic systems. Digital implementation of, for example, stack filters^{7} is connected with technological difficulties and, as a result, the speed of digital operations is limited. Optical implementation of, for instance, median filtering can be realized at a megahertz frame rate but with limited flexibility^{13}. By their nature, nonlinear operations do not match traditional linear and shift invariant optical systems. Hybrid optoelectronic systems are actually promissing because of both their high processing rate and their possible universal and flexible character.

Two important methods of nonlinear image processing are rank-order^{5,7,8,11} and morphological filtering^{3,14,15}. Although they are different from the point of view of mathematical approach, they may lead to similar image modifications.

In morphological filters (MF), additivity and scalar multiplication of linear shiftinvariant filters is replaced by inclusiveness. The inclusion relation means that any MF is an increasing transformation and thus filtering leads to a loss of information. In contrast, linear systems preserve information within their transmission bands. As a consequence, morphological filtering is irreversible, whereas linear filtering is, in principle, reversible. MF present also the property of idempotence, i.e., unchangedness when multiplied by itself. The same property characterizes nonlinear rank-order filters, which after several passes over an image can result in finding the image roots^{11}. Therefore, MF have different properties from those of optical linear filters based on convolution. Nevertheless, the basic morphological operations, i.e., dilation and erosion, can be performed by means of convolution and threshold. Thus, the linear optical systems can perform morphological filtering if they are complemented with electronics in order to carry out the nonlinear thresholding operation. The possibility of optoelectronic implementations of MF results from the study of Maragos and Schafer on relations among morphological, order statistics and stack filters^{9,16}. They proved the equivalence of rank-order and stack filters and those MF that conmute with thresholding. This condition is met in the case of both binary and gray-scale images interacting with binary structuring elements. In both methods, processing of a gray-scale image slice by slice is based on the threshold decomposition concept^{5}, which led to the definition of the stacking property^{7} of Boolean functions. Nonlinear image processing based on the thresholded local convolution approach permits operations on image details of the size smaller than or equal to that of the convolution kernel. The processing results in modifications of local histograms calculated for neighbourhoods contained within the kernel windows.

The purpose of local histogram modifications can be various, examples of which are noise removal, image detail enhancement, skeletonization, and segmentation. In both rank-order and morphological processing the mechanism of detail enhancement is quite similar. The details are extracted as the difference (residue) between the original image and its nonlinearly processed versions, which are low-pass filtered. In rank-order processing the usual low-pass filter is the median one, which neglects extreme image pixel values contained within the local convolution window. In morphological processing the opening and closing transformations have selective low-pass filter properties. In binary images, the opening filters out small sets and small convex details of objects. Thus the gray-scale images are smoothed by the opening owing to removal of convex details that on each grade of gray are thinner than the structuring element. The morphological closing operation is dual to the opening. Therefore in binary images the closing fills in small dark holes within objects and connects closely disjointed parts of objects into one. The gray-scale images are smoothed by the closing owing to removal of concave details that are smaller than the kernel. In terms of intensity the opening removes bright details of an image, while the closing removes dark details. Low-pass filtering is easily performed in an optical system because of its limited modulation transfer function. Thus the efficiency of optical systems in low-pass filtering results in good performance of hybrid high-pass filtering processors. In both methods the size and the shape of preserved details depend on the neighborhoods within which the operations are realized.

There are several optoelectronic implementations of morphological and rank-order nonlinear processors^{13,17,25}. In all of them, because of optical convolution, the digital computations are reduced to calculating the maximum, minimum, and other rank-order values and depend less on the neighborhood shape and size. In the first demonstration of the optoelectronic rank-order processor a binary spatial light modulator (SLM) was employed to introduce simultaneously into an optical convolver all of the binary slices of an input image^{13}. A computer-generated hologram played a double role of an image beam deflector for slices and a structuring element. In this experiment a gray-scale image of 48 × 48 pixels with 16 gray-levels was processed in real time. In another hybrid morphological processor the real-time programmable processing of limited-size images was presented^{18}. A binary input image and a structuring element were introduced into an optical system by means of two SLM’s. The use of a lenslet array illuminator yielded a convolution due to angular projection. The convolution with angular projection was also employed in the morphological processor with a laser beam scanner^{20}. In the morphological processor based on a coherent 4-f type correlator the impulse response of the Fourier-plane holographic filter played the role of a structuring element^{21, 22}. In other realizations of rank-order and morphological processors, noncoherent convolvers by use of either a plane of misfocus or shadow casting were applied^{17,23,25}. The slices of the gray-scale input images were introduced into the convolvers by means of photographic transparencies, a TV monitor, or a SLM. In all of the above-mentioned systems, looping and sequential regime of work were necessary as a consequence of the sequential structure of rank-order and morphological filters on the one hand, and the threshold decomposition concept and stacking property on the other.

Recently, Tasto and Rhodes showed that both rank-order and morphological filtering of threshold decomposed images realized in optoelectronic processors exhibits a high degree of noise immunity and permits high-accuracy processing^{26}. Their assessment as well as the progress in real-time processing techniques encourages continuation of research on hybrid optoelectronic systems for both rank-order and morphological processing.

If image processing is a very important and wide field of research, not less important is the field of pattern recognition, in order to determine the position of a signal target among other objects in a scene. Location of a reference image in a scene may be accomplished by the determination of the position where a function of similarity reaches a maximum or, equivalently, where a given error criterion reaches a minimum. One such error criterion is given by the mean-squared error (MSE) between two functions.

The vast majority of research on image detection has been based on minimizing the MSE, which leads to maximizing the cross-correlation function^{27}. The MSE criterion has been shown to be optimal if the signal to be detected is corrupted with additive Gaussian noise^{28}, but for deviations from this Gaussian assumption other error criteria are more robust. One such criterion, widely used in signal processing and template matching, is the mean-absolute error (MAE) (see Ref. [29], for instance).

In detection tasks the MAE criterion has two key advantages over the MSE criterion^{30}: (a) the MAE increases faster than the MSE when the reference is changed from its optimal matching position, and (b) the MAE criterion is more robust in the presence of non-Gaussian noise distributions and, in particular, in the presence of outliers (such a salt and pepper noise).

Maragos, in Ref. [30], defined a nonlinear correlation named the morphological correlation (MC), that is optimal when using the MAE criterion, i.e., minimizing the MAE is equivalent to maximizing the MC. This correlation is called morphological since it is defined in terms of erosion like minimum correlation-pattern values, so it has some connection with mathematical morphology theory. The term “morphological correlation” is used because the morphological autocorrelation of a function *f*(*x,y*) is equal to the area under the signal obtained by erosion of the function *f*(*x,y*) by a two-point structuring element {(*0,0*),(*x,y*)}. This correlation provides better performance and higher discrimination capabilities in pattern recognition tasks in comparison with linear correlation^{3,16,30,31}.

Analogously to morphological and rank-order operations, MC implementation can be performed by use of the threshold-decomposition concept^{5}. So, the MC between two gray-level functions can be expressed by the use of the threshold decomposition of both functions. It turns to be the sum over all amplitudes of the linear correlations between the thresholded versions (binary slices) of both functions at every gray-level value. Note that this correlation does not fulfill gray-scale mathematical-morphology properties. Nevertheless, following Maragos^{30}, we use the name of morphological correlation because its closeness to morphological erosion.

Up to now, what there is in common between nonlinear morphological image processing and pattern recognition by the use of MC is the threshold decomposition of the gray-level images to be processed, which are decomposed into binary slices. In this paper, we want to take profit of this decomposition in order to introduce optoelectronic processors allowing different operations both in nonlinear image processing and in nonlinear pattern recognition. It is divided in two main blocks, the first devoted to nonlinear image processing using optoelectronic image processors, the second one devoted to different approaches to morphological correlation and the corresponding optoelectronic systems to implement them.

## 2.

## NONLINEAR IMAGE PROCESSING

Rank-order and morphological methods of image improvement can be divided into two broad groups of algorithms, which aim at either image smoothing or enhancement of image details^{8,10,11,14,15,24}. Image-smoothing algorithms are used for removal of noise that has one of several possible properties or that is a mixture of different types of noise. At the same time, information about fine image details is preserved. The noisesuppression algorithms are used for preprocessing of images that afterward are subjects of enhancement operations such as, for instance, histogram modifications or edge extraction. A sequence of proper operations may lead to a considerable image improvement appreciated by a human observer. Application of smoothing and enhancing algorithms may also precede pattern-recognition tasks.

## 2.1

### Rank-Order Algorithms for Enhancement of Image Details

Let {*V*(*k*)} be a discrete input image with Q gray-scale levels of intensity quantization: **k** *=* (*k*_{1}, *k*_{2}) is a vector coordinate of an input image element; *k*_{1} = 1, …, *N*_{1} *and k*_{2} = 1, …, *N*_{2} *N _{1}*

*xN*

_{2}=N is the image matrix size. According to the threshold decomposition concept

^{5}, the kth element

*V*(

*k*) of an input image is represented as a sum of kth elements of all binary slices as

where *Xq*(** k**) is the kth element of a binary slice of an input image obtained through decomposition with a threshold q; that is,

For each slice {Xq(**k**)}, where braces denote the whole set of q-level elements, local operations are performed within a spatial neighborhood S of arbitrary size and shape that is similar for each **k**th input-image element. The spatial neighborhood S is cast by a scanning binary (flat) structuring element that characterizes the local convolution. The local-convolution operation can be very efficiently performed in a computer unless the structuring elements become too large or non rectangular. Alternatively, local convolutions can be accomplished in parallel in optical correlators. We believe that fully programmable correlators for processing of large images by means of large and arbitrarily shaped structuring elements should become feasible soon.

The possibility of parallel optical calculation of local convolutions was the basis of an optical-digital method of local histogram calculation^{23}. This method results from a theorem proved in Ref. [23], which says that the local q-level histogram of an arbitrary neighborhood in an input image is equal to the pointwise difference of the two convolution patterns obtained by convolving the slices at the levels q + 1 and q with a binary mask, thereby defining the neighborhood. We note that, for each pixel of the input image, the pixel value in the convolution pattern of the q-level binary slice and the kernel is equal to the number of pixels in the neighborhood that are on the q-level and higher values.

Detail-enhancing algorithms are designed to increase local nonhomogeneities of intensity distribution of an input image. An increase of local contrast can be accomplished in a variety of ways. A simple and linear method is to enhance these pixels that differ from average pixel values calculated within its spatial neighborhoods S:

where the output pixel value Y(**k**) is given by the sum of a bias term equal to the average pixel value calculated within neighborhood S of the input pixel V(**k**) and another term that is a difference between the input pixel value and the before-mentioned average enhanced by a gain coefficient G. The above algorithm becomes nonlinear when the mean operation is replaced with the med operation; that is, the median value of the neighborhood S is considered as a reference.

The most general nonlinear rank-order unsharp masking algorithm (UM) is defined as follows^{32}:

where S[V(**k**)] is an arbitrary neighborhood of the **k**th element of an input image, *A* is the offset, G is the gain coefficient of input-image details that differ from the median value, and minus denotes a pointwise subtraction. Taking advantage of the threshold decomposition concept, we process in sequence binary slices (Xq(**k**)} of the input image rather than the gray-scale input image {V(**k**)} itself. For each **k**th input slice element a local convolution is made and the median value *med*{S[Xq(**k**)]} is calculated within the **k**th-element neighborhood, S, defined by the binary convolution kernel. The pointwise sum of all processed slices gives the output gray-scale image. Coefficients *A* and G are image dependent and are calculated as follows. Pointwise subtracting the median from the original, we find minimum (-*n*) and maximum (*m*) difference pixel values as well as the zero level (as a fraction f of the full range [-n, *m*]). Then the gain coefficient is calculated as G = 255/(m + n) and the offset *A* = 255f.

## 2.2

### Morphological Algorithms for Enhancement of Image Details

Morphological filters are composed of two basic operations: erosion and dilation. The erosion is defined as the locus of the center of the structuring element S when S is included in the binary slice X, such that in the extreme case it follows the border tangentially from inside. The dilation is defined as the locus of the center of the structuring element when S intersects X, such that in the extreme case it follows the border tangentially from outside. The simplest morphological filters are the opening and closing. The morphological opening for binary slices is defined as follows:

where the erosion ε_{S} of the image slice (Xq(**k**)} by the structuring element S is followed by the dilation δ_{S} of the looped eroded slice by the same kernel. The opening filters out bright details of an input image and is frequently used to remove salt elements of the two-sided impulsive noise. The opening of a gray-scale image γ_{S}{V(**k**)} is obtained by stacking of the processed binary slices.

The dual operation of opening is closing, defined as:

where dilation and erosion are applied in reverse order to that of opening. The closing filters out dark details of an input image and is frequently used to remove pepper elements of the two-sided impulsive noise. Here also the closing of a gray-scale image φ_{S}{V(**k**)} results from summing up the processed binary slices.

A simple example of processing is the morphological gradient of a gray-scale image, defined as:

where the minus sign represents a pointwise substraction of dilated and eroded images that we obtained by stacking the binary processed slices.

Other well known operations are the morphological white-top-hat algorithm (WTH), which enhances bright details, and the black-top-hat algorithm (BTH), which enhances dark features. WTH is defined as:

that is, as the difference (residue) between the input image and its opening. Analogously, BTH is given by the pointwise substraction of an original image from its closing:

For the purpose of comparison of results of rank-order and morphological methods for image detail enhancement we propose to combine white and black top hats. The aim is to unite bright and dark details obtained with top hats, as in the case of unsharp masking. The difference D{V(**k**)} between white and black top hats, which retrieves the original contrast of details, is defined as follows:

where *A* is a normalization constant and G is the gain coefficient of extracted details, both of which are calculated similarly as in the case of Eq. (3). Analogy between the unsharp masking and the difference of top-hats algorithms is straightforward. In the first one, bright and dark details that outlie from the local median values are properly increased by a factor of G and displayed on a bias level *A* calculated for the whole image. In the second one, bright and dark details are obtained from calculated differences between the input image and its morphological opening and closing, then are multiplied by the gain coefficient G, which depends on the dynamic range of the difference of top hats, and the details are displayed on a bias level *A* calculated for the whole image. Both algorithms are very good contrast detectors suitable for enhancement of bright and dark details that are smaller than or equal in size and shape to the structuring element used to modify the input image.

## 2.3

### Experimental systems

In our experiments, mainly two different optoelectronic morphological image processors have been used^{25,33,34}. Fig. 1 shows a block diagram describing the operation of both processors. An input gray-scale image is threshold decomposed into a stack of binary slices. Each slice is optically convolved with a binary structuring element, resulting in a stack of gray-scale convolution patterns. Then each convolution pattern is thresholded on the maximum (erosion) or the minimum (dilation) level. The resultant binary slices are added pointwise to form the output gray-scale image.

Fig. 2 shows the geometry of the first optical convolver employed in our experiments. Convolution is obtained by use of a plane of misfocus. The camera lens images a slice displayed on a spatial light modulator (SLM) onto the CCD camera. The system point-spread function becomes wider, and it can be shaped by the diaphragm. The SLM used was an Epson liquid-crystal with a resolution of 320×264 pixels. The resolution of the light sensitive CCD matrix in the camera is 765×581. The input and output images are stored in 256×256 matrices. Thus, while being processed the image was resampled three times. In a previous implementation^{24} a TV monitor was employed to display the binary slices.

To obtain the possibility of programming the point-spread function of the optical convolver it was necessary to change its design from the misfocus type convolver used in previous experiments^{24,25,33} to the shadow casting convolver design^{34,40}, because of the diffraction on the fine structure of the LCDs. In the misfocus type convolver a monochromatic point in the source produces in the true image plane of the source the Fourier transform of the kernel displayed on the pixelated LCD. Thus for a spatially incoherent illumination one gets the convolution of the input (the source) with the Fourier transform of the kernel. Broad band light would introduce additional chromatic dispersion. Thus in the plane of misfocus, which is close to the image plane of the input LCD, the effects of diffraction cannot be neglected. Influence of the periodical structure of the LCDs on the convolution result can be avoided by the use of conventional shadow casting architecture.

In the shadow casting convolver convolution is obtained by geometrical projection of the images. The scheme of shadow casting setup is shown in Fig. 3.

The convolved images are rescaled due to projection. The scale factor for the first object (LCD1) is equal to:

and for the second (LCD2):

An imaging system must be added to match the size of the convolution pattern to the size of the detector matrix. This introduces additional scale factor, equal for both objects.

Two considerations must be made:

First, that from the purely geometrical point of view, convolution is created in every plane along the axis of the setup. On the source side of the LCD2 it is, however, virtual convolution which can be seen by forming a real image of it. The use of a virtual convolution plane gives additional flexibility in choosing the values for scaling factors (value of A is negative in this case). Generalized shadow-casting convolver is based on this observation^{39}.

Second, that the amount of light emitted from diffusely illuminated point in the LCD1 plane is used only within the cone created by rays coming from that point to the edges of the LCD2. The smaller the distance *f* between LCD1 and LCD2, the bigger amount of light from the source is used. At the same time, the smaller the distance *f* the more divergent are the rays after LCD2. Therefore if the convolution is to be imaged without vignetting, the minimal entrance aperture of the imaging system increases as the distance between LCD1 and LCD2 decreases and the light efficiency decreases as the distance between LCD1 and LCD2 increases.

The final architecture of the optical setup is presented in Fig. 4. LCD1 and LCD2 are used as programmable spatial light modulators to form the input and the point spread function of the convolver. Lens L_{f} serves as a condenser correcting the light loss from the outer parts of the LCD1. Lenses L_{1} and L_{2} form a system which images demagnified convolution plane onto the light sensitive element of the CCD camera. The diagonal of the LCDs used is 31 mm. The diagonal of the light sensitive CCD matrix is 11 mm. Thus demagnification of about three times is required.

Therefore the distance between imaging lens and LCD2 is about four times bigger than the focal length. Taking into account that the distance between LCD1 and LCD2 cannot be large, we conclude that to image the convolution with this demagnification using one lens, an objective of F-number lower than one would be necessary. This is according to the fact that in order to preserve space invariance in the setup vignetting must be avoided. This difficulty can be overcome by the use of a system of objectives instead of a single lens. In the plane just behind LCD2 the lens L_{1} creating the image of LCD1 is placed. It makes the rays after LCD2 convergent. Both, LCD2 and the chosen convolution plane are close to that lens, so their virtual images are slightly shifted backward with small magnification. The lens L_{2} is the imaging lens which forms the final demagnified image of the convolution plane.

LCD1 is illuminated with narrow band thermal light. With this illumination the contrast ratio is increased with respect to that obtained with white light^{41}, and in addition the broad-band diffractional effects are reduced.

Under these conditions the resolution cells in both involved LCDs can be calculated by means of the modulus of the complex coherence function. Using the Van Cittert - Zernike theorem and assuming a pixel resolution in LCD2, the resolution cell in the LCD1 can be calculated. For a distance between the LCDs of 180 mm, a mean wavelength of 600 nm, and a pixel size of 80 µm, the size of the resolution cell in LCD1 equals 0.68 mm, which correspond to approximately 8 pixels. With the above mentioned kernel size of 120 pixels, a 15 *x* 15 kernel may be achieved without any loss in resolution of the system due to diffraction.

These calculations are made assuming continuous images in both LCDs. In fact both images have a pixelated structure. As the resolution cell in LCD1 is much larger than the pixel size, the effect of periodicity can be neglected. On the contrary, it must be taken into account in LCD2. For a single pixel illumination from LCD 1 the structure of LCD2 acts as a diffraction grating. The angular separation of the diffracted beams can be calculated from the period of the grating. In a first approximation the main effect in the projection will be to produce a set of replicas of the object in the panel with a separation depending on the period of LCD2 and the distance between LCD2 and the convolution

plane. For 80 µm period the distance which produces a shift of the first orders by 1 pixel is about 21 mm. In our experimental setup the chosen distance of 12 mm produces a separation of about 1/2 pixel. Consequently, the effect of the pixelated structure of both LCDs is, in practice, secondary.

## 2.4

### Experimental results

The performance of the processors is demonstrated on the input image of 256×256 pixels and 16 gray-levels shown in Fig. 5(a). It should be noticed the rich structure of the image.

Fig. 5(b) presents the result of a digital median filter with a binary kernel of 5×5 pixels. Fig. 5(c) shows the output of an optical median filter with a flat structuring element of the same size. Visual examination of both results confirms good performance of the optoelectronic processors.

For the purpose of quantitative comparison we use the mean absolute error (MAE) as a measure of similarity. The MAE, which is frequently used in filter optimization problems, is defined as follows:

where subscripts d and op indicate digitally and optically calculated medians. The absolute value of the difference (i.e., error) between optical and digital results is summed over the whole image matrix, normalized to the 256 gray-levels score, and divided by the total number of pixels. Analogously, MAE can be defined for morphological operations and filters used in our experiment.

The results of digital and optical median filtering have been used to calculate unsharp masking. In Fig. 5(d) and 5(e) the results of digital and optical unsharp masking are presented, respectively. Obviously, they preserve the similarity of digital and optical results of median filtration.

Fig. 5(f) shows the result of morphological opening by use of the same 5×5 pixels binary kernel with no corner pixels.

Fig. 5(g) shows the morphological gradient calculated in the optoelectronic processor by use of a square 3×3 pixels kernel, while Fig. 5(h) corresponds to the morphological gradient digitally calculated for the same input image and a kernel of the same size and shape. The results are quite similar.

Fig. 5(i) and 5(j) show black and white top hats calculated optically according to Eqs. (8) and (7), respectively. We note that both results of morphological processing are satisfactory. The wide presence of a black background confirms the good quality of optically calculated opening and closing.

Fig. 5(k) presents the result of the difference between optical white and black top hats calculated according to Eq. (9). The normalization constant A and the gain coefficient G are calculated in exactly the same way as in the case of the unsharp masking algorithm. We note that the experimental morphological result that combines bright and dark details of the image is easier to examine visually than the results of a rank-order unsharp masking algorithm calculated either digitally or optically. It is a consequence of the opening, closing, and median filter definitions that the combination of white and black top hats has a broader histogram than unsharp masking. Thus in the case of the morphological difference of top hats the dynamic range of the output image is more evenly employed than in the unsharp masking case.

In order to demonstrate the possibility of programming the kernel of convolution (the shape of the neighborhood in which the filters are calculated) experiments using a set of oriented kernels are performed. The kernel is a 7 pixel long line segment at 4 different orientations. Fig. 6(a) shows the image used in these experiments. It exhibits marked directional features, which would be removed by most conventional filters. This image has been corrupted with 16 % salt and pepper noise, as shown in Fig. 6(b).

The result of max/median filtering of the image from Fig. 6(b) is shown in Fig. 6(c). To prove the good edge and line performance of this operation, edge extraction by taking the difference of max/median and min/median is demonstrated in the presence of the same noise. Results are shown in Fig. 6(d). The noise from the image is completely removed whereas the line and edge information is maintained.

The ideal processing makes the threshold decomposition in all of the gray-levels of the image. Nevertheless a reduction in the number of binary slices permits to reduce the processing time and therefore to increase the speed of the opto-electronic processor. How to select the threshold levels can be an important task. As an example, we present the effects of the quantization on the thresholding levels in a test image with 256 gray-levels. We studied the result of rank-order and morphological operations using Q=256, 128, 64, 32, 16, 8, 4 and 2 uniform threshold levels. As a measure of similarity between the ideal filtered image (Q=256) and that filtered with another threshold levels, the mean absolute error (MAE) was used. Fig. 7(a) shows the evolution of MAE as a function of the number of threshold levels, when erosion operation is considered. The study was extended to other operations as dilation, median, opening and closing.

Later on, the processing is repeated with a non uniform selection of threshold levels. That selection is performed from the cumulative histogram of the image. Fig. 7(b) shows the cumulative histogram of the original image and the ideal eroded image (Q=256). If the selection of threshold levels is made taking into account that ideal eroded image, we arrive to results that improve the MAE with respect to the same number of levels selected with an uniform thresholding. Depending on the original image and of the operation to be performed, the use of uniform or non uniform thresholding could be quite significative.

With regard to nonuniform threshold decomposition, the key task is how to choose the threshold gray-levels. Intuitively, it is clear that a higher number of quantization levels must be assigned to the regions of the histogram where its population is higher (i.e., in principle, the choice of the quantization levels must be proportional to the value of the histogram function). In image processing, the histogram equalization is a standard technique with the same purposes. Performing this operation provides a histogram with the same population in every gray-level. The usual way is to take equal intervals in the Y axis (number of points) of the cumulative histogram, which results in a nonuniform quantization in the X axis (gray-levels). Thus, the slices are closer each other in the regions corresponding to a sharper cumulative histogram.

One is tempted to use this principle to reduce the quantization error. Nevertheless, the result would be sub-optimal. In the following, we will obtain the optimal nonuniform quantization levels for a given histogram distribution.

The standard approach to minimize the loss of information in the image quantization is to define a distorsion parameter *D*^{42}, which would take into account the gray-level difference between each pixel of the original image and its quantized version. Using a generic metric *W* to measure the elemental distortions, the parameter *D* can be written as

where Q is the number of quantization levels, *q* is a gray-level from the original image, *q*^{i} is the *i*-th quantization level, *R*_{i} is the quantization region corresponding to the *i*-th level, *p*(*q*) is the histogram function, and *d*(*q, q*^{i}) is the error criterion taken as distortion measuring in the quantization process. Therefore, if we want to minimize the MSE between the original image and its quantized version, *d*(*q, q*^{i}) would be equal to (*q-q*^{i})^{2}; on the other hand, if we want to minimize the MAE, *d*(*q, q*^{i}) would be equal to ∣*q-q*^{i}∣.

Using the asymptotic results of Gish et al.^{43} and Yamada et al.^{44} for general metrics, the Eq. (13) can be simplified to the Bennett distortion integral^{45}. In the event of minimizing the MSE, this integral can be written as

where *λ* (*q*) is the distribution of the quantization gray-levels; in the event of minimizing the MAE, the integral is^{46}

The optimal gray-level quantization *λ* (*q*) with the proposed metric *W*(*q*) can be obtained, in any case, using the Hölder inequality in the standard way^{47}. The results for the MSE and the MAE criterions are

We can see here why the intuitive idea of taking the quantization levels proportional to the value of the histogram function in order to reduce the quantization error is wrong. The optimal gray-level quantization *λ* (*q*) which minimizes the MSE criterion must be proportional to the cubic root of the histogram function. In the case of the MAE criterion, the optimal *λ* (*q*) which minimizes the MAE must be proportional to the square root of the histogram function.

We have proved this theory digitally with the input image show in Fig. 8(a). Fig. 8(b), (c) and (d) show the result of quantizing this input image with 16 gray-levels selected proportionaly to the histogram function, the square root of the histogram function and the cubic root of the histogram function, respectively. Fig. 8(c) is the one which minimizes the MAE with regard to Fig. 8(a), and Fig. 8(d) is the one which minimizes the MSE with regard to Fig. 8(a).

Table 1 shows the results for the MAE and the MSE when comparing the input image with every quantized version.

## Table 1:

Results for the MAE and the MSE when comparing the images shown in Fig. (8).

MAE | MSE | |
---|---|---|

Measured distortion between Fig. 8(a) and Fig. 8(b) | 10204 | 75912 |

Measured distortion between Fig. 8(a) and Fig. 8(c) | 9539 | 51821 |

Measured distortion between Fig. 8(a) and Fig. 8(d) | 9637 | 50539 |

It can be seen the MAE minimization in the event of taking the quantization levels proportionally to the square root of the histogram, and the MSE minimization in the event of taking the quantization levels proportionally to the cubic root of the histogram.

## 3.

## MORPHOLOGICAL CORRELATION

As stated above, Maragos in Ref. [30] defined a nonlinear correlation named the morphological correlation (MC) that is optimal when using the MAE criterion for the localization of a reference object in an input image. As in linear correlation, the location of the reference object in the observed image corresponds to the location of a maximum value of the intensity in the morphological correlation plane.

Let us consider two real-valued two-dimensional signals represented by f(ξ, η) and g(ξ, η), where (ξ, η) ϵ Z^{2}. For the sake of clarity we consider that both f and g are defined in the discrete domain, although the formal definitions can be done in the continuous domain. Assume that f is a pattern to be detected in g. For finding which shifted version of f best matches g, a standard approach has been to search for the shift lag (x, y) that minimizes the MSE,

over some subset *W* of *Z*^{2}. According to Ref. [30] and under certain assumptions, this matching criterion is equivalent to maximizing the linear cross correlation between g and f:

Other error criteria are more robust when the signal to be detected is corrupted with noise patterns that are different from additive Gaussian noise^{30}. One such criterion is the MAE, defined as

As in Ref. [30] and under certain assumptions, minimizing the MAE is equivalent to maximizing the non-linear cross correlation:

which corresponds to the definition of the MC given by Maragos.

Analogously to morphological and rank-order operations, MC implementation can be performed by use of the threshold-decomposition concept^{5}. The threshold decomposition of the quantized gray-level image *f*(*x, y*) is defined as

where

Now, for any (x_{1}, y_{1},) and (x_{2}, y_{2}),

because g_{q} and f_{q} are binary signals.

Then the MC between g and f, µ_{gf} (x, y), can be expressed by use of the threshold decomposition of g and f. It turns out to be the sum over all amplitudes of the linear correlations between thresholded versions of g and f at every gray-level value q^{31}:

where Q is the number of gray levels of the images, is the linear cross correlation between the qth binary slices f_{q} and g_{q} and the * denotes the linear correlation operation, which can be realized optically. To obtain Eq. (24) it has been taken into account that, for binary qth thresholded input signals, the MC and the linear correlation coincide [see Eqs. (19) and (21)] because the minimum of two binary numbers is equal to their product. Thus the result expressed by Eq. (24) together with the ability of optical systems to perform linear correlations and the computational cost of the digital calculation of Eq. (21) encouraged us to implement the MC optically^{48}.

## 3.1

### Experimental system

The first step is to obtain the optical correlations between the different thresholded images g_{q} and f_{q}. This can be accomplished in a programmable morphological processor^{25,34} or in a conventional VanderLugt correlator^{49}. However, in these processors the optical intensity output is obtained instead of the amplitude-correlation output needed in Eq. (24). This drawback, as is explained below, can be overcome by use of a joint transform correlator (JTC) scheme^{50}.

For obtaining the MC, the sum of Eq. (24) is performed on the joint power spectrum (JPS), which is the actual intensity output detected in the intermediate step of the JTC. So the amplitude-addition requirement in Eq. (24) implies that the JTC architecture is essential for obtaining the MC optically. Note that the phase is coded as intensity in the JPS; hence no phase information has been lost. A final Fourier transfrom will provide the MC.

So, as mentioned above, the first step in obtaining the optical MC is the calculation of each binary slice joint power spectrum, JPSq. The obtained distributions are added as

Comparison with Eq. (24) shows that the Fourier transform of the third term gives the MC between the scene g(x, *y*) and the reference *f*(*x, y*) object.

In our implementation we use a JTC, as shown in Fig. 9. The thresholded versions of the input and the reference are displayed side by side in an Epson liquid-crystal SLM. The JPSq of each binary slice is recorded with a CCD camera and stored. The JPS_{∑} is obtained as a summation [Eq. (25)] and readdressed to the SLM by a frame grabber. The final MC is achieved at the joint transform output plane. The generation of threshold components and the addition operation are done electronically.

With regard to the speed of the process, the bottleneck in the system is the video rate of the SLM (25 frames/s in our case), which is used to display both the binary slices and the addition of the JPS, which requires a two-cycle JTC architecture. The speed of the system can be significantly increased if it is splitted in two subsystems, each composed of a SLM, a Fourier transforming lens, and a CCD camera. In the first subsystem the input is the threshold decomposition of the joint input scene. The addition of the JPS obtained is fed into the second subsystem to produce the final MC.

## 3.2

### Experimental results

For demonstrating the advantages of the MC for image detection, optical experiments have been performed by use of the above-described JTC. We compare the linear correlation and the nonlinear MC in our optical experiments. In the SLM a joint input image composed of the input scene in the upper half and the reference object in the lower half is displayed. The input scene contains two objects on a dark background. A replica of the object to be recognized (target), used as the reference, is placed below the corresponding input scene. The objects in the joint input image have 16 gray-level values, and the outer shapes are identical. This implies that we are adding the JPS’s of 16 binary input scenes for performing the MC. Moreover, the saturation effect of the camera induces the enhancement of the high-frequency components. However, in spite of the bias building (dc plus low-spatial-frequency content) carried with each joint transform pattern, their addition, as performed electronically, does not involve any drawback in the recording of all the frequency components. We simply add the JPS’s and rescale the sum JPS_{£} before displaying it in the SLM for the next step.

The most interesting property of MC is its ability to recognize low-intensity images in presence of high-intensity patters to be rejected. In Fig. 10, target C is the lowintensity object to be detected against a high-intensity one, D.

Since conventional correlation is proportional to intensity and sensitive to shape variations that do not exist in this case, any linear filtering will produce a higher peak for the brighter object and so will not be able to distinguish the dark object in the presence of the bright one. On the other hand, MC, owing to its nonlinearity, pro-vides a higher peak for the dark object, so correct detection is obtained. The linear correlation and the MC peaks for this case are shown in Fig. 11. It is clear that linear correlation detects the brightest object, thus producing a false alarm [Fig. 11(a)]. With the MC a threshold lower than 50% is enough to reject the high-intensity object [Fig. 11(b)].

## 3.3

### Rotation-invariant morphological correlation

The MC does not provide detection with rotation invariance. To achieve this, we have proposed^{51} a modification of this nonlinear correlation and introduced an enhanced morphological correlation that we call rotation invariant morphological correlation (RIMC). In addition to a high discrimination capability for detecting images with high resemblance, it also offers a robust rotation invariant detection for low-intensity images in the presence of high intensity patterns to be rejected.

As in rotation invariant linear correlation, the technique is based on the expansion of a function expressed in polar coordinates into circular harmonics (CH)^{52,53}. The expansion of a function *f*(*r, θ*) can be written as

where

*m* is the order of the CH component (CHC). The circular harmonic filter (CHF) introduced by Hsu et al.^{52,53} provides full rotation invariance in addition to shift invariance. The detection of a target in a scene is achieved by camping out the correlation between a scene *g*(*r, θ*) and a CHC, *f*_{m}(*r, θ*)

where * denotes linear correlation.

Because the MC yields better pattern recognition performance than the linear correlation, and because it is suitable for optical implementation, we will use it to recognize objects with different orientations. In Eq. (24), assume that each *q*-binary slice of the reference object, *f*_{q}(*r, θ*), can be decomposed into circular harmonic functions

Rotation invariant pattern recognition can be achieved by using only one component of the CH decomposition and then performing the linear correlation. If only one CHC is used for each *q*-correlation in Eq. (29), the nonlinear correlation will allow the detection of a target for any angular orientation. This means that other nonlinear correlations based on the morphological correlation could be devised. So taking into account the previous argument and using only one *m*-order CHC in Eq. (29), we call this correlation the RIMC defined as

The correlation, is also a nonlinear correlation, inspired by the MC. The difference between (RIMC) and *µ*_{gf}(*r,θ*) (MC) is that for the RIMC only part of the thresholded reference object *f*_{q}(*x,y*) is used, instead of the total information of *f*_{q}(*x,y*) that is used for the MC.

## 3.4

### Simulation Results

In this section we present some computer simulations to compare the linear correlation and the morphological correlation and its applications to rotation invariant pattern recognition.

Figure 12 shows an input scene that is made up of two different kinds of objects: the reference object is the dark one, and the bright lower one is an object to be rejected. In order to perform pattern recognition with rotation invariance, the object to be detected appears with two orientations rotated each other by 90°.

The choice of the order and of the expansion center is important for both the CHC in the linear case and for the RIMC case. In Eq. (28), the function *f*_{m}(*x,y*) takes into account the choice of the CH order, and of the expansion center as discussed elsewhere^{54,55}. In the latter paper, both the energy and PCE maps of the gray-level reference object are calculated and the maximum of the PCE map that also has a high value in the energy map is selected. This ensures both a sharp correlation peak shape and a high correlation energy value.

For the RIMC, the CHCs of the binary *q*-th slices of the reference object versions are required. We chose the same expansion order as for the linear case, because that parameter is strongly dependent on the geometry of the reference object and also the shape of each binary slice depends strongly on the shape of the reference object. For the expansion center, we assumed that all of the expansion centers occur around the center of mass of the reference object, because the energy distribution is strongly related to that mass center. So, to perform the RIMC we select the center of mass of the reference object as the expansion center for all the *q-t*h binary slice CHCs.

The linear correlation for an *m*=4 CHF is shown in Fig. 13. Note that this operation is not able to detect the reference object and a false alarm appears. On the other hand, if we use the same input scene with the RIMC, we obtain the correlation output shown in Fig. 14. We used the *m*=4 CHC order for each binary slice. Now we detect both the rotated version of the reference object and a threshold value of 50% is enough to reject the false alarm.

These computer simulations demonstrate the improved performance of the RIMC over the CHC linear correlation.

## 3.5

### Modified morphological correlation

The optoelectronic implementation of MC needs a considerable amount of computational efforts. Some methods can be used in order to reduce the number of correlated slices obtained after either uniform or nonuniform threshold decomposition, so reducing that computational time.

In a recent paper^{56} we proposed an alternative procedure for increasing the selectivity and decreasing the computational requirements of the optoelectronic implementation of the MC. This method which provides a modified version of the MC, the modified morphological correlation (MMC), is not based on threshold decomposition but on bit-representation decomposition. The image is decomposed into a set of binary slices, each corresponding to a specific bit in the binary representation of the image pixels. Thus, an image with 256 gray-levels is decomposed into only 8 binary slices. By applying a linear correlation between the binary slice of the reference-image set and the associated slice from the input-scene binary set, one obtains

where m is the number of binary slices.

The optical implementation can be done with a JTC too. In order to show the performance of the MMC, let us consider Fig. 15, which is the input to the JTC. Fig 15(a) shows the entrance scene and Fig. 15(b) corresponds to the reference target to be detected. The objects show only 16 gray levels, so we work with 4 bit planes. Fig. 16 shows the decomposition of the joint input scene into the 4 bit maps.

In Fig. 17 we have the detection of the target using the MMC. In Fig. 17(a) the captured output correlation plane is presented, while in Fig. 17(b) a profile of the line passing through the two correlation peaks is shown. A thresholded of 24 % is enough to reject the undesired peak. This threshold is lower to that corresponding to MC, close to 45 % in this case.

## 4.

## FINAL REMARKS

It has been demonstrated the feasibility of optoelectronic image processing with a variety of systems, including shadow casting correlators and joint transform correlators. These systems have been experimentally applied to image processing (detail enhancement, median filtering) and to pattern recognition (bit plane and morphological correlation).

Despite of the achievements in the field, several issues remain unexplored. In the next future the effect of noise is to be analyzed. From the theory it has been established that the nonlinear correlations will perform better than classical, but the extent of the enhancement is to be checked. Also there are promising lines in cases where the chromatic information is relevant, and multichannel decompositions are being checked.

Finally, there is a trend to enhance the performance in the systems, allowing higher resolution images and faster processing. This depends on the availability of faster optoelectronic hardware, mainly SLM’s.

## ACKNOWLEDGMENTS

We would like to thank Dr. Thomas Szoplik for his joint work and introduction to the morphological image analysis. We also acknowledge the financial support of Spanish DGES (Dirección General de Enseñanza Superior), project PB96-1134-C02-02. José J. Esteve-Taboada acknowledges a grant from the Generalitat Valenciana.

Part of the text and figures have been reproduced from articles by the authors published in O.S.A. journals and in Optics Communications. Both Optical Society of America and Elsevier Science B.V. have been asked for permission of reproducing.