## 1. Introduction

The introduction of optical correlators [1] has stimulated the study of pattern recognition filters based on correlation [2, 3]. Linear filters, such as the matched filter, have been designed to detect a target in the presence of additive noise. However, there exists a different type of noise which is inherent to image processing. It appears as soon as the problem of locating a target appearing on a random background is addressed. In such cases, the background itself must be regarded as non-overlapping noise [4, 5]. In the presence of such noise, it has been shown that linear filters can fail in locating the target [6].

Furthermore, the efficiency of correlation-based techniques can decrease drastically if the object to be detected, located or recognized differs from the reference used in the correlation operation. This occurs, for example, in target tracking applications where the target’s attitude in the scene varies, or when recognition has to be performed with a high degree of invariance. A classical solution to this issue has been to design composite filters [7], which allow one to store different attitudes of the target in a single filter.

Recently, algorithms optimal in the Maximum Likelihood (ML) sense [8] for the location of an object embedded in non-overlapping noise have been proposed [9, 10], and a unified method has also been designed [11]. In these approaches, the input image is considered to be composed of two independent random fields, and the corresponding methods are thus called Statistically Independent Region (SIR) methods. A new technique, based on the SIR model, which allows one to segment an object in an input image has also been proposed recently [12, 13]. This technique is complementary to the correlation methods and is analogous to recently proposed active-contour (snake) approaches [14, 15, 16]. However, our proposed approach presents clear optimality properties in the context of statistical estimation theory.

In this paper we propose a unified approach for the SIR models which we have presented in the past. We will thus enlarge the field of applications of these techniques in two directions. Firstly, we will be able to consider a large number of input noise statistics which correspond to different physical situations. Secondly, we will analyze these models in the general context of the estimation theory. This general approach will enable us to include such applications as detection, recognition, location, tracking, estimation of unknown parameters (for example the orientation of the object) and shape estimation or, in other words, segmentation.

We will also show that this approach can enlarge the field of application of optoelectronic correlators. As a matter of fact, SIR-based techniques consist of a preprocessing of the analyzed image followed by correlations with binary masks. A simple optoelectronic architecture could thus perform detection, tracking, estimation and segmentation with the same hardware, thus achieving efficient target tracking or recognition.

This paper is organized as follows. In Section 2, we present the mathematical SIR model and study its general solutions in the framework of the statistical theory of estimation for probability density functions (pdfs) which belong to the exponential family. In Section 3, we analyze the optimal solutions when the unknown parameters are estimated in the Maximum Likelihood (ML) sense for location applications. In Section 4, we discuss estimation problems and, more particularly, segmentation applications. Finally, in Section 5, we illustrate on synthetic and real-world images the efficiency of the proposed algorithms for location, segmentation and tracking applications.

## 2. The SIR model

### 2.1 Image model

The SIR model is a probabilistic framework used to derive algorithms for detection, recognition, location, parameter estimation or segmentation of an object in an image. We assume that the observed scene is composed of two zones: the target and the background. Furthermore, the target’s and the background’s gray levels are supposed to be unknown, and we model their values as independent random variables.

In the following mathematical developments, one-dimensional notations are used for simplicity, and bold font symbols will denote N-dimensional vectors. For example, s = {*s*_{i} ∣ *i* ∈ [1, *N*]} denotes the input image composed of N pixels. For each considered case, the purpose of the image processing algorithm is to estimate an unknown parameter which will be denoted symbolically *θ*. For example, in detection applications, *θ* is a binary value; for recognition (or, more precisely, discrimination), it is a value belonging to a discrete set; and for location, it is the position of some characteristic points of the object (for example the center of gravity). For orientation estimation, *θ* is a set of possible angles. For segmentation purposes, *θ* is the shape of the object. In the latter case, if the shape is approximated by a polygonal contour, *θ* is the set of coordinates of the nodes of the polygon (see table 1).

## Table 1:

Examples of applications and the nature of the parameter *θ*.

Application | Nature of *θ*
---|---
Detection | 0 or 1
Discrimination | discrete set
Location | (x, y)
Attitude estimation | angles
Segmentation | node coordinates

Let **w**^{θ} denote a binary window function that defines a certain location, orientation or shape for the target, so that **w**^{θ} is equal to one within the target and to zero elsewhere, and let **w̄**^{θ} denote the complementary function. Note that in the following, we will use the same notation **w**^{θ} for the previously defined binary function and for the set of pixels for which this function is 1. Let us consider the different hypotheses *H*_{θ} that consist in assigning a binary window **w**^{θ} to the target in the input image s, so that we can write:

$$ s_i = a_i\, w_i^{\theta} + b_i\, \bar{w}_i^{\theta}, \qquad i \in [1, N], \qquad (1) $$

where the target’s gray levels **a** and the background noise **b** are random variables. These random variables are characterized by their respective pdfs *P*_{a}(*x* ∣ *μ*_{a}) and *P*_{b}(*x* ∣ *μ*_{b}), where *μ*_{a} and *μ*_{b} are the parameters of the pdfs, which will be considered a priori unknown. These parameters can be scalars or vectors if more than one scalar parameter is needed to determine the pdf.

Equation 1, together with the pdfs *P*_{a}(*x* ∣ *μ*_{a}) and *P*_{b}(*x* ∣ *μ*_{b}), defines the image model. The parameter of interest is *θ*, while the parameters *μ*_{a} and *μ*_{b} are nuisance parameters. We will consider the maximum a posteriori (MAP) estimation of the parameter *θ*. The optimal estimate is thus obtained by maximizing the conditional probability *P*[*H*_{θ} ∣ s]. This conditional probability can be obtained by using Bayes law [8]:

$$ P[H_\theta \mid \mathbf{s}] = \frac{P[\mathbf{s} \mid H_\theta]\; P[H_\theta]}{P[\mathbf{s}]}. \qquad (2) $$

Considering that all hypotheses *H*_{θ} are equiprobable, the MAP estimation of *θ* is equivalent to the ML estimation obtained by maximizing *P*[s ∣ *H*_{θ}]. In the following, we will analyze the ML estimation, since the generalization to the MAP estimate is straightforward using Eq. 2. With the image model of Eq. 1, the likelihood is:

$$ P[\mathbf{s} \mid H_\theta, \mu_a, \mu_b] = \prod_{i \in \mathbf{w}^{\theta}} P_a(s_i \mid \mu_a) \prod_{i \in \bar{\mathbf{w}}^{\theta}} P_b(s_i \mid \mu_b), $$

where we have explicitly denoted the dependence on the unknown parameters *μ*_{a} and *μ*_{b}. Recall that we use the same notation **w**^{θ} (or **w̄**^{θ}) for the binary support functions and for the set of pixels for which the value of these functions is 1.

The question is now how to deal with the nuisance parameters *μ*_{a} and *μ*_{b} in order to express the likelihood as a function of the parameter of interest *θ* and of the input image s only. There exist several methods to deal with nuisance parameters; the three most frequently used are Maximum Likelihood (ML) estimation, Maximum A Posteriori (MAP) estimation and the marginal Bayesian approach. With the marginal Bayesian and MAP approaches, the nuisance parameters are considered as random variables and prior probability density functions have to be chosen [8, 17].

Let *π*_{a}(*μ*_{a}) and *π*_{b}(*μ*_{b}) denote these priors. The marginal Bayesian approach is simple from a theoretical point of view and is based on the Bayes relation:

$$ P[\mathbf{s} \mid H_\theta] = \int\!\!\int P[\mathbf{s} \mid H_\theta, \mu_a, \mu_b]\; \pi_a(\mu_a)\, \pi_b(\mu_b)\; d\mu_a\, d\mu_b, \qquad (4) $$

where a symbolic notation has been used for the multidimensional integrals:

$$ \int f(\mu_a)\, d\mu_a = \int \cdots \int f(\mu_a)\, d\mu_{a,1} \cdots d\mu_{a,n} $$

and

$$ \int f(\mu_b)\, d\mu_b = \int \cdots \int f(\mu_b)\, d\mu_{b,1} \cdots d\mu_{b,n} $$

if *μ*_{a} and *μ*_{b} are *n*-dimensional parameters. With Eq. 4, the likelihood *P*[s ∣ *H*_{θ}] is obtained and the problem is solved from a theoretical point of view.

With the MAP approach, instead of eliminating the nuisance parameters as with the marginal Bayesian approach, one considers estimates of their values. If we are not interested in the nuisance parameters’ values, this approach is suboptimal (see [18] for a discussion in an analogous situation). Nevertheless, this method can be of interest from a practical point of view, since it can be easier to determine the MAP estimates of the nuisance parameters than to perform the integration of Eq. 4. The MAP estimates of the nuisance parameters are the values which maximize *P*[s, *μ*_{a}, *μ*_{b} ∣ *H*_{θ}]. They are obtained by the following equation:

$$ \left(\mu_a^{MAP}[\mathbf{s}],\; \mu_b^{MAP}[\mathbf{s}]\right) = \arg\max_{(\mu_a, \mu_b)} P[\mathbf{s}, \mu_a, \mu_b \mid H_\theta], $$

where argmax_{y}(*Z*) is the value of the parameters *y* which maximizes *Z*. The estimate of *θ* can then be obtained by maximizing the pseudo-likelihood:

$$ P\!\left[\mathbf{s}, \mu_a^{MAP}[\mathbf{s}], \mu_b^{MAP}[\mathbf{s}] \mid H_\theta\right]. $$

It is worth noting that, since *µ*_{a}^{MAP}[s] and *µ*_{b}^{MAP}[s] are functions of s, the criterion *P*[s, *μ*_{a}^{MAP}[s], *μ*_{b}^{MAP}[s] ∣ *H*_{θ}] considered in the MAP approach is not a likelihood, contrary to the case of the marginal Bayesian approach (see Eq. 4).

The ML method is technically analogous to the MAP approach, but it does not model the nuisance parameters as random variables. The important consequence is that no prior has to be introduced, contrary to the marginal Bayesian and MAP approaches. The ML estimates are given by:

$$ \left(\mu_a^{ML}[\mathbf{s}],\; \mu_b^{ML}[\mathbf{s}]\right) = \arg\max_{(\mu_a, \mu_b)} P[\mathbf{s} \mid H_\theta, \mu_a, \mu_b]. $$

The estimation of *θ* is obtained by maximizing the pseudo-likelihood:

$$ P\!\left[\mathbf{s} \mid H_\theta, \mu_a^{ML}[\mathbf{s}], \mu_b^{ML}[\mathbf{s}]\right]. $$

One can note that the ML approach coincides with the MAP approach when a uniform (also called non-informative) prior on the nuisance parameters is considered. The ML approach is simpler since no prior pdfs are needed (although the obtained solution can be unstable [19]), and in the following we will mainly discuss the ML solutions.
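As a concrete illustration of the ML treatment of the nuisance parameters, the following Python sketch computes the Gaussian pseudo-likelihood by plugging the ML estimates of the mean and variance back into the log-likelihood (the function name and the test data are ours, chosen for illustration only):

```python
import numpy as np

def gaussian_profile_loglik(samples):
    """Pseudo-log-likelihood: the nuisance parameters (m, sigma^2) are
    replaced by their ML estimates computed from the samples themselves."""
    n = samples.size
    var_hat = samples.var()  # ML (1/n) variance estimate; the ML mean is samples.mean()
    # log P[samples | m_ML, var_ML] = -(n/2) ln(2 pi var_ML) - n/2
    return -0.5 * n * (np.log(2.0 * np.pi * var_hat) + 1.0)

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=1000)
print(gaussian_profile_loglik(x))
```

By construction, this value upper-bounds the log-likelihood evaluated at any other choice of the nuisance parameters, which is what makes the plug-in criterion usable for the subsequent maximization over *θ*.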

### 2.2 The exponential family

Members of the exponential family include the Bernoulli, Gamma, Gaussian, Poisson, Rayleigh and many other familiar statistical distributions [20]. These distributions can be used to describe realistic situations. The case of binary images is simple to handle, since the probability law of the gray levels can be described with a Bernoulli pdf. It is also well known that at low photon levels, the noise present in images is described by a Poisson pdf (due to the discrete nature of the events, which are the arrivals of photons on the sensor). This situation occurs, for example, in astronomical imagery when the exposure time of the sensor is short. Synthetic Aperture Radar (SAR) intensity images are corrupted by a multiplicative noise, also known as speckle [21], which can be described with a Gamma pdf. This issue has been widely studied over the past years, and it is now well known that, in order to obtain efficient algorithms, the statistical properties of the speckle have to be taken into account in the design of image processing algorithms (see [22], [23] and references therein). Ultrasonic medical images correspond to amplitude detection of the incident acoustic field, and the speckle noise can then be described with a Rayleigh pdf [24]. Finally, we will also discuss the case of optronic images and the relevance of normal laws when a whitening preprocessing is used [25].

Probability density functions (pdfs) which belong to the exponential family are defined by [20]:

$$ P(x \mid \mu) = \exp\!\left[\boldsymbol{\alpha}(\mu)^{T}\, \mathbf{f}(x) + \kappa(x)\right], $$

where *μ* = [*μ*_{1}, *μ*_{2}, …, *μ*_{n}]^{Τ} is the vector of parameters of the pdf, *κ*(*x*) is a scalar function of *x*, while **α**(*μ*) and **f**(*x*) are p-component vector functions of, respectively, *μ* and *x* (the normalization of the pdf can be absorbed into these terms). We summarize in table 2 the pdfs of the exponential family which will be discussed in the following, as well as the parameters which will be considered unknown.

## Table 2:

pdfs of the considered laws of the exponential family and their corresponding parameters. δ(x) is the Dirac distribution, ℕ is the set of non-negative integers and n! = n(n − 1)⋯2·1. The Gamma law is given for a known order L.

Law | pdf: P(x) | Parameters: μ_u
---|---|---
Bernoulli | (1 − p) δ(x) + p δ(x − 1) | p
Gamma | (L/p)^L x^{L−1} exp(−Lx/p) / (L − 1)!, for x ≥ 0 | p
Gaussian | (2πσ²)^{−1/2} exp(−(x − m)²/2σ²) | m, σ
Poisson | exp(−p) p^x / x!, for x ∈ ℕ | p
Rayleigh | (2x/p) exp(−x²/p), for x ≥ 0 | p

These pdfs possess simple sufficient statistics [20]. Let us consider a sample *χ*_{u} of *n*_{u} random variables distributed with a pdf *P*(*x* ∣ *μ*_{u}). A sufficient statistic **T**[*χ*_{u}] for *μ*_{u} is a function of the sample *χ*_{u} that contains all the information relevant to estimating the parameter *μ*_{u} in the ML sense. If the pdf belongs to the exponential family, the likelihood is:

$$ P[\chi_u \mid \mu_u] = \exp\!\left[\boldsymbol{\alpha}(\mu_u)^{T}\, \mathbf{T}[\chi_u] + \sum_{i \in S_u} \kappa(x_i)\right]. $$

The ML estimate of *μ*_{u} is thus:

$$ \mu_u^{ML} = \arg\max_{\mu_u} P[\chi_u \mid \mu_u], $$

which can also be written:

$$ \left.\frac{\partial \boldsymbol{\alpha}(\mu_u)^{T}}{\partial \mu_u}\right|_{\mu_u = \mu_u^{ML}} \mathbf{T}[\chi_u] = 0, $$

with:

$$ \mathbf{T}[\chi_u] = \sum_{i \in S_u} \mathbf{f}(x_i), $$

which clearly defines the sufficient statistics of the exponential family. In table 3 we provide the sufficient statistics for the pdfs of the exponential family which will be discussed in the following. For that purpose, let us define *S*_{u} as the set of *n*_{u} random variables from which the parameters are inferred (i.e. the set of pixels from which the unknown parameters are estimated). In particular, *n*_{a} is the number of pixels in **w**^{θ} and *n*_{b} is the number of pixels in the background region **w̄**^{θ}.

## Table 3:

Mathematical expressions of the sufficient statistics for the parameters defined in Table 2.

Law | Parameters: μ_u | Sufficient statistics: T[χ_u]
---|---|---
Bernoulli | p = T₁/n_u | T₁ = Σ_{i∈S_u} x_i
Gamma | p = T₁/n_u | T₁ = Σ_{i∈S_u} x_i
Gaussian | m = T₁/n_u,  m² + σ² = T₂/n_u | T₁ = Σ_{i∈S_u} x_i,  T₂ = Σ_{i∈S_u} (x_i)²
Poisson | p = T₁/n_u | T₁ = Σ_{i∈S_u} x_i
Rayleigh | p = T₂/n_u | T₂ = Σ_{i∈S_u} (x_i)²
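The recipe of Table 3 — accumulate T₁ and T₂ over a region, then read off the ML parameter estimates — can be sketched as follows (a minimal Python illustration assuming the parametrizations of Table 2; all function names are ours):

```python
import numpy as np

def sufficient_stats(x):
    """T1 = sum of the samples, T2 = sum of the squared samples."""
    return x.sum(), (x ** 2).sum()

def ml_estimates(x, law):
    """ML estimates expressed through T1, T2 and n_u only (Table 3)."""
    n = x.size
    t1, t2 = sufficient_stats(x)
    if law in ("bernoulli", "gamma", "poisson"):
        return {"p": t1 / n}                        # empirical mean
    if law == "gaussian":
        m = t1 / n
        return {"m": m, "sigma2": t2 / n - m ** 2}  # from m^2 + sigma^2 = T2/n
    if law == "rayleigh":
        return {"p": t2 / n}                        # empirical mean square
    raise ValueError(law)

rng = np.random.default_rng(1)
print(ml_estimates(rng.poisson(3.0, 500), "poisson"))
```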

For the image processing problems we consider, the likelihood is a function of **w**^{θ}*.* Let us denote *L*(s,**w**^{θ}) the likelihood obtained with the marginal Bayesian approach or the pseudo likelihood obtained with the MAP or the ML approach, and ℓ(s, **w**^{θ}) its logarithm. It is easy to show the following property.

Property:

*Whatever the adopted approach to deal with the nuisance parameters, the loglikelihood of an hypothesis H*_{θ} *is:*

$$ \ell(\mathbf{s}, \mathbf{w}^{\theta}) = J(\mathbf{s}, \mathbf{w}^{\theta}) + G(\mathbf{s}), \qquad (13) $$

*with:*

$$ J(\mathbf{s}, \mathbf{w}^{\theta}) = n_a\, F_a(z_a) + n_b\, F_b(z_b), \qquad (14) $$

where *z*_{a} and *z*_{b} are simple functions of the sufficient statistics computed over **w**^{θ} and **w̄**^{θ} respectively (see table 4). Functions *F*_{a} and *F*_{b} depend on the considered pdf and on the prior on the nuisance parameters for the marginal Bayesian and MAP approaches. They are equal in the case of a ML estimation of the nuisance parameters.

It is clear that the last term *G*(s) is independent of the hypotheses *H*_{θ} whenever the domain over which it is computed is the whole image to be analyzed, or a subwindow of this image chosen independently of *H*_{θ}. We will see in the following that the likelihood of *H*_{θ} depends on the input image s only through the sufficient statistics. In table 4, the expressions of the varying part of the loglikelihood defined in Eq. 14 are provided when the nuisance parameters are estimated with the ML method. We propose in the next sections to illustrate these concepts with different kinds of applications.

## Table 4:

Mathematical expressions which define the varying part of the loglikelihood (see Eq. 14) in terms of the sufficient statistics defined in Table 3, up to additive constants and positive multiplicative factors. ln(z) is the natural logarithm.

Law | F_u(z) | z
---|---|---
Bernoulli | z ln[z] + (1 − z) ln[1 − z] | z = T₁/n_u
Gamma | −ln[z] | z = T₁/n_u
Gaussian | −ln[z] | z = T₂/n_u − [T₁/n_u]²
Poisson | z ln[z] | z = T₁/n_u
Rayleigh | −ln[z] | z = T₂/n_u

## 3. Application to object location

### 3.1 Introduction and limitations of the ML approach

In order to perform the important task of detecting and locating a target appearing on a random background, a pattern recognition system must discriminate between the background and the target. The background can thus be considered as noise. This noise is not additive, since it does not affect the target: it is said to be non-overlapping. Classical linear filters have been shown to often fail in the presence of such noise [4], and an explanation of this phenomenon has been presented in [6]. Different techniques [26, 9] have been proposed in the past for the detection and location of a target with a known internal structure and an unknown uniform illumination in the presence of non-overlapping background noise. When the target’s gray levels are unknown, it is necessary to introduce different approaches [10, 27, 25, 28, 11]. Such a situation can happen when the target is subject to sun reflections in optical images, when the temperature changes in infrared images, or when only a shape model is available for the location of the target in the input image. In this case, with the proposed solutions [10, 27], the pixel values of both the target and the background have been modeled as random variables with Gaussian pdfs but with different parameters. The only *a priori* knowledge is thus the silhouette of the target, which defines the frontier between the target and the background. These models have recently been generalized to Gamma pdfs [27] and to binary images [29]. We discuss in the following the general solution for the exponential family, which includes the previous models as particular cases.

With the SIR approach, the input image model is:

$$ s_i = a_i\, w_i^{\theta} + b_i\, \bar{w}_i^{\theta}, $$

where **a** and **b** represent the gray levels of, respectively, the target and the background zone. The unknown parameter *θ* is now simply the position of the object in the scene. The ML solution for the estimation of the location *θ* can be written (see Eq. 14):

$$ \hat{\theta} = \arg\max_{\theta}\; \left[ n_a\, F_a\!\left(z_a(\theta)\right) + n_b\, F_b\!\left(z_b(\theta)\right) \right]. $$

Here again, the functions *F*_{a} and *F*_{b} depend on the considered pdf and on the prior on the nuisance parameters for the marginal Bayesian and MAP approaches, but are equal in the case of the ML estimation of these parameters. The mathematical expressions for the Bernoulli, Gaussian, Gamma, Poisson and Rayleigh pdfs can easily be obtained from tables 2, 3 and 4.

The main practical problem with the SIR models is that the input image is assumed to be composed of two homogeneous random fields *a*_{i} and *b*_{i}*.* Furthermore, the random variables are assumed to be independently distributed. These conditions may not be fulfilled in real-world images. We discuss in the following two techniques in order to overcome these limitations.
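To make the location criterion concrete, the following Python sketch implements the two-region plug-in criterion for white Gaussian statistics: for each candidate position, the image is split into the silhouette region and the rest, and the weighted log-variances are scored. The brute-force scan and all names are ours, for illustration only:

```python
import numpy as np

def gaussian_sir_criterion(s, mask):
    """Two-region Gaussian plug-in loglikelihood (up to constants):
    -(n_a ln var_a + n_b ln var_b), to be maximized over the position."""
    a, b = s[mask], s[~mask]
    return -(a.size * np.log(a.var()) + b.size * np.log(b.var()))

def locate(s, w):
    """Brute-force scan: slide the binary silhouette w over s and keep
    the top-left position that maximizes the criterion."""
    H, W = s.shape
    h, ww = w.shape
    best, best_pos = -np.inf, None
    for i in range(H - h + 1):
        for j in range(W - ww + 1):
            mask = np.zeros(s.shape, dtype=bool)
            mask[i:i + h, j:j + ww] = w
            val = gaussian_sir_criterion(s, mask)
            if val > best:
                best, best_pos = val, (i, j)
    return best_pos

rng = np.random.default_rng(2)
scene = rng.normal(0.0, 1.0, (24, 24))
scene[9:15, 5:11] += 4.0                 # target: same variance, shifted mean
target = np.ones((6, 6), dtype=bool)
print(locate(scene, target))
```

Note that the criterion uses only the region variances, so the target is found even though its gray level is a priori unknown.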

### 3.2 The maximum likelihood ratio test (MLRT) approach

In the SIR image model, the background region **w̄**^{θ}, that is, the whole image but the target, is considered to have homogeneous statistics. This is often a non-realistic assumption since real-world backgrounds are in general better modeled with several zones having different average values. In order to overcome this problem, we will estimate the statistics in a small subwindow M^{θ} centered on the assumed target location *θ* (see Fig. 1). Indeed, if we consider a sufficiently small subwindow, the hypothesis that the background is homogeneous becomes a better approximation.

In order to better understand the proposed method, let us temporarily set aside the object location problem and consider the simpler problem of object detection. It consists in determining whether there is an object of shape **w** in the center of the sub-image M^{θ} or not. More precisely, we want to discriminate between the two following hypotheses:

• Hypothesis *H*_{0}: the window contains only background noise **b**, so that *s*_{i} = *b*_{i} for all *i* ∈ M^{θ};

• Hypothesis *H*_{1}: the target is present in the center of the window M^{θ}, so that *s*_{i} = *a*_{i} *w*_{i}^{θ} + *b*_{i} *w̄*_{i}^{θ}.

Note that in this section, **w̄**^{θ} will denote the part of the complement of **w**^{θ} belonging to M^{θ}. In other words, M^{θ} = **w**^{θ} + **w̄**^{θ}.

A very classical method for choosing between these two hypotheses is the maximum-likelihood ratio test [30]. It consists in computing the likelihoods *L*(*H*_{0}, *θ*) and *L*(*H*_{1}, *θ*) of both hypotheses and taking their ratio *τ*(*θ*) = *L*(*H*_{1}, *θ*)/*L*(*H*_{0}, *θ*). One then selects a threshold value *τ*_{0} and performs the following test:

• if *τ*(*θ*) > *τ*_{0}, there is a target in the center of M^{θ};

• else there is no target.

The value of the threshold *τ*_{0} sets a compromise between the probability of detection and the probability of false alarm. Using this method, we can determine whether the target is present or not at each location *θ*. If there may be several targets in the image, it is thus possible to determine their locations.

Let us now return to the problem of object location, which is slightly different: we assume that we know that there is only one target in the image (which can be the case in tracking applications, for example), and we want to determine its location. In order to do so, we can extend the previously described detection algorithm to a location algorithm by choosing as the estimate of the target location the position which maximizes *τ*(*θ*). In other words, the estimated location will be:

$$ \hat{\theta} = \arg\max_{\theta}\, \tau(\theta). $$

In the following, we will call this estimation approach the "maximum likelihood ratio test" (MLRT). This procedure is a heuristic extension of the optimal detection algorithm. Note that similar procedures have been used for locating edges in optical images (with Gaussian gray level statistics) [31] and in Synthetic Aperture Radar images (with Gamma gray level statistics) [22].

We shall now specify the expression of the likelihood ratio *τ*(*θ*) for a SIR image belonging to the exponential family:

$$ \ln \tau(\theta) = n_a\, F_a\!\left(z_a(\theta)\right) + n_b\, F_b\!\left(z_b(\theta)\right) - n_c\, F_b\!\left(z_c(\theta)\right), \qquad (24) $$

where *z*_{a}, *z*_{b} and *z*_{c} are the statistics of table 4 computed over **w**^{θ}, **w̄**^{θ} and M^{θ} respectively, and where *n*_{c} is the number of pixels of the scanning subwindow M^{θ}, so that *n*_{c} = *n*_{a} + *n*_{b}. Here again, the functions *F*_{a} and *F*_{b} depend on the considered pdf and on the prior on the nuisance parameters for the marginal Bayesian and MAP approaches, but are equal in the case of a ML estimation of these parameters.

We now specialize Eq. 24 to particular pdfs belonging to the exponential family when the ML estimation of the nuisance parameters is considered. For simplicity, let us introduce the following notations:

$$ T_\ell^{u}(\theta) = \sum_{i \in \mathbf{w}_u^{\theta}} (s_i)^{\ell}, \qquad u \in \{a, b, c\}, \qquad (25) $$

where *ℓ* = 1 or 2, and where **w**_{a}^{θ} = **w**^{θ}, **w**_{b}^{θ} = **w̄**^{θ} and **w**_{c}^{θ} = M^{θ}.

In the Bernoulli case, Eq. 24 becomes

$$ \ln\tau(\theta) = \sum_{u \in \{a, b\}} n_u\, \phi\!\left(\frac{T_1^u}{n_u}\right) - n_c\, \phi\!\left(\frac{T_1^c}{n_c}\right), \qquad \phi(z) = z \ln z + (1 - z)\ln(1 - z); $$

in the Gaussian case,

$$ \ln\tau(\theta) = \frac{1}{2}\left[ n_c \ln \hat\sigma_c^2 - n_a \ln \hat\sigma_a^2 - n_b \ln \hat\sigma_b^2 \right], \qquad \hat\sigma_u^2 = \frac{T_2^u}{n_u} - \left(\frac{T_1^u}{n_u}\right)^{2}; $$

in the Gamma case (for a known order L),

$$ \ln\tau(\theta) = L\left[ n_c \ln\frac{T_1^c}{n_c} - n_a \ln\frac{T_1^a}{n_a} - n_b \ln\frac{T_1^b}{n_b} \right]; $$

in the Poisson case,

$$ \ln\tau(\theta) = T_1^a \ln\frac{T_1^a}{n_a} + T_1^b \ln\frac{T_1^b}{n_b} - T_1^c \ln\frac{T_1^c}{n_c}; $$

and in the Rayleigh case,

$$ \ln\tau(\theta) = n_c \ln\frac{T_2^c}{n_c} - n_a \ln\frac{T_2^a}{n_a} - n_b \ln\frac{T_2^b}{n_b}. $$
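For instance, the Gaussian case can be sketched in Python as follows: a plug-in log likelihood ratio between the "target present" and "background only" hypotheses on a single sub-window (our own illustration; constants and names are ours and may differ from the source's conventions):

```python
import numpy as np

def ln_tau_gaussian(window, target_mask):
    """Plug-in Gaussian log likelihood ratio ln(tau) between H1 (target in
    the centre of the sub-window) and H0 (background only): it compares the
    log-variance of the whole window with those of the two sub-regions."""
    a = window[target_mask]       # assumed target region w
    b = window[~target_mask]      # rest of the scanning window M
    c = window.ravel()            # whole scanning window
    return 0.5 * (c.size * np.log(c.var())
                  - a.size * np.log(a.var())
                  - b.size * np.log(b.var()))

rng = np.random.default_rng(3)
M0 = rng.normal(0.0, 1.0, (15, 15))      # background-only window
mask = np.zeros((15, 15), dtype=bool)
mask[5:10, 5:10] = True
M1 = M0.copy()
M1[mask] += 5.0                          # same window with a bright target
print(ln_tau_gaussian(M1, mask), ln_tau_gaussian(M0, mask))
```

By construction this ratio is non-negative, and it is large only when splitting the window into the two regions markedly reduces the empirical variances, i.e. when the silhouette actually separates two homogeneous populations.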

### 3.3 The whitening process

In some real-world images, the statistics of both the target and the background cannot be approximated with good precision with uncorrelated random fields. In these situations, the SIR filter is then suboptimal and can fail. In figure 2, two scenes and the maximum of each line of their respective generalized correlation planes obtained with the SIR technique (i.e. ℓ(s∣*H*_{θ})) are shown. The maximum of the correlation plane represents the estimated location of the target. In scene (a), the pdf’s of both the target and the background are white and Gaussian whereas those of scene (b) are also Gaussian but correlated. One can note that the SIR algorithm adapted to white Gaussian statistics fails on scene (b) whereas it is able to locate the target on scene (a).

We want to design an optimal algorithm for the location of a random correlated target appearing on a random correlated background. The main problem consists in finding texture models that characterize real situations and for which the optimal solution is mathematically simple. Such a method has been recently designed using the same random Markov field model for both the target and the background [32]. This case represents a difficult, but particular situation.

In this subsection, we propose to apply a preprocessing to the input image in order to obtain an image with white Gaussian textures and then to apply the SIR method which is optimal in that case. However, as soon as the textures of both the target and the background are strongly correlated, the preprocessing introduces a third region in the preprocessed image, which characterizes the frontier between the target and the background. Following reference [25] we will thus discuss how to model this region with a white Gaussian random field and we design a SIR filter that takes into account the three regions (i.e. the background, the target and the frontier).

The Fourier transform of s is denoted **ŝ** (with value *ŝ*(*v*) at frequency *v*), *z** is the complex conjugate of *z* and |*z*| its modulus. We define the whitening filter in the Fourier domain by:

$$ \hat{h}(\nu) = \frac{1}{\left|\hat{s}(\nu)\right| + \epsilon}, $$

where *ϵ* is a small positive constant introduced as a regularization parameter which avoids divergence when |*ŝ*(*v*)| is close or equal to zero. The Fourier transform **ẑ** of the preprocessed image **z** is thus:

$$ \hat{z}(\nu) = \hat{h}(\nu)\, \hat{s}(\nu) = \frac{\hat{s}(\nu)}{\left|\hat{s}(\nu)\right| + \epsilon}. $$

One can note that since s is real and **ĥ** is real and even, **z** is also real. It is easy to show that the square modulus of **ẑ** is approximately constant. We can thus conjecture that the pixels of the preprocessed image **z** are approximately Gaussian uncorrelated variables. In figure 3, we show a target with a correlated texture appearing on a random correlated background, together with the obtained preprocessed image. One can verify that describing the pixel values of the preprocessed image as Gaussian random variables is a good approximation. If we model the preprocessed image with two independent regions and if the nuisance parameters are estimated in the ML sense, the SIR method leads to (see Eq. 14):

$$ \hat{\theta} = \arg\min_{\theta}\; \left[ n_a \ln \hat\sigma_a^2(\theta) + n_b \ln \hat\sigma_b^2(\theta) \right], $$

where

$$ \hat\sigma_u^2(\theta) = \frac{T_2^u(\theta)}{n_u} - \left(\frac{T_1^u(\theta)}{n_u}\right)^{2}, \qquad T_\ell^{u}(\theta) = \sum_{i \in \mathbf{w}_u^{\theta}} (z_i)^{\ell}, \qquad (34) $$

where *ℓ* = 1 or 2.
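The whitening preprocessing itself reduces to a few FFT operations; a minimal Python sketch (our illustration; the value of ε and the test texture are arbitrary) is:

```python
import numpy as np

def whiten(s, eps=1e-3):
    """Divide the spectrum of s by its modulus (regularized by eps), so that
    the output spectrum is approximately flat: |z_hat| = |s_hat|/(|s_hat|+eps)."""
    s_hat = np.fft.fft2(s)
    z_hat = s_hat / (np.abs(s_hat) + eps)
    return np.fft.ifft2(z_hat).real    # the filter is real and even, so z is real

rng = np.random.default_rng(4)
raw = rng.normal(0.0, 1.0, (64, 64))
# correlated texture: moving average of the white field along the rows
corr = raw + np.roll(raw, 1, axis=1) + np.roll(raw, 2, axis=1)
z = whiten(corr)
print(np.median(np.abs(np.fft.fft2(z))))   # close to 1: nearly flat spectrum
```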

However, as one can remark in figure 3, the preprocessing can introduce three regions in the preprocessed image. Indeed, as soon as the textures are strongly correlated, a frontier appears between the target and the background.

Let **f** denote that frontier (and, respectively, **a** the target and **b** the background) and let **w**^{f} (resp. **w**^{a} and **w**^{b}) define the new disjoint window functions composed of *n*_{f} (resp. *n*_{a} and *n*_{b}) pixels, so that **w**^{f} (resp. **w**^{a} and **w**^{b}) is equal to one within the frontier (resp. the target and the background) and to zero elsewhere when the target is located at the center of the image. We thus propose to describe the preprocessed image **z** in the following way [33]:

$$ z_i = a_i\, w_i^{a,\theta} + f_i\, w_i^{f,\theta} + b_i\, w_i^{b,\theta}, $$

when the target is supposed to be centered on the *θ*^{th} pixel of the image.

Using an approach analogous to the previous one, we can design a SIR filter that takes into account the three regions. This leads to:

$$ \hat{\theta} = \arg\min_{\theta}\; \left[ n_a \ln \hat\sigma_a^2(\theta) + n_f \ln \hat\sigma_f^2(\theta) + n_b \ln \hat\sigma_b^2(\theta) \right], $$

where σ̂_{a}²(θ), σ̂_{f}²(θ) and σ̂_{b}²(θ) are defined as in Eq. 34. All these quantities can be determined by correlating binary masks with the images **z** and **z**² [25]. They can be obtained with a simple optoelectronic architecture or with an FFT algorithm applied to the images *z*_{i} and (*z*_{i})².

### 3.4 The implementation issue

An interesting point is that *ℓ*[s ∣ *H*_{θ}] in the standard SIR approach and ln[*τ*(*θ*)] in the MLRT approach can easily be rewritten using correlation operations. Let us consider the MLRT approach, and let [*f* ⋆ *g*]_{i} denote the correlation between *f* and *g*:

$$ [f \star g]_i = \sum_{j} f_j\, g_{j-i}, $$

and let **w**^{u} = **w**_{u}^{0} denote the binary masks for *θ* = 0. Eq. 25 then becomes:

$$ T_\ell^{u}(\theta) = \left[ \mathbf{s}^{\ell} \star \mathbf{w}^{u} \right]_{\theta}, \qquad (38) $$

where we recall that **M** = **w** + **w̄**. Since the most intensive computations are those of the terms *T*_{ℓ}^{u} with *u* = *a*, *b*, *c* or *f*, this new formulation is very attractive because it is closely connected to the detection architecture described in [10]. Indeed, the detection and location steps require the same correlation functions. A simple optoelectronic architecture could thus perform the detection and/or location with this kind of hardware.
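The correlation-based computation of the region sums can be sketched with FFTs as follows (our illustration, with circular boundary conditions; all names are ours):

```python
import numpy as np

def region_sums(s, w, ell):
    """[s^ell * w]_theta for every shift theta at once: the sum of (s_i)^ell
    over the binary mask w translated by theta, via the correlation theorem."""
    S = np.fft.fft2(s.astype(float) ** ell)
    W = np.fft.fft2(w.astype(float))
    return np.fft.ifft2(S * np.conj(W)).real   # circular cross-correlation

rng = np.random.default_rng(5)
s = rng.normal(0.0, 1.0, (32, 32))
w = np.zeros((32, 32))
w[:5, :5] = 1.0                 # 5x5 binary mask anchored at the origin
T1 = region_sums(s, w, 1)       # T1(theta): sum of s over the shifted mask
T2 = region_sums(s, w, 2)       # T2(theta): sum of s^2 over the shifted mask
print(T1[3, 4], s[3:8, 4:9].sum())   # the two values agree
```

A single pair of FFTs thus yields the statistics for all candidate positions simultaneously, which is what makes the sliding-window criteria computationally attractive.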

## 4. Application to segmentation

### 4.1 Introduction

An important goal of computational vision and image processing is to automatically recover the shape of objects from various types of images. Over the years, many approaches have been developed to reach this goal and in this section, we focus on the segmentation of a unique object in the scene. The unknown parameter *θ* is now the shape of the object in the scene.

A classical approach consists in detecting edges and linking them in order to determine the shape of the object present in the image. However, this approach does not use the knowledge that the object is simply connected. On the other hand, deformable models (also called "snakes") incorporate knowledge about the shape of the object from the start. Broadly speaking, a snake is a curve which has the ability to evolve (under the influence of a mathematical criterion) in order to match the contour of an object in the image. The first snakes [14] were driven by the minimization of a function designed to move them towards desired features, usually edges. This approach and its generalizations [34, 35, 36] are edge-based, in the sense that the information used is strictly along the boundary. They are well adapted to a certain class of problems, but they can fail in the presence of strong noise.

The SIR-based snake we will describe in the following belongs to the deformable template methods, which are parametric shape models with relatively few degrees of freedom. They constitute another interesting approach to recover the shape of an object [37] [38] [16]. The template is matched to an image, in a manner similar to the snake, by searching the value of a parameter vector *θ* (i.e. the node positions) that minimizes an appropriate mathematical criterion. One can cite for example strategies based on the consideration of the inner and the outer regions defined by the snake, which have been recently investigated [15] [16] [39] [40]. It is interesting to note that a statistical processing method can take full advantage of many suitable descriptions of the measured signals (see for example [41] [12] [24] [42] [43]).

The SIR approach allows one to determine this appropriate criterion. First, we generalize the approaches proposed in [12] [42] [44] to different statistical laws which belong to the exponential family and which are well adapted to describe physical situations. This technique is actually an extension of the optimal detection approach introduced in [10] and generalized in the previous sections.

### 4.2 The SIR snake model

The purpose of segmentation is therefore to estimate the most likely shape for the target in the scene. Note the difference with the optimal location problem of Section 3, where the silhouette of the target was known whereas its position had to be found. To address the shape estimation issue, we use a *k*-node polygonal active contour that defines the boundary of the shape. **w**^{θ} is now a polygon-bounded support function, one-valued on and within the snake and zero-valued elsewhere, and *θ* is the set of the positions of the nodes of the contour. Let us consider the different hypotheses *H*_{θ} that consist in assigning a shape **w**^{θ} to the target by assigning a position to each node of the contour, so that we can write:

$$ s_i = a_i\, w_i^{\theta} + b_i\, \bar{w}_i^{\theta}. $$

The optimal choice for **w**^{θ} is the one which maximizes the conditional probability *P*[*H*_{θ} ∣ s]. Assuming equiprobable hypotheses, the ML estimation of the shape (i.e. of *θ*) is obtained by maximizing *P*[s ∣ *H*_{θ}], which corresponds to the likelihood of the hypothesis.

Under the previous assumptions, we can now specify the expression of the loglikelihood *ℓ*[s ∣ *H*_{θ}] for a SIR image which belongs to the exponential family (see Eq. 14):

$$ \ell[\mathbf{s} \mid H_\theta] = n_a\, F_a\!\left(z_a(\theta)\right) + n_b\, F_b\!\left(z_b(\theta)\right) + G(\mathbf{s}). \qquad (40) $$

Here again, the functions *F*_{a} and *F*_{b} depend on the considered pdf and on the prior on the nuisance parameters for the marginal Bayesian and MAP approaches, but are equal in the case of a ML estimation of these parameters.

We now illustrate this result for some particular cases of the exponential family, using the same notations as in Eq. 25 (the statistics *T*_{ℓ}^{u} being now computed over the regions **w**^{θ} and **w̄**^{θ} defined by the snake).

In the Bernoulli case, Eq. 40 becomes:

$$ \ell[\mathbf{s} \mid H_\theta] = \sum_{u \in \{a, b\}} n_u \left[ \frac{T_1^u}{n_u} \ln\frac{T_1^u}{n_u} + \left(1 - \frac{T_1^u}{n_u}\right) \ln\!\left(1 - \frac{T_1^u}{n_u}\right) \right] + G(\mathbf{s}); $$

in the Gaussian case,

$$ \ell[\mathbf{s} \mid H_\theta] = -\frac{1}{2} \sum_{u \in \{a, b\}} n_u \ln \hat\sigma_u^2 + G(\mathbf{s}), \qquad \hat\sigma_u^2 = \frac{T_2^u}{n_u} - \left(\frac{T_1^u}{n_u}\right)^{2}; $$

in the Gamma case (for a known order L),

$$ \ell[\mathbf{s} \mid H_\theta] = -L \sum_{u \in \{a, b\}} n_u \ln\frac{T_1^u}{n_u} + G(\mathbf{s}); $$

in the Poisson case,

$$ \ell[\mathbf{s} \mid H_\theta] = \sum_{u \in \{a, b\}} T_1^u \ln\frac{T_1^u}{n_u} + G(\mathbf{s}); $$

and in the Rayleigh case,

$$ \ell[\mathbf{s} \mid H_\theta] = -\sum_{u \in \{a, b\}} n_u \ln\frac{T_2^u}{n_u} + G(\mathbf{s}). $$

One can note that the whitening process introduced in the previous section can be also used in order to obtain white random fields well described with Gaussian pdf [13].

### 4.3 The implementation issue

The window function **w**^{θ} that optimizes the criterion *ℓ*[s ∣ *H*_{θ}] realizes the ML optimal segmentation of the target in the scene. The technical problem is thus to find the value of *θ* which maximizes *ℓ*[s ∣ *H*_{θ}] (also denoted *l*(*θ*) in the following):

$$ \hat{\theta} = \arg\max_{\theta}\, l(\theta). $$

We use a stochastic iterative algorithm to perform the optimization of *l*(*θ*) and thereby the segmentation. At each iteration *m* of the process, the following steps are carried out:

• randomly select a node of the polygonal contour and a candidate displacement of this node, which defines a tentative shape *θ*′;

• compute *l*(*θ*′);

• if *l*(*θ*′) > *l*(*θ*^{m}), set *θ*^{m+1} = *θ*′; otherwise keep *θ*^{m+1} = *θ*^{m}.

This process is continued until *l*(*θ*^{m}) does not increase anymore.
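The accept-if-improved iteration can be sketched in Python; for brevity, this illustration replaces the polygonal contour by an axis-aligned rectangle θ = (top, left, height, width) and uses the Gaussian criterion (all names and the test scene are ours):

```python
import numpy as np

def criterion(s, mask):
    """Gaussian plug-in criterion -(n_a ln var_a + n_b ln var_b), maximized
    when the mask separates two homogeneous regions."""
    a, b = s[mask], s[~mask]
    return -(a.size * np.log(a.var()) + b.size * np.log(b.var()))

def rect_mask(shape, th):
    top, left, h, w = th
    m = np.zeros(shape, dtype=bool)
    m[top:top + h, left:left + w] = True
    return m

def segment(s, th0, iters=2000, seed=0):
    """Stochastic iteration: propose a random +-1 move of the shape
    parameters, keep it only if the criterion increases."""
    rng = np.random.default_rng(seed)
    th = np.array(th0)
    best = criterion(s, rect_mask(s.shape, th))
    for _ in range(iters):
        cand = th + rng.integers(-1, 2, size=4)
        ok = (cand[2] >= 2 and cand[3] >= 2 and cand[0] >= 0 and cand[1] >= 0
              and cand[0] + cand[2] <= s.shape[0]
              and cand[1] + cand[3] <= s.shape[1])
        if not ok:
            continue
        val = criterion(s, rect_mask(s.shape, cand))
        if val > best:            # accept only improving moves
            th, best = cand, val
    return tuple(int(v) for v in th)

rng = np.random.default_rng(6)
scene = rng.normal(0.0, 1.0, (32, 32))
scene[10:20, 8:18] += 5.0        # object: rows 10..19, columns 8..17
print(segment(scene, (8, 6, 8, 8)))
```

The loop mirrors the three steps above: random proposal, evaluation of the criterion, and acceptance only when the criterion increases.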

An interesting point is that *l*(*θ*^{m}) can easily be rewritten using correlation operations. Let [*f* ⋆ *g*]_{0} denote the central value of the correlation between *f* and *g*:

$$ [f \star g]_0 = \sum_{j} f_j\, g_j. $$

One thus has:

$$ T_\ell(\theta) = \left[ \mathbf{s}^{\ell} \star \mathbf{w}^{\theta} \right]_0, \qquad \ell = 1 \text{ or } 2, \qquad (48) $$

so that *l*(*θ*) is a simple function of the correlations of the images s and s² with the binary mask **w**^{θ}.

Note the similarity between Eqs. 48 and 38. The detection, location and segmentation steps require the same correlation functions. A simple optoelectronic architecture could thus perform these tasks with the same hardware. As will be shown in Section 5.5, joint utilization of the location and segmentation algorithms enables us to perform efficient target tracking.
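To make the correlation formulation concrete, a sketch for the Gaussian case is given below. The central correlation value [*f* * *g*]_{0} reduces to a plain inner product, and the criterion is written with the correlations of *s* and *s*² with the binary support **w**; the exact constants of Eq. 48 are not reproduced here, so the form below (the usual Gaussian region-based criterion) is an assumption:

```python
import numpy as np

def central_correlation(f, g):
    """[f * g]_0 : central value of the correlation of f and g,
    i.e. the inner product sum_i f_i g_i."""
    return np.sum(f * g)

def gaussian_snake_loglik(s, w):
    """Sketch of l(theta) for the Gaussian case, written with the
    correlations [s * w]_0 and [s^2 * w]_0. w is the binary support
    of the shape (1 inside, 0 outside). Assumed form; the paper's
    Eq. 48 may differ by additive or multiplicative constants."""
    N = s.size
    Na = central_correlation(w, np.ones_like(w))        # pixels inside
    Nb = N - Na                                         # pixels outside
    m_a = central_correlation(s, w) / Na                # mean inside
    m_b = central_correlation(s, 1 - w) / Nb            # mean outside
    v_a = central_correlation(s**2, w) / Na - m_a**2    # variance inside
    v_b = central_correlation(s**2, 1 - w) / Nb - m_b**2
    return -Na * np.log(v_a) - Nb * np.log(v_b)
```

Because only the inner products of *s* and *s*² with **w** are needed, the same correlation hardware can serve location and segmentation, as noted in the text.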

## 4.4

### Generalization of the SIR segmentation approach

#### Constrained deformation

In the previous section, the shape of the snake was not constrained, and could converge to any arbitrary polygon. In some applications, one may have some *a priori* knowledge about the object’s shape, and thus constrain the evolution of the snake to a smaller class of possible shapes. This enables faster and more robust segmentation.

To formalize this approach, let *θ* be the node locations and let us consider a set of transformations *Κ*_{α}(*θ*) with *α* ∈ *𝒜*, where *𝒜* is the set of possible values of *α*. In the case of the location task, the transformation is a translation of parameter *α* and thus:

A more interesting case is that of in-plane rotation *R*_{α}, since the estimation of the target orientation can then be performed with a more efficient technique than the general snake algorithm of the previous subsections. Indeed, instead of randomly moving the nodes, one can select, among all the rotated versions of **w**, the angle *α* which maximizes the likelihood:

This concept can be generalized to other transformations such as isotropic or anisotropic scaling.
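As an illustration, the rotation-constrained search can be sketched as follows; `log_likelihood` is a hypothetical callable evaluating *ℓ*[s|*H*_{θ}] for a given set of nodes, and the rotation is taken about the centroid of the snake (an assumption):

```python
import numpy as np

def rotate_nodes(theta, alpha):
    """Constrained transformation K_alpha: rotate all snake nodes by
    angle alpha (radians) about their centroid."""
    c = theta.mean(axis=0)
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return (theta - c) @ R.T + c

def best_rotation(theta, log_likelihood, angles):
    """Exhaustive search over candidate angles: keep the alpha whose
    rotated shape maximizes the likelihood (sketch)."""
    vals = [log_likelihood(rotate_nodes(theta, a)) for a in angles]
    return angles[int(np.argmax(vals))]
```

The same one-parameter search applies to isotropic or anisotropic scaling by replacing the rotation matrix with the corresponding scaling transformation.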

#### Recognition

Let us assume that the purpose is to recognize the object or, in other words, to discriminate between different classes. For example, one can imagine that the purpose is to determine whether the target is a car, a truck or a bus. One can now consider that *θ* belongs to the discrete set *Ɗ* of the possible classes of objects. For the example considered above, one has *Ɗ* = {*car*, *truck*, *bus*}. The recognition is thus obtained with:

where *ℓ*[s|*H*_{α}] is a nonlinear function of the intercorrelation of *s*_{i} and of (*s*_{i})^{2} with the shapes *w*^{α} of the reference objects.

## 5

## Simulation results

We propose in this section some numerical simulations to illustrate the performance of the location and segmentation algorithms described in this paper. We consider different noise statistics belonging to the exponential family and demonstrate the efficiency of the proposed algorithms on synthetic and real-world images. We also show how the location and the segmentation algorithms can be used together to efficiently track objects in image sequences.

## 5.1

### Binary images

The image in figure 4.a represents an object (a bird) appearing against a complex background. Suppose that this image is to be processed with an optical correlator in which the input image is displayed on a binary spatial light modulator. We need to binarize the image before processing it. In many instances, it has been noticed that it is more efficient to edge-enhance an image before binarizing it; this operation increases its contrast, making it easier to find a good threshold. The result of edge-enhancing and binarizing figure 4.a is represented in figure 4.b. Note that in binarized real-world images, the background noise is often non-homogeneous. For this type of image, the MLRT is thus more efficient than the ML algorithm [29].

The result of processing figure 4.b with the MLET algorithm adapted to Bernoulli statistics is shown in Figure 4.d.

Looking more closely at the binarized image in Figure 4.b, we can see that it is very noisy. This is because a low threshold has been chosen. A low threshold is nevertheless preferable, since almost all the information-carrying edges are then included in the image. Very little information is thus lost, but the drawback is that many spurious edges remain after the binarization step. These edges are in general non-homogeneously distributed over the image, which makes it important to use location algorithms robust to non-homogeneous background noise, such as the MLET.
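For illustration, a simple edge-enhance-then-binarize preprocessing step can be sketched as follows; finite-difference gradients are used here, since the actual edge-enhancement operator used by the authors is not specified (this choice is an assumption):

```python
import numpy as np

def edge_enhance_binarize(image, threshold):
    """Edge enhancement (gradient magnitude via finite differences)
    followed by binarization. A low threshold keeps most
    information-carrying edges at the price of spurious ones."""
    gx = np.zeros_like(image, dtype=float)
    gy = np.zeros_like(image, dtype=float)
    gx[:, 1:] = np.diff(image, axis=1)   # horizontal gradient
    gy[1:, :] = np.diff(image, axis=0)   # vertical gradient
    edges = np.hypot(gx, gy)             # gradient magnitude
    return (edges > threshold).astype(np.uint8)
```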

Figure 5 represents segmentation results on two binarized real-world images corrupted with additive Gaussian noise. The images in the left column display the initial shape of the snake. The images in the right column represent the snake after convergence. We can see that the searched shape has been correctly segmented. Note that the initial shape does not need to be very close to the true one for the snake to converge properly. This robustness to snake shape initialization is an important feature of the proposed algorithm in real-world applications.

## 5.2

### Speckled images

Figure 6.a displays two tank-shaped small targets (78 pixels) appearing on a non-homogeneous background with exponential statistics. This background has been generated with the method described in Ref. [29], where we have replaced the Bernoulli variates with exponential variates. The tank in the upper right quadrant has been rotated by 10° with respect to the reference object. The result of processing the scene with the MLET algorithm adapted to speckle statistics appears in Figure 6.b. We can see that the MLET, like most correlation-based algorithms, is robust to small deformations of the target with respect to the reference object.

## 5.3

### Low flux images

Figure 7.a displays a real-world image containing a boat appearing against a mountain background with atmospheric blurring. Figure 7.b represents the same image synthetically perturbed with Poisson noise, simulating for example photon-limited imaging. Figure 7.d shows the result of processing this image with the MLRT algorithm adapted to Poisson noise.

Figure 8 also represents a real image synthetically perturbed with some amount of Poisson noise. The car is segmented using the snake energy adapted to Poisson noise.

## 5.4

### Optronic images

Figure 9.a is a synthetic image representing an airplane on a contrasted urban background. The whole scene is severely blurred. Note that the target gray levels are nonuniform. They are not known *a priori*, since the only information used by the algorithm is the binary shape displayed in figure 9.c. Figure 9.b represents a whitened version of the scene. It can be shown that the statistics of the gray levels in the whitened scene are approximately uncorrelated and Gaussian. The ML algorithm adapted to white Gaussian statistics is applied to the whitened image. The obtained result is displayed in figure 9.d. We can see that we are able to correctly locate the target despite its low contrast and the severely cluttered background.

Figure 10.a also represents an airplane on a contrasted urban background. The whole image is severely blurred. This means that edge-based snake techniques [14, 34, 35, 36] would not be efficient, since the edges between the target and the background are not sharper than the edges internal to the background. The proposed region-based snake method, which relies on all target and background pixels is able to segment the image, as can be seen in figure 10.b. Note that the snake has been applied to the whitened version of figure 10.a.

Figure 10.c represents a real-world image of a car on a road. Here again, the snake is applied to the whitened version of the scene. Note that the snake has correctly converged although its initial shape (see figure 10.c) was very different from the true one. This is a further proof of the robustness of the proposed algorithm to the initial shape of the snake.

## 5.5

### Applications to tracking

We now illustrate the feasibility of using location and segmentation approaches cooperatively to achieve efficient target tracking in image sequences, even if the shape of the target changes during the sequence. Assume that on the image acquired at time *t*, the target has been segmented using the proposed snake method. This segmentation produces a binary reference shape that enables us to locate the target in the image acquired at time *t* +1, e.g. with the MLRT algorithm. This is possible since the MLRT is robust to limited deformation of the target with respect to the reference object. It can thus locate the target even if its shape has slightly changed compared to the previous frame. We then use the obtained position estimate for centering the binary reference. This centered reference is used as the initial shape of a snake which has to converge to the new shape of the object. This process is repeated until the end of the sequence. In summary, using jointly MLRT and snake algorithms consists in first determining the object location (which corresponds to a very constrained variation of the shape), and then in segmenting the shape whose position is approximately known. In many instances, the shape variations from one image to the next are small, and only a few snake iterations are needed to converge to the new shape.
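The tracking loop described above can be sketched as follows; `locate` and `segment` are hypothetical callables standing for the MLRT location step and the snake segmentation step, respectively:

```python
def track(frames, initial_shape, locate, segment):
    """Joint location/segmentation tracking loop (sketch of the
    procedure described in the text). For each frame, the previous
    binary shape is used to locate the target, then the snake is
    initialized at the estimated position to refresh the shape."""
    shape = initial_shape
    trajectory = []
    for frame in frames:
        position = locate(frame, shape)          # MLRT: robust to small
                                                 # shape changes
        shape = segment(frame, shape, position)  # snake: refine the shape
                                                 # at the new position
        trajectory.append((position, shape))
    return trajectory
```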

Let us first consider the problem of tracking walking persons. Typical images can be seen in figure 11. We can note that, due to the walk, the apparent shape of the person changes during the sequence. In order to remove the influence of the structured background, we make an acquisition of the scene without people. We then subtract this reference frame from each frame after having registered them. The MLRT location algorithm and the snake are applied to this difference image. Such a procedure can be useful in surveillance applications, for example. We can see in figure 11 that the object is correctly located and segmented in the image sequence.

The second example consists in tracking a car driving on a highway and moving away from the camera. Due to this movement, the shape of the target varies during the sequence, and the binary reference used by the location algorithm must be periodically refreshed using the snake segmentation algorithm. We can see in figure 12 the result of applying the proposed tracking method to this sequence. Here again, an image of the highway without cars is subtracted from each frame, and the segmentation algorithm is applied to a partially whitened version of this difference image. In order to show the robustness of the proposed algorithm, the snake has been initialized in each image to a square approximately centered on the object. We can see that the car is correctly tracked.

## 6

## Conclusion and perspectives

We have presented a generic approach to parameter estimation in image processing using SIR models. Possible applications include object detection and location, attitude and scale estimation, segmentation and recognition. This approach is based on a simple statistical modeling of the image. This enables us to adapt the algorithms to the statistics of the noise actually present in the image, while keeping the same algorithmic architecture. When the considered model is not sufficient to describe the observed scene, we have described methods for adapting the image (whitening preprocessing) or the algorithm (MLRT approach). The proposed technique is thus flexible, in the sense that it can solve a variety of image processing tasks on a variety of image types, while keeping the same basic structure. This technique has been proven efficient on different types of synthetic and real-world images.

There are numerous perspectives to this work. The unified algorithmic structure of SIR-based methods makes it possible to combine several tasks in a single application. An example has been given of the cooperation between the location and segmentation approaches for target tracking. The inclusion of attitude estimation (as a particular case of constrained-shape segmentation) and of recognition in such systems would be useful in many applications.

Another interesting development of this work is optical implementation of the described algorithms. We have shown that they are based on correlation operations with binary references. This makes it possible to benefit from the speed of binary SLM-based optical correlators. Note that in this case, the optical correlator would constitute the main building block of a system that would not only be able to perform location or recognition of a known target, but also segmentation.

## REFERENCES

“…filtering method and application to targets and backgrounds with random correlated gray levels,” Opt. Lett. 22, 630–632 (1997).