Most of the computer systems developed for the detection or diagnosis of breast lesions share a common sequence of steps; the typical phases of a computer-aided detection (CAD) program are shown in Fig. 7.1. The first operations reduce the noise in the digital mammogram and isolate suspicious lesions. A feature extraction step usually follows: automatic procedures apply image analysis techniques to compute, for each segmented lesion, a feature vector that characterizes it. The lesions are then identified, or characterized, in a classification phase, in which the extracted features are provided as inputs to a classifier. In detection schemes, this phase is known as false-positive reduction: the classifier should retain all (or nearly all) of the true detections while rejecting almost all of the false-positive signals. In diagnosis schemes, this phase aims to characterize each lesion as benign or malignant on the basis of its features. Several types of classifiers have been investigated, such as artificial neural networks, linear discriminant analysis, and decision trees, and most of them are supervised methods, i.e., they are trained on examples whose class is known.
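The phases of a detection scheme can be sketched on a synthetic image. The following NumPy example is a minimal illustration under stated assumptions, not the method of any particular CAD system: noise reduction is a 3 × 3 box filter, segmentation is a fixed intensity threshold followed by grouping of pixels into connected regions, the feature vector is simply [area, mean intensity], and false-positive reduction is a hypothetical area rule (`min_area`) standing in for a trained classifier. The image, the "lesion", the noise spikes, and every threshold are invented for illustration.

```python
import numpy as np
from collections import deque


def smooth(image, k=3):
    """Noise reduction: k x k moving-average (box) filter, edges padded by replication."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def segment(image, threshold):
    """Isolate candidates: threshold the image, then group pixels into 4-connected regions."""
    mask = image > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r, c] = True
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def extract_features(image, region):
    """Feature vector of one candidate: [area in pixels, mean intensity]."""
    values = np.array([image[y, x] for y, x in region])
    return np.array([len(region), values.mean()])


def reduce_false_positives(features, min_area=20):
    """Hypothetical stand-in for a trained classifier: keep candidates with area >= min_area."""
    return [f for f in features if f[0] >= min_area]


rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.05, size=(64, 64))  # simulated background noise
image[30:38, 30:38] += 1.0                     # hypothetical 8 x 8 "lesion"
image[10, 50] += 4.0                           # isolated bright spikes that survive
image[50, 10] += 4.0                           # segmentation as small false candidates

smoothed = smooth(image)
regions = segment(smoothed, threshold=0.25)
features = [extract_features(smoothed, r) for r in regions]
kept = reduce_false_positives(features)
print(len(regions), "candidates ->", len(kept), "after false-positive reduction")
```

In a real scheme the final rule would be replaced by a trained classifier, and its operating point would be chosen so that virtually all true lesions are kept while most spurious candidates are discarded.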
Recently, a new family of classifiers has appeared in CAD schemes: the support vector machine (SVM). SVMs were introduced as a technique grounded in statistical learning theory (SLT). Other techniques, e.g., multilayer perceptrons (MLPs), are based on minimizing the empirical risk, that is, the number of misclassified vectors in the training set. SVMs, in contrast, minimize a function that is the sum of two terms: the first is the empirical risk, and the second (the confidence term) controls the capacity of the machine, i.e., its ability to learn any training set without error. SVMs are attracting increasing attention because they rest on a solid statistical foundation and appear to perform effectively in many different applications.
After training, the separating surface is expressed as a linear combination of a given kernel function centered at a subset of the training vectors, called support vectors (SVs). All the remaining training vectors are effectively discarded, and new vectors are classified solely in terms of the SVs. Usually, the smaller the percentage of SVs, the better the generalization of the machine. The key to interpreting this behavior is that an SVM models the classes through the properties of the example vectors lying at the boundary between them. What the SVM adds to other classifiers is a better handling of the boundary cases, the ones for which it is most difficult to decide which class they belong to. Because the SVs are a sample of examples representing the distribution at the edges of the two classes, the SVM uses them to draw the boundary between the classes. In this way, far fewer examples are needed to generalize than would be needed to model the entire distribution of each class in order to extract its mean properties.
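To make the role of the support vectors concrete, the sketch below trains a soft-margin kernel SVM on two synthetic 2-D classes and then classifies using the SVs alone. It is a minimal NumPy illustration under several assumptions: the kernel is a Gaussian (RBF) kernel with a hypothetical width `gamma = 0.5`, the bias term is absorbed into the kernel by adding a constant 1 (a common simplification that removes the equality constraint of the standard dual), and training uses dual coordinate ascent rather than the quadratic-programming solvers of production libraries. The dual problem maximizes Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j) subject to 0 ≤ α_i ≤ C, where C trades off the training-error (slack) term against the capacity term; the resulting decision function is f(x) = Σ_i α_i y_i K(x_i, x), and only vectors with α_i > 0 (the SVs) contribute to it.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2) + 1; the '+1' absorbs the bias term (assumption)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0)) + 1.0


def train_svm(X, y, C=1.0, gamma=0.5, epochs=200):
    """Soft-margin kernel SVM trained by coordinate ascent on the dual:
    maximize sum(alpha) - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij, with 0 <= alpha_i <= C."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    f = np.zeros(len(y))  # f[i] = sum_j alpha_j y_j K_ij, the current output on vector i
    for _ in range(epochs):
        for i in range(len(y)):
            grad = 1.0 - y[i] * f[i]                             # partial derivative w.r.t. alpha_i
            new = np.clip(alpha[i] + grad / K[i, i], 0.0, C)     # exact 1-D maximizer within the box
            delta = new - alpha[i]
            if delta != 0.0:
                f += delta * y[i] * K[:, i]                      # keep all outputs in sync
                alpha[i] = new
    return alpha


def decision_function(X_train, y, alpha, X_new, gamma=0.5, tol=1e-8):
    """Separating surface as a linear combination of kernels centered at the SVs only."""
    sv = alpha > tol
    K = rbf_kernel(X_new, X_train[sv], gamma)
    return K @ (alpha[sv] * y[sv]), sv


# Two synthetic classes (e.g., "nonlesion" = -1, "lesion" = +1); data are invented.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 0.7, (40, 2)), rng.normal(1.5, 0.7, (40, 2))])
y = np.r_[-np.ones(40), np.ones(40)]

alpha = train_svm(X, y)
scores, sv = decision_function(X, y, alpha, X)
accuracy = np.mean(np.sign(scores) == y)
print(f"{sv.sum()} of {len(y)} training vectors are support vectors ({sv.mean():.0%})")
print(f"training accuracy: {accuracy:.2f}")
```

The printed fraction of SVs connects to the generalization remark above: removing a training vector that is not an SV leaves the solution unchanged, so the leave-one-out error of an SVM is at most the fraction of training vectors that are SVs.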
SVMs have been used in both detection and diagnosis tasks. In both cases, the SVM acts as a classifier that separates two classes of objects: lesions and nonlesions in detection schemes, and benign and malignant (or normal and abnormal) cases in diagnostic programs. Several studies have reported that, in most of the conditions examined, the SVM classifier outperforms other types of classifiers, owing to the properties described above.