*x*, *y*, and longitudinal *z* axes. If WB calibration data are applied to NB images, the measurement bias increases from the middle of the working spectral range to its edges and can reach significant values: up to 0.1 mm along the *x* axis and 0.15 mm along the *z* axis in the 10 to 25 mm distance range. To overcome this, we propose calibration and image processing procedures based on a proper choice of a few spectral bands for calibration and interpolation of the calculated calibration parameters. Results of multiple experiments using a stereo video endoscope confirmed that the proposed technique reduces the measurement bias threefold in comparison with conventional WB calibration for all wavelengths of the visible range, which essentially improves the measurement accuracy. The impact of WB calibration on random measurement errors and on the quality of image rectification was also analyzed and shown to be insignificant.

## 1. Introduction

Stereoscopic video endoscopes have become one of the main tools for nondestructive testing (NDT) in aircraft, automobile, energy, and other industries. These tools are widely used for quantitative characterization of hard-to-reach elements and defects inside complex objects without their disassembly. Remote visual inspection (RVI) using stereoscopic video endoscopes allows noncontact three-dimensional (3-D) geometrical measurements of the structural elements accessible only by small-diameter (4 to 8 mm) probes.^{1}^{,}^{2}

The most widely used implementation of this technique relies on miniature prism-based optics that acquire two images from two different viewpoints on a single sensor (Fig. 1). In this case, an interchangeable stereo tip adapter is attached to the distal end of the video probe in front of the integrated lens.^{2}^{,}^{3} Illumination from the white-band (WB) light source is delivered to the inspected object via a fiber optic light guide located inside the probe.

To compute 3-D coordinates of the object points, the conventional stereoscopic technique may be applied.^{4}^{–}^{6} It employs a calibration procedure, image rectification, and other processing algorithms for pixel matching prior to the start of the measurement process, resulting in a full 3-D surface map of the inspected object. An advanced ray tracing camera model makes it possible to take into account the specific distortion introduced by the prism.^{7}^{–}^{10}

In some applications, contrast visualization of the inspected surface using WB illumination and, therefore, defect localization and quantitative characterization are not effective or even impossible. In these cases, fluorescence, Raman, and other narrow-band (NB) spectral imaging techniques may help to increase the performance of the RVI procedure.^{11}^{,}^{12} For example, fluorescent penetrant inspection requires ultraviolet light to ensure good contrast between the glow emitted by the penetrant in the defective areas and the unlit surface of the material.^{13} For material identification, reflectance spectra over a wide range are effective.^{14} In this case, it is especially important to analyze both the local spectral properties of the object and the spatial distribution of these properties. Therefore, a spectral imaging feature integrated into stereoscopic video endoscopes makes these tools more informative and flexible for real-time NDT applications. This feature may be implemented by means of spectral filtration of the illumination emitted by the WB light source, for example, using a tunable filter or a filter wheel conjugated with this source. Figure 2 shows examples of the effectiveness of spectral imaging for contrast visualization of a defect.

The conventional geometric calibration procedure is performed under white-light illumination, limited only by the spectral width of the light source and the sensitivity of the sensor. Such calibration is not optimal for NB illumination and may limit the achievable measurement accuracy due to the inevitable chromatic aberrations of the prism-based optical system.^{15} Dispersion of light caused by the prism leads to spectral dependence of the focal distance, distortion coefficients, and other parameters of the whole optical system. This must be taken into account at all stages of stereoscopic image processing: calibration, rectification, etc. Therefore, for proper 3-D measurements under NB illumination, it is necessary to modify the conventional algorithms.

In this paper, we experimentally demonstrate this limitation and describe the calibration procedure proposed in our previous work,^{16} which is based on a proper choice of a few spectral bands to decrease the measurement errors of stereo video endoscopes in NB light. We analyze a ray tracing model and the spectral dependence of its parameters, show that for operation in any selected spectral band within the visible range the system may be calibrated in a few NBs with subsequent interpolation of the calibration parameters, and compare the effectiveness of the proposed calibration and image rectification procedures with the conventional ones.

## 2. Mathematical Models and Data Processing

Processing of stereoscopic images includes stereo matching, calculation of 3-D point coordinates, geometrical measurements, and/or reconstruction of the 3-D surface. In order to increase the accuracy and speed of image matching, preprocessing techniques such as rectification, distortion correction, and noise reduction may be applied. For precise rectification, as well as for calculation of 3-D point coordinates, one needs a mathematical model that converts 2-D image coordinates into 3-D ray coordinates in the object space. Preliminary calibration of the system provides the parameters of this model. Since the ray tracing camera model provides better measurement accuracy for prism-based stereoscopic systems,^{10} all theoretical considerations below are based on this model.

### 2.1. Ray Tracing Camera Model

The image acquired by the prism-based stereoscopic system consists of two parts. Each of them is formed by the rays passing through one part of the biprism. To make the mathematical description applicable to multiocular prisms,^{7} we consider that the image consists of $N$ parts obtained via $N$ parts of the prism and further refer to them as “$i$’th image part” and “$i$’th prism part,” $i=\mathrm{1,2}\dots N$. We formulate the model using a ray tracing from the image plane to the object space.^{8}^{,}^{10} We define the vector of ray coordinates $\mathbf{l}={({\mathbf{c}}^{T},{\mathbf{v}}^{T})}^{T}$ as the concatenation of 3-D coordinates of the origin point $\mathbf{c}$ and the direction vector $\mathbf{v}$ and use ${\mathbf{l}}_{m,i}={S}_{m,i}({\mathbf{l}}_{m-1,i})$ notation to describe the refraction of each ray ${\mathbf{l}}_{m,i}$ on the $m$’th surface for $i$’th image part. Each transformation ${S}_{m,i}$ is derived according to the vector form of Snell’s law.^{10} To find the ray coordinates ${\mathbf{l}}_{2,i}$ in the object space for each image point ${\mathbf{p}}_{i}$, we consider the pinhole model with radial distortion for the main lens, apply it to define ray coordinates ${\mathbf{l}}_{0,i}$, and further use ${S}_{1,i}$ and ${S}_{2,i}$ to trace this ray through the prism (see Fig. 3).
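A single refraction step $S_{m,i}$ can be sketched as follows. This is a minimal illustration of the vector form of Snell's law, not the authors' implementation; it assumes a planar prism face given by a point and a unit normal, and all names are hypothetical:

```python
import numpy as np

def trace_surface(ray, q, normal, n1, n2):
    """One refraction step S: ray (c, v) hits a planar face and refracts.

    q: any point on the face; normal: unit normal pointing toward the
    incident medium; n1, n2: refractive indices before/after the face.
    Returns the refracted ray (new origin on the face, unit direction).
    """
    c, v = ray
    s = np.dot(q - c, normal) / np.dot(v, normal)  # ray/plane parameter
    c_new = c + s * v                              # new ray origin on the face
    eta = n1 / n2                                  # relative refractive index
    cos_i = -np.dot(normal, v)                     # cosine of incidence angle
    k = 1.0 - eta**2 * (1.0 - cos_i**2)            # negative => total internal reflection
    if k < 0.0:
        raise ValueError("total internal reflection")
    v_new = eta * v + (eta * cos_i - np.sqrt(k)) * normal
    return c_new, v_new / np.linalg.norm(v_new)
```

Chaining two such calls with the two refracting faces of the $i$'th prism part reproduces the composition ${S}_{2,i}\circ {S}_{1,i}$ used below.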

The forward pinhole model is usually formulated as the combination of basic transformations:

## Eq. (1)

$$\mathbf{p}=P\circ E({\mathbf{x}}_{\mathrm{w}})=A\circ D\circ F\circ E({\mathbf{x}}_{\mathrm{w}}).$$^{4}^{,}^{17}

Here, we need the inverse pinhole model in the form:

## Eq. (2)

$${\mathbf{l}}_{0,i}={P}^{-1}({\mathbf{p}}_{i})={F}^{-1}\circ {D}^{-1}\circ {A}^{-1}({\mathbf{p}}_{i}),$$

## Eq. (3)

$${\mathbf{x}}^{\prime\prime}={A}^{-1}({\mathbf{p}}_{i}):\quad \left(\begin{array}{c}{x}^{\prime\prime}\\ {y}^{\prime\prime}\end{array}\right)={\left(\frac{u-{u}_{0}}{{f}_{\mathrm{u}}},\frac{v-{v}_{0}}{{a}_{\mathrm{f}}{f}_{\mathrm{u}}}\right)}^{T},$$

## Eq. (4)

$${\mathbf{x}}^{\prime}={D}^{-1}({\mathbf{x}}^{\prime\prime}):\quad \left(\begin{array}{c}{x}^{\prime}\\ {y}^{\prime}\end{array}\right)=(1+{\rho}_{1}{r}^{2}+{\rho}_{2}{r}^{4}+{\rho}_{3}{r}^{6})\left(\begin{array}{c}{x}^{\prime\prime}\\ {y}^{\prime\prime}\end{array}\right),\quad {r}^{2}={x}^{\prime\prime 2}+{y}^{\prime\prime 2},$$

## Eq. (5)

$${\mathbf{l}}_{0,i}={F}^{-1}({\mathbf{x}}^{\prime}):\quad {\mathbf{l}}_{0,i}={((0,0,0),({x}^{\prime},{y}^{\prime},1)/|({x}^{\prime},{y}^{\prime},1)|)}^{T}.$$

The set of intrinsic camera parameters ${f}_{\mathrm{u}}$, ${a}_{\mathrm{f}}$, ${u}_{0}$, ${v}_{0}$ and the distortion parameters ${\rho}_{1}$, ${\rho}_{2}$, ${\rho}_{3}$ are used to describe the inverse pinhole model. Since the inverse transformation ${P}^{-1}$ depends only on the parameters of the image sensor and the main lens, it is the same for all image parts. As a result, the transformation for the ray tracing model can be represented in the form:
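The inverse pinhole chain ${F}^{-1}\circ {D}^{-1}\circ {A}^{-1}$ of Eqs. (2)–(5) can be sketched compactly. This is illustrative only; the parameter names follow the text, and the distortion step applies the polynomial of Eq. (4) directly, as written:

```python
import numpy as np

def backproject(p, fu, af, u0, v0, rho):
    """Inverse pinhole model, Eqs. (2)-(5): pixel -> ray in the camera CS."""
    u, v = p
    # A^{-1}: normalized sensor coordinates, Eq. (3)
    x2 = (u - u0) / fu
    y2 = (v - v0) / (af * fu)
    # D^{-1}: radial distortion polynomial applied as written in Eq. (4)
    r2 = x2**2 + y2**2
    s = 1.0 + rho[0] * r2 + rho[1] * r2**2 + rho[2] * r2**3
    x1, y1 = s * x2, s * y2
    # F^{-1}: unit-direction ray from the projection center, Eq. (5)
    v_dir = np.array([x1, y1, 1.0])
    return np.zeros(3), v_dir / np.linalg.norm(v_dir)
```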

## Eq. (6)

$${\mathbf{l}}_{\mathrm{w}\,i}={E}^{-1}({\mathbf{l}}_{2,i})={E}^{-1}\circ {S}_{2,i}\circ {S}_{1,i}\circ {F}^{-1}\circ {D}^{-1}\circ {A}^{-1}({\mathbf{p}}_{i}).$$

In contrast to the pinhole model, the ray tracing model cannot provide a closed-form solution for the forward transformation ${\mathbf{p}}_{i}={P}_{i}\circ {E}_{i}({\mathbf{x}}_{\mathrm{w}})$ because it requires the initially unknown direction vector of the line ${\mathbf{l}}_{\mathrm{w}\,i}$ from the 3-D point ${\mathbf{x}}_{\mathrm{w}}$. This problem is usually solved by an iterative technique called ray aiming or by look-up-table interpolation. We can formally use the notation for the forward transformation if we consider that this notation stands for the iterative solver or the interpolation technique.

We assume that the prism is surrounded by air and use the same description as in the paper.^{10} Hence, we need nine parameters for the three faces of the prism and one more parameter for the refractive index $n$. The complete vector of parameters $\mathbf{k}$ includes 4 intrinsic camera parameters, 3 distortion parameters, and 10 prism parameters. From the theory of optics, we can conclude that the focal distance ${f}_{\mathrm{u}}$ and the refractive index $n$ should vary significantly for different wavelengths; the coordinates of the principal point ${u}_{0}$, ${v}_{0}$ and the distortion parameters ${\rho}_{1}$, ${\rho}_{2}$, and ${\rho}_{3}$ should vary slightly (actually, this depends on the peculiarities of the chromatic aberration correction in the particular optical system design); but the ratio of focal lengths ${a}_{\mathrm{f}}$ and the 9 parameters for the location of the prism faces should not change. Consequently, vector $\mathbf{k}$ should be divided into two parts: the first one is common for all spectral bands and the second one is different. The parameters of the second part should either be calibrated separately for every spectral band or represented as a function of wavelength.

### 2.2. Calculation of 3-D Point Coordinates

After point matching has been completed, the reconstruction of the 3-D point coordinates ${\mathbf{x}}_{\mathrm{w}}$ from $N$ corresponding image points ${\mathbf{p}}_{i}$ may be considered as the optimization problem:

## Eq. (7)

$${\widehat{\mathbf{x}}}_{\mathrm{w}}=\underset{{\mathbf{x}}_{\mathrm{w}}}{\mathrm{arg}\,\mathrm{min}}\,C({\mathbf{x}}_{\mathrm{w}},\mathbf{p},\mathbf{k}).$$

For simplicity, we assume that a priori information is not available and assign equal weights to all points and coordinates. Since the ray tracing model provides a closed-form solution for the inverse projection, we use the cost function based on the sum of distances $d$ from the point ${\mathbf{x}}_{\mathrm{w}}$ to rays ${\mathbf{l}}_{\mathrm{w}\,i}$:

## Eq. (8)

$$C({\mathbf{x}}_{\mathrm{w}},\mathbf{p},\mathbf{k})=\sum _{i=1}^{N}{d}^{2}({\mathbf{x}}_{\mathrm{w}},{\mathbf{l}}_{\mathrm{w}\,i})=\sum _{i=1}^{N}{\mathbf{b}}_{i}^{T}{\mathbf{b}}_{i},\quad {\mathbf{b}}_{i}=({\mathbf{Id}}_{3\times 3}-{\mathbf{v}}_{\mathrm{w}\,i}{\mathbf{v}}_{\mathrm{w}\,i}^{T})({\mathbf{c}}_{\mathrm{w}\,i}-{\mathbf{x}}_{\mathrm{w}}).$$

In this case, the solution of the minimization problem formulated in Eq. (7) can be found by the direct least-squares method.^{4} For $N=2$, the method reduces to determining the midpoint of the common perpendicular and does not require iterations.
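The closed-form least-squares solution of Eqs. (7) and (8) can be sketched via normal equations built from the same projectors ${\mathbf{Id}}_{3\times 3}-\mathbf{v}\mathbf{v}^{T}$. This is a minimal illustration under equal weights, not the authors' code:

```python
import numpy as np

def triangulate(rays):
    """Least-squares 3-D point from N back-projected rays.

    Each ray is (c, v) with unit direction v; minimizes the sum of squared
    point-to-ray distances in closed form via the normal equations.
    For N = 2 this yields the midpoint of the common perpendicular.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, v in rays:
        M = np.eye(3) - np.outer(v, v)  # projector orthogonal to the ray
        A += M
        b += M @ c
    return np.linalg.solve(A, b)
```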

In most applications of 3-D measurement endoscopic systems, the calculated 3-D point coordinates are used to measure geometric parameters, such as segment length, point-to-line distance, point-to-plane distance, surface area, etc.^{1} Details on the equations used for geometric measurements and uncertainty estimation may be found in the book.^{18}

### 2.3. Image Rectification

Conventional rectification methods for pinhole camera models use the assumption that each camera should be rotated around its center of projection. The rectification transformation parameters for the $i$’th camera can be calculated by determining the rotation transformation ${E}_{i}$ and the projective transformation ${P}_{i}$ for the new (virtual) camera.^{4}^{,}^{19}^{,}^{20} Unlike the pinhole model, the ray tracing model has no single center of projection,^{10} which makes pinhole rectification techniques inapplicable. The proposed method for the ray tracing model includes the following stages.

I. Applying the inverse ray tracing [Eq. (6)] for a point grid on both halves of the image. The points of maximal convergence of the rays ${\mathbf{l}}_{\mathrm{w}\,i}$ are considered to be the projection centers ${O}_{1}$ and ${O}_{2}$ of the virtual cameras (Fig. 4).

II. Rotation of the ${x}_{i}$ axis of each virtual camera until it becomes parallel to the line ${O}_{1}{O}_{2}$. The direction of the ${y}_{i}$ axis is defined from the condition of perpendicularity to the axes ${x}_{i}$ and ${z}_{\mathrm{w}}$. The direction of the ${z}_{i}$ axis is assigned perpendicular to the axes ${x}_{i}$ and ${y}_{i}$. After the algorithm calculates the orientation and position of each virtual camera’s coordinate system (CS), it is possible to determine the parameters of the Euclidean transformation ${E}_{i}$. These transformations translate a 3-D point ${\mathbf{x}}_{\mathrm{w}}$ from the world CS to the CS of the $i$’th virtual camera: ${\mathbf{x}}_{i}={E}_{i}({\mathbf{x}}_{\mathrm{w}})={\mathbf{R}}_{i}{\mathbf{x}}_{\mathrm{w}}+{\mathbf{t}}_{i}$.

III. Calculation of the parameters of the projective transformations ${P}_{1}$ and ${P}_{2}$ corresponding to the virtual stereopair, which define the coordinates of rectified points. In the first approximation, these transformations are considered to be equal to the transformation $P$ of the real camera, but without the polynomial distortion approximation. ${P}_{1}$ and ${P}_{2}$ need to be adjusted in order to maximize the useful area on rectified images and to make the scale of these images close to the initial one.

Since rays ${\mathbf{l}}_{\mathrm{w}\,i}$ have no single intersection point, we should specify the position of the rectification plane in order to define the points ${\mathbf{x}}_{\mathrm{p}1}$ and ${\mathbf{x}}_{\mathrm{p}2}$ used for the projection. This plane is assumed to be parallel to the ${x}_{i}{O}_{i}{y}_{i}$ planes of both virtual cameras. After the rectification transformation is defined, one can calculate the coordinates of the point ${\mathbf{p}}_{i}^{\prime}$ on the rectified image for every point ${\mathbf{p}}_{i}$ on the $i$’th half of the initial image. First, the inverse ray tracing [Eq. (6)] is applied to every point ${\mathbf{p}}_{i}$. Next, the point ${\mathbf{x}}_{\mathrm{p}i}$ is calculated by intersecting the refracted ray ${\mathbf{l}}_{\mathrm{w}\,i}$ with the rectification plane. Finally, this point is projected to the point ${\mathbf{p}}_{i}^{\prime}$ on the virtual camera’s image plane by applying the transformations ${E}_{i}$ and ${P}_{i}$. Similarly, we can find the inverse rectification transformation from ${\mathbf{p}}_{i}^{\prime}$ to ${\mathbf{p}}_{i}$ and use it to calculate look-up tables for image interpolation as in the conventional pinhole rectification technique.
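The per-point mapping described above (intersect the back-projected ray with the rectification plane, then project with the virtual camera) might look as follows. This is a simplified sketch assuming the rectification plane is $z=\mathrm{const}$ in the world CS; all parameter names are illustrative:

```python
import numpy as np

def rectify_point(ray, plane_z, R, t, fu, af, u0, v0):
    """Map one back-projected ray to rectified pixel coordinates.

    Intersects the ray (c, v) with the rectification plane z = plane_z
    (world CS), transforms the hit point into the virtual camera CS
    (rotation R, translation t), and applies a distortion-free pinhole
    projection with intrinsics (fu, af, u0, v0).
    """
    c, v = ray
    s = (plane_z - c[2]) / v[2]      # ray/plane intersection parameter
    x_p = c + s * v                  # point on the rectification plane
    x_i = R @ x_p + t                # into the virtual camera CS
    u = fu * x_i[0] / x_i[2] + u0    # pinhole projection, no distortion
    w = af * fu * x_i[1] / x_i[2] + v0
    return np.array([u, w])
```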

In order to achieve higher rectification accuracy, the plane must be placed as close to the real object position as possible. The impact of the distance to the rectification plane needs additional analysis and is outside the scope of this paper. Based on multiple experiments, we can conclude that one rectification distance is usually enough to cover the working range typical for prism-based stereoscopic systems.

We should notice that the proposed image rectification technique uses the vector of parameters $\mathbf{k}$, which may vary for each spectral band. Hence, the impact of this factor on the accuracy of image rectification should also be estimated as a part of our analysis of WB and NB calibration applicability in this paper.

## 3. Calibration

### 3.1. Standard Calibration Procedure

The aim of the calibration procedure is to determine the parameter vector $\mathbf{k}$ using captured images of a calibration target. These targets may be manufactured as flat, step-like, or cube objects with many contrast markers whose centers can be easily determined on the image (chessboard pattern, circles, line grid, or other geometric patterns). We consider that the calibration target has $M$ points whose 3-D coordinates ${\mathbf{x}}_{\mathrm{t}j}$, $j=\mathrm{1,2}\dots M$ are known with a certain accuracy in the CS of the calibration target. During image capturing, the target is placed at $R$ different positions. Next, the image coordinates ${\mathbf{p}}_{i,j,k}$, $k=\mathrm{1,2}\dots R$ are calculated for each point and each position using image processing. We should notice that for the $i$’th camera (i.e., for the $i$’th image part) and the $k$’th position, only a part of the $M$ points will be visible, so the following description assumes that the point can be detected in the image and its image coordinates can be calculated. The complete projection transformation for these points can be written as ${\mathbf{p}}_{i,j,k}={P}_{i}\circ {E}_{i}\circ {E}_{k}^{\prime}({\mathbf{x}}_{\mathrm{t}j})$, where ${E}_{k}^{\prime}$ stands for the Euclidean mapping from the CS of the calibration target to the world CS (WCS). In addition to the parameter vector $\mathbf{k}$ for the transformations ${P}_{i}$ and ${E}_{i}$, we also need to find the vector ${\mathbf{k}}_{\mathrm{t}}$ describing the transformations ${E}_{k}^{\prime}$ for each $k$. If we introduce the composite vector ${\mathbf{x}}_{\mathrm{t}}$ of all 3-D points and the composite vector $\mathbf{p}$ of all projections, we can write the optimization problem for calibration as follows:

## Eq. (9)

$$\widehat{\mathbf{k}},{\widehat{\mathbf{k}}}_{\mathrm{t}}=\underset{\mathbf{k},{\mathbf{k}}_{\mathrm{t}}}{\mathrm{arg}\,\mathrm{min}}\,C({\mathbf{x}}_{\mathrm{t}},\mathbf{p},\mathbf{k},{\mathbf{k}}_{\mathrm{t}}).$$

The cost function for the ray tracing model should use the sum of squared distances from 3-D points to backprojected rays: ${d}^{2}({\mathbf{x}}_{\mathrm{t}j},{\widehat{\mathbf{l}}}_{\mathrm{t}\,i,j,k})$, where ${\widehat{\mathbf{l}}}_{\mathrm{t}\,i,j,k}={{\widehat{E}}_{k}^{\prime}}^{-1}({\widehat{\mathbf{l}}}_{\mathrm{w}\,i,j,k})$, ${\widehat{\mathbf{l}}}_{\mathrm{w}\,i,j,k}$ is calculated from ${\mathbf{p}}_{i,j,k}$ using Eq. (6) with the current estimate of $\widehat{\mathbf{k}}$, and ${{\widehat{E}}_{k}^{\prime}}^{-1}$ is described by the current estimate of ${\widehat{\mathbf{k}}}_{\mathrm{t}}$. If we use a flat calibration target, we can replace rays ${\widehat{\mathbf{l}}}_{\mathrm{t}\,i,j,k}$ by points ${\widehat{\mathbf{x}}}_{\mathrm{t}\,i,j,k}$ of ray intersections with the XOY plane in the CS of the target and write the cost function as follows:

## Eq. (10)

$$C({\mathbf{x}}_{\mathrm{t}},\mathbf{p},\widehat{\mathbf{k}},{\widehat{\mathbf{k}}}_{\mathrm{t}})=\sum _{i,j,k}{({\mathbf{x}}_{\mathrm{t}j}-{\widehat{\mathbf{x}}}_{\mathrm{t}\,i,j,k})}^{T}{\mathbf{\Sigma}}_{\mathrm{x}\,i,j,k}^{-1}({\mathbf{x}}_{\mathrm{t}j}-{\widehat{\mathbf{x}}}_{\mathrm{t}\,i,j,k}).$$

Conventional iterative nonlinear least-squares techniques^{4} may be used to solve the minimization problem Eq. (9). In order to improve stability and avoid local minima, one may apply global search optimization techniques, such as particle swarm optimization and genetic algorithms.^{21}^{,}^{22}
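For a flat target, the inner term of Eq. (10) reduces to a 2-D error on the target plane. A sketch with identity weighting ($\mathbf{\Sigma}=\mathbf{Id}$), not the authors' implementation; the stacked residuals can be fed to any least-squares solver, e.g. `scipy.optimize.least_squares`:

```python
import numpy as np

def flat_target_residuals(rays_t, x_t):
    """Residuals for the flat-target cost: rays vs. known target nodes.

    rays_t: list of back-projected rays (c, v) already mapped into the
    target CS; x_t: list of known 2-D node coordinates on the target
    (its XOY plane). Returns the stacked 2-D residual vector.
    """
    res = []
    for (c, v), xt in zip(rays_t, x_t):
        s = -c[2] / v[2]          # intersection with the plane z = 0
        hit = c + s * v
        res.append(hit[:2] - xt)  # 2-D error on the target plane
    return np.concatenate(res)
```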

### 3.2. Modification for Multiple Spectral Bands

Here, we consider that the calibration problem is solved for multiple spectral bands simultaneously. Hence, we acquire images of the calibration target in $L$ spectral bands and calculate the image coordinates ${\mathbf{p}}_{i,j,k,l}$ for each band $l=\mathrm{1,2}\dots L$. If we assume that the vector of calibration parameters ${\mathbf{k}}_{l}$ is unique for each band, the composite vector $\mathbf{k}$ should be defined as $\mathbf{k}={({\mathbf{k}}_{1}^{T},{\mathbf{k}}_{2}^{T},\dots ,{\mathbf{k}}_{L}^{T})}^{T}$. According to our analysis above, vector ${\mathbf{k}}_{l}$ should be divided into two parts: ${\mathbf{k}}_{\mathrm{c}}$, which is common for all spectral bands, and ${\mathbf{k}}_{\mathrm{v}\,l}$, which is different. Then, we define the composite vector $\mathbf{k}$ as $\mathbf{k}={({\mathbf{k}}_{\mathrm{c}}^{T},{\mathbf{k}}_{\mathrm{v}1}^{T},{\mathbf{k}}_{\mathrm{v}2}^{T},\dots ,{\mathbf{k}}_{\mathrm{v}\,L}^{T})}^{T}$. Depending on the image capturing procedure, the number of positions $R$ can also be different for each band. If we ensure that the calibration target does not move when switching spectral bands, we can use a single vector ${\mathbf{k}}_{\mathrm{t}}$ to describe the transformations ${E}_{k}^{\prime}$. Otherwise, we need a vector ${\mathbf{k}}_{\mathrm{t}\,l}$ to describe the ${R}_{l}$ transformations ${E}_{k,l}^{\prime}$ for the $l$’th spectral band and the composite vector ${\mathbf{k}}_{\mathrm{t}}={({\mathbf{k}}_{\mathrm{t}1}^{T},{\mathbf{k}}_{\mathrm{t}2}^{T},\dots ,{\mathbf{k}}_{\mathrm{t}\,L}^{T})}^{T}$. Using these notations, we can write the optimization problem as in Eq. (9). The cost function can also be formulated similarly to Eq. (10), replacing the indices $i$, $j$, $k$ by $i$, $j$, $k$, $l$.

In this work, we assume that ${\mathbf{k}}_{\mathrm{v}\,l}$ includes two variables: the focal distance ${f}_{\mathrm{u}}$ and the refractive index $n$. All other parameters of the ray tracing model are considered to be the same for all spectral bands, so they are included in ${\mathbf{k}}_{\mathrm{c}}$. Our equipment allows us to capture images in all spectral bands for each position of the calibration target, which makes it possible to use a single vector ${\mathbf{k}}_{\mathrm{t}}$ for all bands. If the number of spectral bands is high, it is impractical to capture images in all of them for calibration. In this case, we should select a small number of bands for calibration and interpolate the parameters ${\mathbf{k}}_{\mathrm{v}\,l}$ by wavelength for the other bands. The choice of this number and the type of interpolation is determined by the assumed dependence of the parameters on wavelength. Since the focal distance ${f}_{\mathrm{u}}$ and the refractive index $n$ are usually smooth continuous functions of wavelength, we suppose that three properly chosen bands will be sufficient for calibration in the visible range.
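The split of $\mathbf{k}$ into a common part and per-band parts can be organized as a simple pack/unpack pair around the optimizer. This is an illustrative sketch assuming ${\mathbf{k}}_{\mathrm{v}\,l}=({f}_{\mathrm{u}},n)$ as in the text:

```python
import numpy as np

def pack(k_c, k_v):
    """Build the composite vector k = (k_c, k_v1, ..., k_vL).

    k_c: common parameters (distortion, prism faces, a_f, ...);
    k_v: (L, 2) array of per-band parameters [f_u, n].
    """
    return np.concatenate([np.ravel(k_c), np.ravel(k_v)])

def unpack(k, n_common, n_bands):
    """Inverse of pack: recover the common part and the (L, 2) per-band part."""
    return k[:n_common], k[n_common:].reshape(n_bands, 2)
```

The optimizer then works on the flat composite vector, while the cost function unpacks it to evaluate each band with its own $({f}_{\mathrm{u}},n)$ and the shared remaining parameters.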

## 4. Experimental Verification

### 4.1. Experimental Setup

To implement multispectral calibration, we assembled the setup shown in Fig. 5. We used a self-made video endoscopic probe with a prism-based stereoscopic system, which has a field of view of $40\,\mathrm{deg}\times 45\,\mathrm{deg}$ in each channel and a working distance range of 5 to 40 mm. The video camera at the distal end of the probe has a 1/6” color image sensor with a resolution of $1920\times 1080$ pixels. The probe was fixed on the stage. For illumination of the calibration target, we used a white-light LED. For spectral selection, we inserted a filter wheel between the target and the LED. The wheel contained six NB filters with center wavelengths of 460, 503, 550, 600, 650, and 704 nm and an FWHM of 10 nm, as well as an empty aperture for WB image acquisition. Figure 6 shows the relative spectral power distribution of the LED and the transmission curves of the NB filters.

To acquire images for calibration and tests, we utilized a flat calibration target with a black-and-white chessboard pattern. To cover the whole range of distances and magnifications, three samples of the calibration target with 0.5 mm (small-sized), 1 mm (medium-sized), and 2 mm (large-sized) chessboard squares were manufactured. Each target had a grid of $25\times 25$ squares. The chessboard pattern was produced by chrome etching on glass with an inaccuracy of about $1\,\mu \mathrm{m}$.

Without filters, i.e., in WB mode, and with each NB filter, we captured two image series: a calibration sequence and a test sequence. The calibration sequence included images captured at different positions of the small-sized, medium-sized, and large-sized calibration targets covering the total range of distances from 8 to 32 mm. Each of these targets was placed approximately perpendicular to the $z$-axis of the stand at a distance that allows at least eight markers in a horizontal row to be seen on both image halves. Then, the target was consecutively rotated by 30 deg and 45 deg around the $x$ and $y$ axes and by 45 deg around the $z$ axis to capture images at oblique angles. Finally, we positioned the target perpendicular to the $z$-axis at an approximately $1.5\times$ longer working distance than the initial position. As a result, we acquired seven images (corresponding to WB light and six NB filters) for each of 32 positions of the calibration targets.

Test sequences were collected for the small-sized and medium-sized calibration targets in the following way. Again, the calibration target was placed approximately perpendicular to the $z$-axis of the stand at a reasonably close distance and oriented so that horizontal rows of markers were approximately horizontal on the image. We used a translation stage to shift the calibration target along the $z$-axis and captured images with a 1 mm step. Thus, we obtained series containing 6 images for the small-sized target and 15 images for the medium-sized one in the range from 10 to 24 mm for WB light and each NB filter. The range of distances was limited by the depth of field of the optical system and the condition of the minimal chessboard square scale suitable for robust image processing.

### 4.2. 3-D Geometrical Measurements

After obtaining the WB and all NB calibration images, we processed them to calculate the image coordinates of each chessboard node. The developed image processing software allows a user to set the initial correspondence between image points and points on the calibration target using two specific chessboard nodes (see the image of the calibration target in the right part of Fig. 5) and then to refine the image coordinates using the technique described in the paper.^{23} All automatically processed images of the calibration and test sequences were checked manually afterward to delete points in low-illuminated areas, around glare, etc. Since the matching between image points and points on the calibration target was done for both image parts, the stereo correspondence between left and right half-images was obtained automatically. Using these coordinates and the calibration technique for the ray tracing camera model,^{10} we obtained a set of calibration parameters for the WB image series as well as for each of the NB series independently. The images of the test sequences were used to calculate the 3-D coordinates of each node applying the ray tracing model with different calibration sets.

To characterize and compare the measurement accuracy achieved using different camera models and calibration procedures, a few criteria may be used. The straightforward approach is to carry out multiple measurements at each point and calculate the mean value and standard deviation (STD). In this case, the mean value characterizes the systematic measurement error caused by nonideal calibration and by factors that vary from point to point but do not change when several measurements are made at the same point, for example, the errors of test-object manufacturing. The STD characterizes only the random error caused by noise in the image and varies from point to point due to illumination nonuniformity, vignetting, and aberrations. It is time-consuming and challenging to show and analyze these 3-D distributions within the whole field of view and for several spectral bands. Therefore, in the field of endoscopic measurements and other imaging applications, accuracy is normally characterized by the mean value and STD calculated over the field of view at the distance $z$ (i.e., over all points of the calibration target at each position). These values depend only on the distance $z$ to the calibration target and are easy to illustrate.

We chose the orientation of the calibration targets so that the grid lines were approximately parallel to the $x$- and $y$-axes of the stand. Thus, we used the distance between chessboard nodes as $x$- and $y$-segments. Points to measure the segment along the $z$ axis were taken either from the previous or from the next image of the series, i.e., with the test chart shifted by 1 mm. We should note that the measured segments are not exactly aligned along the $x$, $y$, and $z$ axes in any CS, but we consider these data to be sufficient to analyze the measurement uncertainty for different orientations of segments.
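The per-distance statistics described above amount to computing the mean and STD of segment-length errors over the field of view; a minimal sketch (names illustrative):

```python
import numpy as np

def segment_bias_std(points_a, points_b, ref_len):
    """Mean bias and STD of measured segment lengths vs. a reference.

    points_a, points_b: (K, 3) arrays of 3-D endpoints of K nominally
    identical segments (e.g., neighboring chessboard nodes); ref_len is
    the reference length (1 mm in the experiments described here).
    """
    lengths = np.linalg.norm(points_b - points_a, axis=1)
    err = lengths - ref_len
    return err.mean(), err.std(ddof=1)  # bias and sample STD of the error
```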

Figure 7 shows the mean value and STD of the difference between the reference and measured lengths of the 1 mm segment along the $x$, $y$, and $z$ axes. If WB calibration data are applied to NB images from the test sequence, the measurement bias increases from the middle of the working spectral range (550 nm) to its edges (460 and 704 nm) and can reach significant values: up to 0.1 mm along the $x$ axis and 0.15 mm along the $z$ axis [Fig. 7(a)], which is more than three times larger than the STD for the WB calibration applied to WB measurements (about 0.03 mm along the $x$ axis and up to 0.05 mm along the $z$ axis). Applying the WB calibration data to NB images in the green part of the spectrum, where the sensor sensitivity and LED intensity are maximal, provides almost the same mean value and STD as for WB images. As expected, the best results are achieved when the spectral ranges of the calibration data and test images coincide [Fig. 7(b)]. In this case, the mean error does not exceed 0.025 mm along the $x$ axis, 0.01 mm along the $y$ axis, and 0.05 mm along the $z$ axis, so it is about three times smaller than the corresponding STD. One more interesting example is applying all the obtained calibration sets to a particular NB image series. We chose NB filter C2 (650 nm) for illustration [see Fig. 7(c)]. One can see that the greater the spectral distance between the band used for calibration and the band used for measurement, the larger the measurement bias. At the same time, the STD values are almost independent of the choice of spectral band for calibration.

Actually, the mean values shown in Fig. 7 correspond to the systematic error of 3-D coordinates mainly caused by uniform or nonuniform scaling. The effect of scaling is clearly seen in Figs. 7(a) and 7(c) as the movement of data points along the $z$ axis of the plot. In contrast, the STD values are affected both by random errors originating from errors of image coordinates and by systematic errors caused by distortions of 3-D coordinates more complex than scaling (such as adding curvature to originally flat objects). According to Fig. 7(c), applying different calibration parameters to one NB image set does not lead to significant variation of the STD values; hence, the impact of systematic error on the STD is rather small. Slight variations of the STD values are mainly the result of the overall scaling. Hence, the STD values actually represent random errors. Their increase with distance is typical for stereoscopic measurements.^{4}

As can be seen from Fig. 7(b), the STD values for independent NB calibrations are quite different. The main reason is the Bayer color mask, which effectively provides two times higher image resolution for the C4 and C5 filters (using the G image channel) than for the C1, C2, and C3 filters (using R) and C6 (using B). Additionally, the combination of the white LED spectrum, the NB filter transmission curves, and the camera spectral sensitivity yields lower signal-to-noise ratios for C1 and C6. As a result, the STD of random errors for C4 and C5 is smaller than for the other bands.

Even performing independent calibrations for all NB channels does not eliminate the bias completely [Fig. 7(b)]. The small residual values are probably induced by a limitation of the applied camera model, which does not separately consider the pupil aberration of the main lens. This is a subject of future research.

We have applied the calibration technique for all six NBs and WB jointly, as proposed in Sec. 3.2 (denoted further as *All*). Since this approach requires too many images, it is barely applicable in practice, but it allows us to estimate the spectral dependencies of the calculated values ${f}_{\mathrm{u}}$ and $n$ and to find the optimal type of interpolation (Fig. 8). Possible candidates are linear interpolation using the values for C1 and C6 (*C1C6*) or C2 and C5 (*C2C5*) and Conrady's equation^{24} for interpolation using three values: C1, C4, and C6 (*C1C4C6*). As shown in Fig. 8, the latter provides a very close approximation.

For practical testing, we utilized two series for the C1 and C6 filters for joint calibration and calculated ${f}_{\mathrm{u}}$ and $n$ for the other filters using linear interpolation (*C1C6*). Next, we repeated this procedure for the C2 and C5 filters (*C2C5*) and for three series for the C1, C4, and C6 filters using Conrady's equation (*C1C4C6*).
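The three-band interpolation can be sketched as follows. Conrady's dispersion formula models a parameter as $v(\lambda) = a + b/\lambda + c/\lambda^{3.5}$, so measurements in three bands (e.g., C1, C4, C6) determine the coefficients via a 3×3 linear system. This is a minimal illustration, not the authors' code; the coefficient values and wavelengths in the usage example are hypothetical:

```python
import numpy as np

def conrady_fit(wavelengths, values):
    """Fit Conrady's formula v(lam) = a + b/lam + c/lam**3.5 through
    three (wavelength, value) samples, e.g. f_u or n measured in three
    narrow bands. Wavelengths in nm. Returns coefficients (a, b, c)."""
    lam = np.asarray(wavelengths, dtype=float)
    A = np.column_stack([np.ones_like(lam), 1.0 / lam, 1.0 / lam**3.5])
    return np.linalg.solve(A, np.asarray(values, dtype=float))

def conrady_eval(coeffs, lam):
    """Evaluate the fitted formula at wavelength lam (nm)."""
    a, b, c = coeffs
    return a + b / lam + c / lam**3.5
```

With the coefficients fixed by the three calibration bands, the parameter can be evaluated for any other band inside the working range, which is how a single three-band calibration serves all NB filters.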

To estimate the effectiveness of these approaches, we compared them with the calibration in WB (*White*) and with independent calibration for each spectral range (*Own*). Figure 9 illustrates the results for the NB spectral range C2 (650 nm). One can see that the measurement bias may be radically decreased by simultaneous calibration. The results for *C1C4C6* and *All* almost coincide with those of the independent calibration *Own*. Since the parameters for this spectral band (C2) were not interpolated for *C2C5*, the results for *C2C5* are also the same as for *All*. The STD values are mainly caused by random errors and are not significantly affected by the choice of calibration technique.

In order to choose the optimal calibration technique, we summed the absolute values of the mean errors across all spectral bands and considered the resulting value as an overall “score” for accuracy assessment focused on measurement bias (Fig. 10). Since the score for *C1C4C6* matches the scores for the joint calibration using all series (*All*) and the independent calibration (*Own*), we can conclude that three NBs located in the middle and at the edges of the working spectral range are enough to provide the desired measurement accuracy. The linear interpolation using two nonedge NBs (*C2C5*) is slightly worse, but it still significantly increases the accuracy compared to WB calibration.
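The score reduces to a sum of absolute per-band mean biases. As a sketch (not the authors' code; the per-band bias values below are made up purely for illustration):

```python
def bias_score(mean_bias_per_band):
    """Overall accuracy score for one calibration variant: the sum of
    absolute mean biases over all spectral bands (mm); lower is better."""
    return sum(abs(b) for b in mean_bias_per_band)

# Hypothetical per-band biases (mm) for two calibration variants:
white = [0.10, 0.06, 0.02, 0.03, 0.08, 0.15]  # WB calibration on NB images
own = [0.02, 0.01, 0.01, 0.02, 0.02, 0.03]    # independent NB calibrations
```

Comparing `bias_score(white)` with `bias_score(own)` then ranks the variants by systematic error alone, deliberately ignoring the STD, which Fig. 9 shows to be nearly technique-independent.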

## 4.3.

### Image Rectification and 3-D Shape Reconstruction

Since the proposed image rectification technique uses the vector of calibration parameters, it may also be affected by the choice of the calibration technique. In order to evaluate the rectification accuracy, we assessed the vertical mismatch $\mathrm{\Delta}y$ for each pair of corresponding points in the rectified images. We introduced the criterion $R=A/B$, where $A$ is the number of pixels with $\mathrm{\Delta}y$ less than $\mathrm{\Delta}{y}_{\mathrm{max}}$ and $B$ is the total number of pixels having corresponding points. The value of $\mathrm{\Delta}{y}_{\mathrm{max}}$, chosen proportional to the radius of the point spread function, was set to 4 pixels; based on experimental results with images of various real objects, we consider this value the maximum tolerable for point matching.
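The criterion can be computed directly from a vertical-mismatch map. The following is a minimal sketch (array names and layout are assumptions, not from the paper):

```python
import numpy as np

def rectification_criterion(dy_map, valid_mask, dy_max=4.0):
    """R = A / B: A is the number of pixels whose vertical mismatch
    |dy| is below dy_max, and B is the number of pixels that have a
    corresponding point at all (given by valid_mask)."""
    dy = np.asarray(dy_map, dtype=float)
    valid = np.asarray(valid_mask, dtype=bool)
    total = valid.sum()                        # B
    good = (np.abs(dy[valid]) < dy_max).sum()  # A
    return good / total if total else 0.0
```

A value of $R = 1$ means every matched pixel lies within the tolerance band, i.e., the rectification is good enough for row-wise stereo matching.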

To analyze the sensitivity of the rectification to deviations of ${f}_{\mathrm{u}}$ and $n$, we obtained the dependencies of $R$ on these parameters using computer simulation. For this purpose, we used the ray tracing model with the *White* set of parameters to acquire synthetic images of the point grid located at various distances ${Z}_{\mathrm{obj}}={Z}_{\mathrm{rect}}$ (10, 20, 30, and 40 mm). Next, we acquired synthetic images with deviated values of ${f}_{\mathrm{u}}$ and $n$, rectified these images using the original *White* set of parameters, and estimated $\mathrm{\Delta}y$ and $R$ (see Fig. 11). We chose the *White* set because it corresponds approximately to the center of the working spectral band. As shown in Fig. 8, ${f}_{\mathrm{u}}$ varies within $\pm 10$ pixels and $n$ varies within $\pm 0.006$ across the 450 to 750 nm wavelength band. Figure 11 demonstrates that under these variations, $R$ remains stable and equal to 1. Hence, we can predict that the usage of WB calibration should not significantly lower the image rectification accuracy.

To compare the effectiveness of WB and NB calibration for the rectification of real images, we have used the test image sequence of the calibration targets and calculated the vertical deviation ($\mathrm{\Delta}y$) maps of the chessboard nodes (Fig. 12). One can see that the difference between WB calibration *White* and independent calibrations for each NB *Own* does not exceed 1 pixel.

Finally, we calculated the 3-D structure of the metal test specimen at 460, 503, and 704 nm using WB and NB calibration to show the influence of rectification on the quality of the reconstructed 3-D surfaces (Fig. 13). We used normalized cross-correlation^{5}^{,}^{8} for stereo matching on the rectified images. The distance to the object was 12.5 mm.
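On rectified images, normalized cross-correlation matching reduces to a one-dimensional search along the image row, since rectification aligns epipolar lines with rows. The following sketch illustrates this; it is not the authors' implementation, and the window size, search range, and disparity convention are arbitrary assumptions:

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_along_row(left, right, y, x, half=3, max_disp=40):
    """For the patch centred at (y, x) in the rectified left image, find
    the disparity d (right x = x - d) maximizing ZNCC along row y."""
    tpl = left[y - half:y + half + 1, x - half:x + half + 1]
    best_d, best_s = 0, -1.0
    for d in range(0, max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # candidate window would leave the image
        cand = right[y - half:y + half + 1, xr - half:xr + half + 1]
        s = zncc(tpl, cand)
        if s > best_s:
            best_d, best_s = d, s
    return best_d, best_s
```

The disparity found for each pixel is then triangulated into a 3-D point using the calibration parameters, which is why a biased calibration distorts the reconstructed shape even when the matching itself succeeds.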

Figure 13 indicates that WB calibration introduces almost no rectification errors and, therefore, does not degrade the accuracy of point matching. However, it causes significant geometrical distortion of the reconstructed 3-D shape, which agrees well with the results of Sec. 4.2. The bottom images in Fig. 13 show that under WB calibration, 3-D models obtained at different wavelengths have different depth and scale. The results for the different variants of joint calibration (*All*, *C1C6*, *C2C5*, *C1C4C6*) confirm that the impact of the calibration technique on the quality of rectification is insignificant.

## 5.

## Conclusion

NB imaging reduces the chromatic image aberrations introduced by a prism-based optical system and, therefore, improves the accuracy of stereo matching. The experimental results have confirmed the theoretical assumption that applying WB calibration parameters is not optimal if NB images are used for the measurements. Using WB calibration for NB images leads to significant bias and has almost no effect on the STD across the field of view. For accurate quantitative characterization of the inspected objects and 3-D shape reconstruction, it is necessary to calibrate the system either in a certain NB, where contrast visualization is achieved, or in a few NBs with further interpolation of the calibration parameters for operation in any selected spectral band. We have shown that, for measurements in any NB within the whole visible range with accuracy comparable to that of independent NB calibrations, it is enough to calibrate the system in three NBs corresponding to its center (in the green part of the spectrum) and edges (in the blue and red parts of the spectrum).

Applying WB calibration to NB images does not lead to significant rectification errors and, therefore, does not lower the accuracy of point matching in comparison with the accuracy obtained using NB calibration.

The results of our experiments may be useful for the development of RVI tools for spectral detection and quantitative characterization of low-contrast objects with increased sensitivity and measurement accuracy.

## Acknowledgments

This work was financially supported by the Russian Science Foundation (Project No. 17-19-01355). The analysis of the rectification algorithms was supported by the Russian Foundation for Basic Research (Project No. 17-29-03469).

## References

1. “Videoscope trends: improvements and new developments,” https://www.qualitymag.com/articles/93455-videoscope-trends-improvements-and-new-developments

2. “Advances in three dimensional measurement in remote visual inspection,” in Proc. 18th World Conf. Nondestr. Test. (2012).

3. “Stereoscopic tip for a video endoscope: problems in design,” Proc. SPIE 10466, 104664D (2017). https://doi.org/10.1117/12.2288390

4. “Virtual stereovision system: new understanding on single-lens stereovision using a biprism,” J. Electron. Imaging 14, 043020 (2005). https://doi.org/10.1117/1.2137654

5. “Accurate geometrical optics model for single-lens stereovision system using a prism,” J. Opt. Soc. Am. A 29, 1828–1837 (2012). https://doi.org/10.1364/JOSAA.29.001828

6. “Single-lens 3D digital image correlation system based on a bilateral telecentric lens and a biprism: validation and application,” Appl. Opt. 54, 7842–7850 (2015). https://doi.org/10.1364/AO.54.007842

7. “Biprism distortion modeling and calibration for a single-lens stereovision system,” J. Opt. Soc. Am. A 33, 2213–2224 (2016). https://doi.org/10.1364/JOSAA.33.002213

8. “Optimal calibration of a prism-based videoendoscopic system for precise 3D measurements,” Comput. Opt. 41(4), 535–544 (2017). https://doi.org/10.18287/2412-6179-2017-41-4

9. “Calibration of a stereoscopic video endoscope for precise three-dimensional geometrical measurements in arbitrary spectral bands,” Proc. SPIE 10677, 106770F (2018). https://doi.org/10.1117/12.2307381

10. “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Trans. Pattern Anal. Mach. Intell. 28, 1335–1340 (2006). https://doi.org/10.1109/TPAMI.2006.153

11. “A compact algorithm for rectification of stereo pairs,” Mach. Vision Appl. 12(1), 16–22 (2000). https://doi.org/10.1007/s001380050120

12. “Projective rectification from the fundamental matrix,” Image Vision Comput. 23, 643–650 (2005). https://doi.org/10.1016/j.imavis.2005.03.002

13. “A novel camera calibration technique based on differential evolution particle swarm optimization algorithm,” Neurocomputing 174A, 456–465 (2016). https://doi.org/10.1016/j.neucom.2015.03.119

14. “Binocular self-calibration performed via adaptive genetic algorithm based on laser line imaging,” J. Mod. Opt. 63(13), 1219–1232 (2016). https://doi.org/10.1080/09500340.2015.1130271

15. “Which pattern? Biasing aspects of planar calibration patterns and detection methods,” Pattern Recognit. Lett. 28, 921–930 (2007). https://doi.org/10.1016/j.patrec.2006.12.008

## Biography

**Alexander S. Machikhin** is an assistant professor in the National Research University “Moscow Power Engineering University” and a leading researcher in the Scientific and Technological Center of Unique Instrumentation of Russian Academy of Sciences. He received his BEng and MEng degrees from Bauman Moscow State Technical University in 2005 and 2007, respectively, and his PhD degree in optics from the Scientific and Technological Center of Unique Instrumentation of Russian Academy of Sciences in 2010. His current research interests include spectral imaging, acousto-optics, holography, and optoelectronic systems. He is a member of SPIE.

**Alexey V. Gorevoy** is an engineer in the National Research University “Moscow Power Engineering University” and a researcher in the Scientific and Technological Center of Unique Instrumentation of Russian Academy of Sciences. He graduated from Bauman Moscow State Technical University in 2010. His current research interests include optical design, digital image processing, 3-D imaging, and computer graphics.

**Demid D. Khokhlov** is an engineer in the National Research University “Moscow Power Engineering University” and a junior researcher in the Scientific and Technological Center of Unique Instrumentation of Russian Academy of Sciences. He received his BEng and MEng degrees from Bauman Moscow State Technical University in 2015 and 2017, respectively.