An imaging lidar (light detection and ranging) system based on scanning a single beam needs to operate at high pulse rate to record a point cloud quickly. However, if the lidar transmits a new pulse before the last return pulse from the previously transmitted pulses has arrived, the range measurements will not be unique. The signal processing system has to make some assumption to associate a return pulse with one of the transmitted pulses and compute the correct distance. One solution, which can work well for airborne sensors, is to use a digital elevation model to obtain the approximate distance.1 Another solution is to label the transmitted pulses. This can be done by a sequence of micropulses within each macropulse,2 or, in coherent systems, by frequency stepping3 or phase modulation.4 A third solution is to transmit pulses with varying intervals between them. This principle has been employed in radar (Ref. 5, Chapter 15 and Ref. 6, Chapter 4.4) and in many photon-counting lidar systems.7–9 These lidars transmit multiple pulses for each point and apply time correlation to find the correct distance. Varying pulse intervals has also been used in classical lidars, which transmit a single, relatively high-power, pulse for each direction,10,11 but the method was restricted to a single return pulse for each transmitted pulse. That restriction may be acceptable in the context of airborne laser scanning, but in other cases, one may be interested in detecting multiple return pulses from a single transmitted pulse. Furthermore, in order to maximize the detection range, it is desirable to set the detection threshold to a low level, where there will also be some false detections. In this paper, we describe a processing method that combines range ambiguity resolution and noise reduction and also allows multiple return pulses from each transmitted pulse. 
The method is not constrained to a specific application or range of spatial resolutions, so it should be applicable for a wide range of single-beam lidars that operate with more than one pulse in the air.
The process to extract a point cloud from a detector signal consists of three main stages: filtering, pulse detection, and point detection. Filtering handles saturated pulses and reduces noise. Pulse detection finds all the pulses in the filtered signal that exceed a threshold and produces a list with time and power for each detected pulse. The filtering and/or pulse detection stages can also include measures to suppress strong backscatter from common optics or from the atmosphere close to the lidar, which can dominate the received signal immediately after each transmitted pulse. The topic of this paper is the point detection stage, which takes as input the list of detected pulses and the list of times and directions for the transmitted pulses, and produces a point cloud as output.
As in former work,7–11 our proposed point detector is based on varying the interval between transmitted pulses. For each received pulse, it constructs a set of point candidates by associating the pulse with each of the N most recently transmitted pulses, for some N determined by the maximum likely delay of a returned pulse. In practice, N may be limited by the scan speed and the receiver field of view. A point candidate is represented by its spherical coordinates (r, θ, φ), where r is the range and the angles θ (pitch) and φ (azimuth) correspond to the direction of the transmitted pulse. (The reason for using the transmitted direction is that the divergence of the transmitted beam is typically smaller than the field of view of the receiver, so the transmitted beam defines the direction more precisely.)
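The candidate construction can be made concrete with a small sketch (the function name and timing values are illustrative, not from the text): pairing a received pulse with each of the N most recent transmissions yields N candidate ranges, only one of which is correct.

```python
C = 299_792_458.0  # speed of light (m/s)

def candidate_ranges(t_rx, tx_times, n_ambig):
    """Candidate ranges for one received pulse, obtained by associating it
    with each of the n_ambig most recently transmitted pulses."""
    recent = [t for t in tx_times if t < t_rx][-n_ambig:]
    return [C * (t_rx - t_tx) / 2.0 for t_tx in recent]

# Example: pulses transmitted at 0, 1.0, and 2.1 us; a return received at
# 5.6 us yields three candidate ranges, one per possible originating pulse.
ranges = candidate_ranges(5.6e-6, [0.0, 1.0e-6, 2.1e-6], 3)
```

Each candidate inherits the direction of its assumed transmitted pulse; only the range differs between the candidates.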
Point candidates belonging to real objects tend to cluster in space, whereas false point candidates tend to be scattered because of the uneven pulse intervals. Therefore, each point candidate is assigned a figure of merit (FOM) that depends on the other point candidates in its neighborhood. In the simplest form, which is used in most of the examples, the FOM is just the number of point candidates found in a neighborhood in the (r, θ, φ)-space, chosen as described below. A more general FOM measure, which takes the power of the pulses into account, is discussed in Sec. 5.
After computing the FOM values, the algorithm selects the point candidate with the highest FOM. This is taken to represent the correct position of a return pulse, so it is added to the output point cloud. All other point candidates arising from the same pulse are removed, and so are their contributions to the FOM of neighboring points. The algorithm then selects the next point candidate with the highest FOM, and so on until no point candidate has FOM above some threshold value. This FOM threshold reflects the assumption that a point candidate with no or few neighbors is probably noise.
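A minimal sketch of this greedy selection loop (the data structures and neighbor test are simplifying assumptions, not the paper's implementation): repeatedly accept the highest-FOM candidate, then discard its sibling candidates from the same received pulse.

```python
def extract_points(candidates, neighbors, fom_threshold):
    """Greedy point selection.

    candidates: list of (pulse_id, position); candidates sharing a pulse_id
    arose from the same received pulse.
    neighbors(a, b): True if positions a and b lie in each other's neighborhood.
    """
    alive = set(range(len(candidates)))
    accepted = []

    def fom(i):
        # FOM = number of point candidates in the neighborhood, counting
        # both surviving candidates and already-accepted points.
        p = candidates[i][1]
        return (sum(1 for j in alive if neighbors(p, candidates[j][1]))
                + sum(1 for q in accepted if neighbors(p, q)))

    while alive:
        best = max(sorted(alive), key=fom)
        if fom(best) < fom_threshold:
            break
        accepted.append(candidates[best][1])
        # Remove all candidates arising from the same received pulse,
        # and with them their FOM contributions to other candidates.
        pid = candidates[best][0]
        alive = {j for j in alive if candidates[j][0] != pid}
    return accepted
```

This brute-force version recomputes FOM values in O(n²); a practical implementation would bin candidates spatially, in the spirit of the grid discussed at the end of the paper.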
Figure 1 shows an example of how the point candidates corresponding to four consecutive return pulses are distributed in the range-azimuth plane. The only place where the neighborhood boxes of multiple point candidates overlap is at the correct range of 526 m, corresponding to a return pulse delay of about 3.5 μs. These points increase the FOM of each other, so the algorithm can resolve the ambiguity. In practice, it is of course possible that some of the return pulses fail to be detected. This loss can be compensated using a greater number of pulses, but the figure shows only four for the sake of clarity.
Because the method is based on finding clusters of point candidates, an object too small to give rise to a cluster will not be detected. Furthermore, if the shape and orientation of an object is such that the lidar sees it as separate parts, then only sufficiently large parts will be detected. The number of point candidates required for detection depends on the parameter settings. If the amount of noise is small, two correct point candidates will be sufficient to determine the range correctly.
Choice of Parameters
We denote the pulse intervals T_i = T_min + n_i δ for i = 1, …, N, where the n_i are distinct non-negative integers, T_min is the shortest interval, and δ is the chosen minimum difference between pulse intervals. The sequence of pulse intervals is repeated periodically so that T_{i+N} = T_i, and we call the periodic unit of N pulses a pulse group. The pulse intervals should be chosen so false point candidates have minimal probability of forming clusters. For this purpose, it is not enough that each T_i is unique; sums of m adjacent intervals must also be unique for m = 2, …, N − 1. The sums can cross pulse groups, so the requirement can be stated as

Σ_{i=j}^{j+m−1} T_i ≠ Σ_{i=k}^{k+m−1} T_i for 1 ≤ m ≤ N − 1 and j ≢ k (mod N). (1)
For small N, it is easy to find sequences satisfying this condition by trial and error. If N is prime, it can be done more rigorously by taking n_i = c + (i mod N) for i = 1, …, N and a fixed integer c. This ensures that the sums are unique for fixed m [because the sum of m consecutive indices is congruent to mj + m(m − 1)/2 (mod N), and m is invertible modulo the prime N]. To make the sums unique for different m, the constant c can simply be taken large enough that adding one more interval always increases the sum by more than the spread of the sums for fixed m.
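Whatever construction is used, condition (1) is easy to verify mechanically. The sketch below checks that all sums of m consecutive intervals, wrapping across pulse groups, are distinct; the intervals are given in integer units (e.g., tenths of a microsecond) to avoid floating-point equality tests.

```python
def sums_unique(intervals):
    """Check that all sums of m consecutive intervals are distinct,
    for m = 1, ..., N-1, allowing sums to wrap across pulse groups."""
    n = len(intervals)
    for m in range(1, n):
        seen = set()
        for j in range(n):
            s = sum(intervals[(j + i) % n] for i in range(m))
            if s in seen:
                return False
            seen.add(s)
    return True
```

For instance, five equally spaced intervals such as 10, 11, 12, 13, 14 in these units pass the test, whereas four equally spaced intervals fail because 10 + 13 = 11 + 12.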
For the method to be able to determine range uniquely, the duration of a pulse group, Σ_{i=1}^{N} T_i, should be greater than the maximum delay of a returned pulse. It is not straightforward to define a maximum range based on the transmitter power and receiver sensitivity, because a strong reflector somewhere in the scene can return a detectable pulse from a distance much greater than the normal operating range of the lidar. However, the maximum delay is eventually limited by the time it takes to scan the receiver field of view past the direction of a returned pulse.
The choice of δ is a trade-off between the performance of the laser and that of the point detection algorithm. A pulsed laser typically has a limited range of pulse rates at which it can operate, and the pulse rate affects the energy and possibly other properties of the pulses. Therefore, the difference between the shortest and longest pulse interval cannot be too large, and this places an upper limit on δ for a given N. On the other hand, an object will give rise to both correct and incorrect point candidates, and to prevent the latter from falling close enough to reinforce each other's FOM we must have

c δ / 2 > Δr, (2)

where c is the speed of light and Δr is the neighborhood size in the range direction, as illustrated in Fig. 2.
There is no hard limit for the neighborhood size in the angle directions, but some guidelines can be found based on object size and density of noise points. If the probability of detecting a noise point (after suitable processing of the raw signal) is independent of time, the number of noise points within the neighborhood of a point candidate will be approximately Poisson-distributed with mean proportional to the neighborhood volume. Let X be a random variable representing the number of such noise points. The mean is E[X] = ρΩΔr, where Ω is the solid angle of the neighborhood, Δr is its size in the range dimension, and ρ is a constant characterizing the noise. Let Y be another random variable representing the number of detected object points, in the same solid angle Ω, from an object large enough to fill this solid angle. We do not assume a specific distribution for Y, but we assume that the mean value E[Y] is proportional to Ω. E[Y] also depends on the distance and on the reflection properties of the object. The quantity that the point detection algorithm actually works with is the total number of point candidates in the neighborhood, Z = X + Y.
For mathematical simplicity, we now take the FOM to be the number of point candidates in the neighborhood, Z. For the FOM threshold T_F to distinguish between noise points and true points, it should be set so that P(X ≥ T_F) is small and P(Z ≥ T_F) is large when a real object is present. We define the signal-to-noise ratio (SNR) of the FOM as

SNR = E[Y] / √E[X]. (3)
To set the FOM threshold automatically, we estimate the density of noise points and choose the threshold to make the probability of accepting noise points suitably small. Even if the field of view is completely filled by objects, they will give rise to return pulses from only a small fraction of the scanned volume, because the objects are localized in the range direction. Thus, if we divide the volume into small cells, we can assume that most of them will be empty or contain only noise points, i.e., Y = 0 for these cells. We assume, conservatively, that the 80% of the cells with the fewest point candidates do not contain real objects and estimate ρ by fitting a Poisson distribution to this tail of the distribution of Z. The approximation here is that a few of these cells may in fact contain real return pulses and end up in the lower tail because they happen to contain few noise pulses. If the noise level is low, it is possible that most of the point candidates arise from real objects and that the lower tail of the distribution is all zero. In this case, we cannot estimate ρ by fitting, so instead we find an upper bound for it, based on the observed probability of finding zero points in a cell. Once ρ has been estimated, the Poisson cumulative distribution function can be used to set the FOM threshold, T_F, so that the probability of accepting a noise point has a desired value P_e:

P(X ≥ T_F) ≤ P_e. (4)
In other words, P_e is the probability that a noise point has enough noise neighbors to bring its FOM above T_F. Because of the approximations in the distribution and in the estimate of ρ, this error probability cannot be taken literally, but it is a useful way to parameterize the FOM threshold. The examples in the next section all use the same fixed value of P_e. Some experiments with a more general FOM measure are described in Sec. 5.
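The threshold computation can be sketched with the standard library alone, assuming X is Poisson with mean μ = ρΩΔr (the function names are illustrative):

```python
import math

def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    return 1.0 - sum(math.exp(-mu) * mu**i / math.factorial(i)
                     for i in range(k))

def fom_threshold(mu, p_e):
    """Smallest threshold T_F such that the probability of a noise-only
    FOM reaching T_F is at most p_e."""
    t = 0
    while poisson_sf(t, mu) > p_e:
        t += 1
    return t
```

For example, with μ = 0.5 and P_e = 10^-4, the resulting threshold is 6.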
Although the threshold for pulse detection is not a parameter to the point detector, it must be taken into account because it effectively determines the number of point candidates. If the objects of interest are known to have high SNR (defined as the ratio of return pulse amplitude to RMS noise), the detection threshold can be set so high that noise points are eliminated, and the task of the point detector is reduced to solving the range ambiguity. In the more interesting case, where it is desirable to maximize the range for a given transmitted power, the detection threshold must be set to a low value, where many noise pulses will appear. A method for automatic setting should not make any assumption about the number of true return pulses in the scene. However, as explained above in the context of setting the FOM threshold, true return pulses can be expected to appear in only a small part of the scanned volume. Thus, if the detection threshold is set to a value such that most of the scanned volume is filled by a relatively uniform density of point candidates, then these can be assumed to represent noise. A large number of noise points increases the run time of the point detector, so the actual setting of the detection threshold is a trade-off between sensitivity and run time. A useful parameter is the number of detected pulses divided by the number of transmitted pulses. Various detection thresholds, giving up to five detected pulses per transmitted pulse, are tested in the examples.
For comparison, we note that lidars that transmit multiple pulses for each point7–9 often use a pseudorandom pulse sequence and correlation detection. Our reason for not adopting that approach is that our method is designed for lidars that transmit a single pulse for each point, and when the beam is scanned the range difference between adjacent pulses will often be too large for direct correlation with the return signal to work well. The pulse sequence described above is designed for minimum ambiguity in realistic scenes, whereas a random or pseudorandom sequence might be suboptimal.
Although the method has been used successfully in tests on real data, we prefer to use simulated data in the examples because they allow the results to be compared to the exactly known scene under fully controlled conditions. The simulator works in two stages. First, it propagates the beam from the lidar to the target, assumes diffuse reflection, and propagates back to the plane of the lidar receiver. This is repeated for various distances and for multiple random realizations of the diffuse reflecting surface, which give rise to different speckle. The result is a catalog of realistic return beams, represented as matrices of complex amplitudes, for various ranges. The second stage traces each transmitted laser pulse through the scene, and when a pulse hits an object, it randomly draws one of the return beam realizations for the corresponding range. The received power is calculated based on the reflectance of the target and the integral of the intensity of the selected return beam over the receiver aperture, and a pulse with this power is added to the simulated received signal. More complicated propagation effects, such as beam distortion by atmospheric turbulence and pulse distortion by reflection from surfaces not perpendicular to the beam, are omitted because they are not important for the short ranges considered in our examples. Thus, the shape of the received pulse is identical to the shape of the transmitted pulse.
In a real lidar, there are multiple noise sources such as backscatter from the optics and the atmosphere, background light, detector noise, and electronic noise. For the purpose of this paper, it is not important to model these in detail, so the simulator simply adds generic Gaussian noise to the electronic signal. As mentioned in Sec. 2, scattering from common optics (if the transmitter and receiver share the same aperture), and from the close part of the atmosphere, can dominate immediately after each transmitted pulse. Instead of modeling this scattering, we make the worst-case assumption that the lidar has to discard the signal in an interval after each transmitted pulse. In the examples, we set this interval to 50 ns, which corresponds to 4.2% of the mean interval between transmitted pulses.
In order to facilitate the analysis, the first scene simply consists of nonoverlapping, rectangular planes. A more complex scene, approximating two buildings partially obscured by forest, is treated in Sec. 4.2.
This scene contains four rectangular planes: one at 200-m distance, one at 380 m, and two at 650 m. The three large objects have a reflectance of 0.1 and the small one 0.8. The laser pulse intervals are 1.0, 1.1, 1.2, 1.3, and 1.4 μs. The field of view, which is 250 mrad in azimuth by 150 mrad in pitch, is raster scanned with a pitch interval between adjacent scan lines of 0.5 mrad. Scanning the whole field of view takes 253 ms, so the number of transmitted pulses is about 2 × 10^5. The duration of the transmitted pulses is 4 ns (FWHM), and the receiver sample rate is 1 GHz. The signal from the detector is processed by a matched filter for the laser pulse shape. The actual value of the simulated transmitted power is not meaningful in itself, so to compare examples with different power we simply define the highest power to be 0 dB and give the power in the other examples as relative values. Similarly, we give dimensionless, normalized values for the detector signal and detection threshold.
The point clouds obtained under different conditions are compared to the known scene to count the points on each object, noise points close to objects, and noise points unrelated to objects. The point counts are compared to the corresponding numbers for a reference point cloud from a simulation without noise.
Simulation without noise
The pulse detection threshold in this case is set sufficiently low to detect all the return pulses. Because the objects in the scene do not overlap, it is convenient to display the point cloud as a color-coded range image, as shown in Fig. 3. With the chosen pulse intervals and scene, some of the return pulses from object 1 happen to fall in the masked zones after some of the output pulses, and this explains why object 1 does not look dense. The neighborhood size (in either direction from a point candidate) is set to 1.5 mrad in each of the angles and 5 m in range, and the FOM threshold automatically determined from Eq. (4) is 4. FOM thresholds in the range from 2 to 30 work well, although the small object is lost when the FOM threshold exceeds 5. Table 1 summarizes the number of points on each object, as well as the number of noise points, for this and the following examples. We have verified that the point cloud is not very sensitive to neighborhood size, provided that the FOM threshold is adjusted correspondingly.
Summary of the number of correctly identified object points and the number of noise points in the point clouds. P is the transmitted peak power in dB, TD is the detection threshold (arbitrary units), Bin is the angular bin size in mrad, and TF is the FOM threshold. Values calculated from Eq. (4) are marked by *. Total is the number of detected pulses. Ni is 100 times the number of correct points on object i, divided by the corresponding number in the reference cloud. The objects are numbered as in Fig. 3. The tolerance for a point to be accepted as correct is 0.4 m. Ne,i is 100 times the number of noise points between 0.4 and 8 m from object i, divided by the number of points on the corresponding object in the reference cloud. The number of such noise points can be relatively high because their FOM is increased by the real points on the nearby object. Ne is the number of other noise points, that is, noise points not associated with one of the true objects. The first row of numbers corresponds to the noiseless reference point cloud, and the actual point counts for the reference cloud are given in the bottom row.
[Table 1 is not reproduced in full here. Its column groups are the parameters (P, TD, Bin, TF), the total number of detected pulses, the percentages of correct points N1–N4, and the noise-point measures Ne,1–Ne,4 and Ne. The bottom row gives the actual point counts in the reference cloud: 5383, 6597, 5958, and 6 points on objects 1–4, and no noise points.]
Transmitted power 0 dB
In this and the following simulations, noise is included. We define the SNR for an object as the mean return pulse amplitude in the absence of noise divided by the RMS noise after the matched filter. For objects 1 to 4, as defined in Fig. 3, the SNR is 37, 11, 3.5, and 28, respectively.
The detection threshold is first set to 1, which equals the mean return pulse amplitude from the far object. This yields about 75,000 detected pulses, or about 0.36 per transmitted pulse. Figure 4(a) shows the range image for FOM threshold 8, which is the value from Eq. (4). The small object is not visible in this case. Its angular extent is only 1.2 mrad, much smaller than the total neighborhood size of 3 mrad, so it does not give rise to enough point candidates to overcome the FOM threshold. It can be recovered by reducing the FOM threshold to 6, as shown in Fig. 4(b), but this also introduces more noise points, as seen in Table 1. The detected parts of the far object depend on the speckle patterns and noise, so the details will vary with different realizations of the simulated signal.
To find more points on the far object, the detection threshold is reduced to 0.8. The number of detected pulses rises to about 4.6 × 10^5, or 2.2 per transmitted pulse, and Fig. 4(c) shows the corresponding range image. As expected when the number of point candidates increases and the parameter for the error probability is fixed, the number of noise points increases, as shown in Table 1.
For illustration of the noise suppression, Fig. 5(a) shows the point cloud of a volume around the far object with the same parameters as in Fig. 4(c). This can be compared to Fig. 5(b), where the FOM threshold is 1 so all detected pulses are identified as points. It is clear that the algorithm removes most of the noise points. However, noise points close to the object remain because they acquire a high FOM from the neighboring, correctly detected, points on the object. The number of such noise points can be reduced using a smaller neighborhood size in the range direction, but as explained in Sec. 3, this would also reduce the detection performance for objects with a large variation in range.
Transmitted power −3 dB
At this power level, the SNR for the return pulses from object 3 is 1.75, and they are almost indistinguishable from the noise spikes, as shown in Fig. 6. Nevertheless, the proposed method yields a point cloud in which parts of the far object can be seen, as shown in Fig. 7. The pulse detection threshold is 0.8 or 0.7, the latter giving a mean of about five detected pulses per transmitted pulse. In Fig. 7(d), the angular neighborhood size is increased to 3 mrad in azimuth and pitch. As expected from Eq. (3), using a larger neighborhood improves the SNR, but compared to Fig. 7(a), some more noise points appear along the edges of objects. Furthermore, the small object is difficult to recover in this case because it is much smaller than the neighborhood size.
Transmitted power −4.8 dB
This power corresponds to an SNR of 1.2 for object 3. Figure 8 shows corresponding range images with different FOM thresholds. In Fig. 8(a), with detection threshold 0.8, only small patches of object 3 are detected. Reducing the FOM threshold below the automatic value, in Fig. 8(b), introduces a lot of noise while detection of object 3 is still poor. In Fig. 8(c), the detection threshold is reduced to 0.7. The result is not very much better than in Fig. 8(a), but nevertheless, the fact that a significant part of the far object is correctly detected even under these conditions illustrates the power of the algorithm.
In this section, we show a somewhat more realistic scene consisting of two identical box-like “buildings,” in some cases partially obscured by “forest.” Figure 9(a) shows the point clouds of the buildings when the forest is absent. The two long sides of the buildings are 20 m, and they are 7 m high. The left and right buildings lie in the range intervals 680 to 700 m and 650 to 670 m, respectively, and the lidar looks down from an altitude of 100 m. The reflectance of the buildings and the ground is 0.1. The lidar field of view is 250 mrad in azimuth by 30 mrad in pitch, and other parameters are the same as for scene 1. The transmitted power is 0 dB. The bin size is 1.5 mrad in pitch and azimuth and 5 m in range.
The left building is deliberately placed at a distance where some of the return pulses fall in the discarded intervals after the transmitted pulses. Specifically, the range 690 to 697.5 m corresponds to delays of 4.6 to 4.65 μs. Return pulses originating from the first pulse in a pulse group (the pulse followed by the 1.0-μs interval) will fall in the interval discarded after the fourth subsequent pulse (the pulse before the 1.4-μs interval), because 1.0 + 1.1 + 1.2 + 1.3 μs = 4.6 μs. Thus, one in five return pulses from this range interval will be discarded. Comparison of the two buildings in Fig. 9(a) does indeed show that there are fewer points on the left part of the left building. Because of the large angle of incidence to the roof, this part of the object has relatively few points even under the best conditions, so it is sensitive to a few missing return pulses.
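The discard arithmetic can be checked with a short sketch. It assumes the five intervals occur in ascending order within the group (an illustrative assumption) and tests whether a return with a given delay lands in the window discarded after a later transmitted pulse:

```python
def discarded(delay, intervals, window):
    """True if a return from the pulse at the start of `intervals`
    arrives within `window` seconds after a later transmitted pulse."""
    t = 0.0
    for dt in intervals:
        t += dt  # transmission time of the next pulse
        if t <= delay < t + window:
            return True
    return False

# Delays of 4.6 to 4.65 us fall just after the pulse transmitted at
# 1.0 + 1.1 + 1.2 + 1.3 = 4.6 us, inside a 50-ns discard window.
```

Because the interval sums differ between starting pulses, returns from the other pulses of the group at the same range are not discarded, which is why only one in five is lost.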
In Fig. 9(b), a partially transmitting “forest” is placed between the lidar and the buildings. The forest is modeled as a screen with holes such that the incident laser beam has equal probability of being transmitted or reflected (with reflectance 0.1). Furthermore, the reflected pulses are given a random delay corresponding to a range variation uniformly distributed in a 10-m interval. With only half of the pulses reaching the buildings, the two walls facing the lidar are still detected clearly, but the point clouds are sparse on the other parts of the buildings. When the FOM threshold is reduced below the automatic setting, as in Fig. 9(c), denser point clouds are obtained.
Generalized Figure of Merit
The FOM measure described in Sec. 2 and used in Sec. 4, which is simply equal to the number of point candidates in the neighborhood, has the advantage of being mathematically simple, so a threshold value can be calculated from the statistics, Eq. (4). However, from a practical point of view, it would make sense to give more weight to pulses well above the detection threshold than to pulses just above it, which are more likely to be noise. To implement this idea, each point candidate is assigned a quality value q = V/V_T, where V is the peak voltage of the pulse and V_T is the detection threshold. To prevent a very strong pulse from overwhelming many weaker pulses, q is clipped at an upper limit q_max. The new FOM is the sum of the q-values from the point candidate itself and its neighbors. It is much more complicated to derive a distribution for the FOM values in this case, so as a practical solution we calculate the threshold as before and multiply it by a correction factor, Eq. (5).
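A minimal sketch of this weighting, assuming the quality value takes the form of the peak voltage divided by the detection threshold (the ratio form and the clipping limit used here are illustrative assumptions):

```python
def quality(v_peak, v_threshold, q_max=3.0):
    """Quality value of a detected pulse: peak voltage relative to the
    detection threshold, clipped so that one very strong pulse cannot
    overwhelm many weaker ones. q_max = 3.0 is an illustrative default."""
    return min(v_peak / v_threshold, q_max)

def weighted_fom(q_self, neighbor_qs):
    """Generalized FOM: sum of the quality values of the point candidate
    itself and of its neighbors."""
    return q_self + sum(neighbor_qs)
```

With q_max = 1 this reduces to the simple candidate count, so the generalized FOM contains the original measure as a special case.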
Consider again Fig. 4(b). The small object is visible, but it is difficult to distinguish from the noise that also appeared when the FOM threshold was reduced. However, the small object has a higher reflectance than the other objects, so when the pulse power is included in the FOM, it can be recovered, as shown in Fig. 10(a), where the FOM threshold from Eq. (5) is 9.
Similarly, Fig. 10(b) can be compared to Fig. 7(b). The small object is not recovered, but for the other objects the numbers of correct points and close noise points increase while the number of other noise points decreases. To suppress noise, the FOM threshold in Fig. 10(b) was manually tuned to 42 instead of 39 from Eq. (5). Therefore, this example illustrates the limit of the method rather than practical performance.
The angular neighborhood size is a trade-off between sensitivity for large objects and risk of suppressing smaller objects. The improved FOM measure can help detecting small objects, but only if they have relatively high reflectance and not if they are too small compared to the neighborhood size. Detection of both small and large objects could be improved by running the algorithm multiple times with different parameters. Some of the processing could be common, so the run time would not scale linearly with the number of passes. It must be noted that the size of the neighborhood boxes does not limit the resolution or accuracy of the point cloud. The boxes are used to solve ambiguity, whereas the actual point positions are entirely determined by the input data.
As shown in Sec. 4.2, the automatic setting of the FOM threshold can be too high in scenes where objects are partially obscured. The setting in such cases should include application-specific information such as whether objects of interest are likely to be obscured. It would also be possible to detect obscuring objects such as trees and adjust the FOM setting in their shadow adaptively, but this would be a topic for future investigation.
The observed number of noise points outside the neighborhood of any object, in Table 1, can be compared to the expected number P_e n_d N, where P_e is the error probability in Eq. (4), n_d is the number of detected pulses, and N is the number of different pulse intervals, i.e., the number of point candidates created for each detected pulse. For detection threshold 0.7 or 0.8 and FOM thresholds calculated with Eq. (4) (marked by * in Table 1), the observed values are indeed consistent with this estimate. For higher detection thresholds, the number of noise points is too small for an accurate comparison.
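The comparison amounts to one line of arithmetic; the numbers below are illustrative placeholders, not the values from Table 1:

```python
def expected_noise_points(p_e, n_detected, n_intervals):
    """Expected number of accepted noise points: each of the n_detected
    pulses spawns n_intervals point candidates, and each candidate is
    accepted as a noise point with probability p_e."""
    return p_e * n_detected * n_intervals

# Hypothetical example: P_e = 1e-4, 4.6e5 detected pulses, five intervals.
```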
Operation with a very low SNR, as in some of our examples, may seem contrived. If the pulse energy from the laser could be adjusted freely, only constrained by a maximum average power, it would be better to transmit stronger pulses at a lower rate. However, real lasers have a maximum pulse energy and a limited range of pulse rates, so operation close to the noise limit may be of interest in practical lidar applications.
The method can of course also be used for noise suppression in a system with fixed pulse intervals. If the pulse interval is long enough compared to the maximum delay of a return pulse, there will be only one point candidate for each pulse and no risk of placing an object at the wrong distance.
It should be mentioned that an additional reason to use varying pulse intervals is to ameliorate the problem with scattering by common optics or by the atmosphere close to the lidar. Such scattering can dwarf weak return pulses arriving immediately after an output pulse. If the pulse interval is fixed, all the lost return pulses will come from distances corresponding to multiples of the pulse interval, so there will be more or less blind zones around these distances. On the other hand, with varying pulse intervals, the lost return pulses will be spread over different distances and not give rise to blind zones.
The current implementation runs on a PC and can take tens of seconds to process images with millions of initial point candidates, such as Fig. 8, but this program has not been optimized. In addition to improving the code, there is potential to save time by simplification and parallelization. One possible simplification is to group point candidates into cells in a 3-D grid and work with the grid cells instead of individual points. A disadvantage of this method is that the ability to detect an object will depend on how it is placed and aligned with respect to the grid. For parallelization, the 3-D grid could be partitioned into blocks, which could be processed almost independently.
We thank our colleague Gunnar Rustad and the reviewers for valuable suggestions to improve the paper.
Gunnar Arisholm received his BSc degree in computer science from the University of Strathclyde in 1987 and joined FFI the same year. At FFI he changed subjects to lasers and nonlinear optics, and in 2000 he received his PhD in physics from the University of Oslo. His current interests include optical frequency conversion, lidar, and atmospheric beam propagation.