This paper proposes an algorithm that is based on the application of Algebraic Integer (AI) representation of numbers on
the AAN fast Inverse Discrete Cosine Transform (IDCT) algorithm. AI representation allows for maintaining an error-free
representation of IDCT until the last step of each 1-D stage of the algorithm, where a reconstruction step from the AI
domain to the fixed precision binary domain is required. This delay in introducing the rounding error prevents the
accumulation of error throughout the calculations, which leads to the reported high-accuracy results. The proposed
algorithm is simple and well suited for hardware implementation due to the absence of computationally intensive
multiplications. The obtained results confirm the high accuracy of the proposed algorithm compared to other fixed-point
implementations of IDCT.
A fast model-order reduction algorithm is proposed for microelectromechanical devices. By breaking the system
matrices obtained from the finite element method (FEM) down into smaller ones, the proposed algorithm reduces the
computational cost and memory required by the inversion of the system matrix. As an example,
experimental studies are presented for a linear-drive multiple-mode resonator, demonstrating that the predicted
results are in very good agreement with results from previous publications.
The Continuous Valued Number System (CVNS) is a novel analog digit number system which employs bit level analog residue arithmetic. The information redundancy among the digits makes it easy to perform the required binary operations in higher radices, and reduces the implementation area and the number of required interconnections. CVNS theory can open up a new approach for performing digital arithmetic with simple and elementary analog elements, such as current comparators and current mirrors, and with arbitrary precision. In this paper we discuss the design of a 16-bit radix-4 CVNS adder with controlled precision, and a two-operand binary adder, designed in TSMC 0.18 μm CMOS technology, is used to illustrate the techniques.
The multidimensional logarithmic number system (MDLNS) is a recently developed number representation that is very efficient for implementing the Inner Product Step Processor (IPSP). The MDLNS provides more degrees of freedom than the classical LNS by virtue of its orthogonal bases and the ability to reduce hardware complexity through the use of multiple digits. This paper presents an analysis of the errors introduced in mapping data from real numbers to the 2-dimensional LNS (2-DLNS). Because the error distribution is non-uniform, the mapping space is divided into pre-assigned segments within which the error performance can be uniquely characterized. Mapping errors are collected piecewise over all of the segments. In 1-digit 2-DLNS, error collection can be simplified by using a pattern-matching scheme. Expressions for the error variance are derived. It is shown that the use of a 2-DLNS representation results in significantly lower error variance compared to floating-point number systems. The hardware complexity required for error performance comparable to the classic LNS can be significantly reduced because the ROMs are smaller than those of the LNS. The results of the error analysis have been verified by numerical simulations.
Quantum-dot Cellular Automata (QCA) is a nanotechnology which has
potential applications in future computers. In this paper, a
method for reducing the number of majority gates (a QCA logic
primitive) is developed to facilitate the conversion of SOP
expressions of three-variable Boolean functions into QCA majority
logic. Thirteen standard functions are proposed to represent all
three-variable Boolean functions and the simplified majority
expressions corresponding to these standard functions are
presented. By applying this method, a one-bit QCA adder, with only
three majority gates and two inverters, is constructed. We will
show that the proposed method is very efficient and fast in
deriving the simplified majority expressions in QCA design.
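As an illustration of the kind of result described above, the following minimal sketch checks one well-known one-bit full adder built from three majority gates and two inverters. The abstract does not give the expressions, so this construction is an assumption and may differ from the one derived in the paper; the names `maj`, `inv`, and `full_adder` are hypothetical.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority gate, the QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """Inverter on a single bit."""
    return a ^ 1


def full_adder(a, b, cin):
    """One-bit adder using three majority gates and two inverters.

    Assumed construction (not taken from the paper):
      cout = M(a, b, cin)
      sum  = M(NOT cout, M(a, b, NOT cin), cin)
    """
    cout = maj(a, b, cin)            # majority gate 1
    inner = maj(a, b, inv(cin))      # majority gate 2, inverter 1
    s = maj(inv(cout), inner, cin)   # majority gate 3, inverter 2
    return s, cout


# Exhaustive check against ordinary binary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The exhaustive loop covers all eight input combinations, which is enough to confirm that this particular gate arrangement implements addition.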
Time delay and integration (TDI) is a technology used in line-scan cameras to improve moving image quality. As an image sweeps over the sensor array, the pixels collect charge; at certain intervals the charge in the wells in each of the rows is moved to the adjacent rows, in the same direction and at the same velocity as the moving image. TDI sensors help provide high-quality, high-contrast images even under low illumination, provided the image speed is the same as the speed of the charge movement. In this paper we model the TDI process by treating it as a discrete time sampler and, using this model, we develop several simple algorithms that are able to self-synchronize the TDI row charge movement based solely on the output of the TDI sensor itself rather than on an external encoder. The algorithms are simple enough to be implemented on a small FPGA.
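The synchronization condition described above can be sketched with a toy one-column, discrete-time model. This is only an illustrative assumption, not the model or the self-synchronizing algorithms from the paper; the function `tdi_scan` and its parameters `stages` and `image_step` are hypothetical names.

```python
def tdi_scan(scene, stages, image_step=1):
    """Toy one-column TDI model.

    scene      : brightness samples along the scan direction
    stages     : number of TDI rows
    image_step : rows the image advances per charge-transfer clock
                 (1 means image motion and charge motion are matched)
    """
    def sample(i):
        # Brightness seen outside the scene is taken as zero.
        return scene[i] if 0 <= i < len(scene) else 0.0

    wells = [0.0] * stages
    output = []
    for t in range(len(scene) + stages):
        # Integrate: row r currently sees scene index image_step * t - r.
        for r in range(stages):
            wells[r] += sample(image_step * t - r)
        # Read out the last row, then shift every packet down by one row.
        output.append(wells[-1])
        wells = [0.0] + wells[:-1]
    return output


scene = [0.0, 0.0, 5.0, 0.0, 0.0]
matched = tdi_scan(scene, stages=4, image_step=1)
mismatched = tdi_scan(scene, stages=4, image_step=2)
print(max(matched), max(mismatched))
```

When the image advances one row per clock, each charge packet integrates the same scene point in every stage, so the peak is multiplied by the number of stages. When the speeds differ, the packet integrates different scene points and the peak is smeared.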
The conventional trend in algorithm implementation has been the reliance on advancements in process technology in order to satisfy the ever-increasing demand for high-speed processors and computational systems. As current device technology approaches sub-100nm minimum device size, not only does the device geometry decrease, but switching times and operating voltages also scale down. These gains come at the expense of increased layout complexity and a greater susceptibility to parasitic effects in the interconnections. In this paper we will briefly overview the challenges that digital designers will have to face in the imminent future, and will provide suggestions on algorithmic measures which may be taken in order to overcome some of these obstacles. To illustrate our point, we will present an analysis of a digital multiplication algorithm which is predicted to outperform current schemes for future technologies.
The 2-Dimensional Wavelet Transform has been proven to be a highly effective tool for image analysis and is used in the JPEG2000 standard. There are many publications which demonstrate that using the wavelet transform in time and space, combined with a multiresolution approach, leads to an efficient and effective method of compression. In particular, the four- and six-coefficient Daubechies filters have excellent spatial and spectral locality, properties which make them useful in image compression. In this paper, we propose a multiplication-free and parallel VLSI architecture for Daubechies wavelets where the computations are free from round-off errors until the final reconstruction step. In our algorithm, error-free calculations are achieved by the use of Algebraic Integer encoding of the wavelet coefficients. Compared to other DWT algorithms, such as embedded zero-tree, recursive or semi-recursive, and conventional fixed-point binary architectures, our technique has lower hardware cost, lower computational power and optimized data-bus utilization.
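To make the idea of Algebraic Integer encoding concrete, the sketch below applies it to the four-coefficient Daubechies low-pass filter, whose taps are (1+√3, 3+√3, 3−√3, 1−√3)/(4√2). Each tap is stored as an integer pair (a, b) meaning a + b√3, so the filtering needs only integer additions and shifts, and rounding happens once in the final reconstruction. The encoding choice, the handling of the common scale factor, and the function names are assumptions made for illustration; the architecture in the paper may differ.

```python
import math

# D4 low-pass taps without the common factor 1/(4*sqrt(2)),
# stored as (a, b) meaning a + b*sqrt(3).
D4_TAPS = [(1, 1), (3, 1), (3, -1), (1, -1)]


def times3(x):
    """Multiply by 3 with one shift and one add (no multiplier)."""
    return (x << 1) + x


def d4_lowpass_ai(window):
    """Exact low-pass output for four integer samples, as the pair (a, b)."""
    a = b = 0
    for x, (ta, tb) in zip(window, D4_TAPS):
        a += times3(x) if ta == 3 else x   # integer part
        b += x if tb == 1 else -x          # coefficient of sqrt(3)
    return a, b


def reconstruct(a, b):
    """Final reconstruction step: the only place rounding error enters."""
    return (a + b * math.sqrt(3)) / (4 * math.sqrt(2))


a, b = d4_lowpass_ai([4, -2, 7, 1])
print((a, b), reconstruct(a, b))
```

Because the pair (a, b) is exact, several such stages could in principle be chained in the integer domain before a single conversion back to fixed precision.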
Proc. SPIE. 4791, Advanced Signal Processing Algorithms, Architectures, and Implementations XII
KEYWORDS: Optical filters, Digital signal processing, Signal attenuation, Digital filtering, Computing systems, Finite impulse response filters, Quantization, Associative arrays, Computer architecture, Binary data
We introduce the use of the multidimensional logarithmic number system (MDLNS) as a generalization of the classical 1-D logarithmic number system (LNS) and analyze its use in DSP applications. The major drawback of the LNS is the requirement to use very large ROM arrays in implementing addition and subtraction, which limits its use to low-precision applications. MDLNS allows exponential reduction of the size of the ROMs used without affecting the speed of the computational process; moreover, the calculations over different bases and digits are completely independent, which makes this particular representation perfectly suitable for massively parallel DSP architectures. The use of more than one base has at least two extra advantages. Firstly, the proposed architecture allows us to obtain the final result straightforwardly in binary form; thus, there is no need for the exponential amplifier used in the known LNS architectures. Secondly, the second base can be optimized in accordance with the specific digital filter characteristics. This leads to a dramatic reduction of the exponents used and, consequently, to large area savings. We offer many examples showing the computational advantages of the proposed approach.
A new and efficient number-theoretic algorithm for evaluating signs of determinants is proposed. The algorithm uses computations over small finite rings. It is aimed at a variety of computational geometry problems, where the need to evaluate the signs of determinants of small matrices often arises.
Proc. SPIE. 4116, Advanced Signal Processing Algorithms, Architectures, and Implementations X
KEYWORDS: Signal to noise ratio, Digital signal processing, Signal attenuation, Interference (communication), Chromium, Calculus, Signal processing, Nonlinear optics, Very large scale integration, Binary data
This paper discusses the use of a recently introduced index calculus Double-Base Number System (IDBNS) for representing and processing numbers for non-linear digital signal processing; the target application is a digital hearing aid processor. The IDBNS representation uses 2 orthogonal bases (2 and 3) to represent real numbers with arbitrary precision. By restricting the number of digits to one or two, it is possible to efficiently represent the real number using the indices of the bases rather than the distribution of the digits. In this paper we discuss the use of the two-digit form of this representation (2-IDBNS) to efficiently perform arithmetic associated with the non-linear processing required to correct the usual forms of hearing loss in a digital hearing aid. The non-linear processing takes the form of dynamic range compression as a function of frequency band. Currently developed digital hearing instrument processors require large dynamic range representations (20-24 bits) in order to accurately generate the dynamic range compression associated with typical hearing loss. We show that the natural non-linear representation afforded by the IDBNS provides both a more efficient signal representation and a more efficient technique for processing the dynamic range compression. We pay particular attention to a novel technique of converting from a linear binary input directly to the 2-IDBNS representation using an observation of partial cyclic repetition in the indices along with near unity approximants.
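The index form mentioned above can be illustrated with a minimal sketch of the simpler one-digit case, in which a nonzero real x is approximated by s · 2^a · 3^b and stored as the triple (s, a, b). The brute-force search, the bound `b_range` on the ternary exponent, and the function names are assumptions for illustration; this is not the conversion technique from the paper, which targets the two-digit form.

```python
import math

LOG2_3 = math.log2(3)


def to_index_form(x, b_range=8):
    """Approximate nonzero x by s * 2**a * 3**b with |b| <= b_range.

    Brute-force search over b; for each b the best binary exponent a
    is found by rounding in the log2 domain.
    """
    if x == 0:
        raise ValueError("zero has no logarithmic representation")
    sign = -1 if x < 0 else 1
    target = math.log2(abs(x))
    best = None
    for b in range(-b_range, b_range + 1):
        a = round(target - b * LOG2_3)
        err = abs(target - a - b * LOG2_3)
        if best is None or err < best[0]:
            best = (err, a, b)
    _, a, b = best
    return sign, a, b


def from_index_form(sign, a, b):
    """Convert the index triple back to a real number."""
    return sign * 2.0 ** a * 3.0 ** b


def multiply(p, q):
    """Multiplication reduces to adding the two pairs of indices."""
    return p[0] * q[0], p[1] + q[1], p[2] + q[2]


x = to_index_form(10.0)
print(x, from_index_form(*x))
```

Allowing a second base gives the search many more candidate values near x than powers of two alone, which is why small exponents can already give a close approximation.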
The design of two microelectromechanical (MEMS) devices that form part of a micro acousto-magnetic transducer for use with a hearing-aid instrument is described in this paper. The transducer will convert acoustical energy into an electrical signal using a MEMS realization of a capacitive microphone. The output signal from the microphone undergoes signal conditioning and processing in order to drive a MEMS electromagnetic actuator. The resultant magnetic field is used to exert a force on a high coercivity permanent micro magnet that has been implanted on the round window of the cochlea. The motion of the implanted magnet will develop traveling waves on the basilar membrane inside the cochlea to provide a hearing capability. A high-sensitivity MEMS-based capacitor microphone is designed using a polysilicon germanium diaphragm. The microphone is constructed using a combination of surface and bulk micromachining techniques, in a single wafer process. The microphone diaphragm has a proposed thickness of 0.7 micrometers, an area of 2.6 mm², an air gap of 3.0 micrometers, and a 1 micrometer thick silicon nitride backplate with acoustical ports. An output voltage signal is obtained from the capacitor microphone using a capacitive voltage divider network and amplified by a simple source follower circuit.
This paper presents novel methods of designing analog Cellular Nonlinear (Neural) Networks (CNNs) to implement very low-noise binary addition. In these techniques the continuous characteristic of the current that charges (discharges) the load capacitor leads to a virtually switching-free addition process that significantly reduces the switching noise. This switching mechanism also leads to higher slew of output voltage during the transitions, which in turn reduces the cross talk. Simulation results demonstrate a reduction of three orders of magnitude in the noise generated by this structure compared to that generated by a digital adder running at the same speed. The very good noise performance of these new adder structures makes them suitable choices for low to moderate speed, high precision mixed-signal applications.
In this paper, we propose a training algorithm for VLSI neural networks with digital weights and analog neurons using an in-the-loop training strategy. The use of digital weights in a neural network implementation raises new issues that are not present in simulation environments. One of the problems is that a neural network implementation will not work properly when using the digitized version of the continuous weight solution. This phenomenon is especially evident when the digital weight resolution is very low due to fabrication constraints. In this paper the training strategies for dealing with digital weights are investigated. The proposed training algorithm measures the sensitivity of each weight to its error function and then perturbs the weights with higher sensitivity values to perform the retraining process. Our experimental results indicate that the algorithm is feasible and particularly suitable for digital weights with a low number of bits.
In this paper a new approach to image recognition using feature extraction based on a revised nearest neighbor clustering method is described. A set of candidate feature vectors is formed by using the Gabor transform of the sample image to compute a number of Gabor kernels with different frequency and orientation parameters. Each of the candidate feature vectors is then sequentially input to a self-organizing neural network architecture that is used in conjunction with a revised nearest-neighbor algorithm. The revised nearest-neighbor method assigns an input vector to the nearest prototype (code book vector) when the distance between them is found to be within a preset threshold, and creates a new prototype when the distance is larger than the preset threshold value. The distance computation is conducted by measuring the saliency among the vectors of interest, which differs from traditional norms (e.g. Euclidean norm). Simulation results show that the proposed method is efficient in extracting feature vectors from images. These feature vectors are representative of the image and can be applied to image identification. The novelty associated with this work lies in the use of the saliency of feature vectors as the distance norm and a growing cell self-organizing structure to capture the feature vectors.
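The assign-or-create rule of the revised nearest-neighbor method can be sketched as follows. The paper measures distance through the saliency of the vectors, which the abstract does not define, so this sketch substitutes the ordinary Euclidean distance purely as a placeholder. The function name `threshold_cluster` is hypothetical, and prototypes are left fixed once created, which is a further simplifying assumption.

```python
import math


def threshold_cluster(vectors, threshold):
    """Sequential nearest-prototype clustering with a growing codebook.

    Each input is assigned to its nearest prototype if that prototype is
    within `threshold`; otherwise the input becomes a new prototype.
    Euclidean distance stands in for the saliency measure of the paper.
    """
    prototypes = []
    labels = []
    for v in vectors:
        if prototypes:
            dists = [math.dist(v, p) for p in prototypes]
            nearest = min(range(len(dists)), key=dists.__getitem__)
            if dists[nearest] <= threshold:
                labels.append(nearest)
                continue
        # No prototype is close enough: grow the codebook.
        prototypes.append(tuple(v))
        labels.append(len(prototypes) - 1)
    return prototypes, labels


samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(threshold_cluster(samples, threshold=1.0))
```

The threshold controls how many prototypes are created: a small value yields many fine-grained prototypes, while a large value merges most inputs into a few.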
Proc. SPIE. 3205, Machine Vision Applications, Architectures, and Systems Integration VI
KEYWORDS: Digital signal processing, Defect detection, Inspection, Computing systems, Field programmable gate arrays, Control systems, Process control, Machine vision, Human vision and color perception, Environmental sensing
One of the aims of industrial machine vision is to develop computer and electronic systems destined to replace human vision in the process of quality control of industrial production. In this paper we discuss a new design environment developed for real-time defect detection using a reconfigurable FPGA and a DSP processor mounted inside a DALSA programmable CCD camera. The FPGA is directly connected to the video data-stream and outputs data to a low bandwidth output bus. The system is targeted for web inspection but has the potential for broader application areas. We describe and show test results of the prototype system board, mounted inside a DALSA camera, and discuss some of the algorithms currently simulated and implemented for web inspection applications.
In this review paper we discuss selected issues associated with the implementation of arithmetic for VLSI Digital Signal Processors. We start with a Silicon Technology Roadmap view of the next decade, in order to grasp some of the issues facing the next generation of VLSI designers, particularly associated with high performance DSP systems. We use this roadmap to open the discussion on the role basic arithmetic operations play in the construction of DSP systems; in particular we look at the interplay between algorithms, architecture, arithmetic representation and circuit implementation. Many of the illustrative examples are taken from work conducted in the VLSI Research Group, University of Windsor, over the past few years, including ongoing work.
In this paper we explore a new number system which uses a double base. The representation of the numbers has a very simple geometric interpretation, allowing potentially fast implementation of the basic arithmetic operations. The transformation of the integers into minimal form, however, leads to some problems associated with transcendental number theory, and we identify and open the discussion on these problems. An intriguing implementation vehicle, which has some of the properties associated with symbolic substitution in optical computing, is the use of Cellular Neural Networks (CNNs) to perform digital reduction. Brief details are presented on CNN implementation, and a system-level example is shown in order to justify the applicability of the proposed theory in digital signal processing.
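As a concrete picture of a double-base representation, the sketch below writes a positive integer as a sum of terms of the form 2^a · 3^b using a simple greedy rule: repeatedly subtract the largest such term that fits. The abstract does not name the bases or the reduction procedure, so the choice of bases 2 and 3 and the greedy rule are assumptions. The greedy result is generally short but is not guaranteed to be the minimal form discussed above.

```python
def largest_term(n):
    """Largest value 2**a * 3**b not exceeding n, returned as (value, a, b)."""
    best = (1, 0, 0)
    power3, b = 1, 0
    while power3 <= n:
        a = (n // power3).bit_length() - 1   # largest a with 2**a * 3**b <= n
        value = power3 << a
        if value > best[0]:
            best = (value, a, b)
        power3 *= 3
        b += 1
    return best


def greedy_double_base(n):
    """Greedy expansion of a positive integer into terms 2**a * 3**b.

    Returns the list of (a, b) index pairs, one per term.
    """
    terms = []
    while n > 0:
        value, a, b = largest_term(n)
        terms.append((a, b))
        n -= value
    return terms


print(greedy_double_base(41))   # 41 = 36 + 4 + 1
```

Each (a, b) pair can be read as a cell in a two-dimensional grid indexed by the binary and ternary exponents, which is one way to see the geometric interpretation mentioned in the abstract.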
Automated machine vision systems are now widely used for industrial inspection tasks where video-stream data information is taken in by the camera and then sent out to the inspection system for further processing. In this paper we describe a prototype system for on-line programming of arbitrary real-time video data stream bandwidth reduction algorithms; the output of the camera only contains information that has to be further processed by a host computer. The processing system is built into a DALSA CCD camera and uses a microcontroller interface to download bit-stream data to a Xilinx FPGA. The FPGA is directly connected to the video data-stream and outputs data to a low bandwidth output bus. The camera communicates to a host computer via an RS-232 link to the microcontroller. Static memory is used both to generate a FIFO interface for buffering defect burst data and for off-line examination of defect detection data. In addition to providing arbitrary FPGA architectures, the internal program of the microcontroller can also be changed via the host computer and a ROM monitor. This paper describes a prototype system board, mounted inside a DALSA camera, and discusses some of the algorithms currently being implemented for web inspection applications.
Inspection systems for wide web materials have been unable to effectively image fine defects as they are detected. The amount of data produced by highly parallel video inspection cameras can exceed 400 MBytes/sec. The system described in this paper is capable of analyzing and displaying a detected image within seconds of the event using a single frame grabber and a 386 computer. The system can operate at processing speeds of greater than 400 MBytes/sec since it makes use of a novel post processing algorithm within the camera itself. The video cameras are based on Time Delay and Integration technology to provide high grey scale resolution at high data rates and low light levels. The system has an adjustable resolution ranging from 256 to 24,000 pixels per line scanned. The scanning rate is adjustable to a maximum of 20,000 line scans per second.
Redundant Residue Number Systems (RRNS) have been proposed as suitable candidates for fault tolerance in compute intensive applications. The redundancy is based on multiple projections to moduli sub-sets and conducting a search for results that lie in a so-called illegitimate range. This paper presents RRNS fault tolerant procedures for a recently introduced finite polynomial ring mapping procedure (modulus replication RNS). The mapping technique dispenses with the need for many relatively prime ring moduli, which is a major drawback with conventional RRNS systems. Although double, triple, and quadruple modular redundancy can be implemented in the polynomial mapping structure, polynomial coefficient circuitry, or the independent direct product ring computational channels, for error detection and/or correction, this paper discusses the implementation of redundant rings which are generated by (1) redundant residues, (2) spare general computational channels, or (3) a combination of the two. The first architecture is suitable for RNS embedding in the MRRNS, and the second for single moduli mappings. The combination architecture allows a trade-off between the two extremes. The application area is in fault tolerant compute intensive DSP arrays.
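The illegitimate-range test that conventional RRNS relies on can be sketched in a few lines. This example covers only the conventional scheme with relatively prime moduli, not the polynomial ring mapping (MRRNS) procedures of the paper. The moduli values and function names are assumptions chosen for illustration, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
from math import prod

INFO_MODULI = (3, 5, 7)          # carry the data; legitimate range is 0..104
REDUNDANT_MODULI = (11, 13)      # extra channels used only for checking
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGITIMATE_RANGE = prod(INFO_MODULI)


def encode(x):
    """Residues of x in every channel, information and redundant alike."""
    return [x % m for m in MODULI]


def decode(residues):
    """Chinese Remainder Theorem reconstruction over all moduli."""
    total = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        partial = total // m
        x += r * partial * pow(partial, -1, m)   # modular inverse of partial
    return x % total


def is_legitimate(residues):
    """A fault-free word always decodes inside the legitimate range."""
    return decode(residues) < LEGITIMATE_RANGE


word = encode(42)
faulty = list(word)
faulty[1] = (faulty[1] + 2) % MODULI[1]   # corrupt one residue channel
print(is_legitimate(word), is_legitimate(faulty))
```

With these moduli, any error confined to a single channel moves the decoded value by a nonzero multiple of the product of the other four moduli, which is larger than the legitimate range, so the corrupted word is always flagged.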
Inspection systems for wide web materials have been unable to effectively image fine defects as they are detected. The amount of data produced by highly parallel video inspection cameras can exceed 400 MBytes/sec. The system described in this paper is capable of analyzing and displaying a detected image within seconds of the event using a single frame grabber and a 386 computer. The system can operate at processing speeds of greater than 400 MBytes/sec since it makes use of a novel post processing algorithm within the camera itself. The video cameras are based on TDI (Time Delay and Integration) technology to provide high grey scale resolution at high data rates and low light levels. The system has an adjustable resolution ranging from 256 to 24,000 pixels per line scanned. The scanning rate is adjustable to a maximum of 20,000 line scans per second.
PC-based inspection systems for wide web materials have been unable to effectively image fine defects as they are detected. The amount of data produced by highly parallel video inspection cameras can exceed 400 MBytes/sec. The system described in this paper is capable of analyzing and displaying a detected image within seconds of the event using a single frame grabber and a 386 computer. The system can operate at processing speeds of greater than 400 MBytes/sec since it makes use of a novel post processing algorithm within the camera itself. The video cameras are based on TDI (Time Delay and Integration) technology to provide high grey scale resolution at high data rates and low light levels. The system has an adjustable resolution ranging from 2000 to 24,000 pixels per line scanned. The scanning rate is adjustable to a maximum of 20,000 line scans per second.
This paper explores novel techniques involving number theoretic concepts to perform real-time digital signal processing for high bandwidth data stream applications. Often the arithmetic manipulations are simple in form (cascades of additions and multiplications in a well defined structure) but the number of operations that have to be computed every second can be large. This paper discusses ways in which new number theoretic mapping techniques can be used to perform DSP operations, both by reducing the amount of hardware involved in the circuitry and by allowing the construction of very benign architectures down to the individual cells. Such architectures can be used in aggressive VLSI/ULSI implementations. We restrict ourselves to the computation of linear filter and transform algorithms, with the inner product form, which probably account for the vast majority of digital signal processing functions implemented commercially.