This paper presents an efficient VLSI architecture for the intra prediction of the H.264 video compression standard. To
address the computational complexity issue, we propose a dedicated processor that can compute multiple intra prediction
modes in parallel. The proposed architecture accelerates the intra coding process and can support large video formats at
high frame rates in real time.
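To make concrete what evaluating several intra prediction modes for one block involves, here is a minimal Python sketch. It computes three of the H.264 4x4 luma modes (vertical, horizontal, DC) from the neighbouring pixels and selects the one with the lowest sum of absolute differences (SAD). The function names, the SAD cost, and the restriction to three modes are assumptions made for illustration; the abstract does not describe the architecture at this level.

```python
def predict_4x4(top, left):
    """Build candidate 4x4 predictions from the 4 pixels above and the 4 to the left."""
    vertical = [list(top) for _ in range(4)]        # every row copies the pixels above
    horizontal = [[left[r]] * 4 for r in range(4)]  # every row repeats its left pixel
    dc_value = (sum(top) + sum(left) + 4) >> 3      # rounded mean of the 8 neighbours
    dc = [[dc_value] * 4 for _ in range(4)]
    return {"vertical": vertical, "horizontal": horizontal, "dc": dc}


def sad(block, pred):
    """Sum of absolute differences between a 4x4 block and a 4x4 prediction."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))


def best_mode(block, top, left):
    """Return (name of the cheapest mode, dict of SAD costs per mode)."""
    candidates = predict_4x4(top, left)
    costs = {mode: sad(block, pred) for mode, pred in candidates.items()}
    return min(costs, key=costs.get), costs
```

The three candidate predictions do not depend on one another, which is the property a parallel datapath can exploit; the software loop here evaluates them one after another.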
This paper introduces an adaptive approach for image scaling. In addition, we present an efficient VLSI architecture to implement the proposed algorithm in hardware. The proposed architecture is designed to address the real-time constraints of high-performance consumer products. A case study for a printer application is presented.
H.264 is the latest video compression standard. Its rate-distortion performance is greatly improved compared with MPEG-1, MPEG-2, MPEG-4, H.261 and H.263. Among the many features of H.264, sub-pixel motion compensation is one of the factors that make it a better coding scheme. H.264 implements both half-pixel and quarter-pixel interpolation, so the computational complexity of sub-pixel motion compensation is high. This paper presents an efficient VLSI architecture for fast sub-pixel interpolation in H.264. Several techniques are designed to reduce the number of memory accesses and accelerate the interpolation computations.
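The cost of sub-pixel interpolation can be seen in a small sketch of the H.264 luma rules: a half-pixel sample comes from a 6-tap filter with weights (1, -5, 20, 20, -5, 1) and a rounding shift by 5, and a quarter-pixel sample is the rounded average of two neighbouring samples. The helper names and the one-dimensional row layout below are assumptions for illustration, not the architecture proposed in the paper.

```python
def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(window):
    """window holds six consecutive integer samples E, F, G, H, I, J.
    The result is the half-pixel sample between G and H."""
    e, f, g, h, i, j = window
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(a, b):
    """Rounded average of two neighbouring integer or half-pixel samples."""
    return (a + b + 1) >> 1


def interpolate_row(row, x):
    """Return (half, quarter) samples just right of integer position x.
    Requires 2 <= x <= len(row) - 4 so the 6-tap window fits."""
    half = half_pel(row[x - 2:x + 4])
    quarter = quarter_pel(row[x], half)
    return half, quarter
```

Each half-pixel output reads six input samples, which is why reducing memory accesses matters as much as speeding up the arithmetic.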
This paper presents an efficient VLSI architecture and a low-complexity implementation of a BinDCT coprocessor for wireless video applications. The coprocessor architecture was implemented in VHDL and synthesized with 0.18 μm CMOS technology. The footprint of the 2-D BinDCT coprocessor, including its memory buffer, is 0.1173 mm². The BinDCT coprocessor can process video in CIF format at 30 frames per second at a 5 MHz clock rate with a 1.55-volt power supply, and it dissipates 12.05 mW. With its fast transform, compact size and low power consumption, the BinDCT coprocessor is an excellent candidate for DCT-based wireless multimedia coding systems.
This paper presents a VLSI architecture and an efficient implementation of an embedded transform coprocessor for the H.264 video compression standard. The proposed coprocessor was designed to work with an ARM946E-S processor. To enhance performance, both data parallelism and a pipelined architecture are used in the design. In this study, the coprocessor was synthesized with 0.18 μm CMOS technology and its footprint is only 0.0838 mm². The coprocessor can calculate the 2-D transform for a macroblock in 30 clock cycles. It dissipates 529 μW with a 1.55-volt power supply at a 10 MHz clock rate.
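For reference, the H.264 4x4 forward core transform is an integer approximation of the DCT that needs only additions, subtractions and doublings, computed as Y = C X Cᵀ. The sketch below is a plain Python statement of that matrix product; the function names are illustrative, and it says nothing about how the coprocessor schedules the work across its pipeline.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    """Transpose a 4x4 matrix."""
    return [list(col) for col in zip(*m)]


def core_transform_4x4(block):
    """Return C * block * C^T for one 4x4 residual block (unscaled)."""
    return matmul4(matmul4(CF, block), transpose4(CF))
```

The output is unscaled; in H.264 the scaling that would normalise the transform is folded into quantization. A 16x16 macroblock contains sixteen such 4x4 luma blocks.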
In this paper, we present an architecture for a color halftoning coprocessor. The design is based on a software/hardware co-design approach that combines the flexibility and adaptability of a programmable processor with the high performance and low power of an ASIC design. We employ the concepts of concurrency and locality from computer architecture to address the computationally intensive and data-intensive aspects of the color halftoning algorithm. Both instruction parallelism and data parallelism are exploited to improve performance. In addition, fine-grain and middle-grain
instruction-level parallelism (ILP) are used to accelerate the computation in the color error diffusion halftoning process.
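The data dependence that makes error diffusion hard to parallelise can be seen in a short sketch. The abstract does not name the diffusion kernel, so the code below assumes the common Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16) and assumes each color plane is halftoned independently; both choices, and all names, are illustrative only.

```python
def error_diffuse(plane, threshold=128):
    """Binarise one 8-bit plane to 0/255 with Floyd-Steinberg error diffusion."""
    height, width = len(plane), len(plane[0])
    work = [[float(v) for v in row] for row in plane]  # running values with diffused error
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Push the quantisation error onto pixels not yet visited.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


def halftone_color(planes):
    """Halftone each color plane (for example C, M, Y) separately."""
    return [error_diffuse(plane) for plane in planes]
```

Each pixel depends on the error from its left and upper neighbours, so the planes can be processed concurrently while pixels within a plane must respect that ordering.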
This paper presents an efficient VLSI implementation of a lifting coprocessor for mobile multimedia applications. To reduce the hardware complexity, we designed and implemented rational lifting coefficients. This approach allows the floating-point arithmetic units to be replaced by integer arithmetic units in the design. Consequently, the footprint and power consumption of the coprocessor are reduced. To improve the throughput of the system, a fully pipelined parallel architecture is designed. With the rational coefficients and the parallel approach, the proposed lifting coprocessor provides efficient computing power while consuming very little power. The lifting scheme coprocessor was implemented in VHDL using HCMOS8D 0.18 μm technology. It can run at 25 MHz with a 1.55-volt power supply and requires only 1.191 mW.
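How rational coefficients remove the need for floating-point units can be illustrated with a well-known integer lifting scheme. The abstract does not give the paper's coefficients, so the sketch below substitutes the LeGall 5/3 wavelet, whose lifting steps use 1/2 and 1/4 and therefore reduce to additions and shifts. The function names and the symmetric boundary handling are assumptions.

```python
def lift_forward(x):
    """One level of the integer 5/3 lifting transform on an even-length list.
    Returns (s, d): low-pass and high-pass coefficient lists."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: even sample plus a rounded quarter of the neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift_inverse(s, d):
    """Undo lift_forward exactly by reversing the two steps."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Because the inverse subtracts exactly what the forward pass added, reconstruction is lossless in integer arithmetic regardless of the rounding used in each step.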
In this paper, we present an implementation of the IDEA algorithm for image encryption. The image encryption is incorporated into the compression algorithm for transmission over a data network. In the proposed method, Embedded Wavelet Zero-tree Coding is used for image compression. Experimental results show that our proposed scheme enhances data security and reduces the network bandwidth required for video transmissions. A software implementation and system architecture for hardware implementation of the IDEA image encryption algorithm based on Field Programmable Gate Array (FPGA) technology are presented in this paper.
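As background on why IDEA is of interest for hardware, the cipher is built from three operations on 16-bit words: XOR, addition modulo 2^16, and multiplication modulo 2^16 + 1, in which the all-zero word stands for 2^16. The sketch below shows only these primitives and the multiplicative inverse used to derive decryption subkeys. It is not the full cipher and not the paper's FPGA design, and the function names are illustrative.

```python
MASK16 = 0xFFFF
MODULUS = 0x10001  # 2**16 + 1, which is prime


def idea_mul(a, b):
    """Multiply modulo 2**16 + 1, treating the word 0 as 2**16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % MODULUS) & MASK16  # a result of 2**16 maps back to word 0


def idea_add(a, b):
    """Add modulo 2**16."""
    return (a + b) & MASK16


def idea_xor(a, b):
    """Bitwise exclusive OR of two 16-bit words."""
    return a ^ b


def idea_mul_inv(a):
    """Multiplicative inverse under idea_mul, via Fermat's little theorem."""
    a = a or 0x10000
    return pow(a, MODULUS - 2, MODULUS) & MASK16
```

Of the three operations, the modular multiplication is the one that needs a real multiplier, so it is a natural focus when mapping the cipher to an FPGA.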
Information technology has made major strides in the past decade. As a result, applications of data storage and transmission have become widespread. Valuable multimedia information in digital form, however, is vulnerable to unauthorized access both in storage and during transmission. Network security and image encryption have become important, high-profile issues. Image encryption requires manipulating massive amounts of data at high speed. Software image encryption provides flexibility but may not meet some timing constraints. In this paper, we present a novel technique for image encryption using a block cipher. The primary concept is based on an implementation of the 3-Way encryption algorithm. Experimental results show that our proposed method significantly enhances security for transmitting images over a network as well as for storage. Besides the software implementation, we present a hardware implementation of the 3-Way image encryption algorithm based on FPGA technology. With the flexibility of a software implementation and the high performance of a microprocessor, an FPGA-based cryptosystem is a promising technology for future network security.
The transmission of real-time images and video over wireless communication channels is still a challenging problem. Digitally compressed images are sensitive to bit errors, which are typical in wireless communications. Moreover, the bandwidth at the air interface is currently a limiting factor because the first and second generations of mobile phone standards mainly support voice communications. In this paper, we present our study of real-time image traffic over a radio link, aimed at videophone applications. In this study, we use the Discrete Wavelet Transform (DWT) to compress images and a Code Division Multiple Access (CDMA) link to transfer images over wireless communication channels. The experimental results show that it is possible to transfer 4 QCIF images per second over a CDMA link with minor degradation in image quality. This study was conducted by the VLSI Signal, Image and Video Processing Research Laboratory at the University of California, San Diego (UCSD).
In the last few years, the scientific community has invested a great deal of effort in the field of the discrete wavelet transform (DWT). The DWT combined with vector quantization has proved to be an invaluable tool for image compression. The DWT, however, is a very computationally intensive process. There is a need to investigate innovative and computationally efficient architectures to achieve image compression in real time. In this paper, we present a novel, robust, and regular architecture to implement the DWT for image compression. Besides performance, the architecture takes into account data format, power, hardware cost and scalability issues arising from realistic operating conditions.
Proc. SPIE. 3663, Medical Imaging 1999: Image Perception and Performance
KEYWORDS: Digital signal processing, Clocks, Image processing, Digital filtering, Field programmable gate arrays, Linear filtering, Gaussian filters, Image filtering, Parallel computing, Nonlinear filtering
Digital images corrupted with noise regularly require different filtering techniques to optimally correct the image. Software provides convenience for implementing a variety of different filters, but suffers a speed penalty due to the serial nature of the filter calculations. Conversely, an implementation using ASIC technology gains a speed advantage from parallel processing, but at the cost of increased hardware overhead for implementing each filter individually. Advances in Field Programmable Gate Array (FPGA) technology offer a middle ground that combines the speed advantages of an ASIC with the reprogrammability of a software approach on a conventional general-purpose CPU or DSP. In this paper, we present an FPGA-based, reconfigurable system that can perform an assortment of noise filtering algorithms using the same hardware. Filter implementations for Gaussian and salt-and-pepper noise are evaluated on this system.
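The point that different noise types call for different filters over the same pixel neighbourhood can be sketched in a few lines. The abstract does not specify which filters were implemented, so the code below assumes a 3x3 median filter for salt-and-pepper noise and a 3x3 binomial smoothing kernel (1-2-1 weights) for Gaussian noise; both choices and the function names are illustrative.

```python
def filter3x3(img, reduce_window):
    """Apply reduce_window to every interior 3x3 neighbourhood; borders are copied."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = reduce_window(window)
    return out


def median9(window):
    """Median of nine samples: discards isolated salt-and-pepper outliers."""
    return sorted(window)[4]


SMOOTH_WEIGHTS = [1, 2, 1, 2, 4, 2, 1, 2, 1]  # weights sum to 16


def smooth9(window):
    """Weighted average with rounding: attenuates Gaussian-like noise."""
    return (sum(w * v for w, v in zip(SMOOTH_WEIGHTS, window)) + 8) >> 4
```

For example, `filter3x3(img, median9)` and `filter3x3(img, smooth9)` share the windowing logic and differ only in the reduction applied, which mirrors the idea of reusing one hardware fabric for several filters.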
Proc. SPIE. 3652, Machine Vision Applications in Industrial Inspection VII
KEYWORDS: Target detection, Edge detection, Digital signal processing, Detection and tracking algorithms, Sensors, Image segmentation, Image processing, Field programmable gate arrays, Image analysis, Intelligence systems
Deconstructing an image into its parts poses a challenge to image analysis that may be solved using adaptive algorithms. The presence of occlusion or image rotation makes template matching difficult. Image segmentation techniques can be used to discriminate between objects via feature synthesis using deformable templates. This paper describes modifications to existing techniques commonly used for real-time image segmentation that make them efficient to implement in hardware. Edge detection and edge direction finding techniques may be used within the context of deformable templates for real-time automatic target recognition and tracking.
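Edge detection and edge direction finding can be illustrated with a gradient operator. The abstract does not say which operator the authors modified, so the sketch below assumes the Sobel kernels and uses |gx| + |gy| as an approximation of gradient magnitude, a simplification often chosen to avoid a square root; the function name is illustrative.

```python
import math


def sobel_at(img, y, x):
    """Return (magnitude, direction in degrees) of the gradient at an interior pixel."""
    # Horizontal gradient: right column minus left column, centre row weighted by 2.
    gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
          - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
    # Vertical gradient: bottom row minus top row, centre column weighted by 2.
    gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
          - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
    magnitude = abs(gx) + abs(gy)                   # cheap stand-in for sqrt(gx^2 + gy^2)
    direction = math.degrees(math.atan2(gy, gx))    # angle of the gradient vector
    return magnitude, direction
```

The returned angle is that of the gradient; the edge itself runs perpendicular to it.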
Image processing algorithms are well suited to reconfigurable architectures because of their matrix structures, inherent parallelism, and need for flexibility and processing speed. This paper describes a method to implement feature detection on the ReConfigurable Processor (RCP). The RCP is an FPGA-based system built by the VLSI-RCP Research Group at UCSD and L3 Communications. The design is based on the Altera FLEX 10K70. The architecture used to implement the feature detector on the RCP, together with its software and hardware implementations, is discussed.