Distributed compression and decompression for big image data: LZW and Huffman coding
Rohan Kishor Netalkar, Hillol Barman, Rushik Subba, Kandula Venkata Preetam, Undi Surya Narayana Raju
Abstract

In today’s era, digital data are created and transmitted primarily in the form of images and videos. Storing and transmitting such a huge number of images consumes considerable computing resources, chiefly storage and bandwidth, so compressing the images before storing them saves substantial resources. Image compression is the process of removing as much redundant data as possible from an image while retaining only the non-redundant data. To compress and decompress such big image data, a distributed environment based on the MapReduce paradigm is used, built on the Hadoop Distributed File System (HDFS) and Apache Spark; in addition, the Microsoft Azure cloud is used in an infrastructure-as-a-service configuration. Four setups, a single system and 1 + 4, 1 + 15, and 1 + 18 node cloud clusters, are used to compare execution times on a self-created large image dataset. On these four setups, more than 100 million images (109,670,400) are compressed and decompressed, and the execution times of two traditional image compression methods, Lempel-Ziv-Welch (LZW) and Huffman coding, are compared. Both are lossless techniques: LZW removes both spatial and coding redundancy, whereas Huffman coding removes only coding redundancy. The two techniques serve merely as placeholders and can be replaced by any other compression technique for large image data. Compression ratio, average root mean square error (ARMSE), and average peak signal-to-noise ratio are used to validate that the compression and decompression process for each technique is exactly the same irrespective of the number of systems used, distributed or not.
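To make the two placeholder codecs concrete, the following is a minimal single-machine sketch in Python, not the paper's implementation: it assumes byte-oriented input, an initial 256-symbol alphabet for LZW, in-memory processing of one image at a time, and an illustrative sample file name.

    import heapq
    from collections import Counter

    def lzw_compress(data: bytes) -> list:
        """LZW: substitutes dictionary codes for repeated byte sequences,
        removing spatial as well as coding redundancy."""
        table = {bytes([i]): i for i in range(256)}
        next_code, w, out = 256, b"", []
        for b in data:
            wc = w + bytes([b])
            if wc in table:
                w = wc
            else:
                out.append(table[w])
                table[wc] = next_code
                next_code += 1
                w = bytes([b])
        if w:
            out.append(table[w])
        return out

    def lzw_decompress(codes: list) -> bytes:
        """Inverse of lzw_compress; rebuilds the dictionary on the fly."""
        table = {i: bytes([i]) for i in range(256)}
        next_code = 256
        w = table[codes[0]]
        out = [w]
        for k in codes[1:]:
            if k in table:
                entry = table[k]
            elif k == next_code:        # the classic "cScSc" corner case
                entry = w + w[:1]
            else:
                raise ValueError("invalid LZW code")
            out.append(entry)
            table[next_code] = w + entry[:1]
            next_code += 1
            w = entry
        return b"".join(out)

    def huffman_codebook(data: bytes) -> dict:
        """Huffman coding: shorter bit strings for more frequent bytes,
        removing coding redundancy only."""
        # heap entries are (frequency, tiebreak, tree); a leaf is an int
        # symbol, an internal node a (left, right) tuple
        heap = [(n, i, sym) for i, (sym, n) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tiebreak, (t1, t2)))
            tiebreak += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"   # degenerate one-symbol input
        walk(heap[0][2], "")
        return codes

    raw = open("sample_image.raw", "rb").read()       # hypothetical file
    assert lzw_decompress(lzw_compress(raw)) == raw   # lossless round trip
    book = huffman_codebook(raw)
    bits = sum(len(book[b]) for b in raw)
    print(f"Huffman: {len(raw) * 8} bits -> {bits} bits")

The distributed side of the pipeline can be sketched with Spark's binaryFiles API, which yields (filename, raw bytes) pairs so that each worker compresses its own partition of images independently; the HDFS paths below are hypothetical placeholders, and any codec can be slotted into the map step, as the abstract notes.

    from pyspark import SparkContext

    sc = SparkContext(appName="DistributedImageCompression")

    # one (path, bytes) record per image file stored on HDFS
    images = sc.binaryFiles("hdfs:///datasets/big_images/")

    # map-side compression; lzw_compress is the sketch above, or any
    # replacement codec for large image data
    compressed = images.mapValues(lzw_compress)

    compressed.saveAsPickleFile("hdfs:///datasets/big_images_lzw/")

The validation metrics named above have standard definitions; the forms below are the usual ones, with averaging over the N images of the dataset assumed. For a lossless codec the per-image error is zero, so matching metric values across setups indicate bit-identical results:

    \mathrm{CR} = \frac{\text{original size}}{\text{compressed size}}, \qquad
    \mathrm{RMSE}_k = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n}
        \bigl( I_k(i,j) - \hat{I}_k(i,j) \bigr)^2 }, \qquad
    \mathrm{ARMSE} = \frac{1}{N} \sum_{k=1}^{N} \mathrm{RMSE}_k, \qquad
    \mathrm{PSNR}_k = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_k}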

© 2021 SPIE and IS&T 1017-9909/2021/$28.00
Rohan Kishor Netalkar, Hillol Barman, Rushik Subba, Kandula Venkata Preetam, and Undi Surya Narayana Raju "Distributed compression and decompression for big image data: LZW and Huffman coding," Journal of Electronic Imaging 30(5), 053015 (29 September 2021). https://doi.org/10.1117/1.JEI.30.5.053015
Received: 1 July 2021; Accepted: 15 September 2021; Published: 29 September 2021
KEYWORDS: Image compression, Image processing, Data processing, Image storage, Clouds, Data storage, Distributed computing