In today’s era, digital data are created and transmitted primarily in the form of images and videos. Storing and transmitting such a huge number of images requires considerable computing resources, such as storage and bandwidth, so compressing the image data before storing it saves a great deal of both. Image compression is the act of removing as much redundant data as possible from an image while retaining only the non-redundant data. To compress and decompress such large image data, a distributed environment based on the map-reduce paradigm, using the Hadoop distributed file system and Apache Spark, is used. In addition, the Microsoft Azure cloud environment with infrastructure as a service is employed. Various setups, namely a single system and 1 + 4, 1 + 15, and 1 + 18 node cluster cloud infrastructures, are used to compare execution times on a self-created large image dataset. On these four self-made clusters, more than 100 million (109,670,400) images are compressed and decompressed, and the execution times are compared for two traditional image compression methods: Lempel-Ziv-Welch (LZW) and Huffman coding. Both LZW and Huffman coding are lossless image compression techniques; LZW removes both spatial and coding redundancies, whereas Huffman coding removes only coding redundancy. These two techniques are merely placeholders and can be replaced with any other compression technique for large image data. In our work, we use the compression ratio, average root mean square error (ARMSE), and average peak signal-to-noise ratio (APSNR) to validate that the compression and decompression process for each technique produces exactly the same result irrespective of the number of systems used, distributed or not.
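A minimal PySpark sketch of such a pipeline is given below; it is not the authors' implementation. The HDFS path, the use of zlib as a stand-in for a lossless codec such as LZW or Huffman coding, and the metric calculations are illustrative assumptions only.

    # Illustrative sketch: distribute a lossless compress/decompress pass over
    # HDFS-resident images with PySpark and report per-image quality metrics.
    import io
    import math
    import zlib

    import numpy as np
    from PIL import Image
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("image-compression-sketch").getOrCreate()
    sc = spark.sparkContext

    def compress_decompress(record):
        """Compress one image losslessly, decompress it, and compute metrics."""
        path, raw = record                       # (filename, bytes) from binaryFiles
        img = np.array(Image.open(io.BytesIO(raw)).convert("L"), dtype=np.uint8)
        original = img.tobytes()
        compressed = zlib.compress(original)     # placeholder lossless codec
        restored = np.frombuffer(zlib.decompress(compressed), dtype=np.uint8)
        restored = restored.reshape(img.shape)

        ratio = len(original) / len(compressed)  # compression ratio
        mse = float(np.mean((img.astype(float) - restored.astype(float)) ** 2))
        rmse = math.sqrt(mse)
        psnr = float("inf") if mse == 0 else 10 * math.log10(255.0 ** 2 / mse)
        return path, ratio, rmse, psnr

    # binaryFiles yields (path, bytes) pairs; the HDFS URI below is hypothetical.
    results = (sc.binaryFiles("hdfs:///datasets/large_image_set/*.png")
                 .map(compress_decompress))

    # Only small metric tuples are collected to the driver, not the image data.
    ratios, rmses, psnrs = zip(*[(r, e, p) for _, r, e, p in results.collect()])
    print("mean compression ratio:", sum(ratios) / len(ratios))
    print("ARMSE:", sum(rmses) / len(rmses))   # 0 for a lossless codec
    print("APSNR:", sum(psnrs) / len(psnrs))   # infinite for a lossless codec

Because the per-image work is embarrassingly parallel, the same script can run unchanged on a single machine or on any of the cluster sizes mentioned above; only the execution time changes, which is the comparison the study reports.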
Keywords: image compression, image processing, data processing, image storage, clouds, data storage, distributed computing