In today’s era, digital data are created and transmitted primarily in the form of images and videos. Storing and transmitting such a huge number of images requires considerable computing resources, such as storage and bandwidth, so compressing the image data before storing it saves a great deal of both. Image compression is the act of removing as much redundant data as possible from an image while retaining only the non-redundant data. To compress and decompress such large image data, a distributed environment based on the map-reduce paradigm, using the Hadoop distributed file system and Apache Spark, is used. In addition, the Microsoft Azure cloud environment with infrastructure as a service is employed. Various setups, namely a single system and 1 + 4, 1 + 15, and 1 + 18 node cluster cloud infrastructures, are used to compare execution times on a self-created large image dataset. On these four self-made clusters, more than 100 million (109,670,400) images are compressed and decompressed, and the execution times are compared for two traditional image compression methods: Lempel-Ziv-Welch (LZW) and Huffman coding. Both LZW and Huffman coding are lossless image compression techniques; LZW removes both spatial and coding redundancies, whereas Huffman coding removes only coding redundancy. These two techniques are merely placeholders and can be replaced with any other compression technique for large image data. In our work, we use the compression ratio, average root mean square error (ARMSE), and average peak signal-to-noise ratio (APSNR) to validate that the compression and decompression process for each technique produces exactly the same result irrespective of the number of systems used, distributed or not.
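A minimal PySpark sketch of such a pipeline is given below; it is not the authors' implementation. The HDFS path, the use of zlib as a stand-in for a lossless codec such as LZW or Huffman coding, and the metric calculations are illustrative assumptions only.

    # Illustrative sketch: distribute a lossless compress/decompress pass over
    # HDFS-resident images with PySpark and report per-image quality metrics.
    import io
    import math
    import zlib

    import numpy as np
    from PIL import Image
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("image-compression-sketch").getOrCreate()
    sc = spark.sparkContext

    def compress_decompress(record):
        """Compress one image losslessly, decompress it, and compute metrics."""
        path, raw = record                       # (filename, bytes) from binaryFiles
        img = np.array(Image.open(io.BytesIO(raw)).convert("L"), dtype=np.uint8)
        original = img.tobytes()
        compressed = zlib.compress(original)     # placeholder lossless codec
        restored = np.frombuffer(zlib.decompress(compressed), dtype=np.uint8)
        restored = restored.reshape(img.shape)

        ratio = len(original) / len(compressed)  # compression ratio
        mse = float(np.mean((img.astype(float) - restored.astype(float)) ** 2))
        rmse = math.sqrt(mse)
        psnr = float("inf") if mse == 0 else 10 * math.log10(255.0 ** 2 / mse)
        return path, ratio, rmse, psnr

    # binaryFiles yields (path, bytes) pairs; the HDFS URI below is hypothetical.
    results = (sc.binaryFiles("hdfs:///datasets/large_image_set/*.png")
                 .map(compress_decompress))

    # Only small metric tuples are collected to the driver, not the image data.
    ratios, rmses, psnrs = zip(*[(r, e, p) for _, r, e, p in results.collect()])
    print("mean compression ratio:", sum(ratios) / len(ratios))
    print("ARMSE:", sum(rmses) / len(rmses))   # 0 for a lossless codec
    print("APSNR:", sum(psnrs) / len(psnrs))   # infinite for a lossless codec

Because the per-image work is embarrassingly parallel, the same script can run unchanged on a single machine or on any of the cluster sizes mentioned above; only the execution time changes, which is the comparison the study reports.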
Keywords: image compression, image processing, data processing, image storage, clouds, data storage, distributed computing