Research on Convolutional Neural Network-Based Compression Methods for Multispectral Images

Multispectral images are characterized by numerous bands, high spatial and spectral redundancy, and a large data volume. To reduce the storage space consumed by an individual image and improve compression effectiveness, this paper investigates compression methods based on convolutional neural networks. The paper first reviews the recent development of image compression algorithms and deep learning. Building on these two foundations, a framework for lossy compression of multispectral images using an end-to-end convolutional neural network is proposed. An auto-encoder structure processes the three-dimensional multispectral data, extracting local spectral features and fusing spectral information with large convolutional kernels, while residual layers preserve spectral information. Rate-distortion optimization is performed to jointly optimize image distortion and compression bitrate. Finally, experiments comparing the proposed algorithm with the traditional JPEG method assess its efficacy: MS-SSIM is improved by nearly 0.08, and the compressed images exhibit no noticeable distortion.


Introduction
In recent years, hyperspectral sensor technology has advanced rapidly, and multispectral cameras have gradually entered daily use. Multispectral images contain a wider range of spectral information and provide observation capabilities beyond human vision. However, they also require increasingly large storage space. The huge data volume poses significant challenges to image storage, transmission, management, and application, which hampers the development of multispectral imaging technology. High-performance multispectral image compression algorithms have therefore become critically important. Traditional compression algorithms for multispectral images can be broadly classified into three categories: prediction-based coding methods [1], vector quantization-based methods [2], and transform-based methods [3]. Each has its limitations. Prediction-based methods achieve relatively low compression ratios, which can also vary significantly between images. Transform-based methods offer adjustable compression ratios and higher compression performance, but may introduce block artifacts in multispectral images that degrade image quality. Vector quantization methods have relatively high complexity, which limits their application to spectral images. Recently, driven by the success of deep convolutional networks in lossy compression of natural images [4], studies have begun to explore the use of deep learning for image compression. This paper presents an end-to-end multispectral image compression method based on convolutional neural networks. The entire three-dimensional data cube is input into the network, and the bitstream is obtained through the encoder, quantizer, and entropy encoder; the decompressed image is then obtained through the decoder. The entire network is optimized following a rate-distortion approach. By combining residual layers and convolutional layers, spatial and spectral features are effectively extracted, enabling the network to learn compact representations of multispectral images and significantly improving the compression ratio. Finally, the proposed method is validated by comparing the compression ratios and reconstruction quality of different multispectral image compression methods.

Research on Convolutional Neural Network-Based Compression Methods for Multispectral Images
The multispectral image compression approach is implemented using an end-to-end convolutional neural network comprising four components: an auto-encoder, a quantization structure, an entropy encoder, and a rate-distortion optimization module. The improvements of this end-to-end network over classical convolutional neural network image compression algorithms are reflected in two aspects: (1) the quantization structure adopts multi-level quantization to integer coefficients, which improves quantization efficiency; (2) a Gaussian mixture model is used for entropy encoding, which has a more powerful ability to approximate distributions than a single Gaussian model.

Auto-encoder
The auto-encoder is composed of an encoder and a decoder [5][6][7]. The encoder extracts feature information from the image and reduces its dimensionality. The auto-encoder mainly comprises convolutional layers, GDN activation functions, and LeakyReLU activation functions. The architecture of the auto-encoder is depicted in Figure 1.

Figure 1.
Network Structure of the Autoencoder. "input" represents the input data and "conv" represents a convolutional layer, which performs downsampling on the image. The GDN activation function introduces non-linear relationships between the layers of the convolutional neural network, while the LeakyReLU activation function further enhances the non-linearity between convolutional layers. The decoder in the auto-encoder is completely symmetric to the encoder; it reconstructs the feature map generated by the encoder and converts it back to the original image.
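As a minimal numpy sketch (not the paper's exact implementation, and with hypothetical toy parameter values), the GDN activation used between the convolutional layers normalizes each channel by a learned combination of the squared responses of all channels:

```python
import numpy as np

def gdn(x, beta, gamma):
    """Generalized Divisive Normalization across channels.

    x:     feature tensor of shape (C, H, W)
    beta:  per-channel offset, shape (C,)
    gamma: channel-coupling weights, shape (C, C)
    """
    sq = x ** 2                                   # squared responses, (C, H, W)
    # denominator: sqrt(beta_i + sum_j gamma_ij * x_j^2) at every pixel
    denom = np.sqrt(beta[:, None, None] +
                    np.tensordot(gamma, sq, axes=([1], [0])))
    return x / denom

# toy example: 3 channels, 4x4 feature map, illustrative parameters
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4, 4))
y = gdn(x, beta=np.ones(3), gamma=0.1 * np.ones((3, 3)))
```

With beta fixed at 1, the denominator is at least 1, so GDN here only attenuates activations; in a trained network beta and gamma are learned jointly with the convolution weights.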

Quantization structure
The feature maps obtained from the multispectral image must undergo quantization. Quantization introduces information loss, so the quality of the reconstructed images depends greatly on an efficient quantization structure. In this study, we adopt a multi-base quantization approach for converting coefficients into integers [8], aiming to minimize information loss during quantization and improve the efficiency of end-to-end training. Since quantization is non-differentiable, the quantization structure adds uniform noise during training to simulate the process, which preserves gradient propagation and keeps the whole pipeline differentiable.
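The uniform-noise relaxation described above can be sketched as follows (a simplified illustration of the standard trick, not the paper's multi-base quantizer):

```python
import numpy as np

rng = np.random.default_rng(42)

def quantize(y, training):
    """Quantization proxy used in end-to-end compression networks.

    During training, hard rounding is replaced by additive uniform noise
    in [-0.5, 0.5], which is differentiable almost everywhere; at test
    time the coefficients are rounded to integers for entropy coding.
    """
    if training:
        return y + rng.uniform(-0.5, 0.5, size=y.shape)
    return np.round(y)

y = np.array([0.2, 1.7, -2.4])
print(quantize(y, training=False))   # -> [ 0.  2. -2.]
```

The noisy training-time output stays within half a quantization step of the true coefficient, so its distribution matches the rounding error model used by the entropy coder.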

Entropy encoder
After feature extraction and quantization by the auto-encoder, residual redundancy may remain in the multispectral image representation. To enhance coding performance and eliminate this redundancy, an efficient entropy coding stage is required. This paper employs a Gaussian mixture model (GMM) for entropy estimation [9]. The distribution of a quantized latent ŷ under the GMM is given by equation 1:

p(ŷ) = Σ_{k=1}^{K} w_k · N(ŷ; μ_k, Σ_k)    (1)

where w_k denotes the weight of the k-th Gaussian component, K denotes the number of Gaussian components, μ_k and Σ_k represent the parameters of each Gaussian distribution, and p(ŷ) drives the entropy coding.
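A small sketch of how a Gaussian mixture yields per-symbol probabilities for the entropy coder (the mixture parameters below are illustrative, not values from the paper; for integer symbols the coder uses the mixture mass on the unit interval around each symbol):

```python
from math import erf, sqrt, log2

def gaussian_cdf(v, mu, s):
    """CDF of a Gaussian with mean mu and standard deviation s."""
    return 0.5 * (1.0 + erf((v - mu) / (s * sqrt(2.0))))

def gmm_likelihood(y, weights, means, scales):
    """Probability of the quantized symbol y under a K-component Gaussian
    mixture: the mixture CDF mass on [y - 0.5, y + 0.5]."""
    return sum(w * (gaussian_cdf(y + 0.5, mu, s) - gaussian_cdf(y - 0.5, mu, s))
               for w, mu, s in zip(weights, means, scales))

# toy 2-component mixture evaluated at the symbol y = 0
p = gmm_likelihood(0.0, weights=[0.6, 0.4], means=[0.0, 2.0], scales=[1.0, 0.5])
bits = -log2(p)   # ideal code length for this symbol
```

The more components K, the more flexibly the mixture fits multimodal latent distributions, which is the advantage over a single Gaussian noted above.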

Rate-distortion optimization
In end-to-end coding, the joint tuning of image distortion and compression bitrate is referred to as rate-distortion optimization. The effectiveness of the entire structure relies heavily on accurate estimation of the bitrate and the image distortion, so these factors must be addressed carefully when optimizing the compression network for multispectral images. This paper uses rate-distortion optimization to trade off bitrate against image quality, as shown in equation 2:

L = λD + R    (2)

where D is the distortion measured by mean squared error, λ is a balancing factor, and R represents the rate loss. The distortion term is calculated as shown in equation 3:

D = (1/N) Σ_{n=1}^{N} ||x_n − y_n||²    (3)

where x_n is the input 3D image, y_n is the recovered 3D image, and N denotes the batch size. The rate is estimated from the probability density function of the continuous distribution obtained by spline interpolation of the intermediate feature map; the more sampling points used, the more accurate the rate estimate.
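The loss above can be sketched numerically as follows (a toy illustration assuming the λD + R form of equation 2; the arrays and likelihood values are hypothetical):

```python
import numpy as np

def rd_loss(x, y, likelihoods, lam):
    """Rate-distortion objective L = lambda * D + R (equation 2).

    D is the mean squared error between the input x and the
    reconstruction y (equation 3); R is the estimated rate, i.e. the
    mean negative log2-likelihood of the quantized latents.
    """
    D = np.mean((x - y) ** 2)
    R = np.mean(-np.log2(likelihoods))
    return lam * D + R

# toy example: a batch of two 4x4 "images" and flat latent likelihoods
x = np.ones((2, 4, 4))
y = x + 0.1                        # reconstruction with uniform error 0.1
lik = np.full((2, 8), 0.25)        # each latent symbol costs 2 bits
loss = rd_loss(x, y, lik, lam=0.01)
```

A larger λ penalizes distortion more strongly, driving the network toward higher-fidelity reconstructions at a higher bitrate, which matches the observation later that S-bpp grows with the balancing factor.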

Experimental Results and Analysis
The CAVE multispectral dataset was used in this study. The dataset consists of 31 bands ranging from 400 nm to 700 nm, each with a spatial size of 512 × 512 pixels. We selected 24 scenes from the dataset as the training set for the end-to-end multispectral image compression method, and the remaining 8 scenes were used as the test set. Examples of training and test scenes are shown in Figure 2. As the balancing factor increases, the corresponding S-bpp (bits per pixel) also increases. The Adam algorithm was used to update gradients, the total number of iterations exceeded 1,000,000 steps, the learning rate decayed slowly from A to B at a rate of 1/10, and a batch size of 2 was used. Because of the limited compression capability and poor performance of JPEG, the evaluation of the end-to-end convolutional neural network-based multispectral image compression method only compared the MS-SSIM (Multi-Scale Structural Similarity) index at the extreme compression limit of JPEG, together with visual comparisons of the decompressed images. Figure 4 compares (a) the original image, (b) the image under JPEG compression at its limit, and (c) the decompressed image from the proposed method with λ = 3.1. Visually, JPEG compression at the current S-bpp introduces significant blocking artifacts, resulting in severe distortion and image content that is difficult to distinguish. In contrast, the proposed method, at a similar compression ratio, preserves clearly visible image details and texture information and effectively eliminates visual artifacts such as ringing and aliasing. The decompressed image exhibits high quality, demonstrating good compression performance at low bit rates.

Conclusion
Inspired by deep compression frameworks for natural images, this paper proposes an end-to-end convolutional neural network-based approach for multispectral image compression. The multispectral image is input as a three-dimensional tensor into the encoder, enabling the learning of spatial-spectral fusion features. An arithmetic-coding-based entropy encoder further reduces the data volume, and the decoded intermediate features yield the decompressed three-dimensional image. The loss function of the proposed network follows a rate-distortion optimization approach. The CAVE dataset is used for both training and testing, and the method is compared against JPEG. Experimental results show significant improvements in MS-SSIM over traditional methods at low bit rates. The proposed method preserves more detail and texture information without introducing visual artifacts such as blocking and blurriness, making the result closer to the original image. Moreover, spectral information is better preserved than with traditional methods, closely following the original image's spectral curve. By combining lossy compression of multispectral images with deep convolutional frameworks, this paper demonstrates the potential of deep learning for multispectral image compression. However, the proposed framework's generalization is limited by the reliance on the CAVE dataset for both training and testing; future work could involve joint training on multiple multispectral image datasets to enhance the network's performance.

Figure 2 .
CAVE Dataset Illustrative Diagram. The experiments were conducted on an NVIDIA GeForce GTX 1650 GPU with 8 GB of memory. A series of individual compression models were trained with balancing factors in the range [0.005, 3.1].