Scope of Support Vector Machine in Steganography

Steganography is a technique used for secure transmission of data. Using audio as a cover file opens the path to many additional features. To overcome the limitations of the conventional LSB technique, various variants have been proposed by different authors. To achieve robustness, the use of various optimization techniques has become traditional. In this paper the focus is on the use of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) in steganography. Listing the detailed scope, merits and demerits of the two optimization techniques is the main constituent of this paper. In addition to analyzing the two techniques, the motivation for and applicability of machine learning algorithms to the problem statement are also discussed. This paper will guide the path toward using the Support Vector Machine (SVM) for optimizing data hiding.


INTRODUCTION
Steganography has proven its use in securing information. It can be classified on the basis of the cover file used for hiding the secret message. The major literature studied here focuses on audio as the cover file. Many researchers have combined cryptography with steganography so as to provide a double layer of security. Many others have worked on making the position of bit replacement random so as to make the scheme more robust. Embedding secret messages in digital sound is more difficult than in other binary files because of the higher sensitivity of the Human Auditory System (HAS) as compared to the Human Visual System (HVS) [10]. The basic requirements of steganography can be described as follows:

• Transparency:
It is the property by which the stego file does not create any suspicion of the presence of an embedded message when compared to the original cover file, i.e. the cover object and stego object must be perceptually indiscernible [7].

• Data Rate:
It is defined as the number of bits of secret message that can be hidden in a byte of the cover file. Applications generally require a high embedding data rate.

• Robustness:
A technique is said to be highly robust if it can withstand various attacks. The attacks can be either intentional or unintentional [8].

MOTIVATION
Let us revisit some basic properties of steganography so as to describe the workflow.
Imperceptibility is a measure of the distortion produced in the stego audio by modifications such as message embedding. It is essential to keep the distortion below a threshold, which can be estimated on the basis of the HAS/HVS and the cover file [1]. Since it is not practically possible for a technique to have high capacity and high robustness simultaneously, attention moves toward balancing imperceptibility and robustness. Various researchers have worked on achieving robustness while maintaining imperceptibility. To minimize the distortion induced by the embedding process, researchers have further attempted to optimize the embedding process using different optimization techniques. In this paper the focus is on summarizing the applications and after-effects of the genetic algorithm and particle swarm optimization. Both techniques are then compared with the achievements of machine learning so as to justify the scope of machine learning. Due to its uniqueness in giving solutions and its lower time consumption, the Support Vector Machine is now a preferred choice of machine learning algorithm.

LITERATURE REVIEW
Majdak, in his Ph.D. work, explored the scope of balancing robustness against imperceptibility. He identified various limitations of the traditional substitution method and provided a path for overcoming them. His solution was based on GA. His work can be summarized as follows [4]:
• Audio files in WAV format were chosen as cover.
• To maintain imperceptibility, GA was used as an optimization tool.
• Quality was measured and compared using the PSNR metric.
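The PSNR metric used for quality comparison above can be computed as follows. This is a generic sketch of the standard formula, not Majdak's exact implementation:

```python
import numpy as np

def psnr(cover, stego, max_val=255.0):
    """Peak signal-to-noise ratio between cover and stego signals, in dB."""
    cover = np.asarray(cover, dtype=np.float64)
    stego = np.asarray(stego, dtype=np.float64)
    mse = np.mean((cover - stego) ** 2)
    if mse == 0:
        return float("inf")  # identical signals: no distortion at all
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: flipping one LSB in a stream of four 8-bit samples
cover = np.array([100, 120, 130, 140], dtype=np.uint8)
stego = cover.copy()
stego[0] ^= 1  # flip the least significant bit of one sample
print(round(psnr(cover, stego), 2))  # → 54.15
```

Higher PSNR means lower distortion, i.e. better imperceptibility of the stego file.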
Krishna Bowal et al. combined cryptography, steganography and the Genetic Algorithm to develop a 3-tier protection tool [1]. The secret message is encrypted using RSA and then embedded in the cover audio using a GA-based technique. Their work can be summarized as follows:
• Along with increasing the depth of the embedding layer, the embedding layer itself was also chosen randomly.
• Multi-objective GA was used to optimize position and distortion.
• The data rate is comparatively low.
• Imperceptibility was measured by subjective audio tests.
• Steganalysis is expected to be more challenging.

V. Santhi and Longeswari Govindaraju, in their work providing 3-tier security using cryptography and GA along with steganography, made the following variations [12]:
• Negative bytes can be included in population generation.
• Negative audio bytes were processed as 32-bit audio samples, while positive bytes were processed as 16-bit samples.
• The resulting tool ensured large capacity and a drastic reduction in noise.
Rishidas S. et al. analyzed SVM as a classifier in steganalysis. Although the objects being classified were images, and images of different formats at that, their work still provides a measure of the strength of SVM as a classifier in data hiding and extraction [3].
• The cover file was first converted to the frequency domain.
• The secret message was converted into partitions, and optimal matrix substitution was performed as part of the embedding process.
• PSO was used to find the optimal matrix for substitution.

Initial work was done on image steganography, where a text message was hidden in a digital image. Later, efforts to hide an image in an image, and audio in an image, also succeeded. But the drawback of using an image as the cover file was the higher sensitivity of the human visual system: a little distortion introduced in an image is easily caught by the naked eye, arousing suspicion and violating the objective of steganography.
The most famous technique for implementing steganography was the substitution technique, in which some of the bits of the cover file are substituted with message bits. The LSBs are chosen for substitution as they contribute least to the statistical value of the cover file.
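The conventional LSB substitution described above can be sketched minimally as follows. This is an illustration on raw 8-bit samples; a real implementation would operate on decoded WAV frames:

```python
def embed_lsb(samples, bits):
    """Replace the least significant bit of each sample with a message bit."""
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego

def extract_lsb(samples, n_bits):
    """Read back the first n_bits least significant bits."""
    return [s & 1 for s in samples[:n_bits]]

cover = [200, 131, 54, 77, 90]
message = [1, 0, 1]
stego = embed_lsb(cover, message)
print(extract_lsb(stego, 3))  # → [1, 0, 1], the message is recovered
```

Note that each sample changes by at most 1, which is why plain LSB substitution distorts the cover so little.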
To increase the data-hiding capacity of cover files, multiple LSBs were substituted. However, such methods produced more distortion. Since substitution in the LSBs is very susceptible to intentional as well as unintentional attacks, a solution was proposed to hide data in deeper layers. Hiding data in deeper layers of the cover led to more distortion in the cover file. Algorithms were then proposed which not only hide data in deeper layers but also make adjustments, by deliberately modifying certain other bits of the cover file, in order to minimize the distortion thus introduced.
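The deeper-layer adjustment idea can be illustrated as follows: after forcing the message bit at bit position `layer`, the bits below it are re-chosen so that the stego sample stays as close as possible to the cover sample. This is a generic sketch of the principle, not any specific author's algorithm:

```python
def embed_deep(sample, bit, layer):
    """Embed one bit at position `layer` (0 = LSB), adjusting the lower
    bits so the stego sample stays numerically close to the cover sample."""
    mask = 1 << layer
    base = (sample & ~mask) | (bit << layer)  # force the target bit
    low_mask = mask - 1
    # try every setting of the bits below the embedding layer
    candidates = [(base & ~low_mask) | low for low in range(mask)]
    return min(candidates, key=lambda c: abs(c - sample))

# Embedding a 0 at layer 3 of sample 72 (binary 1001000): naively forcing
# the bit gives 64 (distortion 8); with adjustment we get 71 (distortion 1).
print(embed_deep(72, 0, 3))  # → 71
```

The embedded bit is still recoverable, since only bits below the embedding layer were adjusted.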
Due to its wide practical availability and its inherent masking effect, audio began to emerge as a choice of cover file. Moreover, the Human Auditory System suppresses low-frequency audio in the presence of higher-frequency audio, a phenomenon called the masking effect. Even after the adoption of audio as cover, the inherent problems of the substitution technique remained. Thus many variants of the conventional LSB method came into practice in an attempt to be more robust.
Another layer of security was introduced by combining cryptography with steganography: the message was encrypted first and then hidden in the cover audio. Around this time, researchers started combining soft-computing techniques with steganography, each combination aimed at achieving some specific goal. To develop optimized techniques, GA became the first choice to combine with steganography; the resulting techniques differ in their choice of fitness function and in the logic of applying GA. GA is a good optimization tool, but it is less effective when generalization is required; SVM has proven to be a good classifier algorithm in such cases.

Genetic Algorithm:
Advantages:
• Concept easy to understand
• Intrinsically parallel
• Always an answer; the answer gets better with time
• Modular: separate from the application
• Higher chances of reaching an optimal solution
Disadvantages:
• No guarantee of finding the global maximum
• Time taken for convergence
• Parameter calculation is trial-and-error based
• Unguided mutation
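A minimal GA loop of the kind used in the works above might look like the following. This is a generic sketch: the fitness function here is a toy bit-matching measure, not any specific author's distortion-based fitness:

```python
import random

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 1, 0, 0]  # toy optimum standing in for an ideal embedding pattern

def fitness(chromosome):
    # toy fitness: number of positions matching the target pattern
    return sum(int(g == t) for g, t in zip(chromosome, TARGET))

def crossover(a, b):
    point = random.randrange(1, len(a))  # single-point crossover
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.05):
    return [g ^ 1 if random.random() < rate else g for g in chromosome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # selection: keep the fittest half (elitism)
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

best = max(population, key=fitness)
print(fitness(best))  # converges toward 8, a perfect match
```

In a real steganographic setting, the fitness would measure stego distortion (e.g. inverse of MSE against the cover) and the chromosome would encode embedding positions or adjusted sample bits.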

Particle Swarm Optimization:
Advantages:
• Can be applied in both scientific and engineering research
• Only the most optimistic particle transmits information, so it is fast
• Parameter calculation is simple
• Lower computational cost
Disadvantages:
• Less exact in the regulation of speed and direction
• Cannot work out problems of scattering
• Cannot work out problems of non-coordinate systems, such as solutions to an energy field
• Not exactly repeatable in terms of computational cost, making comparison hard

Voratas Kachitvichyanukul compared several evolutionary algorithms along these lines.
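The core PSO update can be sketched minimally as follows. This is a generic one-dimensional illustration with a toy quadratic objective, not tied to any steganographic fitness function:

```python
import random

random.seed(1)

def cost(x):
    return (x - 3.0) ** 2  # toy objective: minimum at x = 3

n, w, c1, c2 = 15, 0.7, 1.5, 1.5  # swarm size, inertia, cognitive/social weights
pos = [random.uniform(-10, 10) for _ in range(n)]
vel = [0.0] * n
pbest = pos[:]                    # each particle's personal best position
gbest = min(pbest, key=cost)      # swarm-wide best position

for _ in range(100):
    for i in range(n):
        r1, r2 = random.random(), random.random()
        vel[i] = (w * vel[i]
                  + c1 * r1 * (pbest[i] - pos[i])  # pull toward personal best
                  + c2 * r2 * (gbest - pos[i]))    # pull toward global best
        pos[i] += vel[i]
        if cost(pos[i]) < cost(pbest[i]):
            pbest[i] = pos[i]
    gbest = min(pbest, key=cost)

print(round(gbest, 3))  # close to 3.0
```

Note the contrast with GA: PSO has no crossover or mutation, only velocity updates driven by the personal and global bests, which is why its parameter calculation is simpler.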

SUPPORT VECTOR MACHINE
SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform the data and then, based on these transformations, finds an optimal boundary between the possible outputs. Essentially, it performs some extremely complex data transformations, then figures out how to separate the data according to the labels or outputs that have been defined.
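The kernel trick described above can be demonstrated on a deliberately non-linearly-separable layout. This sketch assumes scikit-learn and NumPy are available; the data set is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two classes of 2-D points: class 0 clustered near the origin,
# class 1 scattered on a ring of radius 3 around it.
inner = rng.normal(0.0, 0.5, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, 50)
ring = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)]) + rng.normal(0, 0.3, (50, 2))
X = np.vstack([inner, ring])
y = np.array([0] * 50 + [1] * 50)

# No straight line separates a disc from a surrounding ring, but the RBF
# kernel implicitly maps the points into a space where they are separable.
clf = SVC(kernel="rbf")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy; should be high on this easy layout
```

In the steganographic setting discussed in this paper, the feature vectors would instead be statistics of candidate embedding positions, and the classifier would guide where bits can be hidden with the least perceptible distortion.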

Advantages of using SVM:
• Suitable for linearly separable as well as non-separable data sets.

Support Vector Machines may be used to provide faster and unique results. The technique can be used to overcome the limitations of the traditional substitution technique. This paper may serve as a guide for developing a technique that is highly reliable and robust, with good data-hiding capacity, through the inclusion of the Support Vector Machine in the substitution method.