ABSTRACT:
The prevailing text steganalysis methods detect steganographic communication by extracting hand-crafted features and classifying them with an SVM. However, because these features are designed around the statistical changes caused by steganography, they are difficult to adapt to different kinds of embedding algorithms, and their detection performance depends heavily on the text size. In this letter, we propose a novel text steganalysis model based on a convolutional neural network (CNN), which captures complex dependencies and learns feature representations automatically from the texts. First, a word embedding layer extracts the semantic and syntactic features of words. Second, rectangular convolution kernels of different sizes learn the sentence features. To further improve performance, we present a decision strategy for detecting long texts. Experimental results show that the proposed method can effectively detect different kinds of text steganographic algorithms and achieves comparable or superior performance over a wide variety of text sizes compared with previous methods.
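The embedding-then-convolution pipeline described above can be illustrated with a small, untrained forward pass. This is a minimal sketch in pure Python, not the authors' network: it assumes that "rectangular" means each kernel is h words tall and spans the full embedding width, and it assumes a ReLU, max-over-time pooling and a sigmoid output. The class name TinyTextCNN, the embedding size, the kernel heights (3, 4, 5) and the filter count are hypothetical choices, and the weights are random.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix with small values (untrained, illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyTextCNN:
    """Hypothetical TextCNN-style forward pass: embedding -> rectangular conv -> pooling -> sigmoid."""

    def __init__(self, vocab_size, embed_dim=8, kernel_heights=(3, 4, 5),
                 filters_per_height=2, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        # Embedding table: one embed_dim-dimensional vector per word index.
        self.embedding = make_matrix(vocab_size, embed_dim, rng)
        # Each kernel is h x embed_dim, i.e. rectangular and as wide as the embedding.
        self.kernels = []
        for h in kernel_heights:
            for _ in range(filters_per_height):
                self.kernels.append((make_matrix(h, embed_dim, rng), rng.uniform(-0.1, 0.1)))
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(len(self.kernels))]
        self.out_bias = 0.0

    def features(self, indices):
        """Return one pooled feature per kernel for a sequence of word indices."""
        x = [self.embedding[i] for i in indices]  # sentence matrix: len(indices) x embed_dim
        pooled = []
        for kernel, bias in self.kernels:
            h = len(kernel)
            activations = []
            # Slide the kernel down the sentence one word at a time.
            for start in range(len(x) - h + 1):
                s = bias
                for r in range(h):
                    for c in range(self.embed_dim):
                        s += kernel[r][c] * x[start + r][c]
                activations.append(max(0.0, s))  # ReLU
            # Max-over-time pooling; a sentence shorter than h yields 0.0.
            pooled.append(max(activations) if activations else 0.0)
        return pooled

    def predict_proba(self, indices):
        """Sigmoid score that the sequence is a steg text (meaningless until trained)."""
        feats = self.features(indices)
        z = self.out_bias + sum(w * f for w, f in zip(self.out_weights, feats))
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    model = TinyTextCNN(vocab_size=20)
    sentence = [1, 5, 7, 2, 9, 3]  # word indices from the dictionary
    print(len(model.features(sentence)))  # prints 6 (3 kernel heights x 2 filters each)
    print(model.predict_proba(sentence))
```

Using several kernel heights lets each filter respond to word n-grams of a different length; concatenating the pooled outputs gives a fixed-size sentence feature regardless of sentence length. A real implementation would train these weights, for example with a deep learning framework.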
EXISTING SYSTEM :
…the steg texts by SVM. Chen et al. [2] use context clustering to extract statistical features of context fitness values, based on the mismatch between a replaced word and its context. Xiang et al. [3] construct the feature vector from the relative frequency of attribute pairs, each consisting of the synonym position and the number of its synonyms, based on the observation that the number of high-frequency words is always reduced after embedding. For detecting generation-based steganography, Yang and Cao [4] use meta features (such as word length, space rate and word frequency) together with an immune mechanism to select the proper features. Chen et al. [5] propose a steganalysis scheme (NFZ-WDA) that models language structure based on the word distribution in different natural frequency zones.

Although many promising steganalysis methods have been proposed, most of them are designed for a specific kind of steganographic method. Furthermore, they rely on domain knowledge such as synonym dictionaries and word frequency. Moreover, they are based on statistical characteristics that must be obtained from a large corpus; as a result, they perform poorly on short texts. This motivates us to design a universal CNN-based steganalysis model that adapts well to different types of text steganography and can detect texts of various lengths.
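To make the notion of hand-crafted meta features concrete, the sketch below computes three statistics of the kind Yang and Cao [4] mention. The exact definitions used in [4] are not given here, so these are assumptions: average word length in characters, space rate as the fraction of characters that are spaces, and mean word frequency as the average within-text relative frequency of each token. The function name meta_features is hypothetical, and the immune-mechanism feature selection is not shown.

```python
from collections import Counter


def meta_features(text):
    """Hypothetical meta features in the spirit of Yang and Cao [4] (definitions assumed).

    - avg_word_length: mean number of characters per word
    - space_rate: fraction of characters in the text that are spaces
    - mean_word_freq: mean relative frequency (within this text) of each token
    """
    words = text.lower().split()
    if not words:
        return {"avg_word_length": 0.0, "space_rate": 0.0, "mean_word_freq": 0.0}
    counts = Counter(words)
    total = len(words)
    avg_len = sum(len(w) for w in words) / total
    space_rate = text.count(" ") / len(text)
    # Each token contributes its word's relative frequency; then average over tokens.
    mean_freq = sum(counts[w] / total for w in words) / total
    return {
        "avg_word_length": avg_len,
        "space_rate": space_rate,
        "mean_word_freq": mean_freq,
    }


if __name__ == "__main__":
    print(meta_features("the cat saw the dog"))
```

In the existing pipeline, vectors like this would then be classified by an SVM. Because every statistic is an average over the text, a short text gives only a handful of samples per feature, which illustrates why such features need a large amount of text to be reliable.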
EXISTING SYSTEM DISADVANTAGES:
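1. LESS ACCURACY: the statistical features must be estimated from a large corpus, so detection performance is poor on short texts.
2. LOW EFFICIENCY: each method is designed for a specific kind of steganographic algorithm and relies on domain knowledge (synonym dictionaries, word frequency), so features must be hand-crafted again for other embedding algorithms.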
PROPOSED SYSTEM :
In this section, we first introduce the overall architecture of the proposed text steganalysis model, which can detect both long and short steg texts. We then describe the structure of the CNN in detail and discuss some design considerations. Finally, we present the decision strategy the model uses to detect long steg texts.

A. Overall Architecture

Fig. 1 illustrates the overall architecture of our text steganalysis model. Short texts are pre-processed by word segmentation, lowercasing, building a dictionary from the training set, and encoding each word as an index according to its position in that dictionary. The resulting index sequences are fed to the CNN (described in Part B), which learns the feature representations and outputs the predicted labels directly.

Long texts (paragraphs, chapters, books) vary widely in length, which is not conducive to training the CNN. We therefore split each long text into its component sentences, which have a relatively consistent length, before pre-processing, and process each sentence individually. In the training phase, the CNN is trained on these sentence sequences and their corresponding labels. In the testing phase, the trained CNN predicts a set of labels for the sentences of a long text, and a decision strategy (described in Part C) combines them into the final decision.

Fig. 2. Proposed CNN architecture.
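The pre-processing steps and the long-text flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: word segmentation is a simple regex over letters, digits and apostrophes; dictionary positions follow order of first appearance, with indices 0 (padding) and 1 (unknown word) reserved; sentences are split at ".", "!" or "?" followed by whitespace. Part C's actual decision strategy is not described in this section, so decide_long_text uses a plainly swapped-in rule, a threshold on the fraction of sentences labeled steg (0.5 gives a majority vote). All function names and constants are hypothetical.

```python
import re

PAD, UNK = 0, 1  # hypothetical reserved indices for padding and unknown words


def segment(text):
    """Word segmentation + lowercasing (assumed: regex tokens of letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_dictionary(training_texts):
    """Map each training word to its dictionary position (order of first appearance, assumed)."""
    vocab = {}
    for text in training_texts:
        for word in segment(text):
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # positions 0 and 1 are reserved
    return vocab


def encode(text, vocab, max_len=None):
    """Encode a text as dictionary indices; unseen words map to UNK."""
    ids = [vocab.get(w, UNK) for w in segment(text)]
    if max_len is not None:
        ids = (ids + [PAD] * max_len)[:max_len]  # truncate or right-pad to max_len
    return ids


def split_sentences(long_text):
    """Split a long text at '.', '!' or '?' followed by whitespace (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", long_text.strip())
    return [p for p in parts if p]


def decide_long_text(long_text, vocab, sentence_classifier, threshold=0.5):
    """Hypothetical stand-in for the Part C decision strategy.

    Labels each sentence with sentence_classifier (1 = steg, 0 = cover) and flags the
    whole text as steg if the fraction of steg sentences is >= threshold.
    """
    sentences = split_sentences(long_text)
    if not sentences:
        return 0
    labels = [sentence_classifier(encode(s, vocab)) for s in sentences]
    return int(sum(labels) / len(labels) >= threshold)


if __name__ == "__main__":
    vocab = build_dictionary(["The cat sat.", "A dog ran!"])
    print(vocab)  # {'the': 2, 'cat': 3, 'sat': 4, 'a': 5, 'dog': 6, 'ran': 7}
    print(encode("The bird sat", vocab, max_len=5))  # [2, 1, 4, 0, 0]
```

In practice, sentence_classifier would wrap the trained CNN and return its predicted label for one encoded sentence; any callable with that shape works here, which keeps the long-text decision separate from the model itself.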
PROPOSED SYSTEM ADVANTAGES:
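1. HIGH ACCURACY: comparable or superior detection performance across different kinds of steganographic algorithms and a wide variety of text sizes, as reported in the experiments.
2. HIGH EFFICIENCY: feature representations are learned automatically from the texts, without hand-crafted features or domain knowledge such as synonym dictionaries.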
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
• Processor : Intel Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Windows keyboard