ABSTRACT :
This Research to Practice Work in Progress Paper presents a token-based approach to detecting plagiarism in university courses with hardware programming assignments. Detecting plagiarism manually is a difficult and time-consuming task. Over the last two decades, various plagiarism detection tools have been developed. Their techniques can be broadly divided into four categories: textual matching, program dependence graph comparison, abstract syntax tree analysis, and low-level code comparison. Although there has been much research on detecting code clones in software programming languages (e.g., BASIC, C/C++, Java, and Python), research focused on hardware description languages is still lacking. Building on the effectiveness of simhash, a locality-sensitive hash function commonly used to detect near-duplicate pages in web crawling, we propose an improved real-time plagiarism detection approach for Verilog HDL (hardware description language) programming assignments. The core detection steps are extracting weighted tokens from the source code as a high-dimensional feature and mapping that feature to an f-bit fingerprint with simhash. To account for the syntactic characteristics of Verilog HDL, a token extraction strategy was designed to maximize the useful information that a fixed-length hash value can represent. Experiments on real course data sets were conducted to compare the performance of the token-based approach with that of an existing plagiarism detection tool, Moss. The results show that the token-based approach is capable of detecting plagiarism in digital designs for both online queries and batch queries. Furthermore, it enables incremental plagiarism detection for a single submission without excessive overhead. Finally, we discuss the limitations of the current approach and directions for future research.
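The core steps named above (extract weighted tokens, then map them to an f-bit fingerprint with simhash) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' token extraction strategy: the keyword subset, the mapping of identifiers to ID and numbers to NUM, the use of token bigrams as features, the keyword weight of 2.0, f = 64, and the blake2b feature hash are all hypothetical choices made for the example.

```python
import hashlib
import re
from collections import Counter

F = 64  # fingerprint length in bits (assumed; the abstract leaves f open)

# Illustrative subset of Verilog keywords (an assumption, not the authors' list).
KEYWORDS = {
    "module", "endmodule", "input", "output", "wire", "reg", "assign",
    "always", "posedge", "negedge", "begin", "end", "if", "else",
    "case", "endcase", "parameter",
}
# Hypothetical weights: features containing a keyword count double.
KEYWORD_WEIGHT, OTHER_WEIGHT = 2.0, 1.0
# Identifiers, sized literals such as 4'b1010, plain numbers, common operators, other symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+'[bodhBODH]\w+|\d+|<=|==|!=|&&|\|\||\S")


def preprocess(source: str) -> str:
    """Remove // and /* */ comments and collapse whitespace.
    Comment markers inside string literals are not handled."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    return " ".join(source.split())


def normalize(token: str) -> str:
    """Keep keywords and operators; map identifiers to ID and numbers to NUM."""
    if token in KEYWORDS:
        return token
    if token[0].isdigit():
        return "NUM"
    if token[0].isalpha() or token[0] == "_":
        return "ID"
    return token


def extract_weighted_tokens(source: str) -> dict[str, float]:
    """Build weighted token-bigram features from the normalized token stream."""
    tokens = [normalize(t) for t in TOKEN_RE.findall(preprocess(source))]
    features: dict[str, float] = {}
    for (a, b), count in Counter(zip(tokens, tokens[1:])).items():
        weight = KEYWORD_WEIGHT if a in KEYWORDS or b in KEYWORDS else OTHER_WEIGHT
        features[f"{a} {b}"] = count * weight
    return features


def token_hash(feature: str) -> int:
    """64-bit hash of a feature string (blake2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(features: dict[str, float]) -> int:
    """Charikar-style simhash: weighted vote per bit, keep bits with a positive total."""
    totals = [0.0] * F
    for feature, weight in features.items():
        h = token_hash(feature)
        for i in range(F):
            # Add the weight where bit i of the feature hash is 1, subtract it where it is 0.
            totals[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(F) if totals[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

For example, two copies of a D flip-flop that differ only in comments and signal names produce the same normalized token stream under this sketch, and therefore the same fingerprint (Hamming distance 0). Normalizing identifiers is one plausible way to make renaming ineffective; weighting keyword-bearing features is one way to let structural tokens dominate the fixed-length hash.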
PROPOSED SYSTEM :
Source files from each submission are fed into the plagiarism detection process as input, and suspected plagiarism pairs are produced as output. The token-based plagiarism detection approach consists of the following steps: pre-processing the source files, extracting tokens, fingerprinting with simhash, indexing, and clustering the fingerprints to obtain the result.
EXISTING SYSTEM :
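The indexing and clustering steps are not specified in detail here, so the following is only a sketch under assumptions: fingerprints are 64-bit integers, two submissions count as a suspected pair when their fingerprints differ in at most k = 3 bits (an assumed threshold), and the index is a simplified variant of the block tables used for simhash near-duplicate detection. Splitting each fingerprint into k + 1 blocks guarantees, by the pigeonhole principle, that any pair within distance k shares at least one identical block, so only bucket-mates need a full comparison. The names FingerprintIndex and cluster are hypothetical.

```python
from collections import defaultdict


class FingerprintIndex:
    """Index f-bit fingerprints in k + 1 block tables (hypothetical design)."""

    def __init__(self, f: int = 64, k: int = 3):
        self.k = k
        self.num_blocks = k + 1
        self.width = -(-f // self.num_blocks)  # ceiling division
        self.tables = [defaultdict(list) for _ in range(self.num_blocks)]
        self.fingerprints: dict[str, int] = {}

    def _block(self, fp: int, i: int) -> int:
        """Extract block i (a run of `width` bits) from the fingerprint."""
        return (fp >> (i * self.width)) & ((1 << self.width) - 1)

    def query(self, fp: int) -> list[str]:
        """Online query: ids of indexed fingerprints within Hamming distance k."""
        candidates = set()
        for i, table in enumerate(self.tables):
            candidates.update(table.get(self._block(fp, i), []))
        return sorted(c for c in candidates
                      if bin(self.fingerprints[c] ^ fp).count("1") <= self.k)

    def add(self, sub_id: str, fp: int) -> None:
        """Insert a submission's fingerprint into every block table."""
        for i, table in enumerate(self.tables):
            table[self._block(fp, i)].append(sub_id)
        self.fingerprints[sub_id] = fp


def cluster(fingerprints: dict[str, int], f: int = 64, k: int = 3) -> list[list[str]]:
    """Batch query: insert submissions one by one and union near-duplicate pairs."""
    index = FingerprintIndex(f, k)
    parent = {s: s for s in fingerprints}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for sub_id, fp in fingerprints.items():
        for other in index.query(fp):  # compare only against earlier submissions
            parent[find(sub_id)] = find(other)
        index.add(sub_id, fp)

    groups = defaultdict(list)
    for s in fingerprints:
        groups[find(s)].append(s)
    # Only groups with at least two members are suspected plagiarism clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

In this design a single new submission is checked by calling query() before add(), which touches only a few buckets rather than every earlier submission; this is one way the incremental, per-submission check described in the abstract could be realized.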
As computer science education has developed, various plagiarism detection tools have been proposed. C. Liu et al. developed GPLAG, a tool that uses program dependence graphs (PDGs) to detect plagiarism [4]; for better search efficiency, they also designed a statistical lossy filter. In theory, GPLAG can easily be extended to any programming language given a suitable parsing front end, but it currently supports only C/C++ and Java. L. Prechelt et al. developed JPlag [6], a web service based on the Greedy String Tiling algorithm [5], which compares programs pairwise after converting each one into a string of canonical tokens. C. Zhao et al. proposed BuaaSim, which combines compiler optimization and disassembly techniques to detect plagiarism in C/C++ programs [7]; their experiments showed that it performed significantly better than JPlag in practice. A. Aiken at Stanford University provides Moss [8], a web-based tool that can analyse code written in 23 languages, including VHDL and Verilog HDL. Moss measures the similarity between programs with a document fingerprinting algorithm called winnowing [9].
Overall, GPLAG and BuaaSim both require the input programs to be compilable, and JPlag's pairwise comparison strategy makes it relatively slow when processing many samples. As far as we know, Moss is the only tool that supports almost all programming languages. However, it is a closed-source system that supports only batch queries, which makes it unsuitable for retrieving real-time information about potential plagiarism after a submission.
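To contrast Moss's fingerprinting with the single-fingerprint simhash approach, the following sketch illustrates the winnowing idea [9]: hash every k-gram of the normalized text, then keep the minimum hash in each window of w consecutive hashes, choosing the rightmost minimum on ties. It is not Moss's implementation; the whitespace-stripping normalization, the blake2b k-gram hash, the parameters k = 5 and w = 4, and the Jaccard score in similarity() are illustrative assumptions.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of the text (64-bit blake2b, an arbitrary choice)."""
    return [
        int.from_bytes(hashlib.blake2b(text[i:i + k].encode("utf-8"), digest_size=8).digest(), "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes: list[int], w: int) -> set[tuple[int, int]]:
    """Keep the minimum hash of every window of w consecutive hashes.

    Ties go to the rightmost minimum, and each (hash, position) is recorded once,
    so consecutive windows sharing the same minimum yield a single fingerprint.
    """
    if 0 < len(hashes) < w:
        w = len(hashes)  # treat a short input as a single window
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = max(i for i, h in enumerate(window) if h == smallest)
        selected.add((smallest, start + offset))
    return selected


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard overlap of winnowed fingerprint values after stripping whitespace."""
    norm_a = "".join(a.split()).lower()
    norm_b = "".join(b.split()).lower()
    fa = {h for h, _ in winnow(kgram_hashes(norm_a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(norm_b, k), w)}
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0
```

Winnowing keeps a set of fingerprints per document whose size grows with the document, whereas simhash compresses each submission into one fixed-length value; the latter arguably lends itself more directly to indexed, per-submission queries.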
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
• Processor : Intel Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Windows Keyboard