BAG-OF-DISCRIMINATIVE-WORDS (BODW) REPRESENTATION VIA TOPIC MODELING

ABSTRACT:

Many words in a given document either deliver facts (objective) or express opinions (subjective), depending on the topics they are involved in. For example, given a collection of documents, the word “bug” assigned to the topic “order Hemiptera” clearly refers to an object (i.e., a kind of insect), while the same word assigned to the topic “software” probably conveys a negative opinion. Motivated by the intuitive assumption that different words have varying degrees of discriminative power in delivering an objective or a subjective sense with respect to their assigned topics, a model named discriminatively objective-subjective LDA (dosLDA) is proposed in this paper. The essential idea underlying the proposed dosLDA is that a pair of objective and subjective selection variables is explicitly employed to encode the interplay between topics and discriminative power for the words in documents in a supervised manner. As a result, each document is appropriately represented as a “bag-of-discriminative-words” (BODW). Experiments on documents and images demonstrate that dosLDA not only performs competitively against traditional approaches in terms of topic modelling and document classification, but also has the ability to discern the discriminative power of each word, in terms of its objective or subjective sense, with respect to its assigned topic.

EXISTING SYSTEM:

The two most successful and representative works in topic modelling are probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA). As the first topic model, pLSA evolved from latent semantic analysis (LSA) and is able to capture the hidden semantics conveyed by different words via a probabilistic generative process over the documents. In pLSA, documents are projected into a low-dimensional topic space by assigning each word a latent topic, where each topic is represented as a multinomial distribution over a fixed vocabulary. The LDA model inherits the notion of pLSA, but it places an extra generative process on the topic proportions of each document and models the whole corpus via a hierarchical Bayesian framework. In fact, pLSA turns out to be a special case of LDA with a uniform Dirichlet prior under maximum a posteriori estimation, while LDA is better suited to modelling large-scale document collections thanks to its well-defined prior. In the past decade, the LDA model has been intensively studied and widely applied to many different tasks.
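To make the generative picture concrete, the following is a minimal sketch of fitting a plain (unsupervised) LDA topic model with scikit-learn; the toy corpus and the number of topics are illustrative assumptions, not part of the paper.

```python
# Minimal sketch: unsupervised LDA with scikit-learn (illustrative toy corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the bug crawled across the leaf of the plant",
    "a bug in the software crashed the release build",
    "the insect order Hemiptera includes many true bugs",
    "the developers patched the software bug quickly",
]

# Bag-of-words counts over a fixed vocabulary
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Each document becomes a low-dimensional topic-proportion vector;
# each topic is a multinomial distribution over the vocabulary.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)   # shape: (n_docs, n_topics)
topic_word = lda.components_       # shape: (n_topics, vocab_size)

vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(topic_word):
    top_words = [vocab[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_words}")
```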

The BoW representation disregards the linguistic structure among the words. Learned in such an unsupervised manner, the document representations produced by LDA are often found to be only weakly predictive. From a pure prediction viewpoint, unsupervised LDA ignores the nature of the discriminative task of interest, such as classification, and thus provides no guarantee that the extracted information will be effective. To alleviate this limitation, many approaches exploit useful auxiliary information (e.g., category labels or ratings provided by the authors) when modelling the corresponding documents in a supervised manner. In such variants of LDA, the auxiliary information is usually treated as a response variable predicted from the latent representation of the document (i.e., its topic proportions), so that the topic assignments of the words take effect instead of the words themselves. In other words, a “Bag-of-Topics” (BoT) representation takes the place of the traditional BoW representation to better characterize massive document collections in predictive tasks such as regression and classification. The most representative models proposed under the BoT notion are supervised LDA (sLDA), the scene-understanding model, multi-class sLDA, and τLDA.
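The BoT idea can be sketched as follows: each document is reduced to its topic proportions, and those proportions feed a classifier. This is a simple two-stage approximation rather than the joint training used by sLDA, and the documents and labels are illustrative assumptions.

```python
# Sketch of the "Bag-of-Topics" idea: topic proportions as classification features.
# Two-stage pipeline, NOT the joint sLDA model; data and labels are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "the bug crawled across the leaf",          # nature
    "hemiptera insects feed on plant sap",      # nature
    "the software bug crashed the build",       # software
    "the team patched the bug in the release",  # software
]
labels = [0, 0, 1, 1]

bot_classifier = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=2, random_state=0),  # BoT features
    LogisticRegression(),                                       # response model
)
bot_classifier.fit(docs, labels)
print(bot_classifier.predict(["a new bug appeared in the software"]))
```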

DISADVANTAGES:

  • From a pure prediction viewpoint, unsupervised LDA ignores the nature of the discriminative task of interest, such as classification, and thus provides no guarantee that the extracted information will be effective.
  • In the BoT representation, the topic assignments of the words take effect instead of the words themselves.

PROPOSED SYSTEM:

The proposed work is an approach named discriminatively objective-subjective LDA (dosLDA). The essential idea underlying it is that a pair of objective and subjective selection variables is explicitly employed to encode the interplay between topics and discriminative power with respect to the words in a supervised manner. dosLDA has the attractive ability to naturally select those words that are discriminative in delivering either an objective or a subjective sense in a given document, and it generates a novel “bag-of-discriminative-words” (BODW) representation for each document, as illustrated in the figure. Several experiments demonstrate that the proposed BODW representation is more predictive for discriminative tasks than the traditional BoW and BoT representations employed in current methods.
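The following is a highly simplified sketch of the BODW intuition only: keep the words judged discriminative for the supervised task and re-represent each document over those words alone. Here “discriminative power” is approximated by a chi-squared score against the class label; the actual dosLDA instead learns per-word objective/subjective selection variables jointly with the topic assignments. The corpus, labels, and the value of k are illustrative assumptions.

```python
# Simplified BODW-style sketch: chi-squared feature selection stands in for the
# learned objective/subjective selection variables of dosLDA (assumption).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = [
    "the insect bug crawled on the leaf",
    "hemiptera bugs feed on plants",
    "the annoying software bug broke the build",
    "users complained about the terrible bug in the app",
]
labels = [0, 0, 1, 1]  # 0 = objective/nature, 1 = subjective/software complaint

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Keep the k words most associated with the labels -> "bag of discriminative words"
selector = SelectKBest(chi2, k=4).fit(X, labels)
kept = vectorizer.get_feature_names_out()[selector.get_support()]
print("discriminative words:", list(kept))

X_bodw = selector.transform(X)  # documents re-represented over the kept words only
```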

ADVANTAGES

  • The bag-of-discriminative-words representation is very effective when it comes to analysing the document or image itself.
  • For images, the system collects comments from users in order to involve them and capture their views about the image, from which the sentiment of the image can be inferred.

SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : TKInter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10

HARDWARE REQUIREMENTS:

• Processor : Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Keyboard

For More Details of Project Document, PPT, Screenshots and Full Code
Call/WhatsApp – 9966645624
Email – info@srithub.com
