ABSTRACT
While keyword queries empower ordinary users to search vast amounts of data, their ambiguity makes them difficult to answer effectively, especially when they are short and vague. To address this problem, we propose an approach that automatically diversifies XML keyword search based on the query's different contexts in the XML data. Given a short and vague keyword query and the XML data to be searched, we first derive keyword search candidates for the query using a simple feature selection model. We then design an effective XML keyword search diversification model to measure the quality of each candidate. Next, we propose two efficient algorithms that incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates should be the most relevant to the given query, and together they should cover the maximal number of distinct results. Finally, a comprehensive evaluation on real and synthetic data sets demonstrates the effectiveness of the proposed diversification model and the efficiency of the algorithms.
Existing System with Limitations
Keyword queries are a powerful tool for users to search vast amounts of data. However, existing systems that rely solely on keyword queries face several significant limitations:
- Ambiguity of Keyword Queries: Keyword queries, especially short and vague ones, are often ambiguous. This makes it challenging to return accurate and relevant search results.
- Lack of Context Consideration: Traditional keyword search mechanisms do not adequately consider the different contexts in which a keyword can appear within XML data, leading to less effective search results.
- Limited Search Intent Diversification: Existing systems typically focus on returning the most relevant results based on the keyword query without diversifying the search to cover different possible user intents.
- Inefficiency in Handling Large Data Sets: As the size of the XML data grows, existing keyword search systems struggle with efficiency, making it difficult to provide timely and relevant search results.
- Poor Relevance and Coverage Balance: Current systems often fail to balance relevance and coverage, resulting in search results that may be relevant but lack diversity in capturing different aspects of the query.
Proposed System with Advantages
The proposed system introduces a novel approach to automatically diversify XML keyword search based on different contexts within the XML data, addressing the limitations of existing systems:
- Context-Aware Search Diversification: The system derives keyword search candidates by considering the various contexts in the XML data, enhancing the relevance and accuracy of the search results.
- Feature Selection Model: A simple feature selection model is used to generate keyword search candidates, improving the initial search scope and capturing different query interpretations.
- XML Keyword Search Diversification Model: An effective model is designed to measure the quality of each keyword search candidate, ensuring that the most relevant and contextually diverse candidates are considered.
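The two selection criteria (relevance to the original query and coverage of distinct results) can be illustrated with a minimal greedy sketch. This is not the diversification model or either of the two algorithms from the paper. The class DiversifiedTopK, the Candidate type, the relevance values, and the scoring rule (relevance multiplied by the number of not-yet-covered results) are all assumptions made for illustration, and relevance is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DiversifiedTopK {

    /** Hypothetical query candidate: a refined query, its relevance, and the IDs of the results it returns. */
    static final class Candidate {
        final String query;
        final double relevance;       // assumed to be non-negative
        final Set<String> resultIds;

        Candidate(String query, double relevance, String... resultIds) {
            this.query = query;
            this.relevance = relevance;
            this.resultIds = new HashSet<>(Arrays.asList(resultIds));
        }
    }

    /**
     * Greedily picks up to k candidates. In each round the candidate with the
     * highest (relevance * number of new results) is chosen, where "new" means
     * not returned by any previously chosen candidate. This scoring rule is an
     * illustrative assumption, not the model from the source.
     */
    static List<String> select(List<Candidate> candidates, int k) {
        List<Candidate> remaining = new ArrayList<>(candidates);
        Set<String> covered = new HashSet<>();
        List<String> selected = new ArrayList<>();

        while (selected.size() < k && !remaining.isEmpty()) {
            Candidate best = null;
            double bestScore = -1.0;
            for (Candidate c : remaining) {
                int newResults = 0;
                for (String id : c.resultIds) {
                    if (!covered.contains(id)) {
                        newResults++;
                    }
                }
                double score = c.relevance * newResults;
                if (score > bestScore) {
                    bestScore = score;
                    best = c;
                }
            }
            if (bestScore <= 0.0) {
                break; // no remaining candidate contributes a new result
            }
            selected.add(best.query);
            covered.addAll(best.resultIds);
            remaining.remove(best);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up candidates for the vague query "database".
        List<Candidate> candidates = Arrays.asList(
            new Candidate("database query", 0.9, "r1", "r2", "r3"),
            new Candidate("database query processing", 0.8, "r1", "r2"),
            new Candidate("database design", 0.6, "r4", "r5"));
        System.out.println(select(candidates, 2));
    }
}
```

In this made-up example the second candidate is more relevant than the third, but every result it returns is already covered by the first pick, so the sketch prefers the third candidate. The method may return fewer than k queries when no remaining candidate adds a new result.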
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Web Technologies : HTML, CSS, JS, JSP
• Programming Language : Java and J2EE
• Database Connectivity : JDBC
• Backend Database : MySQL
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
- Processor : Intel Core i3
- RAM Capacity : 2 GB
- Hard Disk : 250 GB
- Monitor : 15″ Color
- Mouse : Two or Three Button Mouse
- Keyboard : Standard Windows-compatible keyboard