ABSTRACT:
Image question answering is a useful way of finding information about physical objects. Current question answering (QA) systems are text-based and can be difficult to use when a question involves an object with distinct visual features. An image QA system allows an image to be used directly to refer to the object. We develop a three-layer system architecture for image QA that brings together recent technical achievements in question answering and image matching. The first, template-based QA layer matches a query photo to online images and extracts structured data from multimedia databases to answer questions about the image. To simplify image matching, it exploits the question text to filter images based on categories and keywords. The second, information retrieval QA layer searches an internal repository of resolved photo-based questions to retrieve relevant answers. The third, human-computation QA layer leverages community experts to handle the most difficult cases. A series of experiments performed on a pilot dataset of 30,000 images of books, movie DVD covers, grocery items, and landmarks demonstrates the technical feasibility of this architecture. We present three prototypes to show how photo-based QA can be built into an online album, a text-based QA system, and a mobile application. We use R-CNN to identify objects in images and an LSTM to assemble vocabulary words into meaningful sentences.
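The three-layer fallback and the keyword-based filtering described above can be sketched as follows. This is a minimal illustration only, not the implementation used in the prototypes: the function names, the catalog record fields ("category", "keywords") and the convention that a layer returns None when it cannot answer are all assumptions introduced for this example.

```python
def filter_candidates(question, catalog):
    """Keep catalog images whose category or keywords appear in the question text.

    Each catalog entry is assumed (hypothetically) to be a dict with
    a "category" string and a "keywords" list.
    """
    words = set(question.lower().replace("?", " ").split())
    return [
        item for item in catalog
        if item["category"] in words or words & set(item["keywords"])
    ]


def answer_question(question, photo, layers):
    """Try each QA layer in order and return (layer_name, answer).

    `layers` is an ordered list of (name, callable) pairs, for example
    template-based QA, then information retrieval QA, then human computation.
    A layer signals "cannot answer" by returning None (an assumed convention).
    """
    for name, layer in layers:
        answer = layer(question, photo)
        if answer is not None:
            return name, answer
    return None, None
```

Ordering the layers from cheapest to most expensive means the community experts are only consulted when both automated layers fail to produce an answer.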
3.1 EXISTING SYSTEM:
The main reason this problem cannot be solved by building a standard convolutional network followed by a fully connected layer is that the length of the output layer is variable, not constant, because the number of occurrences of the objects of interest is not fixed. A naive approach would be to take different regions of interest from the image and use a CNN to classify the presence of the object within each region. The problem with this approach is that the objects of interest might have different spatial locations within the image and different aspect ratios. Hence, you would have to select a huge number of regions, and the computational cost would become prohibitive. Therefore, algorithms such as R-CNN and YOLO have been developed to find these occurrences, and to find them fast.
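To make the cost of the naive approach concrete, the sketch below counts how many candidate boxes an exhaustive sliding-window search would have to pass through the CNN. The function name, and the particular image size, scales, aspect ratios and stride in the usage note, are illustrative assumptions and are not taken from any specific detector.

```python
def count_sliding_windows(img_w, img_h, scales, aspect_ratios, stride):
    """Count the boxes an exhaustive sliding-window search would classify.

    scale  : side length in pixels of a square box with the same area
    ratio  : box width divided by box height
    stride : step in pixels between neighbouring box positions
    """
    total = 0
    for scale in scales:
        for ratio in aspect_ratios:
            box_w = int(round(scale * ratio ** 0.5))
            box_h = int(round(scale / ratio ** 0.5))
            if box_w > img_w or box_h > img_h:
                continue  # this box shape does not fit inside the image
            positions_x = (img_w - box_w) // stride + 1
            positions_y = (img_h - box_h) // stride + 1
            total += positions_x * positions_y
    return total
```

For example, count_sliding_windows(640, 480, [64, 128, 256], [0.5, 1.0, 2.0], 8) already yields many thousands of boxes for a single modest image, and every added scale or aspect ratio increases the count further.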
DISADVANTAGES OF EXISTING SYSTEM:
• The objects of interest might have different spatial locations within the image and different aspect ratios, so the naive region-by-region approach has to classify a very large number of regions, which is computationally expensive.
3.2 PROPOSED SYSTEM:
To bypass the problem of selecting a huge number of regions, Ross Girshick et al. proposed a method that uses the selective search algorithm to extract just 2000 regions from the image, which they called region proposals. Instead of trying to classify a huge number of regions, you can now work with just these 2000 region proposals.
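The step that follows region proposal can be sketched as below: keep at most 2000 proposals, crop each one from the image and warp it to a fixed size so that every region can be fed to the same CNN. This is a minimal sketch under stated assumptions, not the original R-CNN code. The proposals are taken as a ready-made list of (x, y, w, h) boxes (selective search itself is not implemented here), the 227-pixel input size and the nearest-neighbour resize are choices made for this example, and the function names are hypothetical.

```python
import numpy as np

MAX_PROPOSALS = 2000  # cap on region proposals, as described in the text
WARP_SIZE = 227       # assumed CNN input side; not specified in this document


def warp_region(image, box, size=WARP_SIZE):
    """Crop box (x, y, w, h) from an H x W x C image and resize it to size x size.

    Uses nearest-neighbour sampling so that only NumPy is required.
    """
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return crop[rows[:, None], cols]


def prepare_regions(image, proposals, max_proposals=MAX_PROPOSALS):
    """Return an array of shape (N, size, size, C) with N <= max_proposals.

    Assumes at least one proposal is supplied.
    """
    kept = proposals[:max_proposals]
    return np.stack([warp_region(image, box) for box in kept])
```

Each row of the returned batch would then be passed to the CNN for feature extraction and classification, so the number of CNN evaluations per image is bounded by the proposal cap rather than by the number of possible windows.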
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
• Processor : Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Windows Keyboard