Abstract:
Vision impairment or blindness is one of the ten most common disabilities in humans, and unfortunately, India is home to the world’s largest visually impaired population. In this study, we present a novel framework to assist the visually impaired in object detection and recognition, so that they can navigate independently and be aware of their surroundings. The framework employs transfer learning on a Single Shot Detector (SSD) for object detection and classification, followed by recognition of human faces and currency notes, if detected, using the Inception v3 model. The SSD detector is trained on a modified PASCAL VOC 2007 dataset to which a new class is added, enabling the detection of currency as well. Furthermore, separate Inception v3 models are trained to recognize human faces and currency notes, making the framework scalable and adaptable to user preferences. Finally, the output of the framework is presented to the visually impaired person in audio format. The mean average precision (mAP) of the standalone SSD detector on the added currency class was 67.8 percent, and the testing accuracies of the Inception v3 models for person and currency recognition were 92.5 and 90.2 percent, respectively.
Introduction
Visually impaired people face numerous difficulties in their daily lives. Statistics published by the World Health Organization (WHO) in 2019 reveal that, globally, around 2.2 billion individuals are affected by vision impairment. Detecting and recognizing common objects in the surroundings is a daunting task for visually impaired individuals. They rely either on other people, which makes them dependent, or on their senses of touch and smell to detect objects, which is highly inaccurate and can be hazardous in some cases.
The white cane is the most popular navigation aid for the blind. It has been improved by adding ultrasonic and infrared (IR) sensors that detect obstacles in the vicinity of the visually impaired user and provide feedback in the form of vibration or sound. Although this approach aids the mobility of the user, it provides little or no information about the surroundings. For the user to have a better understanding of the surroundings, object detection and classification, followed by recognition and audio feedback, is crucial.
Neural networks, particularly convolutional neural networks (CNNs), have shown promising results in object detection, classification, and recognition tasks on images. In [1], the authors use a feed-forward neural network to provide speech suggestions about shopping products. A real-time smartphone-based obstacle detection and classification system is implemented in [2]. The detection process involves interest point extraction and tracking through the multiscale Lucas-Kanade algorithm, background motion estimation using homographic transforms and an agglomerative clustering technique, followed by classification using Histogram of Oriented Gradients (HOG) descriptors quantised into a Bag of Visual Words (BoVW). A survey of Electronic Travel Aids (ETAs) designed for visually impaired navigation assistance is presented in [3]. Various ETAs, their strengths, and their shortcomings are discussed and compared feature-wise. The survey also highlights that no current system incorporates all necessary features and that any technology should not attempt to replace the cane but should complement it with proper alerting and feedback.
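As an illustration of the kind of pipeline used in [2], the following is a minimal sketch, in Python with OpenCV, of pyramidal (multiscale) Lucas-Kanade interest point tracking followed by HOG description. The frame file names and parameter values are illustrative assumptions, and the BoVW quantisation step is only indicated in a comment.

import cv2

# Two consecutive frames from the smartphone camera (file names are assumptions).
prev_gray = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Interest point extraction (Shi-Tomasi corners).
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)

# Multiscale (pyramidal) Lucas-Kanade optical flow tracking of those points.
tracked, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, points, None,
                                                winSize=(21, 21), maxLevel=3)

# HOG descriptor of a candidate obstacle region (default 64x128 detection window);
# in [2] such descriptors are then quantised into a Bag of Visual Words histogram.
patch = cv2.resize(next_gray, (64, 128))
hog = cv2.HOGDescriptor()
descriptor = hog.compute(patch)
print(descriptor.shape, int(status.sum()), "points tracked")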
A novel deep architecture for the visually impaired, employing late fusion of two parallel CNNs, outperforms state-of-the-art methods for activity recognition [4]. The two CNNs, GoogLeNet and AlexNet, complement each other in identifying different features of the same class; hence the input video is fed to both of them, and the output class scores are combined using a Support Vector Machine (SVM). Yet another method proposed in [5] uses a CNN followed by a recurrent neural network (RNN) and a SoftMax classifier for object detection, and Hue, Saturation and Intensity (HSI) colour thresholding for colour recognition. An approach combining computer vision and deep learning techniques for an outdoor navigation assistant for the visually impaired is shown in [6]. The system uses a regression-based mechanism for object tracking without a priori information, handles sudden camera movements, and exploits You Only Look Once (YOLO) for object recognition. A smartphone app for guiding visually impaired persons is designed in [7]. It can operate in two modes, online and offline, based on user network connectivity. The online mode uses Faster R-CNN to generate predictions in stable conditions and YOLO for faster results, whereas a feature recognition module using Haar features and Histogram of Oriented Gradients (HOG) serves this purpose in offline mode. A CNN pre-trained on the ImageNet dataset is used for object recognition in [8]. A novel Deep-Learning-based Sensory Navigation Framework (DLSNF) built on the YOLO architecture is proposed in [9] for designing a sensory navigation device on top of the NVIDIA Jetson TX2. SqueezeNet, a lightweight pre-trained CNN model, achieved better performance and reduced computational latency per image [10]. SqueezeNet is improved by changing the weights of the last convolutional layer, replacing the Rectified Linear Unit (ReLU) activation with Leaky ReLU, and adding a batch normalization layer.
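A minimal sketch of the late-fusion idea in [4], assuming the per-clip class-score vectors from the two CNNs are already available (random placeholders are used here), could look as follows in Python with scikit-learn:

import numpy as np
from sklearn.svm import SVC

n_clips, n_classes = 200, 10
scores_net_a = np.random.rand(n_clips, n_classes)        # placeholder scores from GoogLeNet
scores_net_b = np.random.rand(n_clips, n_classes)        # placeholder scores from AlexNet
labels = np.random.randint(0, n_classes, size=n_clips)   # placeholder activity labels

# Late fusion: concatenate the two score vectors and let an SVM combine them.
fused = np.hstack([scores_net_a, scores_net_b])
svm = SVC(kernel="linear")
svm.fit(fused, labels)
print(svm.predict(fused[:5]))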
Existing System:
Existing aids for the visually impaired are built around the white cane, augmented in some systems with ultrasonic and infrared (IR) sensors that detect nearby obstacles and provide vibration or sound feedback. Such Electronic Travel Aids (ETAs) help with mobility but convey little or no information about what the detected objects actually are. More recent systems apply computer vision and deep learning, for example interest-point tracking with HOG/BoVW classification [2], late fusion of parallel CNNs for activity recognition [4], YOLO- and Faster R-CNN-based smartphone assistants [6, 7], and lightweight networks such as SqueezeNet [10]. However, as the survey in [3] notes, no current system incorporates all the features needed for object detection, recognition of familiar faces and currency, and audio feedback in a single framework.
Proposed System
The proposed system assists the visually impaired in object detection and recognition through a two-stage pipeline. First, transfer learning is applied to a Single Shot Detector (SSD) trained on a modified PASCAL VOC 2007 dataset, to which a currency class has been added, so that everyday objects, people, and currency notes can be detected and classified in the camera frame. Second, whenever a person or a currency note is detected, the corresponding region is passed to a separate Inception v3 model: one model is trained to recognize the faces of people known to the user, and another to recognize currency note denominations. Training separate recognition models keeps the framework scalable and adaptable to user preferences. Finally, the detection and recognition results are conveyed to the user as audio feedback. A minimal sketch of this pipeline is given below.
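The following is a minimal, hedged sketch of how such a pipeline could be assembled in Python with TensorFlow/Keras and pyttsx3. The SSD detection stage is represented only abstractly, and the class counts and the pyttsx3 text-to-speech choice are illustrative assumptions, not details from the paper.

import pyttsx3
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, Model

def build_recognizer(num_classes):
    # Transfer learning: frozen Inception v3 backbone with a new softmax head.
    base = InceptionV3(weights="imagenet", include_top=False, pooling="avg",
                       input_shape=(299, 299, 3))
    base.trainable = False
    out = layers.Dense(num_classes, activation="softmax")(base.output)
    model = Model(base.input, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Separate recognizers, as in the proposed framework; class counts are assumptions.
face_recognizer = build_recognizer(num_classes=5)      # e.g. five known persons
currency_recognizer = build_recognizer(num_classes=7)  # e.g. seven note denominations

def announce(text):
    # Audio feedback for the visually impaired user (pyttsx3 is an assumed choice).
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

# Example usage: after the SSD detector (not shown here) reports a "person" box,
# the cropped region would be resized to 299x299, classified by face_recognizer,
# and the predicted name spoken aloud via announce().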
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
• Processor : Intel Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color Monitor
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Windows Keyboard