Have you ever thought about how much time and effort we could save by letting a computer handle our manual tasks? Imagine preparing an apartment listing from photos: the computer recognizes the items in each image and creates the list of objects for you. And that is only one of many possible solutions for the real estate industry built on computer vision software.
Automated image annotation and semantic segmentation can create substantial value for consumer-oriented businesses. Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata, in the form of captions or keywords, to a digital image. This application of computer vision is used in image retrieval systems to organize and locate images of interest in a database. Recognizing objects accurately requires advanced computer vision engineering and a lot of processing power. Many problems can arise along the way: the same object looks different from different angles, parts of an object may be occluded, and shadows or background clutter can blend with the object's features. That is why correctly naming the kitchen furniture and appliances that appear in photos can be a challenging task.
How to perform Kitchen Furniture & Appliances annotation in Photos
Our engineers reviewed and tested a couple of approaches and models that could solve the kitchen object recognition problem. The earlier results are shared in the blog article Image Recognition Experiment: Finding Furniture & Appliances in Kitchen Photos. That work, however, was mostly academic research aimed at exploring the possible methods of solving the task. Now we believe that the project can be implemented at the commercial level. We have thoroughly researched and analyzed the application of VGG, a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford, to object recognition. This model achieves 92.7% top-5 test accuracy on ImageNet (a dataset of over 14 million images; the classification benchmark covers 1000 classes). Our computer vision engineers adapted the model and developed an image postprocessing algorithm that has allowed us to significantly increase the object recognition accuracy.
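To make the "top-5 accuracy" figure concrete: a prediction counts as correct if the true class is among the five classes with the highest scores. The sketch below is only an illustration of the metric; the function name `top_k_accuracy` and the toy scores are assumptions, not output of the model described here.

```python
def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, true_label in zip(scores, true_labels):
        # Class indices sorted by descending score.
        ranked = sorted(
            range(len(sample_scores)),
            key=lambda c: sample_scores[c],
            reverse=True,
        )
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy data: 3 images, 6 hypothetical classes (not real model output).
toy_scores = [
    [0.10, 0.05, 0.60, 0.10, 0.10, 0.05],  # true class 2 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5 is ranked last
    [0.05, 0.40, 0.10, 0.30, 0.10, 0.05],  # true class 3 is ranked second
]
toy_labels = [2, 5, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top5 = top_k_accuracy(toy_scores, toy_labels, k=5)
print(f"top-1: {top1:.2f}, top-5: {top5:.2f}")
```

On this toy data the top-5 score is higher than the top-1 score, because the third image is only recognized correctly when the second-ranked guess is allowed to count.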
The overall architecture of the proposed network is visualized in Fig. 1. In the preprocessing step, the network takes an RGB image and subtracts the mean image values from it. The convolutional part is based on the VGG 16-layer net; to generate an accurate segmentation map of the input image, a multilayer deconvolution network is placed on top of it.
Once a dataset has been created or selected according to the customer's exact needs, the algorithm can be adapted to recognize the required object classes. The following object classes were selected for recognition in kitchen photos:
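The preprocessing step described above can be sketched in a few lines. This is a minimal illustration, assuming per-channel mean subtraction with the RGB means commonly used for VGG models trained on ImageNet; the article does not state the exact values, and `VGG_MEAN_RGB` and `subtract_mean` are hypothetical names.

```python
# Per-channel RGB means commonly used with VGG models trained on ImageNet.
# Assumption: the article does not say which mean values were actually used.
VGG_MEAN_RGB = (123.68, 116.779, 103.939)


def subtract_mean(image, mean_rgb=VGG_MEAN_RGB):
    """Return a copy of an H x W image of (R, G, B) pixels with channel means subtracted."""
    return [
        [tuple(value - mean for value, mean in zip(pixel, mean_rgb)) for pixel in row]
        for row in image
    ]


# A hypothetical 2 x 2 RGB image with values in the 0-255 range.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(124, 117, 104), (200, 100, 50)],
]
centered = subtract_mean(image)
print(centered[0][1])  # the black pixel becomes the negated channel means
```

Centering the input this way keeps pixel values roughly balanced around zero, which matches how the network saw its training images.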
- Dining table
- Microwave oven
- Washing machine
- Plate rack
- Waffle iron
- Espresso maker
- Cocktail shaker
Here are samples of the output, with the objects recognized in each kitchen photo:
Photo processing result: 47% espresso maker, 44% coffee pot.
Photo processing result: 97% dining table.
Photo processing result: 64% stove, 33% microwave, microwave oven.
Photo processing result: 29% toaster.
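Result lines like the ones above can be produced by converting the raw class scores of a classifier into probabilities and listing the most confident classes. The sketch below is one possible way to do it, not the postprocessing algorithm used in this project: the `min_prob` threshold, the function names, and the input scores are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_result(labels, logits, min_prob=0.25):
    """List classes whose probability is at least min_prob, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(probs, labels), reverse=True)
    kept = [f"{round(p * 100)}% {label}" for p, label in ranked if p >= min_prob]
    if not kept:
        return "Photo processing result: no confident match."
    return "Photo processing result: " + ", ".join(kept) + "."


# Hypothetical scores, chosen so the probabilities mirror the first sample above.
labels = ["coffee pot", "espresso maker", "toaster"]
logits = [math.log(0.44), math.log(0.47), math.log(0.09)]
result = format_result(labels, logits)
print(result)
```

The threshold hides low-probability guesses, so only classes the model is reasonably sure about end up in the annotation.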
We believe that convolutional neural network models for semantic segmentation and automatic image annotation can revolutionize real estate industry solutions. Although solutions that deliver high accuracy are rarely available out of the box and are still at the development stage, our experts are ready to share their findings with you:
- An innovative approach to the real estate industry. It can reduce data loss, turnaround time, and the human resources required.
- Universal recognition approach. It can be applied to any other field that requires recognition of specific objects and image annotation.
- High algorithm recognition speed. The algorithm can process frames from your camera in real-time.
- Works on low-resolution images. The algorithm can detect objects even in low-resolution images.
If this sounds interesting, or if you have any questions for our experts, just get in touch with us.