Image-to-Image translation

Interested in how sophisticated algorithms can turn winter landscapes into realistic summer photos? Keep reading.

Abto Software’s R&D engineers conducted comprehensive research to investigate what modern machine learning can achieve, and what output quality it can deliver, when translating winter landscapes into accurate summer photos.

We investigated different generative models and ways of applying them. With a solid dataset in hand, we adopted the most suitable model to handle the challenges of such a complex project.

We see opportunities for season transfer in landscape design and related areas: landscape architects and designers could see more clearly how, for example, a snow-covered site would look in the other seasons.

Conditional generative adversarial network

We examined and tested various models and methods, including the cGAN, the CVAE, and pixel-to-pixel translation. The cGAN (conditional generative adversarial network), an extension of the GAN, proved to be the most suitable approach for the task.

Its main advantages:

  • Sharp and realistic synthesis of images
  • Excellent match for high-dimensional visual data
  • Semi-supervised learning
  • Bleeding-edge computational technology

The cGAN extends the GAN with conditional image generation: the generator receives extra conditioning information (a class label or, in our case, a source image) along with its input. The GAN itself is an ML framework for training generative models that pits a generator, which creates new images, against a discriminator, which tries to distinguish synthetic images from real ones.

Feeding this additional information to both networks speeds up convergence, since the conditioning imposes structure even on fake images. This approach also helps control the output: the condition acts as a label for the images to be generated.
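
To make the mechanics concrete, here is a minimal PyTorch sketch of a single cGAN training step in the pix2pix spirit. The toy networks, the stand-in tensors, and the L1 weight of 100 are illustrative assumptions, not our production setup.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator (hypothetical, far smaller than real ones).
# G maps a winter image to a synthetic summer image; D scores a
# (winter, summer) pair, so the condition is part of its input.
G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(6, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(16, 1, 3, stride=2, padding=1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

winter = torch.randn(1, 3, 256, 256)  # conditioning image (stand-in data)
summer = torch.randn(1, 3, 256, 256)  # ground-truth translation (stand-in)

# Discriminator step: push real pairs toward 1 and fake pairs toward 0.
fake = G(winter).detach()
d_real = D(torch.cat([winter, summer], dim=1))
d_fake = D(torch.cat([winter, fake], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool D, plus an L1 term pulling the output toward ground truth.
fake = G(winter)
d_fake = D(torch.cat([winter, fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100 * nn.functional.l1_loss(fake, summer)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```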

[Figure: cGAN structure, showing the generator network and the discriminator network]

Machine learning for accurate image-to-image translation

During the project, our team faced the following challenges:

  • The complexity of the selected algorithm
  • The configuration of hyperparameters
  • The aggregation and preparation of datasets
  • Long training iterations

The team continued the research by aggregating and preparing large datasets. For understanding and editing arbitrary outdoor scenes, we chose the Transient Attributes Dataset. Because it alone provided too little data for good outputs, we decided to utilize additional datasets; their details and samples can be found below.

The datasets

Dataset №1. Annotated photos from over 100 webcams with various outdoor scenes.

Our team:

  • Downloaded photographs annotated during a crowdsourcing campaign
  • Picked and filtered the photos that have the most vivid characteristics of the winter and summer seasons
  • Matched the selected photos into winter/summer pairs

This yielded 3,000 winter/summer landscape pairs at 640×480, scaled to 256×256.
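
To illustrate the pairing step, a hypothetical Python sketch is shown below; the annotations.csv layout, the column names, and the 0.8 score threshold are assumptions made for illustration, not the actual Transient Attributes file format.

```python
import csv
from collections import defaultdict

# Group annotated photos by webcam, keeping only clearly seasonal shots.
by_cam = defaultdict(lambda: {"winter": [], "summer": []})
with open("annotations.csv") as f:          # hypothetical annotation file
    for row in csv.DictReader(f):
        if float(row["winter_score"]) > 0.8:
            by_cam[row["webcam_id"]]["winter"].append(row["filename"])
        elif float(row["summer_score"]) > 0.8:
            by_cam[row["webcam_id"]]["summer"].append(row["filename"])

# Pair winter and summer shots from the same webcam, so the scene content
# matches and only the season differs.
pairs = [(w, s)
         for shots in by_cam.values()
         for w, s in zip(shots["winter"], shots["summer"])]
print(f"{len(pairs)} winter/summer pairs")
```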

[Sample winter/summer image pairs from Dataset №1 omitted]

Dataset №2. Four 10-hour high-definition videos recorded from a train on Norway’s Nordland Line, one in each of the four seasons.

Our team:

  • Downloaded the mentioned videos
  • Cut all four videos into frames using FFmpeg
  • Auto-aligned the best frames using Python and Hugin

This yielded 9,000 winter/summer landscape pairs at 1000×1000, scaled to 256×256.
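
As an illustration of the frame-cutting step, the following Python sketch drives FFmpeg through subprocess; the file names and the one-frame-per-second sampling rate are assumptions.

```python
import subprocess

# Extract one frame per second from each seasonal recording.
for season in ["winter", "spring", "summer", "fall"]:
    subprocess.run([
        "ffmpeg", "-i", f"nordland_{season}.mp4",  # hypothetical file name
        "-vf", "fps=1",                            # sample 1 frame per second
        f"frames/{season}_%06d.png",
    ], check=True)
```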

[Sample winter/summer image pairs from Dataset №2 omitted]

The results

Harnessing our knowledge and experience in machine learning, our engineers delivered a solution that performs accurate image-to-image translation, an innovative analogy to automatic language translation.

The results can be found below.

[Results table: input, output, and ground-truth columns; five sample rows of images omitted]

Training set: 11,500 image pairs; testing set: ~600 image pairs.

The best setup details:

  • Images resized to 286×286, then randomly cropped to 256×256 (see the sketch after this list)
  • Horizontal mirroring
  • Conditional D model
  • PatchGAN discriminator
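
A minimal sketch of the first two points with OpenCV and NumPy, assuming image pairs stored as H×W×C arrays; the crop placement follows the standard pix2pix jitter recipe, which we assume here for illustration:

```python
import cv2
import numpy as np

def augment_pair(winter, summer):
    """Resize both images to 286x286, take the same random 256x256 crop,
    and mirror horizontally half of the time (a sketch, not production code)."""
    winter = cv2.resize(winter, (286, 286), interpolation=cv2.INTER_CUBIC)
    summer = cv2.resize(summer, (286, 286), interpolation=cv2.INTER_CUBIC)
    y, x = np.random.randint(0, 286 - 256 + 1, size=2)  # shared crop offset
    winter, summer = winter[y:y+256, x:x+256], summer[y:y+256, x:x+256]
    if np.random.rand() < 0.5:                          # horizontal mirroring
        winter, summer = winter[:, ::-1], summer[:, ::-1]
    return winter, summer
```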

The variations

We would also like to share some samples of the variations we tried during training.

The most typical variations:

  • Dataset manipulations and mixing
  • Image jitter, random mirroring, the number of epochs
  • Unconditional/conditional D models
  • PatchGAN/PixelGAN/ImageGAN
The observed effects, each compared against the same inputs (sample images omitted):

  • Without jitter: the same detailed patterns appear everywhere
  • With jitter (900×900, scaled to 256×256): blurred output
  • Overloaded dataset №1: mixed illumination and artifacts
  • Overloaded dataset №2: no clarity
  • ImageGAN: slightly distorted output
  • PixelGAN: very blurry output
  • L1 regularization: loss of low-level features
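
The PatchGAN/PixelGAN/ImageGAN variants differ only in how large an image region each discriminator output judges: one overlapping patch, one pixel, or the whole image. Below is a minimal PyTorch sketch of a pix2pix-style 70×70 PatchGAN discriminator; the layer sizes follow the published pix2pix recipe, but this is an illustrative sketch rather than our exact model.

```python
import torch.nn as nn

def patchgan_d(in_ch=6, base=64):
    """70x70 PatchGAN: each value in the output map classifies one
    overlapping patch as real or fake. A PixelGAN would use 1x1
    convolutions instead; an ImageGAN would reduce to a single score."""
    def block(i, o, stride):
        return [nn.Conv2d(i, o, 4, stride, 1), nn.BatchNorm2d(o), nn.LeakyReLU(0.2)]
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),  # no norm on layer 1
        *block(base, base * 2, 2),
        *block(base * 2, base * 4, 2),
        *block(base * 4, base * 8, 1),
        nn.Conv2d(base * 8, 1, 4, 1, 1),  # per-patch real/fake logits
    )
```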

Technical details

Used hardware:

  • AWS p2.xlarge (NVIDIA GK210 12 GB GPU), CUDA Toolkit

Tech stack:

  • Python
  • NumPy
  • Pandas
  • Hugin
  • Pix2Pix framework
  • OpenCV

How we can benefit your business

Abto Software handles the complexities of computer vision so that mature businesses can focus on their data. Harnessing deep knowledge and experience in implementing artificial intelligence and its various subsets (machine and deep learning, ANNs, NLP), our engineers deliver custom cutting-edge solutions.

We provide:

  • Image e