A magician performs his trick with a wave of a magic wand; our engineers make their magic with a single click! Curious how the same winter landscape would look in summer? Then keep on reading.

Our R&D department completed a comprehensive study of current machine learning generators and the quality they can deliver on the task of translating a winter image into a summer one, as an analogy to automatic language translation. We investigated different generative models and application methods, prepared a solid dataset, and adapted the most suitable model to satisfy our requirements.

The idea of turning a photo taken in winter into a summery image could be applied in landscape design and related fields, giving a clear vision of the same scene in different seasons. The same technology can also enable various other color transformations of images.

The secret to our magic is a cGAN neural network

We researched, analyzed, and tested a number of approaches and processing models, including cGAN and CVAE neural networks and pixel-to-pixel translation. cGAN turned out to be the most suitable for our image-to-image translation task, for the following reasons:

  • Sharp, realistic image synthesis
  • Semi-supervised learning
  • A strong test of our ability to work with high-dimensional, complex data
  • Bleeding-edge ML technology

A conditional GAN is a subset of GANs, or generative adversarial networks. A GAN is a type of generative neural network that consists of two networks: a discriminator (D) and a generator (G). The key concept behind them is easy to grasp if you imagine a team of counterfeiters as G and the police as D. The counterfeiters (G) constantly try to produce fake currency (images), while the police (D) try to tell whether each bill is real or fake. As time passes, both of them get better at their jobs. A conditional GAN works on the same principle, but with some additional information: a condition. Instead of asking the counterfeiters to produce just any fake currency, we ask them to produce, say, a fake $100 bill. Similarly, instead of asking the police only to judge whether a bill is fake or real, we also ask them to judge whether it is a $100 bill. This lets us teach G to generate even more realistic currency (images, in our case).

Figure 1. cGAN structure
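More formally, G and D play a minimax game. In the conditional GAN loss from the pix2pix paper our work builds on, x is the input winter photo, y the real summer photo, and z random noise:

\[ \mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \]

pix2pix additionally pushes the generator toward the ground truth with an L1 term, so the full objective is

\[ G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big] \]

(the "No L1 term" variation shown further below is exactly this objective with \(\lambda = 0\)).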

While working on the image-to-image translation of winter photos, our image processing engineers faced numerous challenges, including:

  • dataset aggregation and preparation;
  • long iterations;
  • the cost of experiments;
  • the complexity of cGAN, with many knobs to configure.

The team continued the image-to-image translation research with a large, comprehensive collection of images. At first, our R&D team worked with the Transient Attributes dataset. However, it did not contain enough images to reach the desired results and accuracy, so an additional image dataset was needed. For this purpose, we selected the "Nordland Line" train video footage. Details and samples of both datasets are provided below.

Dataset №1. Annotated photos from 101 webcams with outdoor scenes.

  • Annotated by people during a crowdsourcing campaign (downloaded);
  • Filtered: for each webcam, picked the "most summer" and "most winter" shots as pairs (pandas); see the sketch below.

3000 image pairs across the two seasons / 640×480, scaled to 256×256.
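For illustration, the filtering step might look roughly like this; a minimal sketch, assuming the crowdsourced annotations are loaded into a DataFrame with one row per photo and per-attribute scores (the file and column names here are hypothetical):

    import pandas as pd

    # Hypothetical layout: one row per annotated photo, with the webcam id
    # and crowdsourced seasonal scores in [0, 1].
    df = pd.read_csv("transient_attributes.csv")

    # For each webcam, take the single "most summer" and "most winter" shot
    # and pair them, so both images show the same static scene.
    most_summer = df.loc[df.groupby("webcam_id")["summer"].idxmax()]
    most_winter = df.loc[df.groupby("webcam_id")["winter"].idxmax()]

    pairs = most_summer.merge(most_winter, on="webcam_id",
                              suffixes=("_summer", "_winter"))
    pairs.to_csv("season_pairs.csv", index=False)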

Dataset №2. Four 10-hour Full HD videos recorded from a train during the "Nordland Line" trip in Norway, one for each of the four seasons.

  • Time-synchronized (downloaded);
  • Cut into frames (FFmpeg);
  • Auto-aligned the best nearby frames (Python + Hugin); see the sketch below.

9000 image pairs across the two seasons / 1000×1000, scaled to 256×256.
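A rough sketch of the last two steps, assuming the four videos are already time-synchronized (the file names and the one-frame-per-second sampling rate are illustrative, not part of the published footage):

    import subprocess

    # Cut each season's footage into frames with FFmpeg. Thanks to the
    # time sync, frame N of the winter video shows roughly the same spot
    # on the track as frame N of the summer video.
    for season in ("winter", "summer"):
        subprocess.run(
            ["ffmpeg", "-i", f"nordland_{season}.mp4",
             "-vf", "fps=1",                    # one frame per second
             f"frames/{season}_%06d.png"],
            check=True)

    # Fine-align one candidate pair with Hugin's align_image_stack:
    # -a sets the output prefix, -C crops to the common overlapping area.
    subprocess.run(
        ["align_image_stack", "-a", "aligned_", "-C",
         "frames/winter_000001.png", "frames/summer_000001.png"],
        check=True)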

(Sample image pairs from both datasets)

Image-to-image translation results

Our engineers adapted the model to satisfy these requirements and are glad to share their best results with you.

Input | Output | Ground truth
(five rows of sample image triplets)

Training set: 11,500 image pairs; testing set: ~600 image pairs. Here are the best setup details:

  • 286×286 -> 256×256 jitter, horizontal mirroring (see the augmentation sketch below);
  • conditional D model;
  • PatchGAN.
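For reference, the jitter from the first bullet works roughly like this; a minimal sketch with Pillow (the original pix2pix implements the same augmentation in Lua/Torch):

    import random
    from PIL import Image

    def jitter(img):
        """pix2pix-style augmentation: upscale to 286x286, take a random
        256x256 crop, then mirror horizontally half of the time."""
        img = img.resize((286, 286), Image.BICUBIC)
        x = random.randint(0, 286 - 256)
        y = random.randint(0, 286 - 256)
        img = img.crop((x, y, x + 256, y + 256))
        if random.random() < 0.5:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
        return img

One catch: the same random crop and flip must be applied to both halves of a winter/summer pair, otherwise the two images drift out of alignment.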

Variations

We would also like to share some samples of the variations we tried during the image-to-image translation experiments. The most typical were the following:

  • dataset manipulations and mixing;
  • jittering, random mirroring, number of epochs;
  • unconditional/conditional D model;
  • PatchGAN/PixelGAN/ImageGAN (sketched in code below).
Input | No jitter: the same detailed patterns everywhere | Jitter 900×900 -> 256×256: blurred
(two rows of sample images)

Input | Too much of dataset №1: mixed illumination, artifacts | Too much of dataset №2: railway everywhere!
(two rows of sample images)
Input | ImageGAN: a bit distorted | PixelGAN: very blurry | No L1 term: loss of low-level features
(two rows of sample images)
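To make the PatchGAN/PixelGAN/ImageGAN distinction concrete: the three differ only in how large a patch each discriminator score judges. A PixelGAN scores individual pixels, an ImageGAN the whole picture, and the 70×70 PatchGAN we settled on scores overlapping 70×70 patches, which keeps textures sharp without distorting global structure. Below is a sketch of such a conditional patch discriminator; the pix2pix code we used is Lua/Torch, so this PyTorch version is only an illustration of the architecture:

    import torch
    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        """Sketch of a 70x70 PatchGAN conditional discriminator: outputs a
        grid of real/fake scores, one per overlapping input patch, rather
        than a single score for the whole image."""

        def __init__(self, in_channels=6):  # winter + summer, concatenated
            super().__init__()

            def block(c_in, c_out, stride=2, norm=True):
                layers = [nn.Conv2d(c_in, c_out, kernel_size=4,
                                    stride=stride, padding=1)]
                if norm:
                    layers.append(nn.BatchNorm2d(c_out))
                layers.append(nn.LeakyReLU(0.2, inplace=True))
                return layers

            self.net = nn.Sequential(
                *block(in_channels, 64, norm=False),  # 256 -> 128
                *block(64, 128),                      # 128 -> 64
                *block(128, 256),                     # 64 -> 32
                *block(256, 512, stride=1),           # 32 -> 31
                nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
            )

        def forward(self, winter, summer_or_fake):
            # Conditional D: it always sees the input image next to the
            # (real or generated) output, as in the cGAN objective above.
            return self.net(torch.cat([winter, summer_or_fake], dim=1))

For a 256×256 input this produces a 30×30 grid of scores, each covering a 70×70 receptive field of the original image.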

Technical Information

Hardware:

  • AWS p2.xlarge (NVIDIA GK210 GPU), 12 GB of GPU memory, CUDA.

Software:

  • pix2pix – a Lua/Torch implementation of cGAN;
  • Hugin – a photo-aligning tool;
  • Python, pandas, NumPy.

To get an objective assessment of the quality of the output pictures, we need a few people to act as independent experts: each will compare two images and indicate which of them is a real photo. One of the images will be a real photo and the other a generated picture. Would you like to take part in the experiment? Get in touch with us!

Tools & Technologies: deep learning, convolutional neural networks, adversarial networks, computer vision, Lua, Torch, OpenCV, Linux, Git.
