Video

Project Summary

Image-to-image translation is the task of taking images from one domain and transforming them so that they have the style (or characteristics) of images from another domain. Our project's goal is to implement a cross-domain image transform in which scenes of the block-based Minecraft world are converted into realistic photos composed of items and scenes similar to those in the real world. In particular, we used unpaired images captured in the Minecraft world and photos of actual environments to train the networks. Our model is supposed to capture the special characteristics of one image collection and figure out how those characteristics could be translated into the other collection. We therefore expect the pictures generated from Minecraft to contain recognizable items and scenes matching those in the input image, but with smooth edges and authentic textures (and vice versa).

With this translation, Minecraft users can enjoy the game in a more realistic scene. Meanwhile, the possible applications of Minecraft have been discussed extensively, especially in the fields of computer-aided design and education. Well-constructed networks can be applied in these fields as well, converting models from Minecraft into photos of the actual objects even without paired training data.

Approaches

1. Introduction

2. Formulation
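
Our approach follows the cycle-consistent adversarial framework of Zhu et al. [2]. Let X be the Minecraft domain and Y the real-world domain; two generators G : X → Y and F : Y → X are trained against discriminators D_Y and D_X. As a sketch of that standard objective (the notation and the λ weighting are the paper's):

```latex
% Adversarial loss for the mapping G : X -> Y with discriminator D_Y
% (L_GAN(F, D_X, Y, X) for the reverse mapping is defined analogously):
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  = \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\bigl[\log D_Y(y)\bigr]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log\bigl(1 - D_Y(G(x))\bigr)\bigr]

% Cycle consistency: translating to the other domain and back
% should reproduce the original image:
\mathcal{L}_{\mathrm{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\lVert F(G(x)) - x \rVert_1\bigr]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\bigl[\lVert G(F(y)) - y \rVert_1\bigr]

% Full objective, with \lambda weighting the cycle consistency term:
\mathcal{L}(G, F, D_X, D_Y)
  = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

The cycle consistency term is what makes unpaired training work: the adversarial terms alone would let G map a Minecraft scene to any plausible real-world photo, while requiring F(G(x)) ≈ x and G(F(y)) ≈ y ties each output to its input.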

3. Network Architectures
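
The CycleGAN reference design [2] pairs ResNet-style generators (downsampling convolutions, a stack of residual blocks, then transposed-convolution upsampling) with 70×70 PatchGAN discriminators that classify overlapping patches as real or fake. Assuming our networks follow that design, the residual block at the core of the generator looks roughly like this in PyTorch [4]:

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block of the CycleGAN-style generator: two 3x3
    convolutions with instance normalization, plus a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection
```

Because the PatchGAN discriminator scores local patches rather than the whole image, it stays fully convolutional and focuses on texture and local structure, which suits the style-level changes we want.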

4. Training Details
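
For concreteness, here is a minimal sketch of one generator update. The least-squares GAN loss, λ = 10, and Adam with learning rate 2e-4 are the defaults from [2] rather than values verified against our exact configuration, and G, F, D_X, D_Y are assumed to be instantiated networks as above:

```python
import itertools
import torch
import torch.nn as nn

# G: Minecraft -> real, F: real -> Minecraft; D_X, D_Y are their
# discriminators. Both generators share one optimizer, as in [2].
opt_G = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
gan_loss = nn.MSELoss()   # least-squares GAN objective
cycle_loss = nn.L1Loss()
lam = 10.0                # weight of the cycle consistency term

def generator_step(real_x, real_y):
    """One update of both generators on a pair of unpaired batches."""
    opt_G.zero_grad()
    fake_y = G(real_x)    # Minecraft -> real
    fake_x = F(real_y)    # real -> Minecraft

    # Adversarial terms: fool each discriminator into predicting "real" (1).
    pred_y, pred_x = D_Y(fake_y), D_X(fake_x)
    loss_adv = (gan_loss(pred_y, torch.ones_like(pred_y)) +
                gan_loss(pred_x, torch.ones_like(pred_x)))

    # Cycle consistency: translating there and back recovers the input.
    loss_cyc = (cycle_loss(F(fake_y), real_x) +
                cycle_loss(G(fake_x), real_y))

    loss = loss_adv + lam * loss_cyc
    loss.backward()
    opt_G.step()
    return loss_adv.item(), loss_cyc.item()  # the curves plotted below
```

The discriminators are updated in a separate, alternating step on real images and generated images.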

Evaluation

1. Overview

Here is an example of real-time transformation:
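
At test time, the transformation itself is a single forward pass through the trained Minecraft-to-real generator. A minimal sketch, where the checkpoint and image file names are hypothetical placeholders:

```python
import torch
from PIL import Image
from torchvision import transforms

# Preprocessing mirrors training: resize, tensorize, scale to [-1, 1].
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Assumes the full generator module was saved with torch.save(G, ...).
G = torch.load("G_mc2real.pth")
G.eval()

with torch.no_grad():
    x = preprocess(Image.open("minecraft_frame.png").convert("RGB"))
    fake = G(x.unsqueeze(0)).squeeze(0)    # generator output in [-1, 1]
    out = (fake * 0.5 + 0.5).clamp(0, 1)   # rescale to [0, 1] for display
```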

2. Quantitative Evaluation

We are not able to use common metrics such as pixel accuracy or Intersection over Union (IoU) for quantitative evaluation due to the lack of ground truth. On the other hand, the perceptual study we mentioned in the proposal is time-consuming, and it is difficult to collect accurate feedback from it. Therefore, we present plots of the loss functions as the quantitative evaluation of our model. First, the plot of the adversarial losses:

[Figure: adv_loss]

In addition, the visualization of the cycle consistency losses is shown below:

[Figure: cycle_loss]

As with other GANs, the loss functions of our model are difficult to interpret, and whether these objectives have converged is unclear.

3. Qualitative Evaluation

The qualitative evaluation examines whether the generated pictures are akin to real-world scenes. Here, we present comparisons between the original images (Minecraft) and the generated images (real-world):

Based on the results shown above, we can see that over the course of training the generated images become more natural and realistic. The desert and clouds in Minecraft are faithfully transformed into their real-world counterparts. We also ran the experiment in the opposite direction, from real-world photo to Minecraft landscape. The comparison is shown below:

Although our method can achieve compelling results in many cases, the results are far from uniformly positive. A failure case is shown below:

References

  1. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In NIPS, 2014.

  2. J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In ICCV, 2017.

  3. S. Wolf. CycleGAN: Learning to Translate Images (Without Paired Training Data).

  4. PyTorch. https://pytorch.org.