Let's consider a hypothetical task: generating binary images of size 200x200 (each pixel should be either 0 or 1). As far as I understand, the generator outputs 200x200 values between 0 and 1, which are the pixel intensities. The discriminator then takes those generated images, as well as real ones, as input and tries to distinguish one from the other.

In this case, why isn't the discriminator's task trivially simple (i.e. just check whether the image contains only 0s and 1s, as opposed to intermediate floating-point values)?


Some extra points/thoughts:

  • In the implementations I've seen, the generator usually ends with a sigmoid activation, so producing outputs that are exactly 0 or 1 should be next to impossible; why isn't the sigmoid a serious problem here?
  • Thresholding the generator's outputs to 0/1 does not seem viable: a hard threshold has zero gradient almost everywhere, which makes back-propagation from the discriminator to the generator impossible.
  • Maybe the discriminator cannot learn to distinguish real from fake examples in the way I've proposed? (This seems very counter-intuitive, given that checking that all input values are either 0 or 1 should be trivially simple to learn, even for the smallest 2-layer MLP.)
  • Maybe GANs don't work for binary images? (This also seems odd, as all digital images have, in essence, discrete pixel intensities, just with more than two levels.)
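
To make the concern concrete, here is a minimal sketch in plain Python on toy 4-pixel "images". It is not taken from any real GAN implementation; the helper names `binariness_gap`, `threshold`, and `finite_diff`, as well as the example logits, are made up for illustration. It shows the check I have in mind for the discriminator, and why thresholding removes the gradient signal:

```python
import math

def sigmoid(x):
    """Logistic function, as typically used for the generator's last layer."""
    return 1.0 / (1.0 + math.exp(-x))

def binariness_gap(pixels):
    """Largest distance from any pixel to its nearest value in {0, 1}."""
    return max(min(abs(p), abs(1.0 - p)) for p in pixels)

def threshold(p):
    """Hard 0/1 decision on a single pixel value."""
    return 1.0 if p >= 0.5 else 0.0

def finite_diff(f, x, eps=1e-3):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# A "real" binary image and a "fake" one produced from moderate logits
# (hypothetical values, chosen only for illustration).
real = [0.0, 1.0, 1.0, 0.0]
logits = [-3.0, -0.5, 0.5, 3.0]
fake = [sigmoid(z) for z in logits]

real_gap = binariness_gap(real)   # exactly 0: every pixel is 0 or 1
fake_gap = binariness_gap(fake)   # clearly above 0: an easy tell for a discriminator
hard_fake_gap = binariness_gap([threshold(p) for p in fake])  # 0 again after thresholding

# Gradient of the output pixel with respect to its logit, at logit = 3.0.
soft_grad = finite_diff(sigmoid, 3.0)                           # small but nonzero
hard_grad = finite_diff(lambda z: threshold(sigmoid(z)), 3.0)   # zero: nothing to back-propagate

print(real_gap, fake_gap, hard_fake_gap, soft_grad, hard_grad)
```

With these toy values the sigmoid outputs are visibly non-binary, while the thresholded outputs pass the check but give the generator no gradient. One caveat to my own claim: in floating point, a heavily saturated sigmoid can round to exactly 0.0 or 1.0, so "never exactly 0/1" only holds for moderate logits.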