TEXT-TO-IMAGE BASED ON GENERATIVE ADVERSARIAL NETWORK

  • Abstract: In recent years, generative adversarial networks (GANs) have achieved remarkable results in text-to-image synthesis, but when generating complex images, important fine-grained information is often lost, leading to problems such as blurred image edges and unclear local textures. To address these problems, this paper proposes a Deep Attention Stacked Generative Adversarial Network (DAS-GAN) built on the stacked GAN (StackGAN) architecture. The first stage of the model generates the basic outline and colors of the image, the second stage supplements and corrects parts of the appearance and colors, and the final stage refines the texture details of the image. In Inception Score experiments on the CUB dataset, DAS-GAN improves on StackGAN++ and AttnGAN by 0.296 and 0.078 respectively, demonstrating the effectiveness of the model.
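The three-stage coarse-to-fine pipeline described in the abstract (outline and color, then appearance correction, then texture refinement) can be sketched conceptually as a chain of generators, each consuming the previous stage's output at a higher resolution. The sketch below is a minimal, hypothetical illustration of that data flow only; the stage functions are placeholders and do not reproduce the paper's networks, attention modules, or training procedure.

```python
import numpy as np

# Conceptual sketch of a stacked, three-stage generation pipeline
# (coarse outline/color -> appearance correction -> texture refinement).
# All stage bodies are placeholder computations, NOT the DAS-GAN model.

def stage1(text_embedding, rng):
    # Stage 1: produce a low-resolution image with basic outline and
    # color, conditioned on the text embedding (64x64x3, values in [-1, 1]).
    noise = rng.standard_normal((64, 64, 3))
    return np.tanh(noise + text_embedding.mean())

def stage2(img, text_embedding):
    # Stage 2: upsample and correct parts of the appearance and color,
    # again conditioned on the text embedding (128x128x3).
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return np.clip(up + 0.1 * text_embedding.mean(), -1.0, 1.0)

def stage3(img):
    # Stage 3: upsample once more and refine texture details (256x256x3).
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return np.clip(up, -1.0, 1.0)

def generate(text_embedding, seed=0):
    # Chain the stages: each stage refines the previous stage's output.
    rng = np.random.default_rng(seed)
    img = stage1(text_embedding, rng)
    img = stage2(img, text_embedding)
    return stage3(img)

img = generate(np.ones(128))
print(img.shape)  # (256, 256, 3)
```

In the actual model each stage would be a convolutional generator with its own discriminator, but the staged refinement structure, where later stages only add detail to earlier outputs, is the idea this sketch is meant to convey.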

     
