YU Yanjie, SUN Jiaqi, GE Sibo, et al. CycleGAN-SN: Image Stylization Algorithm Combining Spectral Normalization and CycleGAN [J]. Journal of Xi'an Jiaotong University, 2020, 54(05): 133-141. [doi:10.7652/xjtuxb202005018]

CycleGAN-SN: Image Stylization Algorithm Combining Spectral Normalization and CycleGAN

Journal of Xi'an Jiaotong University [ISSN: 0253-987X / CN: 61-1069/T]

Volume: 54
Issue: 2020, No. 05
Pages: 133-141
Column:
Publication Date: 2020-05-10

Article Info

Title:
CycleGAN-SN: Image Stylization Algorithm Combining Spectral Normalization and CycleGAN
Article Number:
0253-987X(2020)05-0133-09
Author(s):
YU Yanjie, SUN Jiaqi, GE Sibo, YANG Qingyu
Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Keywords:
image stylization; spectral normalization; CycleGAN
CLC Number:
TP18
DOI:
10.7652/xjtuxb202005018
Document Code:
A
Abstract:
To address the low image stylization quality and weak network stability of the CycleGAN algorithm, a CycleGAN-SN algorithm is proposed. The algorithm adds a spectral normalization layer after each convolutional layer of CycleGAN's discriminator network, estimates the spectral norm of each convolutional layer's parameter matrix by the power iteration method, and updates the layer parameters with stochastic gradient descent. Because the parameters change only slightly in each update, a single iteration suffices to quickly estimate the largest singular value of the matrix. The layer parameters are then normalized by this largest singular value, so that the entire discriminator network satisfies 1-Lipschitz continuity. Experiments on four commonly used style image datasets, with the CycleGAN algorithm as the baseline, show that the proposed algorithm generates stylized images with vivid colors, clear textures, and fully rendered style while preserving the details of the original images; that its loss functions oscillate with small amplitude during training, allowing a larger learning rate and giving stronger stability; and that it effectively reduces the number of steps required for the network to converge. When stylizing 751 images at once in the test phase, the runtime increases by at most 0.63 s, i.e., almost no extra time cost.
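
The following is a minimal sketch of the normalization step described in the abstract, written in PyTorch under assumed settings: the class name SpectralNorm, the wrapper design, and the n_power_iterations parameter are illustrative, not the authors' released code. One power iteration per parameter update tracks the largest singular value sigma of the reshaped convolution kernel, and the kernel is divided by sigma before the discriminator's forward pass, enforcing the 1-Lipschitz constraint.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralNorm(nn.Module):
    # Wraps a conv layer and rescales its weight by an estimate of the
    # spectral norm (largest singular value) before every forward pass.
    def __init__(self, layer: nn.Conv2d, n_power_iterations: int = 1):
        super().__init__()
        self.layer = layer
        self.n_power_iterations = n_power_iterations
        out_ch = layer.weight.size(0)
        # Persistent estimate of the leading left singular vector.
        self.register_buffer("u", torch.randn(out_ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.layer.weight
        # Flatten the (out_ch, in_ch, kH, kW) kernel into a 2-D matrix.
        w_mat = w.view(w.size(0), -1)
        u = self.u
        with torch.no_grad():
            # The weights change little per SGD step, so one iteration
            # is enough to keep tracking the top singular vectors.
            for _ in range(self.n_power_iterations):
                v = F.normalize(w_mat.t() @ u, dim=0)
                u = F.normalize(w_mat @ v, dim=0)
            self.u.copy_(u)
        sigma = torch.dot(u, w_mat @ v)  # estimated largest singular value
        # Convolve with the normalized kernel W / sigma.
        return F.conv2d(x, w / sigma, self.layer.bias,
                        stride=self.layer.stride, padding=self.layer.padding)

# Hypothetical usage: wrap one discriminator convolution.
layer = SpectralNorm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1))
out = layer(torch.randn(1, 3, 256, 256))

PyTorch also provides an equivalent built-in, torch.nn.utils.spectral_norm, which likewise defaults to one power iteration per update and can simply be applied to each convolutional layer of the discriminator.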

Memo

Received: 2019-09-21. About the authors: YU Yanjie (b. 1993), male, master's student; GE Sibo (corresponding author), male, associate professor.
Last Update: 2020-05-10