AutoGAN: Neural Architecture Search for Generative Adversarial Networks [arXiv:1908.03835v1] In this paper, we present the first preliminary study on introducing the NAS algorithm to generative adversarial networks (GANs), dubbed AutoGAN. — Specifically, our discovered architectures achieve highly competitive performance compared to current state-of-the-art hand-crafted GANs, e.g., setting new state-of-the-art FID scores of 12.42 on CIFAR-10 and 31.01 on STL-10.
Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss [arXiv:1908.05840v1] The discriminator is based on an auxiliary classifier GAN to classify the tag information as well as genuineness. In addition, we propose a novel network structure called SECat, which makes the generator properly colorize even small features such as eyes, and also suggest a novel two-step training method where the generator and discriminator first learn the notion of object and shape and then, based on the learned notion, learn colorization, such as where and how to place which color.
On the Validity of Self-Attention as Explanation in Transformer Models [arXiv:1908.04211v1] We investigate to what extent the implicit assumption made in many recent papers — that hidden embeddings at all layers still correspond to the underlying words — is justified. — we argue that attention visualizations are misleading and should be treated with care when explaining the underlying deep learning system.
Unconstrained Monotonic Neural Networks [arXiv:1908.05164v1] In this work, we propose the Unconstrained Monotonic Neural Network (UMNN) architecture based on the insight that a function is monotonic as long as its derivative is strictly positive.
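The UMNN insight above can be sketched numerically: parameterize the derivative with an unconstrained function, force it positive (e.g. via softplus), and integrate it to obtain a strictly increasing function. This is only an illustrative sketch with a toy stand-in for the network; the name `monotone_F` and the trapezoid-rule integration are our assumptions, not the paper's implementation (which uses learned numerical integration).

```python
import numpy as np

def monotone_F(x, f, n_steps=200):
    """F(x) = integral of f(t) dt from 0 to x, via the trapezoid rule.

    If f(t) > 0 everywhere, F is strictly increasing. UMNN exploits this
    by modeling f with an unconstrained network mapped to positive values."""
    t = np.linspace(0.0, x, n_steps)
    y = f(t)
    dt = t[1] - t[0] if n_steps > 1 else 0.0
    return dt * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

# toy "network" output: unconstrained, can go negative
raw = lambda t: np.sin(3 * t)
# softplus forces the derivative to be strictly positive
f_pos = lambda t: np.log1p(np.exp(raw(t)))

xs = np.linspace(0.0, 2.0, 50)
ys = [monotone_F(x, f_pos) for x in xs]
```

Because the integrand is positive everywhere, `ys` is strictly increasing regardless of how `raw` behaves.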
Predicting 3D Human Dynamics from Video [arXiv:1908.04781v1] In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input. — Our approach can be trained on video sequences obtained in-the-wild without 3D ground truth labels.
Temporal Collaborative Ranking Via Personalized Transformer [arXiv:1908.05435v1] we find our model is not only more interpretable but also able to focus on recent engagement patterns for each user. Moreover, our SSE-PT model with a slight modification, which we call SSE-PT++, can handle extremely long sequences and outperform SASRec in ranking results with comparable training speed, striking a balance between performance and speed requirements. Code and data are open sourced at this URL.
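The "SSE" in SSE-PT refers to Stochastic Shared Embeddings, a regularizer that randomly swaps embedding indices during training so that different items share gradient signal. A minimal sketch, assuming a function name `sse_replace` and uniform replacement, which we introduce for illustration; the paper's exact replacement scheme and probabilities may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def sse_replace(item_ids, num_items, p=0.1):
    """Stochastic Shared Embeddings (sketch): with probability p, replace
    each item id with a uniformly random id before the embedding lookup.
    At evaluation time the ids are left untouched (p = 0)."""
    ids = np.asarray(item_ids).copy()
    mask = rng.random(ids.shape) < p          # which positions get swapped
    ids[mask] = rng.integers(0, num_items, size=int(mask.sum()))
    return ids

batch = np.arange(10)
noisy = sse_replace(batch, num_items=1000, p=0.3)   # some ids randomly swapped
clean = sse_replace(batch, num_items=1000, p=0.0)   # identity at eval time
```

The swap happens on indices, not on the embedding table itself, so it adds no parameters and costs almost nothing per batch.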
Multimodal Emotion Recognition Using Deep Canonical Correlation Analysis [arXiv:1908.05349v1] The experimental results indicate that DCCA has greater robustness. By visualizing feature distributions with t-SNE and calculating the mutual information between different modalities before and after using DCCA, we find that the features transformed by DCCA from different modalities are more homogeneous and discriminative across emotions.
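Classical (linear) CCA captures the core objective that DCCA optimizes: project two modalities so their components are maximally correlated. The sketch below implements linear CCA from scratch with synthetic stand-ins for EEG and eye-movement features; DCCA replaces the linear projections with deep networks trained on the same correlation objective, and the data, names, and regularization constant here are our illustrative assumptions:

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Linear CCA via SVD of the whitened cross-covariance matrix.
    Returns the k-dimensional transformed views and canonical correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])   # regularized covariances
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):                               # S^(-1/2) for symmetric PD S
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(T)
    A = inv_sqrt(Sxx) @ U[:, :k]                   # projection for modality X
    B = inv_sqrt(Syy) @ Vt[:k].T                   # projection for modality Y
    return X @ A, Y @ B, s[:k]

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                 # shared signal across modalities
eeg = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
eye = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
zx, zy, corrs = cca(eeg, eye)                      # transformed, correlated views
```

After the transform the two views are strongly correlated component-wise, which is the sense in which the paper finds DCCA-transformed features "more homogeneous" across modalities.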