CVPR 2020 – Some Highlights in Transfer Learning

## Overview of CVPR 2020

Virtual CVPR2020 took place last week and, despite some teething issues as we all continue to settle into our new Zoom reality, the conference was an overall success! With 7,600 attendees and a reported 1,497,800 minutes of video calls, it's safe to say that a lot was packed into a few short days. I wanted to use this blog post to highlight some of the papers I thought were particularly interesting. A lot of different themes were covered at CVPR2020, and here I am going to focus on transfer learning. At the bottom of this article I've included some other papers I found interesting, along with a link to another blog that provides a pretty good overview of the many other topics covered.
## Brief Overview of Transfer Learning

Transfer learning is an area that I have always found innately interesting. To me, high-performance transfer learning seems inevitable on our march towards AGI, and moreover a key piece of the puzzle. For an impressive example of its potential, check out the pretty cool PathNet from Google DeepMind ([here](https://deepmind.com/research/publications/pathnet-evolution-channels-gradient-descent-super-neural-networks)). Alongside that large-scale view of AGI, transfer learning also has a huge role to play in 'real-world' applications of data science and machine learning. It can be immensely useful in scenarios where we have little data: it allows us to re-use models trained on different data for different tasks and, hopefully, still achieve an acceptable level of performance. Below you will find some of the papers in this area at CVPR2020 that I found interesting, and what I thought was particularly cool about each. Hopefully it gives you an idea of some of the emerging ideas in this area and their potential.
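To make the 'little data' scenario concrete, here is a minimal fine-tuning sketch in PyTorch (not taken from any of the papers below): we reuse an ImageNet-pretrained backbone, freeze its features, and retrain only a new classification head on a small target task. The model choice, number of classes, and frozen layers are all illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 10  # hypothetical small target task

model = models.resnet18(pretrained=True)

# Freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the target task
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Only the new head's parameters are updated during fine-tuning
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```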
## CVPR2020 – Personal Highlights in Transfer Learning

The first paper (**Towards Inheritable Models for Open-Set Domain Adaptation**: [here](https://arxiv.org/pdf/2004.04388.pdf)) does some pretty interesting work proposing *inheritable* models. These models are trained by a *vendor* on a labelled source domain and then transferred to a *client*, which uses the inherited model on the new task in the target domain. The idea is that inheritable models can pass task-specific knowledge from the vendor to the client without any need for data sharing between the two. The proposed methodology works on unsupervised open-set domain adaptation problems – i.e. it can deal with scenarios where there are categories/classes in the target domain which don't exist in the source domain. The authors also introduce a pretty neat way of defining and quantifying *inheritability*, and the approach achieves some impressive results against other state-of-the-art methods. I'm not going to dive into too much detail on the exact architecture and maths of what they propose – but if you are interested, do check out the paper!
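To illustrate the vendor/client split (this is only a rough sketch of the data flow, with hypothetical names and a generic confidence threshold – it is not the paper's actual training objective), the key point is that the client only ever receives model weights, never the vendor's source data:

```python
import torch

# --- Vendor side: train on the labelled source domain ---
def vendor_train(model, source_loader, epochs=10):
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in source_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    torch.save(model.state_dict(), "inheritable_model.pt")  # only weights are shipped

# --- Client side: use the inherited model on unlabelled target data ---
def client_predict(model, target_loader, unknown_threshold=0.5):
    model.load_state_dict(torch.load("inheritable_model.pt"))
    model.eval()
    preds = []
    with torch.no_grad():
        for x in target_loader:
            probs = torch.softmax(model(x), dim=1)
            conf, pred = probs.max(dim=1)
            # Low-confidence samples are flagged as 'unknown' (open-set classes)
            pred[conf < unknown_threshold] = -1
            preds.append(pred)
    return torch.cat(preds)
```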
Continual learning is a sub-topic of machine learning closely linked to transfer learning, and this is a paper I really liked: **Conditional Channel Gated Networks for Task-Aware Continual Learning** ([here](https://arxiv.org/pdf/2004.00070.pdf)). When we try to train a network for multiple tasks we run pretty quickly into the problem of catastrophic forgetting. Abati et al. introduce binary gating modules at each convolutional layer that decide which kernels should be used for the task at hand. Using Gumbel-Softmax sampling they bypass the problem of the gates' non-differentiable binary on/off thresholds and train the gates with backpropagation. Moreover, a task classifier is also introduced, which means the model can be used in task-agnostic settings (when we can't tell the model what the task is). The solution is essentially to run the model for all the different tasks simultaneously and then use the task classifier to select the most appropriate feature map at the last convolutional layer. This approach has its limitations, but since the gating modules only select a few kernels at each convolutional layer it isn't quite as computationally expensive as it sounds. A rough sketch of the gating idea follows the figure below.
<img src="https://storage.googleapis.com/published-content/cvpr2020-some-highlights-in-transfer-learning/architecture.png" />

Source: arXiv:2004.00070v1 [cs.CV], 31 Mar 2020
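Here is a hedged sketch of per-task channel gating with Gumbel-Softmax, in the spirit of Abati et al. but heavily simplified (the real architecture, gating inputs, and losses differ). Each task owns learnable per-channel on/off logits that decide which of the convolution's output channels stay active:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_tasks):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Learnable per-task, per-channel off/on logits: (tasks, channels, 2)
        self.gate_logits = nn.Parameter(torch.zeros(num_tasks, out_ch, 2))

    def forward(self, x, task_id):
        feat = self.conv(x)
        # Gumbel-Softmax gives (approximately) binary yet differentiable gates,
        # so the on/off decision can still be trained with backpropagation
        gate = F.gumbel_softmax(self.gate_logits[task_id], tau=1.0, hard=True)[:, 1]
        return feat * gate.view(1, -1, 1, 1)

# Usage: the active kernels depend on which task is being solved
block = GatedConvBlock(in_ch=3, out_ch=16, num_tasks=3)
out = block(torch.randn(2, 3, 32, 32), task_id=1)
```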
GANs are an area that previously hadn't gathered much attention from a transfer learning perspective, which with hindsight seems odd given that they are computationally heavy and difficult to train. **MineGAN: effective knowledge transfer from GANs to target domains with few images** ([here](https://arxiv.org/pdf/1912.05270.pdf)) takes a shot at shifting that balance. The authors propose a method for transferring knowledge from pretrained GANs to new domains with little target data. Personally, I think the high-level idea presented here is remarkably simple: shift the prior input distribution of the GAN towards the most promising regions given the target data. This shift is achieved using a *miner network*, which is much simpler than the GAN itself and therefore easier to train and less prone to overfitting. The method can also be expanded fairly straightforwardly to knowledge transfer from multiple GANs to a single model – which is pretty cool.
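A minimal sketch of the miner idea (names and the loss are illustrative, not MineGAN's exact formulation): a small MLP reshapes the latent prior before a frozen, pretrained generator so that generated samples land closer to the target domain, and only that small MLP is trained adversarially.

```python
import torch
import torch.nn as nn

class Miner(nn.Module):
    """Small MLP mapping the original latent z to a 'mined' latent."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, z_dim), nn.ReLU(),
            nn.Linear(z_dim, z_dim),
        )

    def forward(self, z):
        return self.net(z)

def miner_loss(miner, generator, discriminator, batch_size, z_dim=128):
    """Adversarial loss for the miner; the pretrained generator stays frozen."""
    z = torch.randn(batch_size, z_dim)
    fake = generator(miner(z))          # generator weights are not updated
    d_fake = discriminator(fake)
    # Non-saturating GAN loss applied to the miner only
    return nn.functional.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake)
    )

# Typical usage: loss.backward(); optimizer.step() with only miner.parameters()
# in the optimizer, so the pretrained GAN itself is never touched.
```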
The last paper I wanted to highlight is more of a tool that will hopefully be useful on your own ventures into transfer learning. **Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data** ([here](https://arxiv.org/pdf/2001.02799.pdf)) proposes a freely accessible search engine for finding data to pre-train models for transfer learning. Deciding which source-domain data to train your model on to get the best results in the target domain is, after all, not necessarily straightforward. You can access the tool [here](http://aidemos.cs.toronto.edu/nds/#search).
## Wrap-Up

So, I hope you found that quick tour of transfer learning at CVPR2020 interesting. It's clear that there are a lot of super interesting ideas in this space, and it will be interesting to see how they continue to evolve in the near future and what their impact will be. These papers by no means cover everything related to transfer learning at CVPR2020; they are just some of my personal highlights. Below I've included links to other papers at CVPR2020 (in both transfer learning and other areas) I found particularly interesting, along with some other hopefully useful links – again, these are in no way exhaustive!
---

### For your interest:

- The Virtual CVPR2020 website: [here](http://cvpr20.com)
- All of the papers at CVPR2020 can be accessed here: [here](http://openaccess.thecvf.com/CVPR2020.py)
- A pretty good blog post covering a wide range of topics and themes that came up (including transfer learning!): [here](https://yassouali.github.io/ml-blog/cvpr2020/)
### Other CVPR2020 papers I thought were interesting

- Regularizing CNN Transfer Learning with Randomised Regression – Zhong and Maki ([here](https://arxiv.org/pdf/1908.05997.pdf))
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion – Yin et al. ([here](https://arxiv.org/pdf/1912.08795.pdf))
- Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions – Ye et al. ([here](https://arxiv.org/pdf/1812.03664.pdf))
- Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations – Cui et al. ([here](https://arxiv.org/pdf/2003.12237.pdf))
- Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild – Wu et al. ([here](https://arxiv.org/pdf/1911.11130.pdf)) (N.B. Best Paper Award)
