À la Une

Soutenance de thèse Sohrab Ferdowsi


M. Sohrab Ferdowsi soutiendra en anglais, en vue de l'obtention du grade de docteur ès sciences mention informatique de la Faculté des sciences de l'Université de Genève, sa thèse intitulée:

Learning to compress and search visual data in large-scale systems

Date: Mardi 11 décembre 2018 à 14h30

Lieu: CUI / Battelle bâtiment A, Auditoire rez-de-chaussée



  • Prof. Svyatoslav Voloshynovskiy (Directeur, Université de Genève)
  • Prof. Thierry Pun (Université de Genève)
  • Prof. Stéphane Marchand-Maillet (Université de Genève)
  • Prof. Dr. Christine Guillemot (INRIA, Rennes, France)
  • Prof. Dr. Karen Eguiazarian (Université de Tampere, Finlande)
  • Dr. François Fleuret (IDIAP, Martigny, Suisse)


This thesis studies high-dimensional vectorial data, in particular, visual information, e.g., digital images or image descriptors and addresses several of the issues dealt with these data, particularly under large-scale setups.

Attempts for general signal and image modeling in the literature are first reviewed where they are framed under the Bayesian paradigm. These are categorized roughly as basic and composite models, where the former benefits from low sample-complexity, as well as sound theoretical bases, while the latter achieves better performance benefiting from larger data samples.

This thesis pursues the algorithmic development of its models from basic to composite ones. The basic models are developed under two families of synthesis and analysis priors. Our synthesis model introduces the rate-allocation criterion as a regularization to the K-means algorithm and hence is termed the VR-Kmeans. We show that this is very successful in avoiding over-fitting of K-means at high-dimensional settings and particularly under low-sample setups.

Our analysis-like formulation leads to the framework of Sparse Ternary Codes (STC). This starts with the characterization of its information-theoretic properties and follows by investigating ways to maintain rate-distortion optimal encoding and decoding. We then notice the limitations of these models in achieving high-fidelity and low-complexity encoding and point out the need to opt for more intricate models.

The evolution from basic and single-layer architectures to composite and multi-layer models is done using the principle of successive refinement in information theory. In particular, the VR-Kmeans and the STC are extended to the RRQ and the ML-STC using additive residual-based encoding, respectively. These models are analyzed algorithmically and their rate-distortion performances are shown to be superior compared to their existing alternatives.

The ML-STC, and its more data-dependent version the ML-STC-Procrustean admit yet another evolution. This is the joint parameter update using the back-propagation framework which resembles that of artificial neural networks and hence we term it as the STNets. However, our model has certain particularities as compared to the common deep learning frameworks. Instead of starting from random weights, the STNets is first pre-trained layer-by-layer and according to the STC. This is then fine-tuned using back-propagation along with other standard recipes of training neural networks. Technically, this is possible thanks to the properties of ternary encoding which allows us to replace the non-differentiable discrete non-linearity with its smooth counterpart and without incurring approximation errors. Consequently, we are able to learn discrete and compact representations for data and under a wide range of data-availability and operational rate-regimes.

Having developed our algorithmic infrastructure, we next tailor them to three important practical applications. First, the problem of large-scale similarity search in retrieval systems is addressed, where complexity and memory constraints limit the naïve exhaustive scan of the database for a given query. We develop a complete search system based on the STC framework and show its superiority in terms of the triple complexity-memory-performance trade-offs as compared to the two main-stream solutions from the literature, namely the binary hashing and the vector-quantization based family of methods.

We next target the problem of learned image compression. We argue the benefits of learning to compress w.r.t. the conventional codecs and show that it is possible to compress high-resolution natural images using our algorithms trained on a limited number of images and achieve comparable results to the JPEG2000, even without performing different stages of the compression pipeline. More particularly and for a class of domain-specific images, we show that it is possible to benefit from the extra structural redundancy present in these images to compress them further. We also show that learning to compress can be beneficial beyond the task of compression itself.

Finally, we show that compression can be used to solve inverse problems. This is achieved by imposing the compressibility of data under a certain trained model as an effective prior to regularize the solution of ill-posed inverse problems, which is invoked in an iterative algorithm. In particular, we show that it is possible to deonise images using the JPEG2000, or recover under-sampled and noisy auto-regressive data using the ML-STC and through our proposed algorithm.

The thesis is concluded by pointing out some open problems and issues. This paves the way for certain potential directions of very promising future research to be pursued in the interplay between signal processing and machine learning and under the theme of learning compact representations.