
Generative models and their latent space

Generative models explore latent spaces to generate diverse, realistic data, revolutionizing AI in art, science, and more.

Generative models are the new, cutting-edge frontier of Artificial Intelligence. They are a class of machine learning algorithms designed to create new data instances that resemble a given dataset. They learn the underlying patterns, structures, and relationships within the training data and then generate new samples with similar characteristics. Generative models are widely used in many applications, including image and video synthesis, text generation, drug discovery, and medical imaging.

A key notion underlying most generative models is that of latent space. This refers to a lower-dimensional, abstract representation of data that captures the underlying structure and variations in the original high-dimensional data space. It can be considered a compressed, more organized space where different data points with similar characteristics are located closer to each other.

The generative model learns to map points from the latent space back to the original data space, effectively generating new data instances that resemble those in the training dataset. Assuming that points in the latent space are regularly distributed according to a known distribution, we can sample a point from it and pass it as input to the generative model to obtain a new data instance.
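A minimal sketch of the sampling procedure just described, assuming a standard normal prior N(0, I) over the latent space. The "decoder" is a hypothetical stand-in (a fixed random linear map followed by `tanh`, so outputs fall in [-1, 1] like normalized pixels); in a real model it would be a trained neural network. The names `sample_prior`, `make_toy_decoder`, `LATENT_DIM`, and `DATA_DIM` are illustrative only.

```python
import math
import random

LATENT_DIM = 4  # hypothetical latent dimensionality
DATA_DIM = 8    # hypothetical data dimensionality (e.g., number of pixels)


def sample_prior(dim, rng):
    """Draw a latent point z from the standard normal prior N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def make_toy_decoder(latent_dim, data_dim, seed=0):
    """Hypothetical stand-in for a trained generator: fixed random weights + tanh."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(data_dim)]

    def decode(z):
        # Each output value is a squashed linear combination of the latent coordinates.
        return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]

    return decode


rng = random.Random(42)
decode = make_toy_decoder(LATENT_DIM, DATA_DIM)
z = sample_prior(LATENT_DIM, rng)  # 1. sample a point in the latent space
x = decode(z)                      # 2. map it to the data space
```

Drawing a fresh `z` and calling `decode` again yields a new sample; the decoder itself is deterministic, so all the randomness comes from the prior.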

The regular distribution of points in the latent space is the crucial property underlying the generative task. Suppose we are interested in generating, say, images of human faces. If we generate an image by picking pixel values at random, the probability of getting a face is practically zero; in most cases, we would obtain noise. Instead, the generator learns how to transform any sample from the latent space into a face, mapping a known (so-called prior) distribution into the actual distribution we are interested in.
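To make the idea of mapping a prior into a target distribution concrete, here is a one-dimensional sketch under simplifying assumptions: the prior is N(0, 1), the "data distribution" is an exponential distribution (chosen only because its inverse CDF has a closed form), and the generator is written by hand instead of being learned. A trained generator plays the same role for images, where no closed form exists; `generator` and `rate` are illustrative names.

```python
import math
import random


def generator(z, rate=2.0):
    """Deterministically map z ~ N(0, 1) to x ~ Exponential(rate)."""
    # 0.5 * erfc(z / sqrt(2)) equals 1 - Phi(z), which is Uniform(0, 1) when z ~ N(0, 1);
    # computing it via erfc avoids log(0) for large z.
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    # Inverse CDF of the exponential distribution applied to that uniform variable.
    return -math.log(tail) / rate


rng = random.Random(0)
prior_samples = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
generated = [generator(z) for z in prior_samples]
empirical_mean = sum(generated) / len(generated)  # should be close to 1 / rate = 0.5
```

Because the map is monotonic, nearby latent values produce nearby outputs, and the whole prior is reshaped into the target distribution rather than producing "noise".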

There are several families of generative models, differing in how they capture and reproduce data distributions, how their latent space is organized, and the training process used to learn the mapping. Even the terminology varies: the process mapping points from the latent space to the data space is usually called generation or decoding, while the reverse process may be called encoding, inverse generation, or embedding.

Key points about the latent space in generative models

Learning Representations: The generative model learns to extract meaningful features and representations from the data as it maps them to the latent space. These representations are meant to capture important attributes or characteristics of the data; different encodings may result in more or less entangled combinations of the explanatory factors of variation behind the data.

Dimensionality Reduction: Latent spaces typically (but not always) have a lower dimensionality than the original data space. This dimensionality reduction can simplify the modeling process and make it more tractable, especially for complex, high-dimensional data. It also makes it easier to investigate the semantics of the latent space and to study the vector-arithmetic properties of the factors of variation.
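Principal component analysis (PCA) is the simplest, purely linear instance of this idea: it finds a lower-dimensional coordinate system that captures most of the variation in the data. The sketch below is an illustration, not how deep generative models are trained; it fits a 1-D latent space to 2-D points via power iteration, and `pca_1d`, `encode`, and `decode` are hypothetical helper names.

```python
import math


def pca_1d(points, iters=100):
    """Fit a 1-D linear latent space (the first principal component) by power iteration."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - mean[k] for k in range(dim)] for p in points]
    # Covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)] for i in range(dim)]
    v = [1.0] * dim  # starting vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate data or starting vector")
        v = [x / norm for x in w]
    return mean, v


def encode(x, mean, component):
    """Latent code: coordinate of the centered point along the principal direction."""
    return sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, component))


def decode(z, mean, component):
    """Map a 1-D latent code back to the original data space."""
    return [mi + z * ci for mi, ci in zip(mean, component)]


points = [[t, 2.0 * t] for t in (-2.0, -1.0, 0.0, 1.0, 3.0)]  # 2-D data lying on a line
mean, component = pca_1d(points)
codes = [encode(p, mean, component) for p in points]          # one number per point
recon = [decode(c, mean, component) for c in codes]
```

Since these points lie exactly on a line, a single latent coordinate reconstructs them perfectly; real data would lose some detail, and nonlinear generative models aim to keep far more structure than a linear projection can.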

Continuity and Smoothness: By construction, the latent space is often continuous and smooth, meaning that small changes in the latent coordinates correspond to gradual changes in the generated data. This property allows for smooth interpolation between data points, translating to gradual transformations in the generated data and enabling creative exploration of the generative model’s capabilities.
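Interpolation can be sketched in a few lines. Linear interpolation (`lerp`) is the obvious choice; with Gaussian priors, spherical linear interpolation (`slerp`) is often preferred, because in high dimension Gaussian samples concentrate near a sphere and straight-line midpoints fall in regions of unusually small norm. Both functions are illustrative helpers operating on plain Python lists; decoding each point of the path with a generator would produce the gradual transformation described above.

```python
import math


def lerp(z0, z1, t):
    """Straight-line interpolation between two latent points."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def slerp(z0, z1, t):
    """Spherical interpolation: moves along the arc between z0 and z1."""
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))  # clamp for rounding errors
    omega = math.acos(cos_omega)
    if omega < 1e-6:  # nearly parallel vectors: fall back to lerp
        return lerp(z0, z1, t)
    s = math.sin(omega)
    return [(math.sin((1.0 - t) * omega) / s) * a + (math.sin(t * omega) / s) * b
            for a, b in zip(z0, z1)]


def interpolate_path(z0, z1, steps, method=slerp):
    """Evenly spaced latent points from z0 to z1 (endpoints included)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [method(z0, z1, i / (steps - 1)) for i in range(steps)]


path = interpolate_path([1.0, 0.0], [0.0, 1.0], steps=5)
```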

Interpolation and Manipulation: Latent spaces make it possible to perform meaningful manipulations of data while, in principle, avoiding the risk of leaving the data manifold. Any editing operation on data can be understood as a suitable trajectory in the latent space, allowing for tasks such as altering specific attributes or even much more complex operations, e.g., head rotation.

Figure 2. Head rotation following trajectories in the latent space of a diffusion model.
Source. Cornell University
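One simple way to turn an attribute into a latent trajectory, assuming the attribute varies roughly linearly in the latent space, is to estimate a direction as the difference between the mean latent codes of samples with and without the attribute, then move a code along it. This is only an illustrative technique with hypothetical helper names (`attribute_direction`, `edit_trajectory`); it is not necessarily how the trajectories in Figure 2 were obtained.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attribute_direction(with_attr, without_attr):
    """Unit vector pointing from 'attribute absent' to 'attribute present' codes."""
    mu_pos, mu_neg = mean_vector(with_attr), mean_vector(without_attr)
    d = [a - b for a, b in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        raise ValueError("the two groups have the same mean code")
    return [x / norm for x in d]


def edit_trajectory(z, direction, alphas):
    """Latent points z + alpha * direction; decoding them edits the attribute gradually."""
    return [[zi + a * di for zi, di in zip(z, direction)] for a in alphas]


# Hypothetical latent codes of samples labeled with / without some attribute.
direction = attribute_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
trajectory = edit_trajectory([0.0, 5.0], direction, alphas=[0.0, 1.0, 2.0])
```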

Domain Adaptation and Style Transfer: Latent spaces can also enable domain adaptation and content or style transfer, where the model can learn to disentangle different factors of variation (such as style and content) and transfer them between samples.

Conditioning: Conditioning steers data generation by providing the model with additional information, known as a “condition” or “context” (for instance, a class label or a text prompt), which guides the output toward desired characteristics. This enables controlled generation of content and makes the model more versatile, allowing it to produce contextually relevant and coherent results.
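A common way to condition a network, sketched here under simplifying assumptions, is to concatenate an encoding of the condition (below, a one-hot class label) to the latent vector before it enters the generator. The toy linear decoder and the names `one_hot`, `conditional_input`, and `make_toy_conditional_decoder` are hypothetical; real models often inject the condition in more elaborate ways (e.g., through cross-attention for text prompts).

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def conditional_input(z, label, num_classes):
    """Concatenate the latent vector with the encoded condition."""
    return list(z) + one_hot(label, num_classes)


def make_toy_conditional_decoder(latent_dim, num_classes, data_dim, seed=0):
    """Hypothetical generator: a fixed random linear map over [z, condition]."""
    rng = random.Random(seed)
    in_dim = latent_dim + num_classes
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(data_dim)]

    def decode(z, label):
        h = conditional_input(z, label, num_classes)
        return [sum(w * hi for w, hi in zip(row, h)) for row in weights]

    return decode


decode = make_toy_conditional_decoder(latent_dim=2, num_classes=3, data_dim=4)
z = [0.5, -1.0]
outputs = [decode(z, label) for label in range(3)]  # same z, three different conditions
```

Keeping `z` fixed while changing the label changes the output, which is exactly the "controlled generation" the condition provides.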

Several classes of generative models have been investigated over the years, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Autoregressive Models, Normalizing Flows, and the more recent Denoising Diffusion Models.

GANs have been the most influential and widely used generative models for several years. A GAN consists of two neural networks: a generator and a discriminator. The generator creates data samples, while the discriminator tries to distinguish real samples from generated ones. Through this adversarial training, the generator learns to produce increasingly realistic samples. GANs have shown remarkable capabilities in generating high-quality data, especially images and artworks, achieving numerous breakthroughs and contributing to the popularity of generative techniques.
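The adversarial objective can be sketched through its two loss functions, written here in terms of the discriminator's output probabilities (its estimate that a sample is real). The generator loss uses the common "non-saturating" formulation suggested in the original GAN paper, in which the generator maximizes log D(G(z)) instead of minimizing log(1 − D(G(z))). The networks and gradient updates are omitted, and the batch values are made up for illustration.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy of a predicted probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """D should output 1 on real samples and 0 on generated ones."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: G is rewarded when D labels its samples as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for one batch.
d_real = [0.9, 0.8, 0.95]  # D is fairly sure the real samples are real
d_fake = [0.1, 0.2, 0.05]  # ... and that the generated ones are fake
loss_d = discriminator_loss(d_real, d_fake)  # low: D is winning
loss_g = generator_loss(d_fake)              # high: G must improve
```

In training, the two losses are minimized alternately: D is updated to lower `discriminator_loss`, then G to lower `generator_loss`, which is the adversarial game described above.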

The leading role of Generative Adversarial Networks has recently been challenged by Denoising Diffusion Models (DDMs), which are rapidly establishing themselves as the new state of the art in deep generative modeling.

The conceptual foundation of DDMs differs significantly from that of GANs. In a DDM, the latent space consists of images of pure noise, which are gradually transformed by an iterative denoising process into samples resembling those from the training distribution.

Figure 3. Forward (from left to right) and reverse (from right to left) diffusion process. The forward process progressively adds Gaussian noise to the image, and the reverse process removes it.
 Source.
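The forward process in the figure has a convenient closed form in the standard DDPM formulation: with a noise schedule β_t and ᾱ_t = ∏_{s≤t}(1 − β_s), a noisy version of x_0 at step t can be drawn directly as x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with ε ~ N(0, I). The sketch assumes the linear schedule from 10⁻⁴ to 0.02 over T = 1000 steps used in the DDPM paper, indexes steps from 0, and uses a hypothetical three-pixel "image"; the model discussed in the article may use a different schedule.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t, increasing linearly over T steps (t = 0 .. T-1)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s): the fraction of signal variance kept at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps  # the noise is returned too: DDPMs are trained to predict it from x_t


alpha_bar = cumulative_alphas(linear_beta_schedule())
rng = random.Random(0)
x0 = [0.8, -0.3, 0.1]                            # hypothetical tiny "image"
x_early, _ = q_sample(x0, 10, alpha_bar, rng)    # still close to x0
x_late, eps = q_sample(x0, 999, alpha_bar, rng)  # essentially pure noise
```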

To visualize this process, think of the noise in the starting image as a cloud of dust that gradually condenses into a solid shape corresponding to a sample from the target distribution. Different clouds of noise yield different samples. The model's task is to learn the principles governing this gradual convergence.
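The gradual condensation can be imitated on a toy problem. The sketch below assumes a hypothetical "training set" of just two 2-D points and replaces the trained denoising network with an "oracle" that computes the exact posterior mean E[x_0 | x_t], which is only possible because the dataset is tiny. It then runs the deterministic DDIM update (one common sampler for diffusion models, not necessarily the one used for the model in this article) with the DDPM linear noise schedule.

```python
import math

DATA = [[-2.0, 0.0], [2.0, 0.0]]  # hypothetical two-sample "training set"


def cumulative_alphas(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t for the linear DDPM noise schedule, t = 0 .. T-1."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def oracle_x0(xt, a):
    """Exact E[x_0 | x_t] for x_0 uniform over DATA; stands in for a trained network."""
    log_w = [
        -sum((xi - math.sqrt(a) * di) ** 2 for xi, di in zip(xt, d)) / (2.0 * (1.0 - a))
        for d in DATA
    ]
    m = max(log_w)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [sum(wi * d[k] for wi, d in zip(w, DATA)) / total for k in range(len(xt))]


def ddim_sample(x_T, alpha_bar, steps=50):
    """Deterministic DDIM sampling: walk from pure noise x_T down to a clean sample."""
    T = len(alpha_bar)
    ts = list(range(T - 1, -1, -(T // steps)))  # 999, 979, ..., 19 for T=1000, steps=50
    x = list(x_T)
    for i, t in enumerate(ts):
        a = alpha_bar[t]
        x0_hat = oracle_x0(x, a)  # current guess of the clean sample
        # Noise implied by that guess: x_t = sqrt(a) * x0_hat + sqrt(1 - a) * eps_hat.
        eps_hat = [(xi - math.sqrt(a) * x0i) / math.sqrt(1.0 - a) for xi, x0i in zip(x, x0_hat)]
        a_prev = alpha_bar[ts[i + 1]] if i + 1 < len(ts) else 1.0  # 1.0 means no noise left
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x


alpha_bar = cumulative_alphas()
sample_a = ddim_sample([1.5, 0.3], alpha_bar)   # one cloud of noise ...
sample_b = ddim_sample([-0.8, 1.2], alpha_bar)  # ... and another
```

The two starting noise vectors condense onto different training points, mirroring the observation that distinct clouds of noise yield different samples; a real DDM learns the denoiser from data instead of computing it exactly.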

Conclusions

In recent years, generative models have sparked significant interest and research, pushing the boundaries of what is possible in data generation and creative AI applications and greatly contributing to the overall development of AI.

Significant advancements have occurred across numerous domains, yielding impressive outcomes such as generating realistic images and videos, producing coherent text, composing music, and facilitating drug discovery and molecular design.

Other typical applications include super-resolution, image enhancement and restoration, anomaly detection, and deepfake detection and prevention.

More generally, generative models represent the state-of-the-art approach to problems characterized by substantial stochasticity in the predicted outcomes. In such problems, the aim is to model the full probability distribution of the outcomes rather than a single point prediction such as the expected value, which may not even be a likely outcome. Capturing this distribution accurately is precisely the goal of generative models.
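A toy example of why a single expected value can be misleading, under an assumed bimodal process: each outcome lands near −1 or near +1 with equal probability. The mean is close to 0, a value the process almost never produces, whereas samples drawn from the distribution (which is what a generative model provides) reproduce both plausible outcomes. The process `sample_outcome` is purely hypothetical.

```python
import random
import statistics


def sample_outcome(rng):
    """Hypothetical stochastic process: the outcome lands near -1 or near +1, equally likely."""
    center = rng.choice([-1.0, 1.0])
    return rng.gauss(center, 0.1)


rng = random.Random(0)
samples = [sample_outcome(rng) for _ in range(10_000)]
mean = statistics.fmean(samples)

# Fraction of outcomes that are actually close to the mean vs. close to one of the two modes.
near_mean = sum(abs(s - mean) < 0.2 for s in samples) / len(samples)
near_modes = sum(min(abs(s - 1.0), abs(s + 1.0)) < 0.2 for s in samples) / len(samples)
```

Almost no outcome falls near the mean, while roughly 95% fall near one of the modes; a model that only predicts the mean misses both.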

This approach finds applications across a wide range of disciplines, including weather forecasting, financial analysis and prediction, epidemiology and disease spread, traffic flow and transportation modeling, and social dynamics and opinion propagation.


Journal reference

Asperti, A., Evangelista, D., Marro, S., & Merizzi, F. (2023). Image embedding for denoising generative models. Artificial Intelligence Review, 1-23. https://doi.org/10.1080/02667363.2022.2155932

Andrea Asperti was born in Bergamo, Italy, in 1961. He earned a Ph.D. in Computer Science from the University of Pisa in 1989. Throughout his career, he has held various positions, including at the École Normale Supérieure in Paris and INRIA-Rocquencourt. He is currently a full professor of Machine Learning and Deep Learning at the University of Bologna. From 2005 to 2007, he was Director of the Department of Computer Science, and from 2000 to 2007 he was a member of the Advisory Committee of the World Wide Web Consortium (W3C). He has coordinated several national and European projects. His recent research interests include Deep Learning, Generative Modeling, and Deep Reinforcement Learning. He currently represents the University of Bologna for the area of Data Science and Artificial Intelligence within the UnaEuropa Consortium.