How do you sample noise for the forward diffusion process?

To sample noise for the forward diffusion process, you start by defining a noise schedule and then progressively apply scaled Gaussian noise to the data over multiple timesteps. The forward diffusion process transforms data (e.g., an image) into pure noise by iteratively adding small amounts of Gaussian noise according to a predefined variance schedule. At each timestep \( t \), the noise is sampled from a normal distribution \( \mathcal{N}(0, I) \) and scaled by coefficients derived from the schedule. These coefficients control how much signal is retained versus how much noise is added at each step, ensuring the data gradually becomes indistinguishable from random noise.
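To make the per-step view concrete, here is a minimal PyTorch sketch of a single forward step, assuming the standard DDPM-style transition \( q(x_t \mid x_{t-1}) = \mathcal{N}(\sqrt{1 - \beta_t} \, x_{t-1}, \beta_t I) \); the function name `forward_step` is illustrative:

```python
import torch

def forward_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """One iterative forward step: keep sqrt(1 - beta_t) of the signal and
    add fresh Gaussian noise with variance beta_t."""
    eps = torch.randn_like(x_prev)  # epsilon ~ N(0, I)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * eps
```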

The process relies on two key components: a variance schedule \( \beta_t \) (where \( t \) ranges from 1 to \( T \)) and a set of derived parameters \( \alpha_t = 1 - \beta_t \). The cumulative product \( \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \) tracks the total signal retention up to timestep \( t \). For example, if \( \beta_1 = 0.1 \) and \( \beta_2 = 0.2 \), then \( \alpha_1 = 0.9 \) and \( \alpha_2 = 0.8 \), so the cumulative product is \( \bar{\alpha}_2 = 0.9 \times 0.8 = 0.72 \). At each step \( t \), the noisy data \( x_t \) is computed as \( x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon \), where \( \epsilon \sim \mathcal{N}(0, I) \). The \( \sqrt{\bar{\alpha}_t} \) term preserves a fraction of the original data, while \( \sqrt{1 - \bar{\alpha}_t} \) scales the sampled noise to match the accumulated variance.
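A short sketch of how the schedule, the cumulative products, and the closed-form sample might be set up in PyTorch; the linear schedule endpoints 1e-4 and 0.02 are illustrative defaults, not mandated by the formula:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # illustrative linear variance schedule
alphas = 1.0 - betas                       # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)  # bar(alpha)_t = prod_{s=1..t} alpha_s

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Closed-form sample: x_t = sqrt(bar(alpha)_t) * x_0 + sqrt(1 - bar(alpha)_t) * eps."""
    eps = torch.randn_like(x0)                               # epsilon ~ N(0, I)
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over batch dims
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
```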

In practice, this allows the process to be computed efficiently in a single step for any \( t \), bypassing the need to iterate through all previous timesteps. For instance, if training a diffusion model on images, you might precompute all \( \bar{\alpha}_t \) values and randomly sample a timestep \( t \) during training. The noise \( \epsilon \) is then drawn from a standard normal distribution, scaled by \( \sqrt{1 - \bar{\alpha}_t} \), and added to the scaled input \( \sqrt{\bar{\alpha}_t} \cdot x_0 \). This approach ensures consistency across timesteps and simplifies implementation. A common choice for the variance schedule is a linear or cosine-based progression of \( \beta_t \), which balances gradual noise addition with computational stability.
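Here is a hedged sketch of how these pieces might come together in a training step, reusing `alpha_bars` from the snippet above; the `model(x_t, t)` interface that predicts \( \epsilon \) is an assumption about the network, not part of the forward process itself:

```python
import torch
import torch.nn.functional as F

def training_step(model, x0: torch.Tensor, alpha_bars: torch.Tensor) -> torch.Tensor:
    """Noise a batch at random timesteps and regress the sampled noise."""
    alpha_bars = alpha_bars.to(x0.device)
    b = x0.shape[0]
    t = torch.randint(0, alpha_bars.shape[0], (b,), device=x0.device)  # one t per sample
    eps = torch.randn_like(x0)                                         # epsilon ~ N(0, I)
    a_bar = alpha_bars[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps               # closed-form x_t
    return F.mse_loss(model(x_t, t), eps)  # train the model to predict the added noise
```

Because each batch element gets its own random \( t \), the model sees all noise levels during training without ever simulating a full trajectory.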
