Explaining the SDXL Latent Space

The author begins with some background: they were building correction filters for the SDXL inference process. SDXL output often suffered from either excess noise or over-smoothing, along with a limited color range, as a consequence of how Stable Diffusion models work. To improve the output, the author examined the SDXL latents, the 4-channel tensor the diffusion model actually operates on, and contrasted it with 8-bit pixel space, which has 3 channels (RGB).

From there, the post presents a direct conversion from SDXL latents to RGB using a linear approximation and explores why the SDXL color range is biased toward yellow.

The author then provides code snippets for techniques that improve SDXL output: centering the values, color balancing, removing outliers, and maximizing the tensor (stretching its values to fill a fixed range). The post closes by demonstrating these modifications on SDXL outputs, showing improved color range and detail, and notes that they also make long prompts usable at high guidance scales.
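A minimal sketch of the ideas summarized above follows: a per-pixel linear map from the 4 latent channels to RGB, plus simple versions of the centering, outlier-removal and maximizing steps. This is not the author's code. It uses NumPy arrays shaped (4, H, W) in place of the batched PyTorch latents an SDXL pipeline returns, and the function names, the z-score rule in `clip_outliers`, the `boundary=4.0` default and the coefficients in the usage example are all assumptions for illustration. Real coefficients would come from the post itself or from fitting the map against VAE-decoded images.

```python
import numpy as np


def latents_to_rgb(latents: np.ndarray, weights: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """Approximate an RGB preview of a (4, H, W) latent with a per-pixel linear map.

    weights has shape (3, 4): row r holds how much each latent channel contributes
    to output channel r. biases has shape (3,). Returns an (H, W, 3) uint8 image.
    """
    rgb = np.einsum("rc,chw->rhw", weights, latents) + biases[:, None, None]
    # Clamp to the 8-bit range, then move channels last for image libraries.
    return np.clip(rgb, 0, 255).astype(np.uint8).transpose(1, 2, 0)


def center_latents(latents: np.ndarray, channels=(0, 1, 2, 3)) -> np.ndarray:
    """Shift each selected channel so its mean becomes zero (returns a copy)."""
    out = latents.astype(np.float64, copy=True)
    for c in channels:
        out[c] -= out[c].mean()
    return out


def clip_outliers(latents: np.ndarray, z: float = 3.0) -> np.ndarray:
    """Clip each channel to mean +/- z standard deviations (assumed z-score rule)."""
    mean = latents.mean(axis=(1, 2), keepdims=True)
    std = latents.std(axis=(1, 2), keepdims=True)
    return np.clip(latents, mean - z * std, mean + z * std)


def maximize_latents(latents: np.ndarray, boundary: float = 4.0) -> np.ndarray:
    """Scale the whole tensor so its largest absolute value equals `boundary`.

    boundary=4.0 is an illustrative default, not a documented SDXL limit.
    """
    peak = np.abs(latents).max()
    return latents if peak == 0 else latents * (boundary / peak)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # SDXL's VAE downsamples by 8, so a 1024x1024 image has a 4x128x128 latent.
    latents = rng.normal(size=(4, 128, 128))
    latents = maximize_latents(clip_outliers(center_latents(latents)))
    # Placeholder coefficients only; they are NOT fitted SDXL values.
    weights = np.full((3, 4), 20.0)
    biases = np.full(3, 128.0)
    preview = latents_to_rgb(latents, weights, biases)
    print(preview.shape)  # (128, 128, 3)
```

Because the map is linear and applied per pixel, a preview costs one small matrix multiply per latent pixel instead of a full VAE decode. Passing only a subset of channels to `center_latents` is one hypothetical way to approximate the color-balancing step; which channels to center is left to the reader.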

https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space