The author surveys glitch tokens in the DeepSeek LLM, noting the difficulties posed by tokens made of Chinese characters or broken Unicode. They identify anomalous tokens such as ‘Nameeee’ and ‘EDMFunc’ and describe the distinctive images associated with them. The behavior of these tokens, especially in response to specific prompts, proves complex and hard to predict. The post closes by inviting further exploration of the embedding space, suggesting that more anomalies remain to be found.
https://outsidetext.substack.com/p/anomalous-tokens-in-deepseek-v3-and