What happened to BERT and T5?

The post traces the evolution of NLP model architectures, focusing on the shift from encoder-only models like BERT to the current era of large language models (LLMs). The author works through the distinctions among encoder-decoders, decoder-only models, and PrefixLMs, weighing the pros and cons of each architecture and the value of denoising objectives in pretraining. A key point is that BERT-like models were gradually phased out not because denoising stopped working, but because more flexible models like T5 cast denoising as a sequence-to-sequence task trained autoregressively, combining the benefits of the objective with general-purpose generation. The post closes by stressing the importance of understanding inductive biases in LLM research.
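For context on what a "denoising objective" looks like in that sequence-to-sequence framing, here is a minimal sketch of T5-style span corruption (my own illustration, not code from the post): random spans are dropped from the input and replaced with sentinel tokens, and the target reconstructs the dropped spans autoregressively. The whitespace tokenization and fixed span length are simplifications; real implementations sample span lengths and operate on token IDs.

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, span_len=3, seed=0):
    """Simplified T5-style span corruption: replace random contiguous
    spans with sentinel tokens; the target lists the dropped spans,
    each prefixed by its sentinel."""
    rng = random.Random(seed)
    n_corrupt = max(1, int(len(tokens) * corruption_rate))
    inputs, targets = [], []
    i, sentinel, corrupted = 0, 0, 0
    while i < len(tokens):
        if corrupted < n_corrupt and rng.random() < corruption_rate:
            span = min(span_len, len(tokens) - i, n_corrupt - corrupted)
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            targets.extend(tokens[i:i + span])  # the model must regenerate this span
            i += span
            corrupted += span
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # final sentinel marks end of target
    return inputs, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens)
print("input: ", " ".join(inp))
print("target:", " ".join(tgt))
```

The model sees `inputs` and learns to generate `targets` left to right, which is how an encoder-decoder trains on denoising without BERT-style in-place mask prediction.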

https://www.yitay.net/blog/model-architecture-blogpost-encoders-prefixlm-denoising
