Pile-T5

The T5 model has long been a favorite in the NLP community, and Pile-T5 improves on it: a new version trained on the Pile dataset using the LLaMA tokenizer. The new models outperform the original T5 even when compared at token-matched checkpoints, and they excel particularly at code-related tasks. Evaluated across a range of benchmarks, Pile-T5 showed competitive performance on SuperGLUE, CodeXGLUE, MMLU, and BIG-Bench Hard. Although the models unexpectedly lag on certain benchmarks, the release of intermediate checkpoints offers valuable insight into how they evolved over training. Overall, Pile-T5 is positioned as a promising base for multitask finetuning and other encoder-decoder applications.
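For orientation, here is a minimal sketch of loading one of the released checkpoints with Hugging Face transformers. It assumes the checkpoints are published on the Hub under names like "EleutherAI/pile-t5-base" (the blog post mentions converted checkpoints in several sizes); the prompt text is illustrative only:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Assumed checkpoint name; other sizes (large, xl, xxl) follow the same pattern.
    model_name = "EleutherAI/pile-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Pile-T5 is an encoder-decoder model, so generation follows the usual T5 pattern.
    # Note: the released checkpoints are span-corruption pretrained, so raw generation
    # mainly demonstrates the API rather than producing fluent completions.
    inputs = tokenizer("The Pile is a large, diverse pretraining dataset.", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

For downstream use, these checkpoints would typically be finetuned on a task first, as with the original T5.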

https://blog.eleuther.ai/pile-t5/
