Self-attention and transformers driving the evolution of large language models
July 8, 2019