Introduced the Transformer architecture, which revolutionized sequence modeling by relying entirely on self-attention, dispensing with recurrence and convolutions. Achieved state-of-the-art results on WMT 2014 machine translation benchmarks while being more parallelizable and requiring significantly less training time than recurrent models.
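
The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k)V. Below is a minimal single-head sketch in NumPy; the function and weight names are illustrative, and the full model adds multi-head projections, masking, and positional encodings.

```python
# Minimal sketch of scaled dot-product self-attention (single head).
# Shapes and weight names here are illustrative, not the paper's code.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

# Usage with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 64))             # 10 tokens, d_model = 64
W_q, W_k, W_v = (rng.normal(size=(64, 64)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)    # -> (10, 64)
```

Because every token attends to every other token in one matrix multiply, the whole sequence is processed in parallel, which is the source of the training-time advantage over step-by-step recurrence.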