• Nevertheless, today’s frontier models perform competently even on novel tasks they were not trained for, crossing a threshold that previous generations of AI and supervised deep learning systems never managed. Decades from now, they will be recognized as the first true examples of AGI, just as the 1945 ENIAC is now recognized as the first true general-purpose electronic computer. (View Highlight)
  • Frontier models have achieved a significant level of general intelligence, according to the everyday meanings of those two words. And yet most commenters have been reluctant to say so for, it seems to us, four main reasons:
    1. A healthy skepticism about metrics for AGI
    2. An ideological commitment to alternative AI theories or techniques
    3. A devotion to human (or biological) exceptionalism
    4. A concern about the economic implications of AGI (View Highlight)
  • Today’s frontier models are of course not fully qualified to be lawyers or doctors, even though they can pass those qualifying exams. As Goodhart’s law states: “When a measure becomes a target, it ceases to be a good measure.” Better tests are needed, and there is much ongoing work, such as Stanford’s test suite HELM (Holistic Evaluation of Language Models). (View Highlight)