Draft:Humanity's Last Exam
In artificial intelligence, Humanity's Last Exam is a project aiming to create a benchmark for large language models consisting of expert-level questions across a wide range of fields. Organized by the nonprofit Center for AI Safety and the company Scale AI, the project began soliciting question submissions from the academic community and other subject-matter experts in September 2024.[1] Its goal is to measure the progress of AI models toward human expert knowledge and abstract reasoning abilities, going beyond the undergraduate-level difficulty of earlier benchmarks such as MMLU.[2]
Humanity's Last Exam was published on January 23, 2025. The dataset consists of 3,000 challenging questions spanning more than a hundred subjects.[3]
Scope of questions
The project aims to collect several thousand questions across all fields, ranging from mathematics, physics, biology and electrical engineering to analytic philosophy.[1][4] Answers to these questions must be objective and self-contained, and have to be submitted together with the question.[5]
Competition and history of the project
The project was originally announced on September 15, 2024, in a blog post[5] by Dan Hendrycks and Alexandr Wang, and was covered by several news sources. To encourage the submission of questions, a prize pool of $500,000, sponsored by Scale AI, was announced for the top 550 questions submitted by November 1, 2024. Additionally, all authors of questions accepted into the benchmark were offered co-authorship of the resulting publication.[2][6]
Submission of questions was later extended beyond the original November deadline.[7]
References
1. Tharin Pillay (December 24, 2024). "AI Models Are Getting Smarter. New Tests Are Racing to Catch Up". Time. Retrieved January 12, 2025.
2. Jeffrey Dastin and Katie Paul (September 16, 2024). "AI experts ready 'Humanity's Last Exam' to stump powerful tech". Reuters. Retrieved January 12, 2025.
3. Humanity's Last Exam. January 23, 2025. Retrieved February 3, 2025.
4. "Humanity's Last Exam Submission Form". Center for AI Safety. Archived from the original on December 26, 2024. Retrieved January 12, 2025.
5. "Submit Your Toughest Questions for Humanity's Last Exam". Center for AI Safety. September 15, 2024. Retrieved January 12, 2025.
6. Carroll, Mickey (September 18, 2024). "Public asked to help create 'humanity's last exam' to spot when AI achieves peak intelligence". Sky News. Retrieved January 15, 2025.
7. Dan Hendrycks [@DanHendrycks] (November 10, 2024). "As we clean up the dataset, we're accepting questions at agi.safe.ai" (Tweet). Retrieved January 12, 2025 – via Twitter.