High-profile AI chatbot ChatGPT performed worse on certain tasks in June than its March version, a Stanford University study found, Paolo Confino reported for Fortune.
The study compared the performance of the chatbot, created by OpenAI, over several months at four “diverse” tasks: solving math problems, answering sensitive questions, generating software code, and visual reasoning.
Researchers found wild fluctuations, known as drift, in the technology's ability to perform certain tasks. The study looked at two versions: one called GPT-3.5 and another known as GPT-4.
The most notable results came from research into GPT-4's ability to solve math problems. Over the course of the study, researchers found that in March GPT-4 correctly identified the number 17077 as prime 97.6% of the times it was asked.
But just three months later, its accuracy plummeted to a lowly 2.4%.
Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version answered the same question correctly just 7.4% of the time, while the June version got it right 86.8% of the time.
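For reference, 17077 really is prime, which an ordinary trial-division check (a minimal sketch, not the method used in the study) confirms in a few lines:

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: test odd divisors up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

print(is_prime(17077))  # True: no divisor exists up to sqrt(17077) ~ 130
```

The task is trivial for a deterministic program, which is what makes the chatbot's swing from 97.6% to 2.4% accuracy on the same fixed question so striking.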
The researchers saw similar swings when they asked the models to write code and to complete a visual reasoning test that required predicting the next figure in a pattern.
James Zou, a Stanford computer science professor and one of the study's authors, says the "magnitude of the change" was unexpected from the "sophisticated ChatGPT."