ChatGPT Generates Error-Filled Cancer Treatment Plans: Study


Artificial intelligence chatbot ChatGPT has been one of the most talked about technologies of the year.
  • Study finds that ChatGPT provided false information when asked to design cancer treatment plans.
  • The chatbot mixed correct and incorrect information together, making errors harder to spot. 
  • Accuracy issues with generative AI mean it's unlikely to take over from doctors any time soon.

ChatGPT may be taking the world by storm – but a new study suggests there is one key area where it is unlikely to be used any time soon. 

Researchers at Brigham and Women's Hospital – a teaching hospital of Harvard Medical School in Boston, Massachusetts – found that cancer treatment plans generated by OpenAI's revolutionary chatbot were full of errors.

According to the study, which was published in the journal JAMA Oncology and initially reported by Bloomberg, one-third of the large language model's responses contained incorrect information when it was asked to generate treatment plans for a variety of cancer cases. 

The study also noted that the chatbot had a tendency to mix correct and incorrect information together, in a way that made it difficult to identify what was accurate. Out of a total of 104 queries, around 98% of ChatGPT's responses included at least one treatment recommendation that met the National Comprehensive Cancer Network guidelines, the report said.

The authors were "struck by the degree to which incorrect information was mixed in with correct information, which made it especially difficult to detect errors – even for experts," coauthor Dr. Danielle Bitterman told Insider.

"Large language models are trained to provide responses that sound very convincing, but they are not designed to provide accurate medical advice," she added. "The error rate and the instability of responses are critical safety issues that will need to be addressed for the clinical domain."

ChatGPT became an overnight sensation when it launched in November 2022, reaching 100 million active users two months after its debut. The chatbot sparked a rush to invest in AI companies and an intense debate over the long-term impact of artificial intelligence; Goldman Sachs research found it could affect 300 million jobs globally. 

Despite ChatGPT's success, generative AI models are still prone to "hallucinations," where they confidently present information that is misleading or wildly incorrect. Famously, Google's ChatGPT rival Bard wiped $120 billion off the company's stock value when it gave an inaccurate answer to a question about the James Webb Space Telescope.

Efforts to integrate AI into healthcare, primarily to streamline administrative tasks, are already underway. Earlier this month, a major study found that using AI to screen for breast cancer was safe, and suggested it could almost halve the workload of radiologists. 

A computer scientist at Harvard recently found that GPT-4, the latest version of the model, could pass the US medical licensing exam with flying colors – and suggested it had better clinical judgment than some doctors.

Despite this, accuracy issues with generative models such as ChatGPT mean they are unlikely to be taking over from doctors any time soon.

The JAMA study found that 12.5% of ChatGPT's responses were "hallucinated," and that the chatbot was most likely to present incorrect information when asked about localized treatment for advanced diseases or immunotherapy.   

OpenAI has acknowledged that ChatGPT can be unreliable. The company's usage policies warn that its models are not designed to provide medical information, and should not be used to "provide diagnostic or treatment services for serious medical conditions."

OpenAI did not immediately respond to Insider's request for comment. 


