“Large language models (LLMs) are a type of deep learning model trained on vast amounts of text data with the goal of generating new text that closely resembles human responses. The release of ChatGPT (OpenAI, San Francisco, CA, USA), an LLM-based chatbot, on Nov 30, 2022, propelled LLMs to the forefront of public attention and made them accessible for millions of people to experiment with. Since then, medical practitioners and researchers have been exploring potential applications of LLMs, as much of medical practice and research revolves around large text-based tasks, such as presentations, publications, documentation, and reporting. [..]
Ethical considerations for the use of LLMs in academic medicine and medical research should be addressed. First, in safety-critical domains, such as medicine and medical research, hidden bias in LLMs could have serious consequences for patient outcomes. LLMs produce text that reflects their training data and thus could perpetuate biases pertaining to race, sex, language, and culture. Moreover, the body of knowledge used to train the models typically arises from well-funded institutions in high-income, English-speaking countries. Thus, there is a significant under-representation of perspectives from other regions of the world, leading to mechanistic models of health and disease that are biased towards how these processes are understood in high-income countries. For example, a clinician in Africa using an LLM to generate an outline of a presentation on treatment options in diabetes might receive content focused on treatment paradigms applicable only in high-income countries. This could limit the scope of discussion of treatments that are popular in other regions of the world or that might be more relevant to that country’s patient population.
Second, the use of LLMs in medical writing disrupts the traditional notion of trust, which encompasses the dependability of information and the credibility of sources or authors. LLM outputs are untraceable, difficult to distinguish from the voices of actual authors, and at times might be completely inaccurate. Furthermore, the ground truth in medicine is constantly evolving, making it difficult to determine whether LLMs reflect the most current data, and current models neither evaluate the quality of their outputs nor provide a measure of uncertainty for them. Although such features could be added to future LLMs, current hesitation regarding LLMs in medical research is justified, as evidenced by journals now requiring authors to disclose the use of LLMs or prohibiting their use entirely. Interestingly, this raises the question of whether the use of LLMs will be stigmatised or discriminated against, particularly in areas of great consequence to patient outcomes, such as medical research. Will society forfeit the immense potential of LLMs because of our lack of trust in their use? Or should we embrace this technology but demand a higher level of scrutiny when assessing content generated by LLMs? [..]
[Paragraphs about LLMs serving as authors and being accessible to all academics regardless of financial means were omitted.]
Fifth, the use of LLMs raises ethical concerns relating to the collection, use, and potential dissemination of the data entered into them. Text input into LLM application programming interfaces might contain sensitive protected health information or unpublished data, which could be put at risk by being accessible to company employees or potential hackers. Given the absence of transparency in how commercial companies use or store the information entered, a user must consider whether it is ethical to risk putting sensitive data into this black box. Thus, it will be important for individuals and institutions to implement strict controls for de-identifying data and for obtaining informed consent for the use of protected health information submitted to LLM application programming interfaces.”
Full article: H Li, JT Moon, S Purkayastha, et al. The Lancet Digital Health, April 27, 2023.