The Compelling Need for Shared Responsibility of AI Oversight: Lessons From Health IT Certification

“As artificial intelligence (AI) tools become more consistently used in health care, federal agencies, health care facilities, medical societies, and other stakeholders are grappling with how to ensure they do not introduce unintended patient harm. [..] Regardless of approach, a well-designed one could improve safety, promote patient and health care professional confidence in AI use, and incentivize developers and users to focus on these important issues. Developing a testing and certification approach that is effective, rigorous, and rapid and that is a shared responsibility of both developers and users is necessary to meet the needs of multiple stakeholders.

The Office of the National Coordinator for Health Information Technology (ONC) voluntary certification program for health information technology (IT) developers, created as part of the $34 billion Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, offers several insights that should be used to inform any AI assurance testing and certification process. Studies examining the ONC’s certification process revealed that some certified electronic health records (EHRs) did not actually meet certain certification criteria yet were still certified as doing so, and some EHR developers were accused of falsifying certification information, resulting in millions of dollars in Department of Justice settlements. [..] Based on the health IT certification program, we provide 5 cautionary recommendations to inform AI assurance testing and certification.

Testing and certification should occur on products that closely resemble what will be used by end users. Most health IT certification criteria were focused on testing and certification of base health IT products that were not yet configured for use in real clinical environments. During configuration, features of the health IT system that were certified as meeting certain criteria may be modified such that they no longer adhere. In the context of AI, it is even more important to monitor the performance of AI algorithms in local use, especially since only a small subset of validated models have demonstrated clinically meaningful benefits. AI can also exhibit algorithm drift, a decline in model performance over time due to minor changes in the underlying data and/or populations. All of this underscores the importance of localized and ongoing real-world testing and monitoring of AI.
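Such local monitoring can be as simple as periodically comparing a deployed model’s discrimination on recent local data against the performance documented at certification time. The Python sketch below illustrates this idea; the baseline value, alert threshold, check_for_drift helper, and synthetic data are hypothetical illustrations, not part of the editorial or of any certification program.

    # Minimal sketch of local post-deployment drift monitoring. The baseline
    # AUC, alert threshold, and synthetic data are assumptions for illustration.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    BASELINE_AUC = 0.82   # hypothetical AUC documented at certification/validation
    ALERT_DROP = 0.05     # hypothetical tolerated drop before governance review

    def check_for_drift(y_true: np.ndarray, y_score: np.ndarray) -> dict:
        """Compare recent local performance with the certified baseline."""
        current_auc = roc_auc_score(y_true, y_score)
        return {
            "current_auc": round(float(current_auc), 3),
            "drift_flag": (BASELINE_AUC - current_auc) > ALERT_DROP,
        }

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Stand-in for one reporting period of local outcomes and model risk scores.
        y_true = rng.integers(0, 2, size=500)
        y_score = rng.random(500)
        print(check_for_drift(y_true, y_score))

In practice, a local governance program would run a check like this on a recurring schedule, stratify it by site and subpopulation, and route any flagged decline to clinical and informatics reviewers rather than acting on it automatically.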

Since it is not feasible for independent agencies to perform algorithm certification in every setting in which AI tools will be used, this function needs to be performed by local AI governance and oversight programs and could be required as part of the Centers for Medicare & Medicaid Services (CMS) Conditions of Participation. This would compel health care facilities to adopt these practices, since they would have to demonstrate to auditing agencies a local process for ensuring appropriate monitoring of these algorithms after deployment, as recommended by the White House Blueprint for an AI Bill of Rights. To support local testing and monitoring, those performing assurance testing should make their software open source and available for use at the local level. [..]

Mitigate potential consequences associated with AI developers paying certifying bodies. Under the ONC’s program, health IT developers pay certifying bodies to review and certify their products. This structure poses potential risks. First, certifying bodies may feel obligated to approve developer products because the developer is their customer. Second, health IT developers can select from multiple certifying bodies, and those bodies compete with one another for customers; this creates pressure to certify all health IT developer products, even when they do not meet certification standards, in order to maintain a positive standing in the industry. Health IT developers may be less likely to select a certification body known for rigorous testing and a history of identifying issues that would prevent certification of their products. Although this issue is not unique to health care, when designing AI certification programs it will be important to consider how the relationship between the certification body and the AI developer might be affected by the payment structure. [..]

Certification should not rely solely on attestation. To expedite the certification process and minimize burden on health IT developers, certain aspects of the health IT certification program, such as those related to usability, required developers only to attest to using a user-centered design process. Analysis of attestation documents and inspection of developer organizations’ actual processes found that some developers had falsely attested to meeting requirements. Rather than relying solely on attestation, an AI certification program should require AI developers to publicly provide evidence supporting any certification requirements, reducing the likelihood of false attestations and improving the rigor of the certification process.

Periodic recertification should be required, especially when there are substantial changes to the AI or when new certification criteria are defined. Like health IT systems that require software updates and other modifications over time, AI algorithms may require regular modifications for performance optimization. In addition, as AI continues to evolve in both its development and application, new risks may be identified and potentially mitigated through new certification criteria. An AI certification process should account for evolving technology and certification criteria by having clear guidelines for when recertification will be required.

Postmarket surveillance should be used to promote maintenance of certified AI technology, recertification, and local governance and oversight. The ONC conducted randomized surveillance of certified health IT products and through this process identified developer products that were no longer compliant with certification requirements. Randomized surveillance should be part of an AI certification process to identify nonadherence to certification requirements in real-world settings, as well as use of AI in unintended ways. This would encourage AI developers to maintain and recertify their products throughout the AI life cycle. It would also encourage sites using AI to monitor for appropriate use.”

Full editorial: RM Ratwani, D Classen, and C Longhurst, JAMA, August 12, 2024.