Auditing LLMs: Are Users Getting What They Pay For?

The Trust Issue with LLM APIs: Are You Getting What You Pay For?
The rapid rise of large language models (LLMs) accessible through black-box APIs raises a crucial question of trust: are users actually getting the performance they are paying for? LLM API providers advertise specific models and capabilities based on factors such as size and benchmark performance. However, providers have an economic incentive to quietly serve a cheaper, less capable model in place of the advertised one. This lack of transparency undermines user trust, hinders reliable performance evaluation, and raises fairness concerns.
The Black-Box Challenge
Detecting such model substitutions is extremely difficult because of the black-box nature of the APIs. Interaction is typically limited to input-output queries, which complicates any analysis of the underlying model. Users lack the visibility needed to verify whether the advertised model is actually serving their requests or whether a different, possibly inferior model has been silently swapped in.
Existing Verification Methods and Their Limitations
Researchers are actively developing methods for verifying model identity. Approaches under investigation include statistical tests on text outputs, benchmark evaluations, and log-probability analysis. Studies show, however, that these methods, especially those based solely on text outputs, reach their limits against more subtle or adaptive substitution strategies: model quantization or randomized substitution, for example, can be difficult to detect. Log-probability analysis offers stronger guarantees, but it is often impractical because providers do not expose the required token-level probabilities.
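To make the idea concrete, here is a minimal sketch of a black-box identity test of this kind: sample many completions for a fixed prompt from the API under audit and from a trusted reference deployment of the advertised model, then ask whether the two output distributions are statistically distinguishable. The sampler functions and their toy distributions below are stand-ins assumed for illustration, not real provider calls, and the chi-squared test is a generic choice rather than the specific procedure from the paper.

```python
# Sketch of an output-distribution identity test for a black-box LLM API.
# The two samplers simulate categorical answer distributions; in practice
# they would wrap calls to the audited API and to a trusted reference model.
import random
from collections import Counter
from scipy.stats import chi2_contingency

def sample_api(prompt: str) -> str:
    # Stand-in for the black-box API under audit (slightly shifted
    # distribution, as if a cheaper substitute model were answering).
    return random.choices(["A", "B", "C"], weights=[0.50, 0.35, 0.15])[0]

def sample_reference(prompt: str) -> str:
    # Stand-in for a trusted deployment of the advertised model.
    return random.choices(["A", "B", "C"], weights=[0.60, 0.30, 0.10])[0]

def same_distribution(prompt: str, n: int = 500, alpha: float = 0.01) -> bool:
    """Two-sample chi-squared test: fail to reject 'same model' => True."""
    api_counts = Counter(sample_api(prompt) for _ in range(n))
    ref_counts = Counter(sample_reference(prompt) for _ in range(n))
    categories = sorted(set(api_counts) | set(ref_counts))
    if len(categories) < 2:
        return True  # identical single answer; nothing to distinguish
    table = [
        [api_counts.get(c, 0) for c in categories],
        [ref_counts.get(c, 0) for c in categories],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value >= alpha

print(same_distribution("Pick A, B, or C."))
```

With a few hundred samples per side, a distribution gap of this size is typically detectable; the attack scenarios below show how adaptive providers can shrink exactly that gap.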
Attack Scenarios and Their Impact
Several attack scenarios illustrate the complexity of the problem. Beyond the quantization and randomized substitution mentioned above, deliberate benchmark gaming is a further challenge: a provider could tune a substitute model to score well on known benchmarks while its general performance remains lower, making an objective assessment of actual model quality difficult. Randomized substitution is particularly insidious because only a fraction of requests is rerouted, as the sketch below illustrates.
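The toy calculation below is an illustration, not a result from the paper: if only a fraction p of requests is silently rerouted to a cheaper model, the observable outputs form a mixture whose total variation distance from the advertised model shrinks linearly in p, so the number of queries an auditor needs grows roughly like 1/p². The distributions reuse the assumed toy values from the earlier sketch.

```python
# Why randomized substitution is hard to catch: the observable distribution
# is a mixture, and its distance to the advertised model shrinks with the
# rerouting fraction p.
advertised = {"A": 0.60, "B": 0.30, "C": 0.10}  # assumed advertised-model outputs
cheap      = {"A": 0.50, "B": 0.35, "C": 0.15}  # assumed substitute-model outputs

def total_variation(p: dict, q: dict) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

for reroute_fraction in (1.0, 0.5, 0.1, 0.01):
    mixture = {
        k: (1 - reroute_fraction) * advertised.get(k, 0.0)
           + reroute_fraction * cheap.get(k, 0.0)
        for k in set(advertised) | set(cheap)
    }
    print(reroute_fraction, round(total_variation(advertised, mixture), 4))
```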
Hardware-Based Solutions and Future Perspectives
A promising approach to ensuring model integrity is the use of hardware-based solutions such as Trusted Execution Environments (TEEs). A TEE provides a secure environment in which the execution of the model can be attested and verified. However, the trade-offs between security, performance, and provider adoption must be weighed: TEEs can introduce performance overhead and require providers to invest in the necessary infrastructure.
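The sketch below illustrates the attestation idea at a conceptual level, under heavy simplification: the enclave reports a hash (measurement) of the loaded model weights in a signed statement, and the client checks the signature, a freshness nonce, and that the measurement matches the advertised model's published hash. Real TEEs such as Intel SGX/TDX or AMD SEV-SNP rely on hardware-rooted keys and vendor attestation services; the shared-key HMAC here is purely a stand-in for that machinery.

```python
# Conceptual sketch of TEE-style attestation for model integrity.
# NOT a real attestation protocol: HMAC with a shared key stands in for a
# hardware-rooted signing key and vendor verification service.
import hashlib
import hmac
import json

HARDWARE_KEY = b"stand-in-for-a-hardware-rooted-attestation-key"

def enclave_attest(model_weights: bytes, nonce: bytes) -> dict:
    """What the provider-side enclave would return for an attestation request."""
    measurement = hashlib.sha256(model_weights).hexdigest()
    payload = json.dumps({"measurement": measurement, "nonce": nonce.hex()})
    signature = hmac.new(HARDWARE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def client_verify(report: dict, expected_measurement: str, nonce: bytes) -> bool:
    """Check the signature, the freshness nonce, and the model measurement."""
    expected_sig = hmac.new(HARDWARE_KEY, report["payload"].encode(),
                            hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, report["signature"]):
        return False
    payload = json.loads(report["payload"])
    return (payload["nonce"] == nonce.hex()
            and payload["measurement"] == expected_measurement)

# Usage: the advertised model's weight hash would be published out of band.
weights = b"...advertised model weights..."
published_hash = hashlib.sha256(weights).hexdigest()
nonce = b"fresh-client-nonce"
report = enclave_attest(weights, nonce)
print(client_verify(report, published_hash, nonce))
```

Even in this simplified form, the trade-off from the text is visible: attestation adds an extra round trip and operational cost, and it only helps if providers choose to deploy the necessary infrastructure.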
Ensuring model integrity in LLM APIs is a central challenge for the future development and trustworthy use of AI technologies. Research continues to produce more robust auditing methods and security mechanisms to increase transparency and trust in LLM APIs and to prevent abuse.
Bibliography:
Cai, W., Shi, T., Zhao, X., & Song, D. (2025). Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs. NAACL 2025. https://2025.naacl.org/program/accepted_papers/
Anthropic. (n.d.). LLMs. https://docs.anthropic.com/llms-full.txt
FAccT 2024. https://facctconference.org/static/papers24/facct24-152.pdf
NeurIPS 2024. https://nips.cc/virtual/2024/papers.html
arXiv preprint. https://arxiv.org/pdf/2502.07776
Ada Lovelace Institute. (n.d.). Under the Radar. https://www.adalovelaceinstitute.org/report/under-the-radar/
SpringerLink. https://link.springer.com/article/10.1007/s11846-023-00696-z
ACM Digital Library. https://dl.acm.org/doi/pdf/10.1145/3630106.3659037
ACM CCS 2024. https://www.sigsac.org/ccs/CCS2024/program/accepted-papers.html
ScienceDirect. https://www.sciencedirect.com/science/article/pii/S0268401223000233