FinAudio Benchmark Evaluates Audio LLM Performance in Finance

Audio AI in Finance: FinAudio Benchmark Sets New Standards

Audio Large Language Models (AudioLLMs) are developing rapidly and changing how audio data is processed and analyzed, with clear advances in conversation, audio understanding, and automatic speech recognition (ASR). The financial sector, however, has lacked a standardized benchmark for evaluating these models, even though audio such as earnings calls and CEO speeches is an important source of information there. FinAudio, a new benchmark designed specifically for evaluating AudioLLMs in the financial domain, addresses this gap.

FinAudio: Focus on Three Core Tasks

FinAudio focuses on three central tasks that reflect the specific challenges of the financial sector:

1. ASR for short financial audio data: This involves the precise transcription of short audio sequences, such as short expert interviews or commercials.

2. ASR for long financial audio data: This task aims at the error-free transcription of longer recordings, for example, hours-long conference calls or presentations.

3. Summarization of long financial audio data: This task tests the ability of AudioLLMs to extract the essential information from long audio recordings and summarize it concisely.
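
The two transcription tasks above need a way to quantify how close a model's transcript is to a reference. The article does not name the metric FinAudio uses, so the following is only a minimal sketch of word error rate (WER), a common ASR metric, assumed here for illustration. The function name `word_error_rate` and the whitespace tokenization are hypothetical choices, not part of the benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercase, whitespace-separated tokens. A real evaluation
    would also normalize punctuation and number formats.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (one fewer than in curr) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "revenue grew eight percent year over year"
    hypothesis = "revenue grew eighty percent year over"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

In this example the hypothesis contains one substituted word ("eighty" for "eight") and drops the final word, so the score is 2 errors over 7 reference words, roughly 0.29. The substitution also shows why financial audio is demanding: a single misheard number changes the meaning of the sentence while barely moving the error rate.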

Datasets and Evaluation

To cover these three tasks, FinAudio comprises two datasets of short audio clips and two datasets of long recordings. In addition, a new dataset was created specifically for the summarization of financial audio. Seven widely used AudioLLMs were evaluated on the benchmark. The results show both the strengths and the current limitations of existing models in the financial context and indicate where future improvements are needed.
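
Scoring the summarization task requires comparing a generated summary against a reference summary. The article does not say which metric the authors use; the sketch below assumes a simplified ROUGE-L-style F1 based on the longest common subsequence (LCS) of words, which is one common choice. The names `lcs_length` and `rouge_l_f1` are hypothetical, and the tokenization is deliberately naive.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-L-style F1 over lowercase, whitespace-split tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate words that match in order
    recall = lcs / len(ref)      # share of reference words that are covered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the company raised full year guidance"
    candidate = "company raised guidance"
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.2f}")
```

Here every candidate word appears in the reference in the same order, so precision is 1.0, recall is 0.5, and the F1 score is about 0.67. Overlap metrics of this kind reward shared wording rather than factual correctness, so they are often paired with embedding-based or human evaluation.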

Potential and Challenges

The development of FinAudio marks an important step in the application of AI in finance. With a standardized benchmark available, AudioLLMs can be trained and optimized more effectively to meet the specific requirements of this complex sector. Accurate transcription and summarization of financial audio could help automate analysis processes, support investment decisions, and increase efficiency in the financial sector. At the same time, FinAudio highlights the need for further research and development to improve the accuracy and reliability of AudioLLMs when handling the complex language and specialized content of finance.

Outlook

The release of FinAudio's datasets and code gives the research community and the financial industry a common basis for exploring and further developing the capabilities of AudioLLMs in the financial domain. FinAudio is expected to serve as a catalyst for innovation in AI-driven financial analysis and to change how audio data is used in finance. The insights gained from evaluating AudioLLMs with FinAudio can guide the development of future models and support more efficient, data-driven decision-making in the financial sector.

Bibliography:
Cao, Y. et al. (2025). FinAudio: A Benchmark for Audio Large Language Models in Financial Applications. arXiv preprint arXiv:2503.20990.
Bian, K. et al. (2024). FinNLP: Shared Task on Financial Natural Language Processing. Proceedings of the First Workshop on Financial Natural Language Processing.
Allen, D. et al. (2024). The Impact of Large Language Models in Finance - Towards Trustworthy Adoption. The Alan Turing Institute.
Huang, J. et al. (2023). FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models. NeurIPS 2023 Datasets and Benchmarks Track.
Srivastava, S. et al. (2022). A Transformer-based Framework for Multivariate Time Series Representation Learning. arXiv preprint arXiv:2211.09106.
Shareghi, E. et al. (2024). Is Attention All Finance Needs? arXiv preprint arXiv:2402.12659.
Liu, P. et al. (2024). Pre-trained Language Models for Financial Text Analysis: A Survey. arXiv preprint arXiv:2406.11903v1.
Hendrycks, D. et al. (2024). Measuring Massive Multitask Language Understanding. OpenReview.
Davenport, T. H. et al. (2019). Artificial Intelligence in the Audit Function: An Application Perspective. International Journal of Production Operations Management Research, 51(2), 162-176.