Automated ICD Coding in Russian: A Promising Approach

Automating processes in healthcare is becoming increasingly important. One area with great potential is the automated coding of diagnoses using the International Classification of Diseases (ICD). A new study investigates the feasibility of this approach for Russian, a language for which biomedical NLP resources are still scarce.
The RuCCoD Dataset
At the center of the study is the new dataset RuCCoD (Russian Clinical Coding Dataset). It comprises diagnosis text from electronic patient records (EPRs), annotated with over 10,000 entities and more than 1,500 distinct ICD codes. The study uses RuCCoD as a benchmark for several state-of-the-art natural language processing (NLP) models.
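To make the shape of such an annotated dataset concrete, here is a minimal sketch of one way to represent records whose text spans are labeled with ICD codes, and to count entities and distinct codes. The `Entity` and `Record` classes and the `dataset_stats` helper are hypothetical names chosen for illustration, not the actual RuCCoD file format, and the toy records are in English rather than Russian.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entity:
    start: int      # character offset where the mention begins
    end: int        # character offset just past the mention
    icd_code: str   # ICD code assigned to this mention


@dataclass
class Record:
    text: str
    entities: list[Entity] = field(default_factory=list)


def dataset_stats(records):
    """Return (number of entities, number of distinct codes, code frequencies)."""
    freq = Counter(e.icd_code for r in records for e in r.entities)
    return sum(freq.values()), len(freq), freq


# Toy English records (hypothetical); the real dataset is in Russian.
records = [
    Record("Type 2 diabetes mellitus. Essential hypertension.",
           [Entity(0, 24, "E11"), Entity(26, 48, "I10")]),
    Record("Essential hypertension.", [Entity(0, 22, "I10")]),
]
n_entities, n_codes, freq = dataset_stats(records)
```

The same two statistics, computed over the full dataset, are what the figures "over 10,000 entities" and "more than 1,500 ICD codes" describe.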
Evaluated Models and Transfer Learning
The researchers evaluated the performance of various models, including BERT, LLaMA with LoRA, and RAG. Particular attention was paid to transfer learning experiments. The study investigated transfer across different domains, for example, from PubMed abstracts to medical diagnoses, as well as transfer between terminologies, such as from UMLS concepts to ICD codes.
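Retrieval-based approaches such as RAG depend on first finding candidate ICD entries for a diagnosis mention. The sketch below illustrates only that retrieval idea, using character trigram overlap over a tiny hand-written index. It is an assumption-laden toy, not the method used in the study: the function names, the similarity measure, and the three-entry `ICD_INDEX` are all hypothetical, and a real system would more likely use learned embeddings over the full ICD vocabulary.

```python
def char_ngrams(text, n=3):
    """Set of lowercase character n-grams, padded so short strings still match."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_codes(mention, icd_index, top_k=2):
    """Rank ICD entries by n-gram overlap with the mention, best first."""
    query = char_ngrams(mention)
    scored = [(jaccard(query, char_ngrams(desc)), code)
              for code, desc in icd_index.items()]
    scored.sort(reverse=True)
    return [code for _, code in scored[:top_k]]


# Hypothetical miniature index: ICD-10 code -> description.
ICD_INDEX = {
    "E11": "Type 2 diabetes mellitus",
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}

candidates = retrieve_codes("essential hypertension", ICD_INDEX)
```

In a full pipeline, the retrieved candidates would be passed to a generative or ranking model that selects the final code.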
Application to Real Patient Data
The best-performing model was then used to automatically label an internal dataset of EPRs containing patient histories from 2017 to 2021. On a carefully curated test set, models trained on these automatically predicted codes were significantly more accurate than models trained on codes manually assigned by physicians.
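How such a comparison on a test set might be scored can be sketched as follows. The article speaks only of "accuracy"; the micro-averaged precision, recall, and F1 over per-record code sets used here are an assumption about one common way to evaluate multi-code predictions, and `micro_prf` and the toy data are hypothetical.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, F1 over per-record sets of codes."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)   # codes predicted and correct
        fp += len(p - g)   # codes predicted but not in the gold set
        fn += len(g - p)   # gold codes the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (hypothetical): two records, gold vs. predicted codes.
gold = [{"E11", "I10"}, {"J45"}]
pred = [{"E11"}, {"J45", "I10"}]
precision, recall, f1 = micro_prf(gold, pred)
```

Running the same function on the predictions of two models, one trained on physician-assigned codes and one trained on automatically predicted codes, would give directly comparable scores.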
Potential for Low-Resource Languages
The study provides valuable insights into the potential of automated ICD coding in low-resource languages like Russian. Automating this process could increase clinical efficiency and improve data accuracy, which in turn can contribute to better patient care. Reducing the manual effort involved in coding would also leave physicians more time for their patients.
Future Research
Research in this area is still in its early stages. Future studies could focus on expanding the RuCCoD dataset to improve coverage of rare diseases and complex diagnoses. Investigating additional NLP models and transfer learning methods could also advance automated ICD coding.
Significance for Mindverse
For Mindverse, a German company offering AI-powered solutions for text, content, images, and research, these results are of particular interest. The development of customized solutions, such as chatbots, voicebots, AI search engines, and knowledge systems, benefits from advances in the field of automated coding. The findings from this study could contribute to improving the accuracy and efficiency of such systems in the medical context.
Bibliography:
https://arxiv.org/abs/2502.21263
https://arxiv.org/html/2502.21263v1
https://www.researchgate.net/publication/389510391_RuCCoD_Towards_Automated_ICD_Coding_in_Russian
https://chatpaper.com/chatpaper/paper/116125
http://paperreading.club/page?id=288084
https://x.com/UFCS/status/1896504265121661228
https://www.chatpaper.com/chatpaper/pt/paper/116125
https://www.researchgate.net/figure/Distribution-of-ICD-code-frequencies-in-the-RuCCoD-train-set_fig2_389510391
https://aclanthology.org/2021.findings-acl.184.pdf