The Application of Large Language Models and Retrieval-Augmented Generation in Precise Information Extraction from Scientific Articles: A Study in Applied Linguistics Literature Review

Document Type : Original Article

Authors

1 M.A., Department of Intelligent Systems Engineering, Faculty of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.

2 Professor, Department of Intelligent Systems Engineering, Faculty of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.

3 Assistant Professor, English Translation Department, Faculty of Humanities, Damghan University, Damghan, Iran.

Abstract

This study examines the application of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to the accurate extraction of information from applied linguistics research articles. As the volume of publications grows, automated tools that transform unstructured text into analyzable data are increasingly needed. Drawing on a systematic literature review, the study proposes a conceptual framework based on LLMs and RAG for extracting components such as research questions, theoretical frameworks, methodologies, findings, and limitations. The methodology involves selecting articles from secondary databases, designing specialized prompts, and evaluating the output with Precision, Recall, and F1-Score metrics. Findings indicate that combining LLMs with RAG performs well (average F1 = 0.81) when extracting structured elements such as data sources and analytical methods, whereas inferential components still require human validation. These results highlight the potential of this approach for accelerating systematic literature reviews, and the study offers practical recommendations, such as fine-tuning, to improve overall performance.
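
The abstract names Precision, Recall, and F1-Score as the evaluation metrics but does not say how extracted items were matched against reference annotations. The following is a minimal sketch of one common way to compute these metrics, assuming exact set matching between extracted and human-annotated items for a single article. The function name `precision_recall_f1` and the sample data are hypothetical and are not taken from the study.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for one article's extracted items.

    Assumption: an extracted item counts as correct only if it exactly
    matches an item in the human-annotated (gold) set.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: data sources annotated by a human vs. extracted by a model.
gold_items = {"interviews", "questionnaire", "learner corpus"}
model_items = {"interviews", "questionnaire", "classroom observation"}

p, r, f1 = precision_recall_f1(model_items, gold_items)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Exact matching is the strictest choice. Inferential components such as limitations would likely need a looser matching rule or human judgment, which is consistent with the abstract's note that those components still require human validation.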

Keywords

Main Subjects



Articles in Press, Accepted Manuscript
Available Online from 04 October 2025
  • Receive Date: 23 June 2025
  • Revise Date: 25 September 2025
  • Accept Date: 04 October 2025