The Application of Large Language Models and Retrieval-Augmented Generation in Precise Information Extraction from Scientific Articles: A Study in Applied Linguistics Literature Review

Valizadeh, Seyed Mahdi; Ghazanfari, Mehdi; Hassani, Ghodrat

doi:10.22034/jls.2025.144337.1290

Articles in Press

Document Type : Original Article

Authors

¹ M.A., Department of Intelligent Systems Engineering, Faculty of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.

² Professor, Department of Intelligent Systems Engineering, Faculty of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.

³ Assistant Professor, English Translation Department, Faculty of Humanities, Damghan University, Damghan, Iran.

Abstract

This study examines the application of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to the accurate extraction of information from applied linguistics research articles. As the volume of publications grows, the need for automated tools that transform unstructured text into analyzable data has become increasingly urgent. Drawing on a systematic literature review, the research proposes a conceptual framework based on LLMs and RAG for extracting components such as research questions, theoretical frameworks, methodologies, findings, and limitations. The methodology involves selecting articles from secondary databases, designing specialized prompts, and evaluating the output with Precision, Recall, and F1-Score metrics. The findings indicate that integrating LLMs with RAG achieves high accuracy (average F1 = 0.81) in extracting structured elements such as data sources and analytical methods, whereas inferential components still require human validation. These results highlight the significant potential of this approach for accelerating systematic literature reviews, and the study offers practical recommendations, such as fine-tuning, to enhance overall performance.
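As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how Precision, Recall, and F1 could be computed for one extracted component by comparing a model's output against a human-annotated reference. The function name `precision_recall_f1`, the exact-match comparison on sets, and the sample items are assumptions made for this example; the abstract does not specify how the authors matched extracted items or aggregated scores.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted items against reference items using exact set matching.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference items that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented sample data: analytical methods annotated for one article
# versus the methods returned by an extraction pipeline.
gold = {"corpus analysis", "semi-structured interviews", "ANOVA"}
predicted = {"corpus analysis", "ANOVA", "questionnaire", "think-aloud protocol"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this invented case two of the four extracted items are correct and two of the three reference items are found. An average such as the reported F1 = 0.81 would come from aggregating per-component or per-article scores, a step this sketch leaves out.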

Main Subjects

Linguistics

Articles in Press, Accepted Manuscript
Available Online from 04 October 2025
