AIG 2025 Abstracts


Area 1 - Automatic Item Generation

Full Papers
Paper Nr: 6
Title:

Automatic Item Generation Integrated into the E-Assessment-System JACK

Authors:

Michael Striewe

Abstract: Automatic item generation (AIG) can save time in the production of high-quality assessment items, but it requires the creation and maintenance of appropriate software tools that fit into the larger context in which the generated items are to be used. Hence, sustainable AIG solutions require not only sophisticated item generation capabilities, but also appropriate software design. This paper presents a concept for AIG that is integrated into an e-assessment system and promotes reusability and extensibility as its main software quality properties. The paper demonstrates the practicality of the concept and discusses the underlying software structure.
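The abstract stays at the level of design goals, so, purely to illustrate reusability and extensibility in an item-generation component, here is a minimal Python sketch; the interface, class names, and driver are hypothetical and not taken from JACK.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    import random


    @dataclass
    class Item:
        """A generated assessment item: a question stem plus its solution."""
        stem: str
        solution: str


    class ItemGenerator(ABC):
        """Extension point: new item types plug in by subclassing."""

        @abstractmethod
        def generate(self, rng: random.Random) -> Item:
            ...


    class AdditionItemGenerator(ItemGenerator):
        """Example generator producing simple arithmetic items."""

        def generate(self, rng: random.Random) -> Item:
            a, b = rng.randint(1, 99), rng.randint(1, 99)
            return Item(stem=f"What is {a} + {b}?", solution=str(a + b))


    def generate_pool(generator: ItemGenerator, n: int, seed: int = 0) -> list[Item]:
        """Reusable driver: any generator can feed the surrounding system."""
        rng = random.Random(seed)
        return [generator.generate(rng) for _ in range(n)]


    for item in generate_pool(AdditionItemGenerator(), n=3):
        print(item.stem, "->", item.solution)

New item types plug in without touching the driver, which is the kind of extensibility the abstract names as a quality goal.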

Paper Nr: 7
Title:

Evaluation of LLM-Generated Distractors of Multiple-Choice Questions for the Japanese National Nursing Examination

Authors:

Yûsei Kido, Hiroaki Yamada, Takenobu Tokunaga, Rika Kimura, Yuriko Miura, Yumi Sakyo and Naoko Hayashi

Abstract: This paper reports evaluation results on the usefulness of distractors generated by large language models (LLMs) for creating multiple-choice questions for the Japanese National Nursing Examination. Our research questions are: “(RQ1) Do question writers adopt LLM-generated distractor candidates in question writing?” and “(RQ2) Does providing LLM-generated distractor candidates reduce the time needed for writing questions?”. We selected ten questions from the proprietary mock examinations of the National Nursing Examination administered by a prep school, drawing on an analysis of the past ten years of National Nursing Examination questions. For each of the ten questions, distractors were generated by seven different LLMs, given the stem and the key, and compiled into distractor candidate sets. Given a stem and a key for each question, 15 domain experts then completed the questions by filling in three distractors; eight experts were provided with the LLM-generated distractor candidates, while the other seven were not. Comparing the two groups yielded affirmative answers to both RQs. The current evaluation remains subjective from the viewpoint of the question writers; it is still necessary to evaluate whether questions generated with LLM assistance work in a real examination setting. Our future plans include administering a large-scale mock examination using both human-made and LLM-assisted questions and analysing the differences in the responses to the two types of questions.
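As an illustration of the generation step described above, the following minimal Python sketch prompts several LLMs for distractor candidates and pools them into a candidate set; call_llm, the prompt wording, and the pooling logic are assumptions made for this sketch, not the paper’s actual pipeline.

    def call_llm(model: str, prompt: str) -> str:
        """Hypothetical stand-in for an LLM completion API; replace with a
        real client. It returns canned output so the sketch runs."""
        return "Option A\nOption B\nOption C"


    def generate_distractors(stem: str, key: str, model: str, n: int = 3) -> list[str]:
        """Ask one LLM for n distractor candidates, given a stem and its key."""
        prompt = (
            "You are writing a multiple-choice question for a nursing examination.\n"
            f"Question stem: {stem}\n"
            f"Correct answer (key): {key}\n"
            f"Propose {n} plausible but incorrect answer options, one per line."
        )
        return [line.strip() for line in call_llm(model, prompt).splitlines()
                if line.strip()][:n]


    def build_candidate_set(stem: str, key: str, models: list[str]) -> list[str]:
        """Pool and deduplicate candidates from several LLMs."""
        candidates: list[str] = []
        for model in models:
            for d in generate_distractors(stem, key, model):
                if d != key and d not in candidates:
                    candidates.append(d)
        return candidates


    print(build_candidate_set("Which vital sign indicates fever?", "38.0 °C",
                              ["model-a", "model-b"]))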

Paper Nr: 11
Title:

A Comparison of Different Approaches of Model Editors for Automatic Item Generation (AIG)

Authors:

Florian Stahr, Sebastian Kucharski, Iris Braun and Gregor Damnik

Abstract: The Automatic Item Generation (AIG) approach allows users to generate tasks or items from user-defined knowledge models created with associated editors. The challenge is that these editors typically require a certain level of technical expertise, which limits the range of users who can benefit from the AIG approach. To overcome this, editors can be built with strict user guidance, following a purist approach that avoids feature overload. However, once users are familiar with AIG, the purist approach may hinder their productivity. This paper examines the relationship between the users who can benefit from AIG, the AIG model editing approach used, and its usability aspects. In addition, it identifies further perspectives for the development of AIG model editors that make them accessible to both experienced and novice users. For this purpose, we conceptualized an editor that allows more modeling freedom and compared it with a previously developed editor that enforces strict user guidance. Our evaluation shows that the new editor gives access to more AIG features but is harder to get used to, and that an appropriate approach may be to dynamically adapt the guidance and features to the user’s goal and expertise.
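The abstract leaves the form of the knowledge models open; a common AIG formulation is an item model, i.e. a template with variables whose values an editor lets users define. The following minimal Python sketch rests on that assumption, and its representation and names are hypothetical, not the editors’ actual format.

    import itertools

    def generate_items(template: str, variables: dict[str, list]) -> list[str]:
        """Instantiate the template once per combination of variable values."""
        names = list(variables)
        return [template.format(**dict(zip(names, values)))
                for values in itertools.product(*(variables[n] for n in names))]

    model = {
        "template": "A train travels {distance} km in {hours} hours. "
                    "What is its average speed?",
        "variables": {"distance": [120, 240], "hours": [2, 3]},
    }
    for item in generate_items(model["template"], model["variables"]):
        print(item)

A guided editor would expose only a safe subset of such a model (e.g. fixed templates with editable value ranges), whereas a freer editor exposes all of it; the trade-off examined in the paper sits between these two extremes.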

Short Papers
Paper Nr: 10
Title:

Banking Strategies and Software Solutions for Generated Test Items

Authors:

Tahereh Firoozi and Mark J. Gierl

Abstract: Automatic item generation is a scalable item development approach that can be used to produce large numbers of test items. Banking strategies and software solutions are required to organize and manage these generated items. The purpose of our paper is to describe an organizational structure and different management strategies that rely on tagging generated items with descriptive information in the form of content codes. Content codes contain descriptive data that can be used to identify and differentiate generated items. We present a modern approach to banking that allows the generated items to be managed at both the item and the model level. We also demonstrate how a content coding system at both levels enables many different types of searches, including accessing one content-specific item from one model, multiple content-specific items from one model, one content-specific item from multiple models, and multiple content-specific items from multiple models. These examples demonstrate that content coding is a fundamental concept that must be implemented when organizing and managing generated test items.
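To make the four search types concrete, here is a minimal Python sketch of a content-coded item bank; the data structure and field names are illustrative assumptions, not the authors’ software.

    from dataclasses import dataclass, field

    @dataclass
    class BankedItem:
        item_id: str
        model_id: str                      # the item model that generated it
        content_codes: set[str] = field(default_factory=set)

    def search(bank, code, model_ids=None, limit=None):
        """Select items carrying a content code, optionally restricted to a
        set of item models and optionally limited in number."""
        hits = [it for it in bank
                if code in it.content_codes
                and (model_ids is None or it.model_id in model_ids)]
        return hits[:limit]

    bank = [
        BankedItem("i1", "m1", {"algebra", "easy"}),
        BankedItem("i2", "m1", {"algebra", "hard"}),
        BankedItem("i3", "m2", {"algebra", "easy"}),
    ]
    search(bank, "algebra", model_ids={"m1"}, limit=1)  # one item, one model
    search(bank, "algebra", model_ids={"m1"})           # many items, one model
    search(bank, "easy", limit=1)                       # one item, many models
    search(bank, "algebra")                             # many items, many models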

Paper Nr: 12
Title:

Generating SQL-Query-Items Using Knowledge Graphs

Authors:

Paul L. Christ, Torsten Munkelt and Jörg M. Haake

Abstract: SQL is still one of the most popular languages in today’s industry across many fields, and poorly written SQL remains one of the root causes of performance issues. Achieving a high level of mastery of SQL is therefore important, and it requires practicing with many SQL assessment items of varying complexity and content. The manual creation of such items is laborious and expensive; automatic item generation reduces this cost. This paper proposes an approach for automatically generating SQL-query items of varying complexity and content with human-like natural language problem statements (NLPS). The approach is evaluated by human raters regarding the complexity and plausibility of the generated SQL queries and the preference between two alternative NLPS. The results show agreement on the plausibility of the generated SQL queries, while the complexity ratings and the NLPS preference show higher variance.
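As an illustration of the general idea, the following minimal Python sketch treats a database schema as a small graph and derives a SQL query plus a natural-language problem statement (NLPS) from one of its edges; the schema, templates, and wording are assumptions made for this sketch, not the paper’s method.

    import random

    # Schema as a graph: tables are nodes, foreign-key links are edges.
    SCHEMA = {
        "customers": {"orders": ("customers.id", "orders.customer_id")},
        "orders": {},
    }

    def generate_join_item(rng: random.Random) -> tuple[str, str]:
        """Pick an edge in the schema graph and emit (query, NLPS)."""
        left = rng.choice([t for t in SCHEMA if SCHEMA[t]])
        right = rng.choice(list(SCHEMA[left]))
        lcol, rcol = SCHEMA[left][right]
        query = f"SELECT * FROM {left} JOIN {right} ON {lcol} = {rcol};"
        nlps = (f"Write a query that lists every {left[:-1]} "  # crude singular
                f"together with the matching rows from {right}.")
        return query, nlps

    query, nlps = generate_join_item(random.Random(42))
    print(nlps)
    print(query)

Longer paths through the schema graph would yield queries with more joins, which is one natural way to vary item complexity.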