AIG 2026 Abstracts

Area 1 - Automatic Item Generation

Full Papers

Paper Nr:	6
Title:	Computer-Aided Methods for Intelligent Distractor Generation for Japanese Cloze Tests
Authors:	Daniele Amore and Michael Striewe
Abstract:	This paper presents the development and evaluation of two systems for the automatic generation of distractors for Japanese cloze tests. The first system is a re-implementation of an existing architecture originally developed for Chinese, adapted to address the specific linguistic challenges of Japanese such as its complex writing system and agglutinative grammar. The second system extends this approach with a novel context-aware mechanism that leverages BERT to dynamically classify sentence contexts as open or closed and adjusts generation and filtering strategies accordingly. Both systems employ a two-stage pipeline of candidate generation and candidate filtering, utilizing criteria based on frequency similarity, orthographic similarity, word co-occurrence, and semantic similarity. Several inventories were constructed from a 25.4 million sentence Japanese Wikipedia corpus. The context-aware system further introduces a contextual similarity criterion and a BERT-based plausibility filter using Pseudo-Log-Likelihood scoring. A human evaluation study shows that both systems produce predominantly good or moderate distractors, with the word similarity and semantic similarity criteria yielding particularly strong results.
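The two-stage pipeline named in this abstract (candidate generation, then candidate filtering) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy inventory, its frequency values, the function names, and the log-ratio threshold are all hypothetical, and only one criterion (frequency similarity) is shown.

```python
import math

# Hypothetical toy inventory: word -> corpus frequency (values are made up).
FREQ = {"犬": 900, "猫": 850, "鳥": 400, "魚": 820, "机": 30}

def generate_candidates(target, inventory):
    """Stage 1: every inventory word except the target is a candidate."""
    return [word for word in inventory if word != target]

def frequency_similar(target, candidate, inventory, max_log_ratio=0.5):
    """Stage 2 criterion: keep candidates whose log-frequency is close
    to the target's. The threshold is an assumed value."""
    gap = abs(math.log(inventory[target]) - math.log(inventory[candidate]))
    return gap <= max_log_ratio

def distractors(target, inventory):
    """Run generation, then filtering, and return the surviving candidates."""
    return [c for c in generate_candidates(target, inventory)
            if frequency_similar(target, c, inventory)]

print(distractors("犬", FREQ))
```

Further criteria such as orthographic or semantic similarity could be added as additional filter functions applied in the same second stage.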

Paper Nr:	7
Title:	From Static Analysis to Reflection: Towards Generating Questions about Code Quality Defects
Authors:	Sigrid L. Klinger, Patrick Weber and Sven Strickroth
Abstract:	Providing meaningful feedback on code quality remains challenging in introductory programming courses. While static analysis tools can automatically detect many code quality defects in student submissions, their results are typically presented as warnings or metric values that novice programmers must interpret on their own. Our approach aims to provide code quality feedback at scale, guiding students to reflect on their own code and, ideally, improve it. This paper presents a first step towards automatic item generation for this purpose by transforming selected static-analysis detections into reflection-oriented multiple-choice questions. The approach combines defect-specific item models with contextual information extracted from the student’s submitted code, allowing generated questions to refer directly to concrete elements of the student’s own code. We derive technical requirements for such a generation framework, describe its integration into an existing feedback pipeline, and present the implemented item models for nine code quality defects, illustrated with representative examples. We discuss experiences and challenges that emerged during implementation, such as generating modified code that remains recognizable to students. The paper provides a technical foundation for generating reflection questions on code quality from student-produced code artifacts.
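The idea of combining a defect-specific item model with context from the student's code might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Finding` fields, the `unused_variable` defect name, the stem template, and the answer options are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One static-analysis detection (all fields are hypothetical)."""
    defect: str       # e.g. "unused_variable"
    identifier: str   # name taken from the student's code
    line: int

# Hypothetical item model: a stem template plus fixed answer options.
ITEM_MODELS = {
    "unused_variable": {
        "stem": "In line {line}, you declare '{identifier}'. "
                "What happens to its value afterwards?",
        "options": [
            "It is never read again.",        # correct answer (key)
            "It is returned by the method.",
            "It is printed to the console.",
        ],
        "key": 0,
    },
}

def generate_item(finding):
    """Fill the item model for the finding's defect with code-specific context."""
    model = ITEM_MODELS[finding.defect]
    return {
        "stem": model["stem"].format(line=finding.line,
                                     identifier=finding.identifier),
        "options": list(model["options"]),
        "key": model["key"],
    }

item = generate_item(Finding("unused_variable", "tmp", 12))
print(item["stem"])
```

Because the stem is filled from the detection itself, the question refers to a concrete element of the submission rather than to the defect in the abstract.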

Paper Nr:	10
Title:	A Dual-Conditioned CVAE for Interpretable Automatic Question Generation: Controlling Topic and Difficulty through Latent Spaces
Authors:	Edwin Eldho Paul and Ayse Saliha Sunar
Abstract:	Teachers need questions that target specific topics at appropriate difficulty levels, yet current Automatic Question Generation systems focus on improving fluency rather than attribute control. This paper introduces a dual-conditioned Conditional Variational Autoencoder (CVAE) that generates standalone educational questions with simultaneous control over topic and difficulty. The architecture forces the decoder to rely on the latent variable by routing all encoder information through a latent space via memory-token cross-attention on a frozen Bidirectional and Auto-Regressive Transformers (BART) backbone. An auxiliary difficulty classifier is then added after discovering that reconstruction loss alone encodes topic at roughly twice the weight of difficulty. Evaluated on 33,829 EduQuest questions across 8 subjects and 3 difficulty levels, the model reveals three findings. First, difficulty control is semantic, not lexical: a consistent 19 percent gap separates keyword-based and learned classifiers. Second, the auxiliary classifier nearly doubles disentanglement scores, with Mutual Information Gap (MIG) increasing from 0.019 to 0.074 and Disentanglement, Completeness, and Informativeness (DCI) from 0.240 to 0.431. Third, the approach shows potential to scale: perplexity improves 103-fold when the training set grows from 2,300 to 27,000 questions. These results support CVAEs as an interpretable alternative to black-box large language models for controlled educational question generation.
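For readers unfamiliar with the Mutual Information Gap reported above, its commonly used definition is reproduced here as a reference. The abstract does not state which exact variant or estimator the authors used, so the formula and the reading of the symbols are assumptions: v_k are the K ground-truth factors (here presumably topic and difficulty), z_j are the latent dimensions, I is mutual information, and H is entropy.

```latex
\mathrm{MIG} \;=\; \frac{1}{K} \sum_{k=1}^{K} \frac{1}{H(v_k)}
\Bigl( I\bigl(z_{j^{(k)}}; v_k\bigr) \;-\; \max_{j \neq j^{(k)}} I\bigl(z_j; v_k\bigr) \Bigr),
\qquad
j^{(k)} = \arg\max_{j} I\bigl(z_j; v_k\bigr)
```

A higher value means that each factor is captured mainly by a single latent dimension rather than being spread across several.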

Paper Nr:	11
Title:	What’s the Difference? A Bibliographic Analysis on Automatic Item and Question Generation
Authors:	Paul L. Christ and Torsten Munkelt
Abstract:	Research on automatic item generation (AIG) and automatic question generation (AQG) has expanded substantially over the past five decades, yet the two fields have largely evolved in parallel communities with limited cross-citation and fragmented terminology. This study presents a bibliographic mapping analysis of AIG and AQG research from 1976 to 2026. A PRISMA-guided corpus of 3,004 publications was assembled from five databases using field-specific search queries, followed by deduplication, semi-automated LLM-assisted screening, and metadata enrichment. Co-citation, co-authorship, topic modeling, topic burst detection, and topic co-occurrence analyses were applied to map the intellectual structure of both fields. Results reveal four distinct publication eras, sharply divergent geographic and institutional profiles, and largely separate foundational literatures. Explosive growth since 2020, driven by large language models, has created new convergence opportunities. This study provides the first integrated bibliographic map of AIG and AQG and identifies opportunities for terminological consolidation and cross-disciplinary collaboration.

Short Papers

Paper Nr:	8
Title:	Reducing Perceived Mental Effort of AIG Cognitive Model Creation Using LLM-Powered Suggestions
Authors:	Florian Stahr, Sebastian Kucharski, Iris Braun and Gregor Damnik
Abstract:	Cognitive model creation for Automatic Item Generation (AIG) entails representation of domain knowledge in structured form by subject matter experts. Editors have previously been built to aid users, especially non-experts, with this task. However, they do not yet support users during the formalization of domain knowledge in the form of a cognitive model. This can manifest itself as perceived mental effort. We propose to extend such editors with cognitive model component suggestions based on a user-provided pool of domain knowledge. We investigate the integration of such suggestions, generated by large language models (LLMs), into an existing AIG editor with the aim of reducing perceived mental effort. Its success is evaluated through user testing, in which participants complete a task both with and without LLM-generated suggestions and rate their perceived mental effort on a 7-point Likert scale. Results show reduced perceived mental effort for most AIG cognitive model components, except for implications, where it was slightly higher. Further, we propose to precede the AIG cognitive model stage with a new stage that explicitly makes determining what, why, and how to measure part of the AIG process.