AIG 2024 Abstracts


Area 1 - Automatic Item Generation

Full Papers
Paper Nr: 7
Title:

A Method for Generating Testlets

Authors:

Mark J. Gierl and Tahereh Firoozi

Abstract: A testlet is a set of two or more items based on the same scenario. A testlet can be used to measure complex problem-solving skills that require a sequence of steps. A testlet is challenging to write because it requires one unique scenario and two or more items. Despite this challenge, large numbers of testlets are often required to support formative and summative computerized testing. The purpose of our study is to address the testlet item writing challenge by describing and demonstrating a systematic method that can be used to create large numbers of testlets. Our method is grounded in the three-step process associated with template-based automatic item generation. To begin, we describe a testlet-based item model. The model contains global and local variables. Global variables are unique to testlet generation because they can be used to place content anywhere in the testlet. Local variables, on the other hand, are specific to each item model in the testlet and can only be used within that item model. Next, we present four cases that demonstrate how global and local variables can be combined to generate testlets. Each case provides a practical example of how the testlet item model can be used to structure global and local variables in order to generate diverse sets of test items. We conclude by highlighting the benefits of testlet-based automatic item generation for computerized testing.
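
Although the abstract does not include an implementation, a minimal Python sketch of how global and local variables might be combined in a testlet item model is shown below. The variable names, the scenario content, and the two item models are illustrative assumptions, not the authors' templates or generation engine.

# Hypothetical sketch of a testlet item model with global and local variables.
# The scenario text and variables are assumptions made for illustration.
from itertools import product

# Global variables: substituted throughout the whole testlet (scenario and items).
GLOBALS = {
    "patient": ["a 45-year-old patient", "a 72-year-old patient"],
    "condition": ["type 2 diabetes", "hypertension"],
}

# Local variables: specific to a single item model within the testlet.
ITEM_MODELS = [
    {
        "stem": "What is the first step in managing {patient} with {condition} whose reading is {value}?",
        "locals": {"value": ["elevated", "within the target range"]},
    },
    {
        "stem": "Which follow-up interval is most appropriate for {patient} with {condition} after {interval}?",
        "locals": {"interval": ["one week", "one month"]},
    },
]

def generate_testlets():
    """Yield one testlet (scenario plus items) per combination of global values."""
    for patient, condition in product(GLOBALS["patient"], GLOBALS["condition"]):
        g = {"patient": patient, "condition": condition}
        scenario = "{patient} presents with {condition}.".format(**g)
        items = []
        for model in ITEM_MODELS:
            # Local variables are only combined within their own item model.
            for combo in product(*model["locals"].values()):
                l = dict(zip(model["locals"].keys(), combo))
                items.append(model["stem"].format(**g, **l))
        yield scenario, items

for scenario, items in generate_testlets():
    print(scenario, len(items), "items")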

Paper Nr: 9
Title:

Generalized Automatic Item Generation for Graphical Conceptual Modeling Tasks

Authors:

Paul Christ, Torsten Munkelt and Jörg M. Haake

Abstract: Graphical conceptual modeling is an important competency in various disciplines. Its mastery requires self-practice with tasks that address different cognitive processing dimensions. A large number of such tasks is needed to accommodate many students with varying needs, and such quantities cannot be produced manually. Current automatic production methods such as Automatic Item Generation (AIG) either lack scalability or fail to address higher cognitive processing dimensions. To solve these problems, a generalized AIG process is proposed. Step 1 requires the creation of an item specification, which consists of a task instruction, a learner input, an expected learner output and a response format. Step 2 requires the definition of a generator for the controlled generation of items via a configurable generator composition. A case study shows that the approach can be used to generate graphical conceptual modeling tasks addressing the cognitive process dimensions Analyze and Create.
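
As an illustration of the kind of configurable generator composition described in Step 2, a minimal Python sketch follows. The ItemSpecification fields mirror the four components named in the abstract, but the compose function and the example generators are assumptions made for illustration, not the authors' system.

# Sketch of a configurable generator composition over item specifications.
# Class names, fields, and example generators are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ItemSpecification:
    task_instruction: str      # what the learner is asked to do
    learner_input: str         # material given to the learner (e.g., a text or model)
    expected_output: str       # reference solution used for assessment
    response_format: str       # e.g., "ER diagram", "UML class diagram"

# A generator is any callable that refines a (partial) specification.
Generator = Callable[[ItemSpecification], ItemSpecification]

def compose(generators: List[Generator]) -> Generator:
    """Chain configured generators so each refines the specification in turn."""
    def composed(spec: ItemSpecification) -> ItemSpecification:
        for g in generators:
            spec = g(spec)
        return spec
    return composed

# Example generators targeting different cognitive process dimensions.
def analyze_task(spec: ItemSpecification) -> ItemSpecification:
    spec.task_instruction = "Identify the modeling errors in the given diagram."
    return spec

def create_task(spec: ItemSpecification) -> ItemSpecification:
    spec.task_instruction = "Construct a conceptual model from the given description."
    return spec

pipeline = compose([create_task])
item = pipeline(ItemSpecification("", "Order-processing scenario text", "Reference ER model", "ER diagram"))
print(item.task_instruction)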

Short Papers
Paper Nr: 5
Title:

Automatic Question Generation for the Japanese National Nursing Examination Using Large Language Models

Authors:

Yûsei Kido, Hiroaki Yamada, Takenobu Tokunaga, Rika Kimura, Yuriko Miura, Yumi Sakyo and Naoko Hayashi

Abstract: This paper introduces our ongoing research project that aims to generate multiple-choice questions for the Japanese National Nursing Examination using large language models (LLMs). We report the progress and prospects of our project. A preliminary experiment assessing the LLMs’ potential for question generation in the nursing domain led us to focus on distractor generation, which is a difficult part of the entire question-generation process. Therefore, our problem is generating distractors given a question stem and key (correct choice). We prepare a question dataset from past National Nursing Examinations for the training and evaluation of LLMs. The generated distractors are evaluated by comparing them to the reference distractors in the test set. We propose reference-based evaluation metrics for distractor generation by extending recall and precision, which are popular in information retrieval. However, as the reference is not the only acceptable answer, we also conduct human evaluation. We evaluate four LLMs: GPT-4 with few-shot learning, ChatGPT with few-shot learning, ChatGPT with fine-tuning and JSLM with fine-tuning. Our future plans include improving the LLMs’ performance by integrating question writing guidelines into the prompts to the LLMs and conducting a large-scale administration of automatically generated questions.
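
The abstract does not give the exact form of the extended metrics, but a simple Python sketch of reference-based precision and recall over generated distractors, assuming exact string matching against the reference set, is shown below; the authors' metrics may use a more refined matching scheme.

# Sketch of reference-based precision and recall for generated distractors,
# assuming simple exact-match overlap with the reference distractors.
from typing import List

def distractor_precision_recall(generated: List[str], reference: List[str]):
    """Precision: fraction of generated distractors found in the reference set.
    Recall: fraction of reference distractors recovered by the generator."""
    gen = {d.strip().lower() for d in generated}
    ref = {d.strip().lower() for d in reference}
    overlap = gen & ref
    precision = len(overlap) / len(gen) if gen else 0.0
    recall = len(overlap) / len(ref) if ref else 0.0
    return precision, recall

# Example: three generated distractors scored against three reference distractors.
p, r = distractor_precision_recall(
    ["insulin", "warfarin", "aspirin"],
    ["insulin", "heparin", "aspirin"],
)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=0.67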

Paper Nr: 10
Title:

Investigating the Quality of AI-Generated Distractors for a Multiple-Choice Vocabulary Test

Authors:

Wojciech Malec

Abstract: This paper reports the findings of a study into the effectiveness of distractors generated for a multiple-choice vocabulary test. The distractors were created by OpenAI’s ChatGPT (the free version 3.5) and used for the construction of a vocabulary test administered to 142 students learning English as a foreign language at the advanced level. Quantitative analysis revealed that the test had relatively low reliability, and some of its items had very ineffective distractors. When examined qualitatively, certain items were likewise found to have an ill-matched set of options. Moreover, follow-up queries failed to correct the original errors and produce more appropriate distractors. The results of this study indicate that although the use of artificial intelligence has an unquestionably positive impact on test practicality, ChatGPT-generated multiple-choice items cannot yet be used in operational settings without human moderation.
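
For readers unfamiliar with distractor analysis, the sketch below illustrates the kind of statistics typically used to judge whether an option is effective (selection frequency and a discrimination index). The data, thresholds, and function are illustrative assumptions, not the study's actual analysis.

# Illustrative distractor analysis: how often each option is chosen and how
# choosing it correlates with total score. Values here are toy data.
import numpy as np

def distractor_stats(responses: np.ndarray, total_scores: np.ndarray, key: int):
    """responses: chosen option index per examinee; total_scores: test scores."""
    stats = {}
    for option in np.unique(responses):
        chose = responses == option
        freq = chose.mean()  # proportion of examinees selecting this option
        # Point-biserial correlation between choosing the option and total score.
        disc = np.corrcoef(chose.astype(float), total_scores)[0, 1]
        label = "key" if option == key else "distractor"
        stats[int(option)] = {"type": label, "frequency": freq, "discrimination": disc}
    return stats

# Toy example: option indices chosen by five examinees and their total scores.
resp = np.array([0, 1, 0, 2, 0])
scores = np.array([9, 4, 8, 3, 7])
print(distractor_stats(resp, scores, key=0))
# A distractor is commonly flagged as ineffective if almost no one selects it
# or if stronger examinees choose it more often than weaker ones.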