Abstract: |
Massive Open Online Courses (MOOCs) are often characterized by very low completion rates: most published research agrees on a median of around 6.5%, although rates can reach up to 60% for fee-based certificates. In this context, studies have pointed out that factors such as engagement, intention, and motivation commonly affect learners’ performance in MOOCs.
Moreover, research has shown that the learner’s psychological state carries major weight in the learning process. Flow, a fundamental psychological state, deeply motivates people to persist in their activities without extrinsic rewards, and is known to be positively correlated with self-efficacy, motivation, engagement, and academic achievement. Because of this broad positive impact, flow is a prime candidate for (1) detection and, ultimately, (2) promotion during the learning process in MOOCs.
However, automatic and transparent flow detection is particularly difficult, as any attempt to measure flow inevitably contributes to its disruption. This challenge is exacerbated in a MOOC context, where the distant and asynchronous nature of learning uniquely affects the educational and psychological context.
In parallel, Machine Learning (ML) is extensively employed to make sense of data at a time when data is abundant. As such, ML plays a key role in extracting from data knowledge and insights that would otherwise require human experts, who may not be available.
Thus, taking advantage of the data generated by MOOCs, we address the particularly challenging flow detection issue (1) by training an ML model to detect flow transparently and automatically in a MOOC. We pair the results of the educationally appropriate EduFlow2 and Flow-Q questionnaires (n = 1,553; two years of data collection; rigorous data cleaning and validation) with the participants’ MOOC log data (French MOOC “Gestion de Projet” [Project Management]) and feed them to a multi-stage ML Logistic Regression pipeline designed to train and optimize hundreds of ML models, in order to land on the single best-trained ML model able to detect flow (ROC = 0.68 and PRC = 0.87) in a MOOC context.
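The abstract does not detail the stages of the pipeline. What follows is only a minimal sketch of the train-many-then-keep-the-best idea. It rests on several assumptions that do not come from the study: a grid over two hypothetical hyperparameters (learning rate and L2 penalty), plain batch gradient descent in place of whatever solver was actually used, validation ROC AUC as the selection criterion, a 0.5 decision threshold, and synthetic toy data rather than the study's data.

```python
import math
import random
from itertools import product


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr, l2, epochs=200):
    """Fit an L2-regularized logistic regression with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Average gradient plus L2 penalty on the weights (not the bias)
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def per_class_precision(y, scores, threshold=0.5):
    """Precision of 'flow' (label 1) and 'no flow' (label 0) predictions."""
    pred = [1 if s >= threshold else 0 for s in scores]

    def precision(c):
        hits = [t for p, t in zip(pred, y) if p == c]
        return sum(t == c for t in hits) / len(hits) if hits else float("nan")

    return precision(1), precision(0)


def select_best(X_tr, y_tr, X_va, y_va, lrs, l2s):
    """Train one model per hyperparameter pair; keep the best validation ROC AUC."""
    best = None
    for lr, l2 in product(lrs, l2s):
        model = train_logreg(X_tr, y_tr, lr, l2)
        auc = roc_auc(y_va, predict_proba(model, X_va))
        if best is None or auc > best[0]:
            best = (auc, (lr, l2), model)
    return best


# Synthetic, imbalanced toy data (NOT the study's data): about 75% "flow" labels
rng = random.Random(0)
rows = []
for _ in range(400):
    label = 1 if rng.random() < 0.75 else 0
    rows.append(([rng.gauss(0.8 * label, 1.0), rng.gauss(0.4 * label, 1.0)], label))
X, y = [r[0] for r in rows], [r[1] for r in rows]

auc, params, model = select_best(X[:300], y[:300], X[300:], y[300:], [0.1, 0.5], [0.0, 0.01])
print(f"best (lr, l2) = {params}, validation ROC AUC = {auc:.2f}")
print("precision (flow, no flow):", per_class_precision(y[300:], predict_proba(model, X[300:])))
```

Ranking candidates by ROC AUC is threshold-free. Per-class precision, by contrast, depends on the decision threshold (0.5 here, an assumption), so the two kinds of figures can diverge.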
The resulting trained ML model successfully detects flow transparently and automatically in unseen MOOC participants, with a greater precision for flow presence (0.85) than for flow absence (0.34). It employs 23 individualized features (e.g., “Number of navigational events”, “Total number of different types of events”, or “Total seconds logged in”) pre-calculated by aggregating the MOOC participants’ log data. When accessed via an API (Application Programming Interface) call, the model returns in real time the calculated flow state of any MOOC learner as a confidence percentage, facilitating further processing, e.g., displaying it on a trainer/trainee MOOC dashboard or factoring it into further content personalization.
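The abstract does not specify the log schema, the full set of 23 features, or the API itself. What follows is purely an illustration of the aggregate-offline, score-on-request flow. It makes several assumptions: a hypothetical per-learner log of `(timestamp_seconds, event_type)` pairs, a hypothetical set of navigational event types, and a 30-minute inactivity timeout for counting time logged in. The model weights are made up, as is the handler name `flow_api_response`. Only three of the named example features are computed.

```python
import math

# Hypothetical navigational event types; the real MOOC log schema is not given.
NAVIGATION_TYPES = {"page_view", "next_unit", "seek_video"}
# Assumed: a gap longer than 30 minutes between events ends a session.
SESSION_TIMEOUT_S = 30 * 60


def aggregate_features(events):
    """Aggregate one learner's (timestamp_seconds, event_type) log into example features."""
    events = sorted(events)
    types = [etype for _, etype in events]
    # Time logged in: sum of gaps between consecutive events, ignoring idle gaps
    total_seconds = sum(
        t1 - t0
        for (t0, _), (t1, _) in zip(events, events[1:])
        if t1 - t0 <= SESSION_TIMEOUT_S
    )
    return {
        "n_navigational_events": sum(t in NAVIGATION_TYPES for t in types),
        "n_distinct_event_types": len(set(types)),
        "total_seconds_logged_in": total_seconds,
    }


def flow_confidence(features, weights, bias):
    """Score pre-computed features with a logistic model; return a 0-100 percentage."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    if z >= 0:  # numerically stable logistic function
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p = e / (1.0 + e)
    return 100.0 * p


def flow_api_response(learner_id, feature_store, weights, bias):
    """Hypothetical API handler: look up stored features and score them on request."""
    features = feature_store[learner_id]
    return {
        "learner_id": learner_id,
        "flow_confidence_pct": round(flow_confidence(features, weights, bias), 1),
    }


# Offline aggregation phase: compute and store features ahead of time
log = [(0, "login"), (40, "page_view"), (100, "play_video"), (160, "seek_video"), (5000, "page_view")]
feature_store = {"learner-42": aggregate_features(log)}

# Made-up weights for illustration only; these are NOT the trained model's coefficients
weights = {
    "n_navigational_events": 0.1,
    "n_distinct_event_types": 0.2,
    "total_seconds_logged_in": 0.001,
}
print(flow_api_response("learner-42", feature_store, weights, bias=-1.0))
```

Keeping aggregation in a separate offline step mirrors the abstract's point that generating and storing the features is its own resource-intensive phase. The request-time work is then just a lookup plus a weighted sum passed through the logistic function.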
Nevertheless, the resulting trained ML model requires an independent, resource-intensive prior phase of data aggregation to generate (and store) the 23 features employed to detect flow. Furthermore, the model's resolution does not allow for fine-grained flow detection. Finally, its performance has yet to be evaluated in new, unseen MOOC contexts (e.g., a non-francophone MOOC, a MOOC on a biology course, etc.), all of which constitute the current active focus of our research. |