Abstract: |
Nonverbal communication, in particular eye contact, is a critical element of the music classroom, shown to keep students on task, coordinate musical flow, and communicate improvisational ideas. Unfortunately, this nonverbal aspect to performance and pedagogy is lost in the virtual classroom. In this paper, we propose a machine learning system which uses single instance, single camera image frames as input to estimate the gaze target of a user seated in front of their computer, augmenting the user’s video feed with a display of the estimated gaze target and thereby restoring nonverbal communication of directed gaze. The proposed estimation system consists of modular machine learning blocks, leading to a target-oriented (rather than coordinate-oriented) gaze prediction. We instantiate one such example of the complete system to run a pilot study in a virtual music classroom over Zoom software. Inference time and accuracy meet benchmarks for videoconferencing applications, and quantitative and qualitative results of pilot experiments include improved success of cue interpretation and student-reported formation of collaborative, communicative relationships between conductor and musician. |