Part of the 4TU.Centre for Engineering Education
TU Delft, TU Eindhoven, University of Twente, Wageningen University

Grading feels like a Turing test. Here's something we can do about it!

Tuesday, 8 July 2025

๐Ÿ””๐—ก๐—ฒ๐˜„ ๐—ฝ๐—ฟ๐—ฒ๐—ฝ๐—ฟ๐—ถ๐—ป๐˜๐Ÿ”” Grading feels like a Turing test. Here's something we can do about it!

It often feels like we're evaluating students' prompting skills rather than their understanding. As AI tools become common, our research explores a constructive path forward that goes beyond controlling AI use. Our core idea is to shift the focus from the final product to the learning process by searching for evidence of learning in student-AI conversations.

๐—ข๐˜‚๐—ฟ ๐—ฎ๐—ฝ๐—ฝ๐—ฟ๐—ผ๐—ฎ๐—ฐ๐—ต:ย We developed a taxonomy to classify student-GenAI interactions, then annotated 1450 interactions (i.e., prompts) from 70 graded essays (2 writing-intensive courses: BSc and MSc). We connected interactions with learning indicators via essay grades and interaction quality evaluation.

To identify learning in GenAI interactions, we developed the DRIVE framework. It identifies two critical patterns: students actively steering the conversation (Directive Reasoning Interaction) and demonstrating knowledge throughout the dialogue (Visible Expertise).

Our research led to two main findings:

๐Ÿ”น ๐—›๐—ผ๐˜„ ๐˜€๐˜๐˜‚๐—ฑ๐—ฒ๐—ป๐˜๐˜€ ๐˜๐—ฎ๐—น๐—ธ ๐˜๐—ผ ๐—”๐—œ ๐—ฟ๐—ฒ๐—ณ๐—น๐—ฒ๐—ฐ๐˜๐˜€ ๐˜๐—ต๐—ฒ๐—ถ๐—ฟ ๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด.ย Our data showed strong positive correlation between high-quality GenAI interactions and higher essay grades (see image). This means analyzing chat logs can reliably identify learning.

๐Ÿ”น ๐—˜๐—ณ๐—ณ๐—ฒ๐—ฐ๐˜๐—ถ๐˜ƒ๐—ฒ ๐—”๐—œ ๐˜‚๐˜€๐—ฒ ๐—ถ๐˜€ ๐—ฝ๐—ฎ๐—ฟ๐˜๐—ป๐—ฒ๐—ฟ๐˜€๐—ต๐—ถ๐—ฝ, ๐—ป๐—ผ๐˜ ๐—ฑ๐—ฒ๐—น๐—ฒ๐—ด๐—ฎ๐˜๐—ถ๐—ผ๐—ป. High-quality GenAI interactions involved treating AI as a "partner" to co-develop ideas. Lower-quality interactions involved task delegation or basic search use.

Although our findings apply primarily to academic argumentative writing, teachers can use DRIVE as a guiding framework to tailor the assessment criteria for student-GenAI interactions to the learning objectives of their own (writing) course. Our paper discusses some key considerations related to this approach.

This approach allows us to move beyond the assessment dilemma and focus on what truly matters: helping students navigate AI technology thoughtfully, including when to use it and when not to.

Preprint here 📄: https://lnkd.in/eEfRMXD5


Authors: Manuel Oliveira, Carlos Zednik, Gunter Bombaerts, Bert Sadowski, Rianne Conijn