Project introduction and background information
Scaling courses up to larger classrooms without compromising education quality is a challenge. This is especially true for programming courses, where common strategies such as multiple-choice questions do not work for teaching practical skills, and evaluating students' submissions is time-consuming, as their code needs to be run on a computer to check whether it works properly. Automatic testing of student submissions (autotests) is a potential way to tackle this challenge. Based on the continuous integration (CI) systems used in the IT industry for software development, autotests take code submitted by students, run tests that have been prepared in advance on the submissions, and provide both feedback to students and grades for teachers in an automatic and near-real-time fashion.
In the Geoscripting (GRS33806) course, the CodeGrade platform is used to handle students' submissions. CodeGrade integrates with learning management systems (e.g. Brightspace) and provides its own interface for giving and receiving peer and teacher feedback and for assigning teachers to grade submissions, as well as virtual machines for running autotests. However, no system was in place to evaluate code written in R, nor were R autotests available for the course content.
Objective and expected outcomes
The objective of the education innovation project was to improve students' learning and evaluation by providing autotests for their submissions in the R programming language using the CodeGrade platform. The expected outcomes were:
- Autotests available for formative assignments, giving students guidance about what is incorrect in their code and confirmation that the code works as expected when everything is done correctly.
- Autotests available for summative assignments, where in addition to the above, teachers get an indication of how well the assignment was completed as soon as the assignment deadline has passed.
- The system for creating the setup and development environments for CodeGrade autotests is established and publicly documented, enabling reuse of the autotest system by other courses that teach R.
Results and learnings
The expected outcomes were achieved for most of the assignments in the Geoscripting course using R. Students submit their assignments through an integration between version control (Git) and CodeGrade. Every time a change is pushed to Git, it is picked up by CodeGrade, and a test process is run on the (partial) submission. CodeGrade starts a virtual machine from an image containing the software needed to assess the given assignment, and prepared tests specific to the assignment are run, either in sequence or in parallel. If the code does not run as expected, the tests report an error on the CodeGrade interface, together with a suggestion for how to improve the result. The student can then take this into account, try to fix the issue, push changes to Git, and have the tests rerun on the new version, getting further feedback. If everything runs as expected, the student is given feedback confirming this and automatically receives points towards the assignment grade. The teachers get an overview of the current automatic grade of all students in the class as the assignment progresses, and a final automatic grade after the assignment is finished.
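As an illustration, a single assignment-specific check could be a small R script in the spirit of the sketch below; the function name `calc_ndvi`, the file path, and the use of the testthat package are assumptions made for this example, not the course's actual tests.

```r
# Hypothetical sketch of one assignment-specific check, run inside the
# CodeGrade virtual machine after a student pushes a change to Git.
library(testthat)

# Load only the student's function definitions, not the main analysis script
source("submission/R/calc_ndvi.R")

test_that("calc_ndvi() returns values in the valid NDVI range", {
  nir <- c(0.5, 0.6, 0.7)
  red <- c(0.2, 0.1, 0.3)
  result <- calc_ndvi(nir, red)
  expect_length(result, 3)
  expect_true(all(result >= -1 & result <= 1))
})
```

A failing expectation like this becomes feedback for the student, a passing one awards the points attached to that check, and pushing a fix to Git triggers the same checks again.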
The biggest challenge in this project was to set up the autotest system for R. While CodeGrade provides some examples and integrated environments for Python, no equivalent was available for R. Downloading and installing all the required packages takes a very long time (~20 minutes), which made it very difficult to test the evaluation scripts: if there was a mistake in an evaluation script, the system needed to reinstall all the packages before the autotests could be rerun. This was solved using r2u, a repository of prebuilt R packages that installs packages in seconds on Ubuntu machines, which is what the CodeGrade environment runs.
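For illustration, once the r2u apt repository and its companion bspm package are configured on the Ubuntu image (following the r2u project's own documentation), ordinary install.packages() calls resolve to prebuilt binaries; the package names below are only examples.

```r
# Illustrative only: assumes r2u and bspm are already configured on the
# Ubuntu image, so installs resolve to prebuilt .deb binaries via apt
# instead of compiling from source (seconds rather than ~20 minutes).
suppressMessages(bspm::enable())                 # route installs through the system package manager
install.packages(c("terra", "sf", "testthat"))   # example packages for a geoscripting context
```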
The whole autotest setup for R on CodeGrade was then documented online and can be found publicly on GitHub. The documentation can help other R programming courses make use of the same setup. It also helps teaching assistants in subsequent years get familiar with the system and add more autotests and programming languages.
The students appreciated having real-time feedback on their submissions, and some found it engaging to aim for a full dashboard of green checkmarks, indicating that the assignment is done correctly and is independently reproducible on another machine.
For teachers it was also useful to get a quick sense of how well each student did on an assignment, and especially to catch errors that are not immediately obvious. One example is when a student uses a variable defined in the global environment inside a function that is supposed to be stand-alone and reusable. If the student's submission is run line by line, it works, as the main script creates the needed variable before the function is called, and the assignment appears to be done correctly. But the autotests check the functions individually, and give an error when they encounter variables that only exist in the main script (global environment) and not in the function. The autotests also test a wider range of inputs to each function than a teacher would normally try for each student.
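A hypothetical sketch of this failure mode (the function and variable names are invented for this example):

```r
# Hypothetical student code illustrating the issue described above.
scale_factor <- 0.0001              # defined in the main script (global environment)

rescale_band <- function(band) {
  band * scale_factor               # relies on the global variable instead of an argument
}

rescale_band(c(1200, 3400))         # works when the main script is run top to bottom

# An autotest that loads and calls the function on its own, without the
# main script's global variables, hits the hidden dependency:
rm(scale_factor)
try(rescale_band(c(1200, 3400)))
# Error in rescale_band(c(1200, 3400)) : object 'scale_factor' not found
```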
The autotests were useful for identifying the lower bound of how well a student performed on an assignment, but they were of limited use for determining the final grade, because autotests cannot effectively provide partial grades. If a student misspelled a variable name, the autotest would give zero points, even if everything else was done correctly. Therefore, manual checks are still always needed to evaluate summative assignments.
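A hypothetical sketch of why this happens: checks typically look for an exact object name, so a small naming slip fails the whole check even when the underlying work is correct (the name `mean_ndvi` and the reference value are made up for this example).

```r
library(testthat)

# Hypothetical check expecting an object called `mean_ndvi`.
# A student who computed the right value but named it `mean_NDVI`
# scores zero on this check, because the expected name does not exist.
test_that("mean NDVI is computed", {
  expect_true(exists("mean_ndvi"))
  expect_equal(round(mean_ndvi, 2), 0.42)   # made-up reference value
})
```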
Another limitation of the system was that autotest grades associated with a rubric item could not be overridden; only the final grade could be. Therefore, if a student was awarded extra partial credit, this was not visible on the rubric in CodeGrade. As a workaround, the correct detailed grade had to be provided as a general feedback item and the final grade manually overridden. The developers of CodeGrade have stated that they are looking into ways to allow overriding individual autotest grades in the future.
Lastly, creating autotest evaluation scripts is a time-consuming process, and it has to be repeated every time a new assignment is created or an existing one is altered. Due to the lack of remaining time during the project, some of the assignments did not get in-depth autotests beyond checking whether the code runs at all.
Recommendations
- Autotests provide real-time feedback to the students, which they appreciate, as they get a hint about what went wrong and have the opportunity to fix issues as soon as they arise.
- Autotests provide an independent check and can identify issues that are difficult for teachers to notice, improving the fairness of evaluations.
- The time investment for setting up autotests (both the system and the evaluation scripts themselves) is significant. However, it pays off in the long run by reducing the teaching load during the course, and is therefore worthwhile. The documentation resulting from the project helps new courses set up the autotest system and will save a significant amount of time.
- To arrive at a valid grade for summative assignments, submissions still have to be checked manually; autotests save time when assessing submissions that were completed without errors.
Practical outcomes
The autotest system is implemented in the Geoscripting course and continues to be improved every year. Python autotesting will be attempted next, in addition to R. The documentation about the R autotesting setup is publicly available on GitHub (and will be updated as development continues), and the autotest evaluation scripts for the assignments used during the course are available on request, to use as a reference.