Project introduction and background information
In this project, we evaluated generative AI's ability to solve mathematical modelling tasks. Generative AI tools construct answers to prompts by iteratively selecting the next word of a sentence based on patterns of phrases in their training data. Whilst this can help with writing tasks, we were interested to see whether mathematical modelling problems could also be solved with generative AI. We wished to understand the limitations of generative AI in solving such problems, and to ascertain strengths and weaknesses of generative AI that we can teach students in future years. To this end, we took exercises from seven courses taught at Wageningen UR, covering first-year BSc mathematics through to advanced MSc modelling courses, and tested the performance of four generative AI tools (summer 2024 versions of ChatGPT3.5, ChatGPT4o, Perplexity and Gemini) at solving these problems. We evaluated the answers using a rubric based on how close the provided solutions were to the model answers, and tested how consistent the answers were when the tools saw the same problem multiple times. Based on our results, we provide advice on the advantages and disadvantages of generative AI in mathematical modelling education.
Objective and expected outcomes
Objective 1: to assess the accuracy of the solutions that generative AI tools provide to mathematical modelling exercises.
Objective 2: to observe whether generative AI can be used to translate code templates from one programming language (e.g. Python) into another (e.g. R or MATLAB); a sketch of such a template follows this list.
Objective 3: to test whether generative AI tools learn from previous conversations, answers, or problems when solving follow-up questions.
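To make Objective 2 concrete, the sketch below shows the kind of code template we have in mind. It is a hypothetical example written for this summary rather than one of the actual course templates; the model choice, parameter values, and names are our own assumptions.

    import numpy as np
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    def logistic(t, N, r, K):
        # Logistic growth: dN/dt = r * N * (1 - N / K)
        return r * N * (1 - N / K)

    r, K, N0 = 0.5, 100.0, 5.0        # growth rate, carrying capacity, initial population
    t_eval = np.linspace(0, 30, 200)
    sol = solve_ivp(logistic, (0, 30), [N0], args=(r, K), t_eval=t_eval)

    plt.plot(sol.t, sol.y[0])
    plt.xlabel("time")
    plt.ylabel("population size N(t)")
    plt.show()

A student given such a Python template could ask a generative AI tool to rewrite it in R or MATLAB, which is the task Objective 2 evaluates.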
Results and learnings
We tested a range of mathematical modelling exercises, from first-year BSc courses up to advanced MSc courses, using four generative AI tools (summer 2024 versions of ChatGPT3.5, ChatGPT4o, Perplexity, and Gemini). The solutions provided by generative AI were evaluated using a rubric in which solutions were scored from A (close to the teacher's model answer) to D (highly incorrect or no answer provided). Our results showed that ChatGPT4o and Perplexity performed better than ChatGPT3.5 and Gemini, with all tools solving between 50% and 75% of the problems.
Interestingly, the output of generative AI appeared random: the tools did not learn from previous conversations or mistakes. Critically, we found that generative AI could not solve problems that relied on hidden implicit assumptions, i.e. facts a teacher expects a student to know without explicitly stating them in the problem statement (for example, standard simplifying assumptions covered in lectures but not restated in the exercise). This is very useful for teachers, as it gives them a means of tuning how effective generative AI is within their courses.
Furthermore, we found that generative AI can be a useful tool for converting code from one programming language to another, and that it can help students interpret graphs. Teachers and students could take advantage of this. Teachers do not necessarily need to specify which programming language students use, or they could allow students to use different programming languages in their courses by letting them translate teacher-provided code into a language they understand better. Students can use generative AI to improve their graph evaluation skills, or as a sparring partner to help them understand data. As a whole, our results show that, for now, mathematical modelling teachers can still control how effective generative AI is within their courses.
A technical report of our results is available upon request from Rob Smith: robert1.smith@wur.nl
Recommendations
Recommendation 1: we advise teachers to highlight to students the limitations of generative AI as a means of solving mathematical problems. This could be done through in-class activities in which students compare the output of generative AI with model answers.
Recommendation 2: we found that generative AI becomes less successful when less problem information is provided, and teachers could take advantage of this. Early in a course, students may be given complete information and allowed to use generative AI to solve problems. Later in the course, incomplete real-world scenarios may be introduced, and students' dependence on generative AI will consequently have to weaken. Such a progression would fit with the 4C/ID education model (https://www.4cid.org/).
Recommendation 3: depending on the learning outcomes of the course, teachers should take advantage of generative AI's ability to accurately translate code between languages (see the sketch below). If one assumes that students in advanced modelling courses already have basic programming skills in one language, then teachers can skip basic programming tutorials and spend more time teaching mathematical modelling.
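As a hedged illustration of what such a translation involves, the sketch below pairs a hypothetical Python fragment with the R equivalents a tool might produce, written here as comments. It is not output from any of the tested tools; the model and the line-by-line correspondence are our own assumptions.

    import numpy as np

    N = np.zeros(31)        # R: N <- numeric(31)
    N[0] = 5.0              # R: N[1] <- 5          (R indexing starts at 1)
    r, K = 0.5, 100.0       # R: r <- 0.5; K <- 100

    # Discrete-time logistic growth, one step per time unit:
    for t in range(30):     # R: for (t in 1:30)
        N[t + 1] = N[t] + r * N[t] * (1 - N[t] / K)
                            # R:   N[t + 1] <- N[t] + r * N[t] * (1 - N[t] / K)

    print(N[-1])            # R: print(N[31])

Mechanical details such as R's 1-based indexing are exactly the kind of bookkeeping a correct translation must get right.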
Practical outcomes
Based on our results, we are initially highlighting, within the introductory material of our courses, the limitations of generative AI for solving mathematical modelling and coding problems (recommendations 1 and 3). In future years, this will be developed further to show students how generative AI solutions become weaker when less information is provided in prompts (recommendation 2).