Part of Ethics and Technology (TU Delft, TU Eindhoven, University of Twente, Wageningen University)




© AI-generated image by Craiyon

Responsible AI development: are explanations enough?


Responsible artificial intelligence (AI) is currently one of the hot topics in AI. AI seems to be here to stay and (arguably) has the potential to greatly improve our lives. But these promises come with risks: what if AI makes decisions that negatively impact us? To avoid this, we should develop responsible AI: algorithms that we can trust, that respect human rights and societal values, and over which we, humans, retain some kind of guidance (RAI | RAII Home, n.d.).

In this blog post, I ask whether explanations can be used for responsible AI, in particular explanations aimed at engineers. AI is notoriously opaque: we do not know how it works or why it makes certain decisions (Castelvecchi, 2016). So perhaps providing explanations that show why an AI system makes a given prediction also solves the problem of responsibility? Unfortunately, this approach is too simplistic and risks falling into the trap of technological solutionism: the idea that all problems can be solved simply by applying more and better technology, without considering the social and technical context in which that technology is used (Santoni de Sio & Mecacci, 2021).


Looking only at technical solutions is not enough, so let’s turn to philosophy for a more comprehensive perspective. In philosophy, the problem is often framed as a responsibility gap. With self-learning systems, we can no longer hold the operator or manufacturer of a system morally responsible, as they cannot predict the system’s future behavior. If we nevertheless decide to use such systems, the question becomes: who is morally responsible for their actions (Matthias, 2004)?

AI and responsibility gaps 

In a recent paper, Santoni de Sio and Mecacci (2021) analyze the problem in more detail, distinguishing four responsibility gaps in the context of AI, each with different sources and different solutions. In brief, these are:

  • Culpability: was the mistake due to some agent’s wrong behavior and can they be blamed?
  • Moral accountability: why was a decision taken by the system and who played a role in taking this decision?
  • Public accountability: can public agents explain the (automated) decision making in public institutions?
  • Active responsibility: are the stakeholders involved in designing, using, and interacting with AI aware of the (future) impacts on society and their own role in this?

I will focus on the last of these, the active responsibility gap, to put my own research on interventions for explainable AI into a broader perspective.

Explanations for engineers

In my PhD, I study explanation methods that are useful for engineers. For example, engineers might want to know not only how an AI system made a prediction, but also how they can change it. Imagine an engineer working on a self-driving car who notices that the car recognizes all stop signs as speed signs. The engineer wants the car to be safe and needs to fix this. Right now, the most common way to do this is to retrain the entire system: give it new data and hope that it learns the right correlations between input and output. Wouldn’t it be much easier if the engineer understood the system and could just tweak something, i.e. apply an intervention, to fix it?
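To make the contrast between retraining and tweaking concrete, here is a minimal sketch under loudly stated assumptions. It is not how a real self-driving perception system works: it is a toy linear classifier over three hypothetical, hand-picked features (`octagonal`, `red`, `has_digits`) with made-up weights chosen so that it confuses stop signs with speed signs. Instead of retraining on new data, the "engineer" edits a single weight directly, loosely in the spirit of model-editing approaches.

```python
# Toy illustration only: the features, weights and class names below are
# hypothetical and chosen by hand so that the model exhibits the bug.
FEATURES = ["octagonal", "red", "has_digits"]
CLASSES = ["stop", "speed_limit"]

# One weight vector per class, aligned with FEATURES.
weights = {
    "stop":        [0.2, 0.5, -1.0],
    "speed_limit": [0.1, 0.9,  0.3],
}

def score(cls, x):
    """Linear score: dot product of a class's weights with the input features."""
    return sum(w * xi for w, xi in zip(weights[cls], x))

def predict(x):
    """Return the class with the highest score."""
    return max(CLASSES, key=lambda c: score(c, x))

stop_sign = [1.0, 1.0, 0.0]    # octagonal, red, no digits
speed_sign = [0.0, 1.0, 1.0]   # round, red border, digits

print("before edit:", predict(stop_sign), predict(speed_sign))

# Intervention: instead of retraining on new data, directly strengthen the
# weight linking the 'octagonal' feature to the 'stop' class.
weights["stop"][FEATURES.index("octagonal")] += 1.0

print("after edit:", predict(stop_sign), predict(speed_sign))
```

Running it prints `speed_limit speed_limit` before the edit and `stop speed_limit` after: the stop sign is fixed while the speed sign is still classified correctly, which is the kind of targeted, local change the engineer would want. In a real deep network there is no single weight labeled "octagonal", which is exactly why it is unclear what interventions actually target.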


This is where intervention methods come in. They are most widely used in natural language processing (Bau et al., 2018; De Cao et al., 2021; Geva et al., 2022) and seem relatively successful there. However, the theory behind these methods lags behind their practice: it is unclear what interventions target, what assumptions underlie them, and what we can learn from them. If we can figure this out, intervention methods could be a promising explanation method for engineers. But do interventions also give engineers more control over these systems? And what would this mean for their responsibility?
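The basic move behind an intervention, as opposed to a passive explanation, can be sketched in a few lines. The following is a minimal, hypothetical Python example and not the method of any of the cited papers: a tiny two-layer network with hand-set weights, where we overwrite ("clamp") the activation of one hidden unit during the forward pass and observe how the output changes. The weights, the unit index and the clamped values are all assumptions for illustration; real intervention methods work on large trained models and use more careful procedures to decide which units to change.

```python
import math

# Hypothetical hand-set weights: 2 inputs -> 2 hidden units (ReLU) -> 1 output.
W1 = [[1.0, -1.0],   # weights into hidden unit 0
      [0.5,  0.5]]   # weights into hidden unit 1
b1 = [0.0, 0.0]
W2 = [2.0, -3.0]     # output weights for hidden units 0 and 1
b2 = 0.0

def relu(z):
    return max(0.0, z)

def forward(x, intervene=None):
    """Forward pass; `intervene` maps a hidden-unit index to a clamped activation."""
    intervene = intervene or {}
    hidden = []
    for i, (row, b) in enumerate(zip(W1, b1)):
        h = relu(sum(w * xi for w, xi in zip(row, x)) + b)
        # The intervention replaces the computed activation with a fixed value.
        hidden.append(intervene.get(i, h))
    logit = sum(w * h for w, h in zip(W2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: probability of the positive class

x = [1.0, 1.0]
print("no intervention:", round(forward(x), 3))

# Sweep the clamped value of hidden unit 1 to study its effect on the output.
for value in [0.0, 0.5, 1.0, 2.0]:
    print(f"unit 1 clamped to {value}:", round(forward(x, {1: value}), 3))
```

For this input, clamping unit 1 to the value it already takes (1.0) leaves the output unchanged, clamping it to 0 moves the probability to exactly 0.5, and larger values push it towards 0. Whether such a sweep tells us what the unit "means" is precisely the kind of open theoretical question raised above.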

Let’s return to the active responsibility gap. Do interventions allow engineers to take active responsibility, i.e. to make sure the system acts in line with their social and moral obligations? Interventions seem to offer some improvement. Beyond explaining what an AI system does, they let engineers change its output. This gives engineers an active role: they can study the effect of different interventions on the outputs, correct undesirable outputs, et cetera. In this way, they could help make sure that AI systems do not negatively impact people’s rights. Moreover, they could promote positive effects, for example by contributing to overall well-being (Santoni de Sio & Mecacci, 2021).


Image by Michael Dziedzic on Unsplash

Technological solutionism

But let’s think more carefully. Even though interventions might allow engineers to change the output of AI systems, they remain a purely technical solution. As such, they come dangerously close to technological solutionism: interventions are just another technical method, not a cure-all. Even if we clarify their assumptions and provide recommendations for how to implement them, this ignores the social and political context of AI development. In particular, would engineers know what to change to promote positive social effects? If they are unaware of their own obligations and values, and of the effects of their work on others, they might not be able to make these decisions during AI development.

This is why it can be helpful to consider responsibility gaps when thinking about intervention methods. It is important to develop the theoretical background of the explanation methods we use, since they will only be useful if we know what explanations mean and how to interpret them. But this should not happen in a vacuum. If we fail to consider the broader problem of responsibility in AI, we risk promoting technological solutionism.


In my research on intervention methods, I could consider how to educate engineers about the risks associated with AI development and their own responsibilities: for example, by introducing them to the framework of moral responsibility, letting them think about the consequences of their design decisions, and introducing them to value sensitive design (van den Hoven, 2013). Of course, this only tackles a small part of the responsibility problem, but it might be a first step towards integrating new explanation methods into a broader, socio-technical context and developing more responsible AI.


Bau, A., Belinkov, Y., Sajjad, H., Durrani, N., Dalvi, F., & Glass, J. (2018). Identifying and Controlling Important Neurons in Neural Machine Translation. ArXiv:1811.01157 [Cs]. 
Castelvecchi, D. (2016). Can we open the black box of AI? Nature News, 538(7623), 20. 
De Cao, N., Aziz, W., & Titov, I. (2021). Editing Factual Knowledge in Language Models. ArXiv:2104.08164 [Cs]. 
Geva, M., Caciularu, A., Dar, G., Roit, P., Sadde, S., Shlain, M., Tamir, B., & Goldberg, Y. (2022). LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models. ArXiv:2204.12130 [Cs].
Matthias, A. (2004). The responsibility gap: Ascribing responsibility for the actions of learning automata. Ethics and Information Technology, 6(3), 175–183. 
RAI | RAII Home. (n.d.). Retrieved June 3, 2022, from 
Santoni de Sio, F., & Mecacci, G. (2021). Four Responsibility Gaps with Artificial Intelligence: Why they Matter and How to Address them. Philosophy & Technology, 34(4), 
van den Hoven, J. (2013). Value Sensitive Design and Responsible Innovation. In R. Owen, J. Bessant, & M. Heintz (Eds.), Responsible Innovation (pp. 75–83). John Wiley & Sons, Ltd.