A few notes:
1. Hugging Face models work with Captum, a PyTorch interpretability library with gradient-based attribution methods that apply to any PyTorch model: https://captum.ai/tutorials/Bert_SQUAD_Interpret
2. There are several Hugging Face "Spaces" that showcase, in the browser, model explanations on Hugging Face models using a variety of techniques, such as LIME: https://huggingface.co/spaces/Hellisotherpeople/Interpretabl...
or SHAP: https://huggingface.co/spaces/Hellisotherpeople/HF-SHAP
There is definitely already an example that does it with gradient-based techniques, but I'm having trouble finding it!
3. It's cool to see someone do this with from-scratch code, since gradient-based explanation techniques are complicated and vary a lot from one technique to another.
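To make point 3 a bit more concrete, here is a minimal sketch in plain Python (no PyTorch, no Captum) of three common gradient-based attributions on a toy model. Everything in it is an assumption made for illustration: the weights `W`, the tanh model, the all-zeros baseline, and the function names are hypothetical and not taken from Captum or from the from-scratch code being discussed. The integral in integrated gradients is approximated with a midpoint Riemann sum.

```python
import math

# Hypothetical toy model: f(x) = tanh(w . x). The weights are made up.
W = [2.0, -1.0, 0.5]

def model(x):
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))

def grad(x):
    # Analytic gradient: d/dx_i tanh(w . x) = (1 - tanh(w . x)^2) * w_i
    z = sum(w * xi for w, xi in zip(W, x))
    scale = 1.0 - math.tanh(z) ** 2
    return [scale * w for w in W]

def saliency(x):
    # Plain gradient magnitude per input feature.
    return [abs(g) for g in grad(x)]

def grad_times_input(x):
    # Gradient scaled by the input value.
    return [g * xi for g, xi in zip(grad(x), x)]

def integrated_gradients(x, baseline=None, steps=200):
    # Average the gradient along the straight path from baseline to x
    # (midpoint rule), then scale by (x - baseline).
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed all-zeros baseline
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 1.0, 1.0]
print("saliency:            ", saliency(x))
print("gradient x input:    ", grad_times_input(x))
print("integrated gradients:", integrated_gradients(x))
```

Even on a three-feature toy model the methods assign different scores to the same input. In this sketch, the integrated gradients scores sum (approximately) to `model(x) - model(baseline)`, which the other two methods do not guarantee.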