Implementing gradient explanations for a HuggingFace text classification model

(victordibia.com)

49 points · vykthur · 4y ago · 5 comments

Der_Einzige · 4y ago · 3 in thread

A few notes:

1. Hugging Face models are supported by Captum, a framework for gradient-based explanations of any PyTorch model: https://captum.ai/tutorials/Bert_SQUAD_Interpret

2. Several Hugging Face Spaces showcase model explanations for Hugging Face models in the browser, using a variety of techniques, such as LIME: https://huggingface.co/spaces/Hellisotherpeople/Interpretabl...

or SHAP: https://huggingface.co/spaces/Hellisotherpeople/HF-SHAP

There is definitely already an example that does this with gradient-based techniques, but I'm having trouble finding it!

3. It's cool to see someone do this with from-scratch code, since gradient-based explanation techniques are complicated to implement and vary a lot from one technique to another.

vykthur (OP) · 4y ago

Yes. Captum is a great library, and a few of my colleagues have used it with good results in the past. As you mention, most of the few examples that demonstrate gradient-based explanation methods for Hugging Face models focus on PyTorch models. The example here looks at things from the TensorFlow 2.0/Keras perspective. One thing to note is that model-agnostic SHAP can be resource intensive to compute, especially compared to gradient methods that require only a single pass (forward plus backward) through the model per data point.
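To make the cost point concrete, here is a minimal pure-Python sketch. It is not the article's TensorFlow code and does not use Captum or SHAP. Everything in it is a hypothetical toy: the 2-d `EMBEDDINGS`, the `WEIGHTS`, and the mean-pooled logistic "classifier" are made up for illustration, and the backward pass is written out by hand rather than computed by an autodiff framework. It scores tokens with gradient × input (one forward pass plus one backward pass) and, for contrast, with leave-one-out occlusion, a much simpler perturbation method than SHAP that already needs one extra forward pass per token.

```python
import math

# Hypothetical toy model: 2-d token embeddings, mean pooling, one logistic unit.
EMBEDDINGS = {
    "the": [0.1, 0.0],
    "movie": [0.2, 0.1],
    "was": [0.0, 0.1],
    "great": [1.5, 0.5],
    "awful": [-1.5, -0.4],
}
WEIGHTS = [1.0, 0.5]
BIAS = 0.0
CALLS = {"forward": 0}  # counts forward passes so the two methods can be compared


def predict(vectors):
    """Return P(positive) for a non-empty list of token embedding vectors."""
    CALLS["forward"] += 1
    n = len(vectors)
    pooled = [sum(v[d] for v in vectors) / n for d in range(len(WEIGHTS))]
    logit = sum(w * x for w, x in zip(WEIGHTS, pooled)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def gradient_x_input(tokens):
    """Score each token by (d prob / d embedding) dotted with its embedding."""
    vectors = [EMBEDDINGS[t] for t in tokens]
    p = predict(vectors)  # the only forward pass
    # Hand-written backward pass for this toy model:
    # d p / d e_i = p * (1 - p) * WEIGHTS / n, identical for every token i.
    scale = p * (1.0 - p) / len(vectors)
    grad = [scale * w for w in WEIGHTS]
    scores = [sum(g * x for g, x in zip(grad, v)) for v in vectors]
    return p, scores


def leave_one_out(tokens):
    """Score each token by how much P(positive) drops when it is removed.

    Assumes at least two tokens, so the reduced input is never empty.
    """
    vectors = [EMBEDDINGS[t] for t in tokens]
    base = predict(vectors)
    return [base - predict(vectors[:i] + vectors[i + 1:])
            for i in range(len(vectors))]


tokens = ["the", "movie", "was", "great"]

prob, grad_scores = gradient_x_input(tokens)
grad_calls = CALLS["forward"]

loo_scores = leave_one_out(tokens)
loo_calls = CALLS["forward"] - grad_calls

for token, g, o in zip(tokens, grad_scores, loo_scores):
    print(f"{token:>6}  grad*input={g:+.4f}  leave-one-out={o:+.4f}")
print(f"forward passes: gradient={grad_calls}, leave-one-out={loo_calls}")
```

In this toy setup both methods rank "great" as the most influential token, but the occlusion loop needs one forward pass per token plus a baseline. Sampling-based SHAP estimators typically evaluate the model on many more perturbed inputs than that, which is where the cost gap comes from.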

p1esk · 4y ago

How would you characterize the degree of explainability of large language models (e.g. GPT-3)? Anything surprising or non-intuitive there?

behnamoh · 4y ago

I think a more general question is: how would you measure explainability when you see it? Is there some sort of metric for that?
