"Unbiased" and "fair" are quite overloaded here, to borrow a programming term.
I think it's one of those times where single words should expressly NOT be used to describe the intent.
The intent here is to presume that the base rate of the thing we are trying to detect is constant across subgroups. A "good" model, by that definition, is one whose detection rates across subgroups approximate that equality.
I'm curious whether their data matches that assumption. Do subgroups actually submit bad applications at the same rate?
It may be that they don't have the data and therefore can't answer that.
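If labeled outcomes were available, the check is simple enough to sketch. This is only an illustration: the record layout `(group, is_bad, flagged)`, the function name `rates_by_group`, and the numbers are all made up by me, not taken from their system.

```python
from collections import defaultdict


def rates_by_group(records):
    """Return {group: (base_rate, flag_rate)}.

    records: iterable of (group, is_bad, flagged) tuples.
    This schema is hypothetical, chosen only for the example.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, bad, flagged]
    for group, is_bad, flagged in records:
        c = counts[group]
        c[0] += 1
        c[1] += int(is_bad)
        c[2] += int(flagged)
    return {g: (bad / n, flg / n) for g, (n, bad, flg) in counts.items()}


# Made-up data: a model that flags exactly the bad applications,
# applied to two groups whose base rates differ.
records = (
    [("A", True, True)] * 10 + [("A", False, False)] * 90
    + [("B", True, True)] * 20 + [("B", False, False)] * 80
)

for group, (base, flag) in sorted(rates_by_group(records).items()):
    print(f"{group}: base rate {base:.2f}, flag rate {flag:.2f}")
```

In this toy data the model makes no mistakes, yet it flags group B at twice the rate of group A, so it would fail an "equal flag rates" test. Whether that counts as "unfair" depends entirely on which definition you meant, which is the point about overloaded words.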