Yeah, there is an interesting question there (always has been). When do you stop citing the paper for a specific model?
Just to take some examples, is BiCGStab famous enough now that we can stop citing van der Vorst? Is the AdS/CFT correspondence well known enough that we can stop citing Maldacena? Are transformers so ubiquitous that we don't have to cite "Attention Is All You Need" anymore? I would be closer to yes than no on these, but it's not 100% clear-cut.
One obvious criterion has to be "if you leave out the citation, will it be obvious to the reader what you've done/used?" Another is roughly "did the original author get enough credit already?"