https://www.deepcrawl.com/blog/best-practice/noindex-disallo...
Specifically the part:
>Noindex (robots.txt) + Disallow: This prevents pages appearing in the index, and also prevents the pages being crawled. However, remember that no PageRank can pass through this page.
They can still end up in the index, just with a note that says "no description is available for this page".
I remember the debate years ago when Matt Cutts asked whether Google should index such pages and pointed out that other engines were indexing pages that were blocked by robots.txt... meh.
I had to set up a 301-to-homepage redirect system to zap all the pages I took out, although some other engines still spider looking for those pages even though I removed them with 301s over a year ago. Perhaps the spammers still have links going to them?
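For anyone wondering what a 301-to-homepage setup can look like, here is a minimal sketch as a bare WSGI app in Python. It is not the actual system described above; the `REMOVED_PATHS` set and the paths in it are made up for illustration, and in practice this usually lives in the web server config instead.

```python
# Hypothetical list of pages that were taken out; not from the original post.
REMOVED_PATHS = {"/old-page", "/retired/article"}


def app(environ, start_response):
    """Answer removed pages with a permanent redirect to the homepage."""
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        # 301 tells crawlers the move is permanent.
        start_response("301 Moved Permanently", [("Location", "/")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok"]
```

It can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` to check the responses.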
I started just blocking all indexing from Sogou (or whatever it's called) and similar bots in robots.txt, and then, after several months of thinking they would get the hint, started looking at IPs / CIDRs to block further.
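A rough sketch of those two layers in Python, using only the standard library. The bot name `BadBot` and the CIDR ranges (reserved documentation ranges) are placeholders, not the real crawler token or addresses. The robots.txt part only works for bots that choose to obey it; the CIDR check is what you fall back on when they do not.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder ranges; substitute the ranges seen in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/25")
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(parser.can_fetch("BadBot", "/old-page"))    # polite BadBot is told no
    print(parser.can_fetch("OtherBot", "/old-page"))  # other bots are allowed
    print(is_blocked("203.0.113.42"))
    print(is_blocked("198.51.100.200"))
```

The same CIDR list would normally go into the firewall or web server deny rules; the function here just shows the membership test.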
Hope your situation is different.