Yes, this is what I'm saying. The evaluation can be way more than just "slightly off" though, and MUCH weaker as a result, because the engine might go down "ply-routes" that aren't just suboptimal: they simply wouldn't work at all.
For example, it might think it had a "move set" that led to a checkmate, but the final ply relies on a bishop that is five squares away, which isn't allowed under the rule constraint I mentioned earlier.
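To make that concrete, here is a rough sketch of checking an engine's line against a house rule before trusting it. Everything in it is an assumption for illustration: the MAX_BISHOP_RANGE value of 4, the (piece, move) input format, and the sample line are all made up, and it is not part of Stockfish or any other engine.

```python
# Assumption: the house rule caps how far a bishop may travel in one move.
# The limit of 4 is a hypothetical value chosen for this example.
MAX_BISHOP_RANGE = 4


def move_distance(uci: str) -> int:
    """Distance in squares between the from- and to-squares of a move like 'c1h6'."""
    file_delta = abs(ord(uci[2]) - ord(uci[0]))
    rank_delta = abs(int(uci[3]) - int(uci[1]))
    # For a diagonal move both deltas are equal; max() also covers other pieces.
    return max(file_delta, rank_delta)


def first_illegal_ply(line, max_range=MAX_BISHOP_RANGE):
    """Return the index of the first ply that breaks the bishop range rule, or None."""
    for ply, (piece, uci) in enumerate(line):
        if piece == "B" and move_distance(uci) > max_range:
            return ply
    return None


# A made-up line of (piece letter, move) pairs; the last ply is a 5-square bishop move.
line = [("Q", "d1h5"), ("N", "g8f6"), ("B", "c1h6")]
print(first_illegal_ply(line))  # prints 2
```

If the engine's "winning" line fails a check like this, the whole line is worthless under the custom rules, not merely a bit worse than the best move.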
It's fine for fun, but the statement that "Stockfish's skill level applies correctly" is not true at any moderate level of play (above 1200 Elo).
That's why I suggested looking into Fairy, which supports runtime rule variations.