In CPython:
DUP_TOP:
    PyObject *top = TOP();
    Py_INCREF(top);
    PUSH(top);
    FAST_DISPATCH();

TOP:
    (stack_pointer[-1])

LOAD_FAST:
    PyObject *value = GETLOCAL(oparg);
    if (value == NULL) { /* throw */ }
    Py_INCREF(value);
    PUSH(value);
    FAST_DISPATCH();

GETLOCAL:
    (fastlocals[i])
So it looks like either way you have an array access of something definitely in cache, a Py_INCREF, a PUSH, and a FAST_DISPATCH. The walrus operator's DUP_TOP path saves you a NULL check, but the branch predictor probably makes that check nearly free, since the error path is almost never taken. I'd bet the performance is indistinguishable, but I'd be interested to see real measurements.
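If anyone wants to actually measure it, here is a rough sketch using only the standard `dis` and `timeit` modules. The two functions (`with_walrus`, `without_walrus`) and the `opnames` helper are made up for illustration, and it needs Python 3.8+ for the walrus syntax. Note that the opcode names printed depend on the interpreter version: newer CPython releases no longer have DUP_TOP, so the listing may not match the snippets above.

```python
import dis
import timeit


def with_walrus(x):
    # Assign and test in one expression; the value is duplicated on the stack.
    if (y := x + 1) > 0:
        return y
    return 0


def without_walrus(x):
    # Assign first, then reload the local for the comparison.
    y = x + 1
    if y > 0:
        return y
    return 0


def opnames(func):
    # Hypothetical helper: list the opcode names the compiler emitted for func.
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    print(opnames(with_walrus))
    print(opnames(without_walrus))
    for func in (with_walrus, without_walrus):
        # Take the best of several runs to reduce scheduling noise.
        best = min(timeit.repeat(lambda: func(1), number=200_000, repeat=5))
        print(f"{func.__name__}: {best:.4f}s")
```

Function-call overhead will dominate a micro-benchmark like this, so any difference between the two is likely to sit inside the noise, which would be consistent with the guess above.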