I tend to do everything in one take then edit out the mistakes, so I haven't got an informed preference, though doing the audio first does seem to make sense from a pacing and "maintaining interest" perspective. You can always edit down or speed up the video parts, after all.
I suspect that merely being detailed enough to think about these processes and how they work (or not) for you is enough to put you above average in the screencasting stakes.
I've been using it to develop live presentations with slides, and it was very helpful in working out a talk for Ignite Phoenix, where timing is extra crucial.
I tend to write down some notes, then do a recording more or less ad lib, see how much I screwed up and flubbed things, and gradually get a feel for what phrasing feels most natural, what steps are needed, and what I have to explain.
The end result is a sort of pre-recording of your talk, which you can watch as an aid to practising and getting the whole thing into your head.
Also, good point on keeping screencasts short. I hate having to jump around in some 20-minute recording trying to find where one item or another was explained. Better to break things down into a set of tighter presentations.