> Is the wearable accurate enough to be sure that 3bpm is not a measurement fluke
If the statistical tests show significance (and are valid), the answer to this question is yes. If you have enough data you can draw strong conclusions even with imperfect hardware.
Not at all. If you have a lot of data coming from imperfect hardware (which can mean both a fixed bias and unknown variance), and you don't know the variance for plenty of practical reasons, you are left with a result that is statistically significant but wrong.
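
Here is a rough simulation sketch of how that can happen. Everything in it is an assumption made for illustration: the true heart rate is identical in both conditions, the sensor adds a hypothetical 3 bpm bias in only one of them (a bias that is the same in both conditions would cancel out of the difference), and the noise is Gaussian with a made-up standard deviation. The names `simulate`, `bias_b`, and `noise_sd` are mine, not from any real device or library.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means, mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)


def simulate(n, bias_b, noise_sd=5.0, true_hr=60.0, seed=0):
    """Simulate n readings per condition with the SAME true heart rate.

    Condition B readings carry an extra sensor bias of bias_b bpm
    (hypothetical, e.g. a motion artifact present only in that condition).
    Returns (measured mean difference B - A, Welch t statistic).
    """
    rng = random.Random(seed)
    a = [true_hr + rng.gauss(0, noise_sd) for _ in range(n)]
    b = [true_hr + bias_b + rng.gauss(0, noise_sd) for _ in range(n)]
    return statistics.mean(b) - statistics.mean(a), welch_t(b, a)


# The true physiological difference is zero in both runs.
diff, t = simulate(n=10_000, bias_b=3.0)
diff0, t0 = simulate(n=10_000, bias_b=0.0)
print(f"biased sensor:   measured diff = {diff:.2f} bpm, t = {t:.1f}")
print(f"unbiased sensor: measured diff = {diff0:.2f} bpm, t = {t0:.1f}")
```

With the biased sensor, the t statistic grows with the sample size even though nothing real changed, so collecting more data only makes the wrong answer look more certain. The test is still "valid" in the sense that the readings really do differ; it just cannot tell you whether the difference comes from the body or from the hardware.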