PI smartly combined discretized tokens with flow matching for efficient training, and it works well in most cases. Still, an end-effector (EEF) representation may be better for teleop with devices like a SpaceMouse, VR, or a Vive Tracker. PI-07 also supports EEF, but I'm not sure how much data is needed to fine-tune PI-05 for that.
I'd suggest starting with the default PI-05 model. Data strategy is probably more important than model improvements, since VLA performance depends heavily on the data/action distribution, and that distribution is easy to modify. After that, you can add high-level reasoning like PI-05. I visited a Chinese VLA company that has already adopted the PI-05 approach, and it works quite well in practice.
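
On the data/action distribution point, a quick sanity check before any fine-tuning is to look at per-dimension statistics of your recorded actions. Below is a minimal sketch in plain Python, not anything from PI's codebase: `action_stats` and `_quantile` are hypothetical helper names, and it assumes your actions are already loaded as a list of equal-length float vectors (joint targets or EEF deltas, whichever you use).

```python
import math
from typing import Dict, List, Sequence


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def action_stats(actions: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Per-dimension mean, std, and 1st/99th percentiles of an action dataset.

    Hypothetical helper for illustration; `actions` is assumed to be a
    non-empty list of action vectors that all have the same length.
    """
    if not actions:
        raise ValueError("need at least one action")
    dim = len(actions[0])
    stats = []
    for d in range(dim):
        col = sorted(a[d] for a in actions)
        n = len(col)
        mean = sum(col) / n
        # Population standard deviation (divides by n, not n - 1).
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        stats.append({
            "mean": mean,
            "std": std,
            "q01": _quantile(col, 0.01),
            "q99": _quantile(col, 0.99),
        })
    return stats


if __name__ == "__main__":
    # Toy data: dimension 0 varies, dimension 1 never moves.
    toy = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
    for d, s in enumerate(action_stats(toy)):
        print(d, s)
```

A dimension with near-zero std, or a q01/q99 range far narrower than the robot's actual workspace, usually means the demos don't cover that axis. That is cheaper to fix by collecting different data than by changing the model.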