There is something to be said for roundtripping PDFs from source you control (you can accurately model the corruption produced by a particular system), but you will certainly see new and different phenomena once you try PDFs produced by other systems.
I'd agree that spaCy's sentence segmentation is better than many of the alternatives.