Note that the encoded dataset (run through the GPT-2 BPE tokenizer) will be much, much smaller than 17GB, both on disk and in memory (in my experience anywhere from a third to half the size).
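If you want to check the ratio on your own corpus, here's a minimal sketch using the Hugging Face `transformers` GPT-2 tokenizer; the `dataset.txt`/`dataset.bin` paths and the uint16 packing are illustrative choices, not anything the toolchain requires, and the exact ratio will depend on your text:

```python
# Sketch: measure how much smaller the GPT-2 BPE-encoded dataset is than the raw text.
# Assumes a local text file at "dataset.txt" (placeholder path) and the Hugging Face
# `transformers` library; numbers will vary with your corpus.
import os
import numpy as np
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

ids = []
with open("dataset.txt", "r", encoding="utf-8") as f:
    for line in f:                              # encode line by line to keep memory sane
        ids.extend(tokenizer(line)["input_ids"])

tokens = np.array(ids, dtype=np.uint16)         # GPT-2's 50257-token vocab fits in 2 bytes
tokens.tofile("dataset.bin")

raw_bytes = os.path.getsize("dataset.txt")
enc_bytes = os.path.getsize("dataset.bin")
print(f"raw: {raw_bytes / 1e9:.2f} GB, encoded: {enc_bytes / 1e9:.2f} GB "
      f"({enc_bytes / raw_bytes:.0%} of original)")
```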
If you're finetuning an existing GPT-2 model, you must use its BPE tokenizer: you could theoretically swap in your own, but it wouldn't help performance-wise, and you'd just end up with a lot of wasted token space.
The efficiency gains from a custom tokenizer on bespoke, esoteric content that does not match typical internet speak (like this) are why I recommend training your own tokenizer, and a GPT-2 from scratch, if possible.
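If you do go the from-scratch route, here's a minimal sketch of training a byte-level BPE tokenizer on your own corpus with the Hugging Face `tokenizers` library; the file path, vocabulary size, and special token are placeholder choices you'd tune to your dataset:

```python
# Sketch: train a byte-level BPE tokenizer on your own corpus instead of reusing GPT-2's.
# "dataset.txt", the output directory, and vocab_size are placeholder values.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["dataset.txt"],             # your raw text (placeholder path)
    vocab_size=50257,                  # same size as GPT-2's vocab, but merges learned on *your* text
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("my_tokenizer")   # writes vocab.json and merges.txt

# Bespoke content should generally encode into fewer tokens per character
# than it would with GPT-2's web-trained vocabulary.
print(tokenizer.encode("some esoteric domain-specific text").tokens)
```

The resulting `vocab.json`/`merges.txt` pair then takes the place of GPT-2's stock files when you encode your dataset for the from-scratch model.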