I need to investigate the scope and cost of training and hosting an LLM over a small-to-medium corpus of documents (400K pages of web text plus PDFs), with the goal of providing a chat-like query interface.
How do I (a) scope and estimate the GPU-hours needed, and (b) decide which pre-trained transformer model(s) would be the best starting point?
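To make (a) concrete, here is a rough back-of-envelope sketch of the kind of estimate being asked about. It relies on the commonly used approximation that training a dense transformer costs about 6 FLOPs per parameter per token. The function name, the 500 tokens-per-page figure, the 7B parameter count, and the 40% utilization are all illustrative assumptions, not measurements of this corpus or of any particular model. It covers full fine-tuning compute only, not hosting.

```python
def estimate_gpu_hours(pages, tokens_per_page, params, epochs,
                       peak_flops, utilization):
    """Rough training-compute estimate (hypothetical helper).

    Uses the ~6 * parameters * tokens approximation for dense
    transformer training FLOPs (forward plus backward pass).
    """
    # Total tokens seen during training, across all epochs.
    tokens = pages * tokens_per_page * epochs
    # Approximate total training FLOPs.
    total_flops = 6 * params * tokens
    # Sustained throughput is well below the hardware peak in practice.
    effective_flops_per_sec = peak_flops * utilization
    return total_flops / effective_flops_per_sec / 3600


# Illustrative inputs only; every value below is an assumption.
hours = estimate_gpu_hours(
    pages=400_000,          # corpus size from the question
    tokens_per_page=500,    # assumed average; measure on a sample
    params=7e9,             # assumed 7B-parameter base model
    epochs=1,               # assumed single pass over the corpus
    peak_flops=312e12,      # A100 dense BF16 peak, FLOP/s
    utilization=0.4,        # assumed fraction of peak actually achieved
)
print(f"Estimated GPU-hours: {hours:.1f}")
```

The estimate scales linearly with token count, parameter count, and number of epochs, so the most useful first step is measuring the real tokens-per-page on a sample of the corpus with the candidate model's tokenizer.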