as a side note, I love that you highlight regex in your new NLP course. There is an inherent tension between the probabilistic nature of models and the need for deterministic outputs in most production settings. Often if we can uncover linguistics rules or regex patterns that guarantee minimal precision (or as our VP puts it - don't look stupid), we'll eschew the model in the short term or use the model to augment the rules.
Also I really appreciated that on of the training goals for ULMfit was to be trainable on a single gpu. With these large-capacity models, training is getting crazy expensive and out of hand. Any chance that your future work will still keep the single gpu training goal?