BERT was one of the most exciting NLP papers published in 2018. Instead of using BERT to build an end-to-end model, using word representations from BERT can help you improve your model performance a lot, but save a lot of computing resources.
My idea has been implemented as a bert-embedding and published on Github. check: https://github.com/imgarylai/bert-embedding