TL;DR:
* Scale the time series data and quantize the floating point values into B bins.
* Each bin maps to a token id in a vocabulary of B embeddings.
* Train a small LLM to predict the next token id given a sequence of token ids.
* At each time step, the LLM gives you a probability distribution over B bins.
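The scale-and-quantize step above can be sketched as follows. This is a minimal illustration, not the exact scheme the post describes: the bin count `B = 64`, mean-absolute scaling, and the uniform bin range `[-3, 3]` are all assumptions for the example.

```python
import numpy as np

B = 64  # number of quantization bins (assumed; the text only calls it B)

def tokenize(series, B=B, low=-3.0, high=3.0):
    """Scale a series by its mean absolute value, then quantize into B bins."""
    scale = np.abs(series).mean() or 1.0
    scaled = series / scale
    # B - 1 uniform edges over [low, high] define B bins; out-of-range values
    # fall into the outermost bins, so ids always land in [0, B - 1].
    edges = np.linspace(low, high, B - 1)
    token_ids = np.digitize(scaled, edges)
    return token_ids, scale

def detokenize(token_ids, scale, B=B, low=-3.0, high=3.0):
    """Map each token id back to a representative value, undoing the scaling."""
    centers = np.linspace(low, high, B)  # one representative value per bin
    return centers[token_ids] * scale

series = np.array([10.0, 12.0, 9.0, 11.0, 30.0])
ids, s = tokenize(series)       # integer token ids for the LLM
approx = detokenize(ids, s)     # lossy reconstruction of the original values
```

The token ids are what the small LLM trains on; at inference, its softmax over the B-way vocabulary is exactly the per-step distribution over bins mentioned in the last bullet.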