Establishing a training and testing regime
Even with lots of data available, we have to ask ourselves: how do we want to split the data between training, validation, and testing? This dataset already comes with a test set of future data, so we don't have to worry about the test set. For the validation set, however, there are two ways of splitting: a walk-forward split and a side-by-side split.
In a walk-forward split, we train on all 145,000 series and validate on more recent data from those same series. In a side-by-side split, we sample a number of series for training and use the rest for validation.
Both have advantages and disadvantages. The disadvantage of walk-forward splitting is that we cannot train on all of the observations of each series, since the most recent ones are held out for validation. The disadvantage of side-by-side splitting is that we cannot use all series for training.
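To make the difference concrete, the two splits can be sketched on a small array with one row per series and one column per time step. This is only a minimal illustration: the function names `walk_forward_split` and `side_by_side_split`, the toy data, and the split sizes are assumptions made for this example and do not come from the dataset itself.

```python
import numpy as np

def walk_forward_split(data, n_val):
    """Keep every series; hold out the last n_val time steps for validation."""
    return data[:, :-n_val], data[:, -n_val:]

def side_by_side_split(data, val_fraction, seed=0):
    """Keep every time step; hold out a random subset of whole series."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[0])           # shuffled series indices
    n_val = int(round(data.shape[0] * val_fraction))
    return data[idx[n_val:]], data[idx[:n_val]]

# Toy stand-in for the real data (assumed): 10 series, 50 observations each
data = np.arange(500, dtype=float).reshape(10, 50)

wf_train, wf_val = walk_forward_split(data, n_val=10)
sbs_train, sbs_val = side_by_side_split(data, val_fraction=0.2)

print(wf_train.shape, wf_val.shape)    # (10, 40) (10, 10)
print(sbs_train.shape, sbs_val.shape)  # (8, 50) (2, 50)
```

The shapes show the trade-off: the walk-forward split keeps all 10 series but trains on only 40 of the 50 time steps, while the side-by-side split keeps all 50 time steps but trains on only 8 of the 10 series.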
If we have few series, but multiple data observations per series, a walk-forward split is preferable. However...