
I am fitting an LSTM to a univariate time series, and I have some questions about how to read the training vs. validation loss charts and how many epochs to use for the model.

To give more context about my data: it is a monthly univariate time series, and the LSTM should predict the next 12 data points. The data is in sliding-window format with 12 inputs and 12 outputs. A summary of the model is below.
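To make the sliding-window format concrete, here is a minimal sketch of how such windows could be built in base R. The helper name `make_windows` and the toy series are made up for illustration; this is not my actual preprocessing code, only an assumption about its general shape (samples, timesteps, features).

```r
# Illustrative helper (hypothetical name): turn a univariate series into
# input/output windows shaped (samples, timesteps, features).
make_windows <- function(series, n_in = 12, n_out = 12) {
  n_samples <- length(series) - n_in - n_out + 1
  stopifnot(n_samples >= 1)
  x <- array(NA_real_, dim = c(n_samples, n_in, 1))
  y <- array(NA_real_, dim = c(n_samples, n_out, 1))
  for (i in seq_len(n_samples)) {
    x[i, , 1] <- series[i:(i + n_in - 1)]                   # 12 input months
    y[i, , 1] <- series[(i + n_in):(i + n_in + n_out - 1)]  # next 12 months
  }
  list(x = x, y = y)
}

# Toy series of 48 made-up monthly values
toy <- as.numeric(1:48)
windows <- make_windows(toy)
dim(windows$x)  # 25 samples, 12 timesteps, 1 feature
```

Each window shifts by one month, so a series of 48 points gives 48 - 12 - 12 + 1 = 25 input/output pairs.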

In both charts, the error on the validation set is smaller than the error on the training set. Does this mean the model does not generalize well, i.e. that I am underfitting? The training and validation losses seem to converge at around 40 epochs for both MAE and MSE.
Should I use MAE as the loss? As far as I know, MAE and MSE are the error metrics generally used for time series.
How many epochs should I use for this model?
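For context on the two metrics, here is a small base R sketch of how MAE and MSE differ. The numbers are made up purely for illustration and are not from my data.

```r
# Mean absolute error and mean squared error, written out by hand
mae <- function(actual, predicted) mean(abs(actual - predicted))
mse <- function(actual, predicted) mean((actual - predicted)^2)

actual <- c(10, 12, 14, 16)
pred_a <- c(11, 13, 15, 17)  # off by 1 at every point
pred_b <- c(10, 12, 14, 20)  # perfect except for one error of 4

mae(actual, pred_a)  # 1
mae(actual, pred_b)  # 1
mse(actual, pred_a)  # 1
mse(actual, pred_b)  # 4
```

Both sets of predictions have the same MAE, but the one with a single large miss has a four times larger MSE, so MSE penalizes large errors more heavily than MAE does.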

library(keras)

# DEFINE THE MODEL
lstm_model <- keras_model_sequential()
lstm_model %>%
  layer_lstm(
    units = 12,                       # size of the layer (alternative: 24)
    batch_input_shape = c(1, 12, 1),  # batch size, timesteps, features
    return_sequences = TRUE,          # emit one output per timestep
    stateful = TRUE,
    name = "LSTM"
  ) %>%
  time_distributed(layer_dense(units = 1), name = "Output")

# COMPILE
# Note: newer versions of the keras R package call this argument
# `learning_rate` instead of `lr`.
lstm_model %>%
  compile(
    loss = "mae",
    optimizer = optimizer_adam(lr = 0.001, decay = 1e-6),
    metrics = "mse"
  )
summary(lstm_model)

# FIT THE MODEL
validation_split <- 0.25  # the last 25% of the samples are held out
train_history <- lstm_model %>%
  fit(
    x = x_train_arr,
    y = y_train_arr,
    batch_size = 1,
    epochs = 100,
    verbose = 1,
    validation_split = validation_split,
    shuffle = FALSE  # keep the samples in time order
  )
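One way I could read an epoch count off the training history is to look for the epoch with the lowest validation loss. The sketch below assumes the history object exposes per-epoch values as a list with a `val_loss` element (as `train_history$metrics` appears to in the keras R package). The helper name `best_epoch` and the loss values are made up for illustration and are not results from my model.

```r
# Hypothetical helper: index of the epoch with the lowest validation loss
best_epoch <- function(metrics) {
  which.min(metrics$val_loss)
}

# Made-up per-epoch losses, for illustration only
toy_metrics <- list(
  loss     = c(0.90, 0.60, 0.45, 0.40, 0.38, 0.37),
  val_loss = c(0.80, 0.55, 0.42, 0.39, 0.41, 0.44)
)

best_epoch(toy_metrics)  # 4
```

With the fitted model above, the call would presumably be `best_epoch(train_history$metrics)`.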

Understanding train vs validation loss chart