The problem is set up as a reinforcement learning problem, in which the agent learns by trial and error. The environment is described by state_values, and the state_values are changed by actions. The actions are chosen by an algorithm based on the current state_value, with the goal of reaching a particular target state_value; this formulation is termed a Markov model. In general, past state_values do influence future state_values, but here we assume that the current state_value already encodes all of the relevant history (the Markov property). There are two types of state_values: observable and non-observable. When the model also has to account for the non-observable state_values, it is called a Hidden Markov model.
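
To make the distinction concrete, here is a minimal sketch of a two-state Hidden Markov model in Python with NumPy. The state names, probabilities, and helper functions are hypothetical, chosen only for illustration and not taken from the original setup. The transition matrix encodes the Markov assumption (the next state_value depends only on the current one), and the filtering step shows how a belief over the non-observable state_value can be maintained from observable signals alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM: the hidden state_value is never observed
# directly; the agent only sees noisy signals emitted from it.
hidden_states = ["good", "bad"]   # non-observable state_values
observations = ["high", "low"]    # observable signals

# P(next hidden state | current hidden state) -- the Markov property:
# the next state depends only on the current one, not the full history.
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])

# P(observation | hidden state)
emission = np.array([[0.8, 0.2],
                     [0.3, 0.7]])

initial = np.array([0.5, 0.5])

def sample_trajectory(n_steps):
    """Sample a sequence of hidden states and the observations they emit."""
    s = rng.choice(2, p=initial)
    hidden, obs = [], []
    for _ in range(n_steps):
        hidden.append(s)
        obs.append(rng.choice(2, p=emission[s]))
        s = rng.choice(2, p=transition[s])
    return hidden, obs

def filter_belief(obs):
    """Forward algorithm: belief over the hidden state after each observation."""
    belief = initial.copy()
    beliefs = []
    for o in obs:
        belief = belief * emission[:, o]   # weight by observation likelihood
        belief = belief / belief.sum()     # renormalise to a distribution
        beliefs.append(belief.copy())
        belief = transition.T @ belief     # propagate one step forward
    return beliefs

hidden, obs = sample_trajectory(10)
for t, b in enumerate(filter_belief(obs)):
    print(f"t={t} saw {observations[obs[t]]:4s} "
          f"P(good)={b[0]:.2f} (true: {hidden_states[hidden[t]]})")
```

Running the sketch prints, at each step, the observed signal, the inferred probability that the hidden state is "good", and the true hidden state, which illustrates how the model reasons about state_values it cannot see.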