- A replay buffer is used in DQN to store past experiences; mini-batches of transitions are sampled from it to train the agent. Sampling from a large buffer, rather than learning from consecutive transitions, breaks the temporal correlations in the data (a minimal sketch is given after this list).
- Target networks stabilize training. This is achieved by keeping an additional neural network whose weights are updated as an exponential moving average of the main network's weights. Another widely used approach is to copy the main network's weights into the target network once every few thousand steps or so (both variants are sketched below).
- A single frame is not a sufficient state for the Atari Breakout problem, because no temporal information can be deduced from one frame alone; for instance, the direction of the ball's motion cannot be determined. If, however, we stack up multiple consecutive frames, the ball's velocity (and, with enough frames, its acceleration) can be ascertained (see the frame-stacking sketch below).
- L2 loss is known to overfit to outliers: because errors are penalized quadratically, a few transitions with large TD errors can dominate the gradient. The Huber loss, which is quadratic for small errors and linear for large ones, keeps gradients bounded and is therefore commonly used instead (see the sketch below).
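A minimal sketch of such a replay buffer (the class name, capacity handling, and transition layout are illustrative assumptions, not from the original text):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer storing transitions; samples uniform mini-batches."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transitions automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```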
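Both target-network update schemes, sketched with PyTorch (the function names and the `tau` value are illustrative assumptions):

```python
import torch.nn as nn

def soft_update(main_net: nn.Module, target_net: nn.Module, tau: float = 0.005):
    # Exponential moving average: target <- tau * main + (1 - tau) * target.
    for main_p, target_p in zip(main_net.parameters(), target_net.parameters()):
        target_p.data.copy_(tau * main_p.data + (1.0 - tau) * target_p.data)

def hard_update(main_net: nn.Module, target_net: nn.Module):
    # Periodic full copy, e.g. once every few thousand training steps.
    target_net.load_state_dict(main_net.state_dict())
```

The soft update is called after every training step, while the hard update is called only at the chosen interval; both keep the bootstrap targets from chasing a moving network.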
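A frame-stacking sketch using NumPy (k=4 is the common choice for Atari; the class and method names are illustrative):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Keeps the last k frames and concatenates them into a single state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At episode start, fill the stack with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self._state()

    def step(self, frame):
        self.frames.append(frame)
        return self._state()

    def _state(self):
        # Shape (k, H, W): the channel axis now carries temporal information,
        # so the ball's velocity and acceleration become recoverable.
        return np.stack(list(self.frames), axis=0)
```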
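A sketch of the Huber loss applied to the TD error (`delta=1.0` is an assumed threshold; PyTorch provides the same idea as `nn.HuberLoss`):

```python
import numpy as np

def huber_loss(td_error, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond: the gradient magnitude
    # is capped at delta, so a single outlier cannot dominate the update.
    abs_err = np.abs(td_error)
    quadratic = 0.5 * td_error ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)
```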