- Unlike MAML, in Meta-SGD, along with finding the optimal parameter value $\theta$, we also find the optimal learning rate $\alpha$ and the update direction.
- The learning rate is implicitly implemented in the adaptation term. So, in Meta-SGD, we don't initialize the learning rate with a small scalar value. Instead, we initialize it with random values of the same shape as $\theta$ and learn it along with $\theta$.
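
For reference, the adaptation (inner-update) term in which $\alpha$ appears is, per the Meta-SGD formulation:

$$\theta_i' = \theta - \alpha \circ \nabla_{\theta} L_{T_i}(f_{\theta})$$

where $\circ$ denotes element-wise multiplication. Because $\alpha$ is a vector of the same shape as $\theta$, it controls both the step size and the direction of the inner update.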
- The update equation of the learning rate can be expressed as $(\theta, \alpha) \leftarrow (\theta, \alpha) - \beta \, \nabla_{(\theta, \alpha)} \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta_i'})$, where $\beta$ is the meta learning rate.
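
Below is a minimal sketch of these updates on a hypothetical toy problem: 1-D linear-regression tasks $y = wx$ with a scalar model $f_\theta(x) = \theta x$. All names and hyperparameters are illustrative assumptions, not from the source; for this scalar case we can differentiate through the inner step in closed form instead of relying on an autodiff library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Each task: y = w * x with a task-specific slope w (noiseless toy data)
    w = rng.uniform(-2.0, 2.0)
    x_tr, x_te = rng.normal(size=10), rng.normal(size=10)
    return (x_tr, w * x_tr), (x_te, w * x_te)

def grad(theta, x, y):
    # Gradient of the MSE loss L(theta) = mean((theta*x - y)^2) w.r.t. theta
    return 2.0 * np.mean(x * (theta * x - y))

theta = rng.normal()            # initial model parameter
alpha = rng.uniform(0.01, 0.1)  # learnable learning rate, same shape as theta
beta = 0.001                    # fixed meta learning rate

for step in range(2000):
    (x_tr, y_tr), (x_te, y_te) = sample_task()

    # Inner (adaptation) step: alpha multiplies the gradient element-wise,
    # so the learning rate lives inside the adaptation term itself
    g_tr = grad(theta, x_tr, y_tr)
    theta_prime = theta - alpha * g_tr

    # Outer (meta) step: gradient of the test loss at theta_prime w.r.t.
    # BOTH theta and alpha, differentiating through the inner step:
    #   d(theta_prime)/d(theta) = 1 - alpha * dg/dtheta, dg/dtheta = 2*mean(x^2)
    #   d(theta_prime)/d(alpha) = -g_tr
    g_te = grad(theta_prime, x_te, y_te)
    dg_dtheta = 2.0 * np.mean(x_tr ** 2)
    theta -= beta * g_te * (1.0 - alpha * dg_dtheta)
    alpha -= beta * g_te * (-g_tr)

print(f"learned alpha: {alpha:.4f}, final theta: {theta:.4f}")
```

On this quadratic toy loss, $\alpha$ tends toward the step size that lets one inner update reach the per-task optimum, which is the behavior Meta-SGD is after.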
- In Reptile, we sample $n$ tasks, run SGD for a few iterations on each of the sampled tasks, and then update our model parameter in a direction that is common to all the tasks.
- The Reptile update equation can be expressed as $\theta = \theta + \epsilon \frac{1}{n} \sum_{i=1}^{n} (\theta_i' - \theta)$, where $\theta_i'$ is the parameter obtained after running SGD on the $i$-th sampled task and $\epsilon$ is the step size.
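
A matching sketch of the Reptile update on the same hypothetical toy task family; again, every name and hyperparameter here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_task():
    # Each task: y = w * x with a task-specific slope w
    w = rng.uniform(-2.0, 2.0)
    x = rng.normal(size=20)
    return x, w * x

def sgd_on_task(theta, x, y, lr=0.02, k=5):
    # Run a few plain SGD steps on one task, starting from theta
    for _ in range(k):
        theta = theta - lr * 2.0 * np.mean(x * (theta * x - y))
    return theta

theta = rng.normal()  # meta-initialization we are learning
epsilon = 0.1         # Reptile step size
n = 5                 # tasks sampled per meta-iteration

for step in range(1000):
    adapted = []
    for _ in range(n):
        x, y = sample_task()
        adapted.append(sgd_on_task(theta, x, y))
    # Reptile update: move theta toward the average adapted parameter,
    # i.e. theta <- theta + epsilon * (1/n) * sum_i (theta_i' - theta)
    theta = theta + epsilon * np.mean([tp - theta for tp in adapted])

print(f"meta-learned theta: {theta:.4f}")
```

Note that, unlike Meta-SGD, this update needs no gradients through the inner loop: the common direction is just the average displacement $\theta_i' - \theta$ across the sampled tasks.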