The hottest gradient optimization algorithms of 2019, with AdaBound comparison-experiment code:
https://github.com/LiyuanLucasLiu/RAdam
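For reference, a minimal sketch of dropping RAdam in for Adam, assuming radam.py from the repo above is on the import path (the RAdam class name and its Adam-like signature come from that repo; the toy model and data here are stand-ins only):

import torch
import torch.nn as nn
import torch.nn.functional as F
from radam import RAdam  # radam.py from the repo above

model = nn.Linear(10, 1)                       # toy model, illustration only
optimizer = RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

x, y = torch.randn(32, 10), torch.randn(32, 1)
optimizer.zero_grad()
F.mse_loss(model(x), y).backward()
optimizer.step()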
Lookahead optimizer
Used this one on (machine) 73:
https://github.com/alphadl/lookahead.pytorch
Another implementation:
https://github.com/dseuss/pytorch-lookahead-optimizer/blob/master/optim.py
Usage:
base_opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))  # any inner optimizer
lookahead = Lookahead(base_opt, k=5, alpha=0.5)  # Lookahead wrapper class from the repos above
lookahead.zero_grad()
loss_function(model(input), target).backward()   # user-defined loss function
lookahead.step()
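For intuition, the wrapper keeps a second "slow" copy of the weights; after every k inner-optimizer steps it moves the slow weights a fraction alpha toward the fast weights, then resets the fast weights to them. A minimal sketch of that update rule, not the exact code from either repo above:

import torch


class LookaheadSketch:
    """Sketch of the Lookahead wrapper (Zhang et al. 2019): run the inner
    optimizer k times, then pull the slow weights alpha of the way toward
    the fast weights and sync the fast weights back."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base_optimizer = base_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # Slow weights start as a detached copy of the model (fast) weights.
        self.slow_weights = [
            [p.clone().detach() for p in group['params']]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self):
        self.base_optimizer.zero_grad()

    def step(self):
        self.base_optimizer.step()  # one fast-weight update
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow in zip(self.base_optimizer.param_groups,
                                   self.slow_weights):
                for p, q in zip(group['params'], slow):
                    # slow <- slow + alpha * (fast - slow)
                    q.add_(p.detach() - q, alpha=self.alpha)
                    p.data.copy_(q)  # fast weights restart from the slow weights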
Note: the class below is a different "Lookahead": the lookahead convolution layer from Wang et al. 2016, not the Lookahead optimizer wrapper above.

import math

import torch
import torch.nn as nn
from torch.nn.parameter import Parameter


class Lookahead(nn.Module):
    # Wang et al. 2016 - Lookahead Convolution Layer for Unidirectional
    # Recurrent Neural Networks
    # input shape  - sequence, batch, feature - TxNxH
    # output shape - same as input
    def __init__(self, n_features, context):
        # TODO: should we handle batch_first=True?
        super(Lookahead, self).__init__()
        assert context > 0
        self.n_features = n_features
        self.context = context
        self.weight = Parameter(torch.Tensor(n_features, context + 1))
        self.register_parameter('bias', None)
        self.init_parameters()

    def init_parameters(self):  # TODO: is there a better way to initialise this layer?
        stdv = 1.0 / math.sqrt(self.weight.size(1))
        with torch.no_grad():  # in-place init on a leaf Parameter needs no_grad
            self.weight.uniform_(-stdv, stdv)

    def forward(self, input):
        seq_len = input.size(0)
        # Pad the 0th (time) dimension with `context` steps of zeros so the
        # final time steps still see a full window.
        # Once pytorch's padding functions have settled, should move to those.
        padding = torch.zeros(self.context, *input.size()[1:]).type_as(input)
        x = torch.cat((input, padding), 0)
        # Gather a lookahead window of width context+1 for every time step:
        # TxLxNxH - sequence, context, batch, feature
        x = torch.stack([x[i:i + self.context + 1] for i in range(seq_len)])
        x = x.permute(0, 2, 3, 1)  # TxNxHxL - sequence, batch, feature, context
        # Per-feature weighted sum over the context window.
        x = torch.mul(x, self.weight).sum(dim=3)
        return x

    def __repr__(self):
        return (self.__class__.__name__
                + '(n_features=' + str(self.n_features)
                + ', context=' + str(self.context) + ')')
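A quick shape check for the layer above (the sizes are hypothetical, just to show the TxNxH convention):

T, N, H = 50, 8, 256                  # sequence, batch, feature
layer = Lookahead(n_features=H, context=10)
out = layer(torch.randn(T, N, H))
print(layer)                          # Lookahead(n_features=256, context=10)
print(out.shape)                      # torch.Size([50, 8, 256]), same as the input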