$\bigr\rvert_{x_i=1}=27$, so $\frac{\partial out}{\partial x_i}=\frac{3}{2}(x_i+2)$, and therefore

$$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1}=\frac{9}{2}=4.5$$
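As a quick check, here is a minimal sketch that reproduces the 4.5 result, assuming the standard setup implied by the formulas above (`out` is the mean of $z_i = 3(x_i+2)^2$ over a 2×2 tensor of ones):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3      # z_i = 3(x_i + 2)^2
out = z.mean()     # out = 27 when every x_i = 1
out.backward()
print(x.grad)      # every entry is 4.5
```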
Jacobian matrix
Mathematically, if we have a vector-valued function $y = f(x)$, then the gradient of $y$ with respect to $x$ is a Jacobian matrix:
$$J=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$
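For instance, for the vector-valued function $y_1 = x_1 + 2x_2$, $y_2 = 3x_1 x_2$, the Jacobian is

$$J=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{bmatrix}=\begin{bmatrix} 1 & 2\\ 3x_2 & 3x_1 \end{bmatrix}$$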
Generally speaking, torch.autograd is an engine for computing vector-Jacobian products. That is, given any vector $v$, it computes the product $J^T \cdot v$. If $v$ happens to be the gradient of a scalar function $l = g(y)$, that is, $v=\left(\frac{\partial l}{\partial y_1}, \cdots ,\frac{\partial l}{\partial y_m}\right)^T$, then by the chain rule the vector-Jacobian product is exactly the derivative of $l$ with respect to $x$:
$$J^T\cdot v=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}\begin{bmatrix} \frac{\partial l}{\partial y_1}\\ \vdots\\ \frac{\partial l}{\partial y_m} \end{bmatrix}=\begin{bmatrix} \frac{\partial l}{\partial x_1}\\ \vdots\\ \frac{\partial l}{\partial x_n} \end{bmatrix}$$
This property of the vector-Jacobian product makes it very convenient to feed external gradients into a model with non-scalar output.
```python
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
```
In this case, y is no longer a scalar. torch.autograd cannot compute the full Jacobian matrix directly, but if we only want the vector-Jacobian product, we simply pass the vector to backward as an argument:
```python
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
```
You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():
```python
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
```
Variable
The difference between Variable and Tensor is that a Variable is placed into the computation graph, where it takes part in forward propagation, backward propagation, and automatic differentiation.
Variable is provided as torch.autograd.Variable, so to use it you need to import it from torch.autograd:
```python
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1]), requires_grad=True)
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([3]), requires_grad=True)

y = w * x + b

y.backward()

print(x.grad)  # dy/dx = w = 2
print(w.grad)  # dy/dw = x = 1
print(b.grad)  # dy/db = 1
```
Building a simple neural network
```python
import torch

batch_n = 100        # number of samples in a batch
hidden_layer = 100   # hidden layer width
input_data = 1000    # input feature dimension
output_data = 10     # output dimension

x = torch.randn(batch_n, input_data)
y = torch.randn(batch_n, output_data)

w1 = torch.randn(input_data, hidden_layer)
w2 = torch.randn(hidden_layer, output_data)

epoch_n = 20
learning_rate = 1e-6

for epoch in range(epoch_n):
    # Forward pass: linear -> ReLU -> linear
    h1 = x.mm(w1)
    h1_relu = h1.clamp(min=0)
    y_pred = h1_relu.mm(w2)

    loss = (y_pred - y).pow(2).sum()
    print("Epoch:{}, Loss:{:.4f}".format(epoch, loss))

    # Backward pass, written out by hand
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h1_relu.t().mm(grad_y_pred)

    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h1 < 0] = 0   # ReLU backprop: zero the gradient where the pre-activation was negative
    grad_w1 = x.t().mm(grad_h)

    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```
Building a neural network with automatic gradient computation using Variable
```python
import torch
from torch.autograd import Variable

batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)

epoch_n = 20
learning_rate = 1e-6

for epoch in range(epoch_n):
    # Forward pass: linear -> ReLU -> linear
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    print("Epoch:{}, Loss:{:.4f}".format(epoch, loss))

    # Autograd computes all the gradients for us
    loss.backward()

    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Zero the gradients, otherwise they accumulate across epochs
    w1.grad.data.zero_()
    w2.grad.data.zero_()
```
Building a neural network with nn.Module and a custom forward function
```python
import torch
from torch.autograd import Variable

batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, input_n, w1, w2):
        x = torch.mm(input_n, w1)
        x = torch.clamp(x, min=0)
        x = torch.mm(x, w2)
        return x

    def backward(self):
        # No need to implement backward: autograd derives it from forward
        pass


model = Model()

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)

epoch_n = 20
learning_rate = 1e-6

for epoch in range(epoch_n):
    y_pred = model(x, w1, w2)

    loss = (y_pred - y).pow(2).sum()
    print("Epoch:{}, Loss:{:.4f}".format(epoch, loss))

    loss.backward()

    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    w1.grad.data.zero_()
    w2.grad.data.zero_()
```
Dataset
torch.utils.data.Dataset is the abstract class that represents a dataset. You can define your own dataset class by subclassing it; you only need to implement the __len__ and __getitem__ methods.
```python
import pandas as pd
from torch.utils.data import Dataset


class myDataset(Dataset):
    def __init__(self, csv_file, txt_file, root_dir, other_file):
        self.csv_data = pd.read_csv(csv_file)
        with open(txt_file, 'r') as f:
            data_list = f.readlines()
        self.txt_data = data_list
        self.root_dir = root_dir

    def __len__(self):
        return len(self.csv_data)

    def __getitem__(self, idx):
        # Use .iloc to index the DataFrame by row position
        data = (self.csv_data.iloc[idx], self.txt_data[idx])
        return data
```
With a class defined this way, you can fetch each sample by iterating over the dataset, but it is hard to implement batching, shuffling, or loading data with multiple workers on your own. PyTorch therefore provides torch.utils.data.DataLoader to define a new iterator:
```python
from torch.utils.data import DataLoader

# DataLoader takes a Dataset instance, not the class itself
dataset = myDataset(csv_file, txt_file, root_dir, other_file)
dataiter = DataLoader(dataset, batch_size=32, shuffle=True)
```
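Once the DataLoader is constructed, each iteration yields one collated batch. A rough sketch of how it is typically consumed, assuming __getitem__ returns types the default collate_fn can stack (numbers, strings, or tensors):

```python
for batch_idx, (csv_batch, txt_batch) in enumerate(dataiter):
    # each element bundles up to batch_size samples returned by __getitem__
    print(batch_idx, len(txt_batch))
```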
nn.Module
All layer structures and loss functions come from torch.nn.
```python
from torch import nn


class net_name(nn.Module):
    def __init__(self, other_arguments):
        super(net_name, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        x = self.conv1(x)
        return x
```
A typical training procedure for a neural network is as follows:

- Define a neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process the input through the network
- Compute the loss (how far the output is from the correct answer)
- Propagate gradients back into the network's parameters
- Update the weights of the network, typically using a simple rule: weight = weight - learning_rate * gradient
Using the Sequential container in torch.nn
```python
import torch

batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10

model = torch.nn.Sequential(
    torch.nn.Linear(input_data, hidden_layer),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_layer, output_data)
)

print(model)
```
Defining a neural network with nn.Module
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 convolution
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # fully connected layers: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
```
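As a quick sanity check, you can push a random input through the network; 32×32 single-channel inputs are assumed here, since that is the spatial size that makes the 16 * 5 * 5 flattening work out:

```python
dummy_input = torch.randn(1, 1, 32, 32)  # a batch of one 1-channel 32x32 image
out = net(dummy_input)
print(out.size())  # torch.Size([1, 10])
```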
torch.optim
Optimization algorithms fall into two broad categories:

(1) First-order optimization algorithms use the gradient of each parameter to update it; the most common is gradient descent. Gradient descent seeks a minimum, controls the variance, and updates the model parameters so that the model eventually converges. The parameter update rule of the network is:

$$w = w - \eta \frac{\partial L}{\partial w}$$

where $\eta$ is the learning rate and $\frac{\partial L}{\partial w}$ is the gradient of the loss function with respect to the parameter $w$.

(2) Second-order optimization algorithms use second derivatives (the Hessian) to minimize or maximize the loss function, mainly based on Newton's method:

$$w = w - \eta H^{-1} \frac{\partial L}{\partial w}$$

where $H$ is the Hessian matrix of the loss function with respect to the parameter $w$.
```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```
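A minimal sketch of how the optimizer fits into a training loop, reusing the Sequential model, the x and y tensors, and epoch_n from the earlier examples, and assuming nn.MSELoss as the loss function:

```python
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(epoch_n):
    y_pred = model(x)            # forward pass
    loss = loss_fn(y_pred, y)    # measure how far y_pred is from y

    optimizer.zero_grad()        # clear gradients accumulated in the previous step
    loss.backward()              # backpropagate through the graph
    optimizer.step()             # apply the SGD update to all parameters

    print("Epoch:{}, Loss:{:.4f}".format(epoch, loss.item()))
```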
Saving and loading models

```python
# Save the entire model
torch.save(model, path)

# Save only the model parameters (state_dict)
torch.save(model.state_dict(), path)
```

```python
# Load an entire saved model
model = torch.load(path)

# Load saved parameters into an existing model
model.load_state_dict(torch.load(path))
```