Therefore $\frac{\partial out}{\partial x_i} = \frac{3}{2}(x_i+2)$, and hence

$$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$$
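The value 4.5 can be checked numerically. The sketch below assumes the setup that is consistent with the derivative above: `x` is a 2×2 tensor of ones and `out` is the mean of `3 * (x + 2) ** 2`. Both are assumptions, since this section does not restate the definition of `out`.

```python
import torch

# Assumed setup: x is a 2x2 tensor of ones, out = mean(3 * (x + 2)^2)
x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()

# Backpropagate from the scalar `out`; gradients accumulate in x.grad
out.backward()
print(x.grad)  # every entry equals 3/2 * (1 + 2) = 4.5
```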
Mathematically, if you have a vector-valued function $y = f(x)$, then the gradient of $y$ with respect to $x$ is a Jacobian matrix:

$$
J = \begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$

Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products. That is, given any vector $v$ with $m$ components, it computes the product $J^T \cdot v$. If $v$ happens to be the gradient of a scalar function $l = g(y)$, i.e. $v = (\frac{\partial l}{\partial y_1}, \cdots, \frac{\partial l}{\partial y_m})^T$, then by the chain rule the vector-Jacobian product is exactly the gradient of $l$ with respect to $x$:

$$
J^T \cdot v = \begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial l}{\partial y_1}\\
\vdots\\
\frac{\partial l}{\partial y_m}
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial l}{\partial x_1}\\
\vdots\\
\frac{\partial l}{\partial x_n}
\end{bmatrix}
$$
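To make the shapes concrete: with $J$ stored as an $m \times n$ matrix (one row per output), the quantity that `backward(v)` leaves in `x.grad` is $J^T v$. The sketch below compares the two on a small made-up function `f` with $m = 3$ outputs and $n = 2$ inputs; `f`, `x` and `v` are illustrative choices, not taken from the article.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Illustrative function R^2 -> R^3 (hypothetical, chosen only for the demo)
    return torch.stack([x[0] * x[1], x[0] + x[1] ** 2, torch.sin(x[0])])


x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.01])

# Full Jacobian, shape (m, n) = (3, 2)
J = jacobian(f, x.detach())

# Vector-Jacobian product via backward: x.grad holds J^T v
y = f(x)
y.backward(v)

print(J.shape)   # torch.Size([3, 2])
print(x.grad)    # same values as J.t() @ v
```

Building the full Jacobian costs one backward pass per output, which is why autograd exposes only the product.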
```python
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
```
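Here `y` is no longer a scalar, so `torch.autograd` cannot compute the full Jacobian directly. If we only want the vector-Jacobian product, though, we simply pass the vector to `backward` as an argument: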
```python
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)
```
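In this example `y` is just `x` multiplied by a power of two, so the Jacobian is diagonal and `x.grad` should equal `v` scaled by that same power of two. The sketch below repeats the example and counts the doublings to check this; the counter `doublings` is an addition for illustration.

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
doublings = 1  # y = x * 2**doublings
while y.detach().norm() < 1000:
    y = y * 2
    doublings += 1

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

# dy_i/dx_i = 2**doublings and all off-diagonal entries are 0,
# so J^T v is simply v scaled by 2**doublings
expected = v * 2 ** doublings
print(x.grad)
print(expected)
```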
You can also stop `autograd` from tracking history on tensors that have `.requires_grad=True` by wrapping the code block in `with torch.no_grad():`
```python
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
```
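Variable

`Variable` and `Tensor` differ in where they live: `Variable` is defined in `torch.autograd.Variable`, so it has to be imported from `torch.autograd` before use. Since PyTorch 0.4 the two types have been merged: `Variable(t)` simply returns a `Tensor`, and `requires_grad` can be set on a tensor directly.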
```python
from torch.autograd import Variable

x = Variable(torch.Tensor([1]), requires_grad=True)
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([3]), requires_grad=True)

y = w * x + b

y.backward()

print(x.grad)
print(w.grad)
print(b.grad)
```
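For comparison, here is a sketch of the same computation written without the `Variable` wrapper, which works on PyTorch 0.4 and later. The expected gradients follow directly from `y = w * x + b`.

```python
import torch

# Plain tensors can track gradients directly
x = torch.tensor([1.0], requires_grad=True)
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

y = w * x + b
y.backward()

print(x.grad)  # dy/dx = w = 2
print(w.grad)  # dy/dw = x = 1
print(b.grad)  # dy/db = 1
```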
```python
import torch

batch_n = 100        # samples per batch
hidden_layer = 100   # hidden units
input_data = 1000    # input features
output_data = 10     # output features

x = torch.randn(batch_n, input_data)
y = torch.randn(batch_n, output_data)

w1 = torch.randn(input_data, hidden_layer)
w2 = torch.randn(hidden_layer, output_data)

epoch_n = 20
learning_rate = 1e-6

for epoch in range(epoch_n):
    # Forward pass: linear -> ReLU -> linear
    h1 = x.mm(w1)
    h1 = h1.clamp(min=0)
    y_pred = h1.mm(w2)

    loss = (y_pred - y).pow(2).sum()
    print("Epoch:{}, Loss:{:.4f}".format(epoch, loss))

    # Backward pass, derived by hand
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h1.t().mm(grad_y_pred)

    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h1 <= 0] = 0  # ReLU passes gradient only where its input was positive
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent update
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```
Using `Variable`:
```python
import torch
from torch.autograd import Variable

batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)

epoch_n = 20
learning_rate = 1e-6

for epoch in range(epoch_n):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    print("Epoch:{},Loss:{:.4f}".format(epoch, loss))

    loss.backward()

    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    w1.grad.data.zero_()
    w2.grad.data.zero_()
```
Using `nn.Module`:
```python
import torch
from torch.autograd import Variable

batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, input_n, w1, w2):
        x = torch.mm(input_n, w1)
        x = torch.clamp(x, min=0)
        x = torch.mm(x, w2)
        return x

    def backward(self):
        # Placeholder only: autograd derives the backward pass from forward()
        pass


model = Model()

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)

epoch_n = 20
learning_rate = 1e-6

for epoch in range(epoch_n):
    # Calling the model invokes forward()
    y_pred = model(x, w1, w2)
    loss = (y_pred - y).pow(2).sum()
    print("Epoch:{},Loss:{:.4f}".format(epoch, loss))

    loss.backward()

    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Reset gradients, otherwise they accumulate across epochs
    w1.grad.data.zero_()
    w2.grad.data.zero_()
```
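The example above passes the weights into `forward` by hand. A common alternative, sketched here under the assumption that the weights should belong to the model, is to register them as `torch.nn.Parameter` so that `model.parameters()` and `model.zero_grad()` can find them. The class name `TwoLayerNet` is hypothetical.

```python
import torch


class TwoLayerNet(torch.nn.Module):  # hypothetical name for this sketch
    def __init__(self, input_data, hidden_layer, output_data):
        super().__init__()
        # Parameters are registered on the module and tracked by autograd
        self.w1 = torch.nn.Parameter(torch.randn(input_data, hidden_layer))
        self.w2 = torch.nn.Parameter(torch.randn(hidden_layer, output_data))

    def forward(self, x):
        return x.mm(self.w1).clamp(min=0).mm(self.w2)


model = TwoLayerNet(1000, 100, 10)
x = torch.randn(100, 1000)
y = torch.randn(100, 10)

learning_rate = 1e-6
losses = []
for epoch in range(20):
    loss = (model(x) - y).pow(2).sum()
    losses.append(loss.item())

    model.zero_grad()          # clear gradients from the previous epoch
    loss.backward()

    with torch.no_grad():      # update weights without recording the update
        for p in model.parameters():
            p -= learning_rate * p.grad
```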
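Dataset

`torch.utils.data.Dataset` is the abstract class that represents a dataset. A custom dataset subclasses it and implements `__len__` and `__getitem__`: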
```python
import pandas as pd
from torch.utils.data import Dataset


class myDataset(Dataset):
    def __init__(self, csv_file, txt_file, root_dir, other_file):
        self.csv_data = pd.read_csv(csv_file)
        with open(txt_file, 'r') as f:
            data_list = f.readlines()
        self.txt_data = data_list
        self.root_dir = root_dir

    def __len__(self):
        return len(self.csv_data)

    def __getitem__(self, idx):
        # .iloc selects the idx-th row; plain [] on a DataFrame selects a column
        data = (self.csv_data.iloc[idx], self.txt_data[idx])
        return data
```
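A dataset class defined this way lets us fetch samples one at a time by iterating over it, but batching or reading data with multiple workers is hard to implement on top of that. PyTorch therefore provides `torch.utils.data.DataLoader`, which wraps a dataset in a batched iterator: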
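```python
from torch.utils.data import DataLoader

# DataLoader expects a Dataset instance, not the class; `dataset` is a myDataset(...) object
dataiter = DataLoader(dataset, batch_size=32)
```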
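As a runnable illustration of batching, the sketch below uses a small in-memory dataset instead of CSV and text files. `ToyDataset` and its random contents are hypothetical stand-ins for real data.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):  # hypothetical in-memory dataset
    def __init__(self, n_samples=100):
        self.features = torch.randn(n_samples, 4)
        self.labels = torch.randint(0, 2, (n_samples,))

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# 100 samples in batches of 32: three full batches and one of 4
batch_shapes = [tuple(xb.shape) for xb, yb in loader]
print(batch_shapes)  # [(32, 4), (32, 4), (32, 4), (4, 4)]
```

Passing `num_workers` greater than 0 to `DataLoader` loads batches in background worker processes.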
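nn.Module (modules)

All layers and loss functions come from `torch.nn`. A model is defined by subclassing `nn.Module`, following a template like this: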
```python
from torch import nn


class net_name(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super(net_name, self).__init__()
        # Layers with learnable parameters are created in __init__
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        # forward() defines how the input flows through the layers
        x = self.conv1(x)
        return x
```
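A typical training procedure for a neural network:

- Define a neural network with some learnable parameters (weights)
- Iterate over a dataset of inputs
- Process each input through the network
- Compute the loss (how far the output is from the correct answer)
- Propagate gradients back to the network's parameters
- Update the network's weights, typically with a simple rule: `weight = weight - learning_rate * gradient`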
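These steps can be sketched as a short loop. The example below fits a single `nn.Linear` layer to synthetic linear data. The data, the layer sizes and the learning rate are all illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data (illustrative): targets are a fixed linear map of the inputs
inputs = torch.randn(64, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
targets = inputs @ true_w

net = nn.Linear(3, 1)          # network with learnable parameters
criterion = nn.MSELoss()
learning_rate = 0.1

losses = []
for step in range(50):                     # iterate over the data
    output = net(inputs)                   # forward pass
    loss = criterion(output, targets)      # compute the loss
    losses.append(loss.item())

    net.zero_grad()
    loss.backward()                        # backpropagate gradients

    with torch.no_grad():                  # weight = weight - learning_rate * gradient
        for p in net.parameters():
            p -= learning_rate * p.grad

print(losses[0], losses[-1])
```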
Using the sequential container `Sequential` from `torch.nn`:
```python
import torch

batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10

model = torch.nn.Sequential(
    torch.nn.Linear(input_data, hidden_layer),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_layer, output_data)
)

print(model)
```
Using `nn.Module`:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
```
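The `16 * 5 * 5` input size of `fc1` comes from the feature-map shape after the two convolution and pooling stages. The sketch below traces those shapes, assuming a single-channel 32×32 input (the article does not state the input size, but this is the size that yields 5×5 feature maps).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)  # assumed input: batch of 1, 1 channel, 32x32

# 5x5 convolution without padding: 32 -> 28, then 2x2 pooling: 28 -> 14
a = F.max_pool2d(F.relu(conv1(x)), 2)
print(a.shape)     # torch.Size([1, 6, 14, 14])

# 5x5 convolution: 14 -> 10, then 2x2 pooling: 10 -> 5
b = F.max_pool2d(F.relu(conv2(a)), 2)
print(b.shape)     # torch.Size([1, 16, 5, 5])

# Flattening everything but the batch dimension gives 16 * 5 * 5 = 400 features
flat = b.view(1, -1)
print(flat.shape)  # torch.Size([1, 400])
```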
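torch.optim (optimization)

Optimization algorithms fall into two broad classes:

(1) First-order optimization algorithms update the parameters using the gradient of the loss with respect to each parameter. The most common is gradient descent, which searches for a minimum of the loss by repeatedly moving the parameters against the gradient until the model converges. The parameter update rule is:

$$w = w - \eta \frac{\partial L}{\partial w}$$

where $\eta$ is the learning rate and $\frac{\partial L}{\partial w}$ is the gradient of the loss function with respect to the parameter $w$.

(2) Second-order optimization algorithms use second derivatives (the Hessian) to minimize or maximize the loss function, and are mainly based on Newton's method:

$$w = w - \eta H^{-1} \frac{\partial L}{\partial w}$$

where $H$ is the Hessian matrix of the loss function with respect to the parameter $w$.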
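The difference between the two update rules is easiest to see on a quadratic loss $L(w) = \frac{1}{2} w^T A w - b^T w$, whose gradient is $Aw - b$ and whose Hessian is $A$. In the sketch below, `A`, `b`, and the step sizes are illustrative assumptions. Gradient descent needs many small steps, while a Newton step with $\eta = 1$ lands on the minimizer at once.

```python
import torch

# Illustrative quadratic loss L(w) = 0.5 * w^T A w - b^T w with symmetric positive-definite A
A = torch.tensor([[3.0, 1.0], [1.0, 2.0]])
b = torch.tensor([1.0, 1.0])
w_star = torch.linalg.solve(A, b)  # exact minimizer, solves A w = b


def grad(w):
    # Gradient of the quadratic loss; its Hessian is the constant matrix A
    return A @ w - b


# First-order: w = w - eta * dL/dw, repeated many times
eta = 0.1
w_gd = torch.zeros(2)
for _ in range(100):
    w_gd = w_gd - eta * grad(w_gd)

# Second-order: w = w - eta * H^{-1} dL/dw, a single step with eta = 1
w_newton = torch.zeros(2)
w_newton = w_newton - 1.0 * torch.linalg.solve(A, grad(w_newton))

print(w_star)    # tensor([0.2000, 0.4000])
print(w_gd)      # close to w_star after 100 steps
print(w_newton)  # equal to w_star after 1 step
```

The one-step result is specific to quadratic losses. On a general loss, Newton's method is iterative as well, and forming or inverting the Hessian is expensive for large models.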
```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```
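An optimizer created this way is typically driven by three calls per step: `zero_grad()`, `backward()` on the loss, and `step()`. The sketch below shows that pattern on a small made-up regression problem. The model, data and number of steps are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative model and data: learn to sum three input features
model = nn.Linear(3, 1)
inputs = torch.randn(64, 3)
targets = inputs.sum(dim=1, keepdim=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.MSELoss()

losses = []
for step in range(100):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(inputs), targets)   # forward pass and loss
    loss.backward()                            # compute gradients
    optimizer.step()                           # apply the SGD-with-momentum update
    losses.append(loss.item())

print(losses[0], losses[-1])
```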
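Saving and loading models

```python
# Save the entire model (structure and parameters)
torch.save(model, path)

# Save only the model's parameters
torch.save(model.state_dict(), path)
```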
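```python
# Load a model that was saved whole (structure and parameters).
# Recent PyTorch releases default to weights_only=True, so this form may need weights_only=False.
model = torch.load(path)

# Load saved parameters into an already constructed model of the same architecture
model.load_state_dict(torch.load(path))
```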
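A minimal round trip through the `state_dict` route might look like the sketch below. The small `nn.Linear` model and the temporary file name `model_state.pt` are hypothetical choices for the demo.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pt")  # hypothetical file name

    # Save only the parameters
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

# The restored model holds exactly the same parameter values
same = all(torch.equal(p, q) for p, q in zip(model.parameters(), restored.parameters()))
print(same)  # True
```

Saving the `state_dict` keeps the file independent of the class definition's import path, at the cost of having to construct the model before loading.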
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 小嗷犬 when reposting.