Quickstart
This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.
Working with data
PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.
The torchvision.datasets module contains Dataset objects for many real-world vision datasets, such as CIFAR and COCO (the TorchVision documentation has the full list). In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset accepts two arguments, transform and target_transform, to modify the samples and labels respectively.
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
print(len(training_data))
60000
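To make the roles of transform and target_transform concrete, here is a minimal sketch of a dataset that applies both hooks when a sample is read. TinyImageDataset, scale_to_unit, and one_hot are hypothetical names invented for this illustration, and the images are random stand-ins, not FashionMNIST. The scaling function only approximates what ToTensor does for 8-bit input.

```python
import torch
from torch.utils.data import Dataset


class TinyImageDataset(Dataset):
    """Hypothetical in-memory dataset that mimics the two transform hooks."""

    def __init__(self, images, labels, transform=None, target_transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image, label = self.images[idx], self.labels[idx]
        if self.transform is not None:
            image = self.transform(image)          # applied to the sample
        if self.target_transform is not None:
            label = self.target_transform(label)   # applied to the label
        return image, label


# Eight fake 28x28 grayscale images stored as bytes (0-255), with labels 0-9.
raw_images = torch.randint(0, 256, (8, 28, 28), dtype=torch.uint8)
raw_labels = [3, 0, 9, 1, 4, 7, 2, 5]


def scale_to_unit(img):
    # Roughly what ToTensor does for uint8 input: add a channel dim, scale to [0, 1].
    return img.unsqueeze(0).float() / 255.0


def one_hot(label):
    # Turn an integer class index into a length-10 one-hot vector.
    return torch.nn.functional.one_hot(torch.tensor(label), num_classes=10).float()


dataset = TinyImageDataset(raw_images, raw_labels, scale_to_unit, one_hot)
image, target = dataset[0]
print(image.shape, image.dtype)
print(target)
```

The transforms run lazily, once per access, so the stored data stays in its raw form.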
We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset and supports automatic batching, sampling, shuffling, and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.
Each batch that DataLoader yields is a pair: the first item is the data (features) and the second is the labels.
batch_size = 64
# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
print(len(train_dataloader))
Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
938  # 60,000 samples in batches of 64 give 938 batches (the last one holds 32 samples)
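The batch count above is just a ceiling division of the dataset size by the batch size. The sketch below checks that arithmetic on a smaller synthetic dataset. The 1,000-sample size and random tensors are assumptions made to keep the example fast; TensorDataset stands in for a real dataset.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# 1,000 fake samples standing in for a real dataset (sizes are illustrative).
features = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=64)
batch_sizes = [len(y) for _, y in loader]

print(len(loader))                 # number of batches: ceil(1000 / 64)
print(math.ceil(1000 / 64))        # same value, computed by hand
print(batch_sizes[-1])             # size of the final, partial batch

# drop_last=True discards the incomplete final batch.
loader_full_only = DataLoader(dataset, batch_size=64, drop_last=True)
print(len(loader_full_only))

# shuffle=True draws a new sample order on every pass over the loader.
shuffled = DataLoader(dataset, batch_size=64, shuffle=True)
```

Training loaders are commonly built with shuffle=True so that batches differ between epochs; the loaders in this tutorial keep the default order.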
Read more about loading data in PyTorch.
Creating Models
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to an accelerator such as CUDA, MPS, MTIA, or XPU. If an accelerator is available, we use it; otherwise, we use the CPU.
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()  # flatten each image into a 1-D vector
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
Using mps device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
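The torch.accelerator module used for device selection is only present in recent PyTorch releases. As a hedged sketch for older installations, the helper below falls back to the backend-specific checks. pick_device is a hypothetical name introduced here, and the fallback order (CUDA, then MPS, then CPU) is an assumption, not part of the tutorial.

```python
import torch


def pick_device() -> str:
    """Return the type of an available accelerator, or "cpu"."""
    # torch.accelerator only exists in recent PyTorch releases.
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        return torch.accelerator.current_accelerator().type
    # Fallbacks for older releases.
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
x = torch.ones(2, 3, device=device)   # tensors are created directly on the device
print(device, x.device.type)
```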
Read more about building neural networks in PyTorch.
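A quick way to sanity-check a network like the one above is to push a random batch through it and count its parameters. The sketch below rebuilds the same layer stack as a plain nn.Sequential for brevity (an assumption of this example; the tutorial uses the NeuralNetwork class) and feeds it random stand-in images.

```python
import torch
from torch import nn

# Same layers as the tutorial's network, written as one Sequential.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

batch = torch.rand(4, 1, 28, 28)        # four random stand-in "images"
logits = model(batch)                   # raw scores, one row per image
probabilities = logits.softmax(dim=1)   # each row now sums to 1

# Weights plus biases: (784*512 + 512) + (512*512 + 512) + (512*10 + 10)
num_params = sum(p.numel() for p in model.parameters())
print(logits.shape)
print(num_params)
```

The model returns one logit per class; softmax is only needed when the scores should be read as probabilities.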
Optimizing the Model Parameters
To train a model, we need a loss function and an optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
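To see what nn.CrossEntropyLoss computes, the sketch below compares it with a hand-written version on two made-up rows of logits. The values are arbitrary and chosen only for illustration. The loss expects raw logits, not probabilities, because it applies log-softmax internally.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Two samples, three classes; the numbers are arbitrary.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss()(logits, targets)

# The same quantity by hand: mean of -log softmax(logits)[target].
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss.item(), manual.item())
```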
In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    print("size=" + str(size))
    # Training mode: Dropout randomly drops neurons (to reduce overfitting) and
    # BatchNorm uses batch statistics; neither happens in eval mode at test time.
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        # print(pred)  # uncomment to inspect the raw logits
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters using the gradients
        optimizer.zero_grad()  # reset gradients to zero

        if batch % 100 == 0:   # log every 100 batches
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
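The three calls in the loop above (backward, step, zero_grad) can be checked on a tiny model. This sketch, which assumes a single nn.Linear layer, random data, and a mean-squared-error loss chosen for illustration, verifies that plain SGD moves each weight by the learning rate times its gradient, and shows why the gradients must be reset between steps.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(5, 3), torch.randn(5, 1)

# One optimization step, in the same order as the training loop.
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()
before = layer.weight.detach().clone()
grad = layer.weight.grad.detach().clone()
optimizer.step()
after = layer.weight.detach().clone()
optimizer.zero_grad()

# Plain SGD: new_weight = old_weight - lr * gradient.
step_matches = torch.allclose(after, before - 0.1 * grad)
print(step_matches)

# Without zero_grad(), a second backward() adds to the stored gradient.
nn.functional.mse_loss(layer(x), y).backward()
first = layer.weight.grad.detach().clone()
nn.functional.mse_loss(layer(x), y).backward()
accumulated = layer.weight.grad.detach().clone()
grads_accumulate = torch.allclose(accumulated, 2 * first)
print(grads_accumulate)
```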
We also check the model’s performance against the test dataset to ensure it is learning.
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # evaluation mode: Dropout is disabled
    test_loss, correct = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches  # average loss per batch
    correct /= size           # fraction of correctly classified samples
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
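The accuracy line in the test function is easier to follow on a small worked example. The logits and labels below are made up: three of the four rows have their largest score at the labeled class, so the accuracy is 75%. The second half illustrates the effect of torch.no_grad() on gradient tracking.

```python
import torch

# Four samples, three classes; values chosen by hand for illustration.
logits = torch.tensor([[2.0, 1.0, 0.0],
                       [0.0, 3.0, 1.0],
                       [1.0, 0.0, 5.0],
                       [4.0, 0.0, 0.0]])
labels = torch.tensor([0, 1, 1, 0])

predicted = logits.argmax(1)   # index of the largest logit per row: [0, 1, 2, 0]
correct = (predicted == labels).type(torch.float).sum().item()
accuracy = correct / len(labels)
print(f"Accuracy: {100 * accuracy:>0.1f}%")   # Accuracy: 75.0%

# Inside torch.no_grad(), results do not track gradients.
weights = torch.ones(3, requires_grad=True)
with torch.no_grad():
    untracked = (logits * weights).sum()
tracked = (logits * weights).sum()
print(untracked.requires_grad, tracked.requires_grad)   # False True
```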
The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
size=60000  # 938 batches, logged every 100 batches, so 10 lines per epoch
loss: 2.315621 [ 64/60000]
loss: 2.297486 [ 6464/60000]
loss: 2.284707 [12864/60000]
loss: 2.276697 [19264/60000]
loss: 2.251865 [25664/60000]
loss: 2.229615 [32064/60000]
loss: 2.235096 [38464/60000]
loss: 2.205865 [44864/60000]
loss: 2.203413 [51264/60000]
loss: 2.159754 [57664/60000]
Test Error:
Accuracy: 39.3%, Avg loss: 2.162051
Epoch 2
-------------------------------
size=60000
loss: 2.175932 [ 64/60000]
loss: 2.160811 [ 6464/60000]
loss: 2.113213 [12864/60000]
loss: 2.128377 [19264/60000]
loss: 2.065253 [25664/60000]
loss: 2.014427 [32064/60000]
loss: 2.038499 [38464/60000]
loss: 1.966430 [44864/60000]
loss: 1.969670 [51264/60000]
loss: 1.885340 [57664/60000]
Test Error:
Accuracy: 51.0%, Avg loss: 1.894609
Epoch 3
-------------------------------
size=60000
loss: 1.927377 [ 64/60000]
loss: 1.893924 [ 6464/60000]
loss: 1.791221 [12864/60000]
loss: 1.828687 [19264/60000]
loss: 1.707735 [25664/60000]
loss: 1.663736 [32064/60000]
loss: 1.677831 [38464/60000]
loss: 1.585051 [44864/60000]
loss: 1.603040 [51264/60000]
loss: 1.495138 [57664/60000]
Test Error:
Accuracy: 59.2%, Avg loss: 1.523530
Epoch 4
-------------------------------
size=60000
loss: 1.584166 [ 64/60000]
loss: 1.549330 [ 6464/60000]
loss: 1.415807 [12864/60000]
loss: 1.485298 [19264/60000]
loss: 1.358027 [25664/60000]
loss: 1.356192 [32064/60000]
loss: 1.363756 [38464/60000]
loss: 1.292613 [44864/60000]
loss: 1.323514 [51264/60000]
loss: 1.225876 [57664/60000]
Test Error:
Accuracy: 62.5%, Avg loss: 1.258216
Epoch 5
-------------------------------
size=60000
loss: 1.329733 [ 64/60000]
loss: 1.309326 [ 6464/60000]
loss: 1.161434 [12864/60000]
loss: 1.264519 [19264/60000]
loss: 1.133686 [25664/60000]
loss: 1.158902 [32064/60000]
loss: 1.173635 [38464/60000]
loss: 1.112870 [44864/60000]
loss: 1.150839 [51264/60000]
loss: 1.069254 [57664/60000]
Test Error:
Accuracy: 64.2%, Avg loss: 1.094074
Done!
Read more about Training your model.
Saving Models
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).
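# Print every entry of the model's state dictionary (parameter name and tensor).
for var_name in model.state_dict():
    print(var_name, "\t", model.state_dict()[var_name])

# The optimizer has its own state dictionary (hyperparameters and per-parameter state).
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])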
state {}
param_groups [{'lr': 0.001, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'fused': None, 'params': [0, 1, 2, 3, 4, 5]}]
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
Saved PyTorch Model State to model.pth
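The state dictionary is an ordered mapping from parameter names to tensors. As a small sketch, using a reduced stand-in network (not the tutorial's model) so the listing stays short, we can print just the names and shapes.

```python
import torch
from torch import nn

# Reduced stand-in network; layers without parameters (ReLU) add no entries.
model = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

state = model.state_dict()
for name, tensor in state.items():
    print(f"{name:<10} {tuple(tensor.shape)}")
```

Each name is built from the position of the layer in the module hierarchy, which is why loading requires a model with the same structure.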
Loading Models
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))
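As a quick check of the save and load steps, the sketch below writes a state dictionary to a temporary file, loads it into a freshly initialized copy, and confirms both models give identical outputs. make_model is a hypothetical helper with a small stand-in architecture, and map_location="cpu" is an optional argument added here so the file loads on machines without the original accelerator.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Small stand-in architecture; it must match the one that was saved.
    return nn.Sequential(
        nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10)
    )


original = make_model()
sample = torch.rand(2, 1, 28, 28)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")
    torch.save(original.state_dict(), path)

    restored = make_model()   # starts with different random weights
    state = torch.load(path, map_location="cpu", weights_only=True)
    result = restored.load_state_dict(state)   # reports missing/unexpected keys

original.eval()
restored.eval()
with torch.no_grad():
    same = torch.equal(original(sample), restored(sample))
print(result)
print(same)
```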
The reloaded model can now be used to make predictions.
classes = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')
Predicted: "Ankle boot", Actual: "Ankle boot"
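The prediction code above passes a single sample of shape [1, 28, 28] with no explicit batch dimension. It works because nn.Flatten keeps dimension 0 and flattens the rest, so the single channel is treated as a batch of one, and pred[0] then selects that one row. The sketch below, which uses an untrained stand-in model and a random image (both assumptions of this example), shows that adding the batch dimension explicitly gives the same result.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # untrained stand-in
model.eval()

x = torch.rand(1, 28, 28)                  # one sample, shaped [C, H, W]
with torch.no_grad():
    implicit = model(x)                    # C=1 acts as the batch dimension
    explicit = model(x.unsqueeze(0))       # explicit batch of one: [1, 1, 28, 28]
    probabilities = explicit.softmax(dim=1)

print(implicit.shape, explicit.shape)
print(torch.allclose(implicit, explicit))
print(probabilities.argmax(1).item() == explicit.argmax(1).item())
```

For images with more than one channel the implicit form would no longer be equivalent, so unsqueeze(0) is the safer habit for single-sample inference.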