[Pytorch] Modules and Datasets + DataLoaders

motti 2023. 3. 19. 20:18

torch.nn.Module

The base class for the layers that make up a deep learning model
Defines the input, output, forward pass, and backward pass
Defines the parameters (tensors) that are the targets of training
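Here is a minimal sketch of these points. The class name `TwoLayerNet` and the layer sizes are hypothetical. A module only needs `__init__` and `forward`. Layers and parameters assigned as attributes are registered automatically, and in everyday use autograd derives the backward pass from `forward`.

```python
import torch
from torch import nn

class TwoLayerNet(nn.Module):
    """Hypothetical small model: layers assigned as attributes are registered."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)  # weight (8, 4) + bias (8,)
        self.fc2 = nn.Linear(8, 2)  # weight (2, 8) + bias (2,)

    def forward(self, x):
        # forward defines how the input is turned into the output
        return self.fc2(torch.relu(self.fc1(x)))

model = TwoLayerNet()
for name, p in model.named_parameters():
    print(name, tuple(p.shape))  # fc1.weight (8, 4), fc1.bias (8,), ...

out = model(torch.randn(5, 4))
print(out.shape)  # torch.Size([5, 2])
```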

nn.Parameter

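A subclass of Tensor
A Tensor with requires_grad=True by default. When it is assigned as an attribute of an nn.Module, it is registered as a parameter and becomes a target of training.
We rarely need to create one ourselves, because most built-in layers already define their weights as parameters.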

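import torch
from torch import nn, Tensor

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # wrapping a tensor in nn.Parameter registers it as a trainable parameter
        self.weights = nn.Parameter(
            torch.randn(in_features, out_features))
        if bias:
            self.bias = nn.Parameter(torch.randn(out_features))
        else:
            self.register_parameter("bias", None)

    def forward(self, x: Tensor):
        # (batch, in_features) @ (in_features, out_features) -> (batch, out_features)
        out = x @ self.weights
        if self.bias is not None:
            out = out + self.bias
        return out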
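The sketch below shows what assigning an `nn.Parameter` actually does. It uses a hypothetical variant, `MyLinearWithScale`, that adds an extra plain-tensor attribute `scale`, and compares a registered parameter with an ordinary tensor attribute.

```python
import torch
from torch import nn

class MyLinearWithScale(nn.Module):
    """Hypothetical variant of MyLinear with an extra plain-tensor attribute."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))
        # a plain tensor attribute: NOT registered as a parameter
        self.scale = torch.ones(1)

    def forward(self, x):
        return (x @ self.weights + self.bias) * self.scale

layer = MyLinearWithScale(3, 2)
print([name for name, _ in layer.named_parameters()])  # ['weights', 'bias']
print(layer.weights.requires_grad)  # True
print(layer.scale.requires_grad)    # False
out = layer(torch.randn(4, 3))
print(out.shape)  # torch.Size([4, 2])
```

Only the `nn.Parameter` attributes appear in `named_parameters()`, so only they reach an optimizer through `model.parameters()`.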

Backward

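Computes the derivatives of the parameters in each layer
Differentiates the loss, i.e. the difference between the forward result (the model's output, or prediction) and the actual value
Updates the parameters using those derivatives
Every training iteration must go through the following four steps:
- zero_grad, loss, backward, step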
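This sketch uses a hypothetical single-weight model with made-up numbers. It shows what `backward` computes and why `zero_grad` is one of the four steps: PyTorch accumulates gradients into `.grad` instead of overwriting them.

```python
import torch

# hypothetical one-weight model and a single (x, y) pair
w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(10.0)

loss = (w * x - y) ** 2  # forward: prediction 6.0, loss (6 - 10)^2 = 16
loss.backward()          # backward: dloss/dw = 2 * (w*x - y) * x = 2 * (-4) * 3
g1 = w.grad.item()
print(g1)                # -24.0

# a second backward WITHOUT clearing adds to the stored gradient
loss = (w * x - y) ** 2
loss.backward()
g2 = w.grad.item()
print(g2)                # -48.0

# optimizer.zero_grad() clears .grad for every parameter it manages
# (recent PyTorch versions set it to None instead of filling it with zeros)
w.grad.zero_()
```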

for epoch in range(epochs):
    # … …
    # Clear the gradient buffers: gradients accumulate by default, and we don't
    # want any gradient from the previous epoch to carry forward
    optimizer.zero_grad()
    # get output from the model, given the inputs
    outputs = model(inputs)
    # get loss for the predicted output
    loss = criterion(outputs, labels)
    print(loss)
    # get gradients w.r.t. the parameters
    loss.backward()
    # update parameters
    optimizer.step()
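The loop above leaves `model`, `criterion`, `optimizer`, `inputs` and `labels` undefined. The self-contained sketch below fills them in with assumed choices: an `nn.Linear` model, `nn.MSELoss`, `torch.optim.SGD`, and synthetic data generated from y = 2x + 1. This is not the lecture's setup.

```python
import torch
from torch import nn

torch.manual_seed(0)
# hypothetical synthetic data: y = 2x + 1 plus a little noise
inputs = torch.randn(100, 1)
labels = 2 * inputs + 1 + 0.05 * torch.randn(100, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epochs = 200

for epoch in range(epochs):
    optimizer.zero_grad()              # 1. clear old gradients
    outputs = model(inputs)            # forward pass
    loss = criterion(outputs, labels)  # 2. compute the loss
    loss.backward()                    # 3. compute gradients
    optimizer.step()                   # 4. update parameters

print(model.weight.item(), model.bias.item())  # should end up close to 2 and 1
```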

Backward from scratch

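The backward pass can actually be written by hand at the Module level
Implement backward and the optimizer (the parameter update) yourself by overriding them in the Module
The downside is that the user has to write the derivative formulas by hand.
- You will rarely need to do this, but it is worth understanding the order of the steps.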
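Here is a minimal sketch of this idea. It assumes a logistic regression model trained with binary cross-entropy loss. The class `LogisticRegression` and its `backward`/`update` methods are hypothetical and written by hand instead of relying on autograd and `torch.optim`. The derivative of the mean binary cross-entropy with respect to the logit z = x @ w + b is (yhat - y) / n, which gives the weight and bias gradients below. The sketch also checks those gradients against autograd before training.

```python
import torch
import torch.nn.functional as F
from torch import nn

class LogisticRegression(nn.Module):
    """Hypothetical example: logistic regression with a hand-written backward."""
    def __init__(self, in_features):
        super().__init__()
        # plain tensors, not nn.Parameter: autograd is not involved here
        self.w = torch.zeros(in_features, 1)
        self.b = torch.zeros(1)
        self.grads = {}

    def forward(self, x):
        # yhat = sigmoid(x @ w + b)
        return torch.sigmoid(x @ self.w + self.b)

    def backward(self, x, yhat, y):
        # hand-derived gradient of the mean binary cross-entropy:
        # dL/dz = (yhat - y) / n, where z = x @ w + b
        n = x.shape[0]
        dz = (yhat - y) / n
        self.grads["w"] = x.T @ dz
        self.grads["b"] = dz.sum(dim=0)

    def update(self, lr):
        # hand-written gradient-descent step (the job optimizer.step() normally does)
        self.w -= lr * self.grads["w"]
        self.b -= lr * self.grads["b"]

torch.manual_seed(0)
x = torch.randn(64, 2)
y = (x.sum(dim=1, keepdim=True) > 0).float()  # toy labels: 1 if x1 + x2 > 0

model = LogisticRegression(2)

# sanity check: the hand-written gradient matches autograd's
w = model.w.clone().requires_grad_(True)
b = model.b.clone().requires_grad_(True)
F.binary_cross_entropy(torch.sigmoid(x @ w + b), y).backward()
model.backward(x, model(x), y)
grad_ok = torch.allclose(w.grad, model.grads["w"]) and torch.allclose(b.grad, model.grads["b"])
print(grad_ok)  # True

for epoch in range(100):
    yhat = model(x)             # forward
    model.backward(x, yhat, y)  # backward (hand-written)
    model.update(lr=0.5)        # update (hand-written)

accuracy = ((model(x) > 0.5).float() == y).float().mean().item()
print(accuracy)
```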

Dataset class

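A class that defines the form in which data is fed into the model
Standardizes the way data is loaded
Defines the input differently depending on the data type (image, text, audio, etc.)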

Source: Naver boostcamp AI Tech lecture content

import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    ### Specify how the initial data is created
    def __init__(self, text, labels):
        self.labels = labels
        self.data = text

    ### Total number of samples in the dataset
    def __len__(self):
        return len(self.labels)

    ### Form of the data (X, y) returned for a given index
    def __getitem__(self, idx):
        label = self.labels[idx]
        text = self.data[idx]
        sample = {"Text": text, "Class": label}
        return sample
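This is a short usage sketch of the `CustomDataset` class, restated so the snippet runs on its own. The toy texts and integer labels are made up. With the default collate function, `DataLoader` turns a batch of dictionaries into one dictionary: strings are gathered into a list and integers into a tensor.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, text, labels):
        self.labels = labels
        self.data = text

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {"Text": self.data[idx], "Class": self.labels[idx]}

# hypothetical toy data
text = ["Happy", "Amazing", "Sad", "Unhappy", "Glum"]
labels = [1, 1, 0, 0, 0]
dataset = CustomDataset(text, labels)
print(len(dataset))  # 5
print(dataset[0])    # {'Text': 'Happy', 'Class': 1}

loader = DataLoader(dataset, batch_size=2, shuffle=False)
batch = next(iter(loader))
print(batch)  # {'Text': ['Happy', 'Amazing'], 'Class': tensor([1, 1])}
```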

Things to keep in mind when creating a Dataset class

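Define each method differently depending on the form of the data
Not everything has to be processed at the time the dataset is created
- Converting an image to a Tensor can be done at the point where the sample is actually needed for training
A dataset should come with a standardized way of processing it; this matters for follow-up researchers and colleagues
These days, standardized libraries such as HuggingFace are commonly used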
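The "convert to a Tensor only when needed" advice can be sketched as a dataset that stores raw `uint8` arrays in height-width-channel layout. It converts a sample to a float CHW tensor in [0, 1] only inside `__getitem__`. The class name `LazyImageDataset` and the random images are hypothetical. In practice the images would typically be read from files and the conversion done by a transform.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class LazyImageDataset(Dataset):
    """Hypothetical example: keeps raw uint8 HWC arrays, converts on access."""
    def __init__(self, images, labels):
        # store raw data only; no tensor conversion at construction time
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # conversion to a float CHW tensor in [0, 1] happens only here,
        # i.e. when the sample is actually requested
        img = torch.from_numpy(self.images[idx]).permute(2, 0, 1).float() / 255.0
        return img, self.labels[idx]

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]
ds = LazyImageDataset(images, [0, 1, 0, 1])
img, label = ds[0]
print(img.shape, img.dtype)  # torch.Size([3, 32, 32]) torch.float32
```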

DataLoader class

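A class that generates batches of data
Responsible for transforming the data right before training (before it is fed to the GPU)
Its main jobs are converting data to Tensors and batching it
You need to think about how to parallelize the data preprocessing code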
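This sketch shows the DataLoader's two main jobs using assumed toy data. First, it batches a `TensorDataset`. Second, it uses a hypothetical `collate_fn` (`pad_collate`) that pads variable-length sequences into one tensor with `pad_sequence`. Setting `num_workers` above 0 moves loading into worker processes, which is where the parallel preprocessing mentioned above comes in. It is left at 0 here to keep the example simple.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, TensorDataset

# hypothetical data: 10 samples with 3 features each, plus integer labels
x = torch.arange(30, dtype=torch.float32).reshape(10, 3)
y = torch.arange(10)
loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True,
                    num_workers=0)  # num_workers > 0 loads batches in worker processes

sizes = [xb.shape[0] for xb, yb in loader]
print(sizes)  # [4, 4, 2]  (the last batch is smaller because drop_last=False)

def pad_collate(batch):
    """Hypothetical collate_fn: pads variable-length sequences into one tensor."""
    seqs, labels = zip(*batch)
    padded = pad_sequence(list(seqs), batch_first=True, padding_value=0)
    return padded, torch.tensor(labels)

seq_data = [(torch.tensor([1, 2, 3]), 0), (torch.tensor([4, 5]), 1), (torch.tensor([6]), 0)]
seq_loader = DataLoader(seq_data, batch_size=3, collate_fn=pad_collate)
padded, labels = next(iter(seq_loader))
print(padded.tolist())  # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
```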
