深度学习(37)—— 图神经网络GNN(2)

深度学习(37)—— 图神经网络GNN(2)

这一期主要是一些简单示例,针对不同的情况,使用的数据都是torch_geometric的内置数据集

文章目录

  • 深度学习(37)—— 图神经网络GNN(2)
    • 1. 一个graph对节点分类
    • 2. 多个graph对图分类
    • 3.Cluster-GCN:当遇到数据很大的图

1. 一个graph对节点分类

from torch_geometric.datasets import Planetoid  # 下载数据集用的
from torch_geometric.transforms import NormalizeFeatures
from torch_geometric.nn import GCNConv
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import torch
from torch.nn import Linear
import torch.nn.functional as F# 可视化部分
def visualize(h, color):z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())plt.figure(figsize=(10, 10))plt.xticks([])plt.yticks([])plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")plt.show()# 加载数据
dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())  # transform预处理
print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')data = dataset[0]  # Get the first graph object.
print()
print(data)
print('===========================================================================================================')# Gather some statistics about the graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')# 网络定义
class GCN(torch.nn.Module):def __init__(self, hidden_channels):super().__init__()torch.manual_seed(1234567)self.conv1 = GCNConv(dataset.num_features, hidden_channels)self.conv2 = GCNConv(hidden_channels, dataset.num_classes)def forward(self, x, edge_index):x = self.conv1(x, edge_index)x = x.relu()x = F.dropout(x, p=0.5, training=self.training)x = self.conv2(x, edge_index)return xmodel = GCN(hidden_channels=16)
print(model)# 训练模型
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()def train():model.train()optimizer.zero_grad()out = model(data.x, data.edge_index)loss = criterion(out[data.train_mask], data.y[data.train_mask])loss.backward()optimizer.step()return lossdef test():model.eval()out = model(data.x, data.edge_index)pred = out.argmax(dim=1)test_correct = pred[data.test_mask] == data.y[data.test_mask]test_acc = int(test_correct.sum()) / int(data.test_mask.sum())return test_accfor epoch in range(1, 101):loss = train()print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
model.eval()
out = model(data.x, data.edge_index)
visualize(out, color=data.y)

2. 多个graph对图分类

  • 图也可以进行batch,做法和图像以及文本的batch是一样的
  • 和对一张图中的节点分类不同的是:多了聚合操作 将各个节点特征汇总成全局特征,将其作为整个图的编码
import torch
from torch_geometric.datasets import TUDataset  # 分子数据集:https://chrsmrrs.github.io/datasets/
from torch_geometric.loader import DataLoader
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.nn import global_mean_pool# 加载数据
dataset = TUDataset(root='data/TUDataset', name='MUTAG')
print(f'Dataset: {dataset}:')
print('====================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')data = dataset[0]  # Get the first graph object.
print(data)
print('=============================================================')# Gather some statistics about the first graph.
# print(f'Number of nodes: {data.num_nodes}')
# print(f'Number of edges: {data.num_edges}')
# print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
# print(f'Has isolated nodes: {data.has_isolated_nodes()}')
# print(f'Has self-loops: {data.has_self_loops()}')
# print(f'Is undirected: {data.is_undirected()}')train_dataset = dataset
print(f'Number of training graphs: {len(train_dataset)}')# 数据用dataloader加载
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
for step, data in enumerate(train_loader):print(f'Step {step + 1}:')print('=======')print(f'Number of graphs in the current batch: {data.num_graphs}')print(data)print()# 模型定义
class GCN(torch.nn.Module):def __init__(self, hidden_channels):super(GCN, self).__init__()torch.manual_seed(12345)self.conv1 = GCNConv(dataset.num_node_features, hidden_channels)self.conv2 = GCNConv(hidden_channels, hidden_channels)self.conv3 = GCNConv(hidden_channels, hidden_channels)self.lin = Linear(hidden_channels, dataset.num_classes)def forward(self, x, edge_index, batch):# 1.对各节点进行编码x = self.conv1(x, edge_index)x = x.relu()x = self.conv2(x, edge_index)x = x.relu()x = self.conv3(x, edge_index)# 2. 平均操作x = global_mean_pool(x, batch)  # [batch_size, hidden_channels]# 3. 输出x = F.dropout(x, p=0.5, training=self.training)x = self.lin(x)return xmodel = GCN(hidden_channels=64)
print(model)# 训练
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
def train():model.train()for data in train_loader:  # Iterate in batches over the training dataset.out = model(data.x, data.edge_index, data.batch)  # Perform a single forward pass.loss = criterion(out, data.y)  # Compute the loss.loss.backward()  # Derive gradients.optimizer.step()  # Update parameters based on gradients.optimizer.zero_grad()  # Clear gradients.def test(loader):model.eval()correct = 0for data in loader:  # Iterate in batches over the training/test dataset.out = model(data.x, data.edge_index, data.batch)pred = out.argmax(dim=1)  # Use the class with highest probability.correct += int((pred == data.y).sum())  # Check against ground-truth labels.return correct / len(loader.dataset)  # Derive ratio of correct predictions.for epoch in range(1, 3):train()train_acc = test(train_loader)print(f'Epoch: {epoch:03d}, Train Acc: {train_acc:.4f}')

3.Cluster-GCN:当遇到数据很大的图

  • 传统的GCN,层数越多,计算越大
  • 针对每个cluster进行GCN计算之后更新,数据量会小很多

但是存在问题:如果将一个大图聚类成多个小图,最大的问题是如何丢失这些子图之间的连接关系?——在每个batch中随机将batch里随机n个子图连接起来再计算
在这里插入图片描述

  • 使用torch_geometric的内置方法

    • 首先使用cluster方法分区
    • 之后使用clusterloader构建batch

【即】分区后对每个区域进行batch的分配

# 遇到特别大的图该怎么办?
# 图中点和边的个数都非常大的时候会遇到什么问题呢?
# 当层数较多时,显存不够import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
from torch_geometric.loader import ClusterData, ClusterLoaderdataset = Planetoid(root='data/Planetoid', name='PubMed', transform=NormalizeFeatures())
print(f'Dataset: {dataset}:')
print('==================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')data = dataset[0]  # Get the first graph object.
print(data)
print('===============================================================================================================')# Gather some statistics about the graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.3f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')# 数据分区构建batch,构建好batch,1个epoch中有4个batch
torch.manual_seed(12345)
cluster_data = ClusterData(data, num_parts=128)  # 1. 分区
train_loader = ClusterLoader(cluster_data, batch_size=32, shuffle=True)  # 2. 构建batch.total_num_nodes = 0
for step, sub_data in enumerate(train_loader):print(f'Step {step + 1}:')print('=======')print(f'Number of nodes in the current batch: {sub_data.num_nodes}')print(sub_data)print()total_num_nodes += sub_data.num_nodes
print(f'Iterated over {total_num_nodes} of {data.num_nodes} nodes!')# 模型定义
class GCN(torch.nn.Module):def __init__(self, hidden_channels):super(GCN, self).__init__()torch.manual_seed(12345)self.conv1 = GCNConv(dataset.num_node_features, hidden_channels)self.conv2 = GCNConv(hidden_channels, dataset.num_classes)def forward(self, x, edge_index):x = self.conv1(x, edge_index)x = x.relu()x = F.dropout(x, p=0.5, training=self.training)x = self.conv2(x, edge_index)return xmodel = GCN(hidden_channels=16)
print(model)# 训练模型
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()def train():model.train()for sub_data in train_loader:out = model(sub_data.x, sub_data.edge_index)loss = criterion(out[sub_data.train_mask], sub_data.y[sub_data.train_mask])loss.backward()optimizer.step()optimizer.zero_grad()def test():model.eval()out = model(data.x, data.edge_index)pred = out.argmax(dim=1)accs = []for mask in [data.train_mask, data.val_mask, data.test_mask]:correct = pred[mask] == data.y[mask]accs.append(int(correct.sum()) / int(mask.sum()))return accsfor epoch in range(1, 51):loss = train()train_acc, val_acc, test_acc = test()print(f'Epoch: {epoch:03d}, Train: {train_acc:.4f}, Val Acc: {val_acc:.4f}, Test Acc: {test_acc:.4f}')

这个还是很基础的一些,下一篇会说如何定义自己的数据。还有进阶版的案例。
所有项目代码已经放在github上了,欢迎造访

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/63647.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

volte端到端问题分析(一)

1、MME专载保持功能验证 **描述:**当无线环境较差时,有可能由于“Radio_Connection_with_UE_Lost” 原因造成的VoLTE通话掉话,如果UE发生RRC重建成功,手机将不会掉话。 对MME1202进行功能验证:开启后,MME专…

在时间和频率域中准确地测量太阳黑子活动及使用信号处理工具箱(TM)生成广泛的波形,如正弦波、方波等研究(Matlab代码实现)

💥💥💞💞欢迎来到本博客❤️❤️💥💥 🏆博主优势:🌞🌞🌞博客内容尽量做到思维缜密,逻辑清晰,为了方便读者。 ⛳️座右铭&a…

喜报!诚恒科技与赛时达科技达成BI金蝶云星空项目合作

随着全球数字化浪潮轰轰烈烈袭来,仅仅凭借手工处理的方式难以在庞大的数据海洋中精准获取信息、把握市场需求、了解目标用户,为企业创新提供强有力的支持。深圳赛时达科技有限公司(简称赛时达科技)希望通过数字化转型实现从手工处…

Blazor 调试控制台

文章目录 设置 设置 Blazor项目启动之后,有好几种项目设置,我其实想要这一种控制台 直接Console.log就行了 public void LoginBtn(){Console.WriteLine("登录");//navigationManager.NavigateTo("/index");}

Unity-Linux部署WebGL项目MIME类型添加

在以往的文章中有提到过使用IIS部署WebGL添加MIME类型使WebGL项目在浏览器中能够正常加载,那么如果咱们做的是商业项目,往往是需要部署在学校或者云服务器上面的,大部分情况下如果项目有接口或者后台管理系统,后台基本都会使用Lin…

数据结构(一):顺序表详解

在正式介绍顺序表之前,我们有必要先了解一个名词:线性表。 线性表: 线性表是,具有n个相同特性的数据元素的有限序列。常见的线性表:顺序表、链表、栈、队列、数组、字符串... 线性表在逻辑上是线性结构,但…

2021年09月 C/C++(一级)真题解析#中国电子学会#全国青少年软件编程等级考试

第1题:数字判断 输入一个字符,如何输入的字符是数字,输出yes,否则输出no 输入 一个字符 输出 如何输入的字符是数字,输出yes,否则输出no 样例1输入 样例1输入 5 样例1输出 yes 样例2输入 A 样例2输出 no 下面是一个使用C语言编写的数字判断程序的示例代码,根据输入的字符…

(5)所有角色数据分析页面的构建-5

所有角色数据分析页面,包括一个时间轴柱状图、六个散点图、六个柱状图(每个属性角色的生命值/防御力/攻击力的max与min的对比)。 """绘图""" from pyecharts.charts import Timeline from find_type import FindType import pandas …

Visual Studio 2022安装教程(英文版)

文章目录 1.下载安装 1.下载 官网地址:https://visualstudio.microsoft.com/zh-hans/vs/ 选择第一个社区版本:Community 2022 安装 1.将下载好的文件保存到桌面,双击点开 2.等待visual studio installer配置好 3.点击安装后会来到配件选…

uniapp开发公众号,微信开发者工具进行本地调试

每次修改完内容都需要发行之后,再查看效果,很麻烦 !!! 下述方法,可以一边在uniapp中修改内容,一边在微信开发者工具进行本地调试 修改hosts文件 在最后边添加如下内容 修改前端开发服务端口 …

IL汇编 ldarg 指令学习

IL汇编代码, .assembly extern mscorlib {} .assembly MathLib {.ver 1 : 0 : 1 : 0 }.module MathLib.dll.namespace MyMath { .class public ansi auto MathClass extends [mscorlib]System.Object{ .method public int32 GetSquare(int32) c…

vue实现pdf预览功能

背景&#xff1a;材料上传之后点击预览实现在浏览器上预览的效果 效果如下&#xff1a; 实现代码如下&#xff1a; //预览和下载操作 <el-table-column fixed"right" label"操作" width"210"><template #default"scope">…