第95步 深度学习图像目标检测:Faster R-CNN建模

基于WIN10的64位系统演示

一、写在前面

本期开始,我们学习深度学习图像目标检测系列。

深度学习图像目标检测是计算机视觉领域的一个重要子领域,它的核心目标是利用深度学习模型来识别并定位图像中的特定目标。这些目标可以是物体、人、动物或其他可识别的实体。与传统的图像分类任务不同,目标检测不仅要确定图像中存在哪些类别的目标,还要确定它们的确切位置和尺寸。这通常是通过在图像上绘制一个或多个边界框来实现的,这些边界框精确地标出了目标的位置和范围。

二、Faster R-CNN简介

Faster R-CNN 是一种流行的深度学习图像目标检测算法,由 Shaoqing Ren, Kaiming He, Ross Girshick 和 Jian Sun 在 2015 年提出。它是 R-CNN 系列模型中的一个重要里程碑,因为它提高了检测速度,同时保持了高精度。以下是 Faster R-CNN 的主要特点和组件:

(1)区域提议网络 (RPN):

Faster R-CNN 的核心创新是引入了一个叫做区域提议网络 (RPN) 的组件。RPN 能够在卷积特征图上直接生成目标的边界框提议,这大大减少了提议的计算时间。RPN 使用了一组固定大小和比例的锚框(anchors),对每一个锚框预测偏移量和目标存在的概率。

(2)共享卷积特征:

与其前任 Fast R-CNN 不同,Faster R-CNN 的 RPN 和最终的目标检测都共享相同的卷积特征。这意味着图像只需要进行一次前向传播,从而大大提高了计算效率。

(3)ROI Pooling:

一旦得到了区域提议,Faster R-CNN 使用 ROI (Region of Interest) Pooling 技术来从每个提议中提取固定大小的特征。这确保无论提议的大小如何,都可以输入到一个固定大小的全连接网络中进行分类和边界框回归。

(4)双任务损失:

RPN 被训练为一个双任务问题:分类(目标 vs. 非目标)和边界框回归。这种双任务损失结构确保了 RPN 在生成提议时既考虑了准确性也考虑了定位。

总之,Faster R-CNN 通过引入区域提议网络和共享卷积特征,大大提高了目标检测的速度和精度,为后续的研究和应用打下了坚实的基础。

三、数据源

来源于公共数据,文件设置如下:

大概的任务就是:用一个框框标记出MTB的位置。

四、Faster R-CNN实战

直接上代码:

import os
import random
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image
from torch.utils.data import DataLoader
import xml.etree.ElementTree as ET
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import matplotlib.pyplot as plt
from torchvision import transforms
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np# Function to parse XML annotations
def parse_xml(xml_path):tree = ET.parse(xml_path)root = tree.getroot()boxes = []for obj in root.findall("object"):bndbox = obj.find("bndbox")xmin = int(bndbox.find("xmin").text)ymin = int(bndbox.find("ymin").text)xmax = int(bndbox.find("xmax").text)ymax = int(bndbox.find("ymax").text)# Check if the bounding box is validif xmin < xmax and ymin < ymax:boxes.append((xmin, ymin, xmax, ymax))else:print(f"Warning: Ignored invalid box in {xml_path} - ({xmin}, {ymin}, {xmax}, {ymax})")return boxes# Function to split data into training and validation sets
def split_data(image_dir, split_ratio=0.8):all_images = [f for f in os.listdir(image_dir) if f.endswith(".jpg")]random.shuffle(all_images)split_idx = int(len(all_images) * split_ratio)train_images = all_images[:split_idx]val_images = all_images[split_idx:]return train_images, val_images# Dataset class for the Tuberculosis dataset
class TuberculosisDataset(torch.utils.data.Dataset):def __init__(self, image_dir, annotation_dir, image_list, transform=None):self.image_dir = image_dirself.annotation_dir = annotation_dirself.image_list = image_listself.transform = transformdef __len__(self):return len(self.image_list)def __getitem__(self, idx):image_path = os.path.join(self.image_dir, self.image_list[idx])image = Image.open(image_path).convert("RGB")xml_path = os.path.join(self.annotation_dir, self.image_list[idx].replace(".jpg", ".xml"))boxes = parse_xml(xml_path)# Check for empty bounding boxes and return Noneif len(boxes) == 0:return Noneboxes = torch.as_tensor(boxes, dtype=torch.float32)labels = torch.ones((len(boxes),), dtype=torch.int64)iscrowd = torch.zeros((len(boxes),), dtype=torch.int64)target = {}target["boxes"] = boxestarget["labels"] = labelstarget["image_id"] = torch.tensor([idx])target["iscrowd"] = iscrowd# Apply transformationsif self.transform:image = self.transform(image)return image, target# Define the transformations using torchvision
data_transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),  # Convert PIL image to tensortorchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize the images
])# Adjusting the DataLoader collate function to handle None values
def collate_fn(batch):batch = list(filter(lambda x: x is not None, batch))return tuple(zip(*batch))# Function to get the Mask R-CNN model
def get_model(num_classes):model = fasterrcnn_resnet50_fpn(pretrained=True)in_features = model.roi_heads.box_predictor.cls_score.in_featuresmodel.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)return model# Function to save the model
def save_model(model, path="mmaskrcnn_mtb.pth", save_full_model=False):if save_full_model:torch.save(model, path)else:torch.save(model.state_dict(), path)print(f"Model saved to {path}")# Function to compute Intersection over Union
def compute_iou(boxA, boxB):xA = max(boxA[0], boxB[0])yA = max(boxA[1], boxB[1])xB = min(boxA[2], boxB[2])yB = min(boxA[3], boxB[3])interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)iou = interArea / float(boxAArea + boxBArea - interArea)return iou# Adjusting the DataLoader collate function to handle None values and entirely empty batches
def collate_fn(batch):batch = list(filter(lambda x: x is not None, batch))if len(batch) == 0:# Return placeholder batch if entirely emptyreturn [torch.zeros(1, 3, 224, 224)], [{}]return tuple(zip(*batch))#Training function with modifications for collecting IoU and loss
def train_model(model, train_loader, optimizer, device, num_epochs=10):model.train()model.to(device)loss_values = []iou_values = []for epoch in range(num_epochs):epoch_loss = 0.0total_ious = 0num_boxes = 0for images, targets in train_loader:# Skip batches with placeholder dataif len(targets) == 1 and not targets[0]:continue# Skip batches with empty targetsif any(len(target["boxes"]) == 0 for target in targets):continueimages = [image.to(device) for image in images]targets = [{k: v.to(device) for k, v in t.items()} for t in targets]loss_dict = model(images, targets)losses = sum(loss for loss in loss_dict.values())optimizer.zero_grad()losses.backward()optimizer.step()epoch_loss += losses.item()# Compute IoU for evaluationwith torch.no_grad():model.eval()predictions = model(images)for i, prediction in enumerate(predictions):pred_boxes = prediction["boxes"].cpu().numpy()true_boxes = targets[i]["boxes"].cpu().numpy()for pred_box in pred_boxes:for true_box in true_boxes:iou = compute_iou(pred_box, true_box)total_ious += iounum_boxes += 1model.train()avg_loss = epoch_loss / len(train_loader)avg_iou = total_ious / num_boxesloss_values.append(avg_loss)iou_values.append(avg_iou)print(f"Epoch {epoch+1}/{num_epochs} Loss: {avg_loss} Avg IoU: {avg_iou}")# Plotting loss and IoU valuesplt.figure(figsize=(12, 5))plt.subplot(1, 2, 1)plt.plot(loss_values, label="Training Loss")plt.title("Training Loss across Epochs")plt.xlabel("Epochs")plt.ylabel("Loss")plt.subplot(1, 2, 2)plt.plot(iou_values, label="IoU")plt.title("IoU across Epochs")plt.xlabel("Epochs")plt.ylabel("IoU")plt.show()# Save model after trainingsave_model(model)# Validation function
def validate_model(model, val_loader, device):model.eval()model.to(device)with torch.no_grad():for images, targets in val_loader:images = [image.to(device) for image in images]targets = [{k: v.to(device) for k, v in t.items()} for t in targets]model(images)# Paths to your data
image_dir = "tuberculosis-phonecamera"
annotation_dir = "tuberculosis-phonecamera"# Split data
train_images, val_images = split_data(image_dir)# Create datasets and dataloaders
train_dataset = TuberculosisDataset(image_dir, annotation_dir, train_images, transform=data_transform)
val_dataset = TuberculosisDataset(image_dir, annotation_dir, val_images, transform=data_transform)# Updated DataLoader with new collate function
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False, collate_fn=collate_fn)# Model and optimizer
model = get_model(2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)# Train and validate
train_model(model, train_loader, optimizer, device="cuda", num_epochs=100)
validate_model(model, val_loader, device="cuda")#######################################Print Metrics######################################
def calculate_metrics(predictions, ground_truths, iou_threshold=0.5):TP = 0  # True PositivesFP = 0  # False PositivesFN = 0  # False Negativestotal_iou = 0  # to calculate mean IoUfor pred, gt in zip(predictions, ground_truths):pred_boxes = pred["boxes"].cpu().numpy()gt_boxes = gt["boxes"].cpu().numpy()# Match predicted boxes to ground truth boxesfor pred_box in pred_boxes:max_iou = 0matched = Falsefor gt_box in gt_boxes:iou = compute_iou(pred_box, gt_box)if iou > max_iou:max_iou = iouif iou > iou_threshold:matched = Truetotal_iou += max_iouif matched:TP += 1else:FP += 1FN += len(gt_boxes) - TPprecision = TP / (TP + FP) if (TP + FP) != 0 else 0recall = TP / (TP + FN) if (TP + FN) != 0 else 0f1_score = (2 * precision * recall) / (precision + recall) if (precision + recall) != 0 else 0mean_iou = total_iou / (TP + FP)return precision, recall, f1_score, mean_ioudef evaluate_model(model, dataloader, device):model.eval()model.to(device)all_predictions = []all_ground_truths = []with torch.no_grad():for images, targets in dataloader:images = [image.to(device) for image in images]predictions = model(images)all_predictions.extend(predictions)all_ground_truths.extend(targets)precision, recall, f1_score, mean_iou = calculate_metrics(all_predictions, all_ground_truths)return precision, recall, f1_score, mean_ioutrain_precision, train_recall, train_f1, train_iou = evaluate_model(model, train_loader, "cuda")
val_precision, val_recall, val_f1, val_iou = evaluate_model(model, val_loader, "cuda")print("Training Set Metrics:")
print(f"Precision: {train_precision:.4f}, Recall: {train_recall:.4f}, F1 Score: {train_f1:.4f}, Mean IoU: {train_iou:.4f}")print("\nValidation Set Metrics:")
print(f"Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1 Score: {val_f1:.4f}, Mean IoU: {val_iou:.4f}")#sheet
header = "| Metric    | Training Set | Validation Set |"
divider = "+----------+--------------+----------------+"train_metrics = f"| Precision | {train_precision:.4f}      | {val_precision:.4f}          |"
recall_metrics = f"| Recall    | {train_recall:.4f}      | {val_recall:.4f}          |"
f1_metrics = f"| F1 Score  | {train_f1:.4f}      | {val_f1:.4f}          |"
iou_metrics = f"| Mean IoU  | {train_iou:.4f}      | {val_iou:.4f}          |"print(header)
print(divider)
print(train_metrics)
print(recall_metrics)
print(f1_metrics)
print(iou_metrics)
print(divider)#######################################Train Set######################################
import numpy as np
import matplotlib.pyplot as pltdef plot_predictions_on_image(model, dataset, device, title):# Select a random image from the datasetidx = np.random.randint(50, len(dataset))image, target = dataset[idx]img_tensor = image.clone().detach().to(device).unsqueeze(0)# Use the model to make predictionsmodel.eval()with torch.no_grad():prediction = model(img_tensor)# Inverse normalization for visualizationinv_normalize = transforms.Normalize(mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],std=[1/0.229, 1/0.224, 1/0.225])image = inv_normalize(image)image = torch.clamp(image, 0, 1)image = F.to_pil_image(image)# Plot the image with ground truth boxesplt.figure(figsize=(10, 6))plt.title(title + " with Ground Truth Boxes")plt.imshow(image)ax = plt.gca()# Draw the ground truth boxes in bluefor box in target["boxes"]:rect = plt.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1],fill=False, color='blue', linewidth=2)ax.add_patch(rect)plt.show()# Plot the image with predicted boxesplt.figure(figsize=(10, 6))plt.title(title + " with Predicted Boxes")plt.imshow(image)ax = plt.gca()# Draw the predicted boxes in redfor box in prediction[0]["boxes"].cpu():rect = plt.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1],fill=False, color='red', linewidth=2)ax.add_patch(rect)plt.show()# Call the function for a random image from the train dataset
plot_predictions_on_image(model, train_dataset, "cuda", "Selected from Training Set")#######################################Val Set####################################### Call the function for a random image from the validation dataset
plot_predictions_on_image(model, val_dataset, "cuda", "Selected from Validation Set")

不解读了,给出GPT的咒语参考:

咒语:我有一批数据,存在“tuberculosis-phonecamera”文件夹中,包括两部分:

一部分是MTB的痰涂片抗酸染色图片,为jpg格式,命名为“tuberculosis-phone-0001.jpg”、“tuberculosis-phone-0002.jpg”等;

一部分是MTB的痰涂片抗酸染色图片对应的注释文件,主要内容是标注MTB的痰涂片抗酸染色图片中MTB的具体位置,是若干个红色框,为XML格式,命名为“tuberculosis-phone-0001.xml”、“tuberculosis-phone-0002.xml”等,我上传一个xml文件给你做例子;

我需要基于上面的数据,使用pytorch建立一个Mask R-CNN目标识别模型,去识别MTB的痰涂片抗酸染色图片中的MTB,并使用红色框标注出来。数据需要随机分为训练集(80%)和验证集(20%)。

看看结果:

(1)loss曲线图:

(2)性能指标:

(3)训练的图片测试结果:

(4)验证集的图片测试结果:

五、写在后面

直接使用预训练模型,而且模型并没有调参。但是训练集的准确率还是挺高的,验证集就差点意思了。需要更高的性能,还得认真研究如何调参。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/209309.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

【黑马甄选离线数仓day01_项目介绍与环境准备】

1. 行业背景 1.1 电商发展历史 电商1.0: 初创阶段20世纪90年代&#xff0c;电商行业刚刚兴起&#xff0c;主要以B2C模式为主&#xff0c;如亚马逊、eBay等 ​ 电商2.0: 发展阶段21世纪初&#xff0c;电商行业进入了快速发展阶段&#xff0c;出现了淘宝、京东等大型电商平台&a…

LangChain 7 文本模型TextLangChain和聊天模型ChatLangChain

LangChain系列文章 LangChain 实现给动物取名字&#xff0c;LangChain 2模块化prompt template并用streamlit生成网站 实现给动物取名字LangChain 3使用Agent访问Wikipedia和llm-math计算狗的平均年龄LangChain 4用向量数据库Faiss存储&#xff0c;读取YouTube的视频文本搜索I…

linux下流媒体压力测试工具的使用

前言 因为领导要求做linux的推拉流时服务器压力测试&#xff0c;于是在网上找了找。一顿操作下来&#xff0c;发现很多软件盗用一款名为srs-bench的开源软件。 该代码仓库有详细的使用说明&#xff0c;而且可以在issues中找到可能会遇到的问题的解决办法 需要下载该仓库的源…

SpringBoot集成七牛云OSS详细介绍

&#x1f4d1;前言 本文主要SpringBoot集成七牛云OSS详细介绍的文章&#xff0c;如果有什么需要改进的地方还请大佬指出⛺️ &#x1f3ac;作者简介&#xff1a;大家好&#xff0c;我是青衿&#x1f947; ☁️博客首页&#xff1a;CSDN主页放风讲故事 &#x1f304;每日一句&a…

allegro画封装时使用坐标指令无效

使用坐标指令时显示&#xff1a;“Pick is outside the extent of the drawing…pick again” 这是因为你放的引脚已经超出你这个绘制界面的定义尺寸&#xff0c;需要到Setup->Design pararmeters…里面去将图幅改大一点&#xff0c;如下图所示&#xff1a; 然后点击Design…

gRPC 服务注册实现以及原理讲解

一、注册中心 为了高可用&#xff0c;生产环境我们的服务都是以集群的方式对外提供服务&#xff0c;集群的ip随时可能变化&#xff0c;比如重启&#xff0c;发布&#xff0c;扩容等。我们也就需要用一本 “通讯录”去保存和更新服务的节点&#xff0c;而这个通讯录就是注册中心…

使用Python的turtle模块创建一幅哆啦A梦

1.1引言&#xff1a; 在Python中&#xff0c;turtle模块是一个非常有趣且强大的工具&#xff0c;它允许我们以一个可视化和互动的方式学习编程。通过调用各种命令&#xff0c;我们可以引导turtle画出一个指定的图形。在本博客中&#xff0c;我们将使用turtle模块来绘制一幅哆啦…

气候变化和人类活动对中国植被固碳的贡献量化数据月度合成产品

简介&#xff1a; 气候变化和人类活动对中国植被固碳的贡献量化数据月度合成产品包括中国2001~2018年地表短波波段反照率、植被光合有效辐射吸收比、叶面积指数、森林覆盖度和非森林植被覆盖度、地表温度、地表净辐射、地表蒸散发、地上部分自养呼吸、地下部分自养呼吸、总初级…

ts实现合并数组对象中key相同的数据

背景 在平常的业务中&#xff0c;后端同学会返回以下类似的结构数据 // 后端返回的数据结构 [{ id: 1, product_id: 1, pid_name: "Asia", name: "HKG01" },{ id: 2, product_id: 1, pid_name: "Asia", name: "SH01" },{ id: 3, pro…

2023亚太杯数学建模C题思路代码 - 我国新能源电动汽车的发展趋势

1 赛题 问题C 我国新能源电动汽车的发展趋势 新能源汽车是指以先进技术原理、新技术、新结构的非常规汽车燃料为动力来源( 非常规汽车燃料指汽油、柴油以外的燃料&#xff09;&#xff0c;将先进技术进行汽车动力控制和驱动相结 合的汽车。新能源汽车主要包括四种类型&#x…

华大基因认知障碍基因检测服务,助力认知障碍疾病防控

认知障碍是一种严重的神经系统疾病&#xff0c;对人类的脑健康产生了重大影响。据报告显示&#xff0c;在我国65岁以上的人群中&#xff0c;存在轻度认知障碍的患者约为3,800万&#xff0c;而中重度痴呆患者则约为1,500万&#xff0c;患病人口数量庞大。这种疾病不仅会对患者的…

电线电缆行业生产管理怎么数字化?

行业介绍 随着市场环境的变化和现代生产管理理念的不断更新&#xff0c;电缆的生产模式也在发生转变&#xff0c;批量小&#xff0c;规格多&#xff0c;交期短的新型制造需求逐年上升&#xff0c;所以企业车间管理的重要性越发凸显&#xff0c;作为企业良性运营的关键&#xf…