【6D位姿估计】FoundationPose 跑通demo 训练记录

前言

本文记录在FoundationPose中,跑通基于CAD模型为输入的demo,输出位姿信息,可视化结果。

然后分享NeRF物体重建部分的训练,以及RGBD图为输入的demo。

1、搭建环境

方案1:基于docker镜像(推荐)

首先下载开源代码:https://github.com/NVlabs/FoundationPose

然后执行下面命令,拉取镜像,并构建镜像环境

cd docker/
docker pull wenbowen123/foundationpose && docker tag wenbowen123/foundationpose foundationpose
bash docker/run_container.sh
bash build_all.sh

构建完成后,可以用docker exec 进入镜像容器中。

方案2:基于Conda(比较麻烦)

首先安装 Eigen3

cd $HOME && wget -q https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz && \
tar -xzf eigen-3.4.0.tar.gz && \
cd eigen-3.4.0 && mkdir build && cd build
cmake .. -Wno-dev -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-std=c++14 ..
sudo make install
cd $HOME && rm -rf eigen-3.4.0 eigen-3.4.0.tar.gz

然后参考下面命令,创建conda环境

# create conda environment
create -n foundationpose python=3.9# activate conda environment
conda activate foundationpose# install dependencies
python -m pip install -r requirements.txt# Install NVDiffRast
python -m pip install --quiet --no-cache-dir git+https://github.com/NVlabs/nvdiffrast.git# Kaolin (Optional, needed if running model-free setup)
python -m pip install --quiet --no-cache-dir kaolin==0.15.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.0.0_cu118.html# PyTorch3D
python -m pip install --quiet --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt200/download.html# Build extensions
CMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.9/site-packages/pybind11/share/cmake/pybind11 bash build_all_conda.sh

2、基于CAD模型为输入的demo

首先,下载模型权重,里面包括两个文件夹;点击下载(模型权重)

在工程目录中创建weights/,将下面两个文件夹放到里面。

然后,下载测试数据,里面包括两个压缩文件;点击下载(demo数据)

在工程目录中创建demo_data/,加压文件,将下面两个文件放到里面。

 运行 run_demo.py,实现CAD模型为输入的demo

python run_demo.py --debug 2

如果是服务器运行,没有可视化的,需要注释两行代码:

    if debug>=1:
      center_pose = pose@np.linalg.inv(to_origin)
      vis = draw_posed_3d_box(reader.K, img=color, ob_in_cam=center_pose, bbox=bbox)
      vis = draw_xyz_axis(color, ob_in_cam=center_pose, scale=0.1, K=reader.K, thickness=3, transparency=0, is_input_rgb=True)

      # cv2.imshow('1', vis[...,::-1])
      # cv2.waitKey(1)

然后看到demo_data/mustard0/,里面生成了ob_in_cam、track_vis文件夹

ob_in_cam 是位姿估计的结果,用txt文件存储,示例文件:

6.073544621467590332e-01 -2.560715079307556152e-01 7.520291209220886230e-01 -4.481770694255828857e-01
-7.755840420722961426e-01 -3.960975110530853271e-01 4.915038347244262695e-01 1.187708452343940735e-01
1.720167100429534912e-01 -8.817789554595947266e-01 -4.391765296459197998e-01 8.016449213027954102e-01
0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00

track_vis 是可视化结果,能看到多张图片:

3、NeRF物体重建训练

下载训练数据,Linemod和YCB-V两个公开数据集的示例:

点击下载(RGBD参考数据)

示例1:训练Linemod数据集

修改代码bundlesdf/run_nerf.py,修改为use_refined_mask=False,即98行:

mesh = run_one_ob(base_dir=base_dir, cfg=cfg, use_refined_mask=False)

 然后执行命令:

python bundlesdf/run_nerf.py --ref_view_dir /DATASET/lm_ref_views --dataset linemod

 如果是服务器运行,没有可视化的,需要安装xvfb

sudo apt-get update
sudo apt-get install -y xvfb

 然后执行命令:

xvfb-run -s "-screen 0 1024x768x24" python bundlesdf/run_nerf.py --ref_view_dir model_free_ref_views/lm_ref_views --dataset linemod

因为训练NeRF需要渲染的,使用xvfb进行模拟。

能看到打印信息:

bundlesdf/run_nerf.py:61: DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning disappear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.rgb = imageio.imread(color_file)
[compute_scene_bounds()] compute_scene_bounds_worker start
[compute_scene_bounds()] compute_scene_bounds_worker done
[compute_scene_bounds()] merge pcd
[compute_scene_bounds()] compute_translation_scales done
translation_cvcam=[0.00024226 0.00356217 0.00056694], sc_factor=19.274929219577043
[build_octree()] Octree voxel dilate_radius:1
[__init__()] level:0, vox_pts:torch.Size([1, 3]), corner_pts:torch.Size([8, 3])
[__init__()] level:1, vox_pts:torch.Size([8, 3]), corner_pts:torch.Size([27, 3])
[__init__()] level:2, vox_pts:torch.Size([64, 3]), corner_pts:torch.Size([125, 3])
[draw()] level:2
[draw()] level:2
level 0, resolution: 32
level 1, resolution: 37
level 2, resolution: 43
level 3, resolution: 49
level 4, resolution: 56
level 5, resolution: 64
level 6, resolution: 74
level 7, resolution: 85
level 8, resolution: 98
level 9, resolution: 112
level 10, resolution: 128
level 11, resolution: 148
level 12, resolution: 169
level 13, resolution: 195
level 14, resolution: 223
level 15, resolution: 256
GridEncoder: input_dim=3 n_levels=16 level_dim=2 resolution=32 -> 256 per_level_scale=1.1487 params=(26463840, 2) gridtype=hash align_corners=False
sc_factor 19.274929219577043
translation [0.00024226 0.00356217 0.00056694]
[__init__()] denoise cloud
[__init__()] Denoising rays based on octree cloud
[__init__()] bad_mask#=3
rays torch.Size([128387, 12])
[train()] train progress 0/1001
[train_loop()] Iter: 0, valid_samples: 524161/524288, valid_rays: 2048/2048, loss: 309.0942383, rgb_loss: 0.0216732, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 301.6735840, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 7.2143111, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.1152707,[train()] train progress 100/1001
[train()] train progress 200/1001
[train()] train progress 300/1001
[train()] train progress 400/1001
[train()] train progress 500/1001
Saved checkpoints at model_free_ref_views/lm_ref_views/ob_0000001/nerf/model_latest.pth
[train_loop()] Iter: 500, valid_samples: 518554/524288, valid_rays: 2026/2048, loss: 1.0530750, rgb_loss: 0.0009063, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.2142579, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.8360301, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0008409,[extract_mesh()] query_pts:torch.Size([42875, 3]), valid:42875
[extract_mesh()] Running Marching Cubes
[extract_mesh()] done V:(4536, 3), F:(8986, 3)
[train()] train progress 600/1001
[train()] train progress 700/1001
[train()] train progress 800/1001
[train()] train progress 900/1001
[train()] train progress 1000/1001
Saved checkpoints at model_free_ref_views/lm_ref_views/ob_0000001/nerf/model_latest.pth
[train_loop()] Iter: 1000, valid_samples: 519351/524288, valid_rays: 2029/2048, loss: 0.4827633, rgb_loss: 0.0006563, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.0935674, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.3876466, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0001022,[extract_mesh()] query_pts:torch.Size([42875, 3]), valid:42875
[extract_mesh()] Running Marching Cubes
[extract_mesh()] done V:(5265, 3), F:(10328, 3)
[extract_mesh()] query_pts:torch.Size([42875, 3]), valid:42875
[extract_mesh()] Running Marching Cubes
[extract_mesh()] done V:(5265, 3), F:(10328, 3)
[<module>()] OpenGL_accelerate module loaded
[<module>()] Using accelerated ArrayDatatype
[mesh_texture_from_train_images()] Texture: Texture map computation
project train_images 0/16
project train_images 1/16
project train_images 2/16
project train_images 3/16
project train_images 4/16
project train_images 5/16
project train_images 6/16
project train_images 7/16
project train_images 8/16
project train_images 9/16
project train_images 10/16
project train_images 11/16
project train_images 12/16
project train_images 13/16
project train_images 14/16
project train_images 15/16

重点留意,损失的变化:

[train()] train progress 0/1001
[train_loop()] Iter: 0, valid_samples: 524161/524288, valid_rays: 2048/2048, loss: 309.0942383, rgb_loss: 0.0216732, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 301.6735840, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 7.2143111, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.1152707,

[train()] train progress 100/1001
[train()] train progress 200/1001
[train()] train progress 300/1001
[train()] train progress 400/1001
[train()] train progress 500/1001
Saved checkpoints at model_free_ref_views/lm_ref_views/ob_0000001/nerf/model_latest.pth
[train_loop()] Iter: 500, valid_samples: 518554/524288, valid_rays: 2026/2048, loss: 1.0530750, rgb_loss: 0.0009063, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.2142579, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.8360301, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0008409,

默认训练1000轮,训练也挺快的。

在lm_ref_views/ob_0000001/中生成了nerf文件夹,存放下面文件:

在lm_ref_views/ob_0000001/model中,生成了的model.obj,后续模型推理或demo直接使用它。

示例2:训练YCB-V数据集

python bundlesdf/run_nerf.py --ref_view_dir /DATASET/ycbv/ref_views_16 --dataset ycbv

如果是服务器运行,没有可视化的,需要安装xvfb

sudo apt-get update
sudo apt-get install -y xvfb

 然后执行命令:

xvfb-run -s "-screen 0 1024x768x24" python bundlesdf/run_nerf.py --ref_view_dir /DATASET/ycbv/ref_views_16 --dataset ycbv

因为训练NeRF需要渲染的,使用xvfb进行模拟。

4、RGBD图输入demo

这里以Linemod数据集为示例,首先下面测试数据集

点击下载(测试数据)

然后加压文件,存放路径:FoundationPose-main/model_free_ref_views/lm_test_all

官方代码有问题,需要替换两个代码:datareader.py、run_linemod.py

run_linemod.py

# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.from Utils import *
import json,uuid,joblib,os,sys
import scipy.spatial as spatial
from multiprocessing import Pool
import multiprocessing
from functools import partial
from itertools import repeat
import itertools
from datareader import *
from estimater import *
code_dir = os.path.dirname(os.path.realpath(__file__))
sys.path.append(f'{code_dir}/mycpp/build')
import yaml
import redef get_mask(reader, i_frame, ob_id, detect_type):if detect_type=='box':mask = reader.get_mask(i_frame, ob_id)H,W = mask.shape[:2]vs,us = np.where(mask>0)umin = us.min()umax = us.max()vmin = vs.min()vmax = vs.max()valid = np.zeros((H,W), dtype=bool)valid[vmin:vmax,umin:umax] = 1elif detect_type=='mask':mask = reader.get_mask(i_frame, ob_id)if mask is None:return Nonevalid = mask>0elif detect_type=='detected':mask = cv2.imread(reader.color_files[i_frame].replace('rgb','mask_cosypose'), -1)valid = mask==ob_idelse:raise RuntimeErrorreturn validdef run_pose_estimation_worker(reader, i_frames, est:FoundationPose=None, debug=0, ob_id=None, device='cuda:0'):torch.cuda.set_device(device)est.to_device(device)est.glctx = dr.RasterizeCudaContext(device=device)result = NestDict()for i, i_frame in enumerate(i_frames):logging.info(f"{i}/{len(i_frames)}, i_frame:{i_frame}, ob_id:{ob_id}")print("\n### ", f"{i}/{len(i_frames)}, i_frame:{i_frame}, ob_id:{ob_id}")video_id = reader.get_video_id()color = reader.get_color(i_frame)depth = reader.get_depth(i_frame)id_str = reader.id_strs[i_frame]H,W = color.shape[:2]debug_dir =est.debug_dirob_mask = get_mask(reader, i_frame, ob_id, detect_type=detect_type)if ob_mask is None:logging.info("ob_mask not found, skip")result[video_id][id_str][ob_id] = np.eye(4)return resultest.gt_pose = reader.get_gt_pose(i_frame, ob_id)pose = est.register(K=reader.K, rgb=color, depth=depth, ob_mask=ob_mask, ob_id=ob_id)logging.info(f"pose:\n{pose}")if debug>=3:m = est.mesh_ori.copy()tmp = m.copy()tmp.apply_transform(pose)tmp.export(f'{debug_dir}/model_tf.obj')result[video_id][id_str][ob_id] = posereturn result, posedef run_pose_estimation():wp.force_load(device='cuda')reader_tmp = LinemodReader(opt.linemod_dir, split=None)print("## opt.linemod_dir:", opt.linemod_dir)debug = opt.debuguse_reconstructed_mesh = opt.use_reconstructed_meshdebug_dir = opt.debug_dirres = NestDict()glctx = dr.RasterizeCudaContext()mesh_tmp = trimesh.primitives.Box(extents=np.ones((3)), transform=np.eye(4)).to_mesh()est = FoundationPose(model_pts=mesh_tmp.vertices.copy(), model_normals=mesh_tmp.vertex_normals.copy(), symmetry_tfs=None, mesh=mesh_tmp, scorer=None, refiner=None, glctx=glctx, debug_dir=debug_dir, debug=debug)# ob_idmatch = re.search(r'\d+$', opt.linemod_dir)if match:last_number = match.group()ob_id = int(last_number)else:print("No digits found at the end of the string")# for ob_id in reader_tmp.ob_ids:if ob_id:if use_reconstructed_mesh:print("## ob_id:", ob_id)print("## opt.linemod_dir:", opt.linemod_dir)print("## opt.ref_view_dir:", opt.ref_view_dir)mesh = reader_tmp.get_reconstructed_mesh(ref_view_dir=opt.ref_view_dir)else:mesh = reader_tmp.get_gt_mesh(ob_id)# symmetry_tfs = reader_tmp.symmetry_tfs[ob_id]  # !!!!!!!!!!!!!!!!args = []reader = LinemodReader(opt.linemod_dir, split=None)video_id = reader.get_video_id()# est.reset_object(model_pts=mesh.vertices.copy(), model_normals=mesh.vertex_normals.copy(), symmetry_tfs=symmetry_tfs, mesh=mesh)  # rawest.reset_object(model_pts=mesh.vertices.copy(), model_normals=mesh.vertex_normals.copy(), mesh=mesh) # !!!!!!!!!!!!!!!!print("### len(reader.color_files):", len(reader.color_files))for i in range(len(reader.color_files)):args.append((reader, [i], est, debug, ob_id, "cuda:0"))# vis Datato_origin, extents = trimesh.bounds.oriented_bounds(mesh)bbox = np.stack([-extents/2, extents/2], axis=0).reshape(2,3)os.makedirs(f'{opt.linemod_dir}/track_vis', exist_ok=True)outs = []i = 0for arg in args[:200]:print("### num:", i)out, pose = run_pose_estimation_worker(*arg)outs.append(out)center_pose = pose@np.linalg.inv(to_origin)img_color = reader.get_color(i)vis = draw_posed_3d_box(reader.K, img=img_color, ob_in_cam=center_pose, bbox=bbox)vis = draw_xyz_axis(img_color, ob_in_cam=center_pose, scale=0.1, K=reader.K, thickness=3, transparency=0, is_input_rgb=True)imageio.imwrite(f'{opt.linemod_dir}/track_vis/{reader.id_strs[i]}.png', vis)i = i + 1for out in outs:for video_id in out:for id_str in out[video_id]:for ob_id in out[video_id][id_str]:res[video_id][id_str][ob_id] = out[video_id][id_str][ob_id]with open(f'{opt.debug_dir}/linemod_res.yml','w') as ff:yaml.safe_dump(make_yaml_dumpable(res), ff)print("Save linemod_res.yml OK !!!")if __name__=='__main__':parser = argparse.ArgumentParser()code_dir = os.path.dirname(os.path.realpath(__file__))parser.add_argument('--linemod_dir', type=str, default="/guopu/FoundationPose-main/model_free_ref_views/lm_test_all/000015", help="linemod root dir") # lm_test_all  lm_testparser.add_argument('--use_reconstructed_mesh', type=int, default=1)parser.add_argument('--ref_view_dir', type=str, default="/guopu/FoundationPose-main/model_free_ref_views/lm_ref_views/ob_0000015")parser.add_argument('--debug', type=int, default=0)parser.add_argument('--debug_dir', type=str, default=f'/guopu/FoundationPose-main/model_free_ref_views/lm_test_all/debug') # lm_test_all  lm_testopt = parser.parse_args()set_seed(0)detect_type = 'mask'   # mask / box / detectedrun_pose_estimation()

datareader.py

# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.from Utils import *
import json,os,sysBOP_LIST = ['lmo','tless','ycbv','hb','tudl','icbin','itodd']
BOP_DIR = os.getenv('BOP_DIR')def get_bop_reader(video_dir, zfar=np.inf):if 'ycbv' in video_dir or 'YCB' in video_dir:return YcbVideoReader(video_dir, zfar=zfar)if 'lmo' in video_dir or 'LINEMOD-O' in video_dir:return LinemodOcclusionReader(video_dir, zfar=zfar)if 'tless' in video_dir or 'TLESS' in video_dir:return TlessReader(video_dir, zfar=zfar)if 'hb' in video_dir:return HomebrewedReader(video_dir, zfar=zfar)if 'tudl' in video_dir:return TudlReader(video_dir, zfar=zfar)if 'icbin' in video_dir:return IcbinReader(video_dir, zfar=zfar)if 'itodd' in video_dir:return ItoddReader(video_dir, zfar=zfar)else:raise RuntimeErrordef get_bop_video_dirs(dataset):if dataset=='ycbv':video_dirs = sorted(glob.glob(f'{BOP_DIR}/ycbv/test/*'))elif dataset=='lmo':video_dirs = sorted(glob.glob(f'{BOP_DIR}/lmo/lmo_test_bop19/test/*'))elif dataset=='tless':video_dirs = sorted(glob.glob(f'{BOP_DIR}/tless/tless_test_primesense_bop19/test_primesense/*'))elif dataset=='hb':video_dirs = sorted(glob.glob(f'{BOP_DIR}/hb/hb_test_primesense_bop19/test_primesense/*'))elif dataset=='tudl':video_dirs = sorted(glob.glob(f'{BOP_DIR}/tudl/tudl_test_bop19/test/*'))elif dataset=='icbin':video_dirs = sorted(glob.glob(f'{BOP_DIR}/icbin/icbin_test_bop19/test/*'))elif dataset=='itodd':video_dirs = sorted(glob.glob(f'{BOP_DIR}/itodd/itodd_test_bop19/test/*'))else:raise RuntimeErrorreturn video_dirsclass YcbineoatReader:def __init__(self,video_dir, downscale=1, shorter_side=None, zfar=np.inf):self.video_dir = video_dirself.downscale = downscaleself.zfar = zfarself.color_files = sorted(glob.glob(f"{self.video_dir}/rgb/*.png"))self.K = np.loadtxt(f'{video_dir}/cam_K.txt').reshape(3,3)self.id_strs = []for color_file in self.color_files:id_str = os.path.basename(color_file).replace('.png','')self.id_strs.append(id_str)self.H,self.W = cv2.imread(self.color_files[0]).shape[:2]if shorter_side is not None:self.downscale = shorter_side/min(self.H, self.W)self.H = int(self.H*self.downscale)self.W = int(self.W*self.downscale)self.K[:2] *= self.downscaleself.gt_pose_files = sorted(glob.glob(f'{self.video_dir}/annotated_poses/*'))self.videoname_to_object = {'bleach0': "021_bleach_cleanser",'bleach_hard_00_03_chaitanya': "021_bleach_cleanser",'cracker_box_reorient': '003_cracker_box','cracker_box_yalehand0': '003_cracker_box','mustard0': '006_mustard_bottle','mustard_easy_00_02': '006_mustard_bottle','sugar_box1': '004_sugar_box','sugar_box_yalehand0': '004_sugar_box','tomato_soup_can_yalehand0': '005_tomato_soup_can',}def get_video_name(self):return self.video_dir.split('/')[-1]def __len__(self):return len(self.color_files)def get_gt_pose(self,i):try:pose = np.loadtxt(self.gt_pose_files[i]).reshape(4,4)return poseexcept:logging.info("GT pose not found, return None")return Nonedef get_color(self,i):color = imageio.imread(self.color_files[i])[...,:3]color = cv2.resize(color, (self.W,self.H), interpolation=cv2.INTER_NEAREST)return colordef get_mask(self,i):mask = cv2.imread(self.color_files[i].replace('rgb','masks'),-1)if len(mask.shape)==3:for c in range(3):if mask[...,c].sum()>0:mask = mask[...,c]breakmask = cv2.resize(mask, (self.W,self.H), interpolation=cv2.INTER_NEAREST).astype(bool).astype(np.uint8)return maskdef get_depth(self,i):depth = cv2.imread(self.color_files[i].replace('rgb','depth'),-1)/1e3depth = cv2.resize(depth, (self.W,self.H), interpolation=cv2.INTER_NEAREST)depth[(depth<0.1) | (depth>=self.zfar)] = 0return depthdef get_xyz_map(self,i):depth = self.get_depth(i)xyz_map = depth2xyzmap(depth, self.K)return xyz_mapdef get_occ_mask(self,i):hand_mask_file = self.color_files[i].replace('rgb','masks_hand')occ_mask = np.zeros((self.H,self.W), dtype=bool)if os.path.exists(hand_mask_file):occ_mask = occ_mask | (cv2.imread(hand_mask_file,-1)>0)right_hand_mask_file = self.color_files[i].replace('rgb','masks_hand_right')if os.path.exists(right_hand_mask_file):occ_mask = occ_mask | (cv2.imread(right_hand_mask_file,-1)>0)occ_mask = cv2.resize(occ_mask, (self.W,self.H), interpolation=cv2.INTER_NEAREST)return occ_mask.astype(np.uint8)def get_gt_mesh(self):ob_name = self.videoname_to_object[self.get_video_name()]YCB_VIDEO_DIR = os.getenv('YCB_VIDEO_DIR')mesh = trimesh.load(f'{YCB_VIDEO_DIR}/models/{ob_name}/textured_simple.obj')return meshclass BopBaseReader:def __init__(self, base_dir, zfar=np.inf, resize=1):self.base_dir = base_dirself.resize = resizeself.dataset_name = Noneself.color_files = sorted(glob.glob(f"{self.base_dir}/rgb/*"))if len(self.color_files)==0:self.color_files = sorted(glob.glob(f"{self.base_dir}/gray/*"))self.zfar = zfarself.K_table = {}with open(f'{self.base_dir}/scene_camera.json','r') as ff:info = json.load(ff)for k in info:self.K_table[f'{int(k):06d}'] = np.array(info[k]['cam_K']).reshape(3,3)self.bop_depth_scale = info[k]['depth_scale']if os.path.exists(f'{self.base_dir}/scene_gt.json'):with open(f'{self.base_dir}/scene_gt.json','r') as ff:self.scene_gt = json.load(ff)self.scene_gt = copy.deepcopy(self.scene_gt)   # Release file handle to be pickle-able by joblibassert len(self.scene_gt)==len(self.color_files)else:self.scene_gt = Noneself.make_id_strs()def make_scene_ob_ids_dict(self):with open(f'{BOP_DIR}/{self.dataset_name}/test_targets_bop19.json','r') as ff:self.scene_ob_ids_dict = {}data = json.load(ff)for d in data:if d['scene_id']==self.get_video_id():id_str = f"{d['im_id']:06d}"if id_str not in self.scene_ob_ids_dict:self.scene_ob_ids_dict[id_str] = []self.scene_ob_ids_dict[id_str] += [d['obj_id']]*d['inst_count']def get_K(self, i_frame):K = self.K_table[self.id_strs[i_frame]]if self.resize!=1:K[:2,:2] *= self.resizereturn Kdef get_video_dir(self):video_id = int(self.base_dir.rstrip('/').split('/')[-1])return video_iddef make_id_strs(self):self.id_strs = []for i in range(len(self.color_files)):name = os.path.basename(self.color_files[i]).split('.')[0]self.id_strs.append(name)def get_instance_ids_in_image(self, i_frame:int):ob_ids = []if self.scene_gt is not None:name = int(os.path.basename(self.color_files[i_frame]).split('.')[0])for k in self.scene_gt[str(name)]:ob_ids.append(k['obj_id'])elif self.scene_ob_ids_dict is not None:return np.array(self.scene_ob_ids_dict[self.id_strs[i_frame]])else:mask_dir = os.path.dirname(self.color_files[0]).replace('rgb','mask_visib')id_str = self.id_strs[i_frame]mask_files = sorted(glob.glob(f'{mask_dir}/{id_str}_*.png'))ob_ids = []for mask_file in mask_files:ob_id = int(os.path.basename(mask_file).split('.')[0].split('_')[1])ob_ids.append(ob_id)ob_ids = np.asarray(ob_ids)return ob_idsdef get_gt_mesh_file(self, ob_id):raise RuntimeError("You should override this")def get_color(self,i):color = imageio.imread(self.color_files[i])if len(color.shape)==2:color = np.tile(color[...,None], (1,1,3))  # Gray to RGBif self.resize!=1:color = cv2.resize(color, fx=self.resize, fy=self.resize, dsize=None)return colordef get_depth(self,i, filled=False):if filled:depth_file = self.color_files[i].replace('rgb','depth_filled')depth_file = f'{os.path.dirname(depth_file)}/0{os.path.basename(depth_file)}'depth = cv2.imread(depth_file,-1)/1e3else:depth_file = self.color_files[i].replace('rgb','depth').replace('gray','depth')depth = cv2.imread(depth_file,-1)*1e-3*self.bop_depth_scaleif self.resize!=1:depth = cv2.resize(depth, fx=self.resize, fy=self.resize, dsize=None, interpolation=cv2.INTER_NEAREST)depth[depth<0.1] = 0depth[depth>self.zfar] = 0return depthdef get_xyz_map(self,i):depth = self.get_depth(i)xyz_map = depth2xyzmap(depth, self.get_K(i))return xyz_mapdef get_mask(self, i_frame:int, ob_id:int, type='mask_visib'):'''@type: mask_visib (only visible part) / mask (projected mask from whole model)'''pos = 0name = int(os.path.basename(self.color_files[i_frame]).split('.')[0])if self.scene_gt is not None:for k in self.scene_gt[str(name)]:if k['obj_id']==ob_id:breakpos += 1mask_file = f'{self.base_dir}/{type}/{name:06d}_{pos:06d}.png'if not os.path.exists(mask_file):logging.info(f'{mask_file} not found')return Noneelse:# mask_dir = os.path.dirname(self.color_files[0]).replace('rgb',type)# mask_file = f'{mask_dir}/{self.id_strs[i_frame]}_{ob_id:06d}.png'raise RuntimeErrormask = cv2.imread(mask_file, -1)if self.resize!=1:mask = cv2.resize(mask, fx=self.resize, fy=self.resize, dsize=None, interpolation=cv2.INTER_NEAREST)return mask>0def get_gt_mesh(self, ob_id:int):mesh_file = self.get_gt_mesh_file(ob_id)mesh = trimesh.load(mesh_file)mesh.vertices *= 1e-3return meshdef get_model_diameter(self, ob_id):dir = os.path.dirname(self.get_gt_mesh_file(self.ob_ids[0]))info_file = f'{dir}/models_info.json'with open(info_file,'r') as ff:info = json.load(ff)return info[str(ob_id)]['diameter']/1e3def get_gt_poses(self, i_frame, ob_id):gt_poses = []name = int(self.id_strs[i_frame])for i_k, k in enumerate(self.scene_gt[str(name)]):if k['obj_id']==ob_id:cur = np.eye(4)cur[:3,:3] = np.array(k['cam_R_m2c']).reshape(3,3)cur[:3,3] = np.array(k['cam_t_m2c'])/1e3gt_poses.append(cur)return np.asarray(gt_poses).reshape(-1,4,4)def get_gt_pose(self, i_frame:int, ob_id, mask=None, use_my_correction=False):ob_in_cam = np.eye(4)best_iou = -np.infbest_gt_mask = Nonename = int(self.id_strs[i_frame])for i_k, k in enumerate(self.scene_gt[str(name)]):if k['obj_id']==ob_id:cur = np.eye(4)cur[:3,:3] = np.array(k['cam_R_m2c']).reshape(3,3)cur[:3,3] = np.array(k['cam_t_m2c'])/1e3if mask is not None:  # When multi-instance exists, use mask to determine which onegt_mask = cv2.imread(f'{self.base_dir}/mask_visib/{self.id_strs[i_frame]}_{i_k:06d}.png', -1).astype(bool)intersect = (gt_mask*mask).astype(bool)union = (gt_mask+mask).astype(bool)iou = float(intersect.sum())/union.sum()if iou>best_iou:best_iou = ioubest_gt_mask = gt_maskob_in_cam = curelse:ob_in_cam = curbreakif use_my_correction:if 'ycb' in self.base_dir.lower() and 'train_real' in self.color_files[i_frame]:video_id = self.get_video_id()if ob_id==1:if video_id in [12,13,14,17,24]:ob_in_cam = ob_in_cam@self.symmetry_tfs[ob_id][1]return ob_in_camdef load_symmetry_tfs(self):dir = os.path.dirname(self.get_gt_mesh_file(self.ob_ids[0]))info_file = f'{dir}/models_info.json'with open(info_file,'r') as ff:info = json.load(ff)self.symmetry_tfs = {}self.symmetry_info_table = {}for ob_id in self.ob_ids:self.symmetry_info_table[ob_id] = info[str(ob_id)]self.symmetry_tfs[ob_id] = symmetry_tfs_from_info(info[str(ob_id)], rot_angle_discrete=5)self.geometry_symmetry_info_table = copy.deepcopy(self.symmetry_info_table)def get_video_id(self):return int(self.base_dir.split('/')[-1])class LinemodOcclusionReader(BopBaseReader):def __init__(self,base_dir='/mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD-O/lmo_test_all/test/000002', zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'lmo'self.K = list(self.K_table.values())[0]self.ob_ids = [1,5,6,8,9,10,11,12]self.ob_id_to_names = {1: 'ape',2: 'benchvise',3: 'bowl',4: 'camera',5: 'water_pour',6: 'cat',7: 'cup',8: 'driller',9: 'duck',10: 'eggbox',11: 'glue',12: 'holepuncher',13: 'iron',14: 'lamp',15: 'phone',}# self.load_symmetry_tfs()def get_gt_mesh_file(self, ob_id):mesh_dir = f'{BOP_DIR}/{self.dataset_name}/models/obj_{ob_id:06d}.ply'return mesh_dirclass LinemodReader(LinemodOcclusionReader):def __init__(self, base_dir='/mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD/lm_test_all/test/000001', zfar=np.inf, split=None):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'lm'if split is not None:  # train/testprint("## split is not None")with open(f'/mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD/Linemod_preprocessed/data/{self.get_video_id():02d}/{split}.txt','r') as ff:lines = ff.read().splitlines()self.color_files = []for line in lines:id = int(line)self.color_files.append(f'{self.base_dir}/rgb/{id:06d}.png')self.make_id_strs()self.ob_ids = np.setdiff1d(np.arange(1,16), np.array([7,3])).tolist()  # Exclude bowl and mug# self.load_symmetry_tfs()def get_gt_mesh_file(self, ob_id):root = self.base_dirprint(f'{root}/../')print(f'{root}/lm_models')print(f'{root}/lm_models/models/obj_{ob_id:06d}.ply')while 1:if os.path.exists(f'{root}/lm_models'):mesh_dir = f'{root}/lm_models/models/obj_{ob_id:06d}.ply'breakelse:root = os.path.abspath(f'{root}/../')mesh_dir = f'{root}/lm_models/models/obj_{ob_id:06d}.ply'breakreturn mesh_dirdef get_reconstructed_mesh(self, ref_view_dir):mesh = trimesh.load(os.path.abspath(f'{ref_view_dir}/model/model.obj'))return meshclass YcbVideoReader(BopBaseReader):def __init__(self, base_dir, zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'ycbv'self.K = list(self.K_table.values())[0]self.make_id_strs()self.ob_ids = np.arange(1,22).astype(int).tolist()YCB_VIDEO_DIR = os.getenv('YCB_VIDEO_DIR')self.ob_id_to_names = {}self.name_to_ob_id = {}# names = sorted(os.listdir(f'{YCB_VIDEO_DIR}/models/'))if os.path.exists(f'{YCB_VIDEO_DIR}/models/'):names = sorted(os.listdir(f'{YCB_VIDEO_DIR}/models/'))for i,ob_id in enumerate(self.ob_ids):self.ob_id_to_names[ob_id] = names[i]self.name_to_ob_id[names[i]] = ob_idelse:names = []if 0:# if 'BOP' not in self.base_dir:with open(f'{self.base_dir}/../../keyframe.txt','r') as ff:self.keyframe_lines = ff.read().splitlines()# self.load_symmetry_tfs()'''for ob_id in self.ob_ids:if ob_id in [1,4,6,18]:   # Cylinderself.geometry_symmetry_info_table[ob_id] = {'symmetries_continuous': [{'axis':[0,0,1], 'offset':[0,0,0]},],'symmetries_discrete': euler_matrix(0, np.pi, 0).reshape(1,4,4).tolist(),}elif ob_id in [13]:self.geometry_symmetry_info_table[ob_id] = {'symmetries_continuous': [{'axis':[0,0,1], 'offset':[0,0,0]},],}elif ob_id in [2,3,9,21]:   # Rectangle boxtfs = []for rz in [0, np.pi]:for rx in [0,np.pi]:for ry in [0,np.pi]:tfs.append(euler_matrix(rx, ry, rz))self.geometry_symmetry_info_table[ob_id] = {'symmetries_discrete': np.asarray(tfs).reshape(-1,4,4).tolist(),}else:pass'''def get_gt_mesh_file(self, ob_id):if 'BOP' in self.base_dir:mesh_file = os.path.abspath(f'{self.base_dir}/../../ycbv_models/models/obj_{ob_id:06d}.ply')else:mesh_file = f'{self.base_dir}/../../ycbv_models/models/obj_{ob_id:06d}.ply'return mesh_filedef get_gt_mesh(self, ob_id:int, get_posecnn_version=False):if get_posecnn_version:YCB_VIDEO_DIR = os.getenv('YCB_VIDEO_DIR')mesh = trimesh.load(f'{YCB_VIDEO_DIR}/models/{self.ob_id_to_names[ob_id]}/textured_simple.obj')return meshmesh_file = self.get_gt_mesh_file(ob_id)mesh = trimesh.load(mesh_file, process=False)mesh.vertices *= 1e-3tex_file = mesh_file.replace('.ply','.png')if os.path.exists(tex_file):from PIL import Imageim = Image.open(tex_file)uv = mesh.visual.uvmaterial = trimesh.visual.texture.SimpleMaterial(image=im)color_visuals = trimesh.visual.TextureVisuals(uv=uv, image=im, material=material)mesh.visual = color_visualsreturn meshdef get_reconstructed_mesh(self, ob_id, ref_view_dir):mesh = trimesh.load(os.path.abspath(f'{ref_view_dir}/ob_{ob_id:07d}/model/model.obj'))return meshdef get_transform_reconstructed_to_gt_model(self, ob_id):out = np.eye(4)return outdef get_visible_cloud(self, ob_id):file = os.path.abspath(f'{self.base_dir}/../../models/{self.ob_id_to_names[ob_id]}/visible_cloud.ply')pcd = o3d.io.read_point_cloud(file)return pcddef is_keyframe(self, i):color_file = self.color_files[i]video_id = self.get_video_id()frame_id = int(os.path.basename(color_file).split('.')[0])key = f'{video_id:04d}/{frame_id:06d}'return (key in self.keyframe_lines)class TlessReader(BopBaseReader):def __init__(self, base_dir, zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'tless'self.ob_ids = np.arange(1,31).astype(int).tolist()self.load_symmetry_tfs()def get_gt_mesh_file(self, ob_id):mesh_file = f'{self.base_dir}/../../../models_cad/obj_{ob_id:06d}.ply'return mesh_filedef get_gt_mesh(self, ob_id):mesh = trimesh.load(self.get_gt_mesh_file(ob_id))mesh.vertices *= 1e-3mesh = trimesh_add_pure_colored_texture(mesh, color=np.ones((3))*200)return meshclass HomebrewedReader(BopBaseReader):def __init__(self, base_dir, zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'hb'self.ob_ids = np.arange(1,34).astype(int).tolist()self.load_symmetry_tfs()self.make_scene_ob_ids_dict()def get_gt_mesh_file(self, ob_id):mesh_file = f'{self.base_dir}/../../../hb_models/models/obj_{ob_id:06d}.ply'return mesh_filedef get_gt_pose(self, i_frame:int, ob_id, use_my_correction=False):logging.info("WARN HomeBrewed doesn't have GT pose")return np.eye(4)class ItoddReader(BopBaseReader):def __init__(self, base_dir, zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'itodd'self.make_id_strs()self.ob_ids = np.arange(1,29).astype(int).tolist()self.load_symmetry_tfs()self.make_scene_ob_ids_dict()def get_gt_mesh_file(self, ob_id):mesh_file = f'{self.base_dir}/../../../itodd_models/models/obj_{ob_id:06d}.ply'return mesh_fileclass IcbinReader(BopBaseReader):def __init__(self, base_dir, zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'icbin'self.ob_ids = np.arange(1,3).astype(int).tolist()self.load_symmetry_tfs()def get_gt_mesh_file(self, ob_id):mesh_file = f'{self.base_dir}/../../../icbin_models/models/obj_{ob_id:06d}.ply'return mesh_fileclass TudlReader(BopBaseReader):def __init__(self, base_dir, zfar=np.inf):super().__init__(base_dir, zfar=zfar)self.dataset_name = 'tudl'self.ob_ids = np.arange(1,4).astype(int).tolist()self.load_symmetry_tfs()def get_gt_mesh_file(self, ob_id):mesh_file = f'{self.base_dir}/../../../tudl_models/models/obj_{ob_id:06d}.ply'return mesh_file

运行run_linemod.py:

python run_linemod.py

能看到文件夹model_free_ref_views/lm_test_all/000015/track_vis/

里面存放可视化结果:

分享完成~

本文先介绍到这里,后面会分享“6D位姿估计”的其它数据集、算法、代码、具体应用示例。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/696016.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

良心实用的电脑桌面便利贴,好用的便利贴便签小工具

在日常办公中&#xff0c;上班族经常需要记录临时任务、重要提醒或者突发的灵感。比如&#xff0c;在紧张的项目会议中&#xff0c;忽然想到一个改进的点子&#xff0c;或者是在处理邮件时&#xff0c;需要记下对某个客户的回复要点。在这些场景下&#xff0c;如果能直接在电脑…

电商平台接口自动化框架实践||电商API数据采集接口

电商数据采集接口 语言&#xff1a;python 接口自动化实现流程 红色为可实现/尚未完成 绿色为需要人工干预部分 自动生成测试用例模板&#xff08;俩种方式二选一&#xff09;&#xff1a; mimproxy&#xff0c;通过浏览器代理抓包方式&#xff0c;访问 H5 或者 web 页面&a…

JSR303数据校验 —— @Valid嵌套校验、集合校验

1. 依赖版本 &#xff08;1&#xff09;SpringBoot 3.1.11 &#xff08;2&#xff09;JDK17 2. Valid、Validated 简介 说明&#xff1a;在Spring框架中Valid默认不会对集合&#xff08;List、Set等&#xff09;内部的元素进行校验&#xff0c;需要将Spring提供的Validated注…

代码随想录 打卡day23,24,25

1 二叉搜索树的最小绝对差 注意审题&#xff0c;题目当值说到是一个二叉搜索树&#xff0c;因此我们只需进行中序遍历即可&#xff0c;然后得到一个有序数组之后进行编辑&#xff0c;统计出来最小差。 class solution{ private:vector<int> vec;void traversal(TreeNode…

cuttag学习笔记

由于课题可能用上cut&tag这个技术&#xff0c;遂跟教程学习一波&#xff0c;记录一下以便后续的学习&#xff08;主要是怕忘了&#xff09; 教程网址cut&tag教程 背景知识&#xff1a;靶标下裂解与标记&#xff08;Cleavage Under Targets & Tagmentation&#xf…

每日互动(个推)与您相约2024 AI+研发数字峰会(AiDD)上海站

伴随着人工智能在众多行业领域的广泛应用及其带来的颠覆性变革&#xff0c;软件的开发模式、方式和实践也将发生巨大的变化。 5月17-18日&#xff0c;2024 AI研发数字峰会&#xff08;AiDD&#xff09;上海站即将重磅开幕。峰会设置了15个主题论坛&#xff0c;策划60精彩议题内…

CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation

名词解释 1.特征重建 特征重建是一种机器学习中常用的技术&#xff0c;通常用于自监督学习或无监督学习任务。在特征重建中&#xff0c;模型被要求将输入数据经过编码器&#xff08;encoder&#xff09;转换成某种表示&#xff0c;然后再经过解码器&#xff08;decoder&#x…

有了这玩意,分分钟开发公众号功能!

大家好&#xff0c;我是程序员鱼皮。 不论在企业、毕设还是个人练手项目中&#xff0c;很多同学或多或少都会涉及微信相关生态的开发&#xff0c;例如微信支付、开放平台、公众号等等。 一般情况下&#xff0c;我们需要到官网查阅这些模块对应的 API 接口&#xff0c;自己编写…

windows打开防火墙指定端口(局域网访问本地项目)

windows打开防火墙指定端口(局域网访问本地项目) 本地运行了Vue前端项目&#xff0c;部署在5173端口&#xff0c;想让同事从局域网内访问项目&#xff0c;开放本机端口5173允许访问 在 Windows 上使用自带的防火墙&#xff0c;你可以按照以下步骤来允许局域网内其他设备对特定端…

【刷题】一篇文章搞定“位运算”

只要春天不死&#xff0c;就有迎春的花朵年年岁岁开放&#xff0c;生命讲涅槃&#xff0c;生生不息&#xff0c;并会以另一种形式永存。 – 路遥 《平凡的世界》 (◦′ᆺ‵◦) ♬ ✧❥✧.•✧♡✧ ℒℴѵℯ ✧♡✧•.❥ (◦′ᆺ‵◦) ♬ ✧❥✧.•✧♡✧ ℒℴѵℯ ✧♡✧•.❥…

C++——缺省参数与重载函数

目录 ​前言 一.缺省参数 1.1缺省参数概念 1.2缺省参数分类 注意事项&#xff1a; 二.函数重载 2.1函数重载概念 2.2c支持函数重载原理——命名修饰 前言 本篇文章主要讲述c中有关于缺少参数与函数重载的相关概念与实例&#xff0c;以下是本人拙见&#xff0c;如有错误…

Apple store 静安·苹果店欣赏

官网&#xff1a; https://www.apple.com/today/Apple 亚洲第一大商店&#xff1a;Apple 静安零售店现已在上海开幕 静安苹果欣赏