Small Robots Learn to Drive Fast in the Real World

Reinforcement learning plus pretraining gets robotic racers up to speed

Without a lifetime of experience to build on like humans have (and totally take for granted), robots that want to learn a new skill often have to start from scratch. Reinforcement learning lets robots learn new skills through trial and error but, especially in the case of end-to-end vision-based control policies, it takes a lot of time: The real world is a weirdly lit, friction-filled, obstacle-y mess that robots can’t understand without a frequently impractical amount of effort.

Roboticists at the University of California at Berkeley have vastly sped up this process by doing the same kind of cheating that humans do—instead of starting from scratch, you start with some previous experience that helps get you going. By leveraging a “foundation model” that was pretrained on robots driving themselves around, the researchers were able to get a small-scale robotic rally car to teach itself to race around indoor and outdoor tracks, matching human performance after just 20 minutes of practice.

That first pretraining stage happens at your leisure, by manually driving a robot (that isn’t necessarily the one that will be doing the task you care about) around different environments. The goal isn’t to teach the robot to drive fast around a course but rather to teach it the basics of not running into stuff.

With that pretrained foundation model in place, when you then move over to the little robotic rally car, it no longer has to start from scratch. Instead, you can plop it onto the course you want it to learn, drive it around once slowly to show it where you want it to go, and then let it go fully autonomous, training itself to drive faster and faster. With a low-resolution, front-facing camera and some basic state estimation, the robot attempts to reach the next checkpoint on the course as quickly as possible, leading to some interesting emergent behaviors:

The system learns the concept of a “racing line,” finding a smooth path through the lap and maximizing its speed through tight corners and chicanes. The robot learns to carry its speed into the apex, then brakes sharply to turn and accelerates out of the corner, to minimize the driving duration. With a low-friction surface, the policy learns to oversteer slightly when turning, drifting into the corner to achieve fast rotation without braking during the turn. In outdoor environments, the learned policy is also able to distinguish ground characteristics, preferring smooth, high-traction areas on and around concrete paths over areas with tall grass that impedes the robot’s motion.
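
The article says only that the robot tries to reach the next checkpoint as quickly as possible; it does not give the reward. A minimal sketch of one way to express that objective, assuming a reward equal to the component of the robot's velocity that points at the next checkpoint. The function names, the 2D coordinates, and the 1-meter reach radius are hypothetical and not taken from the article or the paper.

```python
import math


def progress_reward(position, velocity, checkpoint):
    """Speed (m/s) along the straight line from the robot to the checkpoint.

    Assumption: this is an illustrative reward, not the authors' definition.
    Positive when closing in on the checkpoint, negative when moving away.
    """
    dx = checkpoint[0] - position[0]
    dy = checkpoint[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0  # already on the checkpoint, direction is undefined
    return (velocity[0] * dx + velocity[1] * dy) / dist


def next_checkpoint_index(position, checkpoints, index, reach_radius_m=1.0):
    """Advance to the following checkpoint once the current one is reached.

    The checkpoint list wraps around so the course forms a lap.
    reach_radius_m is a hypothetical tolerance.
    """
    cx, cy = checkpoints[index]
    if math.hypot(cx - position[0], cy - position[1]) <= reach_radius_m:
        return (index + 1) % len(checkpoints)
    return index
```

Under this assumed reward, driving straight at the checkpoint at 2 m/s earns 2.0 per step, while driving sideways at the same speed earns nothing, which is one way a policy could be pushed toward the fast, direct lines described above.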

The other clever bit here is the reset feature, which is necessary in real-world training. When training in simulation, it’s super easy to reset a robot that fails, but outside of simulation, a failure can (by definition) end the training if the robot gets itself stuck. That’s not a big deal if you want to spend all your time minding the robot while it learns, but if you have something better to do, the robot needs to be able to train autonomously from start to finish. In this case, if the robot hasn’t moved at least 0.5 meters in the previous 3 seconds, it knows that it’s stuck, and it will execute the simple behaviors of turning randomly, backing up, and then trying to drive forward again, which gets it unstuck eventually.
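
The stuck check described above can be sketched as follows. This is a minimal illustration, not the researchers' code: the 0.5-meter and 3-second thresholds come from the article, while the class name, the use of net displacement (rather than path length) from a 2D position estimate, and the steering and throttle values in the recovery plan are assumptions.

```python
import math
import random
from collections import deque


class StuckDetector:
    """Flags the robot as stuck when it covers too little ground in a time window."""

    def __init__(self, min_distance_m=0.5, window_s=3.0):
        self.min_distance_m = min_distance_m
        self.window_s = window_s
        self._history = deque()  # (timestamp, x, y) samples, oldest first

    def update(self, t, x, y):
        """Record a pose sample and return True if the robot looks stuck."""
        self._history.append((t, x, y))
        # Drop old samples, but keep the newest one that is at least
        # window_s old so the comparison always spans the full window.
        while len(self._history) >= 2 and self._history[1][0] <= t - self.window_s:
            self._history.popleft()
        t0, x0, y0 = self._history[0]
        if t - t0 < self.window_s:
            return False  # not enough history to judge yet
        # Assumption: "moved" means net displacement over the window.
        return math.hypot(x - x0, y - y0) < self.min_distance_m


def recovery_plan(rng):
    """Hypothetical recovery commands: random turn while reversing, then forward."""
    steering = rng.uniform(-1.0, 1.0)  # random turn direction and amount
    return [
        {"steering": steering, "throttle": -0.5},  # back up
        {"steering": 0.0, "throttle": 0.5},  # try driving forward again
    ]
```

A training loop would call `update` with each new state estimate and, when it returns True, run `recovery_plan(random.Random())` before handing control back to the learned policy.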

During indoor and outdoor experiments, the robot was able to learn aggressive driving comparable to that of a human expert after just 20 minutes of autonomous practice, which the researchers say “provides strong validation that deep reinforcement learning can indeed be a viable tool for learning real-world policies even from raw images, when combined with appropriate pretraining and implemented in the context of an autonomous training framework.” It’s going to take a lot more work to implement this sort of thing safely on a larger platform, but this little car is taking the first few laps in the right direction just as quickly as it possibly can.

“FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing,” by Kyle Stachowicz, Arjun Bhorkar, Dhruv Shah, Ilya Kostrikov, and Sergey Levine from UC Berkeley, is available on arXiv.
