Google DeepMind: Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Posted 2025/2/7 7:32:27. Source: https://www.cnblogs.com/bjrobot/p/18584604

Original article: OP3 Soccer

Take a look at the OP3, powered by DYNAMIXEL.

We investigate whether Deep Reinforcement Learning (Deep RL) can synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot, skills that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth, stable, and efficient manner, well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred zero-shot to real robots. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve longer-term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website: OP3 Soccer.
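
The abstract credits zero-shot transfer partly to targeted dynamics randomization and to perturbations during training. As a rough illustration of those two ideas only, the Python sketch below resamples a few physics scale factors at each episode reset and occasionally draws a random horizontal push to apply to the robot's torso. The parameter names (`floor_friction`, `link_mass`, `joint_damping`), their ranges, and the push probability and magnitude are hypothetical placeholders, not the settings used in this work, and the sketch is deliberately not tied to any simulator API.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical randomization ranges, expressed as multiplicative scales on the
# simulator's nominal parameters. Illustrative only, not values from this work.
RANGES = {
    "floor_friction": (0.5, 1.0),
    "link_mass": (0.9, 1.1),
    "joint_damping": (0.8, 1.2),
}


@dataclass
class Dynamics:
    """One sampled set of physics scale factors (hypothetical fields)."""
    floor_friction: float
    link_mass: float
    joint_damping: float


def sample_dynamics(rng: random.Random) -> Dynamics:
    """Draw fresh scale factors, e.g. once per training episode reset."""
    return Dynamics(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def sample_push(rng: random.Random, prob: float = 0.02,
                max_force: float = 5.0) -> tuple[float, float]:
    """With probability `prob`, return a random horizontal force (fx, fy) in
    newtons to apply to the torso for this control step; otherwise (0, 0)."""
    if rng.random() >= prob:
        return (0.0, 0.0)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    magnitude = rng.uniform(0.0, max_force)
    return (magnitude * math.cos(angle), magnitude * math.sin(angle))


if __name__ == "__main__":
    rng = random.Random(0)
    # At reset: these scales would be written into the simulated model.
    print(sample_dynamics(rng))
    # Every control step: an occasional push perturbs the robot.
    pushes = [sample_push(rng) for _ in range(1000)]
    print(sum(p != (0.0, 0.0) for p in pushes), "of 1000 steps perturbed")
```

Resampling at every reset means the policy never trains against a single fixed model, which is one common way to make it tolerate modelling errors; how wide each range should be, and which parameters deserve randomizing at all, is the "targeted" design choice.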

Soccer players can tackle, get up, kick and chase a ball in one seamless motion. How could robots master these agile motor skills?

Movie 1: Project overview

We investigated the application of Deep Reinforcement Learning (Deep RL) to low-cost, miniature humanoid hardware in a dynamic environment, and showed that the method can synthesize sophisticated and safe movement skills that make up complex behavioral strategies in a simplified one-versus-one (1v1) soccer game.

Our agents, which control a robot with 20 actuated joints, were trained in simulation using the MuJoCo physics engine and transferred zero-shot to real robots. The agents use proprioception and game state features as observations. The trained soccer players exhibit robust and dynamic movement skills such as rapid fall recovery, walking, turning, and kicking. They transition between these emergent skills automatically in a smooth, stable, and efficient manner, going beyond what might intuitively be expected from the platform. The agents also developed a basic strategic understanding of the game, learning to anticipate ball movements and to block opponent shots.
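
To make "proprioception and game state features" concrete, the sketch below concatenates a few proprioceptive signals with the positions of the ball, the opponent, and the goal, each expressed in the robot's own heading frame. Only the joint count (20) comes from the text; the particular signals (joint angles, gyroscope, accelerometer), the choice of game-state features, the egocentric frame, and the function names are hypothetical assumptions, not the observation specification used in this work.

```python
import math
from typing import List, Sequence, Tuple

NUM_JOINTS = 20  # from the text: the robot has 20 actuated joints

Vec2 = Tuple[float, float]


def to_egocentric(point: Vec2, robot_xy: Vec2, robot_yaw: float) -> Vec2:
    """Express a world-frame 2D point in the robot's frame (+x = facing direction)."""
    dx = point[0] - robot_xy[0]
    dy = point[1] - robot_xy[1]
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    # Rotate the world-frame offset by -yaw.
    return (c * dx + s * dy, -s * dx + c * dy)


def build_observation(
    joint_angles: Sequence[float],  # hypothetical proprioception: 20 joint angles (rad)
    gyro: Sequence[float],          # hypothetical proprioception: angular velocity (3,)
    accel: Sequence[float],         # hypothetical proprioception: linear acceleration (3,)
    robot_xy: Vec2,
    robot_yaw: float,
    ball_xy: Vec2,                  # hypothetical game state, world frame
    opponent_xy: Vec2,              # hypothetical game state, world frame
    goal_xy: Vec2,                  # hypothetical game state, world frame
) -> List[float]:
    """Concatenate proprioceptive and egocentric game-state features."""
    if len(joint_angles) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joint angles, got {len(joint_angles)}")
    obs = list(joint_angles) + list(gyro) + list(accel)
    for point in (ball_xy, opponent_xy, goal_xy):
        obs.extend(to_egocentric(point, robot_xy, robot_yaw))
    return obs


if __name__ == "__main__":
    obs = build_observation([0.0] * 20, (0.0, 0.0, 0.0), (0.0, 0.0, 9.81),
                            (0.0, 0.0), math.pi / 2,
                            (0.0, 1.0), (2.0, 0.0), (4.0, 0.0))
    print(len(obs))  # 32 = 20 joints + 3 gyro + 3 accel + 3 points x 2
```

Egocentric features are a common choice because the same tactical situation then looks the same to the policy wherever it happens on the pitch; the text does not say which frame this work used.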

Movie 2: Behavior and skill highlights

Recurring skills and strategies selected from typical one-versus-one play. The agent demonstrates agile skills, including getting up and turning; reactive behavior, including kicking a moving ball; object interaction, including ball control; dynamic defensive blocking; and strategic play, including defensive positioning. The agent also transitions quickly between skills (for example, turning, chasing, controlling, then kicking) and combines them (for example, it frequently combines turning with kicking).

Movie 3: Comparison to scripted baseline controllers

Scripted controllers for certain key locomotion behaviors, including getting up, kicking, walking, and turning, are available for the OP3 robot. This movie shows these scripted baselines side by side with the corresponding behaviors of the Deep RL agent.
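
The abstract reports that, compared with the scripted baselines, the agent walked 156% faster, took 63% less time to get up, and kicked 24% faster. Assuming the usual reading of "X% faster" as a ratio of speeds and "X% less time" as a ratio of durations, the small Python helper below converts these percentages into multiplicative factors. The function names are illustrative, and no underlying speeds or times are assumed.

```python
def ratio_from_percent_faster(percent: float) -> float:
    """'X% faster' -> new speed divided by baseline speed."""
    return 1.0 + percent / 100.0


def ratio_from_percent_less_time(percent: float) -> float:
    """'X% less time' -> new duration divided by baseline duration."""
    return 1.0 - percent / 100.0


walk = ratio_from_percent_faster(156)      # RL walking speed vs. scripted
get_up = ratio_from_percent_less_time(63)  # RL get-up duration vs. scripted
kick = ratio_from_percent_faster(24)       # RL kick speed vs. scripted

print(f"walking: {walk:.2f}x the baseline speed")
print(f"getting up: {get_up:.2f}x the baseline duration "
      f"({1.0 / get_up:.2f}x as quick)")
print(f"kicking: {kick:.2f}x the baseline speed")
```

Under this reading, "156% faster" means about 2.56 times the baseline walking speed, and "63% less time" means the get-up takes 0.37 of the baseline duration, which is roughly 2.7 times as quick.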

Movie 4: Turning and kicking behaviors in simulation and in the real environment

One of the agile behaviors we see during soccer play is the turning skill discovered by the agent, shown here in slow motion. The robot pivots on the corner of one foot and takes two to three steps to turn 180 degrees. Although learned entirely in simulation, this behavior succeeds on the OP3 after zero-shot transfer to the real robot, with a perhaps surprisingly small sim-to-real gap given how highly optimized the behavior is. The agent's kicking behavior is also shown here in slow motion.
