Robots learn to cook by watching YouTube videos

Keywords: robot control systems, cooking robots

Source: Internet    2015-11-17

Original language: English

The University of Maryland project is developing a way for robots to learn to cook from videos (Image: Shutterstock)

Cooking, they say, is as much an art as a science, so it's no surprise that robots have a difficult time in the kitchen. Perhaps one day robot chefs will be as commonplace as blenders, but they will still need to learn their job. To help them, scientists at the University of Maryland and NICTA in Australia are working on ways for robots to learn how to cook by watching YouTube videos.

Cooking is an everyday task, but like walking, talking, and many other mundane things, we don't appreciate how difficult it is. Take, for example, the chef's knife: a simple 8-in length of triangular steel, yet in the hands of a skilled cook it can replace almost every fancy gadget in the kitchen, from the food processor to the garlic press.

That's because human beings are very good at manipulating objects. The human hand is amazingly versatile and human beings are endlessly inventive. The trick is to find a way to get a robot to even remotely do what a human can do when chopping an onion or whisking an egg – not to mention something more complicated.

The traditional way engineers have handled the problem is to simplify the task: break the job down and redesign it so it can be done with claws or pincers, create specialized manipulators that do one task very well and nothing else, or overwhelm the problem with universal graspers like the Versaball.

Grasps identified by the robot


But this only goes so far. Many tasks still require the human touch, and the robot needs to learn that touch if it's going to do the same thing. In some respects this is a mechanical problem, but in most others it's a matter of how to teach the robot. One way is to analyze the job and directly program the machine. Another is to use motion-capture gloves or hand trackers to record the needed motions, and a third is to guide the robot directly, like a teacher showing a pupil how to slice some meat.

The Maryland and NICTA team is working on a more direct approach that allows the robot to learn for itself: in this case, by studying cooking-instruction videos taken directly off the internet. The trick is to find a way for robots to learn how to study human actions, then turn them into commands that the machine is capable of duplicating.

The team says that dealing with raw video isn't easy. Unlike special videos made in a lab to support an experiment, those found on YouTube and other services are unpredictable, with all sorts of scenery, backgrounds, lighting, and other complexities to sort out. This requires some sophisticated image recognition as well as techniques that allow the robot to break down the observed actions to an abstract "atomic" level. This is done by using a pair of Convolutional Neural Network (CNN) based recognition modules.

Steps in how the robot learns from videos


The key to the CNN is the artificial neuron, a mathematical function that imitates living neurons. These artificial neurons are hooked together to form an artificial neural network. For the cooking robot, these networks act like the human visual system, using overlapping layers of neural connections to study images. This overlapping provides very high resolution, and the data from the image is very resistant to distortion as it's translated from one form to another.
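To make the idea of an artificial neuron concrete, here is a minimal sketch: a weighted sum of inputs plus a bias, squashed by a sigmoid activation (one common choice). This is an illustrative toy, not the networks used in the Maryland/NICTA system.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of the inputs plus
    a bias term, passed through a sigmoid activation so the output
    lands in the range (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Stacking many such units in layers, with each unit looking at an overlapping patch of the image, is what gives a convolutional network its layered, vision-like structure.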

In the case of the Maryland/NICTA system, there are two CNN visual recognition modules. One looks at the hands of the cook in the video and works out what sort of grasp each one is using. Meanwhile, the other determines how the hand and the object it's holding are moving, breaking the movements down, analyzing them, and deducing how the robot can use them to complete its own tasks.

The robot looks for one of six basic types of grasps and studies how they are used and change over time in a video sequence. It can then decide which manipulator, if it has several, to use to replicate the grasp, such as a vacuum gripper to hold something firmly, or a parallel gripper for more precision. In addition, it identifies the object being grasped, such as a knife, an apple, a bowl, a salmon, or a pot of yoghurt, among others.

The next step is to determine which of ten common cooking scenarios, such as cutting, pouring, spreading, chopping, or peeling, is being carried out. That done, the system identifies the much larger group of actions that make up the scenario, breaks them down, then determines how to duplicate them in a useful sequence called a "sentence." In this way, it can go from the chaos of a "How to debone a ham" video to useful actions that the robot can perform.
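The idea of expanding a recognized scenario into an action "sentence" can be sketched as a small grammar lookup. The scenario names and primitive actions here are hypothetical stand-ins, not the grammar the researchers actually learned.

```python
# Toy "grammar": each high-level cooking scenario expands into an
# ordered sequence of primitive actions (all names are illustrative).
SCENARIO_GRAMMAR = {
    "cutting": ["grasp(knife)", "grasp(object)",
                "move(knife, object)", "slice(object)"],
    "pouring": ["grasp(container)", "tilt(container)",
                "level(container)"],
}

def build_sentence(scenario):
    """Expand a recognized scenario into the ordered primitive
    actions the robot can execute."""
    if scenario not in SCENARIO_GRAMMAR:
        raise ValueError(f"unknown scenario: {scenario}")
    return list(SCENARIO_GRAMMAR[scenario])
```

The real system works in the other direction as well, using the observed grasps and motions to decide which scenario it is watching before expanding it.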

The researchers say that in the future they will work on refining the classifications and look at how to use the identified grasps to predict actions for a more detailed analysis as they work out the "grammar" of actions.

The results of the team's work will be presented at the 29th annual conference of the Association for the Advancement of Artificial Intelligence.

Source: University of Maryland (PDF) via RT
