100:00:06,480-->00:00:08,400
Good morning. We have a banger for you
200:00:08,400-->00:00:09,840
today. We're going to launch ChatGPT
300:00:09,840-->00:00:11,519
agent. But before jumping into that, I'd
400:00:11,519-->00:00:12,559
like to ask the team to introduce
500:00:12,559-->00:00:14,080
themselves. Starting with Yash.
600:00:14,080-->00:00:17,840
Hi, I'm Yash. I work on agent team and
700:00:17,840-->00:00:20,080
before that I used to work on operator.
800:00:20,080-->00:00:22,560
Hi, I'm Jing. I work on agents research
900:00:22,560-->00:00:24,400
previously on deep research.
1000:00:24,400-->00:00:26,000
Hi, I'm Casey. I'm a researcher on
1100:00:26,000-->00:00:27,920
agents formerly operator.
1200:00:27,920-->00:00:30,560
Hi, I'm Issa. I'm a researcher on agent
1300:00:30,560-->00:00:32,640
formerly on deep research.
1400:00:32,640-->00:00:34,880
So we we started launching agents
1500:00:34,880-->00:00:36,800
earlier this year. Uh we launched deep
1600:00:36,800-->00:00:38,879
research, we launched operator and
1700:00:38,879-->00:00:40,160
people were very excited about this.
1800:00:40,160-->00:00:42,480
People could see that now uh AI was
1900:00:42,480-->00:00:44,640
going off to do complex tasks for them.
2000:00:44,640-->00:00:46,079
But it became clear to us that what
2100:00:46,079-->00:00:48,000
people really wanted was for us to bring
2200:00:48,000-->00:00:49,760
those capabilities and more together.
2300:00:49,760-->00:00:51,920
People wanted a unified agent that could
2400:00:51,920-->00:00:55,039
go off, use its own computer and do real
2500:00:55,039-->00:00:57,360
complex tasks for them, that could uh
2600:00:57,360-->00:00:59,359
seamlessly transition from thinking
2700:00:59,359-->00:01:01,520
about something to taking actions to
2800:01:01,520-->00:01:03,359
using lots of tools using the terminal,
2900:01:03,359-->00:01:05,360
clicking around the web, even producing
3000:01:05,360-->00:01:06,880
things like spreadsheets and slides and
3100:01:06,880-->00:01:08,960
and much more. And people want to
3200:01:08,960-->00:01:10,159
be able to do this over a long time
3300:01:10,159-->00:01:12,159
horizon and sort of for universal
3400:01:12,159-->00:01:13,840
tasks. So the team has been working
3500:01:13,840-->00:01:16,400
super hard to bring that together. And
3600:01:16,400-->00:01:18,080
today we have ChatGPT agent. Um,
3700:01:18,080-->00:01:19,680
it's probably easier to show it to you
3800:01:19,680-->00:01:21,439
than to keep talking about it. It is one
3900:01:21,439-->00:01:23,360
of the "feel the AGI" moments for me to
4000:01:23,360-->00:01:25,280
watch it work. So, let's take a look.
4100:01:25,280-->00:01:27,840
Awesome. Thanks, Sam. Hello, everyone.
4200:01:27,840-->00:01:29,920
Very excited to share ChatGPT agent
4300:01:29,920-->00:01:31,600
with everybody. And as Sam said, let's
4400:01:31,600-->00:01:33,759
just dive right into the demo. Okay, so
4500:01:33,759-->00:01:36,159
we are on ChatGPT, as we all know and
4600:01:36,159-->00:01:39,119
love. And to turn on the agent mode, you
4700:01:39,119-->00:01:40,880
just click the tools menu and select
4800:01:40,880-->00:01:43,280
agent. You can also just type agent in
4900:01:43,280-->00:01:45,040
the composer bar and it'll take you to
5000:01:45,040-->00:01:47,520
agent mode. Um, Edward and I have a
5100:01:47,520-->00:01:49,360
wedding to go to later this year. Uh,
5200:01:49,360-->00:01:51,119
it's for one of our mutual friends.
5300:01:51,119-->00:01:52,560
Should we, should we have the agent
5400:01:52,560-->00:01:53,280
plan it?
5500:01:53,280-->00:01:55,680
Yeah, let's do it. I need an outfit. And
5600:01:55,680-->00:01:56,799
don't forget the gift.
5700:01:56,799-->00:01:58,719
Okay, great. We won't forget the gift.
5800:01:58,719-->00:02:00,240
Um, it's a little bit of a longer
5900:02:00,240-->00:02:01,680
prompt, so I have it copied in my
6000:02:01,680-->00:02:02,799
buffer, so I'm just going to go ahead
6100:02:02,799-->00:02:05,759
and paste it. Um, okay. So, let's see.
6200:02:05,759-->00:02:07,360
Let's see what it says. Our friends are
6300:02:07,360-->00:02:08,640
getting married later this year, as I
6400:02:08,640-->00:02:10,720
said, Minia and Sarah. And we want the
6500:02:10,720-->00:02:12,879
agent to help us find an outfit that
6600:02:12,879-->00:02:15,520
matches the dress code. Uh, propose a few
6700:02:15,520-->00:02:17,840
options. Nice, mid-luxury, taking into
6800:02:17,840-->00:02:21,040
account venue and weather. We also want
6900:02:21,040-->00:02:23,280
to find us some hotels and as Edward
7000:02:23,280-->00:02:25,760
said, don't forget the gift. Um so let's
7100:02:25,760-->00:02:27,840
see and
7200:02:27,840-->00:02:30,319
send the prompt away. As Sam said, agent
7300:02:30,319-->00:02:32,640
uses a computer. Uh so in the beginning
7400:02:32,640-->00:02:34,959
it sets up its environment. It it you
7500:02:34,959-->00:02:38,000
know it'll take a minute or two or not
7600:02:38,000-->00:02:39,680
really 5 seconds to set up its
7700:02:39,680-->00:02:41,440
environment. And in this case, as you
7800:02:41,440-->00:02:43,840
see, it understands the prompt. It's
7900:02:43,840-->00:02:46,319
asking me for a clarification. I'm
8000:02:46,319-->00:02:48,000
just going to let it just continue and
8100:02:48,000-->00:02:51,120
work. Anyway, um I think it got confused
8200:02:51,120-->00:02:54,239
by saying, "Oh, where's the, um, what
8300:02:54,239-->00:02:55,680
exactly is the time of the date of the
8400:02:55,680-->00:02:57,200
wedding?" I think it'll figure out using
8500:02:57,200-->00:02:59,840
the website. Okay, cool. So, now it's
8600:02:59,840-->00:03:01,760
kicked off. It's starting the process,
8700:03:01,760-->00:03:03,920
the prompt, and it's opened up a browser.
8800:03:03,920-->00:03:04,959
And to walk you through what's
8900:03:04,959-->00:03:06,800
happening, here's
9000:03:06,800-->00:03:09,040
Yeah. So, as mentioned, we gave the
9100:03:09,040-->00:03:10,879
agent access to its own virtual
9200:03:10,879-->00:03:13,280
computer, and the computer has many
9300:03:13,280-->00:03:14,720
different tools installed, and it can
9400:03:14,720-->00:03:16,239
choose which to use as it's working
9500:03:16,239-->00:03:18,640
through the task. So, in ChatGPT, you
9600:03:18,640-->00:03:21,360
can see a visualization of the agent's
9700:03:21,360-->00:03:23,680
computer screen, and you can see
9800:03:23,680-->00:03:25,519
overlaid its chain of thought in text,
9900:03:25,519-->00:03:27,200
and that's what it's thinking as it's
10000:03:27,200-->00:03:28,480
working through the task and deciding
10100:03:28,480-->00:03:30,799
what to do next. We gave the agent
10200:03:30,799-->00:03:32,400
access to two different ways to browse
10300:03:32,400-->00:03:34,560
the internet. First, we gave it a text
10400:03:34,560-->00:03:36,159
browser, and this is similar to the deep
10500:03:36,159-->00:03:38,000
research tool. And this is what lets it
10600:03:38,000-->00:03:40,159
really efficiently and quickly read many
10700:03:40,159-->00:03:43,440
web pages um um and search for them. And
10800:03:43,440-->00:03:45,040
we also gave it access to a visual
10900:03:45,040-->00:03:46,319
browser. And this is similar to the
11000:03:46,319-->00:03:48,239
operator tool. And this is what lets it
11100:03:48,239-->00:03:50,159
actually interact with the UI of a web
11200:03:50,159-->00:03:52,720
page. So it can um drag things. It can
11300:03:52,720-->00:03:54,879
use the cursor to click around. It can
11400:03:54,879-->00:03:57,280
open UI components. It can fill out
11500:03:57,280-->00:03:59,920
forms and enter text in text areas.
11600:03:59,920-->00:04:02,560
It's very flexible. So those two tools
11700:04:02,560-->00:04:04,720
are very complementary. And then we also
11800:04:04,720-->00:04:06,720
gave it access to its own terminal so
11900:04:06,720-->00:04:08,720
that it can run code and it can also
12000:04:08,720-->00:04:10,640
generate and analyze files like slide
12100:04:10,640-->00:04:12,879
decks and spreadsheets. And then through
12200:04:12,879-->00:04:14,560
the terminal it's also able to call
12300:04:14,560-->00:04:17,840
APIs. So both public APIs and APIs to
12400:04:17,840-->00:04:19,840
access your private data sources like
12500:04:19,840-->00:04:22,479
Google Drive, Google Calendar, GitHub,
12600:04:22,479-->00:04:25,360
SharePoint and many others um and only
12700:04:25,360-->00:04:26,960
if you explicitly connect them similar
12800:04:26,960-->00:04:28,960
to deep research connectors. And then it
12900:04:28,960-->00:04:31,680
also has access to the image gen API so
13000:04:31,680-->00:04:34,240
it can create nice visuals for um slide
13100:04:34,240-->00:04:36,080
decks and other things as it's working
13200:04:36,080-->00:04:38,240
through its tasks.
13300:04:38,240-->00:04:40,800
How is it deciding which tools to use here?
13400:04:40,800-->00:04:42,560
Yes, we train the model to move between
13500:04:42,560-->00:04:44,160
these capabilities with reinforcement
13600:04:44,160-->00:04:46,080
learning. This is the first model we
13700:04:46,080-->00:04:48,880
trained that has access to this unified
13800:04:48,880-->00:04:52,000
toolbox: a text browser, a GUI browser
13900:04:52,000-->00:04:53,840
and a terminal all in one virtual
14000:04:53,840-->00:04:57,120
machine. To guide its learning, we
14100:04:57,120-->00:04:59,360
created hard tasks that require using
14200:04:59,360-->00:05:01,919
all these tools. This allows the model
14300:05:01,919-->00:05:04,000
not only to learn how to use these
14400:05:04,000-->00:05:06,160
tools, but also when to use which tool
14500:05:06,160-->00:05:08,400
depending on the task at hand. At the
14600:05:08,400-->00:05:10,400
beginning of the training, the model
14700:05:10,400-->00:05:12,880
might attempt to use all these tools to
14800:05:12,880-->00:05:15,600
solve a relatively simple problem. Over
14900:05:15,600-->00:05:17,840
time, as we reward the model for solving
15000:05:17,840-->00:05:20,560
problems correctly and efficiently, the
15100:05:20,560-->00:05:24,080
model will have smarter tool choice.
15200:05:24,080-->00:05:27,360
For example, if you ask a model to, uh,
15300:05:27,360-->00:05:29,039
find a restaurant with specific
15400:05:29,039-->00:05:31,919
requirements and make a reservation, the
15500:05:31,919-->00:05:34,479
model may typically just start a deep
15600:05:34,479-->00:05:36,160
research in the text browser to find
15700:05:36,160-->00:05:39,039
some candidates, then switch to the GUI
15800:05:39,039-->00:05:42,160
browser to view photos of food, uh check
15900:05:42,160-->00:05:45,600
availability, and complete the booking.
16000:05:45,600-->00:05:48,000
Similarly, for a creative task like
16100:05:48,000-->00:05:50,160
creating an artifact, the model will
16200:05:50,160-->00:05:51,680
first search online for public
16300:05:51,680-->00:05:54,479
resources, then switch to the terminal
16400:05:54,479-->00:05:57,039
to do some code editing to compile the
16500:05:57,039-->00:05:59,919
artifact and finally verify the final
16600:05:59,919-->00:06:02,960
outputs in the GUI browser. With this,
16700:06:02,960-->00:06:05,600
we truly feel like we brought together
16800:06:05,600-->00:06:08,240
the best of deep research and operator
16900:06:08,240-->00:06:11,759
and added some extra sparkle.
17000:06:11,759-->00:06:14,000
That's right. Yeah. So to put this
17100:06:14,000-->00:06:15,520
project in context, I want to give a bit
17200:06:15,520-->00:06:18,000
of history. So a few months ago, we
17300:06:18,000-->00:06:20,960
shipped operator in January and this was
17400:06:20,960-->00:06:23,120
our agent that lets you do online tasks
17500:06:23,120-->00:06:25,759
like book reservations and um send
17600:06:25,759-->00:06:27,840
emails and then two weeks later we
17700:06:27,840-->00:06:29,919
shipped deep research and deep research
17800:06:29,919-->00:06:31,919
is a tool that lets you do in-depth
17900:06:31,919-->00:06:35,759
internet research and output high-quality
18000:06:35,759-->00:06:39,280
um um research reports. And after launch
18100:06:39,280-->00:06:41,039
we realized that actually these two
18200:06:41,039-->00:06:42,319
approaches are actually deeply
18300:06:42,319-->00:06:44,160
complementary.
18400:06:44,160-->00:06:46,400
Um for example operator has some trouble
18500:06:46,400-->00:06:48,720
reading super long articles. Um it has
18600:06:48,720-->00:06:50,400
to scroll. It takes a long time. But
18700:06:50,400-->00:06:51,759
that's something that deep research is
18800:06:51,759-->00:06:56,240
good at. Conversely operator uh uh deep
18900:06:56,240-->00:06:58,240
research isn't as good at interacting
19000:06:58,240-->00:07:00,319
with web pages, interactive elements,
19100:07:00,319-->00:07:03,199
visual uh highly visual web pages but
19200:07:03,199-->00:07:04,800
that's something that operator excels
19300:07:04,800-->00:07:08,639
at. So uh yeah we felt these approaches
19400:07:08,639-->00:07:11,120
were complimentary and then we we were
19500:07:11,120-->00:07:13,120
also looking at some customer feedback.
19600:07:13,120-->00:07:14,880
So for example one of our most highly
19700:07:14,880-->00:07:17,120
requested features for deep research was
19800:07:17,120-->00:07:18,960
the ability to log into websites and
19900:07:18,960-->00:07:20,960
access authenticated sources. That's
20000:07:20,960-->00:07:22,880
something that operator can do.
20100:07:22,880-->00:07:24,000
I've been waiting for that for a long
20200:07:24,000-->00:07:24,560
time.
20300:07:24,560-->00:07:26,160
Yeah.
20400:07:26,160-->00:07:28,479
Um another thing is that we were looking
20500:07:28,479-->00:07:29,840
at the prompts that people were trying
20600:07:29,840-->00:07:31,520
for operator and we saw that they were
20700:07:31,520-->00:07:32,880
actually more deep research type
20800:07:32,880-->00:07:35,199
prompts. For example, plan a trip and
20900:07:35,199-->00:07:38,240
then book it. And so, yeah, we we really
21000:07:38,240-->00:07:39,360
feel like we're bringing the best of
21100:07:39,360-->00:07:41,440
both worlds here. And on a personal
21200:07:41,440-->00:07:42,800
note, we've all been friends for a
21300:07:42,800-->00:07:44,160
while, and it's really exciting to be
21400:07:44,160-->00:07:46,479
working together. So, speaking of
21500:07:46,479-->00:07:48,960
matches made in heaven, how is the
21600:07:48,960-->00:07:50,319
wedding planning going?
21700:07:50,319-->00:07:51,759
It's amazing to watch. This is an
21800:07:51,759-->00:07:53,599
example of a task I hate doing. This can
21900:07:53,599-->00:07:55,520
like ruin like, you know, multiple hours
22000:07:55,520-->00:07:56,960
for me as I get sucked into these rabbit
22100:07:56,960-->00:07:58,160
holes. So, just watching this as you
22200:07:58,160-->00:07:59,520
guys have been talking click through
22300:07:59,520-->00:08:01,199
this and just like do the whole thing is
22400:08:01,199-->00:08:03,360
really quite remarkable. Yeah, totally.
22500:08:03,360-->00:08:06,560
Um, looks like it started off by
22600:08:06,560-->00:08:08,560
figuring out the weather. One of the
22700:08:08,560-->00:08:11,280
cool features, um, is that, you know, as
22800:08:11,280-->00:08:12,560
some of these tasks may take a little
22900:08:12,560-->00:08:14,160
bit longer, you can just go back and see
23000:08:14,160-->00:08:15,759
what it was doing. So, that's what we're
23100:08:15,759-->00:08:17,199
exactly going to do. Looks like it went
23200:08:17,199-->00:08:18,720
through the website to use the text
23300:08:18,720-->00:08:21,039
browser. Interestingly, for that, now
23400:08:21,039-->00:08:22,400
it's looking through the suits for
23500:08:22,400-->00:08:23,919
Edward. I think it'll find something
23600:08:23,919-->00:08:25,360
good. Here you can see it switched over
23700:08:25,360-->00:08:27,199
to actually a visual browser to make
23800:08:27,199-->00:08:28,960
sure the suit will look really good on
23900:08:28,960-->00:08:31,280
Edward.
24000:08:31,280-->00:08:34,560
And now looks like yeah, it's got
24100:08:34,560-->00:08:36,880
chugging along, figuring out what to do.
24200:08:36,880-->00:08:39,599
Um, and still on suits and now probably
24300:08:39,599-->00:08:41,919
getting to the gifts section. Um, okay,
24400:08:41,919-->00:08:43,279
cool. So, this is going to take a while.
24500:08:43,279-->00:08:44,959
As Sam said, these tasks sometimes can
24600:08:44,959-->00:08:46,160
take a long time. So, it's going to
24700:08:46,160-->00:08:47,680
continue doing hopefully much faster
24800:08:47,680-->00:08:49,760
than we will do. Um, should we do
24900:08:49,760-->00:08:51,600
something else while it's doing it? I
25000:08:51,600-->00:08:53,519
think the team really wanted the um
25100:08:53,519-->00:08:55,279
stickers, some stickers for the for the
25200:08:55,279-->00:08:56,480
launch. Should we do that?
25300:08:56,480-->00:08:57,279
Yeah, cool.
25400:08:57,279-->00:08:59,040
All right. So, we have a team mascot,
25500:08:59,040-->00:09:00,320
which is one of our colleagues, Bunny
25600