How to Merge Audio and Video Files Efficiently (Fast and Light on Resources) (Part 1)

Original post · last modified 2025-07-26 12:12:50

License: CC 4.0 BY-SA

First published 2025-07-23 10:21:30

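I often download audio and video as separate files from resource sites, for example audio_file1.m4a and video_1.mp4, and then need to merge the two into a single file. My first reflex was to reach for a third-party editor such as 剪映 (Jianying) to mux them. Unfortunately, although such software may eventually get the job done, it is either paid or too resource-hungry: processing takes a long time, memory usage is high, and the merged file ends up large. This post records how I solved the problem instead.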

Goal: merge the audio and video files shown in the figure.

Tool: 小丸子音视频.exe

S1 Open the software:

Figure 1: Open the software and click 封装 (mux).

S2 Open the 封装 (mux) tab:

Figure 2: Add the downloaded audio, video, and subtitle files, then click 封装 (mux).

S3 Merge succeeded:

Figure 3: The merging process
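If you prefer a scriptable route, the same "mux without re-encoding" idea can be sketched with ffmpeg. This is not necessarily what 小丸子音视频.exe runs internally; it is an alternative that assumes ffmpeg is installed and on PATH, and that the audio and video codecs are ones the MP4 container accepts (typical for .m4a/.mp4 downloads). The helper names `build_mux_command` and `mux` are hypothetical. The key idea is stream copy (`-c:v copy -c:a copy`): the streams are copied into the new container instead of being re-encoded, which is why this kind of merge finishes quickly and the output is roughly the size of the two inputs combined.

```python
import subprocess


def build_mux_command(video, audio, output, subtitle=None):
    """Build an ffmpeg command that muxes separate streams without re-encoding.

    Hypothetical helper; assumes ffmpeg is on PATH.
    """
    cmd = ["ffmpeg", "-y", "-i", str(video), "-i", str(audio)]
    if subtitle is not None:
        cmd += ["-i", str(subtitle)]
    # Take the video stream from input 0 and the audio stream from input 1
    cmd += ["-map", "0:v:0", "-map", "1:a:0"]
    if subtitle is not None:
        cmd += ["-map", "2:s:0"]
    # Stream copy: no decoding or encoding of audio/video
    cmd += ["-c:v", "copy", "-c:a", "copy"]
    if subtitle is not None:
        # ffmpeg's MP4 muxer does not accept SRT as-is, so convert the
        # (text-only, cheap) subtitle stream to mov_text
        cmd += ["-c:s", "mov_text"]
    cmd.append(str(output))
    return cmd


def mux(video, audio, output, subtitle=None):
    """Run the mux; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video, audio, output, subtitle), check=True)
```

For the files in this post, a call would look like `mux("video_1.mp4", "audio_file1.m4a", "merged.mp4")`, optionally with `subtitle="….srt"`. If you write to `.mkv` instead of `.mp4`, the subtitle conversion to `mov_text` is unnecessary, since Matroska can carry SRT directly.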

The original subtitle file, in XML format:

<?xml version="1.0" encoding="utf-8" ?><transcript><text start="6.48" dur="3.36">Good morning. We have a banger for you</text><text start="8.4" dur="3.119">today. We&amp;#39;re going to launch chatbt</text><text start="9.84" dur="2.719">agent. But before jumping into that, I&amp;#39;d</text><text start="11.519" dur="2.561">like to ask the team to introduce</text><text start="12.559" dur="5.281">themselves. Starting with Yosh.</text><text start="14.08" dur="6">Hi, I&amp;#39;m Yash. I work on agent team and</text><text start="17.84" dur="4.72">before that I used to work on operator.</text><text start="20.08" dur="4.32">Hi, I&amp;#39;m Jing. I work on agents research</text><text start="22.56" dur="3.44">previously on deep research.</text><text start="24.4" dur="3.52">Hi, I&amp;#39;m Casey. I&amp;#39;m a researcher on</text><text start="26" dur="4.56">agents formerly operator.</text><text start="27.92" dur="4.72">Hi, I&amp;#39;m Issa. I&amp;#39;m a researcher on agent</text><text start="30.56" dur="4.32">formerly on deep research.</text><text start="32.64" dur="4.16">So we we started launching agents</text><text start="34.88" dur="3.999">earlier this year. 
Uh we launched deep</text><text start="36.8" dur="3.36">research, we launched operator and</text><text start="38.879" dur="3.601">people were very excited about this.</text><text start="40.16" dur="4.48">People could see that now uh AI was</text><text start="42.48" dur="3.599">going off to do complex tasks for them.</text><text start="44.64" dur="3.36">But it became clear to us that what</text><text start="46.079" dur="3.681">people really wanted was for us to bring</text><text start="48" dur="3.92">those capabilities and more together.</text><text start="49.76" dur="5.279">People wanted a unified agent that could</text><text start="51.92" dur="5.44">go off, use its own computer and do real</text><text start="55.039" dur="4.32">complex tasks for them, that could uh</text><text start="57.36" dur="4.16">seamlessly transition from thinking</text><text start="59.359" dur="4">about something to taking actions to</text><text start="61.52" dur="3.84">using lots of tools using the terminal,</text><text start="63.359" dur="3.521">clicking around the web, even producing</text><text start="65.36" dur="3.6">things like spreadsheets and slides and</text><text start="66.88" dur="3.279">and much more. And wanted people want to</text><text start="68.96" dur="3.199">be able to do this over a long time</text><text start="70.159" dur="3.681">horizon and a sort of for universal</text><text start="72.159" dur="4.241">tasks. So the team has been working</text><text start="73.84" dur="4.24">super hard to bring that together. And</text><text start="76.4" dur="3.28">today we have chat with the agent. Um,</text><text start="78.08" dur="3.359">it&amp;#39;s probably easier to show it to you</text><text start="79.68" dur="3.68">than to keep talking about it. It is one</text><text start="81.439" dur="3.841">of the feel the aon moments for me to</text><text start="83.36" dur="4.48">watch it work. So, let&amp;#39;s take a look.</text><text start="85.28" dur="4.64">Awesome. Thanks, Sam. 
Hello, everyone.</text><text start="87.84" dur="3.76">Very excited to share chat GBD agent</text><text start="89.92" dur="3.839">with everybody. And as Sam said, let&amp;#39;s</text><text start="91.6" dur="4.559">just dive right into the demo. Okay, so</text><text start="93.759" dur="5.36">we are on Chad GBD as we all know and</text><text start="96.159" dur="4.721">love. And to turn on the agent mode, you</text><text start="99.119" dur="4.161">just click the tools menu and select</text><text start="100.88" dur="4.16">agent. You can also just type agent in</text><text start="103.28" dur="4.24">the composer bar and it&amp;#39;ll take you to</text><text start="105.04" dur="4.32">agent mode. Um, Edward and I have a</text><text start="107.52" dur="3.599">wedding to go to later this year. Uh,</text><text start="109.36" dur="3.2">it&amp;#39;s for one of our mutual friends.</text><text start="111.119" dur="2.161">Should we should we have the Asian</text><text start="112.56" dur="3.12">planet?</text><text start="113.28" dur="3.519">Yeah, let&amp;#39;s do it. I need an outfit. And</text><text start="115.68" dur="3.039">don&amp;#39;t forget the gift.</text><text start="116.799" dur="3.441">Okay, great. We won&amp;#39;t forget the gift.</text><text start="118.719" dur="2.961">Um, it&amp;#39;s a little bit of a longer</text><text start="120.24" dur="2.559">prompt, so I have it copied in my</text><text start="121.68" dur="4.079">buffer, so I&amp;#39;m just going to go ahead</text><text start="122.799" dur="4.561">and paste it. Um, okay. So, let&amp;#39;s see.</text><text start="125.759" dur="2.881">Let&amp;#39;s see what it says. Our friends are</text><text start="127.36" dur="3.36">getting married later this year, as I</text><text start="128.64" dur="4.239">said, Minia and Sarah. And we want the</text><text start="130.72" dur="4.8">agent to help us find an outfit that</text><text start="132.879" dur="4.961">matches the dress code. 
uh propose a few</text><text start="135.52" dur="5.52">options. Nice mid luxury taking into</text><text start="137.84" dur="5.44">account venue and weather. We also want</text><text start="141.04" dur="4.72">to find us some hotels and as Edward</text><text start="143.28" dur="4.56">said, don&amp;#39;t forget the gift. Um so let&amp;#39;s</text><text start="145.76" dur="4.559">see and</text><text start="147.84" dur="4.8">send the prompt away. As Sam said, agent</text><text start="150.319" dur="4.64">uses a computer. Uh so in the beginning</text><text start="152.64" dur="5.36">it sets up its environment. It it you</text><text start="154.959" dur="4.721">know it&amp;#39;ll take a minute or two or not</text><text start="158" dur="3.44">really 5 seconds to set up its</text><text start="159.68" dur="4.16">environment. And in this case, as you</text><text start="161.44" dur="4.879">see, it understands the prompt. It&amp;#39;s</text><text start="163.84" dur="4.16">asking for me for a clarification. I&amp;#39;m</text><text start="166.319" dur="4.801">just going to let it just continue and</text><text start="168" dur="6.239">work. Anyway, um I think it got confused</text><text start="171.12" dur="4.56">by saying, &amp;quot;Oh, where&amp;#39;s the um what</text><text start="174.239" dur="2.961">exactly is the time of the date of the</text><text start="175.68" dur="4.16">wedding?&amp;quot; I think it&amp;#39;ll figure out using</text><text start="177.2" dur="4.56">the website. Okay, cool. So, now it&amp;#39;s</text><text start="179.84" dur="4.08">kicked off. It&amp;#39;s starting the process,</text><text start="181.76" dur="3.199">the prompt, and it&amp;#39;s open up a browser.</text><text start="183.92" dur="2.88">And to walk you through what&amp;#39;s</text><text start="184.959" dur="4.081">happening, here&amp;#39;s</text><text start="186.8" dur="4.079">Yeah. 
So, as mentioned, we gave the</text><text start="189.04" dur="4.24">agent access to its own virtual</text><text start="190.879" dur="3.841">computer, and the computer has many</text><text start="193.28" dur="2.959">different tools installed, and it can</text><text start="194.72" dur="3.92">choose which to use as it&amp;#39;s working</text><text start="196.239" dur="5.121">through the task. So, in chat GPT, you</text><text start="198.64" dur="5.04">can see a visualization of the agent&amp;#39;s</text><text start="201.36" dur="4.159">computer screen, and you can see</text><text start="203.68" dur="3.52">overlaid its chain of thought in text,</text><text start="205.519" dur="2.961">and that&amp;#39;s what it&amp;#39;s thinking as it&amp;#39;s</text><text start="207.2" dur="3.599">working through the task and deciding</text><text start="208.48" dur="3.92">what to do next. We gave the agent</text><text start="210.799" dur="3.761">access to two different ways to browse</text><text start="212.4" dur="3.759">the internet. First, we gave it a text</text><text start="214.56" dur="3.44">browser, and this is similar to the deep</text><text start="216.159" dur="4">research tool. And this is what lets it</text><text start="218" dur="5.44">really efficiently and quickly read many</text><text start="220.159" dur="4.881">web pages um um and search for them. And</text><text start="223.44" dur="2.879">we also gave it access to a visual</text><text start="225.04" dur="3.199">browser. And this is similar to the</text><text start="226.319" dur="3.84">operator tool. And this is what lets it</text><text start="228.239" dur="4.481">actually interact with the UI of a web</text><text start="230.159" dur="4.72">page. So it can um drag things. It can</text><text start="232.72" dur="4.56">use the cursor to click around. It can</text><text start="234.879" dur="5.041">open UI components. 
It can fill out</text><text start="237.28" dur="5.28">forms and enter text and text areas.</text><text start="239.92" dur="4.8">It&amp;#39;s very flexible. So those two tools</text><text start="242.56" dur="4.16">are very complimentary. And then we also</text><text start="244.72" dur="4">gave it access to its own terminal so</text><text start="246.72" dur="3.92">that it can run code and it can also</text><text start="248.72" dur="4.159">generate and analyze files like slide</text><text start="250.64" dur="3.92">decks and spreadsheets. And then through</text><text start="252.879" dur="4.961">the terminal it&amp;#39;s also able to call</text><text start="254.56" dur="5.28">APIs. So both public APIs and APIs to</text><text start="257.84" dur="4.639">access your private data sources like</text><text start="259.84" dur="5.52">Google Drive, Google Calendar, GitHub,</text><text start="262.479" dur="4.481">SharePoint and many others um and only</text><text start="265.36" dur="3.6">if you explicitly connect them similar</text><text start="266.96" dur="4.72">to deep research connectors. And then it</text><text start="268.96" dur="5.28">also has access to the image gen API so</text><text start="271.68" dur="4.4">it can create nice visuals for um slide</text><text start="274.24" dur="4">decks and other things as it&amp;#39;s working</text><text start="276.08" dur="4.72">through its tasks.</text><text start="278.24" dur="4.32">How is deciding which tools to use here?</text><text start="280.8" dur="3.36">Yes, we train the model to move between</text><text start="282.56" dur="3.52">these capabilities with reinforcement</text><text start="284.16" dur="4.72">learning. This is the first model we</text><text start="286.08" dur="5.92">trained that has access to this unified</text><text start="288.88" dur="4.96">tool box. A text browser, a GUI browser</text><text start="292" dur="5.12">and a terminal all in one virtual</text><text start="293.84" dur="5.52">machine. 
To guide its learning, we</text><text start="297.12" dur="4.799">created hard tasks that require using</text><text start="299.36" dur="4.64">all these tools. This allows the model</text><text start="301.919" dur="4.241">not only to learn how to use these</text><text start="304" dur="4.4">tools, but also when to use which tool</text><text start="306.16" dur="4.24">depending on the task at hand. At the</text><text start="308.4" dur="4.48">beginning of the training, the model</text><text start="310.4" dur="5.2">might attempt to use all these tools to</text><text start="312.88" dur="4.96">solve a relatively simple problem. Over</text><text start="315.6" dur="4.96">time, as we reward the model for solving</text><text start="317.84" dur="6.24">problems correctly and efficiently, the</text><text start="320.56" dur="6.8">model will have smarter tool choice.</text><text start="324.08" dur="4.959">For example, if you ask a model to uh</text><text start="327.36" dur="4.559">find a restaurant with specific</text><text start="329.039" dur="5.44">requirements and make a reservation, the</text><text start="331.919" dur="4.241">model may typically just start a deep</text><text start="334.479" dur="4.56">research in the text browser to find</text><text start="336.16" dur="6">some candidates, then switch to the GUI</text><text start="339.039" dur="6.561">browser to view photos of food, uh check</text><text start="342.16" dur="5.84">availability, and complete the booking.</text><text start="345.6" dur="4.56">Similarly, for creative task like</text><text start="348" dur="3.68">creating an artifact, the model will</text><text start="350.16" dur="4.319">first search online for public</text><text start="351.68" dur="5.359">resources, then switch to the terminal</text><text start="354.479" dur="5.44">to do some code editing to compile the</text><text start="357.039" dur="5.921">artifact and finally verify the final</text><text start="359.919" dur="5.681">outputs in the GUI browser. 
With this,</text><text start="362.96" dur="5.28">we truly feel like we brought together</text><text start="365.6" dur="6.159">the best of deep research and operator</text><text start="368.24" dur="5.76">and added some extra sparkle.</text><text start="371.759" dur="3.761">That&amp;#39;s right. Yeah. So to put this</text><text start="374" dur="4">project in context, I want to give a bit</text><text start="375.52" dur="5.44">of history. So a few months ago, we</text><text start="378" dur="5.12">shipped operator in January and this was</text><text start="380.96" dur="4.799">our agent that lets you do online tasks</text><text start="383.12" dur="4.72">like book reservations and um send</text><text start="385.759" dur="4.16">emails and then two weeks later we</text><text start="387.84" dur="4.079">shipped deep research and deep research</text><text start="389.919" dur="5.84">is a tool that lets you do in-depth</text><text start="391.919" dur="7.361">internet research and output highquality</text><text start="395.759" dur="5.28">um um research reports. And after launch</text><text start="399.28" dur="3.039">we realized that actually these two</text><text start="401.039" dur="3.121">approaches are actually deeply</text><text start="402.319" dur="4.081">complimentary.</text><text start="404.16" dur="4.56">Um for example operator has some trouble</text><text start="406.4" dur="4">reading super long articles. Um it has</text><text start="408.72" dur="3.039">to scroll. It takes a long time. But</text><text start="410.4" dur="5.84">that&amp;#39;s something that deep research is</text><text start="411.759" dur="6.481">good at. 
Conversely operator uh uh deep</text><text start="416.24" dur="4.079">research isn&amp;#39;t as good at interacting</text><text start="418.24" dur="4.959">with web pages interactive elements</text><text start="420.319" dur="4.481">visual uh highly visual web pages but</text><text start="423.199" dur="5.44">that&amp;#39;s something that operator excels</text><text start="424.8" dur="6.32">at. So uh yeah we felt these approaches</text><text start="428.639" dur="4.481">were complimentary and then we we were</text><text start="431.12" dur="3.76">also looking at some customer feedback.</text><text start="433.12" dur="4">So for example one of our most highly</text><text start="434.88" dur="4.08">requested features for deep research was</text><text start="437.12" dur="3.84">the ability to log into websites and</text><text start="438.96" dur="3.92">access authenticated sources. That&amp;#39;s</text><text start="440.96" dur="3.04">something that operator can do.</text><text start="442.88" dur="1.68">I&amp;#39;ve been waiting for that for a long</text><text start="444" dur="2.16">time.</text><text start="444.56" dur="3.919">Yeah.</text><text start="446.16" dur="3.68">Um another thing is that we were looking</text><text start="448.479" dur="3.041">at the prompts that people were trying</text><text start="449.84" dur="3.04">for operator and we saw that they were</text><text start="451.52" dur="3.679">actually more deep research type</text><text start="452.88" dur="5.36">prompts. for example, plan a trip and</text><text start="455.199" dur="4.161">then book it. And so, yeah, we we really</text><text start="458.24" dur="3.2">feel like we&amp;#39;re bringing the best of</text><text start="459.36" dur="3.44">both worlds here. And on a personal</text><text start="461.44" dur="2.72">note, we&amp;#39;ve all been friends for a</text><text start="462.8" dur="3.679">while, and it&amp;#39;s really exciting to be</text><text start="464.16" dur="4.8">working together. 
So, speaking of</text><text start="466.479" dur="3.84">matches made in heaven, how is the</text><text start="468.96" dur="2.799">wedding planning going?</text><text start="470.319" dur="3.28">It&amp;#39;s amazing to watch. This is an</text><text start="471.759" dur="3.761">example of a task I hate doing. This can</text><text start="473.599" dur="3.361">like ruin like, you know, multiple hours</text><text start="475.52" dur="2.64">for me as I get sucked into these rabbit</text><text start="476.96" dur="2.56">holes. So, just watching this as you</text><text start="478.16" dur="3.039">guys have been talking click through</text><text start="479.52" dur="3.84">this and just like do the whole thing is</text><text start="481.199" dur="5.361">really quite remarkable. Yeah, totally.</text><text start="483.36" dur="5.2">Um, looks like it started off by</text><text start="486.56" dur="4.72">figuring out the weather. One of the</text><text start="488.56" dur="4">cool features, um, is that, you know, as</text><text start="491.28" dur="2.88">some of these tasks may take a little</text><text start="492.56" dur="3.199">bit longer, you can just go back and see</text><text start="494.16" dur="3.039">what it was doing. So, that&amp;#39;s what we&amp;#39;re</text><text start="495.759" dur="2.961">exactly going to do. Looks like it went</text><text start="497.199" dur="3.84">through the website to use the text</text><text start="498.72" dur="3.68">browser. Interestingly, for that, now</text><text start="501.039" dur="2.88">it&amp;#39;s looking through the suits for</text><text start="502.4" dur="2.96">Edward. I think it&amp;#39;ll find something</text><text start="503.919" dur="3.28">good. 
Here you can see it switched over</text><text start="505.36" dur="3.6">to actually a visual browser to make</text><text start="507.199" dur="4.081">sure suit will look really good on</text><text start="508.96" dur="5.6">Edward.</text><text start="511.28" dur="5.6">And now looks like yeah, it&amp;#39;s got</text><text start="514.56" dur="5.039">chugging along, figuring out what to do.</text><text start="516.88" dur="5.039">Um, and still on suits and now probably</text><text start="519.599" dur="3.68">getting to the gifts section. Um, okay,</text><text start="521.919" dur="3.04">cool. So, this is going to take a while.</text><text start="523.279" dur="2.881">As Sam said, these tasks sometimes can</text><text start="524.959" dur="2.721">take a long time. So, it&amp;#39;s going to</text><text start="526.16" dur="3.6">continue doing hopefully much faster</text><text start="527.68" dur="3.92">than we will do. Um, should we do</text><text start="529.76" dur="3.759">something else while it&amp;#39;s doing it? I</text><text start="531.6" dur="3.679">think the team really wanted the um</text><text start="533.519" dur="2.961">stickers, some stickers for the for the</text><text start="535.279" dur="2">launch. Should we do that?</text><text start="536.48" dur="2.56">Yeah, cool.</text><text start="537.279" dur="3.041">All right. So, we have a team mascot,</text><text start="539.04" dur="4.239">which is one of our colleagues, Bunny</text><text start="540.32" dur="5.76">Doodle. really really cute tell you. Um</text><text start="543.279" dur="5.201">and we&amp;#39;re going to try and bring um get</text><text start="546.08" dur="4.4">some laptop stickers for everybody. 
Uh</text><text start="548.48" dur="4.64">one of the favorite features for agent</text><text start="550.48" dur="4.56">is given that trajectories can take 15</text><text start="553.12" dur="4">minutes, 20 minutes, 30 minutes</text><text start="555.04" dur="4.08">depending on the complexity of the task.</text><text start="557.12" dur="3.44">Um a lot of times the you might need to</text><text start="559.12" dur="3.36">help the agent. Agent might need to ask</text><text start="560.56" dur="4.48">you clarifications, confirmations and</text><text start="562.48" dur="4.16">things like that. Um so I love to use it</text><text start="565.04" dur="3.12">on the go. So I&amp;#39;m going to use my mobile</text><text start="566.64" dur="3.6">phone to actually send the query this</text><text start="568.16" dur="4.72">time and then see how it goes.</text><text start="570.24" dur="5.279">Okay, so let&amp;#39;s see. Okay, so we are on</text><text start="572.88" dur="5.68">Chad Gibbdi. Uh I have already selected</text><text start="575.519" dur="5.041">the agent mode. I&amp;#39;ve also inputed our uh</text><text start="578.56" dur="4.48">cute mascot and I&amp;#39;m going to quickly</text><text start="580.56" dur="4.719">paste a query. So query says make some</text><text start="583.04" dur="4.88">swag for the team one by one laptop</text><text start="585.279" dur="7.68">stickers and order 500 of them. I&amp;#39;ll</text><text start="587.92" dur="7.359">also say I like sticker mule</text><text start="592.959" dur="4.241">which we have used in the past and send</text><text start="595.279" dur="4.801">it off.</text><text start="597.2" dur="4.88">Okay. So, just like it was doing on the</text><text start="600.08" dur="4">web, it&amp;#39;s going to take some time, think</text><text start="602.08" dur="5.04">about like what&amp;#39;s it doing, and it&amp;#39;ll</text><text start="604.08" dur="4.8">kick off kick off the query. 
And as it&amp;#39;s</text><text start="607.12" dur="4.08">going, it&amp;#39;ll take some time to kick it</text><text start="608.88" dur="3.6">off. Is it Oh, there we go. So, it&amp;#39;ll</text><text start="611.2" dur="3.52">start working on it. Looks like it&amp;#39;s</text><text start="612.48" dur="4.16">starting to create the anime art. It&amp;#39;ll</text><text start="614.72" dur="3.679">probably use image that Isa referred</text><text start="616.64" dur="3.6">earlier on to create hopefully an anime</text><text start="618.399" dur="3.361">art. We&amp;#39;ll see how it comes out. While</text><text start="620.24" dur="2.159">that&amp;#39;s going, anything else we want to</text><text start="621.76" dur="2.96">do?</text><text start="622.399" dur="3.921">Oh, yeah. I also need a pair of shoes</text><text start="624.72" dur="2.64">because my shoes got damaged.</text><text start="626.32" dur="2.24">How did they get damaged?</text><text start="627.36" dur="2.64">Uh, by the rain</text><text start="628.56" dur="2.24">in SF.</text><text start="630" dur="2.16">Yes.</text><text start="630.8" dur="3.44">Cool. All right. Uh, well, let&amp;#39;s get</text><text start="632.16" dur="8.16">Edward a pair of shoes as well. So, oh,</text><text start="634.24" dur="9.279">can you also find us um pair of men&amp;#39;s</text><text start="640.32" dur="3.92">dress black shoes in size</text><text start="643.519" dur="2.481">9.5?</text><text start="644.24" dur="3.68">9.5.</text><text start="646" dur="3.92">So, one of the key capabilities of the</text><text start="647.92" dur="4">model is being able to interrupt. 
I</text><text start="649.92" dur="3.84">think you know as trajectories take long</text><text start="651.92" dur="4.8">time or whatever time it&amp;#39;s really</text><text start="653.76" dur="5.36">important for us to for it to feel very</text><text start="656.72" dur="4.4">multi-turn so the users can interject</text><text start="659.12" dur="3.52">user can direct it user can give it more</text><text start="661.12" dur="3.2">guidance less guidance whatever we want</text><text start="662.64" dur="4.4">to do and that&amp;#39;s what we&amp;#39;re doing here</text><text start="664.32" dur="4.4">we essentially the the model was</text><text start="667.04" dur="3.2">chugging along figuring out all the</text><text start="668.72" dur="3.6">things that we had asked before and in</text><text start="670.24" dur="5.76">this case we essentially said hey can</text><text start="672.32" dur="5.84">you also uh get us a pair of men&amp;#39;s black</text><text start="676" dur="3.839">shoes and now it&amp;#39;s thinking and soon</text><text start="678.16" dur="3.84">enough hopefully it&amp;#39;ll take that into</text><text start="679.839" dur="3.761">account and keep going uh into its</text><text start="682" dur="3.12">trajectory. There we go. So, it said</text><text start="683.6" dur="3.28">acknowledge the interruption. It said,</text><text start="685.12" dur="4.48">&amp;quot;Okay, cool. I&amp;#39;ll also research men&amp;#39;s</text><text start="686.88" dur="4.8">black shoes in size 9.5.&amp;quot; Um, and then</text><text start="689.6" dur="3.52">it&amp;#39;ll probably get on its way. Um, but</text><text start="691.68" dur="2.56">maybe Issa can tell us a little bit more</text><text start="693.12" dur="3.2">about how that works.</text><text start="694.24" dur="3.839">Yeah, sure. 
So, as you can see, the</text><text start="696.32" dur="3.6">agent is very collaborative, and this</text><text start="698.079" dur="3.121">was really important to us when we were</text><text start="699.92" dur="2.96">training the model and building the</text><text start="701.2" dur="3.199">product. If you were asking another</text><text start="702.88" dur="2.639">person to do a task for you that would</text><text start="704.399" dur="2.56">take them a really long time to</text><text start="705.519" dur="3.281">complete, you&amp;#39;d probably give them some</text><text start="706.959" dur="3.681">instructions to start and then they</text><text start="708.8" dur="3.52">might ask you some clarifying questions</text><text start="710.64" dur="2.96">and then they&amp;#39;d start the task and maybe</text><text start="712.32" dur="3.12">realize, oh, they need more</text><text start="713.6" dur="3.28">clarification from you or they need your</text><text start="715.44" dur="3.12">permission to sign into something or do</text><text start="716.88" dur="3.36">something on your behalf and then you</text><text start="718.56" dur="4.08">might realize, oh, I forgot to mention</text><text start="720.24" dur="4">this thing or um what&amp;#39;s your status? How</text><text start="722.64" dur="3.12">are you doing? Can I help redirect you</text><text start="724.24" dur="3.52">if you&amp;#39;re getting along the wrong path</text><text start="725.76" dur="3.92">or something? And so similarly for these</text><text start="727.76" dur="3.759">really longrunning agentic tasks, it&amp;#39;s</text><text start="729.68" dur="3.92">very important that both the user and</text><text start="731.519" dur="4">the agent are able to initiate</text><text start="733.6" dur="3.6">communication with each other so that um</text><text start="735.519" dur="3.841">the agent is able to most effectively</text><text start="737.2" dur="3.36">help you with your tasks. 
And so this is</text><text start="739.36" dur="2.96">something that we actually trained into</text><text start="740.56" dur="3.6">the model. We trained it to be able to</text><text start="742.32" dur="3.92">ask clarifying questions, not every</text><text start="744.16" dur="4.64">single time like deep research. Um we</text><text start="746.24" dur="4.32">also asked it we also trained it to be</text><text start="748.8" dur="3.2">interruptible as Yash just showed. And</text><text start="750.56" dur="2.959">also sometimes it will ask you for</text><text start="752" dur="3.68">clarification and confirmation</text><text start="753.519" dur="4.56">mid-trajectory.</text><text start="755.68" dur="4.8">Yeah. And part of working with agent is</text><text start="758.079" dur="4">that well sometimes it&amp;#39;ll make mistakes.</text><text start="760.48" dur="3.599">And that&amp;#39;s why we felt it was important</text><text start="762.079" dur="3.841">to train the model to ask you for</text><text start="764.079" dur="5.2">confirmation at the last step of</text><text start="765.92" dur="5.599">important steps. Um so for example maybe</text><text start="769.279" dur="4.161">before it&amp;#39;s going to send the email um</text><text start="771.519" dur="3.201">it&amp;#39;ll ask you to take a look at the</text><text start="773.44" dur="2.639">draft and whether it makes sense and</text><text start="774.72" dur="4.48">whether there are any embarrassing</text><text start="776.079" dur="5.281">typos. Um, and if there are, then you</text><text start="779.2" dur="4.24">can either ask it to fix it or you can</text><text start="781.36" dur="4.719">directly take over the browser and jump</text><text start="783.44" dur="5.6">right into the um, agents environment</text><text start="786.079" dur="4.641">and correct it yourself. 
And that way it</text><text start="789.04" dur="4.64">feels collaborative and you can um,</text><text start="790.72" dur="4.4">really work with the agent.</text><text start="793.68" dur="3.599">Should we look at maybe one more demo?</text><text start="795.12" dur="4.48">We&amp;#39;ve got this uh, sort of fun tradition</text><text start="797.279" dur="3.841">in live streams of using uh, using our</text><text start="799.6" dur="3.44">newest models to sort of evaluate</text><text start="801.12" dur="3.12">themselves or do something kind of meta.</text><text start="803.04" dur="4.4">Anything like that we could do?</text><text start="804.24" dur="4.08">Yeah, let&amp;#39;s do it.</text><text start="807.44" dur="2">So um</text><text start="808.32" dur="2">I think people would love to know how</text><text start="809.44" dur="4.48">good the model is.</text><text start="810.32" dur="6.56">Yes. So this is a prompt we previously</text><text start="813.92" dur="5.039">gave the a agent yesterday. So basically</text><text start="816.88" dur="4.079">it asks the model to pull its own</text><text start="818.959" dur="4.481">evalution number from our Google job</text><text start="820.959" dur="4">connector and make some slides. So we</text><text start="823.44" dur="3.92">want to keep it simple like no</text><text start="824.959" dur="5.041">introduction no conclusion just present</text><text start="827.36" dur="4.8">the results with in the charts. As you</text><text start="830" dur="5.12">can see now the model is connecting to</text><text start="832.16" dur="5.44">the Google Drive API and uh then search</text><text start="835.12" dur="4.8">within API it right now it looks like</text><text start="837.6" dur="5.12">the first result is very relevant. So</text><text start="839.92" dur="5.039">it&amp;#39;s reading the first result.</text><text start="842.72" dur="5.2">Now it&amp;#39;s reading the first result uh in</text><text start="844.959" dur="7.841">details. 
Uh let&amp;#39;s accelerate this uh</text><text start="847.92" dur="7.359">replay. So then the model might read</text><text start="852.8" dur="4.159">from the result again and write some</text><text start="855.279" dur="4.24">code.</text><text start="856.959" dur="4.961">So here you can see that the model is</text><text start="859.519" dur="4.961">using the image generation model called</text><text start="861.92" dur="6.159">image generation tool to generate some</text><text start="864.48" dur="5.68">decorations for the slides.</text><text start="868.079" dur="5.32">And let&amp;#39;s see what&amp;#39;s the first slide the</text><text start="870.16" dur="3.239">model made.</text><text start="873.92" dur="4.479">So here the model is writing some code</text><text start="875.92" dur="5.2">that will be compiled to be the final</text><text start="878.399" dur="5.761">slides. So this is the first slide the</text><text start="881.12" dur="5.12">model make in this demo which looks okay</text><text start="884.16" dur="4.08">but it&amp;#39;s not polished enough.</text><text start="886.24" dur="3.92">One of the key feature in reinforcement</text><text start="888.24" dur="4">learning is that the model will re</text><text start="890.16" dur="4.96">review its own results and refine the</text><text start="892.24" dur="5.599">results to to deliver a good final</text><text start="895.12" dur="5.2">results. Let&amp;#39;s see what&amp;#39;s the finally</text><text start="897.839" dur="6.161">what the model give us.</text><text start="900.32" dur="7.199">We can click skip and then the model</text><text start="904" dur="5.04">give us a good uh PowerPoint file. So</text><text start="907.519" dur="6.521">it&amp;#39;s a real PowerPoint that you can</text><text start="909.04" dur="5">download and open it in any software.</text><text start="914.639" dur="7.521">Let&amp;#39;s open it in uh in the office. 
So</text><text start="919.279" dur="4.56">let&amp;#39;s present the slides the model just</text><text start="922.16" dur="4.96">generated.</text><text start="923.839" dur="6.641">First are two intelligence benchmarks.</text><text start="927.12" dur="6.399">Humanities last exam is a benchmark that</text><text start="930.48" dur="6.64">measures AI&amp;#39;s ability to solve a broad</text><text start="933.519" dur="6.801">range of subjects on hard problems. We</text><text start="937.12" dur="6.32">evaluate the models with two settings</text><text start="940.32" dur="5.6">with and without tool use.</text><text start="943.44" dur="5.28">We can see that the agent modes the raw</text><text start="945.92" dur="4.96">intelligence is already pretty nice and</text><text start="948.72" dur="6">with access to all tools nearly double</text><text start="950.88" dur="5.84">the performance to 42%.</text><text start="954.72" dur="4.64">When evaluating models on humanity&amp;#39;s</text><text start="956.72" dur="5.039">last exam, especially with the browsing</text><text start="959.36" dur="5.039">ability, we have a two-layer</text><text start="961.759" dur="5.921">decontamination that ensure that the</text><text start="964.399" dur="5.68">model doesn&amp;#39;t cheat on this benchmark.</text><text start="967.68" dur="4.159">Front TMS is a benchmark that measures</text><text start="970.079" dur="3.601">advanced mathematical reasoning ability</text><text start="971.839" dur="4.161">of models.</text><text start="973.68" dur="4.88">Different from our baseline of mini and</text><text start="976" dur="5.44">03 which use Python with function</text><text start="978.56" dur="4.88">coding. We give the agent model all</text><text start="981.44" dur="4.88">available tools like a browser, a</text><text start="983.44" dur="5.92">computer and a terminal. 
The agent</text><text start="986.32" dur="5.12">achieves new state art of 27% on this</text><text start="989.36" dur="5.08">benchmark with the help of all these</text><text start="991.44" dur="3">tools.</text><text start="994.639" dur="4.88">Next, we evaluated the model on two</text><text start="996.88" dur="4.639">agentic benchmarks. Web arena is a</text><text start="999.519" dur="4.081">benchmark that measures web agents</text><text start="1001.519" dur="5.76">ability so to solve real world web</text><text start="1003.6" dur="7.76">tasks. The agent model improves over</text><text start="1007.279" dur="7.12">previous O3 model that powers the core.</text><text start="1011.36" dur="4.88">Browse comp is a benchmark we introduced</text><text start="1014.399" dur="4.481">earlier this year that measures the</text><text start="1016.24" dur="6.08">browsing agents ability to search and</text><text start="1018.88" dur="4.959">find uh how to locate information.</text><text start="1022.32" dur="3.84">The agent model significantly</text><text start="1023.839" dur="7.84">outperforms 03 and deep research on this</text><text start="1026.16" dur="8.399">benchmark achieving 69% pass rate.</text><text start="1031.679" dur="5.28">Finally, we care about how the users</text><text start="1034.559" dur="5.36">will benefit from our model in the real</text><text start="1036.959" dur="4.96">world. Spreadsheet bench is a benchmark</text><text start="1039.919" dur="4.481">that measures the model&amp;#39;s ability to</text><text start="1041.919" dur="6.16">edit spreadsheets derived from the real</text><text start="1044.4" dur="6.08">world use case. 
Here the agent model</text><text start="1048.079" dur="5.921">with the liberal office and the computer</text><text start="1050.48" dur="6">tool can already solve 30% of the task</text><text start="1054" dur="5.84">when we give the model the access to the</text><text start="1056.48" dur="7.52">raw Excel file in the terminal which</text><text start="1059.84" dur="6.16">further boost the performance to 45%.</text><text start="1064" dur="4">Finally we evated the model on an</text><text start="1066" dur="3.76">internal banking benchmark. The bench</text><text start="1068" dur="4.559">this benchmark evaluated the model&amp;#39;s</text><text start="1069.76" dur="5.919">ability to to conduct first to third</text><text start="1072.559" dur="6.24">year investment bank uh banking analyst</text><text start="1075.679" dur="4.88">tasks such as like putting together a</text><text start="1078.799" dur="5.201">three statement financial model for</text><text start="1080.559" dur="5.601">Fortune uh 500 company in this</text><text start="1084" dur="4.08">benchmark. The agent model significantly</text><text start="1086.16" dur="5.6">outperforms the previous deep research</text><text start="1088.08" dur="5.839">and all three models. As you can see</text><text start="1091.76" dur="4.32">this model is one of the most powerful</text><text start="1093.919" dur="5.041">model we&amp;#39;ve ever trained.</text><text start="1096.08" dur="6.4">It&amp;#39;s not only good on benchmarks, it&amp;#39;s</text><text start="1098.96" dur="5.76">also capable of reasoning, browsing, and</text><text start="1102.48" dur="6">tackling real world tasks at a level</text><text start="1104.72" dur="6.88">that we cannot imagine three months ago.</text><text start="1108.48" dur="4.319">That&amp;#39;s right. 
Um, as Edward said, um, we</text><text start="1111.6" dur="3.68">think we&amp;#39;ve trained a very powerful</text><text start="1112.799" dur="5.441">model and a lot of the power comes from</text><text start="1115.28" dur="4.96">its ability to browse the internet. And</text><text start="1118.24" dur="4.16">as we know, the internet can be a scary</text><text start="1120.24" dur="4.88">place. There are all sorts of hackers</text><text start="1122.4" dur="6.08">trying to steal your information, scams,</text><text start="1125.12" dur="6">uh fishing attempts. Um and agent isn&amp;#39;t</text><text start="1128.48" dur="4.88">immune to all these things. Um one</text><text start="1131.12" dur="4.4">particular thing we&amp;#39;re worried about is</text><text start="1133.36" dur="3.76">a new uh attack called prompt</text><text start="1135.52" dur="4.32">injections.</text><text start="1137.12" dur="4.96">This is where let&amp;#39;s say you ask agent to</text><text start="1139.84" dur="4.56">buy you a book and you give it your</text><text start="1142.08" dur="4.16">credit card information to do that.</text><text start="1144.4" dur="4.159">Agent might stumble upon a malicious</text><text start="1146.24" dur="4.16">website that asks it, &amp;quot;Oh, enter your</text><text start="1148.559" dur="4.24">credit card information here. it&amp;#39;ll help</text><text start="1150.4" dur="4.8">you with your task. An agent, which is</text><text start="1152.799" dur="5.281">trained to be helpful, might decide</text><text start="1155.2" dur="4.56">that&amp;#39;s a good idea.</text><text start="1158.08" dur="4.24">We&amp;#39;ve done a lot of work to try to</text><text start="1159.76" dur="4.48">ensure that this doesn&amp;#39;t happen. 
We&amp;#39;ve</text><text start="1162.32" dur="4.8">trained our model to ignore suspicious</text><text start="1164.24" dur="4.799">instructions on on suspicious websites.</text><text start="1167.12" dur="4.88">We&amp;#39;ve also have uh we also have layers</text><text start="1169.039" dur="4.721">of monitors that kind of peer over the</text><text start="1172" dur="4.48">agent&amp;#39;s shoulder and watch it as it&amp;#39;s</text><text start="1173.76" dur="5.039">going um and stop the trajectory if</text><text start="1176.48" dur="5.439">anything looks suspicious. We can even</text><text start="1178.799" dur="5.361">update these in real time if new attacks</text><text start="1181.919" dur="4">are found in the wild.</text><text start="1184.16" dur="3.6">That said though, you know, this is a</text><text start="1185.919" dur="4.081">cutting edge product. This is a new</text><text start="1187.76" dur="3.52">surface and we can&amp;#39;t stop everything.</text><text start="1190" dur="2.559">And so that&amp;#39;s why I feel it&amp;#39;s very</text><text start="1191.28" dur="4.08">important for the audience to be aware</text><text start="1192.559" dur="4.881">of the risks involved in using agent.</text><text start="1195.36" dur="4.16">And um we encourage users to be</text><text start="1197.44" dur="3.68">proactive in kind of thinking about how</text><text start="1199.52" dur="3.36">they share their information. You know,</text><text start="1201.12" dur="5.679">if it&amp;#39;s highly sensitive information,</text><text start="1202.88" dur="5.919">maybe don&amp;#39;t share that. um maybe um uh</text><text start="1206.799" dur="4">use our features like takeover mode to</text><text start="1208.799" dur="4.081">directly input your credit credit card</text><text start="1210.799" dur="4.88">information into the browser instead of</text><text start="1212.88" dur="5.76">um giving it to agent. 
Um we feel like</text><text start="1215.679" dur="4.801">we&amp;#39;ve built a very powerful product but</text><text start="1218.64" dur="3.12">again it&amp;#39;s important for our users to</text><text start="1220.48" dur="2.8">understand the risk involved.</text><text start="1221.76" dur="3.76">Yeah, I really want to emphasize that I</text><text start="1223.28" dur="3.84">think this is a new level of capability</text><text start="1225.52" dur="3.279">in AI. It&amp;#39;s a new way to use AI, but</text><text start="1227.12" dur="3.679">there will be a new set of attacks that</text><text start="1228.799" dur="4.321">come with that. And society and the</text><text start="1230.799" dur="3.521">technology will have to evolve and learn</text><text start="1233.12" dur="3.039">how we&amp;#39;re going to mitigate things that</text><text start="1234.32" dur="3.04">we can&amp;#39;t even really imagine yet. Uh, as</text><text start="1236.159" dur="3.52">people start doing more and more work</text><text start="1237.36" dur="4.48">this way. Before I wrap up, should we</text><text start="1239.679" dur="2.401">check in on some of the tasks you kicked</text><text start="1241.84" dur="4.319">off?</text><text start="1242.08" dur="6.16">Yeah, let&amp;#39;s do it. Um, okay. So, I am</text><text start="1246.159" dur="5.681">going to open a new tab and make sure</text><text start="1248.24" dur="7.439">that we can see the progress of our um,</text><text start="1251.84" dur="6.319">stickers as well. Okay. Let&amp;#39;s see. All</text><text start="1255.679" dur="5.201">right. So, sounds like stickers are</text><text start="1258.159" dur="5.041">ready. Let me see what it actually Okay.</text><text start="1260.88" dur="5.84">So, cool thing. 
This is sort of the end</text><text start="1263.2" dur="5.28">end result of the took about 7 minutes.</text><text start="1266.72" dur="3.12">Highly likely figured out everything.</text><text start="1268.48" dur="3.199">We&amp;#39;ll go back and look at the trajectory</text><text start="1269.84" dur="3.839">and see how it did. But at the end</text><text start="1271.679" dur="3.681">result, it looks like it&amp;#39;s added to the</text><text start="1273.679" dur="3.681">cart. This is the subtotal. I can just</text><text start="1275.36" dur="4.64">go ahead and look at it and then figure</text><text start="1277.36" dur="4.24">out uh I can just take over at this</text><text start="1280" dur="3.039">point as Casey said to enter my credit</text><text start="1281.6" dur="3.6">card information and then place the</text><text start="1283.039" dur="4.081">order really quickly. model is asking</text><text start="1285.2" dur="4.08">for confirmations, etc. as it&amp;#39;s supposed</text><text start="1287.12" dur="3.919">to do. Let&amp;#39;s just quickly browse through</text><text start="1289.28" dur="4">the trajectory and see what it actually</text><text start="1291.039" dur="4.801">did. Oh, it looks like it generated some</text><text start="1293.28" dur="5.6">stickers. Oh, look at that. That&amp;#39;s what</text><text start="1295.84" dur="4.8">it generated sticker. Cool. So, yeah,</text><text start="1298.88" dur="3.679">that&amp;#39;s the task. I think I can at this</text><text start="1300.64" dur="3.279">point finish up by myself or I can ask</text><text start="1302.559" dur="4.161">the model to actually go ahead and do it</text><text start="1303.919" dur="5.921">for me as well. Let&amp;#39;s check on the</text><text start="1306.72" dur="6">wedding. Okay, great. Looks like it just</text><text start="1309.84" dur="5.68">finished in the nick of time. Uh, okay,</text><text start="1312.72" dur="5.12">cool. 
So in this case, as as we said, we</text><text start="1315.52" dur="6.399">were looking for hotel, stress, uh</text><text start="1317.84" dur="5.68">suits, and also shoes. So it&amp;#39;s come out</text><text start="1321.919" dur="3.921">with a pretty comprehensive report. It</text><text start="1323.52" dur="6.72">looks like wedding venue, date, when it</text><text start="1325.84" dur="5.76">is with the Zilla links, dress codes. It</text><text start="1330.24" dur="2.72">figured out like what the suit</text><text start="1331.6" dur="3.199">recommendation should be, where you can</text><text start="1332.96" dur="4.16">buy. Now I can go ahead and buy myself</text><text start="1334.799" dur="6.161">or I can ask the agent to go and buy for</text><text start="1337.12" dur="6.24">me. Um also figured out footwear hurdle</text><text start="1340.96" dur="6.16">options. It actually looked through all</text><text start="1343.36" dur="6">the oop sorry it looked through all the</text><text start="1347.12" dur="4.32">availability. You can see actually it</text><text start="1349.36" dur="3.76">gives screenshots of what it checked. In</text><text start="1351.44" dur="3.84">this case we use booking.com and it&amp;#39;s</text><text start="1353.12" dur="4.24">able to do that. Also has gift</text><text start="1355.28" dur="4.48">suggestions etc. And next step I can ask</text><text start="1357.36" dur="4.16">it as you said the agent says hey if you</text><text start="1359.76" dur="3.2">need assistance purchasing any item or</text><text start="1361.52" dur="3.36">have any further adjustments let me know</text><text start="1362.96" dur="3.36">so we can do that. 
Um, and I want to</text><text start="1364.88" dur="3.76">show one last demo which we didn&amp;#39;t</text><text start="1366.32" dur="4.96">really run live but I think it&amp;#39;s really</text><text start="1368.64" dur="4.24">cool and especially because the folks</text><text start="1371.28" dur="6.399">who are getting married are really into</text><text start="1372.88" dur="6.799">MLB. U so we asked the agent uh to go</text><text start="1377.679" dur="4.961">and build an optimal itinary for</text><text start="1379.679" dur="5.521">visiting all 30 MLB stadiums in just in</text><text start="1382.64" dur="5.519">case you&amp;#39;re thinking of a satical uh and</text><text start="1385.2" dur="5.76">then design the optimal route prioritize</text><text start="1388.159" dur="4.241">Hello Kitty nights and whatnot and</text><text start="1390.96" dur="2.56">present a final plan as a detailed</text><text start="1392.4" dur="3.04">spreadsheet. I&amp;#39;ll really quickly run</text><text start="1393.52" dur="4.72">through this. Um I think it&amp;#39;s just so</text><text start="1395.44" dur="5.28">fun to see. So again like as we have</text><text start="1398.24" dur="5.679">thrown shown throughout the the live</text><text start="1400.72" dur="5.52">stream it uses a multitude of tools uses</text><text start="1403.919" dur="4.88">container the terminal use using the</text><text start="1406.24" dur="4.16">browser working through all the details.</text><text start="1408.799" dur="4.401">It&amp;#39;ll probably use again back to the</text><text start="1410.4" dur="6.159">browser figuring out Hello Kitty nights</text><text start="1413.2" dur="6.32">and then sports stadium and whatnot. 
Oh</text><text start="1416.559" dur="5.521">let&amp;#39;s see did I miss the Oh go map.</text><text start="1419.52" dur="4.399">building a map using code to actually</text><text start="1422.08" dur="4.079">build it out and then overall we get</text><text start="1423.919" dur="4.961">like a pretty solid result I think at</text><text start="1426.159" dur="4.241">the end takes 25 minutes to work where</text><text start="1428.88" dur="3.039">does the season start and what not you</text><text start="1430.4" dur="5.36">have a spreadsheet that you can quickly</text><text start="1431.919" dur="6">view inside just right inside Chad GBD</text><text start="1435.76" dur="4.64">you can map the journey cool looking map</text><text start="1437.919" dur="4.321">I guess and that&amp;#39;s it so this is Chad</text><text start="1440.4" dur="3.6">GBD agent we hope you really like it and</text><text start="1442.24" dur="3.679">over to Sam</text><text start="1444" dur="3.44">amazing work all of you and and to your</text><text start="1445.919" dur="2.801">teams this is I think uh really</text><text start="1447.44" dur="3.28">something that&amp;#39;s going to help people</text><text start="1448.72" dur="3.52">get worked done uh and have more time to</text><text start="1450.72" dur="2.8">do the things they want to do. Um I</text><text start="1452.24" dur="3.12">think it&amp;#39;s it&amp;#39;s really amazing how much</text><text start="1453.52" dur="4.24">you&amp;#39;ve brought together to deliver this</text><text start="1455.36" dur="3.76">experience and watching the agent sort</text><text start="1457.76" dur="2.88">of use the internet, make these</text><text start="1459.12" dur="3.84">spreadsheets, make PowerPoints, whatever</text><text start="1460.64" dur="5.36">else uh and do all this work is is quite</text><text start="1462.96" dur="5.92">amazing. We&amp;#39;re going live today for pro</text><text start="1466" dur="4.72">plus and team users. 
Pro users will get</text><text start="1468.88" dur="3.84">uh 400 queries a month plus some team</text><text start="1470.72" dur="3.28">users will get 40 a month. Uh the</text><text start="1472.72" dur="3.439">rollout should be finished by the end of</text><text start="1474" dur="4.4">the day for pro and very soon for plus</text><text start="1476.159" dur="4.64">and team users. will try to be live for</text><text start="1478.4" dur="4.96">enterprise and edu by the end of this</text><text start="1480.799" dur="4.561">month. As Casey mentioned, although this</text><text start="1483.36" dur="4.72">is an extremely exciting new technology,</text><text start="1485.36" dur="4.16">there are new risks. Uh people learned</text><text start="1488.08" dur="2.8">how to use the internet generally pretty</text><text start="1489.52" dur="3.36">safely, although of course there are</text><text start="1490.88" dur="3.679">still scammers and other attacks. People</text><text start="1492.88" dur="3.2">are going to need to learn to use AI</text><text start="1494.559" dur="3.36">agents. Uh and societyy&amp;#39;s going to need</text><text start="1496.08" dur="4">to learn to build up defenses against</text><text start="1497.919" dur="4.161">attacks on AI agents as well. So we&amp;#39;re</text><text start="1500.08" dur="4.16">starting with a very robust system, lots</text><text start="1502.08" dur="3.599">of warnings. We will relax that over</text><text start="1504.24" dur="3.36">time as people get more comfortable with</text><text start="1505.679" dur="4.24">it. But we do want people to treat this</text><text start="1507.6" dur="4.48">as a new technology and a new risk</text><text start="1509.919" dur="4.88">surface and use all of the caution that</text><text start="1512.08" dur="4.64">Casey talked about. Um, but that said,</text><text start="1514.799" dur="3.36">we hope you&amp;#39;ll love it. Uh, this is</text><text start="1516.72" dur="3.92">still very early. 
We will improve it</text><text start="1518.159" dur="4.481">rapidly and we&amp;#39;re excited to see where</text><text start="1520.64" dur="5.8">it all goes. So, congrats again. Thank</text><text start="1522.64" dur="3.8">you very much. Hope you enjoy.</text></transcript>

Format conversion
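MP4Box and most players expect a standard subtitle format such as SRT rather than this YouTube-style `<transcript>` XML, so a conversion step is usually needed before muxing. The sketch below shows one way to do that conversion with Python's standard library. It is not the tool the author used. It assumes that `start` and `dur` are in seconds, as in the sample above, and the script and file names in the usage comment are placeholders. The auto-generated captions overlap: the cue at 789.04 s lasts 4.64 s, which runs past the next cue's start at 790.72 s. For that reason, the sketch ends each cue where the next one begins.

```python
import html
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def transcript_xml_to_srt(xml_text: str) -> str:
    """Convert <transcript><text start=".." dur="..">..</text></transcript> into SRT text."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        # The XML parser turns "&amp;#39;" into "&#39;"; html.unescape then turns it into "'"
        text = html.unescape(node.text or "").strip()
        if text:
            cues.append((start, start + dur, text))

    blocks = []
    for i, (start, end, text) in enumerate(cues):
        # Auto-generated captions overlap: cut each cue off where the next one starts
        if i + 1 < len(cues) and cues[i + 1][0] > start:
            end = min(end, cues[i + 1][0])
        blocks.append(f"{i + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python xml2srt.py captions.xml captions.srt  (placeholder names)
    source = Path(sys.argv[1]).read_text(encoding="utf-8")
    Path(sys.argv[2]).write_text(transcript_xml_to_srt(source), encoding="utf-8")
```

Clipping to the next start keeps only one caption on screen at a time. To keep YouTube's rolling two-line style instead, drop the clipping step and let the players stack the overlapping cues.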

Open issues:

The subtitle file could not be merged successfully.
Script for batch merging:

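@echo off
setlocal enabledelayedexpansion

rem Input folder with the downloaded video_*.mp4 / audio_*.mp4 pairs, the output folder,
rem and the mp4box.exe that ships with MarukoToolbox.
rem Save this .bat in the console code page, GBK/936 on Chinese Windows, so the Chinese folder names are read correctly.
set "input_dir=C:\Users\xx\Desktop\12个视频对应的音视频文件"
set "output_dir=C:\Users\xx\Desktop\合并"
set "mp4box_path=F:\Program Files (x86)\MarukoToolbox\tools\mp4box.exe"

if not exist "%output_dir%" ( mkdir "%output_dir%" )

rem For each video_NAME.mp4: drop the 6-character "video_" prefix and the 4-character ".mp4"
rem extension, remove a " (1)" duplicate-download suffix, look for audio_NAME.mp4,
rem and mux the two with MP4Box into NAME_Mux.mp4.
for %%f in ("%input_dir%\video_*.mp4") do (
    set "fullname=%%~nxf"
    set "filename=!fullname:~6,-4!"
    set "filename=!filename: (1)=!"
    set "audio_file=%input_dir%\audio_!filename!.mp4"

    echo Checking: "!audio_file!"
    if exist "!audio_file!" (
        set "output_file=%output_dir%\!filename!_Mux.mp4"
        echo Merging: %%f + !audio_file! into !output_file!
        rem Take track 1, the video, from the video file and every track from the audio file
        "!mp4box_path!" -add "%%f#trackID=1" -add "!audio_file!" -new "!output_file!"
        if !errorlevel! neq 0 ( echo Error merging %%f ) else ( echo Success: !output_file! )
    ) else (
        echo Audio file not found for %%f: [!audio_file!]
    )
)

endlocal
pause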
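The same pairing logic can also be sketched in Python, which makes the substring arithmetic explicit: `!fullname:~6,-4!` drops the 6-character `video_` prefix and the 4-character `.mp4` suffix. This is a minimal sketch, not the author's script. It mirrors the batch file's naming (`video_NAME.mp4` + `audio_NAME.mp4` into `NAME_Mux.mp4`) and its MP4Box arguments, and the MP4Box path is the placeholder from the batch file. It also adds an optional subtitle track under two assumptions. The first is a hypothetical convention: a converted `NAME.srt` sits in the same folder. The second is that MP4Box accepts that file via `-add NAME.srt:lang=eng`. Whether this resolves the subtitle problem listed under the open issues has not been verified.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Placeholder path copied from the batch script; adjust to your installation
MP4BOX = r"F:\Program Files (x86)\MarukoToolbox\tools\mp4box.exe"


def base_name(video: Path) -> Optional[str]:
    """'video_abc (1).mp4' -> 'abc'; None if the name does not fit the pattern."""
    name = video.name
    if not (name.startswith("video_") and name.endswith(".mp4")):
        return None
    # Same as !fullname:~6,-4! followed by !filename: (1)=!
    return name[6:-4].replace(" (1)", "")


def build_command(video: Path, audio: Path, output: Path,
                  subtitle: Optional[Path] = None) -> List[str]:
    """Build the MP4Box argument list used by the batch script, plus an optional subtitle."""
    cmd = [MP4BOX, "-add", f"{video}#trackID=1", "-add", str(audio)]
    if subtitle is not None:
        # Assumption: an SRT file imported as a text track and tagged as English
        cmd += ["-add", f"{subtitle}:lang=eng"]
    return cmd + ["-new", str(output)]


def merge_all(input_dir: Path, output_dir: Path, run: bool = True) -> List[List[str]]:
    """Pair video_NAME.mp4 with audio_NAME.mp4 and optionally run MP4Box on each pair."""
    output_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for video in sorted(input_dir.glob("video_*.mp4")):
        name = base_name(video)
        if name is None:
            continue
        audio = input_dir / f"audio_{name}.mp4"
        if not audio.exists():
            print(f"Audio file not found for {video}: [{audio}]")
            continue
        # Hypothetical convention: NAME.srt next to the media files, used only if present
        srt = input_dir / f"{name}.srt"
        cmd = build_command(video, audio, output_dir / f"{name}_Mux.mp4",
                            srt if srt.exists() else None)
        commands.append(cmd)
        if run:
            result = subprocess.run(cmd)
            status = "Success" if result.returncode == 0 else "Error merging"
            print(f"{status}: {video.name}")
    return commands


if __name__ == "__main__" and len(sys.argv) == 3:
    merge_all(Path(sys.argv[1]), Path(sys.argv[2]))
```

Keeping `build_command` separate from `merge_all` lets you call `merge_all(src, dst, run=False)` and inspect the exact MP4Box command lines before anything is written.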