Wasting time at work?

A survey found that U.S. employees waste roughly 20 percent of their working day on average, mostly on personal Internet use and chatting with co-workers. The study found that the reasons vary, including boredom and not having enough work to do.
 
Wasting time at work? You're not alone: survey
A survey found that U.S. workers waste about 20 percent of their working day.
Americans who feel bored and underpaid do work hard -- at surfing the Internet and catching up on gossip, according to a survey that found U.S. workers waste about 20 percent of their working day.
An online survey of 2,057 employees by online compensation company Salary.com found about six in every 10 workers admit to wasting time at work, with the average employee wasting 1.7 hours of a typical 8.5-hour working day.
Personal Internet use topped the list as the leading time-wasting activity according to 34 percent of respondents, with 20.3 percent listing socializing with co-workers and 17 percent conducting personal business as taking up time.
The reasons for wasting time varied: nearly 18 percent of respondents, questioned by e-mail in June and July, said boredom and not having enough to do was the main reason.
Working too many hours was the second most common reason (13.9 percent), followed by being underpaid (11.8 percent) and a lack of challenging work (11.1 percent).
While the amount of time wasted at work seems high, Bill Coleman, chief compensation officer at Salary.com, said the numbers have improved, with the amount of time wasted dropping 19 percent since Salary.com conducted its first annual survey on slacking at work in 2005. Then workers reported wasting 2.09 hours of their working day.
"I think (the decline) is really a result of the economy and that there's more business, more work available and less time to sit around wondering what you are going to do with your day," Coleman told reporters.
 
 
IV. METHOD

In this section, we first present our formulation of VLN as a multi-turn process (Section IV-A). We then describe the ActiveVLN framework (Section IV-B), which enables learning from self-generated trajectories through active exploration (Section IV-B.2) and employs a dynamic early-stopping strategy to enhance RL training efficiency (Section IV-B.3). Finally, we provide engineering details for further acceleration of RL training (Section IV-C).

Fig. 1: Overview of ActiveVLN. In Stage 1, ActiveVLN performs imitation learning (IL) using expert trajectories. In Stage 2, it conducts multi-turn reinforcement learning (RL), autonomously collecting trajectories in the simulator, receiving rewards that encourage progress toward the goal, and updating the policy via GRPO. Key components, including the dynamic early-stopping strategy, scene preloading, and scene caching, are incorporated to ensure efficient training during RL.

A. Multi-Turn Paradigm

Following video-based MLLMs, most prior end-to-end VLN models adopt the single-turn paradigm (Figure 2a), where each action is predicted from the instruction and past observations:

$a_t \sim \pi_\theta(a_t \mid I, V_{1:t})$.   (3)

In contrast, we adopt the multi-turn paradigm (Figure 2b), where actions are modeled autoregressively from both past observations and actions:

$a_t \sim \pi_\theta\big(a_t \mid I, \{(V_i, a_i)\}_{i=1}^{t-1}, V_t\big)$.   (4)

Fig. 2: Comparison between the single-turn and multi-turn paradigms. (a) Single-turn paradigm: each action is predicted from the instruction and past observations only. (b) Multi-turn paradigm: actions are generated autoregressively from the instruction, past observations, and past actions. This allows training signals from future steps to backpropagate and refine earlier predictions.

This paradigm offers several advantages. First, it naturally enables KV-cache reuse for efficient inference. Second, while the single-turn paradigm breaks actions within the same episode into independent pieces, the multi-turn formulation allows actions to be packed together, accelerating training. Most importantly, it enables gradients associated with trajectory outcomes to be propagated to all preceding actions. We find this property to be crucial for the success of RL in VLN (see Section V-F.2).

B. ActiveVLN Framework

1) Compact Action Prediction: The raw VLN-CE action space comprises four primitive actions: FORWARD, TURN LEFT, TURN RIGHT, and STOP. Prior work augments this space by merging consecutive actions of the same type [3], [5]. For instance, three FORWARD steps (each 25 cm) can be merged into a single FORWARD 75cm action. This augmentation diversifies action granularity and shortens episode length, improving training efficiency. Building on this, we adopt action chunking to further reduce episode length: the agent predicts up to three future actions at once rather than a single action per step.

2) Active Exploration: Since MLLMs are not pre-trained for VLN tasks, we start with imitation learning (IL) using a small number of expert demonstrations to initialize the navigation policy. The IL objective is:

$\mathcal{L}_{\text{IL}} = -\sum_{t}\sum_{i=1}^{n_t} \log P(a_t^i \mid I, H_{<t}, a_t^{<i})$,   (5)

where $I$ is the navigation instruction, $H_{<t}$ is the interaction history up to step $t$, and $a_t^{<i}$ are the actions already generated in the current chunk. Formally, the history is:

$H_{<t} = \{V_1, A_1, V_2, A_2, \ldots, V_{t-1}, A_{t-1}, V_t\}$,   (6)

where $V_t$ is the observation at step $t$, and $A_t = \{a_t^1, a_t^2, \ldots, a_t^{n_t}\}$ is the sequence of low-level actions predicted at that step.
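To make the chunked, multi-turn IL objective in Eq. (5) concrete, below is a minimal PyTorch sketch under simplifying assumptions: ToyPolicy is a small GRU stand-in for the MLLM policy, observations and actions are encoded as fixed-size vectors via the hypothetical obs_token/act_token helpers, and conditioning on the instruction is omitted. It illustrates the key property of the paradigm: each low-level action is predicted from the interleaved history of observations and previously executed actions, with teacher forcing realizing the $a_t^{<i}$ conditioning within a chunk.

```python
import torch
import torch.nn.functional as F

# Toy sketch of the multi-turn IL objective (Eq. 5) with action chunking.
# ToyPolicy, OBS_DIM, obs_token, and act_token are illustrative stand-ins; a real
# implementation would tokenize the instruction, images, and actions for the MLLM.

OBS_DIM, N_ACTIONS = 32, 4   # FORWARD, TURN LEFT, TURN RIGHT, STOP

class ToyPolicy(torch.nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = torch.nn.GRU(OBS_DIM + N_ACTIONS, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, N_ACTIONS)

    def forward(self, history):          # history: (1, length, OBS_DIM + N_ACTIONS)
        _, h = self.encoder(history)
        return self.head(h[-1])          # next-action logits, shape (1, N_ACTIONS)

def obs_token(obs):                      # encode V_t as a history entry
    return torch.cat([obs, torch.zeros(N_ACTIONS)])

def act_token(a):                        # encode a low-level action as a history entry
    return torch.cat([torch.zeros(OBS_DIM), F.one_hot(torch.tensor(a), N_ACTIONS).float()])

def il_loss(policy, episode):
    """episode: list of (obs, chunk); chunk is the expert's low-level actions A_t = {a_t^1..a_t^{n_t}}."""
    loss, history = 0.0, []
    for obs, chunk in episode:                                 # outer sum over steps t
        history.append(obs_token(obs))                         # extend H_<t with V_t
        for a in chunk:                                        # inner sum over i = 1..n_t
            logits = policy(torch.stack(history).unsqueeze(0))
            loss = loss - torch.log_softmax(logits, dim=-1)[0, a]
            history.append(act_token(a))                       # teacher-force a_t^{<i} into the context
    return loss

policy = ToyPolicy()
expert_episode = [(torch.randn(OBS_DIM), [0, 0, 1]), (torch.randn(OBS_DIM), [3])]
print(il_loss(policy, expert_episode))   # scalar; optimize with loss.backward() and an optimizer step
```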
IL provides a good starting point, but it has a key limitation: the agent only learns to mimic expert trajectories. Once it encounters unfamiliar situations, it has no mechanism to recover or adapt. To overcome this, we introduce active exploration through reinforcement learning (RL). Here, the agent is no longer restricted to expert data. Instead, it interacts with the environment on its own: given an instruction, it predicts an action, executes it in the simulator, and observes the outcome. By repeating this loop until it issues a stop action (or an exception occurs), the agent actively generates diverse trajectories, learns from its successes and failures, and gradually improves its policy.

For optimization, we use Group Relative Policy Optimization (GRPO) [31]. GRPO samples G candidate trajectories for the same instruction and compares them within the group. Trajectories that perform better than the group average are reinforced, while weaker ones are suppressed. The RL objective is:

$\mathcal{L}_{\text{RL}} = \mathbb{E}_{\{o_i\}}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\min\!\left(\frac{\pi_\theta}{\pi_{\theta_{\text{old}}}}A_{i,t},\ \mathrm{clip}\!\left(\frac{\pi_\theta}{\pi_{\theta_{\text{old}}}},\, 1-\epsilon,\, 1+\epsilon\right)A_{i,t}\right)\right]$,   (7)

where $o_i$ is the i-th trajectory with length $|o_i|$, $\pi_\theta$ and $\pi_{\theta_{\text{old}}}$ are the new and old policies, $\epsilon$ is the clipping parameter, and $A_{i,t}$ is the estimated advantage. We use a soft success reward:

$R = 15 \cdot \mathbb{I}(\text{success}) \cdot \left(1 - \frac{d_{\text{goal}}}{3}\right)$,   (8)

where $d_{\text{goal}}$ is the geodesic distance to the goal. The indicator $\mathbb{I}(\text{success}) = 1$ if the agent issues a valid stop within 3 meters of the goal, and 0 otherwise.

In this way, the agent is no longer just imitating experts but is encouraged to explore actively, discover different ways of reaching the goal, and improve by trial and error. This self-driven learning process is key to stronger generalization in unseen environments.

3) Dynamic Early-Stopping Strategy: Trajectory rollout time can account for over half of total RL training time, making it the primary bottleneck. We observe that excessively long trajectories often dominate this time and, in most cases, correspond to unsuccessful attempts in which the agent wanders aimlessly or explores irrelevant regions. To address this issue, we introduce a dynamic early-stopping strategy that adaptively terminates unpromising rollouts. Specifically, a trajectory is stopped and marked as failed once its length exceeds a threshold $T_{\max}$, defined as:

$T_{\max} = \alpha_{\text{roll}} \cdot |\tau^*|$,   (9)

where $|\tau^*|$ is the length of the oracle (expert) trajectory, and $\alpha_{\text{roll}} > 1$ is a tolerance factor that specifies how much deviation from the oracle is acceptable (we set $\alpha_{\text{roll}} = 2$ in our experiments). This strategy avoids wasting computation on hopeless rollouts while balancing the prevention of excessive exploration against overly strict cutoffs, ultimately leading to more efficient training.

C. Engineering Details in RL

To further accelerate RL training, we adopt several techniques. Scene caching stores frequently accessed scene data in memory, enabling faster loading when the same scene is revisited. Scene preloading pipelines scene loading with policy updates, reducing idle time during training. These techniques cut down scene-loading overhead and improve training efficiency. In addition, similar to [35], we decouple the simulator from the training server by deploying it as a standalone HTTP server, which allows scalable and parallel execution of multiple navigation environments.
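As an illustration of how the reward in Eq. (8) interacts with GRPO's group-relative comparison, the sketch below scores a hypothetical group of G = 4 rollouts of the same instruction. The helpers soft_success_reward and group_relative_advantages are illustrative, and the advantage normalization shown is one common GRPO choice; the full objective would additionally apply the clipped importance ratio of Eq. (7) token by token.

```python
import numpy as np

SUCCESS_RADIUS = 3.0   # metres; a valid stop within this radius counts as success

def soft_success_reward(stopped: bool, d_goal: float) -> float:
    """Soft success reward of Eq. (8): graded by how close the agent stops within 3 m."""
    success = stopped and d_goal < SUCCESS_RADIUS
    return 15.0 * (1.0 - d_goal / SUCCESS_RADIUS) if success else 0.0

def group_relative_advantages(rewards):
    """One common GRPO choice: normalize each reward against the group mean and std."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Hypothetical group of G = 4 rollouts for one instruction: (issued a valid stop?, d_goal).
rollouts = [(True, 0.4), (True, 2.7), (False, 1.1), (True, 5.0)]
rewards = [soft_success_reward(s, d) for s, d in rollouts]
print(rewards)                             # approx. [13.0, 1.5, 0.0, 0.0]
print(group_relative_advantages(rewards))  # A_i, broadcast to every step of rollout i in Eq. (7)
```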
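The dynamic early-stopping rule of Eq. (9) reduces to a single length check inside the rollout loop. A minimal, runnable sketch with illustrative names (should_terminate, ALPHA_ROLL), assuming trajectory length is counted in low-level steps:

```python
ALPHA_ROLL = 2.0   # tolerance factor alpha_roll; the paper uses 2 in its experiments

def should_terminate(steps_taken: int, oracle_len: int, alpha_roll: float = ALPHA_ROLL) -> bool:
    """Cut the rollout (and mark it failed) once its length exceeds T_max = alpha_roll * |tau*|."""
    return steps_taken > alpha_roll * oracle_len

# Example: an episode whose expert trajectory has 12 steps is cut off after 24 steps.
for steps in (12, 24, 25):
    print(steps, should_terminate(steps, oracle_len=12))   # False, False, True
```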
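Scene caching can be pictured as a small in-memory LRU cache wrapped around the expensive scene loader, so revisiting a scene skips disk I/O. The stub _load_scene_from_disk below is a placeholder, not the actual simulator API.

```python
from functools import lru_cache
import time

def _load_scene_from_disk(scene_id: str) -> dict:
    """Placeholder for the simulator's real scene loader (meshes, navmesh, etc.)."""
    time.sleep(0.1)                       # pretend this is slow disk / asset loading
    return {"id": scene_id, "assets": object()}

@lru_cache(maxsize=32)                    # keep recently used scenes resident in memory
def load_scene(scene_id: str) -> dict:
    return _load_scene_from_disk(scene_id)

load_scene("scene_0001")                  # slow: first access actually loads the scene
load_scene("scene_0001")                  # fast: served from the in-memory cache
```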