The sequence of learning ORACLE.

This article provides an Oracle learning path from SQL to high availability and shares the author's experience. Beginners are advised to study SQL, PL/SQL, database architecture, backup and recovery, performance tuning, and high availability in that order, mastering Oracle through about 1000 hours of study.


The sequence for learning ORACLE is as follows:

SQL -> PL/SQL -> architecture -> backup and recovery -> performance tuning -> high availability.

Based on my past experience, I recommend learning ORACLE in this sequence.

This holds no matter whether you want to take the certification exams or are just taking your first steps as a DBA.

SQL is easier than you think because it is not procedural, so you do not need to tell it how to do something. You only need to tell it what you want done: get data from the database, put data into it, or modify the data that is already there.

SQL is a general language that can be used in any relational database.
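For example, here is what those three things look like in SQL. The EMP demo table and its columns are only illustrative assumptions, not part of your own schema:

    -- Get data out of the database.
    SELECT ename, sal FROM emp WHERE deptno = 10;
    -- Put data into the database.
    INSERT INTO emp (empno, ename, sal, deptno) VALUES (7999, 'SMITH', 3000, 10);
    -- Modify data that is already in the database.
    UPDATE emp SET sal = sal * 1.1 WHERE empno = 7999;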

PL/SQL is the procedural extension of SQL, and it can only be used in ORACLE and related products.
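To give you a feel for the difference, here is a minimal PL/SQL sketch. The EMP table and the 2000 threshold are only illustrative assumptions; the point is that loops, conditions, and variables wrap around plain SQL:

    -- Anonymous PL/SQL block; run it in SQL*Plus with SERVEROUTPUT enabled.
    SET SERVEROUTPUT ON
    DECLARE
      v_total NUMBER := 0;
    BEGIN
      FOR r IN (SELECT sal FROM emp WHERE deptno = 10) LOOP
        IF r.sal > 2000 THEN
          v_total := v_total + r.sal;   -- procedural logic around the query result
        END IF;
      END LOOP;
      DBMS_OUTPUT.PUT_LINE('Total of salaries above 2000: ' || v_total);
    END;
    /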

Then you can learn the database architecture.

Knowing the architecture will help you understand ORACLE better and put it at your disposal.

Understanding the architecture will help you troubleshoot problems effectively.
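A simple way to start looking at the architecture is to query the dynamic performance views. Treat the queries below only as a sketch, since the exact columns vary between versions:

    -- Which instance am I connected to, and is it open?
    SELECT instance_name, status, version FROM v$instance;
    -- How is the SGA carved up among its components?
    SELECT name, ROUND(bytes / 1024 / 1024) AS size_mb
    FROM v$sgainfo
    ORDER BY bytes DESC;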

Backup and recovery is important for a DBA.

Knowing the backup and recovery theory and testing out every possible scenario will also deepen your understanding of the architecture.
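For example, one basic scenario to practise is a full backup followed by a restore check. This is only a minimal RMAN sketch and assumes the database is in ARCHIVELOG mode with default backup settings:

    RMAN> BACKUP DATABASE PLUS ARCHIVELOG;   -- full backup including archived redo logs
    RMAN> LIST BACKUP SUMMARY;               -- confirm what was written
    RMAN> RESTORE DATABASE VALIDATE;         -- check the backups can actually be restored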

Performance tuning is a more profound field in ORACLE.

Different people have different views and different ways of doing it.
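Whichever way you prefer, reading execution plans is a common starting point. The query below is only an illustration against the same assumed EMP and DEPT demo tables:

    EXPLAIN PLAN FOR
      SELECT e.ename, d.dname
      FROM emp e JOIN dept d ON e.deptno = d.deptno
      WHERE d.deptno = 10;
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);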

High availability is not as hard as you think.

As long as you get the chance to set it up a few times, you will find it is not as mysterious as the concept implies.

Follow the sequence I suggest and stick with it for 1000 hours.

You will master ORACLE.
