Failover and Load Balancing in Oracle

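This article describes the advanced features of Oracle Net in detail, including how failover and load balancing are implemented. Failover is divided into connect time failover and transparent application failover (TAF), and concrete configuration examples are given for each. Load balancing is covered from both the client side and the server side.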

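 Today is 2014-03-19. While browsing a forum I came across a particularly valuable article written by another member, so I am reposting it here. Note: (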

Advanced features of Oracle Net include failover and load balancing. These features are interrelated, inasmuch as you usually don’t have one without the other. While they are mostly used in a RAC environment, they can be set up in a single instance environment as well.

FAILOVER:
In the context of Oracle Net, failover refers to the mechanism of switching over to an alternate resource when the connection to the primary resource is terminated for any reason. Connection failures may be broadly categorized as:

  • Those that occur while making the initial connection.
  • Those that occur after a connection has been successfully established.

The first category of connection failure can be dealt with comparatively easily. If your attempt to connect to an instance fails, you can make the connection attempt again, but this time to a backup instance. As long as you have backup instances configured, you can continue trying to connect until a successful connection is established. This process is termed Connect Time Failover.
The second category of connection failure happens after a successful connection has already been established and the connection is subsequently terminated.
In such cases the application would normally have to handle all the details of reconnecting to a backup instance, re-establishing the session environment, and resubmitting any work lost because of the break in the connection. The Oracle Net mechanism that addresses this type of failure is called Transparent Application Failover, or TAF for short.

Ok, now let’s have a look at the mechanism of setting up these features. Let’s start with Connect Time Failover. Although Connect Time Failover is a RAC environment mechanism, you can still use it with non-RAC environments, where you have a mechanism like standby database in place. Connect Time Failover can be achieved by the simple expedient of configuring a Net Service Name through which clients may connect to the standby database, whenever they cannot access the primary database.
This can even be achieved with a single net service name provided you configure
a) Multiple listener addresses within a description
b) Multiple descriptions within a description list
The difference between the two is that while the former uses the same connect data for all the listener addresses, the latter may have separate connect data for each configured listener address. Below, I provide an example of both types of configuration:
Multiple Listener addresses within a description:

SAIBAL.GHOSH=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=TCP)(HOST=TEST_DB)(PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)(HOST=MY_DB)(PORT=1521))
      (FAILOVER=TRUE)
    )
    (CONNECT_DATA=
      (SERVICE_NAME=saibal)
    )
  )


An important issue to be aware of at this point is that Connect Time Failover only works if you are using dynamic registration. This means that the feature won’t work if you have something like this configured in your listener.ora:

SID_LIST_MYLISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=sales.us.acme.com)
      (ORACLE_HOME=/s01/app/oracle/ora/prod)
      (SID_NAME=sghosh)
    )
  )


Any reference to GLOBAL_DBNAME and you can forget about Connect Time Failover. By default, failover is enabled when you specify multiple addresses. However, you can disable failover by specifying (FAILOVER=false). When failover is disabled, Oracle Net attempts to connect using the first address only; if that attempt fails, no further attempts are made and Oracle Net generates an error.
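
The effect of the FAILOVER setting on an address list can be pictured with a small simulation. This is only a sketch of the behavior described above, not Oracle Net code: the function name connect_with_failover, the try_connect callback, and the up/down state of the two listeners are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def connect_with_failover(
    addresses: Sequence[str],
    try_connect: Callable[[str], bool],
    failover: bool = True,
) -> Optional[str]:
    """Return the first address that accepts a connection, or None."""
    # With failover disabled, only the first address is ever attempted.
    candidates = addresses if failover else addresses[:1]
    for address in candidates:
        if try_connect(address):
            return address
    return None


# Hypothetical listener state: TEST_DB is down, MY_DB is up.
listener_up = {"TEST_DB:1521": False, "MY_DB:1521": True}
addresses = ["TEST_DB:1521", "MY_DB:1521"]

with_failover = connect_with_failover(addresses, listener_up.__getitem__)
without_failover = connect_with_failover(
    addresses, listener_up.__getitem__, failover=False
)
print(with_failover)     # MY_DB:1521
print(without_failover)  # None
```

With failover enabled the second address is tried after the first one fails; with it disabled the attempt ends at the first address, which corresponds to the error case described above.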

Multiple descriptions within a description list:

SAIBAL.GHOSH=
  (DESCRIPTION_LIST=
    (FAILOVER=true)
    (LOAD_BALANCE=false)
    (DESCRIPTION=
      (ADDRESS=(PROTOCOL=TCP)(HOST=TEST_DB)(PORT=1521))
      (CONNECT_DATA=
        (SERVICE_NAME=saibal)
      )
    )
    (DESCRIPTION=
      (ADDRESS=(PROTOCOL=TCP)(HOST=MY_DB)(PORT=1521))
      (CONNECT_DATA=
        (SERVICE_NAME=test_saibal)
      )
    )
  )


Notice that in the description list above, I have put (FAILOVER=true) and (LOAD_BALANCE=false). There is no real need to put (FAILOVER=true), as it is the default behavior. (LOAD_BALANCE=false), however, is not the default: client load balancing is enabled by default when a description list contains multiple descriptions, so I have to set it to false explicitly to turn it off. When client load balancing is enabled, Oracle Net randomly chooses a description from the description list to make a client connection.

Now, let’s look at how TAF works. Unlike connect time failover, which is invoked before the connection is made, TAF comes into play after the connection is made (and then, broken). If the connection is lost while the application is running, Oracle Net will transparently reconnect the application to another instance accessing the same database. The failover is done gracefully and uses failover aware APIs built into OCI.
TAF supports two types of failover: SESSION and SELECT. A SESSION failover is not overambitious. It just fails over to a backup instance, and all work in progress at that point is irrevocably lost. SELECT failover is more intricate, inasmuch as it enables some types of read-only application to fail over without losing the work in progress. If a SELECT statement was in progress when the connection was terminated, then once the connection is re-established to a backup instance, Oracle Net re-executes the SELECT statement and positions the cursor so that the client can seamlessly continue fetching rows. But that’s about all that TAF has to offer. It has no mechanism to recover DML statements that were in progress when the failure occurred, and even for SELECT statements you lose global temporary tables, package states and session settings.
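
The idea behind SELECT failover can be illustrated with a toy model. The sketch below is not OCI code; it simply assumes that the backup instance returns the same rows in the same order, so that re-running the query and discarding the rows already delivered puts the cursor back where the client left off. The names run_query, ConnectionLost and fetch_all_with_select_failover are hypothetical.

```python
from typing import Iterator, List


class ConnectionLost(Exception):
    """Raised by the simulated cursor when its instance goes away."""


def run_query(rows: List[int], fail_after: int = -1) -> Iterator[int]:
    """Yield rows in order; raise ConnectionLost once `fail_after` rows were sent."""
    for position, row in enumerate(rows):
        if position == fail_after:
            raise ConnectionLost()
        yield row


def fetch_all_with_select_failover(rows: List[int], fail_after: int) -> List[int]:
    """Fetch every row, surviving one simulated connection loss."""
    fetched: List[int] = []
    cursor = run_query(rows, fail_after)  # simulated primary instance
    while True:
        try:
            fetched.append(next(cursor))
        except StopIteration:
            return fetched
        except ConnectionLost:
            # Re-execute on the simulated backup instance, then skip the rows
            # the client has already received.
            cursor = run_query(rows)
            for _ in range(len(fetched)):
                next(cursor)


result = fetch_all_with_select_failover([10, 20, 30, 40, 50], fail_after=2)
print(result)  # [10, 20, 30, 40, 50]
```

The client sees every row exactly once even though the connection broke after the second fetch. Nothing comparable is modeled for DML or session state, which matches the limitation noted above.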
TAF supports two failover methods: BASIC and PRECONNECT. In the BASIC method, you connect to the backup instance only when the primary connection fails. In the PRECONNECT method, you connect to the backup instance at the same time you connect to the primary instance. This has the obvious benefit of having a backup connection available all of the time, thus reducing the time needed to fail over. The downside is the extra cost, in terms of resources, of keeping a backup connection open all the while.
TAF is configured by adding a FAILOVER_MODE parameter to the CONNECT_DATA parameter for a net service name. Since you cannot configure TAF using the Net Manager, you have to use either OEM or manually edit tnsnames.ora. If you are going to use TAF, then you have to have a backup instance in place. That means you have to configure two net service names: one to connect to the primary instance and the other to connect to the backup instance. The example below shows a TAF configuration where connections to TEST will fail over to TEST_BKUP.

TEST=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=NEW_HOST)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=saibal.ghosh)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(BACKUP=TEST_BKUP))
    )
  )

TEST_BKUP=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=BKUP_HOST)(PORT=1526))
    (CONNECT_DATA=
      (SERVICE_NAME=sbkup.ghosh)
    )
  )


The definition of TEST contains a FAILOVER_MODE entry that specifies the net service name to which a connection is to be made, should the TEST connection happen to fail. In this particular example, I have failed over to an entirely different database, not to another instance accessing the same database. I did this to show that TAF can work either way, but if you are failing over to a different database, then you should keep the two databases in sync by using a mechanism such as a standby database.
Normally, TAF makes only a single attempt to connect to the backup instance. However, if you specify the RETRIES and DELAY parameters, you can force TAF to make multiple connection attempts to the backup instance. The following example shows TAF configured for 10 retries, each at an interval of fifteen seconds.

PROD_BKUP=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=ss-bkup)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=saibal6)
      (FAILOVER_MODE=
        (TYPE=SELECT)(METHOD=BASIC)(BACKUP=saibal6_bkup)
        (RETRIES=10)(DELAY=15)
      )
    )
  )


The RETRIES and DELAY parameters are useful when you are failing over to a standby database, which may take a few moments to be brought up.
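
The interplay of RETRIES and DELAY can be sketched as a simple retry loop. This is an illustration only: it assumes RETRIES counts the total number of attempts and that DELAY is the wait in seconds between two attempts, and the helper names failover_with_retries and standby_ready are hypothetical. The sleep function is injectable so the sketch runs without actually waiting.

```python
import time
from typing import Callable, List


def failover_with_retries(
    connect_backup: Callable[[], bool],
    retries: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the number of attempts used; raise if every attempt fails."""
    for attempt in range(1, retries + 1):
        if connect_backup():
            return attempt
        if attempt < retries:
            sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"backup unreachable after {retries} attempts")


# Hypothetical standby that only accepts connections from the third attempt on.
attempts_seen: List[int] = []


def standby_ready() -> bool:
    attempts_seen.append(1)
    return len(attempts_seen) >= 3


waits: List[float] = []
used = failover_with_retries(standby_ready, retries=10, delay=15, sleep=waits.append)
print(used, waits)  # 3 [15, 15]
```

With a single attempt the connection would have failed outright; with retries the client waits out the two 15-second intervals the standby needs to come up.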
If we want to have the entire failover mechanism configured, so that we can take advantage of both connect time failover, as well as transparent application failover, we can put something like the following in place:

PROD=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=ss01-main)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=ss02-main)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=saibal6)
      (FAILOVER_MODE=
        (TYPE=SELECT)(METHOD=BASIC)(BACKUP=PROD_BKUP)
      )
    )
  )

PROD_BKUP=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=ss02-main)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=saibal6)
    )
  )

Now, let’s analyze the scenario above. The failover setup is from ss01-main to ss02-main. But a problem occurs when the initial connection is made to ss02-main because of connect time failover: we are then already connected to the backup node. If that connection fails, we have had it; there is nowhere else to go. And what’s more, if you are thinking of using a description list with several descriptions, that won’t work either. TAF settings are picked up from the first CONNECT_DATA entry encountered, so the other descriptions in the description list become useless.

LOAD BALANCING
Load balancing may be defined as distributing a job or piece of work over multiple resources. RAC is an ideal environment for distributing load amongst multiple instances accessing the same physical database. Other environments may also be suitably configured for load balancing, and in the following few paragraphs I show how it can be set up.
CLIENT LOAD BALANCING: You can configure load balancing either at the client end or at the server end. Client load balancing is configured at the net service name level, simply by providing multiple addresses in an address list, or multiple descriptions in a description list. When load balancing is enabled, Oracle Net chooses an address to connect to in random order rather than sequential order. As a result, clients connect through addresses picked at random, and no single address is overloaded. Significantly, though, connecting clients through randomly chosen addresses does not guarantee an even distribution of workload at the server end. For that, you need to configure load balancing at the server end, which is discussed below.
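
The difference between sequential and random address selection can be sketched as follows. This is a simulation under stated assumptions, not Oracle Net's actual algorithm: the function address_order is hypothetical, and the fixed random seed is there only to make the run repeatable.

```python
import random
from collections import Counter
from typing import List, Sequence


def address_order(
    addresses: Sequence[str], load_balance: bool, rng: random.Random
) -> List[str]:
    """Return the order in which a client would try the addresses."""
    order = list(addresses)
    if load_balance:
        rng.shuffle(order)  # random order instead of sequential
    return order


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
addresses = ["ss01-main:1521", "ss02-main:1521"]

# With load balancing off, every client starts at the first address.
sequential = address_order(addresses, False, rng)

# With load balancing on, the first address tried varies from client to client.
first_choices = Counter(
    address_order(addresses, True, rng)[0] for _ in range(1000)
)
print(sequential)
print(first_choices)
```

The counter shows how many of the 1000 simulated clients started at each listener address. It says nothing about how busy each instance is, which is exactly the limitation described above.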
CONNECTION LOAD BALANCING: This feature improves connection performance by allowing the listener to distribute new connections across different dispatchers and instances. The listener is in a position to do so because, thanks to dynamic registration, it knows the current load of the instances and dispatchers. This allows the listener to balance the load across dispatchers and instances when handling client connection requests.
Connection load balancing may be done in:

  • Single instance shared server configuration
  • Multiple instance shared server configuration
  • Multiple instance dedicated server configuration

Load balancing is done in the following order. In the case of a dedicated server configuration, it is:

  • Least loaded node
  • Least loaded instance

In case of shared server, a listener selects a dispatcher in the following order:

  • Least loaded node
  • Least loaded instance
  • Least loaded dispatcher for that instance
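
The selection order listed above can be expressed as a lexicographic comparison. The sketch below is only a model of that ordering: the Dispatcher record, its load fields, and the load figures are hypothetical, and they are not the metrics a real listener keeps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dispatcher:
    node: str
    instance: str
    name: str
    node_load: int
    instance_load: int
    dispatcher_load: int


def pick_dispatcher(candidates: List[Dispatcher]) -> Dispatcher:
    """Pick by least loaded node, then instance, then dispatcher."""
    # Tuple comparison applies the three criteria in that order.
    return min(
        candidates,
        key=lambda d: (d.node_load, d.instance_load, d.dispatcher_load),
    )


# Hypothetical load figures registered with the listener.
candidates = [
    Dispatcher("node1", "inst1", "D000", 40, 25, 10),
    Dispatcher("node1", "inst1", "D001", 40, 25, 5),
    Dispatcher("node2", "inst2", "D000", 30, 30, 12),
    Dispatcher("node2", "inst2", "D001", 30, 30, 7),
]
chosen = pick_dispatcher(candidates)
print(chosen.node, chosen.instance, chosen.name)  # node2 inst2 D001
```

node2 wins because its node load is lowest, even though inst1 has the lower instance load; within inst2 the less busy dispatcher D001 is chosen. Dropping the last element of the key gives the two-step order used for dedicated server configurations.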

Connection load balancing may also be combined with client load balancing to get the benefit of both. While the listener load-balances connections across dispatchers and instances, client load balancing distributes the work of handling new connections over more than one listener. Add failover at the client end and you have a robust system that insulates clients from potential connection failures while keeping an eye on performance.

 
