Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire

by Jeffrey Hunter - OTN

Oracle RAC O2CB Cluster Service

Before we can do anything with OCFS2, such as formatting or mounting the file system, we first need to have OCFS2's cluster stack, O2CB, running (which it will be as a result of the configuration process performed above). The stack includes the following services:

  • NM: Node manager that keeps track of all the nodes in cluster.conf
  • HB: Heartbeat service that issues up/down notifications when nodes join or leave the cluster
  • TCP: Handles communication between the nodes
  • DLM: Distributed lock manager that keeps track of all locks, their owners, and their status
  • CONFIGFS: User-space-driven configuration file system mounted at /config
  • DLMFS: User-space interface to the kernel-space DLM

All of the above cluster services have been packaged in the o2cb system service (/etc/init.d/o2cb). Here is a short listing of some of the more useful commands and options for the o2cb system service.

  • /etc/init.d/o2cb status
    Module "configfs": Not loaded Filesystem "configfs": Not mounted Module "ocfs2_nodemanager": Not loaded Module "ocfs2_dlm": Not loaded Module "ocfs2_dlmfs": Not loaded Filesystem "ocfs2_dlmfs": Not mounted
    Note that with this example, all of the services are not loaded. I did an "unload" right before executing the "status" option. If you were to check the status of the o2cb service immediately after configuring OCFS using ocfs2console utility, they would all be loaded.

  • /etc/init.d/o2cb load
    Loading module "configfs": OK Mounting configfs filesystem at /config: OK Loading module "ocfs2_nodemanager": OK Loading module "ocfs2_dlm": OK Loading module "ocfs2_dlmfs": OK Mounting ocfs2_dlmfs filesystem at /dlm: OK
    Loads all OCFS modules.

  • /etc/init.d/o2cb online ocfs2
    Starting cluster ocfs2: OK
    The above command will online the cluster we created, ocfs2.

  • /etc/init.d/o2cb offline ocfs2
    Cleaning heartbeat on ocfs2: OK
    Stopping cluster ocfs2: OK
    The above command will offline the cluster we created, ocfs2.

  • /etc/init.d/o2cb unload
    Unmounting ocfs2_dlmfs filesystem: OK
    Unloading module "ocfs2_dlmfs": OK
    Unmounting configfs filesystem: OK
    Unloading module "configfs": OK
    The above command will unload all OCFS2 modules.
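Taken together, these commands form a natural restart sequence for the stack. The following is a minimal sketch of a helper script that bounces O2CB cleanly; it assumes the cluster name ocfs2 created earlier and that it is run as root while the OCFS2 volume is unmounted:

    #!/bin/bash
    # restart_o2cb.sh -- sketch: cleanly bounce the O2CB cluster stack
    CLUSTER=ocfs2   # cluster name created earlier in this guide

    # Take the cluster offline and unload all OCFS2 modules
    /etc/init.d/o2cb offline $CLUSTER
    /etc/init.d/o2cb unload

    # Reload the modules and bring the cluster back online
    /etc/init.d/o2cb load
    /etc/init.d/o2cb online $CLUSTER

    # Show the resulting state
    /etc/init.d/o2cb status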

Configure O2CB to Start on Boot

You now need to configure the on-boot properties of the O2CB driver so that the cluster stack services will start on each boot. All the tasks within this section will need to be performed on both nodes in the cluster.

Note: At the time of writing this guide, OCFS2 contains a bug wherein the driver does not get loaded on each boot even after configuring the on-boot properties to do so. After attempting to configure the on-boot properties to start on each boot according to the official OCFS2 documentation, you will still get the following error on each boot:
...
Mounting other filesystems:
mount.ocfs2: Unable to access cluster service
Cannot initialize cluster
mount.ocfs2: Unable to access cluster service
Cannot initialize cluster [FAILED]
...
Red Hat changed the way the service is registered between chkconfig-1.3.11.2-1 and chkconfig-1.3.13.2-1. The O2CB script used to work with the former.

Before attempting to configure the on-boot properties:

  • REMOVE the following lines from /etc/init.d/o2cb:
    ### BEGIN INIT INFO
    # Provides: o2cb
    # Required-Start:
    # Should-Start:
    # Required-Stop:
    # Default-Start: 2 3 5
    # Default-Stop:
    # Description: Load O2CB cluster services at system boot.
    ### END INIT INFO
  • Re-register the o2cb service.
    # chkconfig --del o2cb
    # chkconfig --add o2cb
    # chkconfig --list o2cb
    o2cb 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    # ll /etc/rc3.d/*o2cb*
    lrwxrwxrwx 1 root root 14 Sep 29 11:56 /etc/rc3.d/S24o2cb -> ../init.d/o2cb
    The service should be S24o2cb in the default runlevel. (A scripted version of this workaround is sketched below.)
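If you prefer to script the workaround, here is a minimal sketch (assuming GNU sed and the stock /etc/init.d/o2cb script) that strips the INIT INFO block and re-registers the service; run it as root on both nodes:

    #!/bin/bash
    # fix_o2cb_chkconfig.sh -- sketch of the chkconfig bug workaround

    # Delete everything between (and including) the INIT INFO markers
    sed -i '/^### BEGIN INIT INFO/,/^### END INIT INFO/d' /etc/init.d/o2cb

    # Re-register the service and verify the runlevels
    chkconfig --del o2cb
    chkconfig --add o2cb
    chkconfig --list o2cb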

After resolving this bug, you can continue to set the on-boot properties as follows:

# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver. The following questions will determine whether the driver is loaded on boot. The current values will be shown in brackets ('[]'). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort.
Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: OK

Format the OCFS2 Filesystem

If the O2CB cluster is offline, start it. The format operation needs the cluster to be online, as it needs to ensure that the volume is not mounted on some node in the cluster.
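If you are unsure of the stack's state, here is a quick sketch for bringing it up before formatting (assuming the cluster name ocfs2 configured earlier; run as root):

    # Load the OCFS2 modules and bring the cluster online, then verify
    /etc/init.d/o2cb load
    /etc/init.d/o2cb online ocfs2
    /etc/init.d/o2cb status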

Create the OCFS2 Filesystem

Unlike the other tasks in this section, creating the OCFS2 filesystem should only be executed on one node in the RAC cluster. You will be executing all commands in this section from linux1 only.

Note that it is possible to create and mount the OCFS2 file system using either the GUI tool ocfs2console or the command-line tool mkfs.ocfs2. From the ocfs2console utility, use the menu [Tasks] - [Format].

See the instructions below on how to create the OCFS2 file system using the command-line tool mkfs.ocfs2.

To create the filesystem, use the Oracle executable mkfs.ocfs2. For the purpose of this example, I run the following command only from linux1 as the root user account:

$ su -
# mkfs.ocfs2 -b 4K -C 32K -N 4 -L oradatafiles /dev/sda1
mkfs.ocfs2 1.0.2
Filesystem label=oradatafiles
Block size=4096 (bits=12)
Cluster size=32768 (bits=15)
Volume size=1011675136 (30873 clusters) (246984 blocks)
1 cluster groups (tail covers 30873 clusters, rest cover 30873 clusters)
Journal size=16777216
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing lost+found: done
mkfs.ocfs2 successful
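For reference, the switches used above set the block size (-b), cluster size (-C), number of node slots (-N), and volume label (-L). One way to confirm the volume was created is the mounted.ocfs2 utility shipped with ocfs2-tools; a quick sketch:

    # List OCFS2 volumes detected on this node (quick-detect mode)
    mounted.ocfs2 -d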

Mount the OCFS2 Filesystem

Now that the file system is created, you can mount it. Let's first do it from the command line, then I'll show how to include it in /etc/fstab to have it mounted on each boot. Mounting the filesystem will need to be performed on all nodes in the Oracle RAC cluster as the root user account.

First, here is how to manually mount the OCFS2 file system from the command line. Remember, this needs to be performed as the root user account:

$ su -
# mount -t ocfs2 -o datavolume /dev/sda1 /u02/oradata/orcl

If the mount was successful, you will simply get your prompt back. You should, however, run the following checks to ensure the file system is mounted correctly.

Let's use the mount command to ensure that the new filesystem is really mounted. This should be performed on all nodes in the RAC cluster:

# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
cartman:SHARE2 on /cartman type nfs (rw,addr=192.168.1.120)
configfs on /config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sda1 on /u02/oradata/orcl type ocfs2 (rw,_netdev,datavolume)
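On a system with this many mounts, filtering for just the OCFS2-related entries keeps the check readable; a simple sketch:

    # Show only the OCFS2-related mounts
    mount | grep ocfs2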

Note: You are using the datavolume option to mount the new filesystem here. Oracle database users must mount any volume that will contain the Voting Disk file, Cluster Registry (OCR), data files, redo logs, archive logs, and control files with the datavolume mount option so as to ensure that the Oracle processes open the files with the o_direct flag.

Any other type of volume, including an Oracle home (not used in this guide), should not be mounted with this mount option.

The volume will mount after a short delay, usually around five seconds. It does so to let the heartbeat thread stabilize. In a future release, Oracle plans to add support for a global heartbeat, which will make most mounts instantaneous.

Configure OCFS to Mount Automatically at Startup

Let's review what you've done so far. You downloaded and installed OCFS2, which will be used to store the files needed by Cluster Manager. After going through the install, you loaded the OCFS2 module into the kernel and then formatted the clustered filesystem. Finally, you mounted the newly created filesystem. This section walks through the steps responsible for mounting the new OCFS2 file system each time the machines are booted.

Start by adding the following line to the /etc/fstab file on all nodes in the RAC cluster:

/dev/sda1 /u02/oradata/orcl ocfs2 _netdev,datavolume 0 0

Notice the _netdev option for mounting this filesystem. The _netdev mount option is a must for OCFS2 volumes; it indicates that the volume is to be mounted after the network is started and dismounted before the network is shut down.
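A simple way to confirm the new entry works without rebooting is to unmount the volume and remount it by mount point alone, which forces mount to read the options from /etc/fstab; a sketch, run as root on each node:

    # Unmount, then remount using only the fstab entry
    umount /u02/oradata/orcl
    mount /u02/oradata/orcl

    # Confirm the options picked up from /etc/fstab
    mount | grep ocfs2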

Now, let's make sure that the ocfs2.ko kernel module is being loaded and that the file system will be mounted during the boot process.

If you have been following along with the examples in this article, the actions to load the kernel module and mount the OCFS2 file system should already be enabled. However, you should still check those options by running the following on all nodes in the RAC cluster as the root user account:

$ su -
# chkconfig --list o2cb
o2cb 0:off 1:off 2:on 3:on 4:on 5:on 6:off
The flags for runlevels 3, 4, and 5 should be set to "on".
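If any of those runlevels report "off", chkconfig can switch the service back on in its default runlevels; a minimal sketch:

    # Enable o2cb in its default runlevels, then re-verify
    chkconfig o2cb on
    chkconfig --list o2cb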

Check Permissions on New OCFS2 Filesystem

Use the ls command to check ownership. The permissions should be set to 0775 with owner "oracle" and group "dba". If this is not the case for all nodes in the cluster (which was the case for me), then it is very possible that the "oracle" UID (175 in this example) and/or the "dba" GID (115 in this example) are not the same across all nodes.

Let's first check the permissions:

# ls -ld /u02/oradata/orcl
drwxr-xr-x 3 root root 4096 Sep 29 12:11 /u02/oradata/orcl
As you can see from the listing above, the oracle user account (and the dba group) will not be able to write to this directory. Let's fix that:
# chown oracle.dba /u02/oradata/orcl
# chmod 775 /u02/oradata/orcl
Let's now go back and re-check that the permissions are correct for each node in the cluster:
# ls -ld /u02/oradata/orcl
drwxrwxr-x 3 oracle dba 4096 Sep 29 12:11 /u02/oradata/orcl
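To apply and verify the same ownership on every node in one pass, a small loop over the node names can help. This is only a sketch; it assumes passwordless root ssh between the nodes, which is not something this guide sets up:

    #!/bin/bash
    # fix_ocfs2_perms.sh -- sketch: set ownership/permissions on all nodes
    for node in linux1 linux2; do
        ssh $node "chown oracle.dba /u02/oradata/orcl && chmod 775 /u02/oradata/orcl"
        ssh $node "ls -ld /u02/oradata/orcl"
    done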

Adjust the O2CB Heartbeat Threshold

This is a very important section when configuring OCFS2 for use by Oracle Clusterware's two shared files on our FireWire drive. During testing, I was able to install and configure OCFS2, format the new volume, and finally install Oracle Clusterware (with its two required shared files, the voting disk and OCR file, located on the new OCFS2 volume). I was able to install Oracle Clusterware and see the shared drive; however, during my evaluation I was receiving many lock-ups and hangs after about 15 minutes when the Clusterware software was running on both nodes. It always varied which node would hang (either linux1 or linux2 in my example). It also didn't matter whether there was a high I/O load or none at all for it to crash (hang).

Keep in mind that the configuration you are creating is a rather low-end setup with slow disk access to the FireWire drive. This is by no means a high-end setup, and it is susceptible to bogus timeouts.

After looking through the trace files for OCFS2, it was apparent that access to the voting disk was too slow (exceeding the O2CB heartbeat threshold) and causing the Oracle Clusterware software (and the node) to crash.

The solution I used was to simply increase the O2CB heartbeat threshold from its default setting of 7 to 301 (and in some cases as high as 901). This is a configurable parameter that is used to compute the time it takes for a node to "fence" itself.

First, let's see how to determine what the O2CB heartbeat threshold is currently set to. This can be done by querying the /proc file system as follows:

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
7
The value is 7, but what does this value represent? Well, it is used in the formula below to determine the fence time (in seconds):
[fence time in seconds] = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
So, with an O2CB heartbeat threshold of 7, you would have a fence time of:
(7 - 1) * 2 = 12 seconds
You need a much larger fence time (600 seconds, to be exact) given your slower FireWire disks. For 600 seconds, you will want an O2CB_HEARTBEAT_THRESHOLD of 301, as shown below:
(301 - 1) * 2 = 600 seconds
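The inverse calculation is just as handy when you start from a desired fence time; a one-line sketch of the arithmetic in shell:

    # threshold = fence_time / 2 + 1
    FENCE=600
    echo $(( FENCE / 2 + 1 ))   # prints 301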

Let's see now how to increase the O2CB heartbeat threshold from 7 to 301. This will need to be performed on both nodes in the cluster. You first need to modify the file /etc/sysconfig/o2cb and set O2CB_HEARTBEAT_THRESHOLD to 301:

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=301

After modifying the file /etc/sysconfig/o2cb, you need to alter the o2cb configuration. Again, this should be performed on all nodes in the cluster.

# umount /u02/oradata/orcl/
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Load O2CB driver on boot (y/n) [y]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: OK
You can now check again to make sure the setting took effect for the o2cb cluster stack:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
301

Important Note: The value of 301 used for the O2CB heartbeat threshold will not work for all of the FireWire drives listed in this guide. Use the following chart to determine the O2CB heartbeat threshold value that should be used.

FireWire Drive                                                                   O2CB Heartbeat Threshold Value
Maxtor OneTouch II 300GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G300)    301
Maxtor OneTouch II 250GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G250)    301
Maxtor OneTouch II 200GB USB 2.0 / IEEE 1394a External Hard Drive - (E01A200)    301
LaCie Hard Drive, Design by F.A. Porsche 250GB, FireWire 400 - (300703U)         600
LaCie Hard Drive, Design by F.A. Porsche 160GB, FireWire 400 - (300702U)         600
LaCie Hard Drive, Design by F.A. Porsche 80GB, FireWire 400 - (300699U)          600
Dual Link Drive Kit, FireWire Enclosure, ADS Technologies - (DLX185)             901
Maxtor OneTouch 250GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A250)       600
Maxtor OneTouch 200GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A200)       600

Reboot Both Nodes

Before starting the next section, this would be a good place to reboot all of the nodes in the RAC cluster. When the machines come up, ensure that the cluster stack services are being loaded and the new OCFS2 file system is being mounted:

# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
cartman:SHARE2 on /cartman type nfs (rw,addr=192.168.1.120)
configfs on /config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sda1 on /u02/oradata/orcl type ocfs2 (rw,_netdev,datavolume)
You should also verify that the O2CB heartbeat threshold is set correctly (to our new value of 301):
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
301
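These post-reboot checks lend themselves to a single helper; a minimal sketch that bundles them (run as root on each node after it comes back up):

    #!/bin/bash
    # verify_boot.sh -- sketch: post-reboot sanity checks for O2CB/OCFS2

    # 1. Is o2cb registered for the right runlevels?
    chkconfig --list o2cb

    # 2. Is the OCFS2 volume mounted with the expected options?
    mount | grep ocfs2

    # 3. Did the heartbeat threshold persist across the reboot?
    cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold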

How to Determine OCFS2 Version

To determine which version of OCFS2 is running, use:

# cat /proc/fs/ocfs2/version
OCFS2 1.0.4 Fri Aug 26 12:31:58 PDT 2005 (build 0a22e88ab648dc8d2a1f9d7796ad101c)