RAC: Real Application Clusters introduction
How Oracle RAC works:
In a RAC environment, all servers use and manage the same database; the goal is to spread the workload across every server. The hardware requires at least two servers plus a shared storage device. Two kinds of software are also needed: cluster software, and the RAC component of the Oracle database. All servers should run the same operating system.
Depending on the load-balancing policy, when a client sends a request to one server's listener, that server may hand the request to its local RAC component or forward it to the RAC component on another server. After the request is processed, RAC accesses the shared storage through the cluster software.
Logically, each node in the cluster runs its own instance, and all instances access the same database. Nodes communicate through the cluster software's communication layer. To reduce I/O, a global cache service is used, so each database instance holds a matching copy of the cached database blocks.
Key characteristics of RAC:
- Each node's instance has its own SGA, background processes, redo logs, and undo tablespace
- All nodes share one set of datafiles and controlfiles
Oracle Cache Fusion:
- Keeps the caches consistent
- Reduces shared-disk I/O
How Cache Fusion works:
- One node reads a block from the shared database into its db cache
- That block is then cross-copied into the db cache of every node
- When the cached block is modified on any node, the modification is propagated between the nodes' caches
- To keep storage consistent, the final modified result is also written to disk
Clusterware components:
Four services:
- crsd - Cluster Ready Services (cluster resource management)
- cssd - Cluster Synchronization Services
- evmd - Event Manager
- oprocd - node monitoring
Three kinds of resources:
- VIP - Virtual IP address
- OCR - Oracle Cluster Registry, which records information about each node
- Voting Disk - establishes quorum; it arbitrates simultaneous writes from multiple nodes to shared storage so that conflicts are avoided
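The majority rule behind the voting disk can be illustrated with a small shell sketch (a toy illustration only; `votes_needed` is my own helper name, not an Oracle tool):

```shell
#!/bin/sh
# Majority quorum: with N voting disks, a node must be able to see
# floor(N/2) + 1 of them to remain in the cluster.
votes_needed() {
    n=$1
    echo $(( n / 2 + 1 ))
}

votes_needed 1   # 1 disk  -> must see 1
votes_needed 3   # 3 disks -> must see 2
votes_needed 5   # 5 disks -> must see 3
```

This is why voting disks are normally configured in odd numbers: losing access to a minority of them does not evict the node.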
RAC components:
RAC adds extra background processes to maintain the database:
- LMS - Global Cache Service Process
- LMD - Global Enqueue Service Daemon
- LMON - Global Enqueue Service Monitor
- LCK0 - Instance Enqueue Process
Planning
Topology diagram:
IP plan

| Node | Public IP (bond0) | Heartbeat (eth2) | Private IP | System | Hostname | Memory |
| --- | --- | --- | --- | --- | --- | --- |
| RAC1 | 192.168.100.241/24 | 192.168.90.1/24 | eth3: 192.168.80.1/24 | CentOS 6.4 | rac1.example.com | 3G |
| RAC2 | 192.168.100.242/24 | 192.168.90.2/24 | eth3: 192.168.80.2/24 | CentOS 6.4 | rac2.example.com | 3G |
| Storage | - | - | bond0: 192.168.80.3/24 | CentOS 6.4 | iscsi | 512M |
Storage disk plan

| Storage component | File system | Volume size | ASM disk group name | ASM redundancy | ASM disk |
| --- | --- | --- | --- | --- | --- |
| OCR / voting disk | ASM | 2G | +CRS | External | DISK1 |
| Database files | ASM | 40G | +RACDB_DATA | External | DISK2 |
| Fast recovery area | ASM | 40G | +FRA | External | DISK3 |
Pre-installation preparation
Environment: CentOS 6.4
Oracle 11.2.0.4
Attachments:
hosts
cvuqdisk-1.0.9-1.rpm
pdksh-5.2.14-1.i386.rpm
plsql + instantclient_11_2 installation package
Building the backend storage with iSCSI
```
[root@iscsi ~]#
Target 1: iqn.disk1
Target 2: iqn.disk2
Target 3: iqn.disk3
[root@iscsi ~]#
```
Partition the disks after rac1 and rac2 have attached them
```
[root@rac1 software]#
Disk /dev/sdb: 2147 MB, 2147483648 bytes
Disk /dev/sdc: 42.9 GB, 42949672960 bytes
Disk /dev/sdd: 42.9 GB, 42949672960 bytes
[root@rac1 software]#
```
```
[root@rac1 ~]#
rac1.example.com
[root@rac1 ~]#
[root@rac2 ~]#
rac2.example.com
[root@rac2 ~]#
[root@rac1 ~]#
PING rac2 (192.168.100.242) 56(84) bytes of data.
64 bytes from rac2 (192.168.100.242): icmp_seq=1 ttl=64 time=0.622 ms
64 bytes from rac2 (192.168.100.242): icmp_seq=2 ttl=64 time=0.369 ms
```
Name resolution must be configured on both rac1 and rac2
```
[root@rac1 ~]#
192.168.100.241 rac1 rac1.example.com
192.168.100.242 rac2 rac2.example.com
192.168.100.244 rac1-vip
192.168.100.245 rac2-vip
192.168.100.246 racscan
192.168.90.1 rac1-priv
192.168.90.2 rac2-priv
192.168.80.1 rac1-s
192.168.80.2 rac2-s
192.168.80.3 iscsi
[root@rac1 ~]#
```
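A quick sanity check of the hosts file can be scripted (my own sketch; the `HOSTS_FILE` path and sample content below are illustrative -- on a real node, point it at /etc/hosts instead of writing the sample):

```shell
#!/bin/sh
# Verify that every name the RAC install expects is present in the hosts file.
HOSTS_FILE=${HOSTS_FILE:-/tmp/hosts.sample}

# Sample copy for illustration; on a real node use HOSTS_FILE=/etc/hosts.
cat > "$HOSTS_FILE" <<'EOF'
192.168.100.241 rac1 rac1.example.com
192.168.100.242 rac2 rac2.example.com
192.168.100.244 rac1-vip
192.168.100.245 rac2-vip
192.168.100.246 racscan
192.168.90.1 rac1-priv
192.168.90.2 rac2-priv
EOF

missing=0
for name in rac1 rac2 rac1-vip rac2-vip racscan rac1-priv rac2-priv; do
    grep -qw "$name" "$HOSTS_FILE" || { echo "MISSING: $name"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all names present"
```

Running this on both nodes before launching the installer catches the resolution problems (INS-40718, INS-40912) described later.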
Disable the firewall and SELinux
Requirements: at least 2 GB of RAM; swap: 1x to 1.5x RAM for systems with up to 16 GB of RAM, and a flat 16 GB of swap beyond that
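The swap rule quoted above can be sketched as a small shell helper (my own illustration; `recommended_swap_gb` is not an Oracle utility, and the 1.5x branch follows the upper bound of the rule):

```shell
#!/bin/sh
# Swap sizing per the rule above: up to 16 GB RAM -> 1x to 1.5x RAM
# (the 1.5x upper bound is computed here); above 16 GB RAM -> 16 GB swap.
recommended_swap_gb() {
    ram_gb=$1
    if [ "$ram_gb" -le 16 ]; then
        echo $(( ram_gb * 3 / 2 ))   # 1.5x, integer arithmetic
    else
        echo 16
    fi
}

recommended_swap_gb 3    # the 3 GB RAC nodes in this plan
recommended_swap_gb 32   # large host, capped at 16
```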
```
[root@rac1 software]#
Linux rac1 2.6.32-358.el6.x86_64
[root@rac1 ~]#
[root@rac1 ~]#
[root@rac1 ~]#
tmpfs    10G    0    10G    0%    /dev/shm
[root@rac1 ~]#
```
```
[root@rac1 ~]#
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
oracle soft stack 10240
grid soft nproc 2047
grid hard nproc 16384
grid soft nofile 1024
grid hard nofile 65536
grid soft stack 10240
[root@rac1 ~]#
session required pam_limits.so
[root@rac1 ~]#
[root@rac1 ~]#
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmall = 2097152
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
[root@rac1 ~]#
```
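To avoid typos in the kernel parameters, a simple presence check can be scripted before running the installer's prerequisite checks (my own sketch; `SYSCTL_FILE` and the sample fragment are illustrative -- point it at /etc/sysctl.conf on a real node):

```shell
#!/bin/sh
# Check that every kernel parameter the Oracle installer validates is
# present in a sysctl-style file.
SYSCTL_FILE=${SYSCTL_FILE:-/tmp/sysctl.sample}

# Sample fragment for illustration; on a real node use /etc/sysctl.conf.
cat > "$SYSCTL_FILE" <<'EOF'
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmall = 2097152
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
EOF

missing=0
for key in fs.aio-max-nr fs.file-max kernel.shmall kernel.shmmax \
           kernel.shmmni kernel.sem net.ipv4.ip_local_port_range \
           net.core.rmem_default net.core.rmem_max \
           net.core.wmem_default net.core.wmem_max; do
    grep -q "^$key" "$SYSCTL_FILE" || { echo "MISSING: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all kernel parameters present"
```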
```
[root@rac1 ~]#
[oracle@rac1 ~]$ tail -n 8 .bash_profile
export ORACLE_BASE=/u01/app/oracle
export ORACLE_SID=racdb1
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_UNQNAME=racdb
umask 022
[oracle@rac1 ~]$ source .bash_profile
[oracle@rac1 ~]$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/db_1
[oracle@rac1 ~]$ echo $ORACLE_BASE
/u01/app/oracle
[oracle@rac1 ~]$
[grid@rac1 ~]$ tail -n 7 .bash_profile
export ORACLE_BASE=/u01/app/grid
export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/app/11.2.0/grid
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
umask 022
[grid@rac1 ~]$ source .bash_profile
[grid@rac1 ~]$ echo $ORACLE_BASE
/u01/app/grid
[grid@rac1 ~]$ echo $ORACLE_HOME
/u01/app/11.2.0/grid
[grid@rac1 ~]$
[grid@rac2 ~]$ grep SID .bash_profile
export ORACLE_SID=+ASM2
[grid@rac2 ~]$
[oracle@rac2 ~]$ grep SID .bash_profile
export ORACLE_SID=racdb2
[oracle@rac2 ~]$
```
Perform the steps above on the second node as well.
```
[grid@rac1 sshsetup]$ pwd
/software/grid/sshsetup
[grid@rac1 sshsetup]$ ./sshUserSetup.sh -user grid -hosts rac2.example.com -advanced -exverify -confirm -noPromptPassphrase
[grid@rac1 sshsetup]$
```
Check whether the situation below occurs; if it does, the SSH equivalence test will report an INS-06006 error.
Install cvuqdisk so that the shared disks can be discovered (install it on both rac1 and rac2).
Binding the ASM disks with udev
```
declare -i num=0
for i in b c d; do
  let num=$num+1
  echo "KERNEL==\"sd*\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/\$name\", RESULT==\"`/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i`\", NAME=\"asm-disk$num\", OWNER=\"grid\", GROUP=\"asmadmin\", MODE=\"0660\"" >> /etc/udev/rules.d/12-oracle-asmdevices.rules
done
```
```
[root@rac1 ~]#
Starting udev:                                             [  OK  ]
[root@rac1 ~]#
```
```
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="1IET_00010001", NAME="asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="1IET_00020002", NAME="asm-disk2", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="1IET_00030003", NAME="asm-disk3", OWNER="grid", GROUP="asmadmin", MODE="0660"
```
Once the disks are bound this way, Linux no longer exposes them as /dev/sdb, /dev/sdc, and /dev/sdd. After binding the disks on rac1, run the same script on rac2, and make sure the scsi_id (RESULT) values produced by PROGRAM match those on node one.
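The "RESULT values must match across nodes" requirement can be checked mechanically (my own sketch using two sample rule files; on real nodes, compare the actual /etc/udev/rules.d/12-oracle-asmdevices.rules from rac1 and rac2):

```shell
#!/bin/sh
# Compare the scsi_id values embedded in the udev rules of two nodes.
# If they differ, the asm-disk* names will not line up across the cluster.
extract_ids() {
    # Pull the value inside RESULT=="..." from each rule line.
    sed -n 's/.*RESULT=="\([^"]*\)".*/\1/p' "$1"
}

# Sample files standing in for the rules copied from each node.
cat > /tmp/rules.rac1 <<'EOF'
KERNEL=="sd*", RESULT=="1IET_00010001", NAME="asm-disk1"
KERNEL=="sd*", RESULT=="1IET_00020002", NAME="asm-disk2"
EOF
cp /tmp/rules.rac1 /tmp/rules.rac2

if [ "$(extract_ids /tmp/rules.rac1)" = "$(extract_ids /tmp/rules.rac2)" ]; then
    echo "scsi_id values match"
else
    echo "scsi_id values differ - fix before continuing"
fi
```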
Note: if the group owner of the bound disks is not asmadmin, change it manually; otherwise database creation (dbca) will fail.
Installing Grid Infrastructure
Log in as the grid user
```
[grid@rac1 grid]$ export DISPLAY=192.168.100.251:0.0
[grid@rac1 grid]$ ./runInstaller
```
racscan must be resolvable through /etc/hosts, otherwise the installer reports:
```
[INS-40718] Single Client Access Name (SCAN):RACSCAN1 could not be resolved.
```
This works even though racscan is not an address that physically exists yet.
Both Test and Setup should pass.
```
[INS-40912] Virtual host name: rac1-vip is assigned to another system on the network.
```
Reference: https://community.oracle.com/thread/2594182
If the error above appears, check whether the virtual IPs listed in /etc/hosts are already in use on the network (try pinging them).
In the grid user's environment, $ORACLE_HOME must not be a subdirectory of $ORACLE_BASE, otherwise the installer reports:
```
ORACLE 11G RAC [INS-32026] The Software Location specified should not
```
Because the disks are bound with udev here, this warning can be ignored.
```
[root@rac1 ~]#
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@rac1 ~]#
[root@rac1 ~]#
```
If running the script above fails with an error, the workaround is to run /u01/app/11.2.0/grid/root.sh again; the second run completes without problems.
Execution log: /u01/app/11.2.0/grid/root.sh on rac1
Run both scripts on rac1 and rac2 (/u01/app/oraInventory/orainstRoot.sh, then /u01/app/11.2.0/grid/root.sh).
If the following appears during the Install Product step, it can be ignored.
If this error appears but the racscan address (192.168.100.246) responds to ping, it can likewise be ignored.
```
[grid@rac1 grid]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@rac1 grid]$ olsnodes -n
rac1    1
rac2    2
[grid@rac1 grid]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
[grid@rac1 grid]$ srvctl status asm -a
ASM is running on rac2,rac1
ASM is enabled.
[grid@rac1 grid]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2592
         Available space (kbytes) :     259528
         ID                       : 1975731354
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
[grid@rac1 grid]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   65b9ef5913044fdbbff0d1b75e91172e (/dev/asm-disk1) [CRS]
Located 1 voting disk(s).
[grid@rac1 grid]$
```
Creating the ASM disk groups
Log in as the grid user
```
[grid@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[grid@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[grid@rac1 ~]$ asmca
```
Installing the database software
Log in as the oracle user
```
[oracle@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[oracle@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[oracle@rac1 ~]$ cd /software/database
[oracle@rac1 database]$ ./runInstaller
```
Set up SSH equivalence for the oracle user (same procedure as for the grid user earlier)
```
[oracle@rac1 sshsetup]$ ./sshUserSetup.sh -user oracle -hosts rac2.example.com -advanced -exverify -confirm -noPromptPassphrase
```
Run the following script on both rac1 and rac2
```
[root@rac1 ~]#
Performing root user operation for Oracle 11g
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/11.2.0/db_1
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Finished product-specific root actions.
[root@rac1 ~]#
```
```
[root@rac1 ~]#
[grid@rac1 ~]$ crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.CRS.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1
ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1
ora....DATA.dg ora....up.type 0/5    0/     ONLINE    ONLINE    rac1
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE
ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1
ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2
ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1
[grid@rac1 ~]$
```
Creating the database
Create the database (log in as the oracle user)
```
[oracle@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[oracle@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[oracle@rac1 ~]$ dbca
```
If SGA and PGA memory are too small, database creation fails with the following error:
```
ORA-00838: Specified value of MEMORY_TARGET is too small, needs to be at least 1408M
ORA-01078: failure in processing system parameters
```
Reference: http://yfshare.blog.51cto.com/8611708/1671927
If the installation stops at 85% with the error above, the fast recovery area has run out of space.
Fix: delete unneeded archived log files, or set a larger db_recovery_file_dest_size.
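The second fix can be scripted by generating the SQL to run via sqlplus as sysdba (my own sketch; the 8G value and the /tmp/grow_fra.sql path are example choices, not from the original post):

```shell
#!/bin/sh
# Generate a SQL script that checks FRA usage and enlarges
# db_recovery_file_dest_size; run the result with sqlplus / as sysdba.
NEW_FRA_SIZE=${NEW_FRA_SIZE:-8G}

cat > /tmp/grow_fra.sql <<EOF
-- check current fast recovery area usage first
SELECT * FROM v\$recovery_file_dest;
ALTER SYSTEM SET db_recovery_file_dest_size=${NEW_FRA_SIZE} SCOPE=BOTH SID='*';
EOF

cat /tmp/grow_fra.sql
```

`SID='*'` applies the new size to all RAC instances at once.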
Checking cluster status
```
srvctl status database -d racdb
crs_stat -t -v
```
```
[oracle@rac1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 11-JUL-2015 01:07:17
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                10-JUL-2015 15:38:11
Uptime                    0 days 9 hr. 29 min. 6 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/rac1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.244)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.241)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "racdb" has 1 instance(s).
  Instance "racdb1", status READY, has 1 handler(s) for this service...
Service "racdbXDB" has 1 instance(s).
  Instance "racdb1", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@rac1 ~]$
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL> select status from v$instance;

STATUS
------------
OPEN

SQL>
```
Configuring the EM console
If the EM web console fails to start with the problem below, rebuild EM with emca -config dbcontrol db:
```
[oracle@rac1 ~]$ emca -deconfig dbcontrol db -cluster
```
Output of removing the cluster EM configuration
emctl commands
Rebuilding EM
Log in as the oracle user
```
[oracle@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[oracle@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[oracle@rac1 ~]$ dbca
```
Note:
```
[oracle@rac1 ~]$ emctl status dbconsole
Environment variable ORACLE_UNQNAME not defined. Please set ORACLE_UNQNAME to database unique name.
[oracle@rac1 ~]$
```
If the error above appears, the oracle user's ORACLE_UNQNAME environment variable is not set. If the error below appears instead, ORACLE_UNQNAME is set to the wrong value.
The correct setting:
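A minimal sketch of the fix (racdb is the database unique name used in this deployment):

```shell
#!/bin/sh
# Set ORACLE_UNQNAME to the database unique name so emctl can find
# the console configuration.
export ORACLE_UNQNAME=racdb
echo "ORACLE_UNQNAME=$ORACLE_UNQNAME"

# To make it permanent for future logins, append it to the oracle
# user's .bash_profile (as shown in the profile listings earlier):
# echo 'export ORACLE_UNQNAME=racdb' >> ~/.bash_profile
```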
The Oracle RAC cluster starts automatically at boot by default; how long startup takes depends on the configuration.
Configuring PL/SQL Developer
Use PL/SQL Developer to log in to the Oracle RAC database.
Install plsql + ora10client.
plsql + instantclient_11_2 installation package
C:\Ora10InstantClient\network\admin\tnsnames.ora (create a network/admin directory under the ora10client installation directory, then copy $ORACLE_HOME/network/admin/tnsnames.ora from the RAC server into that admin directory)
```
[root@rac1 ~]#
192.168.100.246 racscan
[root@rac1 ~]#
```
That is, change HOST = racscan to the address that racscan resolves to:
```
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.246)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
```
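The edit described above can also be done with sed (my own sketch against a sample copy of the file; on a workstation, point the path at the copied tnsnames.ora instead):

```shell
#!/bin/sh
# Replace the SCAN name with its resolved address in tnsnames.ora.
# Sample file for illustration.
cat > /tmp/tnsnames.ora <<'EOF'
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racscan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
EOF

sed -i 's/HOST = racscan/HOST = 192.168.100.246/' /tmp/tnsnames.ora
grep "HOST" /tmp/tnsnames.ora
```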
Open PL/SQL Developer -> Tools -> Preferences -> Connection
This article is from the "Jack Wang Blog": http://www.yfshare.vip/2017/04/12/部署Oracle-RAC集群/