Unexplained "map size mismatch; abort" messages at database startup

Environment: CentOS 6.4 (64-bit), Oracle 10.2.0.4.8

Problem description:

SQL> startup nomount
mbind: Invalid argument
mbind: Invalid argument
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
map size mismatch; abort
: No such file or directory
map size mismatch; abort
: No such file or directory
map size mismatch; abort
: No such file or directory
map size mismatch; abort
: No such file or directory
mbind: Invalid argument
mbind: Invalid argument
mbind: Invalid argument
mbind: Invalid argument
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
map size mismatch; abort
: No such file or directory
map size mismatch; abort
: No such file or directory
map size mismatch; abort
: No such file or directory
map size mismatch; abort
: No such file or directory
ORACLE instance started.

Solution:

After startup nomount, run the following query and post the output:


SQL> col name for a40
SQL> col value for a40
SQL> col describ for a40
SQL> set linesize 130
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2  FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.inst_id = USERENV ('Instance')
  4  AND y.inst_id = USERENV ('Instance')
  5  AND x.indx = y.indx
  6  AND upper(x.ksppinm) like '%NUMA%';


NAME                         VALUE          DESCRIB
---------------------------- -------------- --------------------------------------------------------------
_rm_numa_sched_enable        FALSE          Is Resource Manager (RM) related NUMA scheduled policy enabled
_NUMA_pool_size              Not specified  aggregate size in bytes of NUMA pool
_enable_NUMA_optimization    TRUE           Enable NUMA specific optimizations
_NUMA_instance_mapping       Not specified  Set of nodes that this instance should run on
_rm_numa_simulation_pgs      0              number of PGs for numa simulation in resource manager
_rm_numa_simulation_cpus     0              number of cpus per PG for numa simulation in resource manager
_db_block_numa               4              Number of NUMA nodes


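Here _enable_NUMA_optimization is TRUE and _db_block_numa is 4, so disable NUMA optimization and set the NUMA node count to 1 in the spfile:

alter system set "_enable_NUMA_optimization"= false scope=spfile;
alter system set "_db_block_numa"=1 scope=spfile;

Because both changes use scope=spfile, restart the instance for them to take effect:

startup force nomount;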

Description

Oracle NUMA (Non-Uniform Memory Architecture) support can be used in large SMP multiprocessor environments with NUMA hardware. When enabled, Oracle NUMA support facilitates efficient use of the underlying NUMA hardware and may improve database performance.

Oracle NUMA support needs the right combination of hardware, operating system and Oracle version.

With 10.2.0.4 and 11.1.0.7 patchsets, Oracle NUMA support can be enabled on common Operating Systems like AIX, HP-UX, Solaris, Linux and Windows if the underlying hardware characteristic is NUMA.

When running an Oracle database with NUMA support in a NUMA capable environment, Oracle will by default detect if the hardware and operating system are NUMA capable and enable Oracle NUMA support.

Some OS upgrades/patches may enable NUMA (for example, on Linux NUMA is enabled with kernel release 2.6.9-67). Care should be taken before enabling NUMA support or leaving it on by default. Please see the "Recommendation" section below. Contact your hardware vendor for recommendations and information on your system and operating system NUMA capabilities.

Likelihood of Occurrence

The symptoms described in the following section generally occur when:

    Oracle database NUMA support is enabled
    Both operating system and hardware are NUMA capable.

And:

    Database workload is memory constrained (or applies too much memory pressure on a given NUMA memory pool)
    Dynamic reconfiguration events change the characteristics of the hardware or partition (e.g., number of CPUs, memory available).

Some issues are OS/hardware specific. See bugs caused by NUMA section below.

Dynamic reconfiguration events that remove resources from a NUMA system, such as an entire cell and all its processors, are not supported (please review Note:761065.1).

Possible Symptoms

The problems usually manifest as crashes with errors including:

    ORA-4031
    ORA-600 with argument KSKRECONFIGNUMA2
    ORA-600 with argument KSBASEND_INTERNAL
    ORA-600 with argument KSMHEAP_ALLOC1
    ORA-27302: FAILURE OCCURRED AT: SSKGXPCRE3


Workaround or Resolution
Recommendation

    Customers whose SLAs are unaffected with NUMA enabled can continue to run with no changes.
    Customers who want to enable NUMA are strongly advised to test sufficiently before going into production.
    Apply all the bug fixes or patchsets required for your Oracle database version. Fixes for all known NUMA issues in the Oracle database are available for download. Please review the known bugs section.

To disable NUMA consult the section "Steps to disable NUMA" covering the instructions to disable NUMA at the Oracle database level. To disable NUMA at the operating system or hardware level contact your hardware vendor.

Please review the "Caution" section below when disabling NUMA.

Caution

    Disabling or enabling NUMA can change application performance.
    It is strongly recommended to evaluate the performance after disabling or before enabling NUMA in a test environment.
    Operating system and/or hardware configuration may need to be tuned or reconfigured when disabling Oracle NUMA support. Consult your hardware vendor for more information or recommendations.

Steps to disable NUMA

    Customers can download and apply the patch for Bug 8199533 to disable NUMA support. This is a database patch and should be applied to the database home. The patch is available for common platforms on the 10.2.0.4 and 11.1.0.7 releases.
    Once the patch for Bug 8199533 is applied, Oracle no longer enables NUMA support by default, even if it detects a NUMA-capable environment.
    Oracle Support does not recommend using the init.ora parameter "_enable_NUMA_optimization" to disable NUMA. Customers should apply Patch 8199533 to disable NUMA. The patch is rolling upgradeable.
    This patch does not need to be applied to the ASM home. However, if the same Oracle home is used for both RDBMS and ASM instances, the patch can be applied to that Oracle home.

To enable NUMA optimization after applying patch 8199533, set init.ora parameter _enable_NUMA_optimization=TRUE





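The parameters set above disable NUMA; for background on NUMA, see the document quoted above. The combination of Linux 6.2 + 10gR2 is not recommended; if you do run it, I would essentially recommend disabling NUMA.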
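
Before changing the parameters, it can help to see how many NUMA nodes the operating system itself exposes, since the startup output above shows libnuma complaining about /sys. The following is only a minimal sketch: the function name count_numa_nodes and its optional sysfs-root argument are hypothetical helpers introduced here for illustration, and the one-node fallback merely imitates the "Assuming one node" behaviour seen in the libnuma warning.

```shell
#!/bin/sh
# Count the NUMA node directories that Linux exposes under sysfs.
# count_numa_nodes and its optional argument (an alternative sysfs root)
# are hypothetical names for this sketch; the default root is /sys.
count_numa_nodes() {
    sysfs_root="${1:-/sys}"
    node_dir="$sysfs_root/devices/system/node"
    count=0
    if [ -d "$node_dir" ]; then
        # Each NUMA node appears as a directory named node0, node1, ...
        for d in "$node_dir"/node[0-9]*; do
            if [ -d "$d" ]; then
                count=$((count + 1))
            fi
        done
    fi
    # Assumption: with no node information available, treat the machine
    # as a single node, as the libnuma warning above does.
    if [ "$count" -eq 0 ]; then
        count=1
    fi
    echo "$count"
}

count_numa_nodes
```

If the number printed differs from the _db_block_numa value reported by the instance, that is a hint that Oracle and the OS disagree about the topology. Where the numactl package is installed, `numactl --hardware` gives a more detailed view of the same information.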

