How to Check Device UUID or File System UUID (Doc ID 1505398.1)

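This article describes how to check the device UUID, the file system UUID, and the LVM2 UUID on Linux. It explains the steps for obtaining each UUID on different Linux releases and includes worked examples.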



In this Document

Goal
Solution

APPLIES TO:

Linux OS - Version Oracle Linux 5.0 and later
Linux x86-64
Linux x86

GOAL

In Linux, device or file system names are sometimes not persistent, which can cause problems for the system. In such situations, specifying a UUID (universally unique identifier) is the solution, because it uniquely identifies a single component in the system.


SOLUTION

1. Device UUID

In OL5.x:

# scsi_id -u -g -s /block/sda
35000c50032387713

In OL6.x:

# scsi_id --whitelisted /dev/sdd
3600144f0da627ad70000503ad6ce0006

Or:

# udevadm info --query=all --path=/sys/block/sda
P: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:2:0/0:2:0:0/block/sda
N: sda
W: 99
S: block/8:0
S: disk/by-id/scsi-364403a78570b200018ac2cd20575ec04
S: disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0
S: disk/by-id/wwn-0x64403a78570b200018ac2cd20575ec04
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:2:0/0:2:0:0/block/sda
E: MAJOR=8
E: MINOR=0
E: DEVNAME=/dev/sda
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: MPATH_SBIN_PATH=/sbin
E: ID_SCSI=1
E: ID_VENDOR=LSI
E: ID_VENDOR_ENC=LSI
E: ID_MODEL=MRSASRoMB-4i
E: ID_MODEL_ENC=MRSASRoMB-4i
E: ID_REVISION=2.12
E: ID_TYPE=disk
E: ID_SERIAL_RAW=364403a78570b200018ac2cd20575ec04
E: ID_SERIAL=364403a78570b200018ac2cd20575ec04
E: ID_SERIAL_SHORT=64403a78570b200018ac2cd20575ec04
E: ID_WWN=0x64403a78570b2000
E: ID_WWN_VENDOR_EXTENSION=0x18ac2cd20575ec04
E: ID_WWN_WITH_EXTENSION=0x64403a78570b200018ac2cd20575ec04
E: ID_SCSI_SERIAL=0004ec7505d22cac1800200b57783a40
E: ID_BUS=scsi
E: ID_PATH=pci-0000:01:00.0-scsi-0:2:0:0
E: ID_PART_TABLE_TYPE=dos
E: LVM_SBIN_PATH=/sbin
E: DEVLINKS=/dev/block/8:0 /dev/disk/by-id/scsi-364403a78570b200018ac2cd20575ec04 /dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0 /dev/disk/by-id/wwn-0x64403a78570b200018ac2cd20575ec04
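When the udevadm output is long, the device UUID can be pulled out of it programmatically. A minimal Python sketch, assuming the `E: KEY=VALUE` property format shown above; `parse_udevadm` is a hypothetical helper name. In the sample, `ID_SERIAL` carries the same value as the `disk/by-id/scsi-364403a78570b200018ac2cd20575ec04` link.

```python
def parse_udevadm(text: str) -> dict[str, str]:
    """Collect the "E: KEY=VALUE" property lines of `udevadm info --query=all`."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("E: ") and "=" in line:
            # Split on the first "=" only, so any "=" inside the value is preserved.
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


# Excerpt of the udevadm output shown above.
SAMPLE = """\
N: sda
S: disk/by-id/scsi-364403a78570b200018ac2cd20575ec04
E: DEVNAME=/dev/sda
E: ID_SERIAL=364403a78570b200018ac2cd20575ec04
E: ID_WWN=0x64403a78570b2000
"""

props = parse_udevadm(SAMPLE)
print(props["DEVNAME"], props["ID_SERIAL"])  # /dev/sda 364403a78570b200018ac2cd20575ec04
```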

For multipath devices:

# multipath -ll

360080e500024a048000004044f3c64ee dm-0 SUN,LCSM100_F
size=95G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 7:0:0:0 sdb 8:16  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 8:0:0:0 sdk 8:160 active ghost running

Note: The device UUID is a fixed value, and the UUID of a dm-multipath device should be identical to the UUID of each of its paths. In most situations it cannot be modified unless the device supports a dynamic UUID feature.
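One way to confirm that a dm-multipath map and its paths agree is to parse the map header and path lines, then compare them with per-path WWIDs collected separately (for example with scsi_id as shown above). A minimal Python sketch, assuming the `multipath -ll` layout shown above plus the `alias (wwid) dm-N` header form used once an alias is configured; `parse_multipath_ll`, `mismatched_paths`, and the `path_wwids` input are hypothetical.

```python
import re
from typing import Optional

# Header: "<wwid> dm-N VENDOR,PRODUCT" or "<alias> (<wwid>) dm-N VENDOR,PRODUCT"
HEADER = re.compile(
    r"^(?:(?P<alias>\S+)\s+\((?P<wwid_a>\w+)\)|(?P<wwid>\w+))\s+(?P<dm>dm-\d+)\s")
# Path line: "| `- 7:0:0:0 sdb 8:16  active ready running" (H:C:T:L, device, major:minor)
PATH = re.compile(r"\b\d+:\d+:\d+:\d+\s+(\w+)\s+\d+:\d+\b")


def parse_multipath_ll(text: str) -> tuple[Optional[str], Optional[str], list[str]]:
    """Return (wwid, dm_name, path_devices) for the first map in `multipath -ll` output."""
    wwid = dm = None
    paths: list[str] = []
    for line in text.splitlines():
        head = HEADER.match(line)
        if head:
            if wwid is not None:
                break  # a second map starts here; this sketch reads only the first
            wwid = head.group("wwid_a") or head.group("wwid")
            dm = head.group("dm")
            continue
        path = PATH.search(line)
        if path:
            paths.append(path.group(1))
    return wwid, dm, paths


def mismatched_paths(map_wwid: str, path_wwids: dict[str, str]) -> list[str]:
    """Path devices whose own WWID differs from the WWID of the multipath map."""
    return sorted(dev for dev, wwid in path_wwids.items() if wwid != map_wwid)


SAMPLE = """\
360080e500024a048000004044f3c64ee dm-0 SUN,LCSM100_F
size=95G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 7:0:0:0 sdb 8:16  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 8:0:0:0 sdk 8:160 active ghost running
"""

wwid, dm, paths = parse_multipath_ll(SAMPLE)
print(wwid, dm, paths)  # 360080e500024a048000004044f3c64ee dm-0 ['sdb', 'sdk']
```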


Usage:

The device UUID is often used to make a device name or dm-multipath name persistent. The following /etc/multipath.conf example persistently binds the WWID to the name oraasm1:


       multipath {
               wwid                  36006048caf0b141598afa8e2875797a1
               alias                   oraasm1
       }
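When several LUNs need persistent names, the stanza above can be generated from a WWID-to-alias table rather than typed by hand. A minimal Python sketch, assuming the stanzas go inside the `multipaths { }` section of `/etc/multipath.conf`; `multipaths_section` is a hypothetical helper, and it rejects duplicate aliases because each alias has to name exactly one device.

```python
def multipaths_section(aliases: dict[str, str]) -> str:
    """Render a multipaths { } section binding each WWID (key) to an alias (value)."""
    if len(set(aliases.values())) != len(aliases):
        raise ValueError("each alias must be unique")
    lines = ["multipaths {"]
    # Sort by alias so the generated file is stable between runs.
    for wwid, alias in sorted(aliases.items(), key=lambda item: item[1]):
        if not wwid.isalnum() or not alias or any(ch.isspace() for ch in alias):
            raise ValueError(f"bad WWID/alias pair: {wwid!r} -> {alias!r}")
        lines += [
            "    multipath {",
            f"        wwid  {wwid}",
            f"        alias {alias}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(multipaths_section({"36006048caf0b141598afa8e2875797a1": "oraasm1"}), end="")
```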

Note: Partitions (such as sda1 and sdb1) do not have a device UUID.


2. File system UUID

In OL5.x:

# blkid /dev/sda1

/dev/sda1: LABEL="/boot1" UUID="ae298adb-1b94-42a0-9dc9-a121c7561a5b" TYPE="ext3" SEC_TYPE="ext2" 

# /lib/udev/vol_id /dev/sda1
ID_FS_USAGE=filesystem
ID_FS_TYPE=ext3
ID_FS_VERSION=1.0
ID_FS_UUID=ae298adb-1b94-42a0-9dc9-a121c7561a5b
ID_FS_LABEL=/boot1
ID_FS_LABEL_SAFE=boot1

Note: /dev/sdXX must already be formatted with a file system.
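To read the file system UUID from a script, the blkid line can be split into tags. A minimal Python sketch, assuming blkid's `NAME="value"` output format shown above; `parse_blkid` is a hypothetical helper name. Where only the value is needed, `blkid -s UUID -o value /dev/sda1` prints the UUID alone.

```python
import shlex


def parse_blkid(line: str) -> tuple[str, dict[str, str]]:
    """Split one line of blkid output into (device, {TAG: value})."""
    device, _, rest = line.partition(":")
    tags: dict[str, str] = {}
    # shlex honours the double quotes, so a label containing spaces stays intact.
    for token in shlex.split(rest):
        key, _, value = token.partition("=")
        tags[key] = value
    return device, tags


line = '/dev/sda1: LABEL="/boot1" UUID="ae298adb-1b94-42a0-9dc9-a121c7561a5b" TYPE="ext3" SEC_TYPE="ext2"'
device, tags = parse_blkid(line)
print(device, tags["UUID"])  # /dev/sda1 ae298adb-1b94-42a0-9dc9-a121c7561a5b
```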

Usage:

You can specify the UUID in /etc/fstab to bind the device to its mount directory persistently.


UUID=xxx-xxx-xxx-xxx            /mount_dir                   ext3    defaults        1 2
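A mistyped UUID in /etc/fstab causes the mount to fail, so it can help to validate the value while building the line. A minimal Python sketch, assuming the file system reports a standard 128-bit UUID as the ext3 example above does (FAT-style volume IDs would be rejected); `fstab_entry` and its defaults, which mirror the sample line, are hypothetical.

```python
import uuid


def fstab_entry(fs_uuid: str, mount_dir: str, fstype: str,
                options: str = "defaults", dump: int = 1, fsck_pass: int = 2) -> str:
    """Return an /etc/fstab line that mounts a file system by its UUID."""
    canonical = str(uuid.UUID(fs_uuid))  # ValueError if not a valid 128-bit UUID
    if not mount_dir.startswith("/"):
        raise ValueError(f"mount point must be an absolute path: {mount_dir!r}")
    return f"UUID={canonical} {mount_dir} {fstype} {options} {dump} {fsck_pass}"


print(fstab_entry("ae298adb-1b94-42a0-9dc9-a121c7561a5b", "/mount_dir", "ext3"))
# UUID=ae298adb-1b94-42a0-9dc9-a121c7561a5b /mount_dir ext3 defaults 1 2
```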

Note: The file system UUID changes when the file system is re-created.
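Because re-creating a file system gives it a new UUID, any /etc/fstab entry that still names the old UUID no longer matches a device. A minimal Python sketch of a check, assuming the current UUIDs have been collected beforehand (for example with `blkid -s UUID -o value`); `stale_fstab_uuids` is a hypothetical helper, and the fstab contents below are illustrative placeholders.

```python
def stale_fstab_uuids(fstab_text: str, current_uuids: set[str]) -> list[str]:
    """Return UUID= sources in fstab that no current file system reports."""
    known = {u.lower() for u in current_uuids}
    stale: list[str] = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        if fields[0].startswith("UUID="):
            fs_uuid = fields[0][len("UUID="):].strip('"').lower()
            if fs_uuid not in known:
                stale.append(fs_uuid)
    return stale


# Hypothetical fstab: the second active UUID stands for a file system that was re-created.
FSTAB = """\
UUID=ae298adb-1b94-42a0-9dc9-a121c7561a5b /boot ext3 defaults 1 2
# UUID=00000000-0000-0000-0000-000000000000 /old ext3 defaults 1 2
UUID=11111111-2222-3333-4444-555555555555 /mount_dir ext3 defaults 1 2
"""
print(stale_fstab_uuids(FSTAB, {"ae298adb-1b94-42a0-9dc9-a121c7561a5b"}))
# ['11111111-2222-3333-4444-555555555555']
```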


3. LVM2 UUID

# pvs -v
  PV         VG   Fmt  Attr PSize  PFree  DevSize PV UUID   
  /dev/sda2  vg0  lvm2 a--  48.81G     0   48.83G xCJzmN-oJmL-kMFl-JCrb-lfoH-movY-6x6K6O
  /dev/sda3  vg0  lvm2 a--  48.81G     0   48.83G 9iXmmM-kKqV-OYDb-eSVN-ymCw-wwVk-uY6fXo

# lvs -v
  LV       VG   #Seg Attr   LSize   Maj Min KMaj KMin Origin Snap%  Move Copy%  Log Convert LV UUID                               
  lvroot   vg0     3 -wi-ao 146.44G  -1  -1 253  0                                          C0l0R2-KhH8-N7Nk-BhXn-MJhS-35dn-XXdL1B
  lvasmlib vg1     1 -wi-a-   4.88G  -1  -1 253  6                                          5nlcKy-1kvs-l7qb-eIts-tEs6-E2JG-RisWDx

# vgs -v
  VG   Attr   Ext    #PV #LV #SN VSize   VFree  VG UUID                               
  vg0  wz--n- 32.00M   3   1   0 146.44G     0  ereADB-2w9v-O2P9-58OS-RN9Q-t2pV-8wXpSc
  vg1  wz--n-  4.00M   3   3   0 139.71G  9.95G LczKdV-Nq82-lNrr-EmI1-cerd-numb-1qV6m4
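The PV, LV and VG UUIDs can be collected from these reports by taking the first column (the name) and the last column (the UUID). A minimal Python sketch, assuming the `-v` table layouts shown above; the UUID pattern (32 alphanumeric characters grouped 6-4-4-4-4-4-6) is inferred from the sample values, and `name_to_uuid` is a hypothetical helper. Where the installed LVM2 version supports it, `pvs --noheadings -o pv_name,pv_uuid` (and the `vg_uuid` / `lv_uuid` fields for vgs and lvs) prints just those two columns.

```python
import re

# Pattern inferred from the sample UUIDs above, e.g. xCJzmN-oJmL-kMFl-JCrb-lfoH-movY-6x6K6O
LVM_UUID = re.compile(r"^[0-9A-Za-z]{6}(?:-[0-9A-Za-z]{4}){5}-[0-9A-Za-z]{6}$")


def name_to_uuid(report: str) -> dict[str, str]:
    """Map the first column of a pvs/lvs/vgs -v report to the UUID in its last column.

    The header row is skipped because its last field ("UUID") does not match
    the LVM2 UUID pattern.
    """
    result: dict[str, str] = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 2 and LVM_UUID.match(fields[-1]):
            result[fields[0]] = fields[-1]
    return result


PVS = """\
  PV         VG   Fmt  Attr PSize  PFree  DevSize PV UUID
  /dev/sda2  vg0  lvm2 a--  48.81G     0   48.83G xCJzmN-oJmL-kMFl-JCrb-lfoH-movY-6x6K6O
  /dev/sda3  vg0  lvm2 a--  48.81G     0   48.83G 9iXmmM-kKqV-OYDb-eSVN-ymCw-wwVk-uY6fXo
"""
print(name_to_uuid(PVS))
```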

Usage:

In some cases you need to recover a PV device. Use the --uuid and --restorefile arguments of the pvcreate command to restore the physical volume. The following command restores the physical volume label using the backed-up metadata.


# pvcreate --uuid "0YnHNn-1COx-dohx-bwPf-aLyl-pO8F-f5PI5R" --restorefile /etc/lvm/archive/vg0_00000-1324010847.vg /dev/sda2
  Physical volume "/dev/sda2" successfully created
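Because `pvcreate --uuid` writes a new PV label onto the device, it is worth checking the arguments before running it. A minimal Python sketch that validates the UUID format and, by default, only prints the command; `pvcreate_restore_argv` and `restore_pv` are hypothetical helpers, and the UUID pattern is inferred from the sample output above. The UUID passed should be the one recorded for that PV in the archive file. In Red Hat's LVM documentation this step is followed by `vgcfgrestore` for the volume group to restore its metadata.

```python
import os
import re
import shlex
import subprocess

# Pattern inferred from sample LVM2 UUIDs (6-4-4-4-4-4-6 alphanumeric groups).
LVM_UUID = re.compile(r"^[0-9A-Za-z]{6}(?:-[0-9A-Za-z]{4}){5}-[0-9A-Za-z]{6}$")


def pvcreate_restore_argv(pv_uuid: str, restorefile: str, device: str) -> list[str]:
    """Build the pvcreate command that restores a PV label with a known UUID."""
    if not LVM_UUID.match(pv_uuid):
        raise ValueError(f"not an LVM2 UUID: {pv_uuid!r}")
    return ["pvcreate", "--uuid", pv_uuid, "--restorefile", restorefile, device]


def restore_pv(pv_uuid: str, restorefile: str, device: str, dry_run: bool = True) -> None:
    """Print the command (default) or run it; a real run needs root and the archive file."""
    argv = pvcreate_restore_argv(pv_uuid, restorefile, device)
    if dry_run:
        print(shlex.join(argv))  # review before running for real
        return
    if not os.path.isfile(restorefile):
        raise FileNotFoundError(restorefile)
    subprocess.run(argv, check=True)


restore_pv("0YnHNn-1COx-dohx-bwPf-aLyl-pO8F-f5PI5R",
           "/etc/lvm/archive/vg0_00000-1324010847.vg", "/dev/sda2")
```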

Note: LVM2 UUIDs change when the PV, VG or LV is re-created.

