CSI-external-provisioner

main()

This Go code implements a CSI (Container Storage Interface) provisioner that dynamically provisions persistent volumes in a Kubernetes cluster. It involves several components and steps; the key parts are explained below:

  1. Initialization and configuration
  • Command-line flags and environment variables: the flag package handles command-line arguments such as feature-gates and kubeconfig, and configuration is also read from environment variables such as NODE_NAME.
  • Logging and metrics: klog is used for logging, and Prometheus metric collectors are configured.
  2. CSI driver connection and validation
  • Connect to the CSI driver: a gRPC connection is established to the CSI driver and a basic probe (ctrl.Probe) confirms the driver is available.
  • Get the driver name and capabilities: the driver name (ctrl.GetDriverName) and capabilities (ctrl.GetDriverCapabilities) are retrieved from the CSI driver.
  3. Topology and node information
  • Topology support: if the CSI driver supports topology, informers are created to watch Node and CSINode objects.
  • Node deployment: if node deployment is enabled (--enable-node-deployment), node information is fetched and a node-deployment object is configured.
  4. Provisioner and controller creation
  • Provisioner creation: a CSI provisioner object implementing the Provisioner interface is created from the collected configuration and clients.
  • Capacity controller: if the capacity feature is enabled (--enable-capacity), a capacity controller is created to publish storage capacity information.
  5. HTTP server and metrics
  • HTTP server: if a metrics address (--metrics-address) or HTTP endpoint (--http-endpoint) is specified, an HTTP server is started to expose metrics and, optionally, debug endpoints (such as pprof).
  6. Informers and queues
  • Informers and queues: informers are created for the various resources to watch for changes to Kubernetes objects, and work queues are used to process the resulting events.
  7. Running
  • Start the informers and controllers: the informer factories and controllers are started, and the program begins watching and handling events.

Summary
This is a fairly involved CSI provisioner implementation that ties together Kubernetes clients, the CSI driver, metrics collection, topology awareness, capacity management, and more. With its layered, modular structure it can provision and manage persistent volumes efficiently in a Kubernetes cluster. As a rough orientation before the full main() listing, the sketch below shows what the initial driver handshake (connect, probe, detect the driver name) boils down to.
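
The socket path, the timeout, and the direct use of the CSI Identity gRPC client in this sketch are assumptions for illustration; main() itself uses the csi-lib-utils helpers ctrl.Connect, ctrl.Probe, and ctrl.GetDriverName, which wrap the same calls.

package main

import (
	"context"
	"fmt"
	"time"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Dial the driver's UNIX domain socket (the path is an assumption for illustration).
	conn, err := grpc.Dial("unix:///csi/csi.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	identity := csi.NewIdentityClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Probe: is the driver up and ready to serve requests?
	if _, err := identity.Probe(ctx, &csi.ProbeRequest{}); err != nil {
		panic(err)
	}

	// GetPluginInfo: autodetect the driver name that is used as the provisioner name.
	info, err := identity.GetPluginInfo(ctx, &csi.GetPluginInfoRequest{})
	if err != nil {
		panic(err)
	}
	fmt.Println("detected CSI driver:", info.GetName())
}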

func main() {
	var config *rest.Config
	var err error

	flag.Var(utilflag.NewMapStringBool(&featureGates), "feature-gates", "A set of key=value pairs that describe feature gates for alpha/experimental features. "+
		"Options are:\n"+strings.Join(utilfeature.DefaultFeatureGate.KnownFeatures(), "\n"))

	klog.InitFlags(nil)
	flag.CommandLine.AddGoFlagSet(goflag.CommandLine)
	flag.Set("logtostderr", "true")
	flag.Parse()

	ctx := context.Background()

	if err := utilfeature.DefaultMutableFeatureGate.SetFromMap(featureGates); err != nil {
		klog.Fatal(err)
	}

	node := os.Getenv("NODE_NAME")
	if *enableNodeDeployment && node == "" {
		klog.Fatal("The NODE_NAME environment variable must be set when using --enable-node-deployment.")
	}

	if *showVersion {
		fmt.Println(os.Args[0], version)
		os.Exit(0)
	}
	klog.Infof("Version: %s", version)

	if *metricsAddress != "" && *httpEndpoint != "" {
		klog.Error("only one of `--metrics-address` and `--http-endpoint` can be set.")
		os.Exit(1)
	}
	addr := *metricsAddress
	if addr == "" {
		addr = *httpEndpoint
	}

	// get the KUBECONFIG from env if specified (useful for local/debug cluster)
	kubeconfigEnv := os.Getenv("KUBECONFIG")

	if kubeconfigEnv != "" {
		klog.Infof("Found KUBECONFIG environment variable set, using that..")
		kubeconfig = &kubeconfigEnv
	}

	if *master != "" || *kubeconfig != "" {
		klog.Infof("Either master or kubeconfig specified. building kube config from that..")
		config, err = clientcmd.BuildConfigFromFlags(*master, *kubeconfig)
	} else {
		klog.Infof("Building kube configs for running in cluster...")
		config, err = rest.InClusterConfig()
	}
	if err != nil {
		klog.Fatalf("Failed to create config: %v", err)
	}

	config.QPS = *kubeAPIQPS
	config.Burst = *kubeAPIBurst

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		klog.Fatalf("Failed to create client: %v", err)
	}

	// snapclientset.NewForConfig creates a new Clientset for  VolumesnapshotV1Client
	snapClient, err := snapclientset.NewForConfig(config)
	if err != nil {
		klog.Fatalf("Failed to create snapshot client: %v", err)
	}

	var gatewayClient gatewayclientset.Interface
	if utilfeature.DefaultFeatureGate.Enabled(features.CrossNamespaceVolumeDataSource) {
		// gatewayclientset.NewForConfig creates a new Clientset for GatewayClient
		gatewayClient, err = gatewayclientset.NewForConfig(config)
		if err != nil {
			klog.Fatalf("Failed to create gateway client: %v", err)
		}
	}

	metricsManager := metrics.NewCSIMetricsManagerWithOptions("", /* driverName */
		// Will be provided via default gatherer.
		metrics.WithProcessStartTime(false),
		metrics.WithSubsystem(metrics.SubsystemSidecar),
	)

	grpcClient, err := ctrl.Connect(ctx, *csiEndpoint, metricsManager)
	if err != nil {
		klog.Error(err.Error())
		os.Exit(1)
	}

	err = ctrl.Probe(ctx, grpcClient, *operationTimeout)
	if err != nil {
		klog.Error(err.Error())
		os.Exit(1)
	}

	// Autodetect provisioner name
	provisionerName, err := ctrl.GetDriverName(grpcClient, *operationTimeout)
	if err != nil {
		klog.Fatalf("Error getting CSI driver name: %s", err)
	}
	klog.V(2).Infof("Detected CSI driver %s", provisionerName)
	metricsManager.SetDriverName(provisionerName)

	translator := csitrans.New()
	supportsMigrationFromInTreePluginName := ""
	if translator.IsMigratedCSIDriverByName(provisionerName) {
		supportsMigrationFromInTreePluginName, err = translator.GetInTreeNameFromCSIName(provisionerName)
		if err != nil {
			klog.Fatalf("Failed to get InTree plugin name for migrated CSI plugin %s: %v", provisionerName, err)
		}
		klog.V(2).Infof("Supports migration from in-tree plugin: %s", supportsMigrationFromInTreePluginName)

		// Create a new connection with the metrics manager with migrated label
		metricsManager = metrics.NewCSIMetricsManagerWithOptions(provisionerName,
			// Will be provided via default gatherer.
			metrics.WithProcessStartTime(false),
			metrics.WithMigration())
		migratedGrpcClient, err := ctrl.Connect(ctx, *csiEndpoint, metricsManager)
		if err != nil {
			klog.Error(err.Error())
			os.Exit(1)
		}
		grpcClient.Close()
		grpcClient = migratedGrpcClient

		err = ctrl.Probe(ctx, grpcClient, *operationTimeout)
		if err != nil {
			klog.Error(err.Error())
			os.Exit(1)
		}
	}

	// Prepare http endpoint for metrics + leader election healthz
	mux := http.NewServeMux()
	gatherers := prometheus.Gatherers{
		// For workqueue and leader election metrics, set up via the anonymous imports of:
		// https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-base/metrics/prometheus/workqueue/metrics.go
		// https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-base/metrics/prometheus/clientgo/leaderelection/metrics.go
		//
		// Also happens to include Go runtime and process metrics:
		// https://github.com/kubernetes/kubernetes/blob/9780d88cb6a4b5b067256ecb4abf56892093ee87/staging/src/k8s.io/component-base/metrics/legacyregistry/registry.go#L46-L49
		legacyregistry.DefaultGatherer,
		// For CSI operations.
		metricsManager.GetRegistry(),
	}

	pluginCapabilities, controllerCapabilities, err := ctrl.GetDriverCapabilities(grpcClient, *operationTimeout)
	if err != nil {
		klog.Fatalf("Error getting CSI driver capabilities: %s", err)
	}

	// Generate a unique ID for this provisioner
	timeStamp := time.Now().UnixNano() / int64(time.Millisecond)
	identity := strconv.FormatInt(timeStamp, 10) + "-" + strconv.Itoa(rand.Intn(10000)) + "-" + provisionerName
	if *enableNodeDeployment {
		identity = identity + "-" + node
	}

	factory := informers.NewSharedInformerFactory(clientset, ctrl.ResyncPeriodOfCsiNodeInformer)
	var factoryForNamespace informers.SharedInformerFactory // usually nil, only used for CSIStorageCapacity

	// -------------------------------
	// Listers
	// Create informers to avoid hitting the API server for every resource request
	scLister := factory.Storage().V1().StorageClasses().Lister()
	claimLister := factory.Core().V1().PersistentVolumeClaims().Lister()

	var vaLister storagelistersv1.VolumeAttachmentLister
	if controllerCapabilities[csi.ControllerServiceCapability_RPC_PUBLISH_UNPUBLISH_VOLUME] {
		klog.Info("CSI driver supports PUBLISH_UNPUBLISH_VOLUME, watching VolumeAttachments")
		vaLister = factory.Storage().V1().VolumeAttachments().Lister()
	} else {
		klog.Info("CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments")
	}

	var nodeDeployment *ctrl.NodeDeployment
	if *enableNodeDeployment {
		nodeDeployment = &ctrl.NodeDeployment{
			NodeName:         node,
			ClaimInformer:    factory.Core().V1().PersistentVolumeClaims(),
			ImmediateBinding: *nodeDeploymentImmediateBinding,
			BaseDelay:        *nodeDeploymentBaseDelay,
			MaxDelay:         *nodeDeploymentMaxDelay,
		}
		nodeInfo, err := ctrl.GetNodeInfo(grpcClient, *operationTimeout)
		if err != nil {
			klog.Fatalf("Failed to get node info from CSI driver: %v", err)
		}
		nodeDeployment.NodeInfo = *nodeInfo
	}

	var nodeLister listersv1.NodeLister
	var csiNodeLister storagelistersv1.CSINodeLister
	if ctrl.SupportsTopology(pluginCapabilities) {
		if nodeDeployment != nil {
			// Avoid watching in favor of fake, static objects. This is particularly relevant for
			// Node objects, which can generate significant traffic.
			csiNode := &storagev1.CSINode{
				ObjectMeta: metav1.ObjectMeta{
					Name: nodeDeployment.NodeName,
				},
				Spec: storagev1.CSINodeSpec{
					Drivers: []storagev1.CSINodeDriver{
						{
							Name:   provisionerName,
							NodeID: nodeDeployment.NodeInfo.NodeId,
						},
					},
				},
			}
			node := &v1.Node{
				ObjectMeta: metav1.ObjectMeta{
					Name: nodeDeployment.NodeName,
				},
			}
			if nodeDeployment.NodeInfo.AccessibleTopology != nil {
				for key := range nodeDeployment.NodeInfo.AccessibleTopology.Segments {
					csiNode.Spec.Drivers[0].TopologyKeys = append(csiNode.Spec.Drivers[0].TopologyKeys, key)
				}
				node.Labels = nodeDeployment.NodeInfo.AccessibleTopology.Segments
			}
			klog.Infof("using local topology with Node = %+v and CSINode = %+v", node, csiNode)

			// We make those fake objects available to the topology code via informers which
			// never change.
			stoppedFactory := informers.NewSharedInformerFactory(clientset, 1000*time.Hour)
			csiNodes := stoppedFactory.Storage().V1().CSINodes()
			nodes := stoppedFactory.Core().V1().Nodes()
			csiNodes.Informer().GetStore().Add(csiNode)
			nodes.Informer().GetStore().Add(node)
			csiNodeLister = csiNodes.Lister()
			nodeLister = nodes.Lister()

		} else {
			csiNodeLister = factory.Storage().V1().CSINodes().Lister()
			nodeLister = factory.Core().V1().Nodes().Lister()
		}
	}

	var referenceGrantLister referenceGrantv1beta1.ReferenceGrantLister
	var gatewayFactory gatewayInformers.SharedInformerFactory
	if utilfeature.DefaultFeatureGate.Enabled(features.CrossNamespaceVolumeDataSource) {
		gatewayFactory = gatewayInformers.NewSharedInformerFactory(gatewayClient, ctrl.ResyncPeriodOfReferenceGrantInformer)
		referenceGrants := gatewayFactory.Gateway().V1beta1().ReferenceGrants()
		referenceGrantLister = referenceGrants.Lister()
	}

	// -------------------------------
	// PersistentVolumeClaims informer
	rateLimiter := workqueue.NewItemExponentialFailureRateLimiter(*retryIntervalStart, *retryIntervalMax)
	claimQueue := workqueue.NewNamedRateLimitingQueue(rateLimiter, "claims")
	claimInformer := factory.Core().V1().PersistentVolumeClaims().Informer()

	// Setup options
	provisionerOptions := []func(*controller.ProvisionController) error{
		controller.LeaderElection(false), // Always disable leader election in provisioner lib. Leader election should be done here in the CSI provisioner level instead.
		controller.FailedProvisionThreshold(0),
		controller.FailedDeleteThreshold(0),
		controller.RateLimiter(rateLimiter),
		controller.Threadiness(int(*workerThreads)),
		controller.CreateProvisionedPVLimiter(workqueue.DefaultControllerRateLimiter()),
		controller.ClaimsInformer(claimInformer),
		controller.NodesLister(nodeLister),
	}

	if utilfeature.DefaultFeatureGate.Enabled(features.HonorPVReclaimPolicy) {
		provisionerOptions = append(provisionerOptions, controller.AddFinalizer(true))
	}

	if supportsMigrationFromInTreePluginName != "" {
		provisionerOptions = append(provisionerOptions, controller.AdditionalProvisionerNames([]string{supportsMigrationFromInTreePluginName}))
	}

	// Create the provisioner: it implements the Provisioner interface expected by
	// the controller
	csiProvisioner := ctrl.NewCSIProvisioner(
		clientset,
		*operationTimeout,
		identity,
		*volumeNamePrefix,
		*volumeNameUUIDLength,
		grpcClient,
		snapClient,
		provisionerName,
		pluginCapabilities,
		controllerCapabilities,
		supportsMigrationFromInTreePluginName,
		*strictTopology,
		*immediateTopology,
		translator,
		scLister,
		csiNodeLister,
		nodeLister,
		claimLister,
		vaLister,
		referenceGrantLister,
		*extraCreateMetadata,
		*defaultFSType,
		nodeDeployment,
		*controllerPublishReadOnly,
		*preventVolumeModeConversion,
	)

	var capacityController *capacity.Controller
	if *enableCapacity {
		// Publishing storage capacity information uses its own client
		// with separate rate limiting.
		config.QPS = *kubeAPICapacityQPS
		config.Burst = *kubeAPICapacityBurst
		clientset, err := kubernetes.NewForConfig(config)
		if err != nil {
			klog.Fatalf("Failed to create client: %v", err)
		}

		namespace := os.Getenv("NAMESPACE")
		if namespace == "" {
			klog.Fatal("need NAMESPACE env variable for CSIStorageCapacity objects")
		}
		var controller *metav1.OwnerReference
		if *capacityOwnerrefLevel >= 0 {
			podName := os.Getenv("POD_NAME")
			if podName == "" {
				klog.Fatal("need POD_NAME env variable to determine CSIStorageCapacity owner")
			}
			var err error
			controller, err = owner.Lookup(config, namespace, podName,
				schema.GroupVersionKind{
					Group:   "",
					Version: "v1",
					Kind:    "Pod",
				}, *capacityOwnerrefLevel)
			if err != nil {
				klog.Fatalf("look up owner(s) of pod: %v", err)
			}
			klog.Infof("using %s/%s %s as owner of CSIStorageCapacity objects", controller.APIVersion, controller.Kind, controller.Name)
		}

		var topologyInformer topology.Informer
		if nodeDeployment == nil {
			topologyInformer = topology.NewNodeTopology(
				provisionerName,
				clientset,
				factory.Core().V1().Nodes(),
				factory.Storage().V1().CSINodes(),
				workqueue.NewNamedRateLimitingQueue(rateLimiter, "csitopology"),
			)
		} else {
			var segment topology.Segment
			if nodeDeployment.NodeInfo.AccessibleTopology != nil {
				for key, value := range nodeDeployment.NodeInfo.AccessibleTopology.Segments {
					segment = append(segment, topology.SegmentEntry{Key: key, Value: value})
				}
			}
			klog.Infof("producing CSIStorageCapacity objects with fixed topology segment %s", segment)
			topologyInformer = topology.NewFixedNodeTopology(&segment)
		}
		go topologyInformer.RunWorker(ctx)

		managedByID := "external-provisioner"
		if *enableNodeDeployment {
			managedByID = getNameWithMaxLength(managedByID, node, validation.DNS1035LabelMaxLength)
		}

		// We only need objects from our own namespace. The normal factory would give
		// us an informer for the entire cluster. We can further restrict the
		// watch to just those objects with the right labels.
		factoryForNamespace = informers.NewSharedInformerFactoryWithOptions(clientset,
			ctrl.ResyncPeriodOfCsiNodeInformer,
			informers.WithNamespace(namespace),
			informers.WithTweakListOptions(func(lo *metav1.ListOptions) {
				lo.LabelSelector = labels.Set{
					capacity.DriverNameLabel: provisionerName,
					capacity.ManagedByLabel:  managedByID,
				}.AsSelector().String()
			}),
		)

		// We use the V1 CSIStorageCapacity API if available.
		clientFactory := capacity.NewV1ClientFactory(clientset)
		cInformer := factoryForNamespace.Storage().V1().CSIStorageCapacities()

		// This invalid object is used in a v1 Create call to determine
		// based on the resulting error whether the v1 API is supported.
		invalidCapacity := &storagev1.CSIStorageCapacity{
			ObjectMeta: metav1.ObjectMeta{
				Name: "%123-invalid-name",
			},
		}
		createdCapacity, err := clientset.StorageV1().CSIStorageCapacities(namespace).Create(ctx, invalidCapacity, metav1.CreateOptions{})
		switch {
		case err == nil:
			klog.Fatalf("creating an invalid v1.CSIStorageCapacity didn't fail as expected, got: %s", createdCapacity)
		case apierrors.IsNotFound(err):
			// We need to bridge between the v1beta1 API on the
			// server and the v1 API expected by the capacity code.
			klog.Info("using the CSIStorageCapacity v1beta1 API")
			clientFactory = capacity.NewV1beta1ClientFactory(clientset)
			cInformer = capacity.NewV1beta1InformerBridge(factoryForNamespace.Storage().V1beta1().CSIStorageCapacities())
		case apierrors.IsInvalid(err):
			klog.Info("using the CSIStorageCapacity v1 API")
		default:
			klog.Fatalf("unexpected error when checking for the V1 CSIStorageCapacity API: %v", err)
		}

		capacityController = capacity.NewCentralCapacityController(
			csi.NewControllerClient(grpcClient),
			provisionerName,
			clientFactory,
			// Metrics for the queue is available in the default registry.
			workqueue.NewNamedRateLimitingQueue(rateLimiter, "csistoragecapacity"),
			controller,
			managedByID,
			namespace,
			topologyInformer,
			factory.Storage().V1().StorageClasses(),
			cInformer,
			*capacityPollInterval,
			*capacityImmediateBinding,
			*operationTimeout,
		)
		legacyregistry.CustomMustRegister(capacityController)

		// Wrap Provision and Delete to detect when it is time to refresh capacity.
		csiProvisioner = capacity.NewProvisionWrapper(csiProvisioner, capacityController)
	}

	if addr != "" {
		// Start HTTP server, regardless of whether we are the leader or not.
		// Register provisioner metrics manually to be able to add multiplexer in front of it
		m := libmetrics.New("controller")
		reg := prometheus.NewRegistry()
		reg.MustRegister([]prometheus.Collector{
			m.PersistentVolumeClaimProvisionTotal,
			m.PersistentVolumeClaimProvisionFailedTotal,
			m.PersistentVolumeClaimProvisionDurationSeconds,
			m.PersistentVolumeDeleteTotal,
			m.PersistentVolumeDeleteFailedTotal,
			m.PersistentVolumeDeleteDurationSeconds,
		}...)
		provisionerOptions = append(provisionerOptions, controller.MetricsInstance(m))
		gatherers = append(gatherers, reg)

		// This is similar to k8s.io/component-base/metrics HandlerWithReset
		// except that we gather from multiple sources. This is necessary
		// because both CSI metrics manager and component-base manage
		// their own registry. Probably could be avoided by making
		// CSI metrics manager a bit more flexible.
		mux.Handle(*metricsPath,
			promhttp.InstrumentMetricHandler(
				reg,
				promhttp.HandlerFor(gatherers, promhttp.HandlerOpts{})))

		if *enableProfile {
			klog.InfoS("Starting profiling", "endpoint", httpEndpoint)

			mux.HandleFunc("/debug/pprof/", pprof.Index)
			mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
			mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
			mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
			mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
		}
		go func() {
			klog.Infof("ServeMux listening at %q", addr)
			err := http.ListenAndServe(addr, mux)
			if err != nil {
				klog.Fatalf("Failed to start HTTP server at specified address (%q) and metrics path (%q): %s", addr, *metricsPath, err)
			}
		}()
	}

	logger := klog.FromContext(ctx)
	provisionController = controller.NewProvisionController(
		logger,
		clientset,
		provisionerName,
		csiProvisioner,
		provisionerOptions...,
	)

	csiClaimController := ctrl.NewCloningProtectionController(
		clientset,
		claimLister,
		claimInformer,
		claimQueue,
		controllerCapabilities,
	)

	run := func(ctx context.Context) {
		factory.Start(ctx.Done())
		if factoryForNamespace != nil {
			// Starting is enough, the capacity controller will
			// wait for sync.
			factoryForNamespace.Start(ctx.Done())
		}
		cacheSyncResult := factory.WaitForCacheSync(ctx.Done())
		for _, v := range cacheSyncResult {
			if !v {
				klog.Fatalf("Failed to sync Informers!")
			}
		}

		if utilfeature.DefaultFeatureGate.Enabled(features.CrossNamespaceVolumeDataSource) {
			if gatewayFactory != nil {
				gatewayFactory.Start(ctx.Done())
			}
			gatewayCacheSyncResult := gatewayFactory.WaitForCacheSync(ctx.Done())
			for _, v := range gatewayCacheSyncResult {
				if !v {
					klog.Fatalf("Failed to sync Informers for gateway!")
				}
			}
		}

		if capacityController != nil {
			go capacityController.Run(ctx, int(*capacityThreads))
		}
		if csiClaimController != nil {
			go csiClaimController.Run(ctx, int(*finalizerThreads))
		}
		provisionController.Run(ctx)
	}

	if !*enableLeaderElection {
		run(ctx)
	} else {
		// this lock name pattern is also copied from sigs.k8s.io/sig-storage-lib-external-provisioner/controller
		// to preserve backwards compatibility
		lockName := strings.Replace(provisionerName, "/", "-", -1)

		// create a new clientset for leader election
		leClientset, err := kubernetes.NewForConfig(config)
		if err != nil {
			klog.Fatalf("Failed to create leaderelection client: %v", err)
		}

		le := leaderelection.NewLeaderElection(leClientset, lockName, run)
		if *httpEndpoint != "" {
			le.PrepareHealthCheck(mux, leaderelection.DefaultHealthCheckTimeout)
		}

		if *leaderElectionNamespace != "" {
			le.WithNamespace(*leaderElectionNamespace)
		}

		le.WithLeaseDuration(*leaderElectionLeaseDuration)
		le.WithRenewDeadline(*leaderElectionRenewDeadline)
		le.WithRetryPeriod(*leaderElectionRetryPeriod)
		le.WithIdentity(identity)

		if err := le.Run(); err != nil {
			klog.Fatalf("failed to initialize leader election: %v", err)
		}
	}
}

NewProvisionController()

  1. Get the hostname and generate a unique ID
    • os.Hostname() returns the current hostname; if this fails, an error is logged and the program exits.
    • The hostname is combined with a UUID to produce a unique ID, so that multiple processes running on the same host do not collide.
  2. Create and initialize the ProvisionController instance
    • The ProvisionController struct is initialized with the client, provisioner name, provisioner implementation, ID, component name, event recorder, and other fields.
    • A set of defaults is applied: resync period, exponential backoff on error, thread count, failure thresholds, and so on.
    • Metrics-related configuration is initialized.
  3. Process the option functions
    • The option functions passed in are invoked one by one to configure the ProvisionController instance. If an option function fails, an error is logged and the program exits (a self-contained sketch of this functional-options pattern follows this list).
  4. Initialize the rate limiter and work queues
    • A rate limiter is created from the configuration and used to initialize the claimQueue and volumeQueue work queues.
  5. Initialize informers and event handlers
    • informers.NewSharedInformerFactory creates a shared informer factory.
    • Event handlers and indexers are set up for PersistentVolumeClaims (PVCs), PersistentVolumes (PVs), and StorageClasses.
    • The informers watch for changes to Kubernetes resources and trigger the corresponding event-handler functions.
  6. Initialize the VolumeStore
    • Depending on the configuration, either NewVolumeStoreQueue or NewBackoffStore is used to initialize volumeStore, which handles creating and saving PVs.
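
NewProvisionController accepts a variadic list of func(*ProvisionController) error values (controller.Threadiness(...), controller.RateLimiter(...), and so on in main()). The following self-contained sketch shows that functional-options pattern on its own; the Config type and option names here are illustrative, not the library's API.

package main

import (
	"fmt"
	"time"
)

type Config struct {
	resyncPeriod time.Duration
	threadiness  int
}

// Option mirrors the func(*ProvisionController) error parameters accepted by the real constructor.
type Option func(*Config) error

func Threadiness(n int) Option {
	return func(c *Config) error {
		if n <= 0 {
			return fmt.Errorf("threadiness must be positive, got %d", n)
		}
		c.threadiness = n
		return nil
	}
}

func ResyncPeriod(d time.Duration) Option {
	return func(c *Config) error {
		c.resyncPeriod = d
		return nil
	}
}

func NewConfig(options ...Option) (*Config, error) {
	// Defaults first, then let each option override them, failing fast on a bad option.
	c := &Config{resyncPeriod: 15 * time.Minute, threadiness: 4}
	for _, option := range options {
		if err := option(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := NewConfig(Threadiness(8), ResyncPeriod(time.Hour))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *c)
}
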
// NewProvisionController creates a new provision controller using
// the given configuration parameters and with private (non-shared) informers.
func NewProvisionController(
	logger klog.Logger,
	client kubernetes.Interface,
	provisionerName string,
	provisioner Provisioner,
	options ...func(*ProvisionController) error,
) *ProvisionController {
	id, err := os.Hostname()
	if err != nil {
		logger.Error(err, "Error getting hostname")
		klog.FlushAndExit(klog.ExitFlushTimeout, 1)
	}
	// add a uniquifier so that two processes on the same host don't accidentally both become active
	id = id + "_" + string(uuid.NewUUID())
	component := provisionerName + "_" + id

	// TODO: Once the following PR is merged, change to use StartLogging and StartRecordingToSinkWithContext
	// https://github.com/kubernetes/kubernetes/pull/120729
	v1.AddToScheme(scheme.Scheme)
	broadcaster := record.NewBroadcaster()
	broadcaster.StartStructuredLogging(0)
	broadcaster.StartRecordingToSink(&corev1.EventSinkImpl{Interface: client.CoreV1().Events(v1.NamespaceAll)})
	eventRecorder := broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: component})

	controller := &ProvisionController{
		client:                    client,
		provisionerName:           provisionerName,
		provisioner:               provisioner,
		id:                        id,
		component:                 component,
		eventRecorder:             eventRecorder,
		resyncPeriod:              DefaultResyncPeriod,
		exponentialBackOffOnError: DefaultExponentialBackOffOnError,
		threadiness:               DefaultThreadiness,
		failedProvisionThreshold:  DefaultFailedProvisionThreshold,
		failedDeleteThreshold:     DefaultFailedDeleteThreshold,
		leaderElection:            DefaultLeaderElection,
		leaderElectionNamespace:   getInClusterNamespace(),
		leaseDuration:             DefaultLeaseDuration,
		renewDeadline:             DefaultRenewDeadline,
		retryPeriod:               DefaultRetryPeriod,
		metrics:                   metrics.New(controllerSubsystem),
		metricsPort:               DefaultMetricsPort,
		metricsAddress:            DefaultMetricsAddress,
		metricsPath:               DefaultMetricsPath,
		addFinalizer:              DefaultAddFinalizer,
		hasRun:                    false,
		hasRunLock:                &sync.Mutex{},
	}

	for _, option := range options {
		err := option(controller)
		if err != nil {
			logger.Error(err, "Error processing controller options")
			klog.FlushAndExit(klog.ExitFlushTimeout, 1)
		}
	}

	var rateLimiter workqueue.RateLimiter
	if controller.rateLimiter != nil {
		// rateLimiter set via parameter takes precedence
		rateLimiter = controller.rateLimiter
	} else if controller.exponentialBackOffOnError {
		rateLimiter = workqueue.NewMaxOfRateLimiter(
			workqueue.NewItemExponentialFailureRateLimiter(15*time.Second, 1000*time.Second),
			&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
		)
	} else {
		rateLimiter = workqueue.NewMaxOfRateLimiter(
			workqueue.NewItemExponentialFailureRateLimiter(15*time.Second, 15*time.Second),
			&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
		)
	}
	controller.claimQueue = workqueue.NewNamedRateLimitingQueue(rateLimiter, "claims")
	controller.volumeQueue = workqueue.NewNamedRateLimitingQueue(rateLimiter, "volumes")

	informer := informers.NewSharedInformerFactory(client, controller.resyncPeriod)

	// ----------------------
	// PersistentVolumeClaims

	claimHandler := cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { controller.enqueueClaim(obj) },
		UpdateFunc: func(oldObj, newObj interface{}) { controller.enqueueClaim(newObj) },
		DeleteFunc: func(obj interface{}) {
			// NOOP. Either the claim is in claimsInProgress and in the queue, so it will be processed as usual,
			// or it's not in claimsInProgress and then we don't care.
		},
	}

	if controller.claimInformer != nil {
		controller.claimInformer.AddEventHandlerWithResyncPeriod(claimHandler, controller.resyncPeriod)
	} else {
		controller.claimInformer = informer.Core().V1().PersistentVolumeClaims().Informer()
		controller.claimInformer.AddEventHandler(claimHandler)
	}
	err = controller.claimInformer.AddIndexers(cache.Indexers{uidIndex: func(obj interface{}) ([]string, error) {
		uid, err := getObjectUID(obj)
		if err != nil {
			return nil, err
		}
		return []string{uid}, nil
	}})
	if err != nil {
		logger.Error(err, "Error setting indexer for pvc informer", "indexer", uidIndex)
		klog.FlushAndExit(klog.ExitFlushTimeout, 1)
	}
	controller.claimsIndexer = controller.claimInformer.GetIndexer()

	// -----------------
	// PersistentVolumes

	volumeHandler := cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
		UpdateFunc: func(oldObj, newObj interface{}) { controller.enqueueVolume(newObj) },
		DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
	}

	if controller.volumeInformer != nil {
		controller.volumeInformer.AddEventHandlerWithResyncPeriod(volumeHandler, controller.resyncPeriod)
	} else {
		controller.volumeInformer = informer.Core().V1().PersistentVolumes().Informer()
		controller.volumeInformer.AddEventHandler(volumeHandler)
	}
	controller.volumes = controller.volumeInformer.GetStore()

	// --------------
	// StorageClasses

	// no resource event handler needed for StorageClasses
	if controller.classInformer == nil {
		controller.classInformer = informer.Storage().V1().StorageClasses().Informer()
	}
	controller.classes = controller.classInformer.GetStore()

	if controller.createProvisionerPVLimiter != nil {
		logger.V(2).Info("Using saving PVs to API server in background")
		controller.volumeStore = NewVolumeStoreQueue(client, controller.createProvisionerPVLimiter, controller.claimsIndexer, controller.eventRecorder)
	} else {
		if controller.createProvisionedPVBackoff == nil {
			// Use linear backoff with createProvisionedPVInterval and createProvisionedPVRetryCount by default.
			if controller.createProvisionedPVInterval == 0 {
				controller.createProvisionedPVInterval = DefaultCreateProvisionedPVInterval
			}
			if controller.createProvisionedPVRetryCount == 0 {
				controller.createProvisionedPVRetryCount = DefaultCreateProvisionedPVRetryCount
			}
			controller.createProvisionedPVBackoff = &wait.Backoff{
				Duration: controller.createProvisionedPVInterval,
				Factor:   1, // linear backoff
				Steps:    controller.createProvisionedPVRetryCount,
				// Cap:      controller.createProvisionedPVInterval,
			}
		}
		logger.V(2).Info("Using blocking saving PVs to API server")
		controller.volumeStore = NewBackoffStore(client, controller.eventRecorder, controller.createProvisionedPVBackoff, controller)
	}

	return controller
}

syncClaim()

  1. Decide whether provisioning is needed:
    • ctrl.shouldProvision(ctx, claim) determines whether this PVC needs to be provisioned. If it returns an error, the provisioning stats are updated and the error is returned.
    • If should is true, provisioning should proceed.
  2. Provisioning operation:
    • The start time of the operation is recorded.
    • A logger is taken from the context.
    • ctrl.provisionClaimOperation(ctx, claim) performs the provisioning and returns a state and a possible error.
    • The provisioning stats are updated with the error and the start time.
  3. Handle the result of the provisioning operation:
    • If there was no error, or the state is ProvisioningFinished, provisioning is either complete or not needed, and the handling depends on the error:
      • No error: log and remove the PVC from claimsInProgress.
      • errStopProvision: log and set the error to nil (otherwise the caller would requeue the claim).
      • Any other error: log it.
    • If the state is ProvisioningInBackground, provisioning continues in the background: log and add the PVC to claimsInProgress.
    • If the state is ProvisioningNoChange, nothing is modified and claimsInProgress stays as it is.
  4. Return the error:
    • If no provisioning was needed, or it finished without an error to surface, nil is returned.
    • Otherwise the provisioning error is returned.

The logic here revolves around the provisioning state of a PVC: it updates internal state (such as claimsInProgress) based on the result and logs what happened. This is how the ProvisionController manages the provisioning of many PVCs and makes sure each one is handled correctly. The work-queue caller that drives syncClaim and requeues failed claims looks roughly like the sketch below.
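
The requeue behavior mentioned above comes from the standard client-go work-queue loop that drives syncClaim. The sketch below shows that pattern end to end with a fake sync function; the inline syncClaim closure and the use of a string key are assumptions for illustration, not the library's exact worker code.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Same shape of rate limiter as the provisioner: per-item exponential backoff
	// capped by an overall token bucket.
	rateLimiter := workqueue.NewMaxOfRateLimiter(
		workqueue.NewItemExponentialFailureRateLimiter(15*time.Second, 1000*time.Second),
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
	queue := workqueue.NewNamedRateLimitingQueue(rateLimiter, "claims")
	queue.Add("default/my-pvc") // an informer event handler would enqueue claim keys like this

	syncClaim := func(ctx context.Context, key string) error {
		fmt.Println("syncing", key)
		return errors.New("simulated transient failure")
	}

	// One worker iteration.
	obj, shutdown := queue.Get()
	if shutdown {
		return
	}
	key := obj.(string)
	if err := syncClaim(context.Background(), key); err != nil {
		// Error (other than the errStopProvision case, which syncClaim turns into nil):
		// requeue with backoff so the claim is retried later.
		queue.AddRateLimited(key)
	} else {
		// Success: reset the per-item backoff counter.
		queue.Forget(key)
	}
	queue.Done(obj)
}
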
func (ctrl *ProvisionController) syncClaim(ctx context.Context, obj interface{}) error {
	claim, ok := obj.(*v1.PersistentVolumeClaim)
	if !ok {
		return fmt.Errorf("expected claim but got %+v", obj)
	}

	should, err := ctrl.shouldProvision(ctx, claim)
	if err != nil {
		ctrl.updateProvisionStats(claim, err, time.Time{})
		return err
	} else if should {
		startTime := time.Now()
		logger := klog.FromContext(ctx)

		status, err := ctrl.provisionClaimOperation(ctx, claim)
		ctrl.updateProvisionStats(claim, err, startTime)
		if err == nil || status == ProvisioningFinished {
			// Provisioning is 100% finished / not in progress.
			switch err {
			case nil:
				logger.V(5).Info("Claim processing succeeded, removing PVC from claims in progress", "claimUID", claim.UID)
			case errStopProvision:
				logger.V(5).Info("Stop provisioning, removing PVC from claims in progress", "claimUID", claim.UID)
				// Our caller would requeue if we pass on this special error; return nil instead.
				err = nil
			default:
				logger.V(2).Info("Final error received, removing PVC from claims in progress", "claimUID", claim.UID)
			}
			ctrl.claimsInProgress.Delete(string(claim.UID))
			return err
		}
		if status == ProvisioningInBackground {
			// Provisioning is in progress in background.
			logger.V(2).Info("Temporary error received, adding PVC to claims in progress", "claimUID", claim.UID)
			ctrl.claimsInProgress.Store(string(claim.UID), claim)
		} else {
			// status == ProvisioningNoChange.
			// Don't change claimsInProgress:
			// - the claim is already there if previous status was ProvisioningInBackground.
			// - the claim is not there if previous status was ProvisioningFinished.
		}
		return err
	}
	return nil
}
shouldProvision()
  1. Check whether the PVC already has a volume name
    • If claim.Spec.VolumeName is not empty, the PVC is already bound to a specific volume and no provisioning is needed. The method returns false, nil.
  2. Check whether the provisioner implements the Qualifier interface
    • The type assertion ctrl.provisioner.(Qualifier) checks whether ctrl.provisioner implements the Qualifier interface.
    • If it does and the interface's ShouldProvision method returns false, no provisioning is needed and the method returns false, nil.
  3. Check the PVC annotations to determine the provisioner
    • The value of the annStorageProvisioner annotation is read first.
    • If it is absent, the annBetaStorageProvisioner annotation is tried instead.
    • These two annotations identify the provisioner responsible for provisioning the volume.
  4. Check whether the provisioner found is known
    • If a provisioner annotation was found and this provisioner is known to the controller (checked via ctrl.knownProvisioner(provisioner)), processing continues.
  5. Check the StorageClass's VolumeBindingMode
    • util.GetPersistentVolumeClaimClass(claim) returns the StorageClass the PVC belongs to.
    • ctrl.getStorageClass(claimClass) fetches the details of that StorageClass.
    • If the StorageClass's VolumeBindingMode is storage.VolumeBindingWaitForFirstConsumer (delayed binding), the PVC's annotations must also contain annSelectedNode:
      • If annSelectedNode is present and non-empty, a node has been selected and provisioning can proceed. The method returns true, nil.
      • If it is absent or empty, no provisioning happens and the method returns false, nil.
  6. Provision by default
    • If the StorageClass's VolumeBindingMode is not delayed binding, or no VolumeBindingMode is set, provisioning is needed by default and the method returns true, nil.
  7. No provisioner found
    • If no provisioner annotation is found on the PVC at all, no provisioning happens and the method returns false, nil.

In short, this code decides whether a volume should be provisioned for a PVC by examining the PVC's fields and annotations and the configuration of the associated StorageClass: whether a volume is already assigned, whether the provisioning preconditions are met, whether delayed binding is in use, and so on. A hedged sketch of the annotation keys involved follows.
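
The decision above hinges on a handful of PVC annotations. The sketch below spells out the keys these constants conventionally correspond to and the GA-then-beta fallback lookup; treat the exact strings and the claimProvisioner helper as assumptions for illustration rather than values quoted from this code base.

package main

import "fmt"

const (
	annStorageProvisioner     = "volume.kubernetes.io/storage-provisioner"      // GA key
	annBetaStorageProvisioner = "volume.beta.kubernetes.io/storage-provisioner" // legacy key, still checked as a fallback
	annSelectedNode           = "volume.kubernetes.io/selected-node"            // set by the scheduler for WaitForFirstConsumer
)

// claimProvisioner mirrors the "GA key first, beta key as fallback" lookup in shouldProvision.
func claimProvisioner(annotations map[string]string) (string, bool) {
	if p, ok := annotations[annStorageProvisioner]; ok {
		return p, true
	}
	p, ok := annotations[annBetaStorageProvisioner]
	return p, ok
}

func main() {
	ann := map[string]string{
		annBetaStorageProvisioner: "hostpath.csi.k8s.io",
		annSelectedNode:           "node-1",
	}
	p, found := claimProvisioner(ann)
	fmt.Println(p, found, ann[annSelectedNode] != "")
}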

// shouldProvision returns whether a claim should have a volume provisioned for
// it, i.e. whether a Provision is "desired"
func (ctrl *ProvisionController) shouldProvision(ctx context.Context, claim *v1.PersistentVolumeClaim) (bool, error) {
    if claim.Spec.VolumeName != "" {
        return false, nil
    }
    if qualifier, ok := ctrl.provisioner.(Qualifier); ok {
        if !qualifier.ShouldProvision(ctx, claim) {
            return false, nil
        }
    }
    provisioner, found := claim.Annotations[annStorageProvisioner]
    if !found {
        provisioner, found = claim.Annotations[annBetaStorageProvisioner]
    }
    if found {
        if ctrl.knownProvisioner(provisioner) {
            claimClass := util.GetPersistentVolumeClaimClass(claim)
            class, err := ctrl.getStorageClass(claimClass)
            if err != nil {
                return false, err
            }
            if class.VolumeBindingMode != nil && *class.VolumeBindingMode == storage.VolumeBindingWaitForFirstConsumer {
                // When claim is in delay binding mode, annSelectedNode is
                // required to provision volume.
                // Though PV controller set annStorageProvisioner only when
                // annSelectedNode is set, but provisioner may remove
                // annSelectedNode to notify scheduler to reschedule again.
                if selectedNode, ok := claim.Annotations[annSelectedNode]; ok && selectedNode != "" {
                    return true, nil
                }
                return false, nil
            }
            return true, nil
        }
    }
    return false, nil
}
provisionClaimOperation()
  1. Get the PVC's class
    • util.GetPersistentVolumeClaimClass(claim) returns the PVC's StorageClass name.
  2. Logging
    • The Kubernetes logging library klog records log entries that include the PVC and StorageClass.
  3. Check whether the PV already exists
    • ctrl.getProvisionedVolumeNameForClaim(claim) returns the expected PV name, and ctrl.volumes is checked for it. If it already exists, the PV has already been provisioned and the function returns ProvisioningFinished and errStopProvision (see the naming sketch after this walkthrough).
  4. Get a reference to the PVC
    • ref.GetReference(scheme.Scheme, claim) produces an ObjectReference to the PVC so later steps can refer to it.
  5. Check whether provisioning is allowed
    • ctrl.canProvision(ctx, claim) checks whether this ProvisionController can handle the provisioning request. If not, an event is recorded and an error is returned.
  6. Get the StorageClass
    • ctrl.getStorageClass(claimClass) fetches the StorageClass named by the PVC. If that fails, or the StorageClass's provisioner is not supported by this ProvisionController, an error is logged and returned.
  7. Get the selected node
    • If the PVC's annotations name a selected node (annSelectedNode or annAlphaSelectedNode), the node object is looked up. If the node does not exist, ctrl.provisionVolumeErrorHandling handles the error.
  8. Prepare the provisioning options
    • A ProvisionOptions object is built with the StorageClass, PV name, PVC object, and selected node.
  9. Record a normal event
    • ctrl.eventRecorder.Event records a Normal event stating that the external provisioner is provisioning a volume for the PVC.
  10. Call the provisioner
    • ctrl.provisioner.Provision(ctx, options) attempts to provision the volume. On failure, the error is handled according to its type.
  11. Set the PV's claim reference and finalizer
    • On success, the PV's ClaimRef is set to the PVC reference, and a finalizer is added if configured.
  12. Update the PV's metadata and storage class
    • The PV's annotations and StorageClass name are updated.
  13. Store and add the PV
    • ctrl.volumeStore.StoreVolume stores the PV, and the PV is added to ctrl.volumes.
  14. Return the result
    • If everything succeeds, the function returns ProvisioningFinished and nil, indicating provisioning completed successfully.

This function covers the whole path from checking whether the PV already exists, through actually provisioning the volume, to updating internal state and recording events. It is a key part of the Kubernetes volume provisioning flow and ensures that each PVC is handled and bound to storage correctly.
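
Step 3 works because the PV name is a pure function of the claim, so a repeated sync of the same PVC looks up (and would create) the same PV. The sketch below illustrates that idempotency check; the pvc-<UID> convention and the provisionedVolumeName helper are assumptions for illustration, and a plain cache.Store stands in for the controller's informer-backed ctrl.volumes.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/tools/cache"
)

// provisionedVolumeName derives the PV name from the claim UID, so retries of the same
// claim always look for (and create) the same PV.
func provisionedVolumeName(claim *v1.PersistentVolumeClaim) string {
	return "pvc-" + string(claim.UID)
}

func main() {
	claim := &v1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "data", Namespace: "default", UID: types.UID("1234-abcd")},
	}
	pvName := provisionedVolumeName(claim)

	// ctrl.volumes is an informer-backed cache.Store of PVs; a plain store stands in here.
	volumes := cache.NewStore(cache.MetaNamespaceKeyFunc)
	_ = volumes.Add(&v1.PersistentVolume{ObjectMeta: metav1.ObjectMeta{Name: pvName}})

	// The idempotency check at the top of provisionClaimOperation: if the PV with the
	// deterministic name is already cached, provisioning is skipped.
	if _, exists, err := volumes.GetByKey(pvName); err == nil && exists {
		fmt.Println("PV already provisioned, skipping:", pvName)
	}
}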

func (ctrl *ProvisionController) provisionClaimOperation(ctx context.Context, claim *v1.PersistentVolumeClaim) (ProvisioningState, error) {
	// Most code here is identical to that found in controller.go of kube's PV controller...
	claimClass := util.GetPersistentVolumeClaimClass(claim)
	logger := klog.LoggerWithValues(klog.FromContext(ctx), "PVC", klog.KObj(claim), "StorageClass", claimClass)
	logger.V(4).Info("Started")

	//  A previous doProvisionClaim may just have finished while we were waiting for
	//  the locks. Check that PV (with deterministic name) hasn't been provisioned
	//  yet.
	pvName := ctrl.getProvisionedVolumeNameForClaim(claim)
	_, exists, err := ctrl.volumes.GetByKey(pvName)
	if err == nil && exists {
		// Volume has been already provisioned, nothing to do.
		logger.V(4).Info("PersistentVolume already exists, skipping", "PV", pvName)
		return ProvisioningFinished, errStopProvision
	}

	// Prepare a claimRef to the claim early (to fail before a volume is
	// provisioned)
	claimRef, err := ref.GetReference(scheme.Scheme, claim)
	if err != nil {
		logger.Error(err, "Unexpected error getting claim reference")
		return ProvisioningNoChange, err
	}

	// Check if this provisioner can provision this claim.
	if err = ctrl.canProvision(ctx, claim); err != nil {
		ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, "ProvisioningFailed", err.Error())
		logger.Error(err, "Failed to provision volume")
		return ProvisioningFinished, errStopProvision
	}

	// For any issues getting fields from StorageClass (including reclaimPolicy & mountOptions),
	// retry the claim because the storageClass can be fixed/(re)created independently of the claim
	class, err := ctrl.getStorageClass(claimClass)
	if err != nil {
		logger.Error(err, "Error getting claim's StorageClass's fields")
		return ProvisioningFinished, err
	}
	if !ctrl.knownProvisioner(class.Provisioner) {
		// class.Provisioner has either changed since shouldProvision() or
		// annDynamicallyProvisioned contains different provisioner than
		// class.Provisioner.
		logger.Error(nil, "Unknown provisioner requested in claim's StorageClass", "provisioner", class.Provisioner)
		return ProvisioningFinished, errStopProvision
	}

	var selectedNode *v1.Node
	// Get SelectedNode
	if nodeName, ok := getString(claim.Annotations, annSelectedNode, annAlphaSelectedNode); ok {
		if ctrl.nodeLister != nil {
			selectedNode, err = ctrl.nodeLister.Get(nodeName)
		} else {
			selectedNode, err = ctrl.client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{}) // TODO (verult) cache Nodes
		}
		if err != nil {
			// if node does not exist, reschedule and remove volume.kubernetes.io/selected-node annotation
			if apierrs.IsNotFound(err) {
				ctx2 := klog.NewContext(ctx, logger)
				return ctrl.provisionVolumeErrorHandling(ctx2, ProvisioningReschedule, err, claim)
			}
			err = fmt.Errorf("failed to get target node: %v", err)
			ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, "ProvisioningFailed", err.Error())
			return ProvisioningNoChange, err
		}
	}

	options := ProvisionOptions{
		StorageClass: class,
		PVName:       pvName,
		PVC:          claim,
		SelectedNode: selectedNode,
	}

	ctrl.eventRecorder.Event(claim, v1.EventTypeNormal, "Provisioning", fmt.Sprintf("External provisioner is provisioning volume for claim %q", klog.KObj(claim)))

	volume, result, err := ctrl.provisioner.Provision(ctx, options)
	if err != nil {
		if ierr, ok := err.(*IgnoredError); ok {
			// Provision ignored, do nothing and hope another provisioner will provision it.
			logger.V(4).Info("Volume provision ignored", "reason", ierr)
			return ProvisioningFinished, errStopProvision
		}

		ctx2 := klog.NewContext(ctx, logger)
		err = fmt.Errorf("failed to provision volume with StorageClass %q: %v", claimClass, err)
		return ctrl.provisionVolumeErrorHandling(ctx2, result, err, claim)
	}

	logger.V(4).Info("Volume is provisioned", "PV", volume.Name)

	// Set ClaimRef and the PV controller will bind and set annBoundByController for us
	volume.Spec.ClaimRef = claimRef

	// Add external provisioner finalizer if it doesn't already have it
	if ctrl.addFinalizer && !ctrl.checkFinalizer(volume, finalizerPV) {
		volume.ObjectMeta.Finalizers = append(volume.ObjectMeta.Finalizers, finalizerPV)
	}

	metav1.SetMetaDataAnnotation(&volume.ObjectMeta, annDynamicallyProvisioned, class.Provisioner)
	volume.Spec.StorageClassName = claimClass

	logger.V(4).Info("Succeeded")

	if err := ctrl.volumeStore.StoreVolume(logger, claim, volume); err != nil {
		return ProvisioningFinished, err
	}
	if err = ctrl.volumes.Add(volume); err != nil {
		utilruntime.HandleError(err)
	}
	return ProvisioningFinished, nil
}
Provision()

Responsible for provisioning (creating) a PersistentVolume (PV) for a Kubernetes PersistentVolumeClaim (PVC).

  1. Check the PVC annotations
    • The method first checks the PVC annotations for the designated storage provisioner (annStorageProvisioner or annBetaStorageProvisioner).
    • If no matching provisioner is found, or the provisioner name does not match the current driver's name and the PVC is not marked as migrated to this driver (annMigratedTo), the method returns an IgnoredError to indicate that this PVC is not this provisioner's responsibility.
  2. Node check
    • checkNode determines whether the current node is responsible for provisioning this PVC. If not, the method returns ProvisioningNoChange and a corresponding error.
  3. Prepare for provisioning
    • prepareProvision performs the pre-provisioning work, such as computing the requested size and the volume name.
    • If preparation fails or returns an empty result, the error and state are returned directly.
  4. Create the volume
    • markAsMigrated marks the context if the PVC has been migrated.
    • A timeout is set and the CSI client's CreateVolume method is called to create the volume.
    • On failure, the error type and topology support determine whether rescheduling is possible.
  5. Handle the create response
    • If CreateVolume succeeds, the volume information from the response is used to set the volume attributes and capacity.
    • If the capacity in the response is 0 (meaning unknown), the requested capacity is used instead.
    • If the capacity in the response is smaller than the requested capacity, the created volume is deleted and an error is returned.
  6. Check the data source
    • If the PVC specifies a data source or data source ref and the created volume reports no content source, the volume is deleted and an error is returned.
  7. Set the PV's remaining fields
    • The PV's read-only flag is set based on the volume's access modes and the CSI driver's capabilities.
    • The PV's metadata is populated: name, access modes, mount options, capacity, and so on.
    • If a deletion-secret reference is provided, the corresponding annotations are set on the PV.
    • The PV's reclaim policy is taken from the StorageClass.
    • If topology is supported, the PV's node affinity is set (see the hedged sketch after this section).
    • The PV's volume mode (Filesystem or Block) and filesystem type are set from the PVC spec.
    • If the VolumeAttributesClass feature is enabled and the PVC names a volume attributes class, that name is set on the PV.
  8. Handle migrated volumes
    • If the PVC was migrated from another (in-tree) provisioner, the CSI PV is translated back to an in-tree PV. If that translation fails, the created PV is deleted and an error is returned.
  9. PV created successfully
    • If every step succeeds, the method returns the created PV, the ProvisioningFinished state, and a nil error.

This code is part of the Kubernetes CSI external provisioner and handles the full PVC provisioning flow: checking, preparing, creating the volume, and populating the PV's fields.
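
Step 7's topology handling calls GenerateVolumeNodeAffinity, which is not shown in this post. The sketch below shows the general shape of that conversion under the assumption that each accessible-topology entry in the CreateVolume response becomes one NodeSelectorTerm on the PV; the function name and the example topology key are illustrative.

package main

import (
	"fmt"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	v1 "k8s.io/api/core/v1"
)

func nodeAffinityFromTopology(topologies []*csi.Topology) *v1.VolumeNodeAffinity {
	if len(topologies) == 0 {
		return nil
	}
	terms := make([]v1.NodeSelectorTerm, 0, len(topologies))
	for _, topo := range topologies {
		exprs := make([]v1.NodeSelectorRequirement, 0, len(topo.Segments))
		for key, value := range topo.Segments {
			exprs = append(exprs, v1.NodeSelectorRequirement{
				Key:      key,
				Operator: v1.NodeSelectorOpIn,
				Values:   []string{value},
			})
		}
		terms = append(terms, v1.NodeSelectorTerm{MatchExpressions: exprs})
	}
	return &v1.VolumeNodeAffinity{Required: &v1.NodeSelector{NodeSelectorTerms: terms}}
}

func main() {
	affinity := nodeAffinityFromTopology([]*csi.Topology{
		{Segments: map[string]string{"topology.hostpath.csi/node": "node-1"}},
	})
	fmt.Printf("%+v\n", affinity)
}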

func (p *csiProvisioner) Provision(ctx context.Context, options controller.ProvisionOptions) (*v1.PersistentVolume, controller.ProvisioningState, error) {
	claim := options.PVC
	provisioner, ok := claim.Annotations[annStorageProvisioner]
	if !ok {
		provisioner = claim.Annotations[annBetaStorageProvisioner]
	}
	if provisioner != p.driverName && claim.Annotations[annMigratedTo] != p.driverName {
		// The storage provisioner annotation may not equal driver name but the
		// PVC could have annotation "migrated-to" which is the new way to
		// signal a PVC is migrated (k8s v1.17+)
		return nil, controller.ProvisioningFinished, &controller.IgnoredError{
			Reason: fmt.Sprintf("PVC annotated with external-provisioner name %s does not match provisioner driver name %s. This could mean the PVC is not migrated",
				provisioner,
				p.driverName),
		}
	}

	// The same check already ran in ShouldProvision, but perhaps
	// it couldn't complete due to some unexpected error.
	owned, err := p.checkNode(ctx, claim, options.StorageClass, "provision")
	if err != nil {
		return nil, controller.ProvisioningNoChange,
			fmt.Errorf("node check failed: %v", err)
	}
	if !owned {
		return nil, controller.ProvisioningNoChange, &controller.IgnoredError{
			Reason: fmt.Sprintf("not responsible for provisioning of PVC %s/%s because it is not assigned to node %q", claim.Namespace, claim.Name, p.nodeDeployment.NodeName),
		}
	}

	result, state, err := p.prepareProvision(ctx, claim, options.StorageClass, options.SelectedNode)
	if result == nil {
		return nil, state, err
	}
	req := result.req
	volSizeBytes := req.CapacityRange.RequiredBytes
	pvName := req.Name
	provisionerCredentials := req.Secrets

	createCtx := markAsMigrated(ctx, result.migratedVolume)
	createCtx, cancel := context.WithTimeout(createCtx, p.timeout)
	defer cancel()
	rep, err := p.csiClient.CreateVolume(createCtx, req)
	if err != nil {
		// Giving up after an error and telling the pod scheduler to retry with a different node
		// only makes sense if:
		// - The CSI driver supports topology: without that, the next CreateVolume call after
		//   rescheduling will be exactly the same.
		// - We are working on a volume with late binding: only in that case will
		//   provisioning be retried if we give up for now.
		// - The error is one where rescheduling is
		//   a) allowed (i.e. we don't have to keep calling CreateVolume because the operation might be running) and
		//   b) it makes sense (typically local resource exhausted).
		//   isFinalError is going to check this.
		//
		// We do this regardless whether the driver has asked for strict topology because
		// even drivers which did not ask for it explicitly might still only look at the first
		// topology entry and thus succeed after rescheduling.
		mayReschedule := p.supportsTopology() &&
			options.SelectedNode != nil
		state := checkError(err, mayReschedule)
		klog.V(5).Infof("CreateVolume failed, supports topology = %v, node selected %v => may reschedule = %v => state = %v: %v",
			p.supportsTopology(),
			options.SelectedNode != nil,
			mayReschedule,
			state,
			err)
		return nil, state, err
	}

	if rep.Volume != nil {
		klog.V(3).Infof("create volume rep: %+v", *rep.Volume)
	}
	volumeAttributes := map[string]string{provisionerIDKey: p.identity}
	for k, v := range rep.Volume.VolumeContext {
		volumeAttributes[k] = v
	}
	respCap := rep.GetVolume().GetCapacityBytes()

	// According to the CSI spec, CreateVolume should be able to return capacity = 0, which means it is unknown, for example NFS/FTP
	if respCap == 0 {
		respCap = volSizeBytes
		klog.V(3).Infof("csiClient response volume with size 0, which is not supported by apiServer, will use claim size:%d", respCap)
	} else if respCap < volSizeBytes {
		capErr := fmt.Errorf("created volume capacity %v less than requested capacity %v", respCap, volSizeBytes)
		delReq := &csi.DeleteVolumeRequest{
			VolumeId: rep.GetVolume().GetVolumeId(),
		}
		err = cleanupVolume(ctx, p, delReq, provisionerCredentials)
		if err != nil {
			capErr = fmt.Errorf("%v. Cleanup of volume %s failed, volume is orphaned: %v", capErr, pvName, err)
		}
		// use InBackground to retry the call, hoping the volume is deleted correctly next time.
		return nil, controller.ProvisioningInBackground, capErr
	}

	if options.PVC.Spec.DataSource != nil ||
		(utilfeature.DefaultFeatureGate.Enabled(features.CrossNamespaceVolumeDataSource) &&
			options.PVC.Spec.DataSourceRef != nil && options.PVC.Spec.DataSourceRef.Namespace != nil &&
			len(*options.PVC.Spec.DataSourceRef.Namespace) > 0) {
		contentSource := rep.GetVolume().ContentSource
		if contentSource == nil {
			sourceErr := fmt.Errorf("volume content source missing")
			delReq := &csi.DeleteVolumeRequest{
				VolumeId: rep.GetVolume().GetVolumeId(),
			}
			err = cleanupVolume(ctx, p, delReq, provisionerCredentials)
			if err != nil {
				sourceErr = fmt.Errorf("%v. cleanup of volume %s failed, volume is orphaned: %v", sourceErr, pvName, err)
			}
			return nil, controller.ProvisioningInBackground, sourceErr
		}
	}
	pvReadOnly := false
	volCaps := req.GetVolumeCapabilities()
	// if the request only has one accessmode and if its ROX, set readonly to true
	// TODO: check for the driver capability of MULTI_NODE_READER_ONLY capability from the CSI driver
	if len(volCaps) == 1 && volCaps[0].GetAccessMode().GetMode() == csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY && p.controllerPublishReadOnly {
		pvReadOnly = true
	}

	result.csiPVSource.VolumeHandle = p.volumeIdToHandle(rep.Volume.VolumeId)
	result.csiPVSource.VolumeAttributes = volumeAttributes
	result.csiPVSource.ReadOnly = pvReadOnly
	pv := &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name: pvName,
		},
		Spec: v1.PersistentVolumeSpec{
			AccessModes:  options.PVC.Spec.AccessModes,
			MountOptions: options.StorageClass.MountOptions,
			Capacity: v1.ResourceList{
				v1.ResourceName(v1.ResourceStorage): bytesToQuantity(respCap),
			},
			// TODO wait for CSI VolumeSource API
			PersistentVolumeSource: v1.PersistentVolumeSource{
				CSI: result.csiPVSource,
			},
		},
	}

	// Set annDeletionSecretRefName and namespace in PV object.
	if result.provDeletionSecrets != nil {
		klog.V(5).Infof("createVolumeOperation: set annotation [%s/%s] on pv [%s].", annDeletionProvisionerSecretRefNamespace, annDeletionProvisionerSecretRefName, pv.Name)
		metav1.SetMetaDataAnnotation(&pv.ObjectMeta, annDeletionProvisionerSecretRefName, result.provDeletionSecrets.name)
		metav1.SetMetaDataAnnotation(&pv.ObjectMeta, annDeletionProvisionerSecretRefNamespace, result.provDeletionSecrets.namespace)
	} else {
		metav1.SetMetaDataAnnotation(&pv.ObjectMeta, annDeletionProvisionerSecretRefName, "")
		metav1.SetMetaDataAnnotation(&pv.ObjectMeta, annDeletionProvisionerSecretRefNamespace, "")
	}

	if options.StorageClass.ReclaimPolicy != nil {
		pv.Spec.PersistentVolumeReclaimPolicy = *options.StorageClass.ReclaimPolicy
	}

	if p.supportsTopology() {
		pv.Spec.NodeAffinity = GenerateVolumeNodeAffinity(rep.Volume.AccessibleTopology)
	}

	// Set VolumeMode to PV if it is passed via PVC spec when Block feature is enabled
	if options.PVC.Spec.VolumeMode != nil {
		pv.Spec.VolumeMode = options.PVC.Spec.VolumeMode
	}
	// Set FSType if PV is not Block Volume
	if !util.CheckPersistentVolumeClaimModeBlock(options.PVC) {
		pv.Spec.PersistentVolumeSource.CSI.FSType = result.fsType
	}

	vacName := claim.Spec.VolumeAttributesClassName
	if utilfeature.DefaultFeatureGate.Enabled(features.VolumeAttributesClass) && vacName != nil && *vacName != "" {
		pv.Spec.VolumeAttributesClassName = vacName
	}

	klog.V(2).Infof("successfully created PV %v for PVC %v and csi volume name %v", pv.Name, options.PVC.Name, pv.Spec.CSI.VolumeHandle)

	if result.migratedVolume {
		pv, err = p.translator.TranslateCSIPVToInTree(pv)
		if err != nil {
			klog.Warningf("failed to translate CSI PV to in-tree due to: %v. Deleting provisioned PV", err)
			deleteErr := p.Delete(ctx, pv)
			if deleteErr != nil {
				klog.Warningf("failed to delete partly provisioned PV: %v", deleteErr)
				// Retry the call again to clean up the orphan
				return nil, controller.ProvisioningInBackground, err
			}
			return nil, controller.ProvisioningFinished, err
		}
	}

	klog.V(5).Infof("successfully created PV %+v", pv.Spec.PersistentVolumeSource)
	return pv, controller.ProvisioningFinished, nil
}
checkNode()
  1. Check whether nodeDeployment is nil
    • If it is nil, return true, nil immediately: provisioning should proceed (there is no node-deployment restriction).
  2. Get the selected node
    • The value of the annSelectedNode annotation on the PVC is read as the selected node.
  3. Branch on the selected node
    • No node selected (selectedNode == ""):
      • Log that the node check for this PVC is starting.
      • If sc (the StorageClass) is nil, fetch it via scLister.
      • Check that the StorageClass's VolumeBindingMode is VolumeBindingImmediate and that nodeDeployment.ImmediateBinding is true; otherwise return false, nil.
      • If the StorageClass sets AllowedTopologies, check whether the current node satisfies them. If not, log it and return false, nil.
      • Try to make the current node the owner of the PVC via becomeOwner (a hedged sketch of this optimistic update follows the list). If that fails, return the error.
      • Return false, nil: nothing is provisioned on this node right now, since ownership was just handled and the updated PVC will be processed again.
    • The selected node is the current node (selectedNode == p.nodeDeployment.NodeName):
      • Return true, nil: provisioning should happen on this node.
    • Some other node is selected:
      • Return false, nil: ignore the PVC, because another node has already claimed it.
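
becomeOwner itself is not shown here. The sketch below captures only the core idea it relies on: set the selected-node annotation on the PVC with a plain Update and let the API server's resourceVersion conflict detection decide which external-provisioner instance wins. The tryBecomeOwner name, the fake clientset, and the omission of the randomized backoff are simplifications for illustration.

package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/fake"
)

const annSelectedNode = "volume.kubernetes.io/selected-node"

func tryBecomeOwner(ctx context.Context, client kubernetes.Interface, namespace, pvcName, nodeName string) error {
	pvc, err := client.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, pvcName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if selected := pvc.Annotations[annSelectedNode]; selected != "" {
		// Someone already owns this claim (possibly us); nothing to do.
		return nil
	}
	if pvc.Annotations == nil {
		pvc.Annotations = map[string]string{}
	}
	pvc.Annotations[annSelectedNode] = nodeName

	_, err = client.CoreV1().PersistentVolumeClaims(namespace).Update(ctx, pvc, metav1.UpdateOptions{})
	if apierrors.IsConflict(err) {
		// Another external-provisioner instance updated the PVC first; it wins.
		fmt.Println("lost the race for", pvcName)
		return nil
	}
	return err
}

func main() {
	client := fake.NewSimpleClientset(&v1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "data", Namespace: "default"},
	})
	if err := tryBecomeOwner(context.Background(), client, "default", "data", "node-1"); err != nil {
		panic(err)
	}
}
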
// checkNode optionally checks whether the PVC is assigned to the current node.
// If the PVC uses immediate binding, it will try to take the PVC for provisioning
// on the current node. Returns true if provisioning can proceed, an error
// in case of a failure that prevented checking.
func (p *csiProvisioner) checkNode(ctx context.Context, claim *v1.PersistentVolumeClaim, sc *storagev1.StorageClass, caller string) (provision bool, err error) {
	if p.nodeDeployment == nil {
		return true, nil
	}

	var selectedNode string
	if claim.Annotations != nil {
		selectedNode = claim.Annotations[annSelectedNode]
	}
	switch selectedNode {
	case "":
		logger := klog.V(5)
		if logger.Enabled() {
			logger.Infof("%s: checking node for PVC %s/%s with resource version %s", caller, claim.Namespace, claim.Name, claim.ResourceVersion)
			defer func() {
				logger.Infof("%s: done checking node for PVC %s/%s with resource version %s: provision %v, err %v", caller, claim.Namespace, claim.Name, claim.ResourceVersion, provision, err)
			}()
		}

		if sc == nil {
			var err error
			sc, err = p.scLister.Get(*claim.Spec.StorageClassName)
			if err != nil {
				return false, err
			}
		}
		if sc.VolumeBindingMode == nil ||
			*sc.VolumeBindingMode != storagev1.VolumeBindingImmediate ||
			!p.nodeDeployment.ImmediateBinding {
			return false, nil
		}

		// If the storage class has AllowedTopologies set, then
		// it must match our own. We can find out by trying to
		// create accessibility requirements.  If that fails,
		// we should not become the owner.
		if len(sc.AllowedTopologies) > 0 {
			node, err := p.nodeLister.Get(p.nodeDeployment.NodeName)
			if err != nil {
				return false, err
			}
			if _, err := GenerateAccessibilityRequirements(
				p.client,
				p.driverName,
				claim.Name,
				sc.AllowedTopologies,
				node,
				p.strictTopology,
				p.immediateTopology,
				p.csiNodeLister,
				p.nodeLister); err != nil {
				if logger.Enabled() {
					logger.Infof("%s: ignoring PVC %s/%s, allowed topologies is not compatible: %v", caller, claim.Namespace, claim.Name, err)
				}
				return false, nil
			}
		}

		// Try to select the current node if there is a chance of it
		// being created there, i.e. there is currently enough free space (checked in becomeOwner).
		//
		// If later volume provisioning fails on this node, the annotation will be unset and node
		// selection will happen again. If no other node picks up the volume, then the PVC remains
		// in the queue and this check will be repeated from time to time.
		//
		// A lot of different external-provisioner instances will try to do this at the same time.
		// To avoid the thundering herd problem, we sleep in becomeOwner for a short random amount of time
		// (for new PVCs) or exponentially increasing time (for PVCs where we already had a conflict).
		if err := p.nodeDeployment.becomeOwner(ctx, p, claim); err != nil {
			return false, fmt.Errorf("PVC %s/%s: %v", claim.Namespace, claim.Name, err)
		}

		// We are now either the owner or someone else is. We'll check when the updated PVC
		// enters the workqueue and gets processed by sig-storage-lib-external-provisioner.
		return false, nil
	case p.nodeDeployment.NodeName:
		// Our node is selected.
		return true, nil
	default:
		// Some other node is selected, ignore it.
		return false, nil
	}
}
prepareProvision()

This function, prepareProvision on the csiProvisioner struct, prepares the CSI (Container Storage Interface) CreateVolume request for a Kubernetes PersistentVolumeClaim (PVC). Its main steps and logic are:

  1. Check the StorageClass
    • If the StorageClass (SC) passed in is nil, the function returns an error indicating the storage class does not exist.
  2. Handle the data source
    • The dataSource method fetches the PVC's data source, if any (for example a volume snapshot or another PVC).
    • If fetching the data source fails, the error is returned.
  3. Handle volumes migrated from in-tree plugins
    • If the CSI provisioner supports migration from a specific in-tree plugin and the SC's provisioner matches that in-tree plugin name, the SC is translated into a CSI-compatible SC.
    • If translation fails, an error is returned.
  4. Determine the required capabilities
    • The required-capability flags (snapshot, clone) are set based on the data source type (volume snapshot or PVC clone).
    • If the data source type is not supported, an event is recorded and the function returns, assuming an external populator will create the volume.
  5. Handle the volume attributes class
    • If the VolumeAttributesClass feature is enabled and the PVC specifies a VolumeAttributesClassName, the "modify volume" capability flag is recorded.
  6. Check driver capabilities
    • checkDriverCapabilities verifies that the CSI driver supports the required capabilities (snapshot, clone, and so on).
    • If not, an error is returned.
  7. Check the PVC selector
    • If the PVC specifies a selector, an error is returned: selectors are not currently supported.
  8. Generate the volume name
    • makeVolumeName generates the volume name.
  9. Handle the filesystem type
    • The filesystem type (fstype) is read from the SC parameters.
    • If both the old and the new key are used to specify fstype, an error is returned.
    • If no fstype is specified but a default filesystem type is configured, the default is used.
  10. Get the volume capabilities
    • getVolumeCapabilities returns the volume capabilities (access modes, mount options, and so on).
  11. Build the CSI CreateVolumeRequest
    • A CSI CreateVolumeRequest is constructed with the volume name, parameters, volume capabilities, capacity range, and more.
  12. Handle the data source content
    • If a data source (clone or snapshot) exists, the volume content source is fetched and added to the request.
    • For clone operations, the clone finalizer is set.
  13. Handle topology requirements
    • If topology is supported, accessibility requirements are generated and added to the request.
  14. Handle secret references
    • Secret references are resolved for the provisioner, controller publish, node stage, node publish, controller expand, and node expand operations.
  15. Strip prefixed parameters
    • Keys with the CSI parameter prefix are removed from the SC parameters before they are sent to the driver (see the hedged sketch after this section).
  16. Add extra create metadata
    • If extra create metadata is enabled, PVC and PV metadata is added to the request parameters.
  17. Handle deletion-secret parameters
    • The secrets that must be deleted together with the volume are recorded.
  18. Handle volume attributes class parameters
    • If a VolumeAttributesClassName is specified, the corresponding VAC (VolumeAttributesClass) object is fetched and its driver name is checked for a match.
    • The VAC's parameters are added to the request's mutable parameters.
  19. Return the prepared result
    • A prepareProvisionResult is built and returned, containing the filesystem type, whether the volume was migrated, the CSI create request, the CSI PV source, and the deletion-secret parameters.

Through this sequence of checks and preparation steps, the function builds a CSI CreateVolume request and returns everything needed for the actual volume creation that follows.
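
Step 15 refers to the reserved csi.storage.k8s.io/ parameter namespace: those StorageClass parameters (fstype, secret references, and so on) are consumed by the sidecar and must not be forwarded to the driver in CreateVolumeRequest.Parameters. The sketch below shows the stripping idea; the stripPrefixedParameters name is illustrative and the real helper is stricter about unknown prefixed keys.

package main

import (
	"fmt"
	"strings"
)

const csiParameterPrefix = "csi.storage.k8s.io/"

func stripPrefixedParameters(params map[string]string) map[string]string {
	stripped := make(map[string]string, len(params))
	for k, v := range params {
		if strings.HasPrefix(k, csiParameterPrefix) {
			continue // handled by the sidecar, not passed to the driver
		}
		stripped[k] = v
	}
	return stripped
}

func main() {
	scParams := map[string]string{
		"csi.storage.k8s.io/fstype":                       "ext4",
		"csi.storage.k8s.io/provisioner-secret-name":      "creds",
		"csi.storage.k8s.io/provisioner-secret-namespace": "kube-system",
		"type": "fast-ssd", // driver-specific parameter, forwarded as-is
	}
	fmt.Println(stripPrefixedParameters(scParams))
}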

// prepareProvision does non-destructive parameter checking and preparations for provisioning a volume.
func (p *csiProvisioner) prepareProvision(ctx context.Context, claim *v1.PersistentVolumeClaim, sc *storagev1.StorageClass, selectedNode *v1.Node) (*prepareProvisionResult, controller.ProvisioningState, error) {
	if sc == nil {
		return nil, controller.ProvisioningFinished, errors.New("storage class was nil")
	}

	// normalize dataSource and dataSourceRef.
	dataSource, err := p.dataSource(ctx, claim)
	if err != nil {
		return nil, controller.ProvisioningFinished, err
	}

	migratedVolume := false
	if p.supportsMigrationFromInTreePluginName != "" {
		// NOTE: we cannot depend on PVC.Annotations[volume.beta.kubernetes.io/storage-provisioner] to get
		// the in-tree provisioner name in case of CSI migration scenarios. The annotation will be
		// set to the CSI provisioner name by PV controller for migration scenarios
		// so that external provisioner can correctly pick up the PVC pointing to an in-tree plugin
		if sc.Provisioner == p.supportsMigrationFromInTreePluginName {
			klog.V(2).Infof("translating storage class for in-tree plugin %s to CSI", sc.Provisioner)
			storageClass, err := p.translator.TranslateInTreeStorageClassToCSI(p.supportsMigrationFromInTreePluginName, sc)
			if err != nil {
				return nil, controller.ProvisioningFinished, fmt.Errorf("failed to translate storage class: %v", err)
			}
			sc = storageClass
			migratedVolume = true
		} else {
			klog.V(4).Infof("skip translation of storage class for plugin: %s", sc.Provisioner)
		}
	}

	// Make sure the plugin is capable of fulfilling the requested options
	rc := &requiredCapabilities{}
	if dataSource != nil {
		// PVC.Spec.DataSource.Name is the name of the VolumeSnapshot API object
		if dataSource.Name == "" {
			return nil, controller.ProvisioningFinished, fmt.Errorf("the PVC source not found for PVC %s", claim.Name)
		}

		switch dataSource.Kind {
		case snapshotKind:
			if dataSource.APIVersion != snapshotAPIGroup {
				return nil, controller.ProvisioningFinished, fmt.Errorf("the PVC source does not belong to the right APIGroup. Expected %s, Got %s", snapshotAPIGroup, dataSource.APIVersion)
			}
			rc.snapshot = true
		case pvcKind:
			rc.clone = true
		default:
			// DataSource is not VolumeSnapshot and PVC
			// Assume external data populator to create the volume, and there is no more work for us to do
			p.eventRecorder.Event(claim, v1.EventTypeNormal, "Provisioning", "Assuming an external populator will provision the volume")
			return nil, controller.ProvisioningFinished, &controller.IgnoredError{
				Reason: fmt.Sprintf("data source (%s) is not handled by the provisioner, assuming an external populator will provision it",
					dataSource.Kind),
			}
		}
	}

	var vacName string
	if utilfeature.DefaultFeatureGate.Enabled(features.VolumeAttributesClass) {
		if claim.Spec.VolumeAttributesClassName != nil {
			vacName = *claim.Spec.VolumeAttributesClassName
		}
	}

	if vacName != "" {
		rc.modifyVolume = true
	}

	if err := p.checkDriverCapabilities(rc); err != nil {
		return nil, controller.ProvisioningFinished, err
	}

	if claim.Spec.Selector != nil {
		return nil, controller.ProvisioningFinished, fmt.Errorf("claim Selector is not supported")
	}

	pvName, err := makeVolumeName(p.volumeNamePrefix, string(claim.ObjectMeta.UID), p.volumeNameUUIDLength)
	if err != nil {
		return nil, controller.ProvisioningFinished, err
	}

	fsTypesFound := 0
	fsType := ""
	for k, v := range sc.Parameters {
		if strings.ToLower(k) == "fstype" || k == prefixedFsTypeKey {
			fsType = v
			fsTypesFound++
		}
		if strings.ToLower(k) == "fstype" {
			klog.Warningf(deprecationWarning("fstype", prefixedFsTypeKey, ""))
		}
	}
	if fsTypesFound > 1 {
		return nil, controller.ProvisioningFinished, fmt.Errorf("fstype specified in parameters with both \"fstype\" and \"%s\" keys", prefixedFsTypeKey)
	}
	if fsType == "" && p.defaultFSType != "" {
		fsType = p.defaultFSType
	}

	capacity := claim.Spec.Resources.Requests[v1.ResourceName(v1.ResourceStorage)]
	volSizeBytes := capacity.Value()

	volumeCaps, err := p.getVolumeCapabilities(claim, sc, fsType)
	if err != nil {
		return nil, controller.ProvisioningFinished, err
	}

	// Create a CSI CreateVolumeRequest and Response
	req := csi.CreateVolumeRequest{
		Name:               pvName,
		Parameters:         sc.Parameters,
		VolumeCapabilities: volumeCaps,
		CapacityRange: &csi.CapacityRange{
			RequiredBytes: int64(volSizeBytes),
		},
	}

	if dataSource != nil && (rc.clone || rc.snapshot) {
		volumeContentSource, err := p.getVolumeContentSource(ctx, claim, sc, dataSource)
		if err != nil {
			return nil, controller.ProvisioningNoChange, fmt.Errorf("error getting handle for DataSource Type %s by Name %s: %v", dataSource.Kind, dataSource.Name, err)
		}
		req.VolumeContentSource = volumeContentSource
	}

	if dataSource != nil && rc.clone {
		err = p.setCloneFinalizer(ctx, claim, dataSource)
		if err != nil {
			return nil, controller.ProvisioningNoChange, err
		}
	}

	if p.supportsTopology() {
		requirements, err := GenerateAccessibilityRequirements(
			p.client,
			p.driverName,
			claim.Name,
			sc.AllowedTopologies,
			selectedNode,
			p.strictTopology,
			p.immediateTopology,
			p.csiNodeLister,
			p.nodeLister)
		if err != nil {
			return nil, controller.ProvisioningNoChange, fmt.Errorf("error generating accessibility requirements: %v", err)
		}
		req.AccessibilityRequirements = requirements
	}

	// Resolve provision secret credentials.
	provisionerSecretRef, err := getSecretReference(provisionerSecretParams, sc.Parameters, pvName, claim)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	provisionerCredentials, err := getCredentials(ctx, p.client, provisionerSecretRef)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	req.Secrets = provisionerCredentials

	// Resolve controller publish, node stage, node publish secret references
	controllerPublishSecretRef, err := getSecretReference(controllerPublishSecretParams, sc.Parameters, pvName, claim)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	nodeStageSecretRef, err := getSecretReference(nodeStageSecretParams, sc.Parameters, pvName, claim)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	nodePublishSecretRef, err := getSecretReference(nodePublishSecretParams, sc.Parameters, pvName, claim)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	controllerExpandSecretRef, err := getSecretReference(controllerExpandSecretParams, sc.Parameters, pvName, claim)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	nodeExpandSecretRef, err := getSecretReference(nodeExpandSecretParams, sc.Parameters, pvName, claim)
	if err != nil {
		return nil, controller.ProvisioningNoChange, err
	}
	csiPVSource := &v1.CSIPersistentVolumeSource{
		Driver: p.driverName,
		// VolumeHandle and VolumeAttributes will be added after provisioning.
		ControllerPublishSecretRef: controllerPublishSecretRef,
		NodeStageSecretRef:         nodeStageSecretRef,
		NodePublishSecretRef:       nodePublishSecretRef,
		ControllerExpandSecretRef:  controllerExpandSecretRef,
		NodeExpandSecretRef:        nodeExpandSecretRef,
	}

	req.Parameters, err = removePrefixedParameters(sc.Parameters)
	if err != nil {
		return nil, controller.ProvisioningFinished, fmt.Errorf("failed to strip CSI Parameters of prefixed keys: %v", err)
	}

	if p.extraCreateMetadata {
		// add pvc and pv metadata to request for use by the plugin
		req.Parameters[pvcNameKey] = claim.GetName()
		req.Parameters[pvcNamespaceKey] = claim.GetNamespace()
		req.Parameters[pvNameKey] = pvName
	}
	deletionAnnSecrets := new(deletionSecretParams)

	if provisionerSecretRef != nil {
		deletionAnnSecrets.name = provisionerSecretRef.Name
		deletionAnnSecrets.namespace = provisionerSecretRef.Namespace
	}

	if vacName != "" {
		vac, err := p.client.StorageV1alpha1().VolumeAttributesClasses().Get(ctx, vacName, metav1.GetOptions{})
		if err != nil {
			return nil, controller.ProvisioningNoChange, err
		}

		if vac.DriverName != p.driverName {
			return nil, controller.ProvisioningFinished, fmt.Errorf("VAC %s referenced in PVC is for driver %s which does not match driver name %s", vacName, vac.DriverName, p.driverName)
		}

		req.MutableParameters = vac.Parameters
	}

	return &prepareProvisionResult{
		fsType:              fsType,
		migratedVolume:      migratedVolume,
		req:                 &req,
		csiPVSource:         csiPVSource,
		provDeletionSecrets: deletionAnnSecrets,
	}, controller.ProvisioningNoChange, nil

}
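The Parameters map that reaches CreateVolume contains only the driver-facing StorageClass parameters; keys in the csi.storage.k8s.io/ namespace (fstype, the secret references above, and so on) are consumed by the provisioner and stripped in step 15. The following is a simplified sketch of that filtering, assuming the same csi.storage.k8s.io/ prefix; unlike the real removePrefixedParameters, it does not reject unknown prefixed keys:

package main

import (
	"fmt"
	"strings"
)

const csiParameterPrefix = "csi.storage.k8s.io/"

// stripPrefixedParameters copies every StorageClass parameter except those in
// the csi.storage.k8s.io/ namespace, which the provisioner interprets itself.
func stripPrefixedParameters(params map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range params {
		if !strings.HasPrefix(k, csiParameterPrefix) {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Hypothetical StorageClass parameters mixing provisioner-level and
	// driver-specific keys.
	scParams := map[string]string{
		"csi.storage.k8s.io/fstype":                  "ext4",
		"csi.storage.k8s.io/provisioner-secret-name": "${pvc.name}-provision-secret",
		"pool": "ssd-pool", // driver-specific, passed through to CreateVolume
	}
	fmt.Println(stripPrefixedParameters(scParams)) // map[pool:ssd-pool]
}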