Deploying Orleans on Kubernetes: Configuration and Source-Code Mechanics
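Based on the source code and the official documentation, this article lays out the correct way to host Orleans on Kubernetes. It covers:
- configuration points and constraints
- key source-code behavior, with excerpts
- sample application code and Kubernetes YAML
- the startup and runtime sequences
- common problems and troubleshooting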
References:
- Kubernetes hosting (official documentation): learn.microsoft.com - Orleans Kubernetes hosting
1. Core concepts and constraints
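- The Microsoft.Orleans.Hosting.Kubernetes package improves the hosting experience on Kubernetes. Calling UseKubernetesHosting() does the following:
  - sets SiloOptions.SiloName to the pod name;
  - sets EndpointOptions.AdvertisedIPAddress to the pod IP (or, when POD_IP is absent, resolves it from the pod name via DNS);
  - binds EndpointOptions.SiloListeningEndpoint/GatewayListeningEndpoint to the Any address, with the default ports 11111 / 30000;
  - sets ClusterOptions.ServiceId and ClusterOptions.ClusterId from the ORLEANS_SERVICE_ID/ORLEANS_CLUSTER_ID environment variables (the YAML below fills them from the pod labels);
  - at startup, lists the pods that carry the cluster's labels and compares them with Orleans membership. Active silos whose pod no longer exists in Kubernetes are marked Dead;
  - at runtime, only a few silos in the cluster (2 by default) act as "watchers" of Kubernetes events, which reduces load on the API server.
- Note: Kubernetes hosting does not replace Orleans cluster membership. A clustering provider (Azure Storage, ADO.NET, Consul, etc.) must still be configured separately.
- Required labels and environment variables:
  - Pod labels: orleans/serviceId, orleans/clusterId
  - Environment variables: POD_NAME, POD_NAMESPACE, POD_IP, ORLEANS_SERVICE_ID, ORLEANS_CLUSTER_ID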
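The following is a minimal C# sketch of a startup self-check that makes the list of required variables concrete. The variable names are the constants from KubernetesHostingOptions. The FindMissing helper is hypothetical and is not part of Orleans.

The check is stricter than Orleans itself. Per the source excerpts below, a missing POD_NAME falls back to Environment.MachineName, and a missing POD_NAMESPACE falls back to the service account namespace.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Names copied from the constants in KubernetesHostingOptions.
string[] requiredVariables =
{
    "POD_NAME", "POD_NAMESPACE", "POD_IP", "ORLEANS_SERVICE_ID", "ORLEANS_CLUSTER_ID",
};

// Hypothetical helper (not an Orleans API): returns the variables whose value is
// missing or blank, in declaration order. The lookup is a parameter so the check
// can be exercised outside a pod.
List<string> FindMissing(Func<string, string?> lookup) =>
    requiredVariables.Where(name => string.IsNullOrWhiteSpace(lookup(name))).ToList();

var missing = FindMissing(Environment.GetEnvironmentVariable);
Console.WriteLine(missing.Count == 0
    ? "All Kubernetes hosting variables are set."
    : "Missing: " + string.Join(", ", missing));
```

Passing a lookup function instead of reading the environment directly keeps the check testable with a fake dictionary.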
2. Key source-code locations and behavior
- Registering the hosting extension and its defaults (adds ConfigureKubernetesHostingOptions and KubernetesClusterAgent)
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;
using Orleans.Configuration;
using Orleans.Hosting.Kubernetes;
using Orleans.Runtime;
using System;
namespace Orleans.Hosting
{
/// <summary>
/// Extensions for hosting a silo in Kubernetes.
/// </summary>
public static class KubernetesHostingExtensions
{
/// <summary>
/// Adds Kubernetes hosting support.
/// </summary>
public static ISiloBuilder UseKubernetesHosting(this ISiloBuilder siloBuilder)
{
return siloBuilder.ConfigureServices(services => services.UseKubernetesHosting(configureOptions: null));
}
/// <summary>
/// Adds Kubernetes hosting support.
/// </summary>
public static ISiloBuilder UseKubernetesHosting(this ISiloBuilder siloBuilder, Action<OptionsBuilder<KubernetesHostingOptions>> configureOptions)
{
return siloBuilder.ConfigureServices(services => services.UseKubernetesHosting(configureOptions));
}
/// <summary>
/// Adds Kubernetes hosting support.
/// </summary>
public static IServiceCollection UseKubernetesHosting(this IServiceCollection services) => services.UseKubernetesHosting(configureOptions: null);
/// <summary>
/// Adds Kubernetes hosting support.
/// </summary>
public static IServiceCollection UseKubernetesHosting(this IServiceCollection services, Action<OptionsBuilder<KubernetesHostingOptions>> configureOptions)
{
configureOptions?.Invoke(services.AddOptions<KubernetesHostingOptions>());
// Configure defaults based on the current environment.
services.AddSingleton<IConfigureOptions<ClusterOptions>, ConfigureKubernetesHostingOptions>();
services.AddSingleton<IConfigureOptions<SiloOptions>, ConfigureKubernetesHostingOptions>();
services.AddSingleton<IPostConfigureOptions<EndpointOptions>, ConfigureKubernetesHostingOptions>();
services.AddSingleton<IConfigureOptions<KubernetesHostingOptions>, ConfigureKubernetesHostingOptions>();
services.AddSingleton<IValidateOptions<KubernetesHostingOptions>, KubernetesHostingOptionsValidator>();
services.AddSingleton<ILifecycleParticipant<ISiloLifecycle>, KubernetesClusterAgent>();
return services;
}
}
}
- Mapping environment variables to options and configuring endpoints (POD_* values feed SiloOptions/EndpointOptions; ORLEANS_* values feed ClusterOptions)
#nullable enable
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;
using Orleans.Configuration;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
namespace Orleans.Hosting.Kubernetes
{
internal class ConfigureKubernetesHostingOptions :
IConfigureOptions<ClusterOptions>,
IConfigureOptions<SiloOptions>,
IPostConfigureOptions<EndpointOptions>,
IConfigureOptions<KubernetesHostingOptions>
{
private readonly IServiceProvider _serviceProvider;
public ConfigureKubernetesHostingOptions(IServiceProvider serviceProvider)
{
_serviceProvider = serviceProvider;
}
public void Configure(KubernetesHostingOptions options)
{
options.Namespace ??= Environment.GetEnvironmentVariable(KubernetesHostingOptions.PodNamespaceEnvironmentVariable) ?? ReadNamespaceFromServiceAccount();
options.PodName ??= Environment.GetEnvironmentVariable(KubernetesHostingOptions.PodNameEnvironmentVariable) ?? Environment.MachineName;
options.PodIP ??= Environment.GetEnvironmentVariable(KubernetesHostingOptions.PodIPEnvironmentVariable);
}
public void Configure(ClusterOptions options)
{
var serviceIdEnvVar = Environment.GetEnvironmentVariable(KubernetesHostingOptions.ServiceIdEnvironmentVariable);
if (!string.IsNullOrWhiteSpace(serviceIdEnvVar))
{
options.ServiceId = serviceIdEnvVar;
}
var clusterIdEnvVar = Environment.GetEnvironmentVariable(KubernetesHostingOptions.ClusterIdEnvironmentVariable);
if (!string.IsNullOrWhiteSpace(clusterIdEnvVar))
{
options.ClusterId = clusterIdEnvVar;
}
}
public void Configure(SiloOptions options)
{
var hostingOptions = _serviceProvider.GetRequiredService<IOptions<KubernetesHostingOptions>>().Value;
if (!string.IsNullOrWhiteSpace(hostingOptions.PodName))
{
options.SiloName = hostingOptions.PodName;
}
}
public void PostConfigure(string? name, EndpointOptions options)
{
// Use PostConfigure to give the developer an opportunity to set SiloPort and GatewayPort using regular
// Configure methods without needing to worry about ordering with respect to the UseKubernetesHosting call.
if (options.AdvertisedIPAddress is null)
{
var hostingOptions = _serviceProvider.GetRequiredService<IOptions<KubernetesHostingOptions>>().Value;
IPAddress? podIp = null;
if (hostingOptions.PodIP is not null)
{
podIp = IPAddress.Parse(hostingOptions.PodIP);
}
else
{
var hostAddresses = Dns.GetHostAddresses(hostingOptions.PodName);
if (hostAddresses != null)
{
podIp = IPAddressSelector.PickIPAddress(hostAddresses);
}
}
if (podIp is not null)
{
options.AdvertisedIPAddress = podIp;
}
}
if (options.SiloListeningEndpoint is null)
{
options.SiloListeningEndpoint = new IPEndPoint(IPAddress.Any, options.SiloPort);
}
if (options.GatewayListeningEndpoint is null && options.GatewayPort > 0)
{
options.GatewayListeningEndpoint = new IPEndPoint(IPAddress.Any, options.GatewayPort);
}
}
private string? ReadNamespaceFromServiceAccount()
{
// Read the namespace from the pod's service account.
// ...
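The endpoint logic in PostConfigure boils down to a few lines of plain .NET. The sketch below simplifies it under these assumptions:
- It works on raw values instead of EndpointOptions.
- It assumes no endpoints were configured explicitly.
- When POD_IP is absent, it takes the first IPv4 address returned by DNS. The real code delegates to Orleans' IPAddressSelector.PickIPAddress instead.

ResolveEndpoints is a hypothetical helper.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper mirroring PostConfigure(EndpointOptions), assuming no
// endpoints were set explicitly. POD_IP wins; otherwise DNS is queried for the
// pod name and the first IPv4 result is used (an assumption: the real code uses
// IPAddressSelector.PickIPAddress). Dns.GetHostAddresses throws if the name
// cannot be resolved.
(IPAddress? Advertised, IPEndPoint Silo, IPEndPoint? Gateway) ResolveEndpoints(
    string? podIp, string podName, int siloPort, int gatewayPort)
{
    IPAddress? advertised = podIp is not null
        ? IPAddress.Parse(podIp)
        : Dns.GetHostAddresses(podName)
              .FirstOrDefault(a => a.AddressFamily == AddressFamily.InterNetwork);

    // Listen on every interface; only the advertised address is pod-specific.
    var silo = new IPEndPoint(IPAddress.Any, siloPort);

    // As in the original, a gateway listener is only created when GatewayPort > 0.
    IPEndPoint? gateway = gatewayPort > 0 ? new IPEndPoint(IPAddress.Any, gatewayPort) : null;

    return (advertised, silo, gateway);
}

var (adv, siloEp, gwEp) = ResolveEndpoints("10.244.1.17", "orleans-dictionary-app-0", 11111, 30000);
Console.WriteLine($"advertised={adv} silo={siloEp} gateway={gwEp}");
```

The split is the point of the design. Other silos connect to the advertised pod IP, while the socket itself listens on 0.0.0.0 so it works whatever interface the pod network assigns.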
- Constants: environment variable and label names (the YAML and the application must use the same names)
using k8s;
using System;
namespace Orleans.Hosting.Kubernetes
{
/// <summary>
/// Options for hosting in Kubernetes.
/// </summary>
public sealed class KubernetesHostingOptions
{
private readonly Lazy<KubernetesClientConfiguration> _clientConfiguration;
/// <summary>
/// The environment variable for specifying the Kubernetes namespace which all silos in this cluster belong to.
/// </summary>
public const string PodNamespaceEnvironmentVariable = "POD_NAMESPACE";
/// <summary>
/// The environment variable for specifying the name of the Kubernetes pod which this silo is executing in.
/// </summary>
public const string PodNameEnvironmentVariable = "POD_NAME";
/// <summary>
/// The environment variable for specifying the IP address of this pod.
/// </summary>
public const string PodIPEnvironmentVariable = "POD_IP";
/// <summary>
/// The environment variable for specifying <see cref="Orleans.Configuration.ClusterOptions.ClusterId"/>.
/// </summary>
public const string ClusterIdEnvironmentVariable = "ORLEANS_CLUSTER_ID";
/// <summary>
/// The environment variable for specifying <see cref="Orleans.Configuration.ClusterOptions.ServiceId"/>.
/// </summary>
public const string ServiceIdEnvironmentVariable = "ORLEANS_SERVICE_ID";
/// <summary>
/// The name of the <see cref="Orleans.Configuration.ClusterOptions.ServiceId"/> label on the pod.
/// </summary>
public const string ServiceIdLabel = "orleans/serviceId";
/// <summary>
/// The name of the <see cref="Orleans.Configuration.ClusterOptions.ClusterId"/> label on the pod.
/// </summary>
public const string ClusterIdLabel = "orleans/clusterId";
public KubernetesHostingOptions()
{
_clientConfiguration = new Lazy<KubernetesClientConfiguration>(() => this.GetClientConfiguration());
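// ...
- Agent (KubernetesClusterAgent): reconciliation at startup; watching, marking, and deleting at runtime
  - At startup: writes this silo's ServiceId/ClusterId back to its own pod labels and lists the pods with the same labels. It then compares them with Orleans membership and marks active silos without a matching pod as Dead.
  - At runtime: elects N active silos (2 by default) as watchers. They listen for pod deletion events and mark the corresponding silo Dead. Optionally, the pods of defunct silos are deleted (controlled by DeleteDefunctSiloPods).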
private async Task OnStart(CancellationToken cancellation)
{
var attempts = 0;
while (!cancellation.IsCancellationRequested)
{
try
{
await AddClusterOptionsToPodLabels(cancellation);
// Find the currently known cluster members first, before interrogating Kubernetes
await _clusterMembershipService.Refresh();
var snapshot = _clusterMembershipService.CurrentSnapshot.Members;
// Find the pods which correspond to this cluster
var pods = await _client.ListNamespacedPodAsync(
namespaceParameter: _podNamespace,
labelSelector: _podLabelSelector,
cancellationToken: cancellation);
var clusterPods = new HashSet<string> { _podName };
foreach (var pod in pods.Items)
{
clusterPods.Add(pod.Metadata.Name);
}
var known = new HashSet<string>();
var knownMap = new Dictionary<string, ClusterMember>();
known.Add(_podName);
foreach (var member in snapshot.Values)
{
if (member.Status == SiloStatus.Dead)
{
continue;
}
known.Add(member.Name);
knownMap[member.Name] = member;
}
var unknownPods = new List<string>(clusterPods.Except(known));
unknownPods.Sort();
foreach (var pod in unknownPods)
{
_logger.LogWarning("Pod {PodName} does not correspond to any known silos", pod);
// Delete the pod once it has been active long enough?
}
var unmatched = new List<string>(known.Except(clusterPods));
unmatched.Sort();
foreach (var pod in unmatched)
{
var siloAddress = knownMap[pod];
if (siloAddress.Status is not SiloStatus.Active)
{
continue;
}
_logger.LogWarning("Silo {SiloAddress} does not correspond to any known pod. Marking it as dead.", siloAddress);
await _clusterMembershipService.TryKill(siloAddress.SiloAddress);
}
break;
}
catch (HttpOperationException exception) when (exception.Response.StatusCode is System.Net.HttpStatusCode.Forbidden)
{
_logger.LogError(exception, $"Unable to monitor pods due to insufficient permissions. Ensure that this pod has an appropriate Kubernetes role binding. Here is an example role binding:\n{ExampleRoleBinding}");
}
catch (Exception exception)
{
_logger.LogError(exception, "Error while initializing Kubernetes cluster agent");
if (++attempts > _options.CurrentValue.MaxKubernetesApiRetryAttempts)
{
throw;
}
await Task.Delay(1000, cancellation);
}
}
// Start monitoring loop
ThreadPool.UnsafeQueueUserWorkItem(_ => _runTask = Task.WhenAll(Task.Run(MonitorOrleansClustering), Task.Run(MonitorKubernetesPods)), null);
}
private async Task MonitorOrleansClustering()
{
var previous = _clusterMembershipService.CurrentSnapshot;
while (!_shutdownToken.IsCancellationRequested)
{
try
{
await foreach (var update in _clusterMembershipService.MembershipUpdates.WithCancellation(_shutdownToken.Token))
{
// Determine which silos should be monitoring Kubernetes
var chosenSilos = _clusterMembershipService.CurrentSnapshot.Members.Values
.Where(s => s.Status == SiloStatus.Active)
.OrderBy(s => s.SiloAddress)
.Take(_options.CurrentValue.MaxAgents)
.ToList();
if (!_enableMonitoring && chosenSilos.Any(s => s.SiloAddress.Equals(_localSiloDetails.SiloAddress)))
{
_enableMonitoring = true;
_pauseMonitoringSemaphore.Release(1);
}
else if (_enableMonitoring)
{
_enableMonitoring = false;
}
if (_enableMonitoring && _options.CurrentValue.DeleteDefunctSiloPods)
{
var delta = update.CreateUpdate(previous);
foreach (var change in delta.Changes)
{
if (change.SiloAddress.Equals(_localSiloDetails.SiloAddress))
{
// Ignore all changes for this silo
continue;
}
if (change.Status == SiloStatus.Dead)
{
try
{
if (_logger.IsEnabled(LogLevel.Information))
{
_logger.LogInformation("Silo {SiloAddress} is dead, proceeding to delete the corresponding pod, {PodName}, in namespace {PodNamespace}", change.SiloAddress, change.Name, _podNamespace);
}
await _client.DeleteNamespacedPodAsync(change.Name, _podNamespace);
}
catch (Exception exception)
{
_logger.LogError(exception, "Error deleting pod {PodName} in namespace {PodNamespace} corresponding to defunct silo {SiloAddress}", change.Name, _podNamespace, change.SiloAddress);
}
}
}
}
previous = update;
}
}
catch (Exception exception) when (!(_shutdownToken.IsCancellationRequested && (exception is TaskCanceledException || exception is OperationCanceledException)))
{
if (_logger.IsEnabled(LogLevel.Debug))
// ...
// Excerpt from MonitorKubernetesPods: handling pod watch events
await foreach (var (eventType, pod) in pods.WatchAsync<V1PodList, V1Pod>(_shutdownToken.Token))
{
if (!_enableMonitoring || _shutdownToken.IsCancellationRequested)
{
break;
}
if (string.Equals(pod.Metadata.Name, _podName, StringComparison.Ordinal))
{
// Never declare ourselves dead this way.
continue;
}
if (eventType == WatchEventType.Modified)
{
// TODO: Remember silo addresses for pods that are restarting/terminating
}
if (eventType == WatchEventType.Deleted)
{
if (this.TryMatchSilo(pod, out var member) && member.Status != SiloStatus.Dead)
{
if (_logger.IsEnabled(LogLevel.Information))
{
_logger.LogInformation("Declaring server {Silo} dead since its corresponding pod, {Pod}, has been deleted", member.SiloAddress, pod.Metadata.Name);
}
await _clusterMembershipService.TryKill(member.SiloAddress);
}
}
}
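The startup reconciliation in OnStart comes down to two set differences. The following sketch reproduces that logic with plain strings. Two things are assumptions for illustration, not Orleans types: modelling silo status as a string and the Reconcile helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of OnStart's reconciliation. Member status is a plain string
// here ("Active", "Joining", "Dead", ...) standing in for Orleans' SiloStatus.
(List<string> UnknownPods, List<string> SilosToKill) Reconcile(
    string localPod,
    IEnumerable<string> labelledPods,
    IEnumerable<(string Name, string Status)> members)
{
    // Pods carrying the cluster labels, plus this pod itself.
    var clusterPods = new HashSet<string>(labelledPods) { localPod };

    // Silos Orleans still considers alive (anything not Dead), plus this pod.
    var live = members.Where(m => m.Status != "Dead").ToList();
    var known = new HashSet<string>(live.Select(m => m.Name)) { localPod };

    // Labelled pods with no live silo: the agent only logs a warning for these.
    var unknownPods = clusterPods.Except(known).OrderBy(p => p, StringComparer.Ordinal).ToList();

    // Live silos with no pod: the agent calls TryKill, but only for Active ones.
    var silosToKill = live
        .Where(m => !clusterPods.Contains(m.Name) && m.Status == "Active")
        .Select(m => m.Name)
        .OrderBy(n => n, StringComparer.Ordinal)
        .ToList();

    return (unknownPods, silosToKill);
}

var (unknown, kill) = Reconcile(
    "app-0",
    new[] { "app-0", "app-1", "app-3" },
    new[] { ("app-0", "Active"), ("app-1", "Active"), ("app-2", "Active"), ("app-4", "Joining"), ("app-5", "Dead") });
// prints "unknown pods: app-3; silos to kill: app-2"
Console.WriteLine($"unknown pods: {string.Join(", ", unknown)}; silos to kill: {string.Join(", ", kill)}");
```

In this example app-4 has no pod either, but it is left alone because it is not yet Active. This matches the `siloAddress.Status is not SiloStatus.Active` check in the excerpt.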
3. Minimal application example (C#)
using System;
using Microsoft.Extensions.Hosting;
using Orleans.Configuration;
using Orleans.Hosting;

var builder = Host.CreateDefaultBuilder(args)
    .UseOrleans(silo =>
    {
        // Enable Kubernetes hosting (the core step)
        silo.UseKubernetesHosting();
        // A clustering provider is still required (example: Azure Storage)
        silo.UseAzureStorageClustering(options =>
        {
            options.ConnectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        });
        // Ports (optional; the defaults are 11111 / 30000)
        silo.Configure<EndpointOptions>(opt =>
        {
            opt.SiloPort = 11111;
            opt.GatewayPort = 30000;
        });
    });
await builder.RunConsoleAsync();
4. Kubernetes YAML examples and explanation
- Deployment (labels, environment variables, ports, probes, graceful termination)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orleans-dictionary-app
  labels:
    app: orleans-dictionary-app
    orleans/serviceId: dictionary-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orleans-dictionary-app
  template:
    metadata:
      labels:
        app: orleans-dictionary-app
        orleans/serviceId: dictionary-app
        orleans/clusterId: dictionary-app
    spec:
      serviceAccountName: default
      automountServiceAccountToken: true
      containers:
        - name: silo
          image: my-registry.azurecr.io/my-orleans-app:latest
          imagePullPolicy: Always
          ports:
            - name: silo
              containerPort: 11111
            - name: gateway
              containerPort: 30000
          env:
            - name: ORLEANS_SERVICE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['orleans/serviceId']
            - name: ORLEANS_CLUSTER_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['orleans/clusterId']
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: STORAGE_CONNECTION_STRING
              valueFrom:
                secretKeyRef:
                  name: az-storage-acct
                  key: key
            - name: DOTNET_SHUTDOWNTIMEOUTSECONDS
              value: "120"
          # Probe guidance: lightweight local checks (complementary to Orleans membership probing)
          livenessProbe:
            tcpSocket:
              port: silo
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            tcpSocket:
              port: silo
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 6
          resources:
            requests:
              cpu: "200m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
      terminationGracePeriodSeconds: 180
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 60
- RBAC (lets the agent get/list/watch/delete/patch Pods)
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: orleans-hosting
rules:
  - apiGroups: [ "" ]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "delete", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: orleans-hosting-binding
subjects:
  - kind: ServiceAccount
    name: default
    apiGroup: ''
roleRef:
  kind: Role
  name: orleans-hosting
  apiGroup: rbac.authorization.k8s.io
- Services (the silo port is reachable inside the cluster; the gateway port is exposed to clients)
apiVersion: v1
kind: Service
metadata:
  name: orleans-silo
spec:
  selector:
    app: orleans-dictionary-app
  ports:
    - name: silo
      port: 11111
      targetPort: 11111
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: orleans-gateway
spec:
  type: LoadBalancer
  selector:
    app: orleans-dictionary-app
  ports:
    - name: gateway
      port: 30000
      targetPort: 30000
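Key points:
- The orleans/serviceId and orleans/clusterId labels must match the application. Their values reach ClusterOptions through the ORLEANS_* environment variables.
- The POD_NAME/POD_NAMESPACE/POD_IP environment variables are used to set SiloName, AdvertisedIPAddress, and related options.
- Probes should be local TCP checks rather than cross-pod functional checks. They complement Orleans' own membership failure detection.
- RBAC permissions are required. Without them the agent gets HTTP 403 when it calls the Kubernetes API at startup or at runtime.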
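5. Sequence overview
- Startup: reconcile labels and membership, and mark mismatched silos Dead
- Runtime: elect watchers to monitor Kubernetes; a pod deletion marks its silo Dead; optionally delete the pods of defunct silos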
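The watcher election in MonitorOrleansClustering is a deterministic query. Each silo runs the same query over its own membership snapshot, so silos whose snapshots match reach the same answer without extra coordination.

The sketch below uses string addresses. Two things are assumptions: ordinal string ordering stands in for SiloAddress ordering, which is only illustrative (it sorts "10.0.0.10" before "10.0.0.9"), and ChooseWatchers is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical version of the chosenSilos query in MonitorOrleansClustering:
// keep Active silos, order them by address, take the first maxAgents.
List<string> ChooseWatchers(IEnumerable<(string Address, bool Active)> members, int maxAgents) =>
    members.Where(m => m.Active)
           .Select(m => m.Address)
           .OrderBy(a => a, StringComparer.Ordinal)
           .Take(maxAgents)
           .ToList();

var snapshot = new[]
{
    ("10.0.0.7:11111", true),
    ("10.0.0.3:11111", true),
    ("10.0.0.5:11111", false), // not Active, e.g. still joining
    ("10.0.0.9:11111", true),
};
var watchers = ChooseWatchers(snapshot, maxAgents: 2);

// A silo only watches Kubernetes if its own address was chosen.
var localIsWatcher = watchers.Contains("10.0.0.9:11111");
// prints "watchers: 10.0.0.3:11111, 10.0.0.7:11111; local silo watches: False"
Console.WriteLine($"watchers: {string.Join(", ", watchers)}; local silo watches: {localIsWatcher}");
```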
6. Common problems and troubleshooting
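- Error: KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
  - Exec into the pod and check that both variables exist: kubectl exec -it <pod> -- printenv | findstr KUBERNETES_SERVICE_ (findstr runs on a Windows client; use grep on Linux/macOS)
  - Make sure automountServiceAccountToken: true is set and the pod uses a ServiceAccount bound to the required permissions (see RBAC above)
  - Reference: learn.microsoft.com - Orleans Kubernetes hosting
- The silo name must equal the pod name. UseKubernetesHosting sets it from POD_NAME, and the agent matches silos to pods by that name, so do not override SiloOptions.SiloName. Ports default to 11111/30000; if you use different ports, configure EndpointOptions in the application.
- Without a clustering provider the silo cannot join the cluster. Configure one provider (Azure/ADO.NET/Consul/…) alongside UseKubernetesHosting().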
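For the KUBERNETES_SERVICE_HOST error above, an in-process diagnostic can complement kubectl. This sketch checks:
- the two environment variables named in the error;
- the files under the standard service account mount path /var/run/secrets/kubernetes.io/serviceaccount.

Which of these the Kubernetes client actually requires is an assumption here, and DiagnoseInCluster is a hypothetical helper.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Standard in-cluster mount path for the pod's service account credentials.
const string ServiceAccountDir = "/var/run/secrets/kubernetes.io/serviceaccount";

// Hypothetical diagnostic (not an Orleans or Kubernetes client API). Lookups are
// parameters so the function can be exercised outside a cluster.
List<string> DiagnoseInCluster(Func<string, string?> getEnv, Func<string, bool> fileExists)
{
    var problems = new List<string>();
    foreach (var name in new[] { "KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT" })
    {
        if (string.IsNullOrEmpty(getEnv(name)))
            problems.Add($"{name} is not set");
    }
    foreach (var file in new[] { "token", "ca.crt", "namespace" })
    {
        var path = $"{ServiceAccountDir}/{file}";
        if (!fileExists(path))
            problems.Add($"{path} is missing (check automountServiceAccountToken)");
    }
    return problems;
}

foreach (var problem in DiagnoseInCluster(Environment.GetEnvironmentVariable, File.Exists))
    Console.WriteLine(problem);
```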
7. Minimal rollout steps
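- Enable UseKubernetesHosting() in the application and configure a clustering provider.
- Build the image and push it to a container registry.
- Create the cluster Secret (e.g., az-storage-acct) that holds the clustering connection string.
- Apply the Deployment, RBAC, and Service manifests from this article.
- Verify that:
  - the pods carry all the labels and environment variables;
  - the logs show AdvertisedIPAddress equal to POD_IP;
  - with multiple replicas the silos discover each other, and deleting a pod marks its silo Dead;
  - the probes pass and rolling upgrades do not interrupt service.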