The “Cost to Change” Software Systems

In all humility, I pose the following question not just to the so-called technologists, but to management as well:

“How easy is it to make changes to a system we operate?”

Cost to change

I haven’t looked for formal measures of the “cost to change a software system”. Nevertheless, there are many parts to it, including psychological factors, effort, time, the probability of success, and so on. I will defer a more detailed discussion of these parts to a later post.

The “cost to change” is low when, for example, we do not hesitate to make changes, we can predict the outcome of a change with confidence, changes can be made or reversed quickly, and we are not jumping through hoops just to be allowed to make changes. Conversely, the “cost to change” is high when we dread making a change, we can never be sure what the change will result in, changes can only take place at intervals of many days or months, and we must worship at many an altar to appease the overlords of change.

We can consider some general aspects of how changes are made, to help us form an opinion of the “cost to change”:

  1. Traceable: Can the origins of a change be established directly?
  2. Discriminating between changes: Changes can be considered to impact a system in three different ways, namely:
     • “Functional changes”: These impact functionality for users.
     • “Configuration changes”: These are changes to configuration that may or may not modify functionality.
     • “Non-functional changes”: These are changes that no user cares about. For example, changes to versions of software libraries or runtimes.
  3. Uptime: Is it possible to make changes without disrupting users?
  4. Rollback/Repeat: The ease with which one can reverse the effect of any change that has been made, or repeat it, in an “atomic” (all-or-nothing) fashion.
  5. Recreate: The ease with which one can reproduce the operating environment for the system, in the event that it is damaged or lost for any reason.
  6. Plasticity: The ease with which the team that owns the system can make a change.
  7. Bureaucracy: Is there needless ceremony around introducing any change?

I now submit the most important proposal of this post:

“Our primary responsibility to our customers (when operating a technology business) is to ensure that the cost to change for all of the systems we operate, individually and in concert, is close to ZERO at all times.”

I have deliberately introduced the words “in concert” for good reason. We deliver value to customers by bringing together multiple collaborating systems, and some of these are not as easy to change as the rest. This also extends across our business relationships, including partners as well as suppliers. In other words:

“Our ability to make changes to serve our customers better is ultimately limited by the one system that is the most difficult to change (regardless of whether it is operated by us, or our business partners).”

I will now comment briefly on each of these aspects.

Aspects of “cost to change”

Traceability

All of the different elements that come together to define the behaviour of the system must be tracked in version control, and changed through a workflow (see trunk-based development) that allows concurrent progress on different streams of change, as well as isolation between them.

Discriminating between changes

We must engineer systems so that functional changes can be introduced in a way that lets users adopt them gradually, with a minimum of surprise or fuss. It should also be possible to introduce configuration changes without requiring users and systems to repeat the actions they perform when systems are launched for the first time, and in ways that do not require guessing whether the configuration changes have taken effect uniformly.
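One common way to let users adopt a functional change gradually is a percentage-based feature flag. The following is only a minimal sketch under stated assumptions: the function names, the `new-checkout` feature name, and the idea of bucketing users by a hash are illustrative choices, not something prescribed in this post.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature.

    Hashing (rather than random choice) keeps the answer the same on every
    call, so a user does not flip between old and new behaviour.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if the (hypothetical) feature is switched on for this user.

    Raising rollout_percent only ever adds users: anyone enabled at 10%
    is still enabled at 50%.
    """
    return rollout_bucket(user_id, feature) < rollout_percent


# Usage: expose the hypothetical "new-checkout" feature to roughly 10% of users.
if is_enabled("alice", "new-checkout", rollout_percent=10):
    pass  # serve the new behaviour
else:
    pass  # serve the existing behaviour
```

Because the bucket depends only on the user and the feature, widening the rollout is itself a small configuration change, and setting the percentage back to 0 reverses it.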

The category of “non-functional” changes deserves more attention than it gets. Non-functional changes far outnumber functional and configuration changes, yet almost everybody (including technologists) tends to underestimate this category. In fact, while one might somehow artificially limit the number of functional and configuration changes, non-functional changes are frequent and inevitable, owing to the obsolescence of software, the uncovering of issues and vulnerabilities, and component software libraries and services reaching end-of-life.

Uptime

We must deliberately engineer systems so that changes can be introduced with zero disruption to users. At the other end of the spectrum are systems that are impossible to change without first asking users to stop using the system for some period of time!

Rollback/Repeat

When it is difficult to roll back a particular change, we tend to put off changes! We dread making the change, and we may get through to the other side of it either as tragic heroes or as unfortunate martyrs, suffering burnout just from the act of making a change. (In particular, I recall a torrid ‘Memorial Day’ weekend release, with many of us cooped up high above Times Square at investment bank ‘X’, awaiting our turns, each lasting just a few minutes. It was about as much fun as being in a hurricane at sea.)

The mechanism for making changes must be repeatable, and therefore we should automate it. One of the subtle qualities we should aim for is that the mechanism that applies changes should do so in a fashion that is “idempotent”, i.e., it does not matter whether an operation is attempted once or many times.
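To illustrate idempotence, here is a minimal sketch that converges a configuration towards a desired state instead of blindly performing actions. The `converge` function and the setting names are hypothetical and stand in for whatever mechanism a team actually uses.

```python
from typing import Dict, List


def converge(current: Dict[str, str], desired: Dict[str, str]) -> List[str]:
    """Bring `current` in line with `desired` and report which keys changed.

    The operation is idempotent: running it again with the same `desired`
    state finds nothing left to do and changes nothing.
    """
    changed = []
    for key, value in desired.items():
        if current.get(key) != value:
            current[key] = value
            changed.append(key)
    return changed


# Usage with made-up settings: the second run is a no-op.
live = {"workers": "2"}
target = {"workers": "4", "tls": "on"}
first_run = converge(live, target)   # applies both settings
second_run = converge(live, target)  # nothing left to change
```

Describing the target state, rather than the steps to reach it, is what makes a retry after a partial failure safe.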

Recreate

When we have little confidence that we can recreate a system, it suggests that we have lost sight of some elements of how the system was put together. In fact, very few teams are confident that they can recreate the production environment for their users within a reasonable Recovery Time Objective. Many BCP/DR (business continuity planning/disaster recovery) plans are unlikely to be tested in their entirety until disaster truly strikes, at which point it might be too late.

The ease with which a system can be recreated is, in fact, a consequence over time of the related aspect of “Rollback/Repeat”. A system that has been built up uniformly, in a disciplined way, through a sequence of changes that can each be applied or rolled back is also likely to be one that can be easily recreated.
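The link between “Rollback/Repeat” and “Recreate” can be sketched as follows, assuming every change is recorded as a pair of apply and rollback steps. The `Change` type, `add_key` helper, and the dictionary standing in for an environment are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]


@dataclass(frozen=True)
class Change:
    """One recorded change: how to apply it and how to undo it."""
    name: str
    apply: Callable[[State], None]
    rollback: Callable[[State], None]


def add_key(key: str, value: str) -> Change:
    """A change that adds a key; its rollback removes the key again."""
    return Change(
        name=f"add {key}",
        apply=lambda state: state.__setitem__(key, value),
        rollback=lambda state: state.pop(key, None),
    )


def recreate(history: List[Change]) -> State:
    """Rebuild the environment from nothing by replaying every change in order."""
    state: State = {}
    for change in history:
        change.apply(state)
    return state


def apply_atomically(state: State, batch: List[Change]) -> None:
    """Apply a batch all-or-nothing: on failure, undo the completed steps.

    Assumes each individual apply step either fully succeeds or leaves the
    state untouched before raising.
    """
    done: List[Change] = []
    try:
        for change in batch:
            change.apply(state)
            done.append(change)
    except Exception:
        for change in reversed(done):
            change.rollback(state)
        raise


# Usage: the same history always rebuilds the same environment.
history = [add_key("db", "v1"), add_key("cache", "v1")]
environment = recreate(history)
```

If the recorded history is the only way changes ever reach the system, recreating it is just a replay, which is the sense in which disciplined rollback and repeat make recreation cheap.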

Plasticity

Everyone who is involved with a system soon forms an opinion of how easy or difficult it is to introduce any kind of change. Good practices and processes (such as Test Driven Development and Continuous Integration/Continuous Deployment) improve the ease of change; bad practices make change harder.

However, the more insidious effect is that the human perception of how easy or difficult it is to make changes acts powerfully in both directions: we hesitate to make changes when we perceive that it is difficult to do so (and vice versa!). This creates a vicious circle: when a system is already difficult to change, we tend not to undertake the non-functional improvements that would make it easier to change!

Here’s an anecdote: my team decided (at a hackathon) to demonstrate the use of Continuous Integration on a collaborating system that was already difficult to change. Many months after the hackathon, the team that owns the system is yet to put this into practice.

Bureaucracy

As with all bureaucracies, needless human controls around change perpetuate their power by saying “No” to change more often than “Yes”, and may have been incentivised to err on the side of caution. It is preferable instead that the mechanisms for change are engineered to automatically provide fast and actionable feedback that teams can learn from, and thereby regulate themselves.

Some proposals about the “cost to change”

Here’s the next proposal that I submit to you:

“Traditionally, we look at the functional and non-functional characteristics (with respect to the corresponding requirements) for any software system. However, we must grant the same eminence to a third element: the characteristics concerning how easy it is to CHANGE the system.”

In the absence of this third “dynamic” element, the other two elements are merely snapshots at a given point in time, and communicate a limited picture of any system. We wish to understand and communicate both aspects:

  1. How well does it do what it is expected to do now?
  2. How easy is it to change it, so it can continue to do what it will be expected to do, now and in the future?

I believe this should be considered an independent element (and not subsumed, say, under non-functional characteristics), as both functional and non-functional characteristics can be affected only by making changes!

In my experience, even many technologists mistakenly believe that if they keep the first element unchanged (i.e., “we do not change what the system is expected to do”), they can pretend it is not necessary to change the system.

I am inspired by the second law of thermodynamics (“The entropy of an isolated system never decreases over time”) to propose Krishnan’s second law of software inertia™:

“When neglected, the ease with which changes can be made to a system worsens over time.”

I submit that only a limited aspect of this law has been recognised under the umbrella of “Technical Debt”. I propose instead that we constantly measure our ability to make changes to the systems we operate. We should also act to improve the “cost to change”, in much the same way that we seek to improve the functional and non-functional characteristics of the system.

I propose more questions we should consider as corollaries of the above:

  1. How easy is it to change each of the systems we operate, both independently and taken together?
  2. Have we recorded and tracked this “cost to change” over time (so as to arrest a slide)?
  3. When operating with business partners and suppliers (or indeed, when acquiring new systems), are the collaborating systems easy to change? Have we tracked this over time?
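These questions can be made measurable in many ways. As one hedged sketch, assume we record a single proxy for the “cost to change”, such as the lead time in hours of each change, per system. The metric, the system names, the window of three changes, and the 20% tolerance are all assumptions chosen for illustration.

```python
from statistics import median
from typing import Dict, List


def is_sliding(lead_times_hours: List[float], window: int = 3,
               tolerance: float = 1.2) -> bool:
    """Flag a slide: the latest `window` changes took noticeably longer
    (by more than `tolerance` times) than the `window` changes before them."""
    if len(lead_times_hours) < 2 * window:
        return False  # not enough history to compare
    earlier = median(lead_times_hours[-2 * window:-window])
    recent = median(lead_times_hours[-window:])
    return recent > earlier * tolerance


def hardest_to_change(history: Dict[str, List[float]], window: int = 3) -> str:
    """Name the system with the highest recent lead time, i.e. the one that
    limits how fast the systems can change in concert.

    Assumes every system has at least one recorded change.
    """
    return max(history, key=lambda name: median(history[name][-window:]))


# Usage with made-up lead times (hours per change, oldest first).
recorded = {
    "billing": [4, 5, 4, 9, 12, 10],
    "catalog": [2, 2, 3, 2, 2, 3],
}
sliding = [name for name, times in recorded.items() if is_sliding(times)]
bottleneck = hardest_to_change(recorded)
```

The same record can be kept for systems run by partners and suppliers, which is what makes the third question answerable over time.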

This brings me to the conclusion of this post. Some of the systems we operate may have aged, and are probably difficult to change. We may have concluded that it is more expensive (in the short term) to replace these systems than to keep them running as-is. Once these assessments turn unfavourable, they are unlikely to magically turn favourable at a later date (thanks to Krishnan’s second law). But the true cost to change, at a more holistic level and in the longer term, is that this will ultimately limit the speed at which we can adapt to meet the changing needs of our customers, and this costs us a lot more over time!

Epilogue

As I add finishing touches to this post, I watch with bated breath as the situation unfolds at Garmin, with a protracted and widespread outage that began on 23rd July. If rumours are to be believed, Garmin suffered a ransomware attack. In such an event, the only safe strategy is to painstakingly re-provision everything from scratch and restore clean backups, taking care to ensure that these are not similarly compromised. This is a very difficult situation to be in, and it is one in which Garmin will be dealing with the “cost to change” for pretty much everything they operate!

Translated from: https://medium.com/swlh/the-cost-to-change-software-systems-4cffe05287c9
