chatops
Access control is a key component of data security. In simple terms, access control means regulating who has the ability to access resources in a computing environment. At Policygenius, we implemented an access control policy around our Google Cloud resources following the principle of least privilege.
The principle of least privilege promotes minimal user privileges on computing resources, based on users’ job necessities. Ideally, each user should have the least authority necessary to perform their duties. This helps reduce the “attack surface” of the computing resources by eliminating unnecessary privileges that can result in network exploits and system compromises.
Weighing our options
The main requirement was an approval workflow in which an engineer could access Google Cloud Platform (GCP) resources only after management approval and only for a limited amount of time. Additionally, we wanted to log all of the related activity and store the logs for auditability.
One potential solution was an emergency access account, also known as a break-glass account. We looked into HashiCorp Vault’s open-source solution to safeguard the shared account’s password. However, this solution does not offer an approval workflow, one of our key requirements. Also, a shared account would make it challenging to trace actions back to an actual user.
Another suitable option was Gimme, an open-source access control solution developed by Spotify. Google had recently released a feature called IAM Conditions (beta at the time), which makes it possible to provision access that automatically expires after a specific amount of time. Gimme uses this IAM feature to limit access to Google Cloud resources for a specific duration. It has an approval workflow, but approving a request is much more involved for the manager than just clicking an approve/deny button. The code needed significant customization and modification to accommodate all our requirements. Also, the GitHub repository was later archived.
We looked into several other full-blown access control solutions. Most of them were costly and did not integrate easily with GCP’s access management. Although some of these tools were really great (we almost ended up using Gimme), we decided to write our own tool, primarily because it could be customized to meet our exact requirements.
Building on Geniusbot, a ChatOps project built during our hackathon, we decided to approach the problem from a different perspective. Since managers wanted the ability to approve requests quickly, even after hours, we decided the most user-friendly solution would be a Slack bot. ChatOps is conversation-driven development: developers type a command into a chat room, and a chatbot is configured to execute these commands through custom scripts.
Our ChatOps tool consists of three main components:
- Slack user interface for users and approvers.
- Google Cloud Functions written in Python for the backend.
- GCP’s IAM Conditions to handle access provisioning.
Similar to the Gimme tool, we use IAM Conditions to automatically expire provisioned access after a set amount of time. This ensures that an engineer has only the privileges required to perform their duties at a given point in time. This is just one of the features that we are using. IAM Conditions has other capabilities, such as access schedules and resource-based access provisioning.
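To make the expiry mechanism concrete, here is a minimal sketch of a time-bound IAM binding. The `condition` block with a `request.time < timestamp(...)` expression follows the documented IAM Conditions format; the helper name `expiring_binding`, the role, and the member address are illustrative assumptions, not the tool's actual code.

```python
from datetime import datetime, timedelta, timezone


def expiring_binding(role, member, minutes=60, now=None):
    """Build an IAM policy binding that stops matching `minutes` after `now`.

    `expiring_binding` is a hypothetical helper; only the shape of the
    returned dict follows the IAM policy format.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expiry = (now + timedelta(minutes=minutes)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": f"expires-after-{minutes}m",
            "description": "Temporary access granted through the approval workflow",
            # IAM evaluates this expression on every request; once the
            # timestamp has passed, the binding no longer grants the role.
            "expression": f'request.time < timestamp("{expiry}")',
        },
    }


binding = expiring_binding(
    "roles/viewer",                 # example role, an assumption
    "user:engineer@example.com",    # example member, an assumption
    minutes=60,
    now=datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(binding["condition"]["expression"])
# request.time < timestamp("2020-01-01T13:00:00Z")
```

Because the expiry lives in the policy itself, nothing has to run later to revoke the access.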
Workflow
A user initiates a request in the Slack channel by typing a simple command. A message with the request details is posted to the channel, with the on-call manager tagged in it. The manager can either approve or deny the request.
If denied, the workflow stops and the user is notified in the Slack channel. If approved, a browser window opens prompting the approver to authenticate using their Google credentials. On successful authentication, the process is complete and the user is notified in the Slack channel.

By default, access is valid for 60 minutes, but a user can request more or less time by passing a parameter in their Slack command.
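The duration parameter could be handled roughly as follows. This is a sketch under assumptions: the article does not show the command syntax, so the example treats the text after the slash command as an optional number of minutes, and the upper bound `MAX_MINUTES` is a hypothetical safeguard rather than something the original tool is known to enforce.

```python
DEFAULT_MINUTES = 60   # default from the article
MAX_MINUTES = 480      # hypothetical cap, not from the article


def parse_duration(command_text, default=DEFAULT_MINUTES, maximum=MAX_MINUTES):
    """Return the requested access duration in minutes.

    `command_text` stands for the free text a user types after the slash
    command (Slack delivers it in the `text` field of the request).
    """
    text = command_text.strip()
    if not text:
        return default
    minutes = int(text.split()[0])  # raises ValueError for non-numeric input
    if not 1 <= minutes <= maximum:
        raise ValueError(f"duration must be between 1 and {maximum} minutes")
    return minutes


print(parse_duration(""))      # 60
print(parse_duration("30"))    # 30
print(parse_duration("120"))   # 120
```

Rejecting out-of-range values before a request reaches the approver keeps the approval message unambiguous.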
Architecture
To better understand the architecture, we will split it into smaller parts.
Slack has a feature called “slash commands”. A slash command triggers an HTTP endpoint, which in our application is the Request Handler cloud function. This cloud function handles all of the incoming requests. It generates a Slack response message that includes details about the request as well as “Approve” and “Deny” buttons. The function also queries PagerDuty to identify the on-call manager and tags that manager in the request message.
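A response message of this kind might look like the sketch below, which uses Slack's Block Kit layout (a `section` block for the details and an `actions` block with two `button` elements). The function name, the `action_id` values, the query parameter `req`, and the approval URL are assumptions for illustration. The on-call manager's Slack ID is passed in, since the PagerDuty lookup is outside the scope of this example, and `token` stands for the already-encrypted request parameters.

```python
from urllib.parse import urlencode


def build_approval_message(requester_id, manager_id, role, minutes,
                           approve_base_url, token):
    """Build a slash-command response with Approve and Deny buttons."""
    approve_url = f"{approve_base_url}?{urlencode({'req': token})}"
    summary = (
        f"<@{requester_id}> requests `{role}` for {minutes} minutes. "
        f"On-call approver: <@{manager_id}>"
    )
    return {
        "response_type": "in_channel",  # visible to everyone in the channel
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Approve"},
                        "style": "primary",
                        "action_id": "approve_request",  # hypothetical id
                        "url": approve_url,  # opens the authentication page
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Deny"},
                        "style": "danger",
                        "action_id": "deny_request",  # hypothetical id
                        "value": token,
                    },
                ],
            },
        ],
    }


message = build_approval_message(
    "U111", "U222", "roles/viewer", 60,
    "https://example.com/approve", "abc123",
)
print(message["blocks"][1]["elements"][0]["url"])
# https://example.com/approve?req=abc123
```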

The approving manager then reviews the request and either approves or denies it. This triggers the Acknowledger function, which acknowledges the action via a Slack message. The workflow stops if the request is denied. If approved, an authentication page opens in a web browser. If authentication succeeds, the tool checks whether the approver is authorized to approve the request based on their IAM privileges. This step covers both authentication and authorization of the approver. Google Key Management Service is used to decrypt URL parameters embedded in the Slack buttons.
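The order of these checks can be summarized in a small decision function. This is only a sketch of the control flow described above: `authenticate` and `is_authorized` are hypothetical callables standing in for the Google sign-in step and the IAM privilege check, and the returned strings are illustrative labels.

```python
def handle_decision(action, authenticate, is_authorized):
    """Return the next step for an approver's button click.

    `authenticate()` is assumed to return the approver's identity, or None
    when sign-in fails. `is_authorized(approver)` is assumed to return True
    when the approver's IAM privileges allow approving the request.
    """
    if action == "deny":
        return "denied"                  # workflow stops here
    if action != "approve":
        raise ValueError(f"unknown action: {action!r}")
    approver = authenticate()
    if approver is None:
        return "authentication_failed"
    if not is_authorized(approver):
        return "not_authorized"
    return "provision"                   # hand over to the provisioning step


print(handle_decision("approve", lambda: "manager@example.com", lambda a: True))
# provision
```

Keeping authentication before authorization means an unauthenticated click never reaches the privilege check.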
Once both authentication and authorization pass, the Provisioner function creates the IAM condition based on the request. The approver’s authentication token is used to push the change to the IAM policy of the Google Cloud project.
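Pushing such a change is typically a read-modify-write on the project's IAM policy. The sketch below covers only the "modify" step on a policy represented as a plain dict; fetching and writing the policy through the Resource Manager API is left out. Policies that contain conditional bindings must use policy version 3, and the `etag` is carried over so that a concurrent change can be detected on write. The function name and sample values are assumptions.

```python
import copy


def add_conditional_binding(policy, binding):
    """Return a copy of `policy` with `binding` appended.

    The input is left untouched so a failed write can be retried from a
    freshly fetched policy.
    """
    updated = copy.deepcopy(policy)
    updated["version"] = 3  # required for policies with conditions
    updated.setdefault("bindings", []).append(binding)
    return updated


current = {
    "version": 1,
    "etag": "BwWWja0YfJA=",  # sample value
    "bindings": [{"role": "roles/owner", "members": ["user:admin@example.com"]}],
}
temporary = {
    "role": "roles/viewer",
    "members": ["user:engineer@example.com"],
    "condition": {
        "title": "expires-after-60m",
        "expression": 'request.time < timestamp("2020-01-01T13:00:00Z")',
    },
}
new_policy = add_conditional_binding(current, temporary)
print(new_policy["version"], len(new_policy["bindings"]))
# 3 2
```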

For auditability, we log details about processed requests, requesters, and approvers. Logs also provide plenty of useful information when debugging an application. Cloud Functions logging is an out-of-the-box feature in GCP, and our Python cloud functions log all of their activity to this logging system. We also have a log sink set up that continuously copies all of these logs to a storage bucket for long-term storage.
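One way to produce audit-friendly entries is to print a single JSON object per event, since Cloud Functions can treat JSON lines written to standard output as structured log entries with `severity` and `message` fields. The sketch below assumes that behavior; the field names other than `severity` and `message`, and the helper name, are illustrative and not taken from the original tool.

```python
import json


def audit_entry(event, requester, approver, role, minutes):
    """Serialize one access-request event as a single JSON log line."""
    return json.dumps(
        {
            "severity": "INFO",
            "message": f"access request {event}",
            "requester": requester,
            "approver": approver,
            "role": role,
            "duration_minutes": minutes,
        },
        sort_keys=True,
    )


line = audit_entry(
    "approved", "engineer@example.com", "manager@example.com",
    "roles/viewer", 60,
)
print(line)
```

Keeping requester, approver, role, and duration as separate fields makes it easier to query who had access at a given time once the logs reach long-term storage.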

Below we have a complete architecture diagram after combining all the parts.

Conclusion
After implementing this tool, we are now able to enforce the principle of least privilege. When required, access can be provisioned for a limited duration after going through a management approval workflow. Importantly, we can now audit who had access to production resources at any given point in time.
What’s next?
After launching this ChatOps command, we expanded the tool with additional, distinct Slack commands. Each command identifies the IAM role a user is requesting. We are working on expanding this tool to other resources and IAM roles within our Google Cloud setup.
We’re growing!
At Policygenius Engineering, we work on solving challenging problems with innovative approaches. If you are interested in the type of work we do and wish to experience our highly collaborative culture, check out our careers page!
Translated from: https://medium.com/policygenius-stories/chatops-for-production-access-control-b4feafbe9449