The Session Mechanism in a Server Cluster

Server Cluster Principles and Challenges
This article reviews the concept of server clusters and how they came about, explains why clustering becomes necessary, and examines the session-management problem that clustering introduces. It also outlines the characteristics of high-availability and load-balancing clusters.

1. A brief history of clusters
Clustering (English: "cluster") is an old topic. Searching yahoo.cn, baidu.com, and g.cn turned up nothing about when the concept actually originated; eventually I found an English article at http://www.domaingurus.com/faqs/what-is-a-server-cluster.html, which is reproduced below.
2. Why cluster servers at all?
In most cases an application does not need a cluster. But as traffic grows, a single server can no longer keep up or respond quickly, and under that load it may simply collapse. At that point we need to add servers and distribute requests among them, while making sure that, from the client's point of view, it still looks like a single server rather than several. The following excerpt from an interview with Yuan Honggang (袁红岗) makes the point fairly persuasively:

    "Clustering is useful in two situations: 1. Hosts running under heavy concurrent load. A site like Google receives enormous traffic, so Google uses clustering to spread client requests and improve overall responsiveness. Most of the J2EE applications we deal with carry fairly light loads; an application handling fewer than 500 requests per second generally does not need a cluster at all. 2. Failover, which in my view is where clustering is genuinely useful: a low-cost machine serves as a backup for the primary and takes over promptly when the primary fails, ensuring uninterrupted 7×24 service. In short, analyze the specific application environment carefully before adopting a cluster, to avoid unnecessary waste."

3. The hard part of server clustering
Supporting sessions is a basic requirement for any web application. When a user first visits, the server creates a unique session for that user. If the user's next request is dispatched to a different server in the cluster, that server knows nothing about the existing session and has to create a new one, so the session data accumulated so far is lost. A central difficulty of clustering is therefore ensuring that every server in the cluster works with the same session for a given user.
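To make the problem concrete, here is a minimal sketch using the standard Servlet API (the servlet class and the "visits" attribute are purely illustrative). The HttpSession object lives in the memory of whichever node handled the request, which is exactly why a follow-up request routed to a different node starts from scratch.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class VisitCounterServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // getSession(true) creates the session in THIS node's memory and
        // sends a JSESSIONID cookie back to the browser.
        HttpSession session = req.getSession(true);

        Integer visits = (Integer) session.getAttribute("visits");
        visits = (visits == null) ? 1 : visits + 1;
        session.setAttribute("visits", visits);

        // If the next request lands on another node, that node has no entry
        // for this JSESSIONID, creates a brand-new session, and "visits"
        // silently starts over from 1.
        resp.getWriter().println("visits = " + visits);
    }
}
```

Three approaches are commonly used to keep the session consistent across the cluster: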

  1. Replicate session data among all the servers in the cluster
  2. Keep session data on a single server (or dedicated store) and have the cluster nodes read it from there
  3. Keep session data in a client-side cookie, avoiding the cost of session replication

As one can imagine, options 1 and 2 both carry significant overhead, which makes option 3 an attractive choice; its major limitation, however, is that it cannot preserve much state across requests and is only suitable for a small amount of basic, frequently used information. Core J2EE Patterns puts it this way: this strategy truly solves the problem of replicating state across multiple servers when load balancing spans physical machines; but storing a large amount of state degrades performance, because all of the session state travels over the network with every request and response, and there are also size and type constraints on what can be kept on the client.
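As a rough sketch of option 3, the code below signs a small piece of state with an HMAC before placing it in a cookie, so any node in the cluster can verify the value without keeping server-side state. The cookie name "APP_STATE" and the SECRET constant are assumptions made for illustration; a real implementation would also handle encryption of sensitive fields, expiry, cookie size limits, and key rotation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.servlet.http.Cookie;

public class CookieStateCodec {
    // Assumed shared secret; every node in the cluster must use the same key.
    private static final byte[] SECRET =
            "change-me-to-a-real-key".getBytes(StandardCharsets.UTF_8);

    /** Builds a cookie whose value is "payload.signature". */
    public static Cookie encode(String payload) throws Exception {
        Cookie cookie = new Cookie("APP_STATE", payload + "." + sign(payload));
        cookie.setHttpOnly(true);
        return cookie;
    }

    /** Returns the payload if the signature checks out, otherwise null. */
    public static String decode(String cookieValue) throws Exception {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) return null;
        String payload = cookieValue.substring(0, dot);
        String signature = cookieValue.substring(dot + 1);
        return sign(payload).equals(signature) ? payload : null;
    }

    // HMAC-SHA256 over the payload, Base64url-encoded so it is cookie-safe.
    private static String sign(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
        byte[] raw = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }
}
```

For options 1 and 2, mature implementations already exist, for example Tomcat's built-in session replication and external session stores such as Memcached or Redis, so they rarely need to be hand-rolled.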

Server Cluster Definition

Source URL: http://www.domaingurus.com/faqs/what-is-a-server-cluster.html

What is a Cluster?

A cluster is the aggregation of multiple stand-alone computers linked together by software and networking technologies to create a unified system.

Clusters are typically categorized into 2 general types:

  • High Performance Computing (HPC), made up of markets traditionally serviced by supercomputers for applications requiring greater computational power than a single computer can provide; or
  • Enterprise or High Availability (HA) with automatic failover, load balancing, redundancy, and other features that provide high reliability for the data center. Many HPC clusters also incorporate some of the features of HA clusters.

Application requirements vary between and within each of these system types. For this reason it’s imperative that you choose a cluster partner that understands the intricacies of cluster design and can help you avoid the pitfalls of cluster deployment.

High-availability (HA) clusters

High-availability clusters are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant nodes, which are then used to provide service when system components fail. The most common size for an HA cluster is two nodes, which is the minimum requirement to provide redundancy. HA cluster implementations attempt to manage the redundancy inherent in a cluster to eliminate single points of failure. There are many commercial implementations of high-availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux OS.

Load-balancing clusters

Load-balancing clusters operate by having all workload come through one or more load-balancing front ends, which then distribute it to a collection of back end servers. Although they are primarily implemented for improved performance, they commonly include high-availability features as well. Such a cluster of computers is sometimes referred to as a server farm.
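To illustrate just the distribution step, here is a minimal round-robin dispatcher; the list of back-end addresses is hypothetical, and real load balancers such as Nginx, HAProxy, or LVS add health checks, weighting, and session stickiness on top of this basic idea.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal round-robin selection over a fixed list of back-end addresses. */
public class RoundRobinBalancer {
    private final List<String> backends;   // e.g. "10.0.0.1:8080", "10.0.0.2:8080"
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> backends) {
        this.backends = backends;
    }

    /** Each call returns the next back end in rotation, wrapping around. */
    public String next() {
        long n = counter.getAndIncrement();
        int index = (int) Math.floorMod(n, (long) backends.size());
        return backends.get(index);
    }
}
```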

High Performance Computing (HPC) Clusters

Linux clusters are democratizing supercomputing for engineers, scientists, and researchers whose work demands the highest levels of computational analysis, modeling, and simulation. Innovative teams in a variety of industries are using HPC clusters to speed up product development and groundbreaking research.

HPC clusters are optimized for workloads which require jobs or processes happening on the separate cluster computer nodes to communicate actively during the computation. These include computations where intermediate results from one node’s calculations will affect future calculations on other nodes.

Cluster history

The history of cluster computing is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters…'. IBM didn't invent them either. Customers invented clusters, as soon as they couldn't fit all their work on one computer, or needed a backup. The date of the first is unknown, but I'd be surprised if it wasn't in the 1960s, or even late 1950s."

The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl’s Law. Amdahl’s Law describes mathematically the speedup one can expect from parallelizing any given otherwise serially performed task on a parallel architecture. This article defined the engineering basis for both multiprocessor computing and cluster computing, where the primary differentiator is whether or not the interprocessor communications are supported “inside” the computer (on for example a customized internal communications bus or network) or “outside” the computer on a commodity network.
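For reference, Amdahl's Law is commonly stated as the formula below (this formulation is standard and not taken from the quoted article), where P is the fraction of the task that can be parallelized and N is the number of processors:

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

As N grows, the speedup approaches 1 / (1 - P), so the serial fraction ultimately bounds what either a multiprocessor or a cluster can gain.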

Consequently the history of early computer clusters is more or less directly tied into the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. Packet switching networks were conceptually invented by the RAND Corporation in 1962. Using the concept of a packet switched network, the ARPANET project succeeded in creating in 1969 what was arguably the world's first commodity-network based computer cluster by linking four different computer centers (each of which was something of a "cluster" in its own right, but probably not a commodity cluster). The ARPANET project grew into the Internet — which can be thought of as "the mother of all computer clusters" (as the union of nearly all of the compute resources, including clusters, that happen to be connected). It also established the paradigm in use by all computer clusters in the world today — the use of packet-switched networks to perform interprocessor communications between processor (sets) located in otherwise disconnected frames.

The development of customer-built and research clusters proceeded hand in hand with that of both networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications. The Hydra operating system was built for a cluster of DEC PDP-11 minicomputers called C.mmp at Carnegie Mellon University in 1971. However, it wasn't until circa 1983 that the protocols and tools for easily doing remote job distribution and file sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and hence became generally available commercially, along with a shared filesystem.

The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial success and clustering per se didn't really take off until DEC released their VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported parallel computing, but also shared file systems and peripheral devices. They were supposed to give you the advantage of parallel processing while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems.

Two other noteworthy early commercial clusters were the Tandem Himalaya (a circa 1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, primarily for business use).

No history of commodity compute clusters would be complete without noting the pivotal role played by the development of Parallel Virtual Machine (PVM) software in 1989. This open source software based on TCP/IP communications enabled the instant creation of a virtual supercomputer — a high performance compute cluster — made out of any TCP/IP connected systems. Free-form heterogeneous clusters built on top of this model rapidly achieved total throughput in FLOPS that greatly exceeded that available even with the most expensive "big iron" supercomputers. PVM and the advent of inexpensive networked PCs led, in 1993, to a NASA project to build supercomputers out of commodity clusters. 1995 saw the invention of the "Beowulf"-style cluster — a compute cluster built on top of a commodity network for the specific purpose of "being a supercomputer" capable of performing tightly coupled parallel HPC computations. This in turn spurred the independent development of Grid computing as a named entity, although Grid-style clustering had been around at least as long as the Unix operating system and the ARPANET, whether or not it, or the clusters that used it, were named. (Reference: Wikipedia)
