How To Optimize Your Site With HTTP Caching

[Reposted] http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/

 

 

I’ve been on a web tweaking kick lately: how to speed up your JavaScript, gzip files with your server, and now how to set up caching. But the reason is simple: site performance is a feature.

For web sites, speed may be feature #1. Users hate waiting; we get frustrated by buffering videos and pages that pop together as images slowly load. It’s a jarring (aka bad) user experience. Time invested in site optimization is well worth it, so let’s dive in.


What is Caching?

Caching is a great example of the ubiquitous time-space tradeoff in programming. You can save time by using space to store results.

In the case of websites, the browser can save a copy of images, stylesheets, javascript or the entire page. The next time the user needs that resource (such as a script or logo that appears on every page), the browser doesn’t have to download it again. Fewer downloads means a faster, happier site.

Here’s a quick refresher on how a web browser gets a page from the server:

HTTP_request.png

1. Browser: Yo! You got index.html?
2. Server: (Looking it up…)
3. Server: Totally, dude! It’s right here!
4. Browser: That’s rad, I’m downloading it now and showing the user.

(The actual HTTP protocol may have minor differences; see Live HTTP Headers for more details.)


Caching’s Ugly Secret: It Gets Stale

Caching seems fun and easy. The browser saves a copy of a file (like a logo image) and uses this cached (saved) copy on each page that needs the logo. This avoids having to download the image ever again and is perfect, right?

Wrongo. What happens when the company logo changes? Amazon.com becomes Nile.com? Google becomes Quadrillion?

We’ve got a problem. The shiny new logo needs to go with the shiny new site, caches be damned.

So even though the browser has the logo, it doesn’t know whether the image can be used. After all, the file may have changed on the server and there could be an updated version.

So why bother caching if we can’t be sure the file is still good? Luckily, there are a few ways to fix this problem.


Caching Method 1: Last-Modified

One fix is for the server to tell the browser what version of the file it is sending. A server can return a Last-Modified date along with the file (let’s call it logo.png), like this:

Last-Modified: Fri, 16 Mar 2007 04:00:25 GMT
File Contents (could be an image, HTML, CSS, Javascript...)

Now the browser knows that the file it got (logo.png) was last modified on Mar 16, 2007. The next time the browser needs logo.png, it can do a special check with the server:

HTTP-caching-last-modified_1.png

1. Browser: Hey, give me logo.png, but only if it’s been modified since Mar 16, 2007.
2. Server: (Checking the modification date)
3. Server: Hey, you’re in luck! It was not modified since that date. You have the latest version.
4. Browser: Great! I’ll show the user the cached version.

Sending the short “Not Modified” message is a lot faster than needing to download the file again, especially for giant javascript or image files. Caching saves the day (err… the bandwidth).
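
To make the server’s side of that conversation concrete, here is a minimal Python sketch. The function name conditional_get and its arguments are made up for illustration; a real server such as Apache does this check for you and handles more edge cases.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a file last changed at `last_modified`.

    `last_modified` is a timezone-aware UTC datetime. `if_modified_since`
    is the raw header value sent by the browser, or None on a first visit.
    Hypothetical helper for illustration only.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since is not None:
        try:
            cached_at = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            cached_at = None  # unparseable date: fall back to a full response
        if cached_at is not None:
            if cached_at.tzinfo is None:
                cached_at = cached_at.replace(tzinfo=timezone.utc)
            if last_modified <= cached_at:
                return 304, headers  # "Not Modified": no body is sent

    return 200, headers  # full response, the file contents follow
```

The 304 branch is the short “Not Modified” message: only headers travel over the wire, and the browser shows its saved copy.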


Caching Method 2: ETag

Comparing versions with the modification time generally works, but could lead to problems. What if the server’s clock was originally wrong and then got fixed? What if daylight saving time comes early and the server isn’t updated? The caches could be inaccurate.

ETags to the rescue. An ETag is a unique identifier given to every file. It’s like a hash or fingerprint: every file gets a unique fingerprint, and if you change the file (even by one byte), the fingerprint changes as well.

Instead of sending back the modification time, the server can send back the ETag (fingerprint):

ETag: ead145f
File Contents (could be an image, HTML, CSS, Javascript...)

The ETag can be any string which uniquely identifies the file. The next time the browser needs logo.png, it can have a conversation like this:

HTTP_caching_if_none_match.png

1. Browser: Can I get logo.png, if nothing matches tag “ead145f”?
2. Server: (Checking fingerprint on logo.png)
3. Server: You’re in luck! The version here is “ead145f”. It was not modified.
4. Browser: Score! I’ll show the user my cached version.

Just like Last-Modified, ETags solve the problem of comparing file versions, except that “if-none-match” is a bit harder to work into a sentence than “if-modified-since”. But that’s my problem, not yours. ETags work great.
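
Here is a rough Python sketch of the same check. Hashing the file contents is only one way to produce a fingerprint (servers are free to pick their own scheme), and the names make_etag and etag_response are invented for this example.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the file contents (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_response(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a request for `body`.

    `if_none_match` is the raw If-None-Match header value, or None.
    Hypothetical helper for illustration only.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # Weak tags look like W/"abc"; drop the prefix before comparing.
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # fingerprint matches: send no body
    return 200, headers, body  # new or changed file: send everything
```

Change a single byte of the file and the fingerprint changes, so the browser’s old tag stops matching and it gets the full file again.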


Caching Method 3: Expires

Caching a file and checking with the server is nice, except for one thing: we are still checking with the server. It’s like analyzing your milk every time you make cereal to see whether it’s safe to drink. Sure, it’s better than buying a new gallon each time, but it’s not exactly wonderful.

And how do we handle this milk situation? With an expiration date!

If we know when the milk (logo.png) expires, we keep using it until that date (and maybe a few days longer, if you’re a college student). As soon as it expires, we contact the server for a fresh copy, with a new expiration date. The header looks like this:

Expires: Tue, 20 Mar 2007 04:00:25 GMT
File Contents (could be an image, HTML, CSS, Javascript...)

In the meantime, we avoid even talking to the server if we’re in the expiration period:

HTTP_caching_expires.png

There isn’t a conversation here; the browser has a monologue.

1. Browser: Self, is it before the expiration date of Mar 20, 2007? (Assume it is).
2. Browser: Verily, I will show the user the cached version.

And that’s that. The web server didn’t have to do anything. The user sees the file instantly.
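
The browser’s monologue boils down to one date comparison. A small Python sketch of it, where is_fresh is a made-up name and real browsers apply extra rules on top:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now=None):
    """True if a cached copy can be used without contacting the server.

    `expires_header` is the raw Expires value stored with the cached file.
    Hypothetical helper for illustration only.
    """
    now = now or datetime.now(timezone.utc)
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # an invalid date is treated as already expired
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

While is_fresh returns True, no request is sent at all; once it returns False, the browser goes back to the server for a new copy and a new date.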


Caching Method 4: Max-Age

Oh, we’re not done yet. Expires is great, but the server has to compute an explicit date for every response. The max-age directive (sent in the Cache-Control header) lets us say “This file expires 1 week from today”, which is simpler than setting an explicit date.

Max-Age is measured in seconds. Here are a few quick conversions:

  • 1 day in seconds = 86400
  • 1 week in seconds = 604800
  • 1 month in seconds = 2629000
  • 1 year in seconds = 31536000 (effectively infinite on internet time)
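
If you’d rather not memorize those numbers, a couple of lines of Python can do the arithmetic. The helper name max_age_header is just for illustration:

```python
from datetime import timedelta


def max_age_header(lifetime: timedelta) -> str:
    """Build a Cache-Control value that keeps a file fresh for `lifetime`."""
    return f"max-age={int(lifetime.total_seconds())}"


print(max_age_header(timedelta(days=1)))    # max-age=86400
print(max_age_header(timedelta(weeks=1)))   # max-age=604800
print(max_age_header(timedelta(days=365)))  # max-age=31536000
```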


Bonus Header: Public and Private

The cache headers never cease. Sometimes a server needs to control when certain resources are cached.

  • Cache-control: public means the cached version can be saved by proxies and other intermediate servers, where everyone can see it.
  • Cache-control: private means the file is different for different users (such as their personal homepage). The user’s private browser can cache it, but not public proxies.
  • Cache-control: no-cache means a cached copy should not be reused without first checking with the server (to keep a copy from being stored at all, use no-store). This is useful for things like search results where the URL appears the same but the content may change.

However, be wary that some cache directives only work on newer HTTP/1.1 browsers. If you are doing special caching of authenticated pages, then read more about caching.
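
As a rough illustration of how a proxy might read these directives, here is a simplified Python sketch. Both function names are invented, and the parser ignores corner cases such as quoted arguments that contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(value: str) -> bool:
    """Rough check: may a proxy shared between users keep this response?

    Simplified on purpose; a real cache weighs many more conditions.
    """
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives
```

With this sketch, "public, max-age=86400" is fair game for a proxy, while "private" is left to the user’s own browser.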


Ok, I’m Sold: Enable Caching

We’ve seen the following headers that really help our caching:

  • Last-modified:
  • ETag:
  • Expires:
  • Cache-control: max-age=86400

Now let’s put it all together and get Apache to return the right headers. If your resource changes:

  • Daily or more: Use Last-Modified or ETag. Apache does this for you automatically!
  • Weekly-monthly: Use max-age for a day or week. Put an .htaccess file that sets this header in the directory you want to cache.
  • Never: Use max-age for a very long time, such as a year.

How can a file never change? Simple. Put different versions of the file in different directories.

For instacalc, I keep the core files of each build in a unique directory, such as “build490”. When I’m using build490, index.html pulls all images, stylesheets, and javascripts from that directory. I can cache the files in build490 forever because build490 will never change.

If I have a new version (build491… how creative), index.html will point to that folder instead. I’ve created scripts to take care of this find/replace housekeeping, though you can use URL rewriting rules as well. I prefer to have the HTML point to the actual file.
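
The find/replace housekeeping can be as small as one regular expression. This Python sketch is not the script used for instacalc, just one way it could look; point_to_build is a made-up name and it assumes paths of the form buildNNN/.

```python
import re


def point_to_build(html: str, build_number: int) -> str:
    """Rewrite every "buildNNN/" path prefix in `html` to the given build."""
    return re.sub(r"\bbuild\d+/", f"build{build_number}/", html)


page = '<img src="build490/logo.png"><script src="build490/app.js"></script>'
print(point_to_build(page, 491))
# <img src="build491/logo.png"><script src="build491/app.js"></script>
```

Because the URL itself changes with every build, browsers treat the new files as brand-new resources and the old cached copies are simply never asked for again.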

Remember that index.html cannot be cached forever, since it changes every now and then to point to new directories. So for the “loader” file, I’m using the regular Last-Modified caching strategy. I think it’s fine to have that small “304 Not Modified” communication with the server; we still avoid sending requests for all the files in the build490 folder. If you want, monkey around and give the index.html file a small expiration (say a few hours).


Final Step: Check Your Caching

To see whether your files are cached, do the following:

  • Online: Examine your site in the cacheability query (green means cacheable)
  • In Browser: Use Firebug or Live HTTP Headers to see the HTTP response (304 Not Modified, Cache-Control, etc.). In particular, I’ll load a page and use Live HTTP Headers to make sure no packets are being sent to load images, logos, and other cached files. If you press Ctrl+Refresh the browser will force a reload of all files.
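
If you have a response’s status code and headers in hand (copied from one of those tools, say), a small Python helper can list the caching signals to look for. The name describe_caching is made up for this sketch.

```python
def describe_caching(status: int, headers: dict) -> list:
    """List the caching signals found in one HTTP response."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    notes = []
    if status == 304:
        notes.append("304 Not Modified: the cached copy was revalidated")
    for name in ("cache-control", "expires", "etag", "last-modified"):
        if name in lowered:
            notes.append(f"{name}: {lowered[name]}")
    if not notes:
        notes.append("no caching headers found")
    return notes
```

An empty result other than “no caching headers found” never happens, so a quick glance at the list tells you whether the server is sending anything a browser can cache on.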

Read more about caching, or the HTTP header fields. Caching doesn’t help with the initial download (that’s what gzip is for), but it makes the overall site experience much better.

Remember: Creating unique URLs is the simplest way to reach caching heaven. Have fun streamlining your site!

 

 
