1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
|
---
title: HTTP caching
slug: Web/HTTP/Caching
translation_of: Web/HTTP/Caching
tags:
- Caching
- 快取
- Guide
- 指引
- HTTP
- 超文本傳輸協定
---
<div>{{HTTPSidebar}}</div>
<p class="summary">藉由重複使用先前取過的資源,網站與網頁應用程式能夠顯著地提升效能。caching可以減少網路傳輸量以降低一個資源可展示的延遲時間。善用 HTTP caching可以讓網站可以回應更多請求。</p>
<h2 id="不同種類的快取">不同種類的快取</h2>
<p>快取是一種儲存伺服器回復的訊息且用此存檔回覆給請求者的技術。當快取伺服器有存者一份請求檔案的回覆,快取伺服器會攔截此請求訊息,回覆給請求者存在快取上的檔案,而不是從請求者請求的網頁伺服器去請求原始檔案。這樣的運作機制能達成下列幾個目的:讓網頁伺服器不用處理每個從客戶端發出的請求,減輕機器運作的負擔。且由於傳輸起點距離更接近請求端,能讓整體請求的過程效能更加,整體請求需要更少的時間傳送資源。對一個要達成高效能的網站來講,快取一個很重要的一塊。另一方面來講,快取的請求、回復、儲存機制必須設定好,別讓存在快取伺服器的檔案都是同一個:重要的是當資源改變才去使用快取,而不是一直存放著。</p>
<p>快取有好幾種,但他們可以分為兩大類:共用和私有的快取。共用的快取定義是指快取伺服器上存的回覆能給好幾個不同的請求者服務。而私有的快取就相對只會服務一個請求者。此頁面講到的快取大部分都是指代理伺服器和瀏覽器的快取,但是快取還有像是閘道器快取、CDN快取、反向代理伺服器快取 、負載平衡器快取,它們都是部屬在網頁伺服器那邊,讓網站和網頁應用程式更加穩定,效能更好,且有更好的擴增姓。</p>
<p><img alt="What a cache provide, advantages/disadvantages of shared/private caches." src="https://mdn.mozillademos.org/files/13777/HTTPCachtType.png" style="height: 573px; width: 910px;"></p>
<h3 id="瀏覽器私有的快取">瀏覽器私有的快取</h3>
<p>私有的快取只會服務一個使用者。你可能已經在設定瀏覽器的時候看過快取了。一個瀏覽器快取會存放所有透過HTTP協定下載的檔案。這類型的快取是為了方便使用者上下頁移動、存檔、或者檢視檔案原始碼等等,讓使用者不用再次向原始伺服器請求檔案。此機制同樣的增進線下瀏覽快取。</p>
<h3 id="代理伺服器的共用快取">代理伺服器的共用快取</h3>
<p>一個共用的快取伺服器,是指快取存放者能讓多位使用者請求的檔案副本。舉例來說,ISP或者你的公司內部網路可能會設置代理伺服器,用來服務每個使用者,讓一些較常用的檔案可以重複使用多次,減少網路交通的流量。</p>
<h2 id="Targets_of_caching_operations">Targets of caching operations</h2>
<p>HTTP caching is optional, but reusing a cached resource is usually desirable. However, common HTTP caches are typically limited to caching responses to {{HTTPMethod("GET")}} and may decline other methods. The primary cache key consists of the request method and target URI (oftentimes only the URI is used as only GET requests are caching targets). Common forms of caching entries are:</p>
<ul>
<li>Successful results of a retrieval request: a {{HTTPStatus(200)}} (OK) response to a {{HTTPMethod("GET")}} request containing a resource like HTML documents, images or files.</li>
<li>Permanent redirects: a {{HTTPStatus(301)}} (Moved Permanently) response.</li>
<li>Error responses: a {{HTTPStatus(404)}} (Not Found) result page.</li>
<li>Incomplete results: a {{HTTPStatus(206)}} (Partial Content) response.</li>
<li>Responses other than {{HTTPMethod("GET")}} if something suitable for use as a cache key is defined.</li>
</ul>
<p>A cache entry might also consist of multiple stored responses differentiated by a secondary key, if the request is target of content negotiation. For more details see the information about the {{HTTPHeader("Vary")}} header <a href="#Varying_responses">below</a>.</p>
<h2 id="控制快取">控制快取</h2>
<h3 id="Cache-control_檔頭"><code>Cache-control</code> 檔頭</h3>
<p>{{HTTPHeader("Cache-Control")}} 是HTTP/1.1用來特別指令快取如何處理回覆和要求的通用檔頭欄位。使用此欄位和多種的指令,來定義你的快取機制。</p>
<h4 id="不要存任何快取">不要存任何快取</h4>
<p>快取不該存取任何的使用者請求或者伺服器的回覆。每個請求都是送到原始的伺服器去取得資源。</p>
<pre>Cache-Control: no-store
</pre>
<h4 id="快取需存取,但是要重新驗證">快取需存取,但是要重新驗證</h4>
<p>快取伺服器在把已儲存的複製版本傳給請求者之前,先會送一個請求給網頁伺服器做驗證。</p>
<pre>Cache-Control: no-cache</pre>
<h4 id="私有或共用的快取">私有或共用的快取</h4>
<p>共用(Public)這個指令指出此回覆訊息可以由任何快取給存取。這點可以變成很有用處,假如頁面有不容易快取成功的HTTP驗證的訊息或者回覆狀態碼,現在應該很容易被存取了。</p>
<p>相對的,私有(Private)的指令指示快取只給一個使用者使用,且不能被共用的快取伺服器給儲存過。隱私視窗(無痕模式)的快取就可能是這樣子。</p>
<pre>Cache-Control: private
Cache-Control: public
</pre>
<h4 id="有效期限">有效期限</h4>
<p>在這裡最重要的指令就是"<code>max-age=<seconds></code>" ,意思是指存放在快取伺服器上的資源有剩下多少時間被認定還是新鮮的。 跟{{HTTPHeader("Expires")}}不太一樣,這個檔頭欄位快取指的是請求此回覆的日期和時間。對於程式中不常更新的檔案,你可以積極地使用此機制。這些檔案包含了,圖檔、CSS、Javascripts檔案等等。</p>
<p>想要了解更多的話,請參見下面的<a href="#Freshness">Freshness</a>。</p>
<pre>Cache-Control: max-age=31536000</pre>
<h4 id="驗證">驗證</h4>
<p>當使用"<code>must-revalidate</code>"指令時,快取伺服器一定要先發送請求訊息給網頁伺服器驗證,請已經確認是過有效期限且檔案有更新的回覆的話,舊的檔案就不能使用。假如想了解更多,請參見下面的<a href="#Cache_validation">Validation</a>。</p>
<pre>Cache-Control: must-revalidate</pre>
<h3 id="Pragma檔頭欄位"><code>Pragma</code>檔頭欄位</h3>
<p>{{HTTPHeader("Pragma")}} 是HTTP/1.0的檔頭欄位,此檔頭欄位沒有特別指是HTTP回覆怎麼處理,所以用此來取代HTTP/1.1 <code>Cache-Control</code>通用檔頭欄位並不是很穩定。假如<code>Cache-Control</code> 檔頭欄位在傳送請求訊息時被省略掉了,此檔頭欄位運作的結果跟 <code>Cache-Control: no-cache</code>一樣<code>。此Pragma</code>欄位只能跟 HTTP/1.0的請求者使用。</p>
<h2 id="Freshness">Freshness</h2>
<p>Once a resource is stored in a cache, it could theoretically be served by the cache forever. Caches have finite storage so items are periodically removed from storage. This process is called <em>cache eviction</em>. On the other side, some resources may change on the server so the cache should be updated. As HTTP is a client-server protocol, servers can't contact caches and clients when a resource changes; they have to communicate an expiration time for the resource. Before this expiration time, the resource is <em>fresh</em>; after the expiration time, the resource is <em>stale</em>. Eviction algorithms often privilege fresh resources over stale resources. Note that a stale resource is not evicted or ignored; when the cache receives a request for a stale resource, it forwards this request with a {{HTTPHeader("If-None-Match")}} to check if it is in fact still fresh. If so, the server returns a {{HTTPStatus("304")}} (Not Modified) header without sending the body of the requested resource, saving some bandwidth.</p>
<p>Here is an example of this process with a shared cache proxy:</p>
<p><img alt="Show how a proxy cache acts when a doc is not cache, in the cache and fresh, in the cache and stale." src="http_staleness.png"></p>
<p>The freshness lifetime is calculated based on several headers. If a "<code>Cache-Control: max-age=N</code>" header is specified, then the freshness lifetime is equal to N. If this header is not present, which is very often the case, it is checked if an {{HTTPHeader("Expires")}} header is present. If an <code>Expires</code> header exists, then its value minus the value of the {{HTTPHeader("Date")}} header determines the freshness lifetime.</p>
<h3 id="Heuristic_freshness_checking">Heuristic freshness checking</h3>
<p>If an origin server does not explicitly specify freshness (e.g. using {{HTTPHeader("Cache-Control")}} or {{HTTPHeader("Expires")}} header) then a heuristic approach may be used.</p>
<p>In this case look for a {{HTTPHeader("Last-Modified")}} header. If this header is present, then the cache's freshness lifetime is equal to the value of the <code>Date</code> header minus the value of the <code>Last-modified</code> header divided by 10. The expiration time is computed as follows:</p>
<pre>expirationTime = responseTime + freshnessLifetime - currentAge</pre>
<p>where <code>responseTime</code> is the time at which the response was received according to the browser. For more information see <a href="https://datatracker.ietf.org/doc/html/rfc7234#section-4.2.2">RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): 4.2.2. Calculating Heuristic Freshness</a>.</p>
<h3 id="Revved_resources">Revved resources</h3>
<p>The more we use cached resources, the better the responsiveness and the performance of a Web site will be. To optimize this, good practices recommend to set expiration times as far in the future as possible. This is possible on resources that are regularly updated, or often, but is problematic for resources that are rarely and infrequently updated. They are the resources that would benefit the most from caching resources, yet this makes them very difficult to update. This is typical of the technical resources included and linked from each Web pages: JavaScript and CSS files change infrequently, but when they change you want them to be updated quickly.</p>
<p>Web developers invented a technique that Steve Souders called <em>revving</em><sup><a href="https://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/">[1]</a></sup>. Infrequently updated files are named in a specific way: in their URL, usually in the filename, a revision (or version) number is added. That way each new revision of this resource is considered as a resource on its own that <em>never</em> changes and that can have an expiration time very far in the future, usually one year or even more. In order to have the new versions, all the links to them must be changed, that is the drawback of this method: additional complexity that is usually taken care of by the tool chain used by Web developers. When the infrequently variable resources change they induce an additional change to often variable resources. When these are read, the new versions of the others are also read.</p>
<p>This technique has an additional benefit: updating two cached resources at the same time will not lead to the situation where the out-dated version of one resource is used in combination with the new version of the other one. This is very important when web sites have CSS stylesheets or JS scripts that have mutual dependencies, i.e., they depend on each other because they refer to the same HTML elements.</p>
<p><img alt="How the revved cache mechanism works. With minor typo fix to grammar issue: https://github.com/mdn/sprints/issues/2618" src="http_revved_fix_typo.png"></p>
<p>The revision version added to revved resources doesn't need to be a classical revision string like 1.1.3, or even a monotonously growing suite of number. It can be anything that prevent collisions, like a hash or a date.</p>
<h2 id="Cache_validation">Cache validation</h2>
<p>When a cached document's expiration time has been reached, it is either validated or fetched again. Validation can only occur if the server provided either a <em>strong validator</em> or a <em>weak validator</em>.</p>
<p>Revalidation is triggered when the user presses the reload button. It is also triggered under normal browsing if the cached response includes the "<code>Cache-Control: must-revalidate</code>" header. Another factor is the cache validation preferences in the <code>Advanced->Cache</code> preferences panel. There is an option to force a validation each time a document is loaded.</p>
<h3 id="ETags">ETags</h3>
<p>The {{HTTPHeader("ETag")}} response header is an <em>opaque-to-the-useragent</em> value that can be used as a strong validator. That means that a HTTP user-agent, such as the browser, does not know what this string represents and can't predict what its value would be. If the <code>ETag</code> header was part of the response for a resource, the client can issue an {{HTTPHeader("If-None-Match")}} in the header of future requests – in order to validate the cached resource.</p>
<h3 id="Last-Modified">Last-Modified</h3>
<p>The {{HTTPHeader("Last-Modified")}} response header can be used as a weak validator. It is considered weak because it only has 1-second resolution. If the <code>Last-Modified</code> header is present in a response, then the client can issue an {{HTTPHeader("If-Modified-Since")}} request header to validate the cached document.</p>
<p>When a validation request is made, the server can either ignore the validation request and respond with a normal {{HTTPStatus(200)}} <code>OK</code>, or it can return {{HTTPStatus(304)}} <code>Not Modified</code> (with an empty body) to instruct the browser to use its cached copy. The latter response can also include headers that update the expiration time of the cached document.</p>
<h2 id="Varying_responses">Varying responses</h2>
<p>The {{HTTPHeader("Vary")}} HTTP response header determines how to match future request headers to decide whether a cached response can be used, or if a fresh one must be requested from the origin server.</p>
<p>When a cache receives a request that has a <code>Vary</code> header field, it must not use a cached response by default unless all header fields specified in the <code>Vary</code> header match in both the original (cached) request and the new request.</p>
<p><img alt="The Vary header leads cache to use more HTTP headers as key for the cache." src="http_vary.png">This feature is commonly used to allow a resource to be cached in uncompressed and (various) compressed forms, and served appropriately to user agents based on the encodings that they support. For example, a server can set <code>Vary: Accept-Encoding</code> to ensure that a separate version of a resource is cached for all requests that specify support for a particular set of encodings, e.g. <code>Accept-Encoding: gzip,deflate,sdch</code>.</p>
<pre>Vary: Accept-Encoding</pre>
<div class="notecard note">
<h4>Note</h4>
<p>Use <code>Vary</code> with care—it can easily reduce the effectiveness of caching! A caching server should use <a href="#normalization">normalization</a> to reduce duplicated cache entries and unnecessary requests to the origin server. This is particularly true when using <code>Vary</code> with headers and header values that can have many values.</p>
</div>
<p>The <code>Vary</code> header can also be useful for serving different content to desktop and mobile users, or to allow search engines to discover the mobile version of a page (and perhaps also tell them that no <a href="https://en.wikipedia.org/wiki/Cloaking">Cloaking</a> is intended). This is usually achieved with the <code>Vary: User-Agent</code> header, and works because the {{HTTPHeader("User-Agent")}} header value is different for mobile and desktop clients.</p>
<pre>Vary: User-Agent</pre>
<h3 id="Normalization">Normalization</h3>
<p>As discussed above, caching servers will by default match future requests <em>only</em> to requests with <em>exactly</em> the same headers and header values. That means a request will be made to the origin and a new cache will be created for every slight variant that might be specified by different user-agents.</p>
<p>For example, by default all of the following result in a separate request to the origin and a separate cache entry: <code>Accept-Encoding: gzip,deflate,sdch</code>, <code>Accept-Encoding: gzip,deflate</code>, <code>Accept-Encoding: gzip</code>. This is true even though the origin server will probably respond with — and store — the same resource for all requests (a gzip)!</p>
<p>To avoid unnecessary requests and duplicated cache entries, caching servers should use <strong>normalization</strong> to pre-process the request and cache only files that are needed. For example, in the case of <code>Accept-Encoding</code> you could check for <code>gzip</code> and other compression types in the header before doing further processing, and otherwise unset the header. In "pseudo code" this might look like:</p>
<pre>// Normalize Accept-Encoding
if (req.http.Accept-Encoding) {
if (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
}
// elsif other encoding types to check
else {
unset req.http.Accept-Encoding;
}
}
</pre>
<p><code>User-Agent</code> has even more variation than <code>Accept-Encoding</code>. So if using <code>Vary: User-Agent</code> for caching mobile/desktop variants of files you'd similarly check for the presence of <code>"mobile"</code> and <code>"desktop"</code> in the request <code>User-Agent</code> header, and then clear it.</p>
<h2 id="See_also">See also</h2>
<ul>
<li><a href="https://datatracker.ietf.org/doc/html/rfc7234">RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching</a></li>
<li><a href="https://www.mnot.net/cache_docs">Caching Tutorial – Mark Nottingham</a></li>
<li><a href="https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching">HTTP caching – Ilya Grigorik</a></li>
<li><a href="https://redbot.org/">RedBot</a>, a tool to check your cache-related HTTP headers.</li>
</ul>
|