aboutsummaryrefslogtreecommitdiff
path: root/files/zh-cn/mozilla/tech/xpcom/guide/hashtables/index.html
blob: 24740b535c9d463fd6115c0242f11db737619369 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
---
title: Hashtables
slug: Mozilla/Tech/XPCOM/Guide/Hashtables
tags:
  - XPCOM
  - 所有分类
translation_of: Mozilla/Tech/XPCOM/Guide/Hashtables
---
<p> </p>
<h2 id="What_Is_a_Hashtable.3F" name="What_Is_a_Hashtable.3F">What Is a Hashtable?</h2>
<p>A hashtable is a data construct that stores a set of <b>items</b>. Each item has a <b>key</b> that identifies the item. Items are found, added, and removed from the hashtable by using the key. Hashtables may seem like <a href="cn/XPCOM_array_guide">arrays</a>, but there are important differences:</p>
<p>哈希表是一个存储一系列<b>元素</b>的数据结构。每个元素都由一个<b>关键字</b>来标识。元素可以通过关键字来进行查找,添加,删除操作。哈希表非常类似<a href="cn/XPCOM_array_guide">arrays</a>,但是也有一些很大的区别。</p>
<table class="standard-table">
 <tbody>
  <tr>
   <th> </th>
   <th class="header">数组</th>
   <th class="header">哈希表</th>
  </tr>
  <tr>
   <td class="header">关键字:</td>
   <td>
    <i>
     整数:</i>
    arrays are always keyed on integers, and must be contiguous. 数组必须用整数作为关键字,而且每两个元素之间必须相邻接</td>
   <td>
    <i>
     任意类型:</i>
    almost any datatype can be used as key, including strings, integers, XPCOM interface pointers, IIDs, and almost anything else. Keys can be disjunct (i.e. you can store entries with keys 1, 5, and 3000). 任意类型都可以作为关键字,包括字符串,整数,XPCOM接口指针,IIDs等等。关键字之间可以不在一起(例如,你可以用1,5,和3000来作为关键字。)</td>
  </tr>
  <tr>
   <td class="header">查询时间:</td>
   <td>
    <i>
     O(1):</i>
    lookup time is a simple constant。查找时间是个简单的定值</td>
   <td>
    <i>
     O(1):</i>
    lookup time is mostly-constant, but the constant time can be larger than an array lookup。查找时间几乎是定值,但是比数组慢点。</td>
  </tr>
  <tr>
   <td class="header">排序:</td>
   <td>
    <i>
     sorted:</i>
    stored sorted; enumerated in a sorted fashion.</td>
   <td>
    <i>
     unsorted:</i>
    stored unsorted; cannot be enumerated in a sorted manner.</td>
  </tr>
  <tr>
   <td class="header">插入/删除:</td>
   <td>
    <i>
     O(n):</i>
    adding and removing items from a large array can be time-consuming</td>
   <td>
    <i>
     O(1):</i>
    adding and removing items from hashtables is a quick operation</td>
  </tr>
  <tr>
   <td class="header">浪费空间:</td>
   <td>
    <i>
     none:</i>
    Arrays are packed structures, so there is no wasted space.</td>
   <td>
    <i>
     some:</i>
    hashtables are not packed structures; depending on the implementation, there may be significant wasted memory.</td>
  </tr>
 </tbody>
</table>
<p>In their implementation, hashtables take the key and apply a mathematical <b>hash function</b> to <b>randomize</b> the key and then use the hash to find the location in the hashtable. Good hashtable implementations will automatically resize the hashtable in memory if extra space is needed, or if too much space has been allocated.</p>
<h2 id="When_Should_I_Use_a_Hashtable.3F" name="When_Should_I_Use_a_Hashtable.3F">When Should I Use a Hashtable?</h2>
<p>Hashtables are useful for</p>
<ul>
 <li>sets of data that need swift <b>random access</b>;</li>
 <li>with <b>non-integral keys</b> or <b>non-contiguous integral keys</b>;</li>
 <li>or where <b>items will be frequently added or removed</b>.</li>
</ul>
<p>Hashtables should
 <i>
  not</i>
 be used for</p>
<ul>
 <li>Sets that need to be <b>sorted</b>;</li>
 <li>Very small datasets (less than 12-16 items);</li>
 <li>Data that does not need random access.</li>
</ul>
<p>In these situations, an array, a linked-list, or various tree data structures are more efficient.</p>
<h2 id="Mozilla.27s_Hashtable_Implementations" name="Mozilla.27s_Hashtable_Implementations">Mozilla's Hashtable Implementations</h2>
<p>Mozilla has several hashtable implementations, which have been tested and, tuned, and hide the inner complexities of hashtable implementations:</p>
<ul>
 <li><code><a href="#PLDHash_.28JSDHash.29">PLDHash</a></code> - low-level C API; stores keys and data in one large memory structure; uses the heap efficiently; client must declare an "entry class" and may not hold onto entry pointers.</li>
 <li><code><a href="#PLHashTable">PLHashTable</a></code> - low-level C API; entry class pointers are constant; more efficient for large entry structures; often wastes memory making many small heap allocations.</li>
 <li><code><a href="#nsTHashtable">nsTHashtable</a></code> - low-level C++ wrapper around <code>PLDHash</code>; generates callback functions and handles most casting automagically. Client writes their own entry class which can include complex key and data types.</li>
 <li><code><a href="#nsBaseHashtable_and_friends:_nsDataHashtable.2C_nsInterfaceHashtable.2C_and_nsClassHashtable">nsDataHashtable/nsInterfaceHashtable/nsClassHashtable</a></code> - high-level C++ wrappers around <code>PLDHash</code>; simplifies the common usage pattern mapping a simple keytype to a simple datatype; client does not need to declare or manage an entry class; <code><b>nsDataHashtable</b></code> datatype is a scalar such as <code>RUint32</code>; <code><b>nsInterfaceHashtable</b></code> datatype is an interface; <code><b>nsClassHashtable</b></code> datatype is a class pointer owned by the hashtable.</li>
</ul>
<h3 id="Which_Hashtable_Should_I_Use.3F" name="Which_Hashtable_Should_I_Use.3F">Which Hashtable Should I Use?</h3>
<table class="standard-table">
 <tbody>
  <tr>
   <th class="header" colspan="2" rowspan="2"> </th>
   <th class="header" colspan="5">Key Type:</th>
  </tr>
  <tr>
   <th class="header">integer</th>
   <th class="header">String/CString</th>
   <th class="header">nsID</th>
   <th class="header">nsISupports*</th>
   <th class="header">Complex</th>
  </tr>
  <tr>
   <td class="header" rowspan="8">Data Type:</td>
   <td class="header">None (Hash Set)</td>
   <td><code>nsInt32HashSet</code></td>
   <td><code>ns(C)StringHashSet</code></td>
   <td colspan="3"><code>nsTHashtable&lt;...&gt;</code></td>
  </tr>
  <tr>
   <td class="header" rowspan="2">Simple (PRUint32)</td>
   <td colspan="4"><code>nsDataHashtable</code></td>
   <td rowspan="6"><code>nsTHashtable&lt;...&gt;</code></td>
  </tr>
  <tr>
   <td><code>&lt;nsUint32HashKey,<br>
    PRUint32&gt;</code></td>
   <td><code>&lt;ns(C)StringHashKey,<br>
    PRUint32&gt;</code></td>
   <td><code>&lt;nsIDHashKey,<br>
    PRUint32&gt;</code></td>
   <td><code>&lt;nsISupportsHashKey,<br>
    PRUint32&gt;</code></td>
  </tr>
  <tr>
   <td class="header" rowspan="2">Interface (nsISupports)</td>
   <td colspan="4"><code>nsInterfaceHashtable</code></td>
  </tr>
  <tr>
   <td><code>&lt;nsUint32HashKey,<br>
    nsISupports&gt;</code></td>
   <td><code>&lt;ns(C)StringHashKey,<br>
    nsISupports&gt;</code></td>
   <td><code>&lt;nsIDHashKey,<br>
    nsISupports&gt;</code></td>
   <td><code>&lt;nsISupportsHashKey,<br>
    nsISupports&gt;</code></td>
  </tr>
  <tr>
   <td class="header" rowspan="2">Class (nsString*)</td>
   <td colspan="4"><code>nsClassHashtable</code></td>
  </tr>
  <tr>
   <td><code>&lt;nsUint32HashKey,<br>
    nsString&gt;</code></td>
   <td><code>&lt;ns(C)StringHashKey,<br>
    nsString&gt;</code></td>
   <td><code>&lt;nsIDHashKey,<br>
    nsString&gt;</code></td>
   <td><code>&lt;nsISupportsHashKey,<br>
    nsString&gt;</code></td>
  </tr>
  <tr>
   <td class="header">Complex<br>
    (structures, etc.)</td>
   <td colspan="5"><code>nsTHashtable&lt;...&gt;</code></td>
  </tr>
 </tbody>
</table>
<h3 id="PLDHash_.28JSDHash.29" name="PLDHash_.28JSDHash.29">PLDHash (JSDHash)</h3>
<p><code>PLDHash</code> and <code>JSDHash</code> are the same thing; one is linked from the XPCOM libraries, the other from the JS libraries. <code>JSDHash</code> is used extensively in the SpiderMonkey JS engine.</p>
<p>The <code>PLDHash</code> implementation is a fairly low-level implementation, written in C. It is extremely flexible, but requires some time to understand and use. A basic guide is included here, but you should read most of <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h" rel="custom">xpcom/glue/pldhash.h</a></code> if you intend to use <code>PLDHash</code>. The C++ wrappers for <code>PLDHash</code> (see below) are often much easier and safer to use in C++ code, as many potential casting errors are easily avoided.</p>
<p>You must declare an <b>entry struct</b> type, deriving from <a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#81" rel="custom">&lt;code&gt;PLDHashEntryHdr&lt;/code&gt;</a>. This entry struct should contain whatever data you wish to store in the hashtable (any pointer or fixed-length data type). <b>Note:</b> because of the double-hashing implementation, entries may move in memory when the hashtable is altered. If you need entry pointers to remain constant, you may want to consider using <code><a href="#PLHashTable">PLHashTable</a></code> instead.</p>
<p>You must also initialize a <a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#312" rel="custom">&lt;code&gt;PLDHashTableOps&lt;/code&gt;</a> structure. This serves similarly to a vtable in C++, with pointers to appropriate user-defined functions that initialize, compare, and match entries. Because <code>PLDHash</code> does not know what datatype your key is, all functions that work with keys are declared using <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#354" rel="custom">const void*</a></code>, and your client code must cast these pointers to the appropriate type.</p>
<p>PLDHashTables can be allocated on the stack or the heap:</p>
<ul>
 <li>When allocated on the stack, or as a C++ class member, the table must be initialized using <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#427" rel="custom">PL_DHashTableInit</a></code>, and finalized using <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#459" rel="custom">PL_DHashTableFinish</a></code>;</li>
 <li>When allocated on the heap, use <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#410" rel="custom">PL_NewDHashTable</a></code> and <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.h#420" rel="custom">PL_DHashTableDestroy</a></code> to allocate and delete the table.</li>
</ul>
<h3 id="PLHashTable" name="PLHashTable">PLHashTable</h3>
<p><code>PLHashTable</code> is a part of NSPR. The header file can be found at <code><code><a href="https://dxr.mozilla.org/mozilla-central/source/nsprpub/lib/ds/plhash.h" rel="custom">nsprpub/lib/ds/plhash.h</a></code></code>. In general, <code><a href="#PLDHash_.28JSDHash.29">PLDHash</a></code> is a better solution than <code>PLHashTable</code>, because <code>PLHashTable</code> makes many heap allocations.</p>
<p>There are two situations where <code>PLHashTable</code> may be preferable to <code>PLDHash</code>:</p>
<ul>
 <li>You need entry-pointers to remain constant.</li>
 <li>The entries stored in the table are very large (larger than 12 words). <code>PLDHash</code> does not handle large entry structures efficiently.</li>
</ul>
<h3 id="nsTHashtable" name="nsTHashtable">nsTHashtable</h3>
<p><code>nsTHashtable</code> is a C++ template that wraps <code>PLDHash</code>. It hides many of the complexities of <code>PLDHash</code> (callback functions, the ops structure, etc). You should read <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTHashtable.h" rel="custom">xpcom/glue/nsTHashtable.h</a></code>.</p>
<p>To use <code>nsTHashtable</code>, you must declare an entry-class in a <a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTHashtable.h#65" rel="custom">pre-defined format</a>. This entry class contains the key and the data that you are hashing (just like <code>PLDHash</code>, above). It also declares functions that manipulate the key. In most cases, the functions of this entry class can be entirely inline. For examples of entry classes, see the declarations at <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsHashKeys.h" rel="custom">xpcom/glue/nsHashKeys.h</a></code>.</p>
<p>The template parameter is the entry class. You must use the <code>Init()</code> function to initalize the table properly. At this point, use the functions <code>PutEntry/GetEntry/RemoveEntry</code> to alter the hashtable. <code>EnumerateEntries</code> will do enumeration, but beware that the enumeration will occur in a seemingly-random order (no sorting).</p>
<ul>
 <li><code>nsTHashtable</code>s can be allocated on the stack, as class members, or on the heap.</li>
 <li>Entry pointers can and do change when items are added to the hashtable, or removed. Do not keep long-lasting pointers to entries.</li>
 <li>because of this, <code>nsTHashtable</code> is not inherently thread-safe. If you use a hashtable in a multi-thread environment, you must provide locking as appropriate.</li>
</ul>
<p>Before using <code>nsTHashtable</code>, see if <code>nsBaseHashtable</code> and relatives will work for you. They are much easier to use, because you do not have to declare an entry class. If you are hashing a simple key type to a simple data type, they are generally a better choice.</p>
<h3 id="nsBaseHashtable_and_friends:_nsDataHashtable.2C_nsInterfaceHashtable.2C_and_nsClassHashtable" name="nsBaseHashtable_and_friends:_nsDataHashtable.2C_nsInterfaceHashtable.2C_and_nsClassHashtable">nsBaseHashtable and friends: nsDataHashtable, nsInterfaceHashtable, and nsClassHashtable</h3>
<p>These C++ templates provide a high-level interface for using hashtables that hides most of the complexities of <code>PLDHash</code>. They provide the following features:</p>
<ul>
 <li>hashtable operations can be completed without using an entry class, making code easier to read;</li>
 <li>optional thread-safety: the hashtable can manage a read-write lock around the table;</li>
 <li>predefined key classes provide automatic cleanup of strings/interfaces</li>
 <li><code>nsInterfaceHashtable</code> and <code>nsClassHashtable</code> automatically release/delete data pointers to avoid leaks.</li>
</ul>
<p><code>nsBaseHashtable</code> is not used directly; choose one of the three derivative classes based on the data type you want to store. The <code>KeyClass</code> is taken from <code>nsHashKeys.h</code> and is the same for all three classes:</p>
<ul>
 <li><code>nsDataHashtable&lt;KeyClass,
  <i>
   DataType</i>
  &gt;</code> - <code>DataType</code> is a simple type such as <code>PRUint32</code> or <code>PRBool</code>.</li>
 <li><code>nsInterfaceHashtable&lt;KeyClass,
  <i>
   Interface</i>
  &gt;</code> - <code>Interface</code> is an XPCOM interface such as <code>nsISupports</code> or <code>nsIDOMNode</code></li>
 <li><code>nsClassHashtable&lt;KeyClass,
  <i>
   T</i>
  &gt;</code> - <code>T</code> is any C++ class. The hashtable stores a pointer to the class, and deletes it when the entry is removed.</li>
</ul>
<p>The important files to read are <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsBaseHashtable.h" rel="custom">xpcom/glue/nsBaseHashtable.h</a></code> and <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsHashKeys.h" rel="custom">xpcom/glue/nsHashKeys.h</a></code>. These classes can be used on the stack, as a class member, or on the heap. Initialize using the <code>Init()</code> function; you can specify whether you need thread-safety at this time. Use the <code>Put()</code>, <code>Get()</code>, and <code>Remove()</code> methods to alter the table.</p>
<p>There are two enumeration functions:</p>
<ul>
 <li><code>EnumerateRead()</code> performs a read-only enumeration, where entries cannot be changed or removed;</li>
 <li><code>Enumerate()</code> performs a read-write enumeration, where entries may be changed or removed as necessary.</li>
</ul>
<h3 id="Using_nsTHashtable_as_a_hash-set" name="Using_nsTHashtable_as_a_hash-set">Using nsTHashtable as a hash-set</h3>
<p>A hash set only tracks the existence of keys: it does not associate data with the keys. This can be done using <code>nsTHashtable&lt;nsSomeHashKey&gt;</code>. The appropriate entries are GetEntry and PutEntry.</p>
<h2 id="Future_Plans" name="Future_Plans">Future Plans</h2>
<h3 id="nsISimpleEnumerator_support" name="nsISimpleEnumerator_support">nsISimpleEnumerator support</h3>
<p>The (obsolete) <code>nsHashtable</code> has a wrapper that exposes an <code>nsISimpleEnumerator</code> on its items. I will add this support to the various <code>nsBaseHashtable</code> classes as well, as needed.</p>
<h2 id="Hash_Functions" name="Hash_Functions">Hash Functions</h2>
<p>All of the above hashtables need a <a class="external" href="http://www.nist.gov/dads/HTML/hash.html">Hash Function</a>. This function converts the key into a semi-unique integer. The mozilla codebase already contains hash functions for most key types, including narrow and wide strings, pointers, and most binary data:</p>
<table class="standard-table">
 <tbody>
  <tr>
   <td><code>void*<br>
    (or nsISupports*)</code></td>
   <td>cast using <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nscore.h#443" rel="custom">NS_PTR_TO_INT32</a></code></td>
  </tr>
  <tr>
   <td><code>char*</code> string</td>
   <td rowspan="2"><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsCRT.h#228" rel="custom">nsCRT::HashCode()</a></code></td>
  </tr>
  <tr>
   <td><code>PRUnichar*</code> string</td>
  </tr>
  <tr>
   <td><code>nsAString</code></td>
   <td rowspan="2"><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTHashtable.cpp#41" rel="custom">HashString()</a></code></td>
  </tr>
  <tr>
   <td><code>nsACString</code></td>
  </tr>
  <tr>
   <td><code>nsID&amp;</code></td>
   <td><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsHashKeys.h#227" rel="custom">nsIDHashKey::HashKey()</a></code></td>
  </tr>
 </tbody>
</table>
<p>Writing a good hash function is well beyond the scope of this document, and has been discussed extensively in computer-science circles for many years. There are many different types of hash functions. Mozilla has tuned a good general-purpose hash algorithm for strings and <code>nsID</code>.</p>
<h2 id="Mozilla.27s_Old.2FObsolete.2FDeprecated.2FDecrepit_Hashtables" name="Mozilla.27s_Old.2FObsolete.2FDeprecated.2FDecrepit_Hashtables">Mozilla's Old/Obsolete/Deprecated/Decrepit Hashtables</h2>
<h3 id="nsHashtable" name="nsHashtable">nsHashtable</h3>
<p><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsHashtable.h" rel="custom">nsHashtable</a></code> was a C++ wrapper around <code>PLHashTable</code>, and now wraps <code>PLDHash</code>. The design of the key classes is not optimal, however, and <code>nsHashtable</code> has been deprecated in favor of <code>nsDataHashtable</code> and friends.</p>
<h3 id="nsObjectHashtable" name="nsObjectHashtable">nsObjectHashtable</h3>
<p><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsHashtable.h#163" rel="custom">nsObjectHashtable</a></code> is a form of <code>nsHashtable</code>. It has been replaced by <code>nsClassHashtable</code>.</p>
<h3 id="nsSupportsHashtable" name="nsSupportsHashtable">nsSupportsHashtable</h3>
<p><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsHashtable.h#193" rel="custom">nsSupportsHashtable</a></code> is a form of <code>nsHashtable</code>. It has been replaced by <code>nsInterfaceHashtable</code>.</p>
<h3 id="nsHashSets" name="nsHashSets">nsHashSets</h3>
<p><code>nsHashSets</code> has predefined hash sets for common keys, which are trivially easy to use. See <code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsHashSets.h" rel="custom">xpcom/ds/nsHashSets.h</a></code>. This functionality has been replaced by <code>nsTHashtable&lt;nsSomeHashKey&gt;</code>.</p>
<h3 id="nsDoubleHashtable" name="nsDoubleHashtable">nsDoubleHashtable</h3>
<p><code><a href="https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsDoubleHashtable.h" rel="custom">nsDoubleHashtable</a></code> is the (obsolete) precursor to <code>nsTHashtable</code>. It uses macros instead of C++ templates.</p>
<div class="originaldocinfo">
 <h2 id="Original_Document_Information" name="Original_Document_Information">Original Document Information</h2>
 <ul>
  <li>Author(s): Benjamin Smedberg &lt;<a class="link-mailto" href="mailto:benjamin@smedbergs.us" rel="freelink">benjamin@smedbergs.us</a>&gt;</li>
 </ul>
</div>
<p> </p>