files/zh-cn/web/javascript/guide/regular_expressions/character_classes/index.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216

---
title: 字符类
slug: Web/JavaScript/Guide/Regular_Expressions/Character_Classes
tags:
  - 字符类
translation_of: Web/JavaScript/Guide/Regular_Expressions/Character_Classes
---
<p>{{JSSidebar("JavaScript Guide")}}</p>

<p>字符类可以区分各种字符，例如区分字母和数字。</p>

<div>{{EmbedInteractiveExample("pages/js/regexp-character-classes.html")}}</div>

<h2 id="类型">类型</h2>

<div class="hidden">The following table is also duplicated on <a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet">this cheatsheet</a>. Do not forget to edit it as well, thanks!</div>

<table class="standard-table">
 <thead>
  <tr>
   <th scope="col">Characters</th>
   <th scope="col">Meaning</th>
  </tr>
 </thead>
 <tbody>
 </tbody>
 <tbody>
  <tr>
   <td><code>.</code></td>
   <td>
    <p>有下列含义之一:</p>

    <ul>
     <li>匹配除行终止符之外的任何单个字符: <code>\n</code>, <code>\r</code>, <code>\u2028</code> or <code>\u2029</code>. 例如, <code>/.y/</code> 在“yes make my day”中匹配“my”和“ay”，而不是“yes”。</li>
     <li>在字符集内，点失去了它的特殊意义，并与文字点匹配。</li>
    </ul>

    <p>需要注意的是，<code>m</code> multiline标志不会改变点的行为。因此，要跨多行匹配一个模式，可以使用字符集<code>[^]</code>—它将匹配任何字符，包括新行。</p>

    <p>ES2018 添加了 <code>s</code> "dotAll" 标志,它允许点也匹配行终止符。</p>
   </td>
  </tr>
  <tr>
   <td><code>\d</code></td>
   <td>
    <p>匹配任何数字(阿拉伯数字)。 相当于 <code>[0-9]</code>. 例如, <code>/\d/</code> 或 <code>/[0-9]/</code> 匹配 “B2is the suite number”中的“2”。</p>
   </td>
  </tr>
  <tr>
   <td><code>\D</code></td>
   <td>
    <p>匹配任何非数字(阿拉伯数字)的字符。相当于<code>[^0-9]</code>. 例如, <code>/\D/</code> or <code>/[^0-9]/</code> 在 "B2 is the suite number" 中 匹配 "B".</p>
   </td>
  </tr>
  <tr>
   <td><code>\w</code></td>
   <td>
    <p>匹配基本拉丁字母中的任何字母数字字符，包括下划线。相当于 <code>[A-Za-z0-9_]</code>. 例如, <code>/\w/</code> 在 "apple" 匹配 "a" , "5" in "$5.28", "3" in "3D" and "m" in "Émanuel".</p>
   </td>
  </tr>
  <tr>
   <td><code>\W</code></td>
   <td>
    <p>匹配任何不是来自基本拉丁字母的单词字符。相当于 <code>[^A-Za-z0-9_]</code>. 例如, <code>/\W/</code> or <code>/[^A-Za-z0-9_]/</code> 匹配 "%" 在 "50%" 以及 "É" 在 "Émanuel" 中.</p>
   </td>
  </tr>
  <tr>
   <td><code>\s</code></td>
   <td>
    <p>Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to <code>[ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code>. For example, <code>/\s\w*/</code> matches " bar" in "foo bar".</p>
   </td>
  </tr>
  <tr>
   <td><code>\S</code></td>
   <td>
    <p>Matches a single character other than white space. Equivalent to <code>[^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code>. For example, <code>/\S\w*/</code> matches "foo" in "foo bar".</p>
   </td>
  </tr>
  <tr>
   <td><code>\t</code></td>
   <td>Matches a horizontal tab.</td>
  </tr>
  <tr>
   <td><code>\r</code></td>
   <td>Matches a carriage return.</td>
  </tr>
  <tr>
   <td><code>\n</code></td>
   <td>Matches a linefeed.</td>
  </tr>
  <tr>
   <td><code>\v</code></td>
   <td>Matches a vertical tab.</td>
  </tr>
  <tr>
   <td><code>\f</code></td>
   <td>Matches a form-feed.</td>
  </tr>
  <tr>
   <td><code>[\b]</code></td>
   <td>Matches a backspace. If you're looking for the word-boundary character (<code>\b</code>), see <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Boundaries">Boundaries</a>.</td>
  </tr>
  <tr>
   <td><code>\0</code></td>
   <td>Matches a NUL character. Do not follow this with another digit.</td>
  </tr>
  <tr>
   <td><code>\c<em>X</em></code></td>
   <td>
    <p>Matches a control character using <a href="https://en.wikipedia.org/wiki/Caret_notation">caret notation</a>, where "X" is a letter from A–Z (corresponding to codepoints <code>U+0001</code><em>–</em><code>U+001F</code>). For example, <code>/\cM/</code> matches "\r" in "\r\n".</p>
   </td>
  </tr>
  <tr>
   <td><code>\x<em>hh</em></code></td>
   <td>Matches the character with the code <code><em>hh</em></code> (two hexadecimal digits).</td>
  </tr>
  <tr>
   <td><code>\u<em>hhhh</em></code></td>
   <td>Matches a UTF-16 code-unit with the value <code><em>hhhh</em></code> (four hexadecimal digits).</td>
  </tr>
  <tr>
   <td><code>\u<em>{hhhh} </em>or <em>\u{hhhhh}</em></code></td>
   <td>(Only when the <code>u</code> flag is set.) Matches the character with the Unicode value <code>U+<em>hhhh</em></code> or <code>U+<em>hhhhh</em></code> (hexadecimal digits).</td>
  </tr>
  <tr>
   <td><code>\</code></td>
   <td>
    <p>Indicates that the following character should be treated specially, or "escaped". It behaves one of two ways.</p>

    <ul>
     <li>For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, <code>/b/</code> matches the character "b". By placing a backslash in front of "b", that is by using <code>/\b/</code>, the character becomes special to mean match a word boundary.</li>
     <li>For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, "*" is a special character that means 0 or more occurrences of the preceding character should be matched; for example, <code>/a*/</code> means match 0 or more "a"s. To match <code>*</code> literally, precede it with a backslash; for example, <code>/a\*/</code> matches "a*".</li>
    </ul>

    <div class="blockIndicator note">
    <p>To match this character literally, escape it with itself. In other words to search for <code>\</code> use <code>/\\/</code>.</p>
    </div>
   </td>
  </tr>
 </tbody>
</table>

<h2 id="Examples">Examples</h2>

<h3 id="Looking_for_a_series_of_digits">Looking for a series of digits</h3>

<pre class="brush: js">var randomData = "015 354 8787 687351 3512 8735";
var regexpFourDigits = /\b\d{4}\b/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// \d{4} indicates a digit, four times
// \b indicates another boundary (i.e. do not end matching in the middle of a word)


console.table(randomData.match(regexpFourDigits));
// ['8787', '3512', '8735']
</pre>

<h3 id="Looking_for_a_word_from_the_latin_alphabet_starting_with_A">Looking for a word (from the latin alphabet) starting with A</h3>

<pre class="brush: js">var aliceExcerpt = "I’m sure I’m not Ada,’ she said, ‘for her hair goes in such long ringlets, and mine doesn’t go in ringlets at all.";
var regexpWordStartingWithA = /\b[aA]\w+/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// [aA] indicates the letter a or A
// \w+ indicates any character *from the latin alphabet*, multiple times

console.table(aliceExcerpt.match(regexpWordStartingWithA));
// ['Ada', 'and', 'at', 'all']
</pre>

<h3 id="Looking_for_a_word_from_Unicode_characters">Looking for a word (from Unicode characters)</h3>

<p>Instead of the Latin alphabet, we can use a range of Unicode characters to identify a word (thus being able to deal with text in other languages like Russian or Arabic). The "Basic Multilingual Plane" of Unicode contains most of the characters used around the world and we can use character classes and ranges to match words written with those characters.</p>

<pre class="brush: js">var nonEnglishText = "Приключения Алисы в Стране чудес";
var regexpBMPWord = /([\u0000-\u0019\u0021-\uFFFF])+/gu;
// BMP goes through U+0000 to U+FFFF but space is U+0020

console.table(nonEnglishText.match(regexpBMPWord));
[ 'Приключения', 'Алисы', 'в', 'Стране', 'чудес' ]
</pre>

<div class="hidden">
<p>Note for MDN editors: please do not try to add funny examples with emoji as those characters are not handled by the platform (Kuma).</p>
</div>

<h2 id="Specifications">Specifications</h2>

<table class="standard-table">
 <tbody>
  <tr>
   <th scope="col">Specification</th>
  </tr>
  <tr>
   <td>{{SpecName('ESDraft', '#sec-characterclass', 'RegExp: Character classes')}}</td>
  </tr>
 </tbody>
</table>

<h2 id="Browser_compatibility">Browser compatibility</h2>

<p>For browser compatibility information, check out the <a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Browser_compatibility">main Regular Expressions compatibility table</a>.</p>

<h2 id="See_also">See also</h2>

<ul>
 <li><a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions">Regular expressions guide</a>

  <ul>
   <li><a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Assertions">Assertions</a></li>
   <li><a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Quantifiers">Quantifiers</a></li>
   <li><a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes">Unicode property escapes</a></li>
   <li><a href="/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges">Groups and ranges</a></li>
  </ul>
 </li>
 <li><a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp">The <code>RegExp()</code> constructor</a></li>
</ul>