---
title: Regular expressions
slug: Web/JavaScript/Guide/Regular_Expressions
tags:
  - Guide
  - JavaScript
  - RegExp
  - 正規表達式
translation_of: Web/JavaScript/Guide/Regular_Expressions
---
{{jsSidebar("JavaScript Guide")}} {{PreviousNext("Web/JavaScript/Guide/Text_formatting", "Web/JavaScript/Guide/Indexed_collections")}}
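Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the {{jsxref("RegExp.exec", "exec")}} and {{jsxref("RegExp.test", "test")}} methods of {{jsxref("RegExp")}}, and with the {{jsxref("String.match", "match")}}, {{jsxref("String.replace", "replace")}}, {{jsxref("String.search", "search")}}, and {{jsxref("String.split", "split")}} methods of {{jsxref("String")}}. This chapter describes JavaScript regular expressions.
You can construct a regular expression in one of two ways:
Using a regular expression literal, which consists of a pattern enclosed between two slashes (/), as follows: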
var re = /ab+c/;
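Regular expression literals are compiled when the script is loaded. When the regular expression remains constant, using a literal can give better performance.
Or calling the constructor function of the {{jsxref("RegExp")}} object, as follows: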
var re = new RegExp('ab+c');
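Using the constructor function compiles the regular expression at run time. Use the constructor function when the pattern will change, when you do not know the pattern in advance, or when you obtain it from another source.
A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which are used as a memory device: the match made with that part of the pattern is remembered for later use, as described in {{ web.link("#Using_Parenthesized_Substring_Matches", "Using Parenthesized Substring Matches") }}.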
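As a minimal sketch of the two forms side by side, the snippet below builds the same pattern once as a literal and once through the constructor. The variable `repeatedLetter` is a hypothetical value standing in for data that is only known at run time.

```javascript
// Literal form: the pattern is fixed in the source code.
var literalRe = /ab+c/;

// Constructor form: the pattern is assembled from a string at run time.
// `repeatedLetter` is a hypothetical value standing in for data obtained elsewhere.
var repeatedLetter = 'b';
var constructedRe = new RegExp('a' + repeatedLetter + '+c');

console.log(literalRe.source === constructedRe.source); // true
console.log(constructedRe.test('xabbbcx')); // true
```

Both objects hold the same pattern text; only the time at which the pattern is compiled differs.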
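Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when exactly the characters 'abc' occur together and in that order. Such a match succeeds in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." In both cases the match is with the substring 'abc'. There is no match in the string "Grab crab" because, although it contains 'ab c', it does not contain the exact substring 'abc'.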
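The three sample strings from the paragraph above can be checked directly. This is only an illustrative sketch; the array and variable names are chosen for the example.

```javascript
var simpleRe = /abc/;

// The three sample sentences discussed in the text.
var simpleSamples = [
  "Hi, do you know your abc's?",
  'The latest airplane designs evolved from slabcraft.',
  'Grab crab'
];

// test() returns true when the pattern is found anywhere in the string.
var simpleResults = simpleSamples.map(function (sample) {
  return simpleRe.test(sample);
});

console.log(simpleResults); // [true, true, false]
```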
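When the search for a match requires something more than a direct match, such as finding one or more b's or finding white space, the pattern includes special characters. For example, the pattern /ab*c/ matches any character combination in which a single 'a' is followed by zero or more 'b's (* means 0 or more occurrences of the preceding item) and then immediately followed by 'c'. In the string "cbbabbbbcdebc", the pattern matches the substring 'abbbbc'.
The following table provides a complete list and description of the special characters that can be used in regular expressions.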
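A short sketch of the /ab*c/ example, using exec to see both the matched text and where it starts:

```javascript
var starRe = /ab*c/;
var starMatch = starRe.exec('cbbabbbbcdebc');

console.log(starMatch[0]);    // "abbbbc"
console.log(starMatch.index); // 3

// Zero occurrences of 'b' are also accepted by *.
console.log(starRe.test('ac')); // true
```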
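Character | Meaning |
---|---|
\ | A backslash that precedes a non-special character indicates that the next character is special and is not to be interpreted literally. A backslash that precedes a special character indicates that the next character is not special and should be interpreted literally. |
^ | Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. |
$ | Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. |
* | Matches the preceding item 0 or more times. |
+ | Matches the preceding item 1 or more times. Equivalent to {1,}. |
? | Matches the preceding item 0 or 1 times. Equivalent to {0,1}. If used immediately after any of the quantifiers *, +, ?, or {}, it makes the quantifier non-greedy (matching the fewest possible characters). |
. | (The decimal point) matches any single character except the newline character. For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'. |
(x) | Capturing parentheses. Matches 'x' and remembers the match, as the following example shows. In the pattern /(foo) (bar) \1 \2/, (foo) and (bar) match and remember the first two words of the string "foo bar foo bar", and \1 and \2 match the last two words. Note that \1, \2, ..., \n refer to the text that the corresponding group actually matched, not to the group's pattern. In the replacement part of a replace() call, use $1, $2, ..., $n instead, as in 'bar boo'.replace(/(...) (...)/, '$2 $1'). |
(?:x) | Non-capturing parentheses. Matches 'x' but does not remember the match. For more information, see Using parentheses. |
x(?=y) | Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead; 'y' is a condition for the match and is not part of the match result. |
x(?!y) | Matches 'x' only if 'x' is not followed by 'y'. This is called a negated lookahead. |
x\|y | Matches either 'x' or 'y'. |
{n} | Matches exactly n occurrences of the preceding item, where n is a positive integer. |
{n,m} | Matches at least n and at most m occurrences of the preceding item, where n and m are non-negative integers and n <= m. A range whose upper bound is smaller than its lower bound, such as {1,0}, is an invalid regular expression. |
[xyz] | Character set. Matches any one of the characters in the brackets, including escape sequences. Special characters such as the dot (.) and asterisk (*) are not special inside a character set, so they do not need to be escaped. You can specify a range of characters by using a hyphen: [a-d] is the same as [abcd], and matches the 'b' in "brisket" and the 'c' in "city". Both /[a-z.]+/ and /[\w.]+/ match the entire string "test.i.ng". |
[^xyz] | A negated or complemented character set. Matches any character that is not enclosed in the brackets. |
[\b] | Matches a backspace (U+0008). (Not to be confused with \b.) |
\b | Matches a word boundary. A word boundary matches the position where a word character is not followed or preceded by another word character. Note that a matched word boundary is not included in the match; in other words, the length of a matched word boundary is zero. (Not to be confused with [\b].) Note: JavaScript's regular expression engine defines a specific set of characters to be "word" characters. Any character not in that set is considered a word break. This set of characters is fairly limited: it consists solely of the Roman alphabet in both upper- and lower-case, decimal digits, and the underscore character. Accented characters, such as "é" or "ü", are treated as word breaks. |
\B | Matches a non-word boundary. This matches a position where the previous and next character are of the same type: either both must be words, or both must be non-words. The beginning and end of a string are considered non-words. |
\cX | Where X is a character ranging from A to Z. Matches a control character in a string. |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\f | Matches a form feed (U+000C). |
\n | Matches a line feed (U+000A). |
\r | Matches a carriage return (U+000D). |
\s | Matches a single white space character, including space, tab, form feed, and line feed, as well as other Unicode space characters. |
\S | Matches a single character other than white space. |
\t | Matches a tab (U+0009). |
\v | Matches a vertical tab (U+000B). |
\w | Matches any alphanumeric character, including the underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any non-word character. Equivalent to [^A-Za-z0-9_]. |
\n | Where n is a positive integer, a back reference to the last substring matching the nth parenthetical in the regular expression (counting left parentheses). |
\0 | Matches a NULL (U+0000) character. Do not follow this with another digit, because \0<digits> is an octal escape sequence. Instead use \x00. |
\xhh | Matches the character with the code hh (two hexadecimal digits). |
\uhhhh | Matches the character with the code hhhh (four hexadecimal digits). |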
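To make a few of the table entries concrete, the sketch below runs several of the special characters against short sample strings. The sample strings and the property names of `specialCharResults` are chosen only for illustration.

```javascript
// Each property pairs one special construct from the table with a sample string.
var specialCharResults = {
  // x(?=y): 'Jack' only when followed by 'Sprat'; 'Sprat' is not part of the match.
  lookahead: /Jack(?=Sprat)/.exec('JackSprat')[0],
  lookaheadMiss: /Jack(?=Sprat)/.test('JackFrost'),
  // x(?!y): digits that are not followed by a decimal point.
  negatedLookahead: /\d+(?!\.)/.exec('3.141')[0],
  // x|y: either alternative.
  alternation: /green|red/.exec('red apple')[0],
  // {n,m}: at least 1 and at most 3 occurrences.
  boundedRepeat: /a{1,3}/.exec('caaaaaaandy')[0],
  // [^xyz]: the first character outside the set.
  negatedSet: /[^a-c]/.exec('brisket')[0],
  // \b: 'oon' ends at a word boundary, 'oo' does not.
  wordBoundary: /oon\b/.test('moon'),
  notWordBoundary: /oo\b/.test('moon')
};

console.log(specialCharResults);
```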
User input that should be treated as a literal string inside a regular expression, but contains characters that would otherwise be interpreted as special, can be escaped with a simple replacement:
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
The g after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches. It is explained in detail below in Advanced Searching With Flags.
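The following sketch shows why the escaping step matters when a pattern is built from a string. It repeats the escaping function so that the snippet is self-contained; `userInput` is a hypothetical value.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical input that happens to contain the special character '+'.
var userInput = '1+1=2';

// Without escaping, '+' acts as a quantifier: one or more '1', then '1=2'.
var unescapedRe = new RegExp(userInput);
// With escaping, every character is matched literally.
var escapedRe = new RegExp(escapeRegExp(userInput));

console.log(unescapedRe.test('1+1=2')); // false
console.log(unescapedRe.test('11=2'));  // true
console.log(escapedRe.test('1+1=2'));   // true
```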
Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use, as described in {{ web.link("#Using_parenthesized_substring_matches", "Using Parenthesized Substring Matches") }}.
For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means 1 or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character 0 or more times (\d means numeric character, * means 0 or more times). In addition, parentheses are used to remember the first matched numeric characters.
This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapter 3 and 4", because that string does not have a period after the '3'.
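The two strings mentioned above can be tried with exec, which exposes the remembered chapter number at index 1 of the returned array. The variable names are illustrative.

```javascript
var chapterRe = /Chapter (\d+)\.\d*/;

var chapterFound = chapterRe.exec('Open Chapter 4.3, paragraph 6');
console.log(chapterFound[0]); // "Chapter 4.3"
console.log(chapterFound[1]); // "4"

// No period follows the '3', so the pattern is not found.
var chapterMissing = chapterRe.exec('Chapter 3 and 4');
console.log(chapterMissing); // null
```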
To match a substring without causing the matched part to be remembered, within the parentheses preface the pattern with ?:. For example, (?:\d+) matches one or more numeric characters but does not remember the matched characters.
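As a small comparison, the sketch below applies a capturing and a non-capturing version of the same group to one sample string ('10-20', chosen for illustration) and shows how the returned arrays differ.

```javascript
// Both groups are remembered.
var withCapture = /(\d+)-(\d+)/.exec('10-20');
// The first group is non-capturing, so only the second one is remembered.
var withoutCapture = /(?:\d+)-(\d+)/.exec('10-20');

console.log(withCapture.length);    // 3 (full match plus two remembered substrings)
console.log(withoutCapture.length); // 2 (full match plus one remembered substring)
console.log(withoutCapture[1]);     // "20"
```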
Regular expressions are used with the RegExp methods test and exec and with the String methods match, replace, search, and split. These methods are explained in detail in the JavaScript reference.
Method | Description |
---|---|
{{jsxref("RegExp.exec", "exec")}} | A RegExp method that executes a search for a match in a string. It returns an array of information or null on a mismatch. |
{{jsxref("RegExp.test", "test")}} | A RegExp method that tests for a match in a string. It returns true or false. |
{{jsxref("String.match", "match")}} | A String method that executes a search for a match in a string. It returns an array of information or null on a mismatch. |
{{jsxref("String.search", "search")}} | A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails. |
{{jsxref("String.replace", "replace")}} | A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring. |
{{jsxref("String.split", "split")}} | A String method that uses a regular expression or a fixed string to break a string into an array of substrings. |
When you want to know whether a pattern is found in a string, use the test or search method; for more information (but slower execution) use the exec or match methods. If you use exec or match and if the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp. If the match fails, the exec method returns null (which coerces to false).
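The difference between the yes/no methods and the methods that return match details can be sketched as follows. The sample string and pattern are reused from this chapter, without the g flag so that match and exec behave alike; the variable names are illustrative.

```javascript
var sampleText = 'cdbbdbsbz';
var samplePattern = /d(b+)d/;

// Yes/no style answers.
var wasFound = samplePattern.test(sampleText);   // true
var foundAt = sampleText.search(samplePattern);  // 1

// Detailed answers: an array with the match and remembered substrings.
var execInfo = samplePattern.exec(sampleText);
var matchInfo = sampleText.match(samplePattern);

// A failed match: exec returns null and search returns -1.
var failedExec = /xyz/.exec(sampleText);
var failedSearch = sampleText.search(/xyz/);

console.log(wasFound, foundAt, execInfo[0], execInfo[1], matchInfo[0]);
console.log(failedExec, failedSearch);
```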
In the following example, the script uses the exec method to find a match in a string.
var myRe = /d(b+)d/g;
var myArray = myRe.exec('cdbbdbsbz');
If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:
var myArray = /d(b+)d/g.exec('cdbbdbsbz');
// similar to "cdbbdbsbz".match(/d(b+)d/g); however,
// the latter outputs Array [ "dbbd" ], while
// /d(b+)d/g.exec('cdbbdbsbz') outputs Array [ "dbbd", "bb" ].
// See below for further info (CTRL+F "The behavior associated with the".)
If you want to construct the regular expression from a string, yet another alternative is this script:
var myRe = new RegExp('d(b+)d', 'g');
var myArray = myRe.exec('cdbbdbsbz');
With these scripts, the match succeeds and returns the array and updates the properties shown in the following table.
Object | Property or index | Description | In this example |
---|---|---|---|
myArray | | The matched string and all remembered substrings. | ['dbbd', 'bb', index: 1, input: 'cdbbdbsbz'] |
| index | The 0-based index of the match in the input string. | 1 |
| input | The original string. | "cdbbdbsbz" |
| [0] | The last matched characters. | "dbbd" |
myRe | lastIndex | The index at which to start the next match. (This property is set only if the regular expression uses the g option, described in {{ web.link("#Advanced_searching_with_flags", "Advanced Searching With Flags") }}.) | 5 |
| source | The text of the pattern. Updated at the time that the regular expression is created, not executed. | "d(b+)d" |
As shown in the second form of this example, you can use a regular expression created with an object initializer without assigning it to a variable. If you do, however, every occurrence is a new regular expression. For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:
var myRe = /d(b+)d/g;
var myArray = myRe.exec('cdbbdbsbz');
console.log('The value of lastIndex is ' + myRe.lastIndex);

// "The value of lastIndex is 5"
However, if you have this script:
var myArray = /d(b+)d/g.exec('cdbbdbsbz');
console.log('The value of lastIndex is ' + /d(b+)d/g.lastIndex);

// "The value of lastIndex is 0"
The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property. If you need to access the properties of a regular expression created with an object initializer, you should first assign it to a variable.
Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered. For example, /a(b)c/ matches the characters 'abc' and remembers 'b'. To recall these parenthesized substring matches, use the Array elements [1], ..., [n].
The number of possible parenthesized substrings is unlimited. The returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.
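As a first, minimal sketch, the remembered substrings can be read straight from the array that exec returns. The pattern /a(b)c/ comes from the text above; the second pattern and its sample string are chosen for illustration.

```javascript
// One group: element [1] holds the remembered 'b'.
var singleGroup = /a(b)c/.exec('abc');
console.log(singleGroup[0]); // "abc"
console.log(singleGroup[1]); // "b"

// Two groups: elements [1] and [2] follow the order of the left parentheses.
var twoGroups = /(\w+)\s(\w+)/.exec('John Smith');
console.log(twoGroups[1]); // "John"
console.log(twoGroups[2]); // "Smith"
```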
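The following script uses the {{jsxref("String.replace", "replace()")}} method to switch the words in the string. For the replacement text, the script uses $1 and $2 to stand for the first and second parenthesized substring matches found by the pattern in re.
var re = /(\w+)\s(\w+)/;
var str = 'John Smith';
var newstr = str.replace(re, '$2, $1');
console.log(newstr);

// "Smith, John"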
Regular expressions have five optional flags that enable features such as global and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.
Flag | Description |
---|---|
g | Global search. |
i | Case-insensitive search. |
m | Multi-line search. |
u | Unicode; treat a pattern as a sequence of Unicode code points. |
y | Perform a "sticky" search that matches starting at the current position in the target string. See {{jsxref("RegExp.sticky", "sticky")}}. |
To include a flag with the regular expression, use this syntax:
var re = /pattern/flags;
or
var re = new RegExp('pattern', 'flags');
Note that the flags are an integral part of a regular expression. They cannot be added or removed later.
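Because flags cannot be changed on an existing object, one way to get a differently flagged version is to build a new regular expression from the old one's pattern text. The sketch below assumes an environment that supports the flags property of regular expression objects (ES2015 and later); the variable names are illustrative.

```javascript
var globalOnly = /fee/g;

// Build a second object from the same pattern text, adding the i flag.
var globalIgnoreCase = new RegExp(globalOnly.source, globalOnly.flags + 'i');

console.log(globalOnly.flags);       // "g"
console.log(globalIgnoreCase.flags); // "gi"

var upperCaseInput = 'FEE FI';
var originalMatches = globalOnly.test(upperCaseInput);         // false
var rebuiltMatches = globalIgnoreCase.test(upperCaseInput);    // true
console.log(originalMatches, rebuiltMatches);
```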
For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a white space character, and it looks for this combination throughout the string.
var re = /\w+\s/g;
var str = 'fee fi fo fum';
var myArray = str.match(re);
console.log(myArray);

// ["fee ", "fi ", "fo "]
You could replace the line:
var re = /\w+\s/g;
with:
var re = new RegExp('\\w+\\s', 'g');
and get the same result.
The behavior associated with the 'g' flag is different when the .exec() method is used. (The roles of "class" and "argument" get reversed: In the case of .match(), the string class (or data type) owns the method and the regular expression is just an argument, while in the case of .exec(), it is the regular expression that owns the method, with the string being the argument. Contrast str.match(re) versus re.exec(str).) The 'g' flag is used with the .exec() method to get iterative progression.
var xArray;
while (xArray = re.exec(str)) console.log(xArray);
// produces:
// ["fee ", index: 0, input: "fee fi fo fum"]
// ["fi ", index: 4, input: "fee fi fo fum"]
// ["fo ", index: 7, input: "fee fi fo fum"]
The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
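A small sketch of the effect of the m flag, using a hypothetical two-line string:

```javascript
// Hypothetical two-line input.
var twoLines = 'fee fi\nfo fum';

// Without m, ^ only matches at the start of the whole string.
var withoutM = /^fo/.test(twoLines);
// With m, ^ also matches right after the line break.
var withM = /^fo/m.test(twoLines);

// Combined with g, ^ and $ pick out the first and last word of every line.
var firstWords = twoLines.match(/^\w+/gm);
var lastWords = twoLines.match(/\w+$/gm);

console.log(withoutM, withM); // false true
console.log(firstWords);      // ["fee", "fo"]
console.log(lastWords);       // ["fi", "fum"]
```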
The following examples show some uses of regular expressions.
The following example illustrates the formation of regular expressions and the use of string.split() and string.replace(). It cleans a roughly formatted input string containing names (first name last) separated by blanks, tabs and exactly one semicolon. Finally, it reverses the name order (last name first) and sorts the list.
// The name string contains multiple spaces and tabs,
// and may have multiple spaces between first and last names.
var names = 'Orange Trump ;Fred Barney; Helen Rigby ; Bill Abel ; Chris Hand ';

var output = ['---------- Original String\n', names + '\n'];

// Prepare two regular expression patterns and array storage.
// Split the string into array elements.

// pattern: possible white space then semicolon then possible white space
var pattern = /\s*;\s*/;

// Break the string into pieces separated by the pattern above and
// store the pieces in an array called nameList
var nameList = names.split(pattern);

// new pattern: one or more characters then spaces then characters.
// Use parentheses to "memorize" portions of the pattern.
// The memorized portions are referred to later.
pattern = /(\w+)\s+(\w+)/;

// Below is the new array for holding names being processed.
var bySurnameList = [];

// Display the name array and populate the new array
// with comma-separated names, last first.
//
// The replace method removes anything matching the pattern
// and replaces it with the memorized string: the second memorized portion
// followed by a comma, a space and the first memorized portion.
//
// The variables $1 and $2 refer to the portions
// memorized while matching the pattern.

output.push('---------- After Split by Regular Expression');

var i, len;
for (i = 0, len = nameList.length; i < len; i++) {
  output.push(nameList[i]);
  bySurnameList[i] = nameList[i].replace(pattern, '$2, $1');
}

// Display the new array.
output.push('---------- Names Reversed');
for (i = 0, len = bySurnameList.length; i < len; i++) {
  output.push(bySurnameList[i]);
}

// Sort by last name, then display the sorted array.
bySurnameList.sort();
output.push('---------- Sorted');
for (i = 0, len = bySurnameList.length; i < len; i++) {
  output.push(bySurnameList[i]);
}

output.push('---------- End');

console.log(output.join('\n'));
In the following example, the user is expected to enter a phone number. When the user presses the "Check" button, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number. If the number is invalid, the script informs the user that the phone number is not valid.
Within non-capturing parentheses (?:, the regular expression looks for three numeric characters \d{3} OR | a left parenthesis \( followed by three digits \d{3}, followed by a close parenthesis \), (end non-capturing parenthesis )), followed by one dash, forward slash, or decimal point and when found, remember the character ([-\/\.]), followed by three digits \d{3}, followed by the remembered match of a dash, forward slash, or decimal point \1, followed by four digits \d{4}.
The Change event activated when the user presses Enter sets the value of RegExp.input.
<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <meta http-equiv="Content-Script-Type" content="text/javascript">
    <script type="text/javascript">
      var re = /(?:\d{3}|\(\d{3}\))([-\/\.])\d{3}\1\d{4}/;
      function testInfo(phoneInput) {
        var OK = re.exec(phoneInput.value);
        if (!OK)
          window.alert(phoneInput.value + ' isn\'t a phone number with area code!');
        else
          window.alert('Thanks, your phone number is ' + OK[0]);
      }
    </script>
  </head>
  <body>
    <p>Enter your phone number (with area code) and then click "Check".
      <br>The expected format is like ###-###-####.</p>
    <form action="#">
      <input id="phone"><button onclick="testInfo(document.getElementById('phone'));">Check</button>
    </form>
  </body>
</html>
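The same pattern can also be exercised outside a web page. The sketch below wraps it in a hypothetical helper, isPhoneNumber, and tries a few made-up sample numbers; note that the back reference \1 requires both separators to be the same character.

```javascript
var phoneRe = /(?:\d{3}|\(\d{3}\))([-\/\.])\d{3}\1\d{4}/;

// Hypothetical helper: true when the text contains a number in the expected format.
function isPhoneNumber(text) {
  return phoneRe.test(text);
}

console.log(isPhoneNumber('555-123-4567'));   // true
console.log(isPhoneNumber('(555)-123-4567')); // true
console.log(isPhoneNumber('555.123.4567'));   // true
console.log(isPhoneNumber('555-123.4567'));   // false (separators differ)
console.log(isPhoneNumber('5551234567'));     // false (no separators)
```

Because the pattern is not anchored with ^ and $, it reports a match whenever a number in this format appears anywhere in the input.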
{{PreviousNext("Web/JavaScript/Guide/Text_formatting", "Web/JavaScript/Guide/Indexed_collections")}}