The regexarray( function applies a regular expression to a text value, then builds an array containing all of the substrings that match the regular expression (see Regular Expressions).
Parameters
This function has two required parameters:text – The text you want to match.
pattern – The regular expression.
…separator – The separator character to be placed between array elements. This parameter is optional, if it is missing the array elements will be separated by carriage returns.
…subseparator – This parameter is optional. If it is included, the output array is two dimensional. Each row (separated by the SEPARATOR character) corresponds to one match. Each column corresponds to a capture group – the first column is the entire match, while colums 2, 3, 4, etc. contain data from the regular expression capture groups (if any). The captured items are separated by the SUBSEPARATOR character.
…options – This parameter is optional. If it is included, it specifies the regular expression options. The parameter should be a number obtained by adding up the options you want from the table in the description section below.
Description
This function scans text with a regular expression, looking for matches (see Regular Expressions). If any matches are found, they are placed into the output array generated by this function, one match per array element. If you don’t specify an array separator, carriage returns will be used. This example extracts all of the numbers from a string of text.
regexarray("234 plus 189 is 423","[0-9]+") ☞ 234
189
423
Here’s the same regular expression but with a separator specified, in this case comma space.
regexarray("234 plus 189 is 423","[0-9]+",", ") ☞ 234, 189, 423
Here is an example that matches e-mail addresses.
regexarray("Reply to support@acme.com or help@everyone.com",
"([\w!#$%&'*+/=?`{|}~^.-]+)@([A-Z0-9.-]+\.[A-Z]{2,3})",", ")
☞ support@acme.com, help@everyone.com
Notice that this regular expression has matched e-mail addresses in lower case, even though the regular expression specifies upper case letters. The regexarray( function is case insensitive so it ignores differences between upper and lower case. If you want the regular expression to be case sensitive use the regexarrayexact( function, or use the options parameter (see below).
If your regular expression includes capture groups you can include them in the output as well by adding a second separator character, in this case a pipe (|) symbol.
regexarray("Reply to support@acme.com or help@everyone.com",
"([\w!#$%&'*+/=?`{|}~^.-]+)@([A-Z0-9.-]+\.[A-Z]{2,3})",", ","|")
☞ support@acme.com|support|acme.com, help@everyone.com|help|everyone.com
This regular expression contains two capture groups – one for the user name, and one for the domain name. By adding the pipe delimiter this captured data is now included in the output array. As you can see the first user name is support, while the first domain name is acme.com. If you wanted an array of just the domain names you could use the arraycolumn( function to extract them.
arraycolumn(regexarray(...),3,", ","|")
The optional fifth parameter, options, allows you to customize the overall operation of the regular expression. To calculate the value for this parameter, add up the options you want from the table below. For example, if you want the regular expression to be case insensitive and want it to ignore whitespace and comments, this parameter should be 3 (1+2). Note: If you specify an option value this overrides the normal case sensitive/insensitive operation of both regexarray( and regexarrayexact(. If your option value specifies case insensitive, the regex will be case insensitive whether you are using the regexarray( or regexarrayexact( functions.
0 | No options |
1 | Case Insensitive - match letters in the pattern independent of case. |
2 | Ignore whitespace and #-prefixed comments in the pattern. |
4 | Treat the entire pattern as a literal string. |
8 | Allow . to match any character, including line separators. |
16 | Allow ^ and $ to match the start and end of lines. |
32 | Treat only \n as a line separator (otherwise, all standard line separators are used). |
64 | Use Unicode TR#29 to specify word boundaries (otherwise, traditional regular expression word boundaries are used). |
0x01000000 | Output positions of each match instead of match content, positions are output in text funnel format, i.e. start,end with positions numbered from 1. |
See Also
History
Version | Status | Notes |
10.0 | New | Regular expression support is new in this version |