regexarray( function

regexarray(

TEXT

PATTERN

...SEPARATOR

...SUBSEPARATOR

...OPTIONS

)

The regexarray( function applies a regular expression to a text value, then builds an array containing all of the substrings that match the regular expression (see Regular Expressions).

Parameters

This function has two required parameters:

text – The text you want to match.

pattern – The regular expression.

…separator – The separator character to be placed between array elements. This parameter is optional, if it is missing the array elements will be separated by carriage returns.

…subseparator – This parameter is optional. If it is included, the output array is two dimensional. Each row (separated by the SEPARATOR character) corresponds to one match. Each column corresponds to a capture group – the first column is the entire match, while colums 2, 3, 4, etc. contain data from the regular expression capture groups (if any). The captured items are separated by the SUBSEPARATOR character.

…options – This parameter is optional. If it is included, it specifies the regular expression options. The parameter should be a number obtained by adding up the options you want from the table in the description section below.

Description

This function scans text with a regular expression, looking for matches (see Regular Expressions). If any matches are found, they are placed into the output array generated by this function, one match per array element. If you don’t specify an array separator, carriage returns will be used. This example extracts all of the numbers from a string of text.

regexarray("234 plus 189 is 423","[0-9]+") ☞ 234
189
423

Here’s the same regular expression but with a separator specified, in this case comma space.

regexarray("234 plus 189 is 423","[0-9]+",", ") ☞ 234, 189, 423

Here is an example that matches e-mail addresses.

regexarray("Reply to support@acme.com or help@everyone.com",
"([\w!#$%&'*+/=?`{|}~^.-]+)@([A-Z0-9.-]+\.[A-Z]{2,3})",", ")
    ☞ support@acme.com, help@everyone.com

Notice that this regular expression has matched e-mail addresses in lower case, even though the regular expression specifies upper case letters. The regexarray( function is case insensitive so it ignores differences between upper and lower case. If you want the regular expression to be case sensitive use the regexarrayexact( function, or use the options parameter (see below).

If your regular expression includes capture groups you can include them in the output as well by adding a second separator character, in this case a pipe (|) symbol.

regexarray("Reply to support@acme.com or help@everyone.com",
"([\w!#$%&'*+/=?`{|}~^.-]+)@([A-Z0-9.-]+\.[A-Z]{2,3})",", ","|")
    ☞ support@acme.com|support|acme.com, help@everyone.com|help|everyone.com

This regular expression contains two capture groups – one for the user name, and one for the domain name. By adding the pipe delimiter this captured data is now included in the output array. As you can see the first user name is support, while the first domain name is acme.com. If you wanted an array of just the domain names you could use the arraycolumn( function to extract them.

arraycolumn(regexarray(...),3,", ","|")

The optional fifth parameter, options, allows you to customize the overall operation of the regular expression. To calculate the value for this parameter, add up the options you want from the table below. For example, if you want the regular expression to be case insensitive and want it to ignore whitespace and comments, this parameter should be 3 (1+2). Note: If you specify an option value this overrides the normal case sensitive/insensitive operation of both regexarray( and regexarrayexact(. If your option value specifies case insensitive, the regex will be case insensitive whether you are using the regexarray( or regexarrayexact( functions.

0	No options
1	Case Insensitive - match letters in the pattern independent of case.
2	Ignore whitespace and #-prefixed comments in the pattern.
4	Treat the entire pattern as a literal string.
8	Allow . to match any character, including line separators.
16	Allow ^ and $ to match the start and end of lines.
32	Treat only \n as a line separator (otherwise, all standard line separators are used).
64	Use Unicode TR#29 to specify word boundaries (otherwise, traditional regular expression word boundaries are used).
0x01000000	Output positions of each match instead of match content, positions are output in text funnel format, i.e. start,end with positions numbered from 1.

See Also

arraycolumn( -- extracts a column from a two dimensional array.
regexarrayexact( -- applies a regular expression to a text value, then builds an array containing all of the substrings that match the regular expression (see Regular Expressions).
regexliteral( -- adds \ characters to text as necessary so that it can be used as a literal in a regular expression.
regexmatch -- checks to see if the text on the left matches the regular expression on the right (see Regular Expressions).
regexmatchexact -- checks to see if the text on the left matches the regular expression on the right (see Regular Expressions).
regexreplace( -- replaces text with new text. The text to be replaced is determined by a regular expression.
regexreplaceexact( -- replaces text with new text. The text to be replaced is determined by a regular expression.
regexreplacefirst( -- replaces the first occurrence of a regular expression pattern with new text.
regexreplacefirstexact( -- replaces the first occurrence of a regular expression pattern with new text.
Regular Expressions -- Regular expressions can be used to look for a specified pattern in text and then specify what to do with each pattern match that is found

History

Version	Status	Notes
10.0	New	Regular expression support is new in this version