JavaScript: Formatting Currency with Thousands Separators Using Regex (Detailed Explanation of Non-Capturing Group Matches)

Published June 7, 2017 at 12:51 PM

If you have a string of numbers and need to convert it into a currency format with thousands separators, how would you do it? For example: 123123123 -> 123,123,123

1. The Origin of an Interesting Regular Expression

This is actually an old problem, but for some reason it has been coming up a lot recently, so I decided to discuss it here.

First, let's look at a traditional approach: add a comma after every three digits from the right side. Thus, the following method was created:

function money(num){
    // First convert the number to a string, then into an array, reverse it, and then combine it back into a string
    var reverseStr = num.toString().split('')
        .reverse().join('');
    // Use regex to replace and add a comma every 3 digits
    reverseStr = reverseStr.replace(/(\d{3})/g,'$1,');
    // Remove the trailing comma left when the digit count is a multiple of three (otherwise 123 -> ,123)
    reverseStr = reverseStr.replace(/,$/, '');
    // Reverse back to normal order
    reverseStr = reverseStr.split('').reverse().join('');
    return reverseStr;
}

Although this method can meet our needs, it feels somewhat low-tech and is not the focus of today's discussion.

Today, we will attempt to solve this problem using a short regex. Here’s the code:

function money(num){
    return (''+num).replace(/(\d)(?=(\d{3})+(?!\d))/g, '$1,');
}

A single short regex, /(\d)(?=(\d{3})+(?!\d))/, does all the work. Although the regex is short, it is not simple. The purpose of today’s post is to analyze what it contains.
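As a quick sanity check, here is a minimal sketch that runs the same function on a few inputs. The sample values are my own, chosen for illustration. The last call shows a caveat worth knowing: the regex has no notion of a decimal point, so a fractional part with more than three digits is grouped as well.

```javascript
// Same regex as in the article, repeated here so the snippet runs on its own.
function money(num) {
    return ('' + num).replace(/(\d)(?=(\d{3})+(?!\d))/g, '$1,');
}

console.log(money(123123123));    // "123,123,123"
console.log(money(123));          // "123" (nothing to separate)
console.log(money('1234567.88')); // "1,234,567.88"

// Caveat: digits after the decimal point are treated like any other digits.
console.log(money('1234.5678'));  // "1,234.5,678"
```

If the fractional part can be longer than three digits, one option is to split on the decimal point and format only the integer part.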

We will first discuss three concepts involved here:

The lastIndex of regex matching
Zero-width positive lookahead assertion (?=exp) in regex
Zero-width negative lookahead assertion (?!exp) in regex

1.1 lastIndex

lastIndex: the index at which the next match attempt starts. It defaults to 0, and after a successful match it is set to the position just past the end of that match. It should be emphasized that this only applies when the regex has the g or y flag; without either flag, lastIndex is ignored and every match starts from index 0. The reason for emphasizing lastIndex is that the lookahead assertions discussed below (which this post loosely calls non-capturing) check text without consuming it, and that determines where lastIndex ends up. Let’s look at an example:

let re = /\d\d/g;
let str = '0123456789';
console.log(re.lastIndex);
// 0 
// By default, the value of lastIndex is 0
console.log(re.exec(str));
// ["01",...] Matched 01
console.log(re.lastIndex);
// 2 
// Since the first match result was 01, the next match should start after 01, so lastIndex is now 2

Similarly, if you manually modify the value of lastIndex, the match result will also be affected.

let re = /\d\d/g;
let str = '0123456789';
re.lastIndex = 5;
// Manually set lastIndex to 5, so the next match starts at index 5 (the sixth character)
console.log(re.exec(str));
// ["56",...]
console.log(re.lastIndex);
// 7 Automatically updated lastIndex to after the match result
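To see why the g (or y) flag matters here, the sketch below runs the same pattern with and without the g flag. The variable names are my own, for illustration only. Without the flag, the search ignores lastIndex and a successful match leaves the property untouched.

```javascript
const digits = '0123456789';

// No g or y flag: lastIndex is ignored and left alone on a successful match.
const plainRe = /\d\d/;
plainRe.lastIndex = 5;
const plainHit = plainRe.exec(digits);
console.log(plainHit[0]);       // "01": the search still starts at index 0
console.log(plainRe.lastIndex); // 5: not updated

// With the g flag: the search starts at lastIndex, which is then advanced.
const globalRe = /\d\d/g;
globalRe.lastIndex = 5;
const globalHit = globalRe.exec(digits);
console.log(globalHit[0]);       // "56"
console.log(globalRe.lastIndex); // 7
```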

1.2 (?=exp) Zero-width positive lookahead assertion and (?!exp) Zero-width negative lookahead assertion

The first time I heard this name, it sounded like a foreign language; I was thoroughly confused.

The definition reads: it asserts that the text immediately following its position matches the expression exp. The first time through, I did not understand that at all.

Let’s use a simple example to help: suppose someone tells you, "Please find a person riding a bicycle and bring that person over (the bicycle is not needed)." How would you do it? This request can be abstracted as matching the string 人骑车 ("person riding a bike") against the regex /人(?=骑车)/, where 人 means "person" and 骑车 means "riding a bike": the result is the person, without the bike.

Because the expression inside the parentheses is only checked and never becomes part of the result, this post calls it non-capturing matching. (Strictly speaking, the precise term is zero-width: the assertion consumes no characters. It is a different construct from the non-capturing group (?:exp).)

Here’s an example:

var re = /ap(?=ple)/g;
console.log(re.exec('I like apple not app!'));
// ["ap", index: 7, input: "I like apple not app!"]
console.log(re.lastIndex)
// 9
console.log(re.exec('I like apple not app!'));
// null
console.log(re.lastIndex)
// 0

In the above example, the first .exec finds the "ap" that is followed by "ple", so the characters at indexes 7 through 11, apple, meet our criteria. However, since (?=ple) is non-capturing, "ple" does not count toward the result, and the three characters of "ple" do not advance lastIndex either, so the value of lastIndex is 7 + 2 = 9, not 7 + 5 = 12.

The non-capturing concept can easily lead to misunderstanding, so let me emphasize again:

!!The text checked by a lookahead is not consumed, so it does not advance lastIndex!!

If you understand, please think about what the results of the following two questions are:

Question 1

var re = /ap(?=ple)pie/g;
console.log(re.test('applepie'))

Question 2

var re = /ap(?=ple)plepie/g;
console.log(re.test('applepie'))

Don’t cheat!

The answers are:

Question 1: false

Question 2: true

If you didn’t guess correctly, let’s look at why:

In fact, we can break /ap(?=ple)pie/ into two parts, ap(?=ple) and pie. When matching the string applepie, the engine goes through the following steps:

Matching starts at position 0. ap(?=ple) matches the ap of apple, so the text matched so far is ap and the current position is 2 (the lookahead checked ple but did not consume it).
From position 2, the rest of the expression is pie, so pie must come immediately next. The text at position 2 actually starts with ple, so the condition is not satisfied and the result is false.
Likewise, the second regex /ap(?=ple)plepie/ requires that plepie come immediately after position 2, which is exactly what the string contains, so the result is true.
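The steps above can be checked with a short sketch. It uses the y (sticky) flag only so that lastIndex records where the ap(?=ple) part stopped; the variable names are illustrative and not from the original post.

```javascript
const text = 'applepie';

// Sticky flag: the match must start at lastIndex, and lastIndex is updated afterwards.
const head = /ap(?=ple)/y;
const headHit = head.exec(text);
console.log(headHit[0]);     // "ap": the lookahead text is not part of the match
console.log(head.lastIndex); // 2: "ple" was checked but not consumed

// Whatever follows in the pattern has to match from position 2 onward.
const rest = text.slice(head.lastIndex);
console.log(rest);                      // "plepie"
console.log(rest.startsWith('pie'));    // false, which is why Question 1 fails
console.log(rest.startsWith('plepie')); // true, which is why Question 2 succeeds
```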

As for the (?!exp) zero-width negative lookahead assertion, there’s no need to elaborate; it asserts that the current position is not followed by exp, and it is also non-capturing (for example, when tested against uvw and uvwxyz, /uvw(?!xyz)/ matches only the first).
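A minimal sketch of the negative lookahead, using sample strings of my own. The last lines show a pitfall: with a lazy quantifier in front, the engine is free to stop early at a position that is not followed by xyz, so the pattern still matches.

```javascript
const notBeforeXyz = /uvw(?!xyz)/;

console.log(notBeforeXyz.test('uvw'));    // true: nothing follows uvw
console.log(notBeforeXyz.test('uvwxyz')); // false: uvw is followed by xyz
console.log(notBeforeXyz.test('uvwxy'));  // true: xy is not xyz

// Pitfall: .+? can stop after a single character, where the next text is "vwx".
const lazyHit = /.+?(?!xyz)/.exec('uvwxyz');
console.log(lazyHit[0]); // "u"
```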

2. Matching Steps for /(\d)(?=(\d{3})+(?!\d))/

If you are still confused at this point, let’s take a detailed example to illustrate how the above regex works.

We will take the example '1234567.88'.replace(/(\d)(?=(\d{3})+(?!\d))/g, '$1,').

First, let’s explain what (?=(\d{3})+(?!\d)) means. It requires that the current position be followed by one or more (+) groups of exactly three digits (\d{3}) (for example, 123, 123456, and 123456789 all consist of whole groups of three), and that the last group not be followed by another digit, (?!\d). That final check finds the end of the integer part: anything that is not a digit, including the end of the string, counts as the end.
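To see what the trailing (?!\d) contributes, this sketch compares the regex with and without it on the same input. The variable names are my own. Without the end check, any digit followed by at least three more digits qualifies, so commas land in the wrong places.

```javascript
const amount = '1234567';

// Without (?!\d): "followed by three digits" is true almost everywhere.
const withoutEndCheck = amount.replace(/(\d)(?=(\d{3})+)/g, '$1,');
console.log(withoutEndCheck); // "1,2,3,4,567"

// With (?!\d): the groups of three must run exactly to the end of the digits.
const withEndCheck = amount.replace(/(\d)(?=(\d{3})+(?!\d))/g, '$1,');
console.log(withEndCheck);    // "1,234,567"
```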

Finally, let’s use a chart to explain this process:

index	$1, the digit matched by (\d)	result of (?=(\d{3})+(?!\d))	lastIndex	resulting string
0	'1'	(234)(567)	1	'1,234567.88'
1	'2'	-- (34567 cannot be split into groups of three)	2	'1,234567.88'
2	'3'	-- (4567 cannot be split into groups of three)	3	'1,234567.88'
3	'4'	(567)	4	'1,234,567.88'
...	...	...	...	...
9	'8'	-- (no digits follow)	10	'1,234,567.88'
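The process can also be traced in code. The sketch below passes a callback to replace and records the offset and digit of every successful match; the steps array and the parameter names are illustrative. Only the positions where the lookahead succeeds appear in the trace.

```javascript
const input = '1234567.88';
const steps = [];

// The callback receives: the whole match, $1, $2 (last group seen in the lookahead), and the offset.
const output = input.replace(/(\d)(?=(\d{3})+(?!\d))/g, (match, digit, lastGroup, offset) => {
    steps.push({ offset: offset, digit: digit });
    return digit + ',';
});

console.log(steps);  // [ { offset: 0, digit: '1' }, { offset: 3, digit: '4' } ]
console.log(output); // "1,234,567.88"
```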

Alright, that's all for now. If you have any questions, feel free to leave a comment below.
