Earlier today a friend of mine, Marc Grabanski, pinged me with a question: What’s the optimal way, in JavaScript, to convert a query string like “foo=1&foo=2&foo=3&blah=a&blah=b” into one that looks like this: “foo=1,2,3&blah=a,b”. He had already come up with a solution of his own and was curious as to if it could be improved upon.
I pondered this for a moment and came up with a solution:
var q = {}, ret = “”;
data.replace(/([^=&]+)=([^&]*)/g, function(m, key, value){
q[key] = (q[key] ? q[key] + “,” : “”) + value;
});
for ( var key in q )
ret = (ret ? ret + “&” : “”) + key + “=” + q[key];
return ret;
}
Besides being 10 lines shorter than Mark’s solution it also didn’t require the use of any libraries (his required the use of jQuery). He was especially surprised at my result – and the replace technique that I used, in particular. I want to outline two quick tricks that I used that some may not be familiar with.
Array Joining Without an Array
One step to solving the above problem is to collect together the various values associated with a key – joining them with a ‘,’. At first glance the obvious solution might be to construct an array for each key, push each of the values on, and then .join() the array results into a string. However all of this is both costly (the overhead of creating all these extra arrays) and verbose.
The alternative is one that I use twice in the program:
The key part is that we’re concatenating the value
onto the q[key]
string adding in an extra “,” if there’s nothing in the string already: q[key] ? q[key] + "," : ""
. If no existing value is in the key we seed it with an empty string, otherwise we’re simply merging on to the already-existing value along with the needed “,”.
The end result is that we’re only ever dealing with strings and string operations (instead of arrays) but achieving the result of joining a set of values without using an array.
Search and Replace Without Replacing
The second, and arguably more interesting, aspect is in using the JavaScript string replace function as a means of traversing a string for values, rather than as an actual search-and-replace mechanism. The trick is two-fold: Passing in a function as the replace value argument to the .replace()
method and, instead of returning a value, simply utilizing it as a means of searching.
Let’s examine this piece of code:
The regular expression, itself, captures two things: A key in the query string and its associated value. This match is performed globally, locating all the key-value pairs within the query string.
The second argument to the replace method is a function. It’s not uncommon to utilize this function-as-an-argument technique when attempting to replace matches with complex values (that is values that are dependent upon their associated matches). The function is called every time a new match occurs and it receives a variable number of arguments. The first argument is always the entire matched portion of the string and all remaining arguments are each of the (...)
capturing blocks, in order. The return value of the function is injected back into the string as its replacement. In this example we don’t return a value from the function therefore we end up injecting the serialized “undefined” value, repeatedly, back into the string.
We can see this behavior here:
// => "undefined b c"
Now that we’ve collected all of our key values pairs (to be re-serialized back into a query string) the final step isn’t really a step at all: We simply don’t save the search-and-replaced data
query string (which, most likely, looks something like "undefined&undefined&undefined"
).
In this manner we can use a string’s replace method as our very-own string searching mechanism. The result is, not only, fast but also simple and effective. Another excellent tool in your JavaScript toolbox.
This topic will be discussed, in depth, in my work-in-progress book: Secrets of the JavaScript Ninja. To be released Fall 2008.
Tuzemec (March 14, 2008 at 4:07 am)
So sleek and brilliant :-)
The book looks more and more interesting.
BTW… I really would like to see your solution for converting a json object into a string. ;-)
serkanyersen (March 14, 2008 at 4:19 am)
Nice example of using replace with function. I really love the way how javascript works.
Andrea Giammarchi (March 14, 2008 at 4:35 am)
John, good stuff as usual.
I am not sure about this conversion purpose, but I think the uri example is wrong.
What I mean is that if you have a query string like this one:
key=value1&key=value2
the server usually recieve a single key, called key, with last used value, value2.
To send arrays via querystring you have to use this notation:
key[]=value1&key[]=value2
in this way the server will receive a parameter, with key as key, and an array that will contain value1, value2
I am sure you know all these things, but in this case, the regexp and the code should be a bit different.
function dataToQueryString(data){
var q = {}, ret = [], key;
data.replace(/([^=&]+?)(\[\])?=([^&]*)/g, function(match, key, isArray, value){
isArray ? (q[key] || (q[key] = [])).push(value) : q[key] = [value];
});
for(key in q)
ret.push(key + "=" + q[key]);
return ret.join("&");
}
alert(dataToQueryString("foo[]=1&foo[]=2&foo[]=3&blah[]=a&blah[]=b&no=array"));
This is just to have a regular query string instead of ambiguous one, but at this point, I am not so sure about the real application of this conversion (reduce query string size? … ok, that’s one :-))
Kind Regards
DAH (March 14, 2008 at 5:50 am)
Andrea: You are wrong. I think you have been confused by PHP.
michele (March 14, 2008 at 6:09 am)
DAH is right, it’s only a PHP “feature”. The server receives the entire url and can manipulates the query string as it wants to.
Andrea Giammarchi (March 14, 2008 at 6:32 am)
That’s why I said about “less ambiguous”. I am not sure this is a PHP only feature, even some ini file could contain arrays defined in that way. But I am probably wrong (however, yse, I am working hardly with PHP these days :D)
Ben (March 14, 2008 at 7:27 am)
Great article again. I’m sure I’ll be using the “search don’t replace” feature in the future!
Chris (March 14, 2008 at 7:50 am)
I’ve been doing string tricks like that for a while but then realized that I could still do it in a shorter (and IMHO clearer because no ?: is used) way using arrays:
function compress3(data){
var q = {}, ret = [];
data.replace(/([^=&]+)=([^&]*)/g, function(m, key, value){
(q[key] = q[key] || []).push(value);
});
for ( var key in q )
ret.push(key + "=" + q[key].join(","));
return ret.join("&");
}
The only tricky part is the (q[key] = q[key] || []) where I initialize and return an array at the same time. Not necessary in languages like PHP where arrays are automatically initialized ;-)
Andrea Giammarchi (March 14, 2008 at 7:54 am)
Chris … I do not know why … but I have a little dejavu’ …
Chris (March 14, 2008 at 8:16 am)
Should teach me to fully read the previous comments *grin*. Your code is too complicated but folows the same idea as mine: Array joining without arrays might not be worth it after all.
Andrea Giammarchi (March 14, 2008 at 8:36 am)
My code follow a different idea but uses the same trick … that is not really so trick, it’s just JavaScript :D
Anyway, this is a different proposal:
function compress(data, sep, php){
var q = {}, ret = [], key;
data.replace(php ? /([^=&]+?)(\[\])?=([^&]*)/g : /([^=&]+)(=)([^&]*)/g, function(match, key, isArray, value){
isArray ? (q[key] || (q[key] = [])).push(value) : q[key] = [value];
});
for(key in q)
ret.push(key + "=" + q[key].join(sep || ","));
return ret.join("&");
};
alert(
compress("foo=1&foo=2&foo=3&blah=a&blah=b") ===
compress("foo[]=1&foo[]=2&foo[]=3&blah[]=a&blah[]=b", true)
); // true
Brian (March 14, 2008 at 8:53 am)
Clever but dumb.
Granted the trick is kind of clever, but it is awful.
Its a programming trick based solely on Side-Effects.
Which in general is bad, because it makes programs hard to reason
about. But in this case is obscure side-effect that is hard to read.
Basically its incredibly unreadable code.
Andrea Giammarchi (March 14, 2008 at 9:09 am)
I wonder which part of this code is unreadable …
(q[key] || (q[key] = [])).push(value)
it does not assign every time the array, it uses the OR behavior only the first time … and it is totally as clean as simple, at least this is my opinion.
Side effects … why? As I said, this is not a trick, this is JavaScript
Clint Ecker (March 14, 2008 at 9:29 am)
I’m definitely going to have to pick up this book :)
Michael (March 14, 2008 at 9:34 am)
Seems like in the original code it would be faster if
(1) You return the original value (no replacement = no string conversions or string munging by the JS engine)
(2) You assign q[key] to a temp to avoid reading it twice.
That is:
function compress(data){
var q = {}, ret = "";
data.replace(/([^=&]+)=([^&]*)/g, function(m, key, value){
var tmp = q[key];
q[key] = (tmp ? tmp + "," : "") + value;
return m;
});
for ( var key in q )
ret = (ret ? ret + "&" : "") + key + "=" + q[key];
return ret;
}
Kevin H (March 14, 2008 at 9:39 am)
Andrea, I think Brian was referring the John’s blog post and not your comment.
And indeed, while I wouldn’t say it as condescendingly as he did, I have to agree with Brian’s point. Overloading the replace() function, while clever, is not how you write maintainable code. And besides, it is completely unnecessary. According to my Core Javascript Reference 1.5 document, “If your regular expression uses the “g” flag, you can use the exec method multiple times to find successive matches in the same string”. Thus, here is a more proper solution to Marc’s problem:
function compress(data){
var match = [], q = {}, ret = "", re = /([^=&]+)=([^&]*)/g;
while (match = re.exec(data)){
q[match[1]] = (q[match[1]] ? q[match[1]] + "," : "") + match[2];
}
for ( var key in q )
ret = (ret ? ret + "&" : "") + key + "=" + q[key];
return ret;
}
Scott (March 14, 2008 at 9:46 am)
I happened to be testing this technique a few months ago, and I discovered its performance is bizarrely inconsistent across different browsers. I think it’s generally better to have the replacement function explicitly return a value (since repeatedly appending
""
will be faster than converting and appendingundefined
when there are a large number of matches), but what value you choose here really just depends on what browser you like best.You could return an empty string, the first argument,
undefined
(with no return statement), or something else entirely (which you could also try caching outside the replacement function). My tests in Firefox, Opera and Safari all preferred different return values, and in some cases the fastest function in one browser was the slowest in another.So I’m curious to know what’s going on here, and if there is a best choice for return value. I do think that hijacking the replace function in this manner is useful, and performs much better than the traditional alternative:
var m;
while (m = re.exec(data)) {
var $1 = m[1],
...
$n = m[n];
// do stuff
}
P.S. I also couldn’t resist posting my stab at a short solution to Marc’s problem:
function compress(data) {
data = data.replace(/([^&=]+=)([^&]*)(.*?)&\1([^&]*)/g, "$1$2,$4$3");
return /([^&=]+=).*?&\1/.test(data) ? compress(data) : data;
}
Kevin H (March 14, 2008 at 10:06 am)
2 more quick points before I disappear into the ether.
Why all this talk about speed and efficiency? Michael above me says “it would be faster if…”; John in the OP says the overhead of creating a bunch of arrays is “costly”. Smells like pointless optimization, to me. The javascript engine can whip about thousands of arrays and/or strings in mere milliseconds. A URI is never going to be long enough that we need to worry about the overhead in parsing it.
Why does the Note: for this “Leave a Comment” box say to wrap my code block in <code> … </code> if my indentation formatting is lost regardless?
David G. (March 14, 2008 at 10:17 am)
Hi,
You said: ‘… into one that looks like this: “foo=1,2,3&blah=a,b”. ‘
Doesn’t it always look like this though: “foo=1,2,3,&blah=a,b,” since your anonymous function appends a “,” onto every value it finds (including the last one, eg. the 3) and you never trim off the last trailing comma from the end of the string …
I enjoy writing code like this alot :) (eg. http://snippets.dzone.com/user/tenken/)
John T (March 14, 2008 at 10:54 am)
Scott: I think you win, but I have no idea why!
Kevin: This code doesn’t need to be maintainable… its ostensibly a one-off.
David: It adds the comma *before* the value, between the existing list and the new value.
Jonathan Snook (March 14, 2008 at 11:08 am)
Scott’s solution wins in my book, even for readability (if you can consider regex readable).
John T: the regular expression looks for a key and then checks to see if the key occurs again, moving the value of a repeated key into the comma delimited value of the first key. Then it checks to see if there’s any more dupes. If there are, it runs the compress function again (until no more key dupes are found).
Cris (March 14, 2008 at 11:18 am)
Kevin H: I can see that your example is more linear (and thus, arguably, more maintainable — at least to a wider audience) but why do you feel it is “more proper” to use a while loop instead of a global flag?
David G: John’s script only appends a comma to q[key] when `key` already exists in q, and then it immediately appends `value`.
q[key] = (q[key] ? q[key] + "," : "") + value;
There will be a trailing comma only if value was already zero-length.
Antonio (March 14, 2008 at 11:36 am)
Great article but, why would you use it? .htaccess and php are not enough?
Sam (March 14, 2008 at 1:29 pm)
Is there an International Obfuscated JavaScript competition?
Tzury Bar Yochay (March 14, 2008 at 2:30 pm)
this is a great lesson in programming. waiting for the ninja book!
Anon (March 14, 2008 at 3:41 pm)
Most languages have an alternative way of reading parameters. The “&a[]=b&a[]=c” is an ugly hack that ignores what other languages have done for years.
Example:
In Java you have two different methods:
Query: &a=1&a=2
request.getParameter("a"); // returns only one parameter like PHP
request.getParameterValues("a"); // returns an array
Ted Henry (March 14, 2008 at 4:27 pm)
Using String.prototype.replace to search a string is a good example of atrociously bad JavaScript programming. Please read the following reference:
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp:exec
Apparently JavaScript “ninjas” are going to be white belts.
Oliver Steele (March 14, 2008 at 6:03 pm)
“The end result is that we’re only ever dealing with strings and string operations (instead of arrays) but achieving the result of joining a set of values without using an array”: well, yes, but copying the entire string each time. This is probably the right tradeoff for constructing really short strings like this, but it’s a standard performance bug (in many languages) for constructing anything longer.
BTW, if you know that all your ‘foo’s are contiguous as in the example, and you have a mania for line counts, then you can shorten this to (still using string append):
function compress(data) {
var ret = '', last;
data.replace(/([^=&]+)=([^&]*)/g, function(m, key, value) {
ret += (last == key ? ',' : '&' + (last = key) + '=') + value;
});
return ret.replace(/^&$/, '');
}
pawel (March 14, 2008 at 7:13 pm)
Solutions with regular expressions are really cool, I didn’t even think I could try this way – thanks for sharing this solution.
Before reading this article I think I’d try using split(‘&’) on the provided string, like this:
function compress(data, ret, temp, i){
data = data.split('&'), ret = [], temp = {}, i=0;
while(data[i]){
var t = data[i].split('=');i++;
temp[t[0]] = (temp[t[0]] ? (temp[t[0]]+',') : '') + t[1];
}
for(var j in temp){ ret.push(j+'='+temp[j]) }
return ret.join('&');
}
Benjamin Lupton (March 14, 2008 at 9:33 pm)
Here is my solution:
function compress(str) {
return str.replace(/(&[^=&]*=)([^&]*)((?:\1[^&]*)+)/g, function(m, key, value){
return key+m.replace(/&[^=&]*=([^&]*)/g, ',$1').substring(1);
});
}
Andrea Giammarchi (March 15, 2008 at 3:36 am)
Using this callback and this source:
var callback = function(match, position, source){
// do your stuff
};
source = "abc";
I wonder which code you prefer:
source.replace(/\w/g, callback);
or
var re = /\w/g,
position = 0,
found;
while(found = re.exec(source))
callback.call(null, found[0], position = source.indexOf(found[0], position), source);
Making simple things boring, is atrociously bad as well, IMO.
Anyway, for this task, Scott solution is probably my favorite, good stuff!
yuweijun (March 15, 2008 at 4:12 am)
Benjamin, your code has something wrong.
Ted Henry (March 15, 2008 at 11:47 am)
@Andrea: Your theoretical example is particularly tuned to favor the hack of using replace(). Even so, your second example is artificially bloated. For a ECMAScript-262 conforming implementation there is no need for |call| or |indexOf| in your second example. What I would prefer is knowing how to use the language properly and doing so.
var m;
while(m = (/\w/g).exec(source)) {
callback(m[0], m.index, source);
}
The intent is clearer and no one will be fooled into thinking I’m doing a replace operation. There is no overhead of a replace operation.
Jeffrey (March 15, 2008 at 12:37 pm)
@Ted Henry: Putting aside your snarky comment for a moment, can you clarify your point?
Are you only concerned about best practices here? Or is there actual real-world performance issues using a regex command in a string.replace?
When it comes to javascript over the web, I’m more concerned with performance on different browsers and download file sizes than anything else.
JK
Andrea Giammarchi (March 15, 2008 at 1:34 pm)
ooops, I use the replace with callback technique so much that I forgot the index, even if I used them far ago to fix replace in old IE browsers (and I do not know why I used call as well …)
Anyway, I cannot understand your first post Ted … there’s nothing wrong in replace usage, IMO, and if we have more than a way to do something, performances and best practices a parte, it’s a point of view, which one is better. Regards
timothy (March 15, 2008 at 1:36 pm)
@Jeffrey,
I’m assuming Ted is concerned about clarity and maintainability. But I, also, would like to know more. Exactly what about borrowing a built-in method (which is likely to be a good and fast implementation) is offensive? Are you also against borrowing with call or apply? Are you against borrowing Math.max and applying it to find the maximum value in an array?
Best practices are rarely universally agreed upon.
I’ve pre-ordered Crockford’s book. I feel like I already know a lot of what he likes and dislikes about JavaScript, but I’m dying to see the 250 pages he’s put down for O’Reilly to spell it all out.
Ted Henry (March 15, 2008 at 1:57 pm)
@Jeffrey: We are looking at choosing between a hack and using the language as intended. Making decisions based on performance testing may be a bad idea because there are so many ECMAScript implementations and each will optimize different parts of the language. The hope would always be that using the language as intended would be optimized by the implementation and if not then it will be soon.
// TRIAL ONE
var callback = function(match, position, source) {};
var start = new Date();
for (var i=0; i
// TRIAL TWO
var callback = function(match, position, source) {};
var start = new Date();
for (var i=0; i
I get mixed results
Firefox 2
trial one is 280 ms
trial two is 470 ms
OPERA 9
trial one is 518 ms
trial two is 272 ms
You could decide which version to use based on the fact that Firefox is more popular but look what has been tested. We are doing 10000 iterations and that is much larger than common JavaScript operation. Both trials are very fast for common amounts of text manipulation. Readability and maintainability is important too.
Ted Henry (March 15, 2008 at 1:57 pm)
@Jeffrey: We are looking at choosing between a hack and using the language as intended. Making decisions based on performance testing may be a bad idea because there are so many ECMAScript implementations and each will optimize different parts of the language. The hope would always be that using the language as intended would be optimized by the implementation and if not then it will be soon.
// TRIAL ONE
var callback = function(match, position, source) {};
var start = new Date();
for (var i=0; i<10000; i++) {
var source = "abc";
var m;
source.replace(/\w/g, callback);
}
alert((new Date()).getTime() - start.getTime());
// TRIAL TWO
var callback = function(match, position, source) {};
var start = new Date();
for (var i=0; i<10000; i++) {
var source = "abc";
var m;
while(m = (/\w/g).exec(source)) {
callback(m[0], m.index, source);
}
}
alert((new Date()).getTime() - start.getTime());
I get mixed results
Firefox 2
trial one is 280 ms
trial two is 470 ms
OPERA 9
trial one is 518 ms
trial two is 272 ms
You could decide which version to use based on the fact that Firefox is more popular but look what has been tested. We are doing 10000 iterations and that is much larger than common JavaScript operation. Both trials are very fast for common amounts of text manipulation. Readability and maintainability is important too.
timothy (March 15, 2008 at 2:21 pm)
>>We are looking at choosing between a hack and using the language as intended.
Given the origin story of JavaScript, I don’t think anyone uses the language “as intended.” JavaScript’s chief advantage, in my opinion, is its incredible malleability. What’s clear to someone who came to JavaScript via Scheme may not be clear to someone who came from C or Java.
Borrowing methods is certainly a JavaScript strength. In what way was this flexibility not intended? It doesn’t look like a hack to me. I’m very much sympathetic to your call for clarity and maintainability, though.
Andrea Giammarchi (March 16, 2008 at 12:57 pm)
@Ted, considering that var m; in replace loop, is superfluous, try to consider callback features, portability, and maintainability. You are using a single example that uses a simple regexp. Try to do the same with a regexp that contains much more informations, and try to have a callback behavior with arbitrary number of arguments.
In both case you have a callback to use, in the second case you have to write manually both arguments callback and its direct call specifying indexes, length, etc … the case is more close to a forEach call, you can obviously use a for loop calling manually the callback choosing or not desired scope, or you can use forEach and take care only about callback. I prefer the second one, and in replace case I would like to concentrate into regexp and callback instead of regexp, callback, and manual callback call.
As final point, I do not know how missed return is managed, but if it’s temporary stored as one or more “undefined”, the single thing to do is to return a “” at the end of the callback.
I agree with you when you say that JavaScript should be used properly, but I cannot agree with you in this case, where malleability is used (and it is a gorgeous part of JavaScript) and both ways are good enough for this purpose.
Regards
rafsoaken (March 19, 2008 at 3:53 pm)
A minor cleanup, but maybe now a tick simpler than yours John..
function compress2(data){
var q = {}, ret = "";
data.replace(/([^=&]+)=([^&]*)/g, function(m, key, value){
q[key] = q[key]? q[key]+","+value : "&"+m;
});
for ( var key in q )
ret += q[key];
return ret.substr(1);
}
Thx for the always good quality of your posts btw.
Chetan (March 21, 2008 at 1:43 am)
Hi All,
Anyone please tell me how can i get unique value from array/string in javascript.