Using an idea grabbed from a mailing list post, I implemented the diff algorithm discussed in the following paper (free registration required):
P. Heckel, A technique for isolating differences between files
Comm. ACM, 21, (4), 264–268 (1978).
The implementation, itself, has two functions, one of which is recommended for use:
- diffString( String oldFile, String newFile )
- This method takes two strings and calculates the differences in each. The final result is the ‘newFile’ marked up with HTML (to signify both deletions from the oldFile and additions to the newFile).
Sample Code
document.body.innerHTML = diffString( "The red brown fox jumped over the rolling log.", "The brown spotted fox leaped over the rolling log" );
Sample Output
The red brown spotted fox jumped leaped over the rolling log.
Projects Using this Code:
Downloads
This work is licensed under the MIT License.
Joe's Space (June 24, 2005 at 8:05 pm)
Today’s Jots posts
eHomeUpgrade Forums – How-To: Boot Windows XP Off a Compact Flash CardTags: usb, windows
Javascript Diff Algorithm – …
ãƒ™ã‚¤ã‚¨ãƒªã‚¢æƒ…å ±å±€ (June 27, 2005 at 7:27 am)
Javascriptã§diff
Javascript Diff Algorithm Javascriptã«ã‚ˆã‚‹d…
Waxy.org (June 27, 2005 at 7:20 pm)
Wikipedia History Contest Winners
Two weeks ago, I summoned the Lazyweb for a way to automatically generate a slideshow of Wikipedia revision history. I…
Andrew (June 27, 2005 at 9:18 pm)
Fantastic, a work of art. Even if it is just a re-implementation of some C or VBscript algorithms. Great contribution.
John Resig (June 27, 2005 at 10:26 pm)
Andrew I really don’t mind that it was a ‘re-implementation’, a public version of the algorithm, done in Javascript, didn’t exist, so it was a niche that needed to be filled. It was a fun, and simple, algorithm to implement – they should have 200-level CS students give it a try.
Mathieu 'P01' HENRI (June 29, 2005 at 10:20 am)
I don’t mind either. Having such function in JavaScript may prove to be extremely useful. Thank you for disclosing it.
Philipp Keller (June 30, 2005 at 6:09 am)
Cool function! Thank you.
I think I spotted a mistake in your example: “jumped” should be striked-through and “leaped” green. Or am I mistaken?
Simon (June 30, 2005 at 6:09 am)
Nice.
Why not use INS and DEL to markup insertions and deletions respectively? That way you’re semantic from the off.
John Resig (June 30, 2005 at 7:12 am)
Philipp I was lazy and wrote it out by hand – thanks for catching that! I’m so ashamed.
Simon I’ve changed the code to use ins and del, honestly, I have no idea why I didn’t use those the first time around, it’s not like me to use non-semantic elements.
Philipp Keller (June 30, 2005 at 8:01 am)
Simon: I know this situation. You write a post and you instantly get lot of people reading it (via del.icio.us popular) and then all of a sudden, the errors really matter.. :-)
Carl (July 4, 2005 at 6:03 pm)
I might use this in my Firefox extension, Page Update Checker (see link). Any tips for diffing an arbitrary HTML page?
John Resig (July 4, 2005 at 9:52 pm)
Carl Give this code a try:
If you’re currently displaying the old page and you want to show the new page, with diffs.
document.body.innerHTML = diffString( document.body.innerHTML, newHTML );
OR if you’re currently displaying the new page and you want to show the diffs with the old page
document.body.innerHTML = diffString( oldHTML, document.body.innerHTML );
Running diffString against the contents of document.body.innerHTML will (should) work just fine, for most cases. If you have any problems, let me know.
Carl (July 5, 2005 at 1:59 am)
Your code seems to be working excellently. My main battle right now is with XUL and character encodings. I think this will be a nice addition to this extension! You can expect it in PUC 0.3. (If you’d like you can see the source of my latest unstable version — follow link, “pucNotify.xul” uses jsdiff.js)
Carl (July 6, 2005 at 10:52 pm)
Any ideas why i might be getting this error?
Error: ns[n[i]].rows has no properties
Source File: chrome://puc/content/jsdiff.js
Line: 44
I’m passing it text that is parsing fine as XML. It isn’t a null string either.
Thanks,
Carl
John Resig (July 7, 2005 at 9:55 am)
Carl Not offhand. If you could email me the text that you’re trying to test this against, I’ll see if I can duplicate it.
Vitaly Friedman (July 8, 2005 at 6:46 am)
Thanks, it’s really a useful function – I have never thought about it :)
Erik J (July 19, 2005 at 11:09 am)
I’d just recently received a request to have a web based app have this type of functionality, so this came in handy and I really appreciate your effort. I’ve spent the last couple of days playing with it and ran into some issues. Hopefully my insights are accurate and can be of use to other visitors.
I noticed that the routine would not display text deleted from the beginning of the old string so I added this to the end of the diffString() function right before the return str; line:
————————————————————–
// prepend any deleted text from the old string.
pre = “”;
for (var i = 0; i 0 ) {
str = “” + pre +” ” + str;
}
————————————————————–
also, I used a span instead of del ins tags because if the info was copied to word all the deleted text was removed as was the formatting for the inserted text.
Finally, in the diff() function the loops forward and backwards through the arrays looking for duplicate words needs to check for the existance of elements (at least for me it errored out).
— snippet —————————————————-
for ( var i = 0; i 0; i– ) {
if ( i – 1 > -1 && // added this check
n[i].text != null &&
n[i-1].text == null &&
n[i].row – 1 > -1 && // added this check
n[i].row – 1
Carl (July 19, 2005 at 11:14 pm)
Erik: This might be exactly the changes i need for my mozilla extension, but It seems that some of your code didn’t get transfered correctly. for example:
for (var i = 0; i 0)
and did you mean to do this: str = “” + pre + ” “…it looks like pre never gets set to anything but the null string. I just don’t see the purpose.
Also, adding ” i – 1 > -1″ to the for loop doesn’t affect anything, right? The for look ends when i = 0, so that can never be true. (That doesn’t mean that n[i].row – 1 > -1 isn’t a good addition — I honestly don’t know what that code does).
Thanks,
Carl
Erik J (July 21, 2005 at 1:49 pm)
Carl. Sorry, I didn’t think to covert special characters so part of the code was stripped.
diffString() change:
// prepend any deleted text from the old string.
pre = “”;
for (var i = 0; i < out.o.length – 1; i++) {
if (out.o[i].text != null)
{break; // get out of loop}
else
{ pre += out.o[i] + ” “; }
}
if (pre.length > 0 ) {
str = “<span style=’BACKGROUND:#ffe6e6;TEXT-DECORATION:line-through’>” + pre +” </span>” + str; }
diff() changes:
— snippet —————————————————-
for ( var i = 0; i < n.length – 1; i++ ) {
if ( n[i].text != null &&
i + 1 < n.length – 1 && // added this check
n[i+1].text == null &&
n[i].row + 1 < o.length – 1 && // addded this check
o[ n[i].row + 1 ].text == null &&
n[i+1] == o[ n[i].row + 1 ] )
— snippet —————————————————-
for ( var i = n.length – 1; i > 0; i– ) {
if ( i – 1 > -1 && // added this check
n[i].text != null &&
n[i-1].text == null &&
n[i].row – 1 > -1 && // added this check
n[i].row – 1 < o.length – 1 && // added this check
o[ n[i].row – 1 ].text == null &&
n[i-1] == o[ n[i].row – 1 ] ) {
Erik J (July 21, 2005 at 2:02 pm)
Carl,
sorry didn’t answer all of your questions:
Also, adding †i – 1 > -1″ to the for loop doesn’t affect anything, right? The for look ends when i = 0, so that can never be true. (That doesn’t mean that n[i].row – 1 > -1 isn’t a good addition — I honestly don’t know what that code does).
in the loop I added that code the loop is stepping backwards thru the arrays items. You are correct that it will end at 0, however, at 0 it would attempt to get the item at 0 – 1, so, the line n[i-1].text == null would be evaluated as n[-1].text -== null and should throw an error.
Also, the n[i].text element was populated with the text from the new string and n[i].row element was populated with the position the word was found in the old string. As the n[i].row line is returning the row (or position, if you will) of the word in the old (or original) string, it could have also been position 0, so attempting to find element 0 – 1 (that is, -1) should throw an error.
Erik J
Yuri (August 1, 2005 at 10:33 am)
Tried this one, but haven’t got it working correctly. Perhapse I misunderstand this, but I like to use a diff on HTML. It seems that this code only likes certain texts or something.
For instance, try this:
Test
document.body.innerHTML = diffString(
“This is a \ntest”,
“This is a \ntest2”
);
You will never see the inserted 2.
Stefan (August 28, 2005 at 6:09 pm)
congrats for this great functions. When I compare
diffString(â€a b c d e f gâ€, “a b c d e f gâ€)
I get
“a b c d e fâ€
Then changing the 3rd code line inside ‘diffString’ from
for ( var i = 0; i < out.n.length – 1; i++ ) {
to
for ( var i = 0; i < out.n.length; i++ ) {
I get the expected result
“a b c d e f gâ€
but now there maybe any side effects.
Thanks again for contributing
—
Stefan
Mitchua (August 30, 2005 at 8:33 pm)
Love the script!
I don’t know if its just me, but I get an error message about “text” on line 62 when I run it in IE6. Fixed it by changing:
if ( n[i].text != null && n[i+1].text == null && o[ n[i].row + 1 ].text == null &&
n[i+1] == o[ n[i].row + 1 ] ) {
to
if ( n[i].test && n[i].text != null && n[i+1].text == null && o[ n[i].row + 1 ].text == null &&
n[i+1] == o[ n[i].row + 1 ] ) {
sprite (October 17, 2005 at 9:51 am)
Hi~
I have downloaded your code and tried to use it, and have discovered some bugs in the process:
in diff():
1. added bound checking to pass 4 and 5, as suggested by many people above.
2. use Object instead of Array for ns and os to prevent errors when the names of the methods of Array object appears in the text. (e.g. array.concat is already defined)
in diffString():
1. splitting “” will results in array containing an empty string (i.e. “”.split(/\s+/).length == 1), i have made empty string to correspond to empty array instead.
2. the deletion for the first change will not be dropped (as mentioned before)
3. diffString() will work even if the new string is an empty string. (the paper also does not address this special case)
4. string with special characters will appear now.
enhancements:
1. diffString() will output the original spacing.
After some thinking of the algorithm, I found that the output format used by diffString() does not suit the algorithm because the algorithm also detects block movements along with insertion and deletion (you can experiment this using diffString(“a b”, “b a”) and observe the output of the diff() and diffString().
Therefore I have made a side-by-side diff output format in to complement diffString() called diffString2().
Demo of the modified version can be found at:
http://appsrv.cse.cuhk.edu.hk/~achu3/hdiff/
The modified source can be downloaded at:
http://appsrv.cse.cuhk.edu.hk/~achu3/hdiff/jsdiff.js
Vishal Vishnoi (October 31, 2005 at 12:57 am)
hmm…ok.my html part got escaped out…let me try again
In added style background:#FFE6E6; to del element and style background:#E6FFE6; to ins element
Sherman Dorn (November 18, 2005 at 1:09 am)
Wow! I wish I figure out how to use this to create a paraphrasing exercise for students–ask them to paraphrase a section, and then have the function check to see what’s in common (hopefully very little!).
Sherman Dorn (November 18, 2005 at 8:57 am)
Wanted: paraphrase teaching tool
Browsing online late last night, I came across a Javascript tool for distinguishing between two and an implementation of corrected code that works great as a check on paraphrasing. I need to modify this for students at some point, as a task (and then p…
Sherman Dorn (November 26, 2005 at 7:19 am)
Thanks to both John Resig and Chu Alan, I now have my paraphrasing self-test. You’re both wonderful!
Sherman Dorn (November 26, 2005 at 7:32 am)
Paraphrasing self-test
After looking for a way to help students test their paraphrasing skills, I’ve modified what real programmers have done to create a test-your-skills-at-paraphrasing page (and the links to the folks who really deserve credit are at the bottom of it). It…
fikara (January 13, 2006 at 12:40 am)
retyu njjo ffrwxvbv iopojnf wewet bmiyre vfgjhtu
sdftvv
dyuubdhj tyuju
ghjon
moieqwmc
Cacycle (February 1, 2006 at 5:57 pm)
I have written another extended diff code in JavaScript from scratch:
http://en.wikipedia.org/wiki/User:Cacycle/diff
This version can visualize moved blocks and their original position. A post-pass-5 code resolves ‘islands’ caused by adding two common words at the end of sequences of common words. The source code is extensively commented.
TTrek (February 10, 2006 at 9:13 am)
Is there any updated version available? :)
Ben (February 17, 2006 at 5:26 pm)
Onother problem with the version posted here is that it does not flag moved blocks of test as differences. For example compare the following two strings:
The red brown fox jumped over the rolling log
The red fox brown jumped over the rolling log
In word “fox” was moved but it will not be flagged as a delete and an insert. The code written by Cacycle does catch this and show it correctly.
Mike (February 26, 2006 at 1:28 pm)
Why doesn’t the function catch the change from “log.” to “log”? Or is that another error in the manually generated sample output?
raf mathijs (March 10, 2006 at 5:15 am)
i’m trying to use the function in my addition to the cms here at my internship,
bu the strange thing is that when two texts are compared, the last word always seems to vanish, this is not that much of a problem since the main differences are displayed, but if any of you has an idea how i can resolve this, any help will be much appreciated.
Surfer (March 12, 2006 at 1:22 am)
It does appear that if only the location of words changes places, no difference is detected. i wonder if this is by design.
Cacycle diff would be perfect except that it does not work in IE :(
Neil Fraser (March 19, 2006 at 5:16 pm)
This diff algoritm is buggy in its implementation and flawed in its design. Here is a drop-in replacement which behaves properly: http://neil.fraser.name/news/2006/03/19/
Hope this helps.
codeable (March 21, 2006 at 11:30 am)
I think Neil Fraser should implement phrasing before being so crass. I tested his with simple tense changes, and was not satisfied.
try: “We are great.” “We have been great.”
All the implementations mentioned here (and there) seem to bomb out when the “replaced” section’s phrase count differs. Is this a common problem with js implementations?
Steve Bennett (March 25, 2006 at 12:08 pm)
Neil’s solution definitely works (and, surprisingly, truly is “drop in”). The problem is that it diffs at the character level, rather than the word level. So it’s been a bit unsatisfying for my needs – displaying changes in a readily understandable form to a non-technical audience. It’s also a *lot* more code than the original solution.
Cacycle (March 27, 2006 at 5:07 pm)
Neil Fraser’s algorithm is not able to detect block moves, while Heckel’s algorithm does this easily as you can see in my implementation (see above, use blocks of at least 4 words). My implementation should now work for all modern browsers and has been tested with Mozilla, Firefox, SeaMonkey, Opera, and Internet Explorer.
Neil C. Obremski (April 5, 2006 at 1:01 pm)
First, let me say this is a wonderful find.
Instead of using += for string concatenation/appending, you might consider pushing pieces to an array and then calling join(“”) at the end. It works much better for large blobs of text.
Cristian (May 10, 2006 at 4:31 am)
this pair breaks all implementations:
b a b e s
b a b s e
___________________
http://www.mento.ro – romanian froogle
Kishore (June 6, 2006 at 11:27 am)
Great work folks! This javascript function is quite handy, but recently I have been troubled by the accessibility issue… because as you guys no doubt would be aware, not everyone has Javascript turned on… so is there a PHP version of this diff algo? I Googled and saw a couple of versions.. but none of them seemed to be simple enough and used and tags etc. Anyone has any ideas on this? Thx.
Cristian (June 7, 2006 at 11:53 am)
by the way, I found a good implementation of the js diff algo which works very well here: neil.fraser.name/news/2006/03/19/
__________________________________________
http://www.mento.ro – romanian froogle
sorin (June 10, 2006 at 8:39 am)
that one is good also, but as far as I can see, it’s a completely different algorithm that John’s.
Albert Jin (August 5, 2006 at 3:35 pm)
XinDiff in Javascript was just born! I believe it is faster than all these existing ones. Have a try and prove I am wrong. :)
http://xindiff.cvs.sourceforge.net/*checkout*/xindiff/XinDiff/POC/XinDiff.html
pianio11 (November 6, 2006 at 1:40 pm)
diffString(‘abc’,’ab’) returns ”, but ab is deletion of c from abc.
Axel Boldt (November 6, 2006 at 11:27 pm)
The code contains the following lines at the beginning of diff():
var ns = new Array();
[…]
if ( ns[ n[i] ] == null )
ns is treated as a hash (associative array), the string n[i] being the key. The above line attempts to check whether the key n[i] exists in the hash. It is incorrect however: consider for example the case n[i]=”some”, ns[n[i]] being equivalent to ns.some and ‘some’ being a method of Array…
Arrays should never be used as hashes; see John Resig’s explanation.
Peter Goodman (June 14, 2007 at 11:01 am)
A remeber a while ago I was looking around for javascript diff implementations and found a few. What’s worse is that then I tried to translate them to php.. oh well, I settled for PEAR’s Text_Diff.