The excellent John Gruber recently released a Perl script which is capable of providing pretty capitalization of titles (generally most useful for posting links or blog posts).
The code handles a number of edge cases, as outlined by Gruber:
- It knows about small words that should not be capitalized. Not all style guides use the same list of words — for example, many lowercase with, but I do not. The list of words is easily modified to suit your own taste/rules: “a an and as at but by en for if in of on or the to v[.]? via vs[.]?” (The only trickery here is that “v” and “vs” include optional dots, expressed in regex syntax.)
- The script assumes that words with capitalized letters other than the first character are already correctly capitalized. This means it will leave a word like “iTunes” alone, rather than mangling it into “ITunes” or, worse, “Itunes”.
- It also skips over any words with line dots; “example.com” and “del.icio.us” will remain lowercase.
- It has hard-coded hacks specifically to deal with odd cases I’ve run into, like “AT&T” and “Q&A”, both of which contain small words (at and a) which normally should be lowercase.
- The first and last word of the title are always capitalized, so input such as “Nothing to be afraid of” will be turned into “Nothing to Be Afraid Of”.
- A small word after a colon will be capitalized.
He goes on to provide a full list of edge cases that this script handles.
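To make the rules above concrete, here is a minimal sketch of how they might fit together. It is not Gruber's script and not the port described in this post: the names `titleCaseSketch` and `capitalize` are made up for illustration, and it ignores several of the edge cases listed above (for example, a small word followed by a sentence-ending period, or all-caps input).

```javascript
// Sketch only: a simplified take on the rules listed above.
// The small-word list is the one quoted from Gruber; everything else is assumed.
var SMALL = /^(a|an|and|as|at|but|by|en|for|if|in|of|on|or|the|to|v[.]?|via|vs[.]?)$/i;

function capitalize(word) {
  // Upper-case the first letter, leaving leading punctuation such as quotes alone.
  return word.replace(/[A-Za-z]/, function (c) { return c.toUpperCase(); });
}

function titleCaseSketch(title) {
  var words = title.split(" ");
  return words.map(function (word, i) {
    var first = i === 0;
    var last = i === words.length - 1;
    var afterColon = i > 0 && /:$/.test(words[i - 1]);

    // A capital letter after the first character ("iTunes", "AT&T"): leave alone.
    if (/^.+[A-Z]/.test(word)) return word;
    // An inline dot between letters ("example.com"): leave alone.
    if (/[A-Za-z]\.[A-Za-z]/.test(word)) return word;

    // Small words stay lowercase unless first, last, or right after a colon.
    var bare = word.replace(/[^A-Za-z.]/g, "");
    if (SMALL.test(bare) && !first && !last && !afterColon) {
      return word.toLowerCase();
    }
    return capitalize(word);
  }).join(" ");
}

var result = titleCaseSketch("Nothing to be afraid of");
// result is "Nothing to Be Afraid Of"
```

Splitting on single spaces keeps the sketch short; a real implementation would need to split on punctuation as well.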
My Perl is a little bit rusty but I worked through the code and ported it to JavaScript.
You would use the above code like so:
titleCaps("Nothing to Be Afraid of?")
// "Nothing to Be Afraid Of?"
titleCaps("Q&A With Steve Jobs: 'That's What Happens In Technology'")
// "Q&A With Steve Jobs: 'That's What Happens in Technology'"
I hope this code will be useful to some – I suspect that it’ll be easy to plug into most blogging software (without having to worry about messing around with the server-side code), or even useful as some sort of bookmarklet.
Dot. (May 21, 2008 at 7:41 pm)
In CSS you can do text-transform: capitalize; though, it’s certainly not as adjustable.
Ben (May 21, 2008 at 10:05 pm)
Uhhhhh…. yeah, that’s the whole point! text-transform:capitalize is incredibly dumb and just caps everything, which no style guide I’ve ever come across espouses.
Adrian (May 21, 2008 at 10:14 pm)
It would be nice to see this in the form of a jQuery plugin :)
Great work regardless though!!
Jordan Sherer (May 21, 2008 at 10:22 pm)
You can check out the python version here: http://widefido.com/static/wf/files/title_case.py.txt
David Lindquist (May 21, 2008 at 10:36 pm)
And here is my stab at it:
http://www.stringify.com/static/js/titlecase.js
Jon hohle (May 21, 2008 at 10:57 pm)
this is a nice concise implementation, but it doesn’t work in IE (because of IE’s broken String.prototype.split method).
I wrote an implementation using Steven Levithan’s Cross-Browser Split. Tested in JavaScriptCore, SpiderMonkey, and JScript:
http://blogs.ittoolbox.com/emergingtech/macsploitation/archives/titlecase-in-javascript-24824
3Easy (May 21, 2008 at 11:46 pm)
John Resig: you, too, are completely excellent. So nice, elegant and delivered so quick. Excellent!
Dr Nic (May 22, 2008 at 12:02 am)
Ruby version here: http://github.com/samaaron/titlecase-rb/tree/master
Paul D. (May 22, 2008 at 12:04 am)
I was thinking. Ideally, “text-transform: capitalize”, to be of any use, would follow title capitalization norms for whatever that page’s language is (sort of how the quote tag uses language-sensitive quotation punctuation). Maybe this will be specified in the standard one day.
In the meantime, as a more accessible practice why not use “text-transform: capitalize” and have Javascript automatically re-parse all text with this CSS attribute?
Jeremy Ricketts (May 22, 2008 at 12:28 am)
I hereby dub thee SIR John Resig, Javascript Extraordinaire and demigod of all things wonderful and useful on the intertubes.
Adam (May 22, 2008 at 2:03 am)
Can you explain the last cryptic char of this regexp from your source?
var parts = title.split(/([:.;?!] |(?: |^)["Ò])/);
David Lindquist (May 22, 2008 at 2:07 am)
Jon is absolutely right about String.prototype.split being broken in IE. What a pain! Here is a work-around that uses the exec method of a global regular expression object.
In John’s script, replace this line:
var parts = title.split(/([:.;?!] |(?: |^)["Ò])/);
with this:
var split_re = /([:.;?!] |(?: |^)["Ò])/g;
var parts = [];
var m, idx = 0;
while ((m = split_re.exec(title)) != null) {
    parts.push(title.substring(idx, m.index), m[1]);
    idx = split_re.lastIndex;
}
parts.push(title.substring(idx));
This builds the same array as split would have, including the parenthesized substring matches that split is supposed to retain, but does not in IE.
FWIW
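For anyone who wants to reuse the work-around, the loop can be wrapped in a small helper. This is only a sketch: the name `splitKeepingDelimiters` is made up, and it assumes the pattern has exactly one capturing group and never matches the empty string (an empty match would stall the loop).

```javascript
// Sketch: the exec-based loop from the comment above, wrapped in a helper.
// splitKeepingDelimiters is a hypothetical name, not part of any script in this thread.
function splitKeepingDelimiters(text, pattern) {
  // Copy the pattern with the global flag so lastIndex advances between exec calls.
  // Note: other flags on the original pattern are dropped by this copy.
  var re = new RegExp(pattern.source, "g");
  var parts = [];
  var idx = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    // Push the text before the delimiter, then the captured delimiter itself.
    parts.push(text.substring(idx, m.index), m[1]);
    idx = re.lastIndex;
  }
  // Whatever follows the final delimiter.
  parts.push(text.substring(idx));
  return parts;
}

var parts = splitKeepingDelimiters("One: two. Three", /([:.;?!] )/);
// parts is ["One", ": ", "two", ". ", "Three"]
```

In a standards-compliant engine this should produce the same array as calling split with the same capturing pattern.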
Darren Ferguson (May 22, 2008 at 2:48 am)
Wouldn’t it be better if people just learned to write?
How lazy is it to have a machine correct case for you!
Ryan Tenney (May 22, 2008 at 3:00 am)
The cryptic char is because it’s UTF-8, and your browser isn’t picking up on it.
Just ported this to PHP: http://www.ryantenney.com/titlecase.php
Ryan Berdeen (May 22, 2008 at 3:08 am)
Why recreate the regular expressions each time through the loop (or each time the function is invoked)? Seems sloppy; am I missing something?
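One way to act on this point, sketched under assumptions: build the small-word pattern once, outside the function, and reuse it on every call. The names `SMALL_WORDS`, `SMALL_RE` and `isSmallWord` are made up for illustration and do not come from the script in the post.

```javascript
// Sketch: compile the pattern once at load time instead of on every call.
var SMALL_WORDS = "a|an|and|as|at|but|by|en|for|if|in|of|on|or|the|to|v[.]?|via|vs[.]?";

// No "g" flag here: a shared global regex carries lastIndex state between
// test() calls, which is an easy source of bugs when a regex is reused.
var SMALL_RE = new RegExp("^(" + SMALL_WORDS + ")$", "i");

function isSmallWord(word) {
  return SMALL_RE.test(word);
}
```

Whether this matters for performance depends on the engine and on how often the function runs; it mostly keeps the word list in one obvious place.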
Asbjørn Ulsberg (May 22, 2008 at 3:12 am)
Paul D. is on to something quite clever. This could be implemented as a jQuery plugin that automatically wades through the DOM to find all elements with text-transform set to capitalize and fixes the capitalization so that it matches capitalization style guidelines instead of just stupidly capitalizing every word. I’m not sure how such a plugin would behave and much less perform, but it’s very interesting.
lo j (May 22, 2008 at 6:42 am)
well, and what if we let user decide about capitalisation?
it’s more often useful to lowercase a well human-capitalized title than the opposite. I personally think this is an absolutely unuseful spend of time an processing power… unless you learn something from doing it…
millionmonkey (May 22, 2008 at 7:34 am)
After working with thousands of customers who want to manage their own website – I know that many just don’t pay attention to the details. Tools like this script help make things consistent across their website. Further, websites owned by small businesses are frequently maintained by “someone” in the office. I’m sure most of these folks would appreciate this kind of help. And the cool thing is you can deactivate a script if the end user wants to control their own capitalization.
5 stars John!
John Resig (May 22, 2008 at 8:50 am)
@Jon, David: I’ve re-worked my solution to work in all browsers and uploaded the new version. It’s slightly more convoluted, but it’s all there. Thanks for the input.
Chris (May 22, 2008 at 9:06 am)
“Wouldn’t it be better if people just learned to write? How lazy is it to have a machine correct case for you!”
I have 850 press releases with headlines in allcaps that I’m moving to a new CMS. Allcaps looks like shit and is hard to read, so I want to convert them (properly) into title case. A tool like this is a god-send.
Also, it WOULD be better if people learned to write, but the vast majority of them never will ;-)
John Lascurettes (May 22, 2008 at 9:14 am)
Why this:
(a|an|and|as|at|but|by|en|for|if|in|of|on|or|the|to|v[.]?|via|vs[.]?)
instead of this:
(a|an|and|as|at|but|by|en|for|if|in|of|on|or|the|to|vs?[.]?|via)
Stephen (May 22, 2008 at 10:10 am)
Or:
(a(nd?|s|t)?|b(ut|y)|en|for|i(f|n)|o(f|n|r)|t(he|o)|v(s?[.]?|ia))
John Resig (May 22, 2008 at 10:14 am)
@John Lascurettes: Sure, that could work – I was just copying the regexp from Gruber’s original example.
@Stephen: I think a big part of the list is keeping it readable (so that you can actually determine what words are a part of it).
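A quick way to check that the three alternations in this exchange accept the same words is to anchor each one and compare them on a sample list. This is a sketch, not a proof: it only tests the samples given, and the helper names `anchored` and `disagreements` are made up for illustration.

```javascript
// The three alternations from the comments above, as strings.
var original = "a|an|and|as|at|but|by|en|for|if|in|of|on|or|the|to|v[.]?|via|vs[.]?";
var shorter  = "a|an|and|as|at|but|by|en|for|if|in|of|on|or|the|to|vs?[.]?|via";
var compact  = "a(nd?|s|t)?|b(ut|y)|en|for|i(f|n)|o(f|n|r)|t(he|o)|v(s?[.]?|ia)";

// Anchor an alternation so it must match the whole word.
function anchored(alternation) {
  return new RegExp("^(?:" + alternation + ")$");
}

// Return the words on which the patterns do not all agree with the first one.
function disagreements(patterns, words) {
  var res = patterns.map(anchored);
  return words.filter(function (w) {
    var expected = res[0].test(w);
    return res.some(function (re) { return re.test(w) !== expected; });
  });
}

var samples = [
  "a", "an", "and", "as", "at", "but", "by", "en", "for", "if", "in", "of",
  "on", "or", "the", "to", "v", "v.", "via", "vs", "vs.",
  // Words that none of the patterns should accept.
  "with", "is", "be", "ant", "vi", "vs..", "bu"
];

var differing = disagreements([original, shorter, compact], samples);
// differing is [] for these samples
```

The compact form passes the check, but it takes a test like this to see that, which rather supports the readability argument.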
Wade Harrell (May 22, 2008 at 10:38 am)
RE: “Wouldn’t it be better if people just learned to write?”
Even better if use cases were considered before making comments like that! i.e. Dynamically generated section title “Specifications for [prodType] Model [prodName]”, content displayed via ajax based on user choices made via select lists. Both values can be one or more words and may or may not be capitalized. A javascript solution to deal with the display of the generated title is ideal.
david gouch (May 22, 2008 at 10:40 am)
I wrote a version that’s pretty sweet: http://individed.com/code/to-title-case/
It’s short, fast and works in IE. I’d appreciate feedback.
@John Resig: Your script doesn’t capitalize a small word directly after an opening double quote: My Review of "The Lottery" becomes My Review of “the Lottery”.
david gouch (May 22, 2008 at 10:44 am)
Sorry, the input above should be My Review of “The Lottery”.
coyote (May 22, 2008 at 11:16 am)
@gouch: Your script is a little *too* optimized, in that it is rather obtuse and hard to understand. Could you release a more accessible version of your code?
David Lindquist (May 22, 2008 at 11:23 am)
@coyote: I agree that David’s code is a bit hard to parse, but I think it is wonderful in its conciseness; very “JavaScript-esque”. And it handles all the test cases, and then some.
huxley (May 22, 2008 at 11:51 am)
Some names might trip the script up … for example von, van, van der in Germanic names (the rules might get complicated a bit because Van in Vietnamese is normally capitalized).
david gouch (May 22, 2008 at 12:42 pm)
@coyote: I added an explanation of the code to my site.
Joe di Stefano (May 22, 2008 at 1:28 pm)
@lo j: In just three lines you managed to exhibit poor grammar, spelling, punctuation and _capitalization_. Clearly, a tool such as this is completely “unuseful”.
Binny V A (May 22, 2008 at 1:48 pm)
I don’t need the JS version of this – but I need this in PHP. Thanks Ryan Tenney.
schnuck (May 22, 2008 at 2:55 pm)
fantastic – and already implemented. a plug-in version would be the icing.
lo j (May 22, 2008 at 4:37 pm)
@Joe di Stefano: ;) you got me! by the way, i do my best, english is not my language.
a*p (May 23, 2008 at 9:39 am)
Why do you even need this? Aren’t you writing your titles and capping them appropriately without a script?
Breton (May 24, 2008 at 4:16 am)
Bah, who needs spell-check? Shouldn’t people just learn to spell properly to begin with?
Joan Pieda (May 26, 2008 at 2:38 pm)
@JohnResig, @DavidGouch: Hehe this is interesting. How is it supposed to parse this text?
Turn me into a” title”
Darren Ferguson (May 27, 2008 at 10:19 am)
@Wade Harrell don’t see what your point is with that use case at all.
jonathan (May 30, 2008 at 12:40 pm)
The problem with this whole idea is, though, that there’s no correct LIST of words not to capitalize in titles. These things are decided by a word’s part of speech, not the word itself. The best set of rules (if you’re going to have rules governing which words are capitalized and which aren’t—just capitalizing ALL the words in a title is certainly a legitimate style to use) to govern title capitalization are these:
· The following parts of speech are not capitalized: articles, coordinating conjunctions, and single-syllable prepositions
· The first and last words of a title are always capitalized, regardless of their parts of speech
· The first and last words of titles within titles are always capitalized, regardless of their parts of speech
So, what does that all add up to? Certain words—’to’ and ‘yet’ tend to be good examples—are capitalized sometimes but not others. The example in the original post, actually, includes an instance of this. In “Nothing To Be Afraid Of” ‘to’ should BE capitalized as it’s not a preposition but rather part of an infinitive (‘to be’—and no one can make a good argument for capitalizing half of an infinitive but not the other half). ‘Yet’ can be used either as an adverb or as a coordinating conjunction, and adverbs should always be capitalized.
Anyway, I think my point is that the best course of action would BE for people to learn to write.
Steve S (June 4, 2008 at 6:54 pm)
@jonathan: i think that you’re right.
i’ve spent some time on a similar function, and gave up when i realized that i would need not just grammar (as you point out), but actually AI.
“Famous Actor Not To Perform in the Tempest”… interesting, but probably not intended.
Tim (June 12, 2008 at 9:55 pm)
Anyone have something like this for MySQL? I just inherited a database with city and county names in all-caps. Title case would be good enough, though admittedly some names wouldn’t end up perfect.
@Ryan Tenney thanks for the php version. Yours is probably what I’ll use to fix things up.