There's a code snippet in the ABS (Advanced Bash-Scripting) guide.
#!/bin/bash
# uppercase.sh : Changes input to uppercase.

tr 'a-z' 'A-Z'
#  Letter ranges must be quoted
#+ to prevent filename generation from single-letter filenames.

exit 0

Why do I have to quote the letter ranges?
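The snippet's comment is wrong (as is much of the ABS; it's a very poor reference and should not be used). Without square brackets, a-z and A-Z contain none of the characters the shell treats as glob syntax (*, ?, [), so pathname expansion has nothing to act on and tr receives them unchanged whether they're quoted or not.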
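A minimal sketch to check this, assuming bash and a scratch directory created with mktemp: even with single-letter files present, the unbracketed words come through untouched, and tr works fine unquoted.

```bash
#!/bin/bash
# Sketch: an unbracketed range like a-z contains none of the glob
# metacharacters (*, ?, [), so the shell never tries to expand it,
# quoted or not, even when single-letter files are present.
export LC_ALL=C                  # assumption: C locale, for deterministic tr ranges
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a b c A B C                # single-letter filenames, as in the ABS comment

words=$(echo a-z A-Z)                  # passed through unchanged
upper=$(echo hello | tr a-z A-Z)       # tr receives a-z and A-Z as-is

echo "$words"    # a-z A-Z
echo "$upper"    # HELLO

cd / && rm -rf "$dir"
```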
If there were square brackets:
tr [A-Z] [a-z]
...then you'd have a concern about [A-Z] matching files named A, B, etc. For a more visible demonstration, try this:
mkdir -p ~/tmp
cd ~/tmp
touch A B C
echo tr [A-Z] [a-z]
...and see what it emits.
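The same experiment as a self-contained sketch, assuming bash and the C locale (set explicitly so the ranges are plain ASCII and the result doesn't depend on locale data). It puts the unquoted and quoted forms side by side rather than running tr itself.

```bash
#!/bin/bash
# Sketch: bracketed ranges ARE glob patterns, so an unquoted [A-Z] can be
# replaced by matching single-letter filenames before tr ever runs.
export LC_ALL=C                  # assumption: C locale, so ranges are plain ASCII
dir=$(mktemp -d)
cd "$dir" || exit 1
touch A B C

unquoted=$(echo tr [A-Z] [a-z])     # [A-Z] expands; [a-z] matches nothing, stays literal
quoted=$(echo tr '[A-Z]' '[a-z]')   # quoting suppresses pathname expansion

echo "unquoted: $unquoted"   # unquoted: tr A B C [a-z]
echo "quoted:   $quoted"     # quoted:   tr [A-Z] [a-z]

cd / && rm -rf "$dir"
```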
As a note -- it's possible to get in trouble here even without single-character filenames on your disk if the nullglob option is set. To demonstrate that:
rm -rf ~/tmp
mkdir -p ~/tmp
cd ~/tmp
shopt -s nullglob
echo tr [A-Z] [a-z]
...and you'll see that tr would be invoked with no arguments at all, since [A-Z] and [a-z] are both interpreted as glob expressions that don't match any files, and nullglob tells the shell to simply replace such glob expressions with nothing at all.
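A sketch of the nullglob case, assuming bash and an empty scratch directory. count_args is a hypothetical helper standing in for tr; it only prints how many arguments it was given, which makes the vanishing patterns visible.

```bash
#!/bin/bash
# Sketch: with nullglob on, an unmatched glob is removed entirely.
# count_args is a hypothetical helper that reports its argument count.
count_args() { echo "$#"; }

dir=$(mktemp -d)                # empty directory: no filename can match
cd "$dir" || exit 1
shopt -s nullglob

unquoted=$(count_args [A-Z] [a-z])     # both patterns match nothing -> removed
quoted=$(count_args '[A-Z]' '[a-z]')   # quoted words are never globbed

shopt -u nullglob
echo "unquoted: $unquoted"   # unquoted: 0
echo "quoted:   $quoted"     # quoted:   2

cd / && rm -rf "$dir"
```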
To be clear -- glob expansion has nothing to do with tr specifically; the shell would change an unquoted [A-Z] to a list of single-character filenames matching the pattern no matter what program is being run.
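One way to see that the shell, not tr, does this: the sketch below uses printf in place of tr, then turns pathname expansion off with set -f (the standard noglob option; a technique not used in the answer above), after which [A-Z] reaches any command literally. It assumes bash, the C locale, and a scratch directory.

```bash
#!/bin/bash
# Sketch: pathname expansion happens in the shell before any command runs,
# so printf (or any other program) sees the same expanded words tr would.
export LC_ALL=C                  # assumption: C locale for a stable sort order
dir=$(mktemp -d)
cd "$dir" || exit 1
touch A B C

with_glob=$(printf '%s,' [A-Z])      # shell expands [A-Z] to A B C first

set -f                               # disable pathname expansion in this shell
without_glob=$(printf '%s,' [A-Z])   # now [A-Z] is passed through literally
set +f                               # re-enable it

echo "$with_glob"      # A,B,C,
echo "$without_glob"   # [A-Z],

cd / && rm -rf "$dir"
```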
///////////////////////////////////////////////////////////////////////////
Note that when using range expressions like [a-z], letters of the other case may be included, depending on the setting of LC_COLLATE.
LC_COLLATE is a variable which determines the collation order used when sorting the results of pathname expansion, and determines the behavior of range expressions, equivalence classes, and collating sequences within pathname expansion and pattern matching.
Consider the following:
$ touch a A b B c C x X y Y z Z
$ ls
a A b B c C x X y Y z Z
$ echo [a-z] # Note the missing uppercase "Z"
a A b B c C x X y Y z
$ echo [A-Z] # Note the missing lowercase "a"
A b B c C x X y Y z Z
Notice that when echo [a-z] is run, you would expect only the lowercase filenames; likewise, echo [A-Z] would be expected to list only the uppercase ones.
Standard collations for locales such as en_US have the following order:
aAbBcC...xXyYzZ
- Between a and z (in [a-z]) are ALL uppercase letters, except for Z.
- Between A and Z (in [A-Z]) are ALL lowercase letters, except for a.
See:
aAbBcC[...]xXyYzZ
|              |
from a to z

aAbBcC[...]xXyYzZ
 |              |
 from A to Z
If you change the LC_COLLATE variable to C, it looks as expected:
$ export LC_COLLATE=C
$ echo [a-z]
a b c x y z
$ echo [A-Z]
A B C X Y Z
So, it's not a bug, it's a collation issue.
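A quick way to see why C behaves differently, assuming sort and paste are available: in the C locale, collation is plain byte order, so every uppercase ASCII letter sorts before every lowercase one and the two ranges cannot overlap. On glibc systems with en_US.UTF-8 installed, running the same pipeline with LC_ALL=en_US.UTF-8 should typically give the interleaved a A b B z Z order instead (not asserted here, since it depends on installed locales). Note also that bash 5.0 and later enable the globasciiranges shell option by default, which makes bracket ranges compare as in the C locale, so depending on your bash version you may not see the interleaved echo output at all.

```bash
#!/bin/bash
# Sketch: in the C locale, collation is byte order, so uppercase
# A-Z (bytes 65-90) all sort before lowercase a-z (bytes 97-122).
c_order=$(printf '%s\n' a A b B z Z | LC_ALL=C sort | paste -sd ' ' -)
echo "$c_order"    # A B Z a b z
```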
Instead of range expressions, you can use POSIX-defined character classes such as [:upper:] or [:lower:] (written inside a bracket expression, e.g. [[:lower:]]). They also work with different LC_COLLATE configurations, and even with accented characters:
$ echo [[:lower:]]
a b c x y z à è é
$ echo [[:upper:]]
A B C X Y Z
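A sketch of the class-based approach, assuming bash, the C locale, and a case-sensitive filesystem (on a case-insensitive one, touch a A creates a single file). It also uses the same classes in tr, which gives a version of uppercase.sh with no ranges at all.

```bash
#!/bin/bash
# Sketch: POSIX character classes name a set of characters directly,
# so they don't depend on where letters fall in the collation order.
export LC_ALL=C                 # assumption: C locale, ASCII-only filenames
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a A b B z Z               # assumption: case-sensitive filesystem

lower=$(echo [[:lower:]])       # a b z
upper=$(echo [[:upper:]])       # A B Z

# The same classes work in tr: a range-free version of uppercase.sh.
shouted=$(echo 'hello, world' | tr '[:lower:]' '[:upper:]')   # HELLO, WORLD

echo "$lower"
echo "$upper"
echo "$shouted"

cd / && rm -rf "$dir"
```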