Bash — Filename expansion

Sergio Daniel Cortez Chavez
Dec 6, 2021

One of the biggest problems when starting out in the Linux ecosystem is the huge number of new concepts, together with the large number of sources that refer to the same concept in different ways.

Today, we are going to explain the concept of filename expansion in bash and how it differs from regular expressions. The two concepts are often confused, but their differences become evident in practice.

Defining patterns


Filename expansion, or globbing, is the process of defining a pattern with a string of special characters, generally known as wildcards or metacharacters, and capturing the filenames that match this pattern. For example, the pattern [0-9] matches any single one of the following characters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

A short list of the metacharacters used by the filename expansion mechanism:

  • *. Matches any sequence of characters, including the null string.
  • ?. Matches any single character. The difference from * is that ? matches exactly one character, while * matches a sequence of any length.
  • [...]. Matches any one of the characters listed between the brackets. This list of characters is sometimes called a character class.

The bracket expression can be considered the most complex metacharacter because it encapsulates several behaviors, for example:

  1. It accepts an explicit set of characters, or a range of characters defined with the - symbol. Any character in the set or within the range is accepted.
  • [A-Z]. Matches any uppercase letter.
  • [0-9]. Matches any digit in the range 0 to 9.
  • [a-zA-Z]. Matches any letter, uppercase or lowercase.
  • [bash]. Matches any character in the set {b, a, s, h}.

  2. It also supports the character classes defined in the POSIX standard. A class name is written inside the bracket expression, for example [[:alnum:]]:

  • [[:alnum:]]. Matches any alphanumeric character. Similar to [A-Za-z0-9].
  • [[:alpha:]]. Matches any letter. Similar to [A-Za-z].
  • [[:space:]]. Matches any whitespace character (space, \n, \t, \v, etc.). Similar to [ \t\n\r\f\v].
  • [[:digit:]]. Matches any digit. Similar to [0-9].
  • [[:upper:]]. Matches any uppercase letter. Similar to [A-Z].
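The bracket expressions above can be tried without touching any file, because the case statement in bash uses the same pattern matching rules as filename expansion. The following is only a small sketch: the classify function is a hypothetical helper written for this post, and the locale is pinned to C on the assumption that you want ranges such as [a-z] to behave predictably.

```bash
#!/usr/bin/env bash
# Pin the locale so that ranges such as [a-z] are compared byte by byte
# (an assumption of this sketch, not a requirement of bash).
export LC_ALL=C

# classify: hypothetical helper that reports the first pattern
# a single character matches. Patterns are tried from top to bottom.
classify() {
  case "$1" in
    [[:upper:]]) echo "upper" ;;              # POSIX class inside brackets
    [[:digit:]]) echo "digit" ;;
    [[:space:]]) echo "space" ;;
    [bash])      echo "one of b, a, s, h" ;;  # explicit set of characters
    [a-z])       echo "lower" ;;              # range of characters
    *)           echo "other" ;;
  esac
}

classify "Q"   # upper
classify "7"   # digit
classify " "   # space
classify "s"   # one of b, a, s, h
classify "x"   # lower
classify "_"   # other
```

Note that the order of the branches matters: s is also a lowercase letter, but the [bash] branch is tested first.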

Resolution of pattern expansion

When you execute a command, Bash scans each word for these special metacharacters. If one of them appears and is not quoted, the word is regarded as a pattern and replaced with an alphabetically sorted list of the filenames that match the pattern.
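To see the effect of quoting, here is a minimal sketch that works in a scratch directory. The directory and the file names (notes.txt, todo.txt) are invented for the example, and the locale is pinned to C only to keep the sort order predictable.

```bash
#!/usr/bin/env bash
export LC_ALL=C

# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes.txt todo.txt

# Unquoted: the shell replaces the word with the sorted list of matches
# before echo runs.
unquoted=$(echo *.txt)

# Quoted or escaped: the metacharacter is taken literally and
# the word reaches echo unchanged.
quoted=$(echo "*.txt")
escaped=$(echo \*.txt)

echo "unquoted: $unquoted"   # unquoted: notes.txt todo.txt
echo "quoted:   $quoted"     # quoted:   *.txt
echo "escaped:  $escaped"    # escaped:  *.txt

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

The important detail is that the command itself never sees the pattern in the unquoted case; it only receives the resulting filenames.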

If no matching filenames are found, bash checks the value of some shell options.

  1. If the shell option nullglob is disabled, the word is left unchanged and passed to the command.
  2. If the shell option nullglob is set, the word is removed.
  3. If the shell option failglob is set, an error message is printed and the command is not executed.
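The three cases above can be compared side by side with a pattern that matches nothing. This is a sketch under a few assumptions: it runs in an empty scratch directory, it toggles the options with the shopt builtin, and the failglob case is run inside a subshell so that the error does not interrupt the rest of the script.

```bash
#!/usr/bin/env bash

# An empty scratch directory: nothing can match *.log here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Case 1: nullglob and failglob disabled (the default).
# The pattern is kept as a literal word.
shopt -u nullglob failglob
default_words=( *.log )
echo "default:  ${#default_words[@]} word(s): ${default_words[*]}"

# Case 2: nullglob set. The word is removed, so the array is empty.
shopt -s nullglob
null_words=( *.log )
echo "nullglob: ${#null_words[@]} word(s)"
shopt -u nullglob

# Case 3: failglob set. Bash reports an error and echo is never run,
# so nothing is captured and the exit status is non-zero.
fail_out=$( (shopt -s failglob; echo *.log) 2>/dev/null )
fail_status=$?
echo "failglob: output='$fail_out' status=$fail_status"

cd / && rm -rf "$workdir"
```

The default case is the one that usually surprises people: a loop such as for f in *.log runs once with the literal string *.log when there are no matches.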

Note: case-insensitive pattern matching can be activated with the shell option nocaseglob.

A shell option is a setting that changes the behavior of the shell. In this case, the nullglob, failglob, and nocaseglob options affect filename expansion in bash.
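As a quick illustration of nocaseglob, the sketch below expands the same pattern with the option off and on. The file names are made up for the example, and the option is toggled with the shopt builtin.

```bash
#!/usr/bin/env bash
export LC_ALL=C

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch README.md Report.pdf readme.txt

# Default: matching is case sensitive, so r* only sees the lowercase name.
shopt -u nocaseglob
sensitive=( r* )
echo "case sensitive:   ${sensitive[*]}"

# With nocaseglob, r* also matches names that start with an uppercase R.
shopt -s nocaseglob
insensitive=( r* )
echo "case insensitive: ${#insensitive[@]} files: ${insensitive[*]}"
shopt -u nocaseglob

cd / && rm -rf "$workdir"
```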

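Some shell options are controlled with the set command, but the glob-related options mentioned here (nullglob, failglob, and nocaseglob) are managed by the shopt builtin. For example, to activate nullglob, I use:

Command:

shopt -s nullglob

To deactivate this option, I replace -s with -u.

Command:

shopt -u nullglob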
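When an option is changed inside a function, it is usually a good idea to put it back the way it was. The following sketch shows one possible way to do that with the shopt builtin, where shopt -p prints the current state as a reusable command and shopt -q queries it silently. The count_txt function is a hypothetical helper invented for this example.

```bash
#!/usr/bin/env bash

# count_txt: hypothetical helper that counts the .txt files in a directory.
# It needs nullglob so that "no matches" yields an empty list, and it
# restores the previous state of the option before returning.
count_txt() {
  local dir=$1
  local restore
  local -a files

  restore=$(shopt -p nullglob)   # e.g. "shopt -u nullglob"
  shopt -s nullglob
  files=( "$dir"/*.txt )
  eval "$restore"                # put the option back as it was

  echo "${#files[@]}"
}

workdir=$(mktemp -d)
mkdir "$workdir/empty"
touch "$workdir/a.txt" "$workdir/b.txt"

shopt -u nullglob
count_txt "$workdir"         # 2
count_txt "$workdir/empty"   # 0

# The option is still off after the calls.
if shopt -q nullglob; then echo "nullglob is on"; else echo "nullglob is off"; fi

rm -rf "$workdir"
```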

Examples

Suppose that you are in a directory with these files:

file0  file1  file10  file2  file3  file4  file5  file6  file7  file8  file9

These files have something in common: they all share the file prefix and are numbered.

To get a range of files, you can use a character class and specify the range, for example, all the files that end with a digit between 1 and 3.

Example:

ls file[1-3]

Output:

file1  file2  file3

As mentioned, ? matches any single character. If you use it twice, the pattern matches exactly two characters at that position. The pattern file?? therefore matches every filename that starts with file followed by exactly two more characters. Here, the only file that matches is file10.

Example:

ls file??

Output:

file10

You can also use the * character to match any sequence of characters. The pattern f* matches every filename that starts with f, which in this case means all the files.

Example:

ls f*

Output:

file0  file1  file10  file2  file3  file4  file5  file6  file7  file8  file9
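If you want to reproduce these examples without touching your own files, the following sketch recreates the directory in a temporary location and stores each expansion in an array. The scratch directory and the array names are assumptions of the example, and the locale is pinned to C so the sort order is predictable.

```bash
#!/usr/bin/env bash
export LC_ALL=C

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Brace expansion creates file0, file1, ..., file10.
touch file{0..10}

range=( file[1-3] )   # names ending with one digit between 1 and 3
two=( file?? )        # names with exactly two characters after "file"
all=( f* )            # every name that starts with f

echo "file[1-3]: ${range[*]}"   # file[1-3]: file1 file2 file3
echo "file??:    ${two[*]}"     # file??:    file10
echo "f*:        ${#all[@]} files, starting with: ${all[*]:0:4}"
# f*:        11 files, starting with: file0 file1 file10 file2

cd / && rm -rf "$workdir"
```

The last line also explains why file10 appears before file2 in the output: the matches are sorted as text, character by character, and not by their numeric value.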

So far, all the examples use ordinary filenames, but there are cases where you need to expand the names of dotfiles. These are generally configuration files that live in the home directory and whose names start with a dot, for example .vimrc. By default, these files are hidden and are not considered by the filename expansion process, unless the pattern matches the leading dot explicitly or the dotglob shell option is set.

The Bash reference manual adds the following:

The filenames . and .. are always ignored when GLOBIGNORE is set and not null. However, setting GLOBIGNORE to a non-null value has the effect of enabling the dotglob shell option, so all other filenames beginning with a ‘.’ will match.
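The behavior with dotfiles can be checked with a small sketch like the one below. It assumes a scratch directory with two invented dotfiles and one regular file, unsets GLOBIGNORE so that it does not interfere, and uses the shopt builtin to toggle dotglob.

```bash
#!/usr/bin/env bash
export LC_ALL=C
unset GLOBIGNORE

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch .vimrc .bashrc notes.txt

# Default: * skips every name that begins with a dot.
shopt -u dotglob
visible=( * )
echo "default *:   ${visible[*]}"    # default *:   notes.txt

# A pattern that spells out the leading dot does match dotfiles.
explicit=( .*rc )
echo "explicit .:  ${explicit[*]}"   # explicit .:  .bashrc .vimrc

# With dotglob set, * also matches dotfiles (but still not . and ..).
shopt -s dotglob
with_dot=( * )
echo "dotglob *:   ${with_dot[*]}"   # dotglob *:   .bashrc .vimrc notes.txt
shopt -u dotglob

cd / && rm -rf "$workdir"
```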

Differences between glob and regex

Regular expressions rely on the same idea: patterns built from metacharacters that match strings, which makes it possible to validate input data, extract information, and so on. The concept of a regex is more general, though, and it is used by many commands and programming languages.

Regular expressions are much more expressive, but also more variable. Every program that offers regular expressions uses its own implementation and, in some cases, it is significantly different from the implementation found in other languages or commands.

Some of the metacharacters used by filename expansion are also used by regex, but regex tends to have more metacharacters and extra concepts that are more general. The biggest problem is that the * and ? characters have completely different meanings from their filename expansion counterparts. In a regex, * means zero or more occurrences of the previous element, so 10* will match 1, 10, 100, 1000, and so on. Similarly, in most regex dialects ? means zero or one occurrence of the previous element, so 10? matches 1 and 10.

To make matters even stranger, starting with version 3, bash offers regular expressions in scripting through the =~ operator of [[ ]], so a single script can contain both globs and regular expressions, and both are interpreted by bash.
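To make the difference concrete, the sketch below tests the same strings against the text 10* twice inside [[ ]]: once as a glob with == and once as a regular expression with =~. The anchors ^ and $ are added to the regex so that it has to match the whole string, which is an assumption of this example and not something a regex does by default.

```bash
#!/usr/bin/env bash

glob='10*'     # glob: "10" followed by any sequence of characters
re='^10*$'     # regex: "1" followed by zero or more "0", and nothing else

results=()
for s in 1 10 1000 10abc; do
  # An unquoted right-hand side of == is treated as a glob pattern.
  if [[ $s == $glob ]]; then g=match; else g=no; fi
  # An unquoted right-hand side of =~ is treated as a regular expression.
  if [[ $s =~ $re ]]; then r=match; else r=no; fi
  printf '%-6s glob: %-6s regex: %s\n' "$s" "$g" "$r"
  results+=( "$s:$g:$r" )
done
# 1      glob: no     regex: match
# 10     glob: match  regex: match
# 1000   glob: match  regex: match
# 10abc  glob: match  regex: no
```

The first and last lines show the two readings of *: as a glob it stands for any text after 10, while as a regex it only repeats the 0 that precedes it.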

Then there’s the fact that bash offers a different style of glob you can turn on with shopt -s extglob. These are actually closer to regular expressions, although the syntax is a bit reversed.
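Here is a short sketch of two extended patterns, assuming a scratch directory with a few invented file names. The pattern +([0-9]) means one or more digits, which a regex would write as [0-9]+ (this is the reversed syntax: the operator goes before the group), and !(file*) means anything that does not match file*. Note that extglob has to be enabled before bash parses the line that contains the pattern.

```bash
#!/usr/bin/env bash
export LC_ALL=C

# Enable extended globs before any line that uses them is parsed.
shopt -s extglob

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1 file22 file333 fileA notes.txt

# +(...) : one or more occurrences of the pattern inside the parentheses.
digits=( file+([0-9]) )
echo "file+([0-9]): ${digits[*]}"   # file+([0-9]): file1 file22 file333

# !(...) : anything except what the pattern inside the parentheses matches.
others=( !(file*) )
echo "!(file*):     ${others[*]}"   # !(file*):     notes.txt

cd / && rm -rf "$workdir"
```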

An excellent cheat sheet for regular expressions is the one by Dave Child.

Thanks for reading!

That is all for this post. I hope you enjoyed the content. See you in the next one.
