Sed by Example

Tuesday, November 1, 2022

We are in the middle of a multipart series. Each post focuses on one member of the command-line text-processing trifecta: Grep, Sed and Awk. In part 1, we introduced Grep, which allowed us to search and select text. Now, we will explore Sed.

Sed stands for stream editor. As its name implies, it manipulates and edits data files and streams that are piped into the program. In this post, we'll work through several examples.

As before, whenever "rhyme.txt" is referenced, assume it contains the following content:

Hickory dickory dock

The mouse ran up the clock

The clock struck one

The mouse ran down

Hickory dickory dock

-- "Hickory, Dickory, Dock" (public domain)

Example 1: Find and Replace

If we wanted to change the rhyme to be about cats, we could run the following command:

sed 's/mouse/cat/' rhyme.txt

Which would produce the following output:

Hickory dickory dock

The cat ran up the clock

The clock struck one

The cat ran down

Hickory dickory dock

Note that, by default, all lines are printed, even those that don't match the pattern.

As stated before, Sed is a stream editor. To have it operate on a data stream instead of a file, you can stream the file into Sed using one of the following commands:
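If you only want to see the lines that Sed actually changed, one option is to pair the "-n" option (which turns off automatic printing) with the "p" flag on the substitution. The sketch below is an illustration rather than part of the original example: it assumes the rhyme has no blank lines between verses, and it writes a temporary copy (the hypothetical variable "rhyme_file") so it does not depend on an existing rhyme.txt.

```shell
# Temporary stand-in for rhyme.txt (assumption: one verse per line, no blank lines).
rhyme_file=$(mktemp)
printf '%s\n' \
  'Hickory dickory dock' \
  'The mouse ran up the clock' \
  'The clock struck one' \
  'The mouse ran down' \
  'Hickory dickory dock' > "$rhyme_file"

# -n suppresses the default output; the trailing "p" prints a line
# only when the substitution succeeded on it.
changed=$(sed -n 's/mouse/cat/p' "$rhyme_file")
printf '%s\n' "$changed"

rm -f "$rhyme_file"
```

With the sample rhyme, only the two lines that mention the mouse should be printed.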

sed 's/mouse/cat/' < rhyme.txt

cat rhyme.txt | sed 's/mouse/cat/'

Both commands will produce the same output.
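As a quick sanity check, the three ways of feeding Sed (file argument, input redirection, and a pipe) can be compared directly. This is a minimal sketch: the temporary file and the variable names are made up for the illustration, and the two-line input is just a shortened version of the rhyme.

```shell
# Shortened stand-in for rhyme.txt, written to a temporary file.
input_file=$(mktemp)
printf '%s\n' 'The mouse ran up the clock' 'The mouse ran down' > "$input_file"

from_file=$(sed 's/mouse/cat/' "$input_file")        # file argument
from_redirect=$(sed 's/mouse/cat/' < "$input_file")  # shell redirects the file to stdin
from_pipe=$(cat "$input_file" | sed 's/mouse/cat/')  # another command writes to stdin

# Report whether all three results are identical.
if [ "$from_file" = "$from_redirect" ] && [ "$from_file" = "$from_pipe" ]; then
  echo "all three forms agree"
fi

rm -f "$input_file"
```

In the last two forms Sed never sees a file name; it simply reads whatever arrives on standard input.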

Example 2: Adding Commas

Let's say that we wanted to add a comma to the end of every line. Using Sed with regular expressions allows us to do this easily:

sed 's/$/,/' rhyme.txt

Remember that '$' in regular expressions matches the end of the line. In other words, we are replacing the end of the line with a comma, which is how you append content to lines in Sed.
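The distinction between an unescaped "$" (the end-of-line anchor) and an escaped "\$" (a literal dollar sign) is easy to get wrong, so here is a small sketch of both. The sample strings and variable names are invented for the illustration.

```shell
# Unescaped $ anchors at the end of the line, so the comma is appended.
appended=$(printf '%s\n' 'The clock struck one' | sed 's/$/,/')
printf '%s\n' "$appended"

# Escaped \$ matches an actual dollar sign in the text instead.
literal=$(printf '%s\n' 'costs $5' | sed 's/\$/USD /')
printf '%s\n' "$literal"
```

The first command should append a comma to the line; the second leaves the end of the line alone and replaces the dollar sign.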

Example 3: Adding Dashes

If we were interested in adding dashes between each word, we might be tempted to use the following regular expression:

sed 's/\s*/-/g' rhyme.txt

We would be correct in thinking that "\s" matches any white-space character (which is what we want). However, since "*" means match zero or more occurrences, the pattern also matches the empty string in front of every character. Sed therefore inserts a dash at each of those positions as well, producing the following output:

-H-i-c-k-o-r-y-d-i-c-k-o-r-y-d-o-c-k-

-T-h-e-m-o-u-s-e-r-a-n-u-p-t-h-e-c-l-o-c-k-

-T-h-e-c-l-o-c-k-s-t-r-u-c-k-o-n-e-

-T-h-e-m-o-u-s-e-r-a-n-d-o-w-n-

-H-i-c-k-o-r-y-d-i-c-k-o-r-y-d-o-c-k-
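The zero-occurrence behaviour is easier to see on a tiny input with a pattern that cannot match any character at all. The following sketch assumes GNU sed; the input "abc" and the variable name are made up for the illustration.

```shell
# "x*" finds no x anywhere, so every match is an empty string:
# one before each character and one at the end of the line.
dashed=$(printf '%s\n' 'abc' | sed 's/x*/-/g')
printf '%s\n' "$dashed"   # -a-b-c-
```

Every dash in the result comes from an empty match, which is exactly what happens between the letters of the rhyme above.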

We actually want to use "+" to match one or more occurrences (versus the zero or more matched by "*"):

sed -r 's/\s+/-/g' rhyme.txt

Which correctly produces:

Hickory-dickory-dock

The-mouse-ran-up-the-clock

The-clock-struck-one

The-mouse-ran-down

Hickory-dickory-dock

Please note:

You must run sed with the "-r" flag to enable the extended regular expression syntax. Without it, an unescaped "+" is treated as a literal plus sign.
By default, Sed only replaces the first match on each line. Since we want our pattern to apply multiple times per line, we have to append the "g" (global) flag to the end of the command.
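Both notes can be checked on a single line of the rhyme. This is a sketch assuming GNU sed (where "-E" is accepted as an equivalent spelling of "-r"); the variable names are only for illustration.

```shell
line='The mouse ran down'

# Extended syntax, no "g": only the first run of white space is replaced.
first_only=$(printf '%s\n' "$line" | sed -r 's/\s+/-/')
printf '%s\n' "$first_only"   # The-mouse ran down

# Extended syntax with "g": every run of white space is replaced.
every=$(printf '%s\n' "$line" | sed -r 's/\s+/-/g')
printf '%s\n' "$every"        # The-mouse-ran-down

# No "-r": "+" is a literal plus sign, so nothing matches and the line is unchanged.
no_flag=$(printf '%s\n' "$line" | sed 's/\s+/-/g')
printf '%s\n' "$no_flag"      # The mouse ran down
```

Note that the last command fails silently: Sed reports no error, it just finds nothing to replace.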

Example 4: Grouping Word Pairs

Let's take the previous example one step further. Instead of simply adding dashes, we want to use dashes to group every other word together. For example, instead of producing:

The-mouse-ran-up-the-clock

We want Sed to output:

The-mouse ran-up the-clock

To do this, we have to use the following Sed command:

sed -r 's/(\w+)\s+(\w+)/\1-\2/g' rhyme.txt

Which will produce:

Hickory-dickory dock

The-mouse ran-up the-clock

The-clock struck-one

The-mouse ran-down

Hickory-dickory dock

There are two new concepts introduced by our Sed command:

We define a sub-pattern group using parentheses. This allows us to refer back to part of the matched pattern without having to refer to the entire match.
To reference a sub-pattern group, use "\N" where "N" is "1" for the first group, "2" for the second group, etc.
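Because each group is captured separately, the replacement can also reorder the groups, not just join them. The sketch below assumes GNU sed (for "\w" and "\s"); the word-swapping command is an extra illustration and not part of the original example.

```shell
# Same command as above, applied to a single line: join each pair of words.
pairs=$(printf '%s\n' 'The mouse ran up the clock' | sed -r 's/(\w+)\s+(\w+)/\1-\2/g')
printf '%s\n' "$pairs"     # The-mouse ran-up the-clock

# Swap the first two words by emitting group 2 before group 1.
# Without "g", only the first pair on the line is affected.
swapped=$(printf '%s\n' 'Hickory dickory dock' | sed -r 's/(\w+)\s+(\w+)/\2 \1/')
printf '%s\n' "$swapped"   # dickory Hickory dock
```

The white space between the two words is matched but sits outside both groups, which is why the replacement has to supply its own separator.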

Keep Learning!

Sed is insanely powerful! See below for sites where you can learn more.

Further Reading: