Search and replace across many files with a perl one-liner

Posted by Anonymous on Fri 25 Nov 2005 at 14:52

Hello nerdlings! Steve has gone jolly wild with perl this festive season. And to add to the spirit, here's a piece from me on how to do a search and replace across many files in one line of perl, for several kinds of cases:

  1. The good - the simple case
  2. The ugly - the multi-line case
  3. The bad - the stupid case
  4. Executive summary

1. The good - the simple case

Sometimes you want to change the same pattern in a lot of files. How can you do that?

Now, vi and emacs can both open a lot of files, e.g. vi *.html, and you can then do a search and replace on the same pattern in each one. But here's a neat perl trick that sometimes comes in useful:

Suppose you have a bunch of files (file1, file2, file3, etc.) which all contain the same chunk of text, e.g. the text:

Newspapers are about news.

In a fit of disgust at the tripe that is printed in the Sunday Spurt, you decide to change the statement to

Newspapers are about selling advertising space.

This perl in-place edit one-liner comes in handy here:

perl -pi -e 's/about news\.$/about selling advertising space\./' file*

Tada! You're done.

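The flags (see perldoc perlrun) work like this:

-p  loop over every line of every file named on the command line, run the -e code on each line, then print the line
-i  edit the files in place (use -i.bak instead to keep a backup copy of each original)
-e  run the following string as the perl program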

2. The ugly - the multi-line case

But what if you originally had three lines in each file? Imagine you were a zombie, and that your actions were controlled by a MegaHurtz MindNumboJumbo Zombifier Ray. This Zombifier ray is being wielded by a cackling mad scientist Frankenstein stereotype, and he has had you write the following three lines in every file, for some unfathomable mad scientist type experiment:

Newspapers are about news.
McDonalds is a fast food restaurant.
Google is a search engine.

Note that the lines are next to each other (not scattered in separate places in the file).

Now, while the mad scientist is attending to his Van de Graaff generator in the other room, Igor, the shambling, drooling, mumbling assistant to the mad scientist, makes a hat to amuse himself. Then he puts this hat on your head, and has a giggle at your appearance. But what Igor doesn't realise is that the hat, made of tin foil, repels the mind control zombifier waves. Suddenly, you are free to do whatever you like! And, overwhelmed by feelings of extreme cynicism, you decide to take revenge! Yes! You decide to change the lines to:

Newspapers are about selling advertising space.
McDonalds is a real estate business.
Google is a data-acquisition corporation.

That will show them!

Ah. Well, doing this transformation turns out to need rather more abstruse perl magic. The previous incantation (perl -pi -e 's///' file*) won't work, because -p hands the code one line at a time: the input is split at every newline (the record separator), so a pattern that spans several lines never gets to see them together.

Now, the -0 flag in perl sets the input record separator, given as an octal character code (by default the separator is the newline, so each record is one line of a text file). The value 0777 is a special case: no byte can have that value, so perl stops splitting and swallows each file whole in one big gulp, and the search and replace can then run over many lines at once. So:

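perl -p0777i -e 's/about news\.\n^McDonalds is a fast food restaurant\.\n^Google is a search engine\.\n/about selling advertising space.\nMcDonalds is a real estate business.\nGoogle is a data-acquisition corporation.\n/m' file*

is the (rather long) one-liner that will do the job. The m modifier lets ^ match at the start of every line inside the slurped text, not just at the start of the whole file. Notice that the lines are joined with \n^ rather than $\n: inside a regex, perl reads $\ as the output record separator variable (empty by default), so $\n would quietly match a literal "n" instead of an end-of-line followed by a newline.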

So, now you go ahead and edit all the files that contain this pattern in an instant, thereby foiling the mad scientist's evil experiment before he even comes back. Vengeance is yours!

3. The bad - the stupid case

One important caveat, before you go all wild and crazy and start applying this technique to every situation.

Perl in-place editing, while handy, does not usually scale up very well, especially if you have single and double quotes to handle. The bash shell has some interesting quirks (including that you cannot escape a single quote within single quotes), which can result in eye-bleeding constructions like this example:

Task:

Replace this text, which appears inside a bunch of *.php files:

if (preg_match('/bsd.example.com/i', $httpbase)) {
  echo "  _uacct=\"UA-12345-2\";\n" ;
} elseif (preg_match('/beta.example.com/i', $httpbase)) {
  echo "  _uacct=\"UA-12345-3\";\n" ;
} elseif (preg_match('/example.com/i', $httpbase)) {
  echo "  _uacct=\"UA-12345-1\";\n" ;
}
 
with:

echo "  _uacct=\"$uacct\";\n";

Solution:

An appropriate in-place edit in bash is then:

perl -p01000i -e "s/if \(preg_match\('\/bsd\.example\.com\/i', \\\$httpbase\)\) \{\n  echo \"  _uacct=\\\\\"UA-12345-2\\\\\";\\\n\" ;\n\} elseif \(preg_match\(\'\/beta\.example\.com\/i\', \\\$httpbase\)\) \{\n  echo \"  _uacct=\\\\\"UA-12345-3\\\\\";\\\n\" ;\n\} elseif \(preg_match\(\'\/example\.com\/i\', \\\$httpbase\)\) \{\n  echo \"  _uacct=\\\\\"UA-12345-1\\\\\";\\\n\" ;\n\}\n/echo \"  _uacct=\\\\\"\\\$uacct\\\\\";\\\n\";\n/" *.php

(Go on, scroll right to see the rest...)

Yes, I went wild and crazy and really did that one in real life. But do as I say, not as I do!

Let's summarize all this now:

4. Executive summary

  1. perl -pi -e 's/FINDTEXT/REPLACETEXT/' file*
     if FINDTEXT is one line.
  2. perl -p0777i -e 's/FINDTEXT/REPLACETEXT/m' file*
     if FINDTEXT spans many lines.
  3. Best for simple replacements.

There! Now you can go around thrilling and impressing everyone with your perl prowess by saying, "Yes, I can fix the site in one line of perl".

PS: http://www.noctilucent.org/blog/archives/2003/12/replacing_large.html covers a more sensible way of handling larger texts, in case you don't have the megalomaniac urge that all geeks secretly have.

PJ

Posted by Anonymous (85.99.xx.xx) on Fri 25 Nov 2005 at 21:08
What about using find + sed for this kind of stuff? I think they fit these small cases better, and you can always give sed multiple expressions with the -e parameter, which makes the code look much less obfuscated. You can even collect the sed expressions in a script file and run it with the -f parameter.

For example:

mysed.sed:
#!/bin/sed -f
s/Altavista/Google/g
s/Hotmail/GMail/g
s/RedHat/Debian/g
Then you can just use this find command to change everything:

find . -type f -name "*.php" -exec ./mysed.sed {} \;

Of course you can use much more advanced regular expressions in your sed script or just pass them to sed by using multiple -e parameters.


Posted by Anonymous (203.122.xx.xx) on Sat 26 Nov 2005 at 04:35
Sed (the stream editor) is great for this sort of thing too. It is a very focussed language, installed by default on GNU/Linux, and it fits in well with the original unix philosophy of powerful, orthogonal tools.

Sed is also far more lightweight than perl, which means you can use it in places where you have to worry about the overhead (say on a heavily loaded mail system in a procmailrc - hmmm... maybe I'll write up something on that sometime too).

However, sed's regex syntax and s/// operator differ from perl's. And because perl is more popular, as well as more powerful, sysadmins are probably more comfortable using it for their dirty hacks.

By all means use the sed language if you are comfortable with it. It is an elegant weapon from a more civilized time. A good springboard, a list of handy sed one-liners, is http://sed.sourceforge.net/sed1line.txt


Posted by Anonymous (193.174.xx.xx) on Tue 29 Nov 2005 at 12:47
I'm not quite sure you can do it like this, since sed prints the replaced content on stdout.

That way you have a nice listing of the content of all files with their replaced lines.
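
Indeed, a plain sed script only prints to standard output. GNU sed's -i option edits each file in place instead, so a sketch like the following would apply mysed.sed to every PHP file. It assumes GNU sed (BSD sed wants -i '' instead), and the directory and file names are made up.

```shell
#!/bin/bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# The sed script from the comment above (only the s/// lines matter here).
printf '%s\n' 's/Altavista/Google/g' 's/Hotmail/GMail/g' 's/RedHat/Debian/g' > mysed.sed

# Hypothetical site tree with PHP files in nested directories.
mkdir -p site/sub
echo 'Search with Altavista, mail with Hotmail.' > site/index.php
echo 'We run RedHat.' > site/sub/about.php

# -f reads the expressions from mysed.sed; -i (GNU sed) rewrites each file
# in place instead of printing it. "{} +" passes many files per sed call.
find site -type f -name '*.php' -exec sed -i -f mysed.sed {} +

cat site/index.php site/sub/about.php
```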


Posted by Anonymous (59.144.xx.xx) on Tue 29 Nov 2005 at 14:02
I found the replace command, which does the same thing with limited regex support, e.g.:
replace OLD-WORD NEW-WORD < oldfile > newfile

And I'm happy with this tool. Source is here


Posted by Anonymous (203.122.xx.xx) on Sat 26 Nov 2005 at 18:23
I just noticed Steve had written a similar article a few months earlier (http://www.debian-administration.org/articles/197). In it he mentions rpl, which is a python-based tool that does this sort of thing fairly easily. Probably a good thing to keep around.

BTW, that related links sidebar thing is a good idea, Steve. That's how I noticed your article.

PJ


Posted by Steve (82.41.xx.xx) on Sun 27 Nov 2005 at 21:12

Thanks, it was an idea I borrowed from Slashdot...

It just relies on me remembering that I've covered something similar or related in the past to add the links!

I'm keen on adding keywords and other meta-data to the articles but it is something I keep putting off until I have more free time... you can imagine how well this works ;)

Steve


Posted by dkg (216.254.xx.xx) on Tue 29 Nov 2005 at 19:45
i was reading this, and i thought: hey, wait a second: octal 777 requires 9 bits! bytes are only 8 bits long. what's going on here?

a little bit of digging around turned up this paragraph in man 1 perlrun:

The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl to slurp files whole because there is no legal byte with that value.

So, just to clarify: using -0777 is actually a particular special case that should work for any file. you don't need to worry that it's going to make your code fail on some obscure file with weird content that you didn't expect.

