Weblog entry #380 for simonw

egrep performance improvements
Posted by simonw on Tue 27 Jul 2010 at 11:09
Tags: none.
I was investigating poor logcheck performance on a box (a run was still going 8 hours after it started), so I extracted a test case to compare on other hardware.

Discovered that GNU egrep in Squeeze runs this test case over 10x quicker than the egrep in Lenny!? The folks writing GNU grep have clearly been fixing bugs.

On a Lenny server, comparing the packaged grep 2.5.3 with 2.6.3 compiled from source:

~/grep-2.6.3/src$ time ./egrep --text -f /tmp/patterns /tmp/logs | wc -l
9

real 0m0.115s
user 0m0.084s
sys 0m0.000s
~/grep-2.6.3/src$ time egrep --text -f /tmp/patterns /tmp/logs | wc -l
9

real 0m1.568s
user 0m1.564s
sys 0m0.004s
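
A comparison like this is easy to repeat with a small script. The following is only a sketch: OLD_GREP, NEW_GREP, PATTERNS, LOGS and RUNS are names made up for illustration, both binaries default to the system grep, and a tiny generated test case stands in when no real pattern and log files are given. It uses grep -E, which is what egrep runs, and assumes GNU date for the sub-second %N format.

```shell
#!/bin/sh
# Sketch: run the same search several times with two grep binaries and
# print the elapsed time of each run.
# OLD_GREP, NEW_GREP, PATTERNS, LOGS and RUNS are illustrative names only.
OLD_GREP=${OLD_GREP:-grep}
NEW_GREP=${NEW_GREP:-grep}
RUNS=${RUNS:-3}

# Without real inputs, fall back to a small generated test case.
if [ -z "${PATTERNS:-}" ] || [ -z "${LOGS:-}" ]; then
    tmpdir=$(mktemp -d)
    trap 'rm -rf "$tmpdir"' EXIT
    PATTERNS=$tmpdir/patterns
    LOGS=$tmpdir/logs
    printf '%s\n' 'sshd\[[0-9]+\]: Failed password' \
                  'su: .*authentication failure' > "$PATTERNS"
    i=0
    while [ "$i" -lt 1000 ]; do
        echo "Jul 27 11:09:00 box cron[$i]: session opened"
        i=$((i + 1))
    done > "$LOGS"
    echo "Jul 27 11:09:01 box sshd[4242]: Failed password for root" >> "$LOGS"
fi

# bench LABEL BINARY: time RUNS searches and leave the last match count
# in the variable "count".
bench() {
    n=1
    while [ "$n" -le "$RUNS" ]; do
        start=$(date +%s.%N)   # %N (nanoseconds) assumes GNU date
        # -a is the short form of --text
        count=$("$2" -E -a -f "$PATTERNS" "$LOGS" | wc -l)
        end=$(date +%s.%N)
        elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
        echo "$1 run $n: $count matching lines in ${elapsed}s"
        n=$((n + 1))
    done
}

bench old "$OLD_GREP"; old_count=$count
bench new "$NEW_GREP"; new_count=$count
```

To compare a packaged grep against a source build, something like OLD_GREP=/bin/grep NEW_GREP=$HOME/grep-2.6.3/src/grep PATTERNS=/tmp/patterns LOGS=/tmp/logs in the environment should do. Running each binary several times also shows whether the first run is skewed by a cold cache.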

My issue was fixed by removing some pointless checks in /etc/logcheck/cracking.d/logcheck (Postfix doesn't even support expn), which I suspect were left over from an earlier release of logcheck. Still, I'm sure there are folks who use egrep/grep a lot and are now thinking "is it too late to backport grep?", or who will just upgrade to Squeeze once they realise there may be an order of magnitude improvement to be had for their workload.

 

Comments on this Entry

Posted by Anonymous (213.227.xx.xx) on Mon 16 Aug 2010 at 20:43
Maybe the file was in cache the second time it was read?

Run sync; echo 3 > /proc/sys/vm/drop_caches (after each command) and test again

Regards,


Posted by simonw (78.33.xx.xx) on Tue 26 Oct 2010 at 18:23
I allowed for that by running the commands several times. In any case, the second command in the example was the slower one, so a warm cache cannot explain the difference. egrep is simply faster and less buggy now.


Posted by linulin (91.202.xx.xx) on Tue 26 Oct 2010 at 22:21

A UTF-8 locale causes significant grep performance degradation in older versions:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=401259#25
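
One way to check whether the locale is a factor on a given box is to time the same search under the current locale and again with LC_ALL=C. This is a rough sketch only: the patterns and log lines are generated throwaway data, not the real /tmp/patterns and /tmp/logs, and the sub-second timing assumes GNU date.

```shell
#!/bin/sh
# Sketch: time one grep -E search under the current locale and under the
# C locale. All sample data below is illustrative.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf '%s\n' 'sshd\[[0-9]+\]: Failed password' \
              'su: .*authentication failure' > "$tmpdir/patterns"
i=0
while [ "$i" -lt 1000 ]; do
    echo "Jul 27 11:09:00 box cron[$i]: session opened"
    i=$((i + 1))
done > "$tmpdir/logs"
echo "Jul 27 11:09:01 box sshd[4242]: Failed password for root" >> "$tmpdir/logs"

# Run with whatever locale the environment already has.
start=$(date +%s.%N)   # %N (nanoseconds) assumes GNU date
cur_count=$(grep -E -a -c -f "$tmpdir/patterns" "$tmpdir/logs")
end=$(date +%s.%N)
cur_time=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')

# Run again with the locale forced to C for this one command.
start=$(date +%s.%N)
c_count=$(LC_ALL=C grep -E -a -c -f "$tmpdir/patterns" "$tmpdir/logs")
end=$(date +%s.%N)
c_time=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')

echo "current locale (${LC_ALL:-${LANG:-unset}}): $cur_count matches in ${cur_time}s"
echo "C locale: $c_count matches in ${c_time}s"
```

With data this small the two times will be close on any grep version. The point is the method: substitute the real pattern and log files, and if the C locale run is much faster, the locale is at least part of the slowdown.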

--
...Bye..Dmitry.
