
WARNING: mismatch_cnt is not 0 on /dev/mdX

Occasionally I receive the above message from various Linux machines. It appears as the output of a cron job.

The email containing the error message looks similar to the following:

Date:      Sun, 28 Mar 2010 04:22:28 +0200
From:     root@mylinuxbox.de (Cron Daemon)
To:     root@mylinuxbox.de
Subject:     Cron <root@mylinuxbox> run-parts /etc/cron.weekly

/etc/cron.weekly/99-raid-check:

WARNING: mismatch_cnt is not 0 on /dev/md0

This is rather confusing: it does not say what kind of check raised the warning, what impact it has on the system, or how to fix it.

Well, the answer is:

This has to do with the Linux software RAID (md) configuration on the system. The cron job regularly scrubs the RAID devices, and this time it found inconsistencies between the member disks of an array.

In this case the scrubbing was invoked by the following command:

$ echo check > /sys/block/md0/md/sync_action

Once the scrubbing is done, the value in /sys/block/md0/md/sync_action changes back to "idle".

When this is the case, one can view the result of the check like this:

$ cat /sys/block/md0/md/mismatch_cnt

In our case the output is a value other than 0, which causes the cron job to generate the above email.
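The reporting step can be pictured roughly as follows. This is only a sketch of the logic, not the actual /etc/cron.weekly/99-raid-check script; the function name report_mismatches and the optional sysfs root argument are my own assumptions, the latter added so the loop can be tried against a fake directory tree.

```shell
# Sketch only: report_mismatches is a hypothetical name, and this is NOT the
# distribution's real 99-raid-check script.
# $1 = sysfs root, defaults to /sys (assumption: overridable for dry runs)
report_mismatches() {
    sysfs_root=${1:-/sys}
    for cnt_file in "$sysfs_root"/block/md*/md/mismatch_cnt; do
        # If no md array exists the glob stays unexpanded; skip it.
        [ -r "$cnt_file" ] || continue

        # Derive the array name (e.g. md0) from the path.
        md_dev=${cnt_file#"$sysfs_root"/block/}
        md_dev=${md_dev%%/*}

        # A non-zero counter is what triggers the warning line.
        if [ "$(cat "$cnt_file")" -ne 0 ]; then
            echo "WARNING: mismatch_cnt is not 0 on /dev/$md_dev"
        fi
    done
}
```

Since cron mails whatever a job prints, a single line of output like this is enough to produce the email shown at the top.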

So, what to do?

Actually, it is not clear at all. What is reported here is a situation where the data on one side of a mirror does not match the data on the other side, which really should not happen in the first place. But here we are...

The only way I found to get back to a clean state is the following:

$ echo repair > /sys/block/md0/md/sync_action

After that ...

$ cat /sys/block/md0/md/mismatch_cnt 

... will still report the same mismatch count, because the repair pass found the same mismatches again. This time, however, it has "fixed" them (whatever that means).

But after another ...

$ echo check > /sys/block/md0/md/sync_action
$ cat /sys/block/md0/md/mismatch_cnt

... the errors are gone.

This has worked for me on more than 10 occasions on various systems. I never experienced any oddities after following this procedure.
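The repair-then-check sequence, including the wait for "idle" in between, could be wrapped up roughly like this. It is a minimal sketch under my own assumptions: md_scrub is a hypothetical helper, not something mdadm provides, and the POLL_INTERVAL variable and the third argument (an alternative sysfs root) are made up for illustration. On a real system it has to run as root.

```shell
# Sketch only: md_scrub is a hypothetical helper, not part of mdadm.
# $1 = array name (e.g. md0)
# $2 = action to request: check or repair
# $3 = sysfs root, defaults to /sys (assumption: overridable for dry runs)
md_scrub() {
    md_dir="${3:-/sys}/block/$1/md"
    if [ ! -w "$md_dir/sync_action" ]; then
        echo "cannot write $md_dir/sync_action" >&2
        return 1
    fi

    # Ask the md driver to start the requested pass.
    echo "$2" > "$md_dir/sync_action"

    # Poll until sync_action reads "idle" again, i.e. the pass has finished.
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep "${POLL_INTERVAL:-30}"
    done

    # Print the mismatch counter left behind by this pass.
    cat "$md_dir/mismatch_cnt"
}
```

Calling md_scrub md0 repair and then md_scrub md0 check mirrors the manual steps above: going by my experience, the first call prints the old mismatch count and the second prints 0.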

However, I really wonder what the purpose behind this whole thing is. If I have no other way of fixing the problem than running an "automatic self-healing command", then why does the software bother me in the first place instead of doing it by itself?

And why is this whole thing -- check, report, fix -- not implemented in the mdadm command?

Knowing other software RAID solutions such as VxVM, SVM, ASM and ZFS, which appear much more transparent and streamlined, this really leaves me puzzled.

Some more information is available here:

https://raid.wiki.kernel.org/index.php/RAID_Administration
