
I have an extremely strange error message that causes a complete system crash and a remount of the filesystem as read-only. It all started ages ago when I installed a dodgy $2 eBay PCI modem: kernel panics were showing up monthly and the output was huge. A new hard disk and a dist-upgrade later, the error has become very sporadic and much smaller in terms of what is actually printed (it's still gibberish to me, even after thorough googling).

This system has been 'cursed' whenever it is booted into Debian. I was thinking about trashing the computer and getting a new one... but because the problem only shows up under Linux, it must be software!

Basically, here it is (I'm posting now because I crashed today, and also yesterday):

EXT2-fs error (device hda1): ext2_check_page: bad entry in directory #5898285: rec_len is smaller than minimal - offset=0, inode=5898285, rec_len=8, name_len=1
Remounting filesystem read-only

What is going on? I then have to pull the power out, reboot, run fsck -y, reboot again, and that usually settles it for a while.
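
For reference, the recovery sequence looks roughly like this each time (assuming the affected filesystem is /dev/hda1, as in the error above):

    # after the forced power-off, run a full repair from single-user mode or a rescue shell
    fsck -y /dev/hda1    # answer "yes" to every repair prompt
    reboot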

If this could be figured out I would be so happy.

Thanks in advance for any light you guys can shed on this matter.

--EDIT:

Now running updatedb causes this error every time (well, twice so far), which means it's reproducible and trackable! (Now I just have to fix it...)
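
In case it matters, this is roughly how I trigger it and pull the error out of the kernel ring buffer (nothing special, just the stock commands):

    updatedb             # walks the whole filesystem and triggers the error
    dmesg | tail -n 30   # shows the EXT2-fs error lines quoted above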

Is it time for a new computer?

--EDIT:

resize2fs /dev/hda1 says the filesystem is already the correct number of blocks long, and badblocks doesn't return anything (is it not meant to?).
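
For completeness, these are the checks I ran. As far as I understand, badblocks only prints the numbers of the blocks it finds bad, so no output should mean no bad blocks were found; the -s and -v flags at least show progress:

    resize2fs /dev/hda1        # reports the filesystem is already the right size
    badblocks -sv /dev/hda1    # read-only scan; prints nothing when no bad blocks are found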

--EDIT:

Is it possible something is corrupting all my new disks? A hardware problem? Someone said it might be the disk controller, or a BIOS option. Is there any way to check this?

Thanks.

2 Answers


That really does sound like the filesystem's idea of the partition size is different from the actual partition size. You said you installed a new hard drive; if you transferred the filesystem to the new drive with dd (or some other method that didn't involve a mkfs on the new disk), this could happen.

Try running resize2fs /dev/hda1 from within a rescue environment (after an fsck -f, etc.) and see if the filesystem size changes. I'm guessing that it probably will, and your problems will mysteriously go away.
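
Something along these lines, run from a rescue CD with the filesystem unmounted (device name assumed to be /dev/hda1):

    e2fsck -f /dev/hda1    # force a full check first; resize2fs requires a freshly checked filesystem
    resize2fs /dev/hda1    # with no size argument it resizes the filesystem to match the partition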

womble

I strongly suspect your disk contains bad sectors. You can verify this with badblocks (http://en.wikipedia.org/wiki/Badblocks).

man badblocks:

badblocks  is  used  to  search  for bad blocks on a device
(usually a disk partition).  device is the special file corresponding
to the device (e.g /dev/hdc1).  last-block is the last block to be checked; 
if it is not specified, the last block on the device is used as a default. 
start-block is an optional parameter specifying the starting block number
for the test, which allows the  testing to start in the middle of the disk.
If it is not specified the first block on the disk is used as a default.

If you really want to be thorough, you should choose the -w option (read-write test) with 2-3 passes, but be sure to back up your data first, because the read-write test destroys all data on the physical media.
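
A destructive scan would look something like this (this WILL wipe the partition, so only run it on a disk you have already backed up or written off):

    badblocks -wsv /dev/hda1         # write test patterns over every block and read them back
    badblocks -wsv -p 2 /dev/hda1    # keep rescanning until 2 consecutive passes find no new bad blocks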

NOTE: you will be tempted to set ext* to ignore bad blocks, but I would strongly recommend replacing the drive instead. Drives usually contain a few bad blocks from the factory, but the internal logic relocates data on the fly whenever the OS wants to write to a known bad block. The area reserved for this relocation is fixed in size, so once it fills up, the drive stops relocating sectors. That is the point you have reached now, so you can expect sectors to go bad more and more rapidly. If the disk is still under warranty, have it replaced; if not, buy a new one.

You could also consider setting up RAID1 (from new disks) and creating backups at regular intervals (onto media not stored on or near the actual server/workstation in question).
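
A minimal sketch with Linux software RAID, where /dev/sda1 and /dev/sdb1 stand in for two partitions on the new disks:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.ext3 /dev/md0    # create the filesystem on the mirror
    # plus a periodic backup job to media kept away from the machine, e.g. a weekly cron entry:
    # 0 3 * * 0   tar czf /mnt/backup/weekly.tar.gz /home /etc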

NOTE2: although a memory problem does not manifest as exactly the same error every time, you could also run a memtest to make sure your server hasn't got "Alzheimer's" :)

asdmin