Weblog entry #1 for rsteuer

New Install of Debian 3.4
Posted by rsteuer on Tue 12 Jun 2007 at 22:00
Tags: none.
I've been able to successfully load a clean copy of 3.4 on an IBM xSeries 330 server. However, the boot process hangs at "agpgart: detected VIA Apollo 133 Chipset".

Being the newb that I am, I have no idea how to get past this. The machine ran 3.1 without issue, so I'm guessing something in the kernel has changed that causes the hang. Any clues?

 

Comments on this Entry

Posted by dkg (216.254.xx.xx) on Tue 12 Jun 2007 at 22:27
I'm not aware of a Debian 3.4. The latest release (Debian etch) is version 4.0. Where did you get version 3.4 from?

Posted by rsteuer (207.136.xx.xx) on Wed 13 Jun 2007 at 00:11
I don't know where I got 3.4! Actually, it is 4.0. Just got carried away with the 3.x series, I guess.

I found that the same hang occurs if I install any kernel newer than 2.4.27-2-386.

Posted by dkg (216.254.xx.xx) on Wed 13 Jun 2007 at 00:35
OK, that makes a bit more sense ;) Is it the subarchitecture optimizations (386 vs. 686) that make the system fail, or the newer kernel version? That is, have you tried 2.4.27-2-686 as well as 2.6.18-4-386? What kernel did the installer use? At least that kernel should work, yes?

If you post a few more lines of the output before the hang (instead of just the final line), it might give a better sense of where the hang is happening. (Using a serial console makes it much easier to cut-n-paste this stuff accurately.)

Also, if it's a stock Debian install with a modern initramfs, you might try adding a break=XXX kernel parameter to get a (very limited) shell during the boot process. XXX should be one of top, modules, premount, mount, bottom, or init, depending on the point in the process at which you want a shell to poke around.
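For instance, with GRUB legacy (press e on the boot entry, then e again on the kernel line), a kernel line that stops for a shell before the root filesystem is mounted might look like the following. The kernel image and root device are placeholders borrowed from elsewhere in this thread, not from this machine; keep whatever your own line already says and just append the break parameter.

```
kernel /boot/vmlinuz-2.6.18-4-686 root=/dev/sda1 ro break=premount
```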

Posted by rsteuer (207.136.xx.xx) on Wed 13 Jun 2007 at 01:07
Thanks for the quick reply. If I install Debian 3.1 with the default kernel (2.4.27), everything works fine. If I install Debian 3.1 (using the same install disks) but choose the 2.6.xx kernel, the server hangs. Installing Debian 4.0 with its default kernel also hangs.

I don't have a clue how to capture the info with a serial console during the boot process. If you can point me in the right direction for instructions on running a serial console, I'll get the requested info and post it.

Posted by dkg (216.254.xx.xx) on Wed 13 Jun 2007 at 02:21
The Remote Serial Console HOWTO is the canonical guide to serial console use, though I find parts of it to be a bit dated.

The basic idea is that the machine you're booting (the Host) uses a serial line instead of a video terminal/keyboard for all its console interactions. You hook another machine (the "Monitoring Machine", or MM) up to that serial port and can log the entire transaction.

The key steps are:

  1. Connect the MM to the Host via a null-modem serial cable. This works best if you've got two machines, each with a built-in RS232 (DB9) serial port, but you only really need a built-in port on the Host. If your MM is a laptop or a "legacy-free" machine without a serial port, you might want a USB-to-serial adapter. If you go this route, make sure you get one with a chipset that's supported by the kernel running on the MM. The Prolific chipset (handled by the pl2303 kernel module) is a good bet.
  2. On the MM, start up a logged session connected to the serial port. /usr/bin/screen is good for this. If the MM's serial port is /dev/ttyS0, you would run:
    screen -L /dev/ttyS0 115200
    This assumes your user has read/write access to /dev/ttyS0, and write access to the current directory, where the log (screenlog.0) will be written. You should see just a blank screen in that window on the MM.
  3. Boot the Host. In the bootloader, pass the Host's kernel an additional parameter of console=ttyS0,115200n8 (this assumes you've connected the serial cable to the first serial port on the Host).
  4. You should see the familiar boot messages scroll past in the screen session on the MM now, and you should see screenlog.0 starting to fill up.
You can now cut and paste from screenlog.0 (and you'll also have a clean, grep-able, re-readable copy of any errors that might have flown by).
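The part of the log you usually want to paste is the tail end just before the hang. Here is a minimal sketch of a POSIX shell helper for that; the name last_boot_lines and the 40-line default are made up for illustration, and it assumes the log is screen's default screenlog.0. It also strips carriage returns, since serial console output usually ends lines with CR LF, which can show up as stray ^M characters when pasted.

```shell
# Hypothetical helper: print the last N lines (default 40) of a screen log
# (default screenlog.0), dropping the carriage returns that serial console
# output usually carries, so the text pastes cleanly.
last_boot_lines() {
    log=${1:-screenlog.0}
    n=${2:-40}
    tr -d '\r' < "$log" | tail -n "$n"
}
```

For example, last_boot_lines screenlog.0 20 prints the final 20 lines the Host sent before it stopped.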

The Remote Serial Console HOWTO has a lot of other good information (like how to get your bootloader running over the serial console as well), but for this particular project, the above steps should be what you need to get going. Once you have this, though, you might not want to go back to using a monitor/keyboard on your servers. Having remotely-stored clean, logged boot messages is too nice to pass up! And once you get the bootloader running over the serial console, you can do remote reboots without touching the box with much greater confidence.

Posted by rsteuer (207.136.xx.xx) on Wed 13 Jun 2007 at 02:58
Thanks again for your reply. I will give this a try tomorrow and, if successful, will post some of the output.

Posted by rsteuer (207.136.xx.xx) on Wed 13 Jun 2007 at 12:40
This may or may not be helpful. I didn't have a null-modem cable here, so, in the interest of moving this along, I decided to rerun the 3.1 setup in expert mode, selecting the 2.6 kernel rather than 2.4; all other settings remained the same. After the initial install, during the reboot, I saw the following errors:

Detecting hardware: agpgart parport_pc e100 via82cxxx aic7xxx usb_uhci
Loading agpgart module
Linux agpgart interface v0.100 (c) Dave Jones
Skipping already loaded module parport_pc.
Skipping already loaded module e100.
Skipping already loaded module via82cxxx.
Skipping already loaded module aic7xxx.
Skipping already loaded module uhci_hcd
Running 0dns_down to make sure resolv.conf is ok...done
Setting up networking...done
Starting hotplug subsystem: pci
agpgart: Detected VIA Apollo Pro 133 chipset
agpgart: Maximum main memory to use for agp memory: 203M

The last line is actually further along than I've seen in the past. Usually, it hangs at the next to last line.

As mentioned previously, if I let 3.1 install with the default kernel, or if I select the 2.4 kernel in expert mode, the system boots properly and all is well.

Any comments would be greatly appreciated.

Posted by dkg (216.254.xx.xx) on Wed 13 Jun 2007 at 17:26
Hrm. Those don't look like actual errors to me; they look like legitimate status reports. This makes me think that the thing that's failing is some form of hardware probe that happens to be launched around the same time (though I can't be sure of that, of course).

How long have you let it wait at this hang point? Is it possible that the machine is just really, really slow as it's probing some unusual piece of hardware?

What alternate kernel parameters have you tried during boot time?

Another tack: stock etch kernels rely on udev, which really prefers that the hotplug package be purged. Have you tried purging hotplug, discover, and any other hardware-scanning packages? They're handy, but if they're getting in the way of a boot, it'd be better to get the machine booting first. Then you can load modules by hand, since you know which modules you need, and maybe try adding back in some of the packages if you feel you need them.

Posted by rsteuer (207.136.xx.xx) on Wed 13 Jun 2007 at 17:46
You are correct in that the messages aren't errors in and of themselves. The process hangs as a result of some sort of error.

The machine has been left overnight and also been rebooted and left for hours - all to no avail.

I haven't specified any other parameters for the kernel, only selected 2.4 or 2.6 when prompted.

I wouldn't know where to begin to purge hotplug packages. This is getting more complicated than it should be just to get an install completed.

Posted by dkg (216.254.xx.xx) on Wed 13 Jun 2007 at 18:52
If you can still boot into the 2.4.27 kernel, you should be able to purge hotplug and discover with:
apt-get remove --purge hotplug discover discover1
dpkg --purge hotplug
dpkg --purge discover
dpkg --purge discover1
apt-get -f install
The three dpkg lines ensure that packages that have already been removed (but not purged) actually get purged.

You could also try booting the new kernel with an init=/bin/sh parameter, just to check that the system does in fact work, and that the initscripts are what is causing the failure. You could then invoke the scripts in /etc/rcS.d sequentially by hand (e.g. /etc/rcS.d/S01glibc.sh start), note which one immediately precedes the hang, and remove its symlink on a second boot with init=/bin/sh.
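The step-by-step approach can be sketched as a small POSIX shell function to type at the init=/bin/sh prompt. run_rcS_one_by_one is a hypothetical name, not a Debian tool; it assumes each script accepts a start argument, as in the S01glibc.sh example, and that the two-digit S numbers sort in boot order. Whatever name it prints last before the machine hangs is the script to look at. Note that the root filesystem is normally still mounted read-only at that prompt, so before removing a symlink you will need something like mount -o remount,rw /.

```shell
# Hypothetical helper: start each S* script in an rcS.d-style directory,
# one at a time, announcing it first. If the machine hangs, the last
# ">>>" line on screen names the script that hung.
run_rcS_one_by_one() {
    dir=${1:-/etc/rcS.d}
    for script in "$dir"/S[0-9][0-9]*; do
        # Skip non-executable files, and the literal pattern if nothing matched.
        [ -x "$script" ] || continue
        echo ">>> $script start"
        "$script" start
    done
}
```

Run it with no argument to walk /etc/rcS.d, or pass another directory to try the same thing on a different runlevel's scripts.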

Please report back here what you find, so that other folks who hit this same bump can learn from you!

Posted by rsteuer (207.136.xx.xx) on Wed 13 Jun 2007 at 20:02
Again, thanks for the reply. I can probably purge the packages, but I'm not sure what that will do for me. I'm doing a fresh install rather than an in-place upgrade, so won't any changes just get overwritten by the new install?

I will try your suggestion on the init parameter and let you know what I find.

Posted by Steve (62.30.xx.xx) on Thu 14 Jun 2007 at 02:38
As a quick test, I'd suggest adding "noapic acpi=off irqpoll" to your kernel command line, just in case it's an ACPI/APIC-related problem. I've certainly seen enough of those in my time.

You can edit the grub command line as the system is booting to append them to your "kernel ...." line.
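For example, after pressing e on the kernel line in GRUB legacy, the edited line might look like this. The kernel image and root device are only placeholders borrowed from elsewhere in this thread; keep whatever your own line already says and append the three options.

```
kernel /boot/vmlinuz-2.6.18-4-686 root=/dev/sda1 ro noapic acpi=off irqpoll
```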

Steve

Posted by dkg (216.254.xx.xx) on Thu 14 Jun 2007 at 18:16
I've used these before myself, but they just seem like voodoo to me: what do they actually do? I think I understand acpi=off, but noapic and irqpoll are mysteries to me. If they're fixes to common problems, why aren't they present by default? What new problems might you run into by having them always present?

Do you have a good link for documentation of these kernel options? I'd love to read more to understand what they really do. /usr/share/doc/linux-doc-2.6.18/Documentation/kernel-parameters.txt.gz only gives a very limited description of each option, and doesn't talk about the tradeoffs associated with their use.

Posted by Anonymous (72.243.xx.xx) on Wed 27 Jun 2007 at 19:52
Hey all,
My name is David. I'm a Windows admin making the move to Debian to try out some OSS software for our startup company. I know computers, but I barely know Linux. I got Ubuntu and Debian running on some new machines, but I'm stopped by this same issue: the first boot after install hangs on this very same line:

agpgart: via apollo 133 chipset detected.

I have the same problem on an IBM eSeries x300, with Debian etch 4.0r0. Were you ever able to solve it?

I have tried everything I could find in this thread, including
adding noacpi, acpi=off, and several other boot params.

I tried to blacklist agpgart, lm-sensors, via686a, and a couple other things by adding ...
blacklist agpgart
blacklist via686a
blacklist xyzpdq

...to the blacklist file in the modprobe.d directory, but everything I listed still loads. How do I determine what is launching agpgart, or anything else?

I looked at the .sh scripts in the rcS.d folder and determined that the S90 script is what hangs my boot, but I cannot decipher what it's actually loading. Even if I knew, I couldn't stop it.

I found the config file for X and replaced the 'savage?' driver with 'vesa'. Same result.

I've tried other things that I cannot remember, thanks to my brain's natural habit of blocking out unpleasant experiences...

I've installed Ubuntu 7.04 server, desktop, and alternate, and finally Debian etch before writing this post, all hanging on the same line....

All these things, and still my boot hangs with the last line of

agpgart: via apollo 133 chipset detected.

I've invested 3 days in this, learning way more about the Linux boot process than I ever wanted, but find that the more I learn, the more I realize is not documented anywhere I can find. I'd be willing to pay $50 an hour to any Debian guru who can convince me they know more than what they can find in a Google search, to solve this undoubtedly simple problem and boot Linux on my old, but certainly not useless, server.


Thanks for reading. Gurus, unite and rescue this poor Windows admin who is running at top speed to catch up with the FOSS bandwagon...

-David

Posted by Anonymous (72.243.xx.xx) on Thu 28 Jun 2007 at 18:36
Hey all,
I got a solution here for the next guy, thanks to the folks in #debian.
If you install Debian on an IBM eSeries 330 or similar, you might need to do this...
In a nutshell, you need to disable agpgart.ko and then rebuild the initrd.

[Stuff in single quotes was typed at the command line, without the quotes.]

I booted to a shell by adding 'init=/bin/sh' to the boot parameters in GRUB. [Look up how to add boot params to GRUB if you don't know how.]

Found agpgart.ko under /lib/modules/.... and renamed it to agpgart.ko.disabled.
Had to mount the boot partition with 'mount /boot'.
Ran 'dpkg-reconfigure linux-image-$whateverversion' (my version is 2.6.18-4-686).

It does a lot of stuff, including rebuilding the initrd.

I did the three-finger salute (Ctrl-Alt-Del) and it booted all the way to the desktop.

SWEET!

I learned a lot. Run 'man initrd' to see the way Debian boots; pretty cool.

email me at debian AT atsfl.com if you have questions.

-david

Posted by Anonymous (213.151.xx.xx) on Thu 25 Oct 2007 at 00:21
Thank you for your post; it helped me get my IBM eSeries x300 running.

Debian IBM eSeries x300 problem with Apollo 133 agpgart

I had to edit the kernel line in GRUB before booting: move to the kernel entry you want to edit and press e, then move to the line starting with kernel (mine was: kernel /boot/vmlinuz-2.6.18-5-686 root=/dev/sda1 ro) and press e again to edit it.

Mine looked like this after editing (ro changed to rw, and init=/bin/sh appended):
kernel /boot/vmlinuz-2.6.18-5-686 root=/dev/sda1 rw init=/bin/sh

Press Enter and then b to boot the kernel with these settings.

I had to use these commands instead (because dpkg-reconfigure didn't work for me):

First, set up the PATH in the environment by issuing:

export PATH=/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin

Then rename the agpgart.ko module:

cd /lib/modules/`uname -r`/kernel/drivers/char/agp/

mv agpgart.ko agpgart.ko.disabled

Then update the initrd with:

update-initramfs -k `uname -r` -u

After that I rebooted, and everything was OK.
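The rename step can also be done without knowing the module's exact path. Below is a minimal sketch, assuming a POSIX shell and a root filesystem that is already mounted read-write (as with the rw in the kernel line above). disable_module is a hypothetical helper, not a Debian command; it searches the running kernel's tree, /lib/modules/$(uname -r), by default, and refuses to report success if it finds nothing to rename.

```shell
# Hypothetical helper: rename every <name>.ko under a module tree to
# <name>.ko.disabled, printing each file it renames.
# The tree defaults to the running kernel's: /lib/modules/$(uname -r).
disable_module() {
    name=$1
    moddir=${2:-/lib/modules/$(uname -r)}
    files=$(find "$moddir" -type f -name "$name.ko")
    if [ -z "$files" ]; then
        echo "no $name.ko found under $moddir" >&2
        return 1
    fi
    # One path per line; module paths do not contain newlines.
    echo "$files" | while IFS= read -r f; do
        mv "$f" "$f.disabled" && echo "disabled $f"
    done
}
```

For this machine that would be disable_module agpgart, followed by the same initrd rebuild as above (update-initramfs -k `uname -r` -u). To undo it, rename the .disabled file back and rebuild the initrd again.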

Thank you for your help; I hope this alternate way helps someone.

cheers

Dan
mail me: danba AT suppcom DOT cz
