
Re: halt script in lfs-base



Sean Sweda <sweda@xxxxxxxxx> responded:
> Cc: umce.linux@xxxxxxxxx
> From: Sean Sweda <sweda@xxxxxxxxx>
> Subject: Re: halt script in lfs-base
> Date: Tue, 26 Apr 2005 23:17:33 -0400
> To: Michael C Garrison <mcgarr@xxxxxxxxx>
> 
> 
> On Apr 26, 2005, at 10:56 PM, Michael C Garrison wrote:
> 
> > I don't know if this has been answered before, but is there a  
> > particular reason the halt script in the lfs-base has halt  
> > commented out and /bin/bash added?
> 
> because that script can be called at various points because of boot  
> script errors, and the halt leaves the machine unresponsive on serial  
> console, requiring physical handling of the machine (power-cycle)...  
> this is very undesirable for remote management
> 
> Sean
> 

Just to be a bit more clear about this:

a *panic* leaves the machine unresponsive with a serial console.
It's also unresponsive with a vga console, but usually the big
red toggle (small black button, or whatever) is closer at hand.

The "halt" command in its original incarnation (vax/4.2bsd)
left the machine in the boot firmware.  On the original hardware,
the CPU was in fact "halted", and the service processor took over
the console.  On the PC, no such equivalent state exists, so
the exact behavior isn't well defined.  The logical equivalent
would be the "setup" menu in the BIOS, but there's no programmatic
way for the OS to terminate and transfer control to that menu.
The three choices that result then are:
[1]	power off the machine.  This is "halt -p".
[2]	reboot the machine.  This is "reboot".
[3]	hang in a tight loop with interrupts turned off.
	I.e., the same thing as a panic, above.
and well, as it happens, there is also
[4]	pretend nothing happened and keep right on going.

I'm not sure that the linux halt command ever does 3-- the logic
I see in the kernel appears to do 4 instead.

The rc scripts we inherited from LFS tried to do 1.  Sometimes this
works.  Specifically it works fine on a uniprocessor machine with APM.
APM isn't reentrant or MP aware, so it doesn't work with SMP.  The
updated replacement for APM is ACPI, which does lots of other cool
stuff.  The linux 2.4 support for ACPI isn't (or at least wasn't as of
2.4.26) ready for prime-time.  Most of our machines are configured to
only use ACPI to determine hyperthreading support.  Some of the
problems with using ACPI for other things are related to the maturity
of the code in linux, and some are related to problems or machine
dependencies in the BIOS code.  So, if we want to try to make "halt -p"
work on our current machines, we'd need to retest "halt -p" on all 5
hardware platforms, and we might need BIOS upgrades on some of them.

The result of all this is that "halt -p" on most of our machines does
case 4, "nothing", which is generally not a good thing if the problem that
caused the machine to try to reboot was a toasted filesystem.
That's why the "bash" prompt is there instead.  Note that about
the only legal thing you can do at that point is power cycle the
machine, or "reboot -n -f".  (A plain "reboot" syncs the filesystem,
which is not necessarily what you want if fsck just fixed things.)
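
For anyone who wants to see the shape of it, here is a rough sketch of
the idea.  It is not the actual lfs-base halt script: the DRY_RUN
variable and the run(), halt_tail() and operator_reboot() helpers are
made up for illustration, and by default the sketch only prints what it
would do instead of touching the machine.

```shell
#!/bin/sh
# Sketch only: NOT the actual lfs-base halt script.
# DRY_RUN and run() are hypothetical helpers so this can be read and
# exercised safely; with DRY_RUN=1 (the default) nothing is executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Tail end of the halt script.  The power-off is left commented out
# because "halt -p" falls through (case 4) on most of our machines;
# the operator gets a shell on the console instead.
halt_tail() {
    # run halt -p
    run /bin/bash
}

# What the operator would type at that prompt: reboot without syncing
# the disks (-n) and without going through shutdown (-f).
operator_reboot() {
    run reboot -n -f
}
```

Calling halt_tail with the default DRY_RUN=1 prints the command rather
than starting a shell; setting DRY_RUN=0 would run it for real, so
don't do that on a machine you care about.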

				-Marcus