First, let me start with my setup and the mods I had to make to get this far.
I'm running a Dell M1000e with, currently, 2 M600 blades, each with dual quad-core 3.16GHz procs and 4GB of RAM. Since it's a blade server, lots of the components are virtual or assigned, like the keyboard and mouse. I don't have any HDDs at all; my plan is to connect a SAN or NAS to the master once I'm ready. The NICs use the bnx2 driver, which needs the firmware-bnx2 pkg to work. Adding this to the master was fairly easy. I first added it to the list of pkgs, but by default live-helper only pulls from main, not contrib or non-free, and the firmware pkg is in non-free, so I had to figure out how to add those. It turned out the fix was adding
<lh_config --categories "main contrib non-free">
after <lh_config -a "$ARCHITECTURE"> in both the master and the node setup in the make script.
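Roughly, the change looks like the sketch below. This is not the exact make script: the function name configure_build and the ARCHITECTURE value shown are placeholders I made up for illustration, and the real script sets ARCHITECTURE itself near the top.

```shell
# Sketch only: ARCHITECTURE is normally set at the top of the make script.
# "amd64" here is an assumed example value, not taken from the real script.
ARCHITECTURE="amd64"

# Categories to pull packages from; firmware-bnx2 lives in non-free.
CATEGORIES="main contrib non-free"

# Hypothetical helper so the same two calls can be reused for the
# master build and the node build.
configure_build() {
    lh_config -a "$ARCHITECTURE"
    lh_config --categories "$CATEGORIES"
}
```

The idea is to call configure_build once in the master's build directory and once in the node's, so both images can install packages from non-free.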
That got the master booting, and I could run setup_pelican. Once that was done I was able to test boot the node. It found the PXE server and booted, but was missing the bnx2 firmware. This took me much longer to figure out, but I added a second list of pkgs to include for the nodes and that solved it. I used the same form as for the master, so it's configurable at the top of the script, and I'd like to contribute these changes back if anyone wants them.
Now the node finds the NIC and starts to load, but then it stops. I let it sit overnight. It's not locked up (ctrl-alt-del still works), but it won't fully boot either. I have a screenshot of where it's sitting, which I'll include if I figure out how, but here is the text it's sitting at, just in case I can't attach it.
Driver 'sd' needs updating - please use bus_type methods
sd 1:0:0:0: [sda] Attached SCSI removable disk
Driver 'sr' needs updating - please use bus_type methods
sr0: scsi-1 drive
Uniform CD-ROM driver Revision: 3.20
sd 1:0:0:0: Attached scsi generic sg0 type 0
sr 0:0:0:0: Attached scsi generic sg1 type 5
Remember, I have no CD-ROM, no HDD, no USB drive, no floppies, nothing but CPUs and RAM on these blades. The master is booting from a USB CD-ROM.
So my question is: since the master and the node are 100% the same hardware, why would one fully boot and the other not? Where can I debug or find the differences in the boot process? I really need to get the cluster up ASAP, so any info or leads would help very much. Once I have this working I'm going to work on auto-launching setup_pelican on the master and having nodes auto-join, and I'll be sending that back upstream, but I need a node first ;)
p_49.jpg