27 Feb 2018 - by 'Maurits van der Schee'
The Dell R730xd is a really awesome machine. It has 2 CPUs and you can put 24 drives in it. It also sports a high-performing H730 (LSI MegaRaid) controller. It is certified for Ubuntu and CentOS, so you have guaranteed smooth sailing when installing Linux... right? Yes, you do!
But not when you install an Intel 82572EI based HP 110T PCI-e gigabit ethernet card in the machine like I did. In that case newer kernels won't boot. The kernel starts to load, but then you get a black screen, possibly with a blinking cursor, and nothing else happens. This has nothing to do with the R730, but everything to do with this specific PCI-e card, the power saving features in new kernels and how the kernel driver (e1000e) handles these features.
First we can try to remove the cards (or disable them in the BIOS). If you don't know in which slots the cards are located, the R730xd has a nice BIOS-based hardware scan to find out. The BIOS also supports disabling PCI slots, so we can virtually remove the card. After removing the cards, the system boots as expected. This leads us to explore the card's kernel driver, "e1000e".
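On a kernel that still boots, the cards handled by the "e1000e" driver can also be found from Linux instead of the BIOS. The following is a minimal sketch, assuming the usual sysfs layout where each PCI driver directory holds one entry per bound device, named by its PCI address. The helper name list_driver_devices and its optional second argument (an alternative sysfs root) are made up for this illustration.

```shell
#!/usr/bin/env bash
# List the PCI addresses of all devices bound to a given kernel driver.
# list_driver_devices is a hypothetical helper, not a standard tool.
# $1 = driver name, $2 = sysfs root (assumed to default to /sys).
list_driver_devices() {
    local driver="$1" root="${2:-/sys}" dev
    # Bound devices appear as entries named like 0000:03:00.0;
    # other entries (bind, unbind, module, ...) contain no colon.
    for dev in "$root/bus/pci/drivers/$driver"/*:*; do
        [ -e "$dev" ] && basename "$dev"
    done
    return 0
}

# Prints nothing when the driver is not loaded or has no devices.
list_driver_devices e1000e
```

The addresses that are printed can then be matched against the slot information from the BIOS hardware scan.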
We had already figured out that booting with grub parameter "acpi=off" would work, but this disables all power management, which has severe side effects. This setting leaves only one CPU thread (of 48) active, so it is not really an option. It does indicate that some power management feature is causing the problem. Other people were helped by turning ASPM off.
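To see the side effect for yourself, you can check which parameters the running kernel was booted with and how many CPU threads are online. This is only a sketch: has_kernel_param is a hypothetical helper name, and it assumes a Linux system with /proc mounted and GNU coreutils (for nproc) installed.

```shell
#!/usr/bin/env bash
# Succeed if the kernel command line in $1 contains the exact parameter $2.
# has_kernel_param is a hypothetical helper, not a standard tool.
has_kernel_param() {
    local cmdline="$1" param="$2"
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
    esac
    return 1
}

# /proc/cmdline holds the parameters of the currently running kernel.
cmdline="$(cat /proc/cmdline 2>/dev/null)"
if has_kernel_param "$cmdline" "acpi=off"; then
    echo "booted with acpi=off"
else
    echo "booted without acpi=off"
fi

# Number of CPU threads available to this shell.
echo "online CPU threads: $(nproc)"
```

The same helper can be used to test for "pcie_aspm=off" after changing the boot configuration.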
Active State Power Management (ASPM) is a power management protocol used to manage PCI Express-based (PCIe) serial link devices as links become less active over time. It is normally used on laptops and other mobile Internet devices to extend battery life. - Wikipedia
It turned out that ASPM caused the malfunction of the Intel based PCI-e card. We decided that ASPM was not useful in a datacenter setting, since it is mainly designed for laptops to extend battery life. ASPM can be disabled with grub parameter "pcie_aspm=off". The following steps disable ASPM permanently in the boot configuration (source):
Edit "/etc/default/grub" to contain "pcie_aspm=off" in the kernel command line setting
Run "grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg"
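The edit can also be scripted. Below is a minimal sketch, not the exact procedure from the source: it assumes the kernel command line lives in a double-quoted GRUB_CMDLINE_LINUX variable (the usual name on CentOS, but check your own file) and that GNU sed is available. The helper name add_kernel_param is made up for this example, and the demo works on a temporary sample file so nothing on the system is changed.

```shell
#!/usr/bin/env bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX in a grub defaults file,
# unless it is already there. add_kernel_param is a hypothetical helper.
# Assumption: the variable is named GRUB_CMDLINE_LINUX and is double-quoted.
add_kernel_param() {
    local file="$1" param="$2"
    # Do nothing if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX=.*${param}" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote (GNU sed -i).
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}

# Demo on a temporary sample file instead of the real /etc/default/grub.
sample="$(mktemp)"
printf 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet"\n' > "$sample"
add_kernel_param "$sample" "pcie_aspm=off"
cat "$sample"
rm -f "$sample"

# On the real system (as root), the equivalent would be:
#   add_kernel_param /etc/default/grub pcie_aspm=off
#   grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
```

Keep a backup of "/etc/default/grub" before running anything like this on a real machine. After a reboot, "cat /proc/cmdline" should show the new parameter.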
Hopefully this post will save you some time.