2009-Jan-10
For at least five years, I have had a wide variety of hardware power down while doing pkgsrc builds. This has been quite frustrating as a pkgsrc developer and maintainer, especially when I needed some specific software installed or the system was in use for other tasks. The long fsck afterwards was an extra hassle.
I finally tracked this down to overheating systems. I used mbmon and later envstat to report the temperatures. Over the past few years I wrote scripts to suspend my builds when the temperature was too high and continue them when it dropped, but as you can imagine, that is a slow way to build, and it was not reliable, since things sometimes got built out of order or were left incomplete.
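To give an idea of what such a suspend-and-continue script can look like, here is a minimal sketch. It is not one of my original scripts. The thresholds HIGH and LOW are made-up values, and read_temp is a hypothetical helper that you would have to write yourself (for example around envstat or mbmon) so that it prints the current temperature as a whole number.

```shell
#!/bin/sh
# Sketch: pause a build while the system is hot, resume once it cools.
# HIGH and LOW are illustrative thresholds in whole degrees (assumptions).
HIGH=${HIGH:-75}
LOW=${LOW:-65}

# decide STATE TEMP: print "stop", "cont", or "none".
# STATE is "running" or "stopped". The gap between HIGH and LOW keeps
# the build from flapping around a single threshold.
decide() {
	if [ "$1" = running ] && [ "$2" -ge "$HIGH" ]; then
		echo stop
	elif [ "$1" = stopped ] && [ "$2" -le "$LOW" ]; then
		echo cont
	else
		echo none
	fi
}

# throttle PID: poll until the process exits.
# read_temp is a hypothetical helper, not defined here; it must print
# the current temperature as an integer.
throttle() {
	state=running
	while kill -0 "$1" 2>/dev/null; do
		case "$(decide "$state" "$(read_temp)")" in
		stop) kill -STOP "$1"; state=stopped ;;
		cont) kill -CONT "$1"; state=running ;;
		esac
		sleep 5
	done
}
```

With read_temp defined, something like "throttle 1234" would watch the build with process ID 1234. Signalling only that one process is a simplification, since a build usually has child processes that keep running.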
So this week, I received a hint that I should turn down my CPU frequency. I had heard of this before but never put two and two together. For example, here is what my system reports:
$ sysctl machdep.powernow
machdep.powernow.frequency.target = 2000
machdep.powernow.frequency.current = 2000
machdep.powernow.frequency.available = 800 1600 1800 2000
I already had powerd running since I use it for my power button. So I set up /etc/envsys.conf to configure the hardware sensor monitors:
acpitz0 {
	refresh-timeout = 5s;
	sensor0 {
		critical-max = 179F;
		warning-max = 170F;
		warning-min = 159F;
	}
}
And I edited my /etc/powerd/scripts/sensor_temperature script to lower or raise my machdep.powernow.frequency.target sysctl tunable. And it worked! The temperature doesn't get too high now and the system continues to run.
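The stepping logic can be sketched roughly as follows. This is an illustration, not my exact script. It assumes powerd passes the event name as the second argument and that the events are called warning-over, critical-over and warning-under. The function names step_down, step_up, next_target and main are made up for this example.

```shell
#!/bin/sh
# Sketch of frequency stepping for a powerd sensor_temperature script.
# Assumption: powerd calls the script with the event name in $2.

# step_down CURRENT "LIST": print the largest listed frequency below
# CURRENT, or CURRENT itself if none is lower.
step_down() {
	best=$1
	found=0
	for f in $2; do
		if [ "$f" -lt "$1" ]; then
			if [ "$found" -eq 0 ] || [ "$f" -gt "$best" ]; then
				best=$f
				found=1
			fi
		fi
	done
	echo "$best"
}

# step_up CURRENT "LIST": print the smallest listed frequency above
# CURRENT, or CURRENT itself if none is higher.
step_up() {
	best=$1
	found=0
	for f in $2; do
		if [ "$f" -gt "$1" ]; then
			if [ "$found" -eq 0 ] || [ "$f" -lt "$best" ]; then
				best=$f
				found=1
			fi
		fi
	done
	echo "$best"
}

# next_target EVENT CURRENT "LIST": slow down when too hot, speed up
# once the temperature falls below warning-min, otherwise stay put.
next_target() {
	case "$1" in
	warning-over|critical-over) step_down "$2" "$3" ;;
	warning-under)              step_up "$2" "$3" ;;
	*)                          echo "$2" ;;
	esac
}

# main DEVICE EVENT ...: read the current state and apply the new target.
main() {
	cur=$(sysctl -n machdep.powernow.frequency.current)
	avail=$(sysctl -n machdep.powernow.frequency.available)
	new=$(next_target "$2" "$cur" "$avail")
	[ "$new" -eq "$cur" ] || sysctl -w machdep.powernow.frequency.target="$new"
}
```

To use this as a real script, it would end with a call such as main "$@". Moving one step at a time means a hot system may need several events to reach 800, but it also avoids dropping to the slowest speed when one step would have been enough.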
So I have been documenting this in my "Power management and hardware monitoring" chapter in my upcoming "Getting started with NetBSD" book. If you'd like to read it, please let me know.
I also have around ten questions about the envsys framework. If you have any answers, I'd much appreciate them.