Catcher of last resort (First thing to be created?)


Thinking that, from now on, I should always code a catcher-of-last-resort first when working on a project. That is, a script that stands outside the long-running application, pings it every X seconds to make sure it's alive and, if it isn't, forces a restart.
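To make the idea concrete, here is a minimal sketch of such a watchdog in Python. It is not the script I actually run. It assumes the application answers plain HTTP GET requests, and the health URL, the restart command, the 30-second interval and the three-failure threshold are all made-up placeholders.

```python
import http.client
import subprocess
import time
import urllib.error
import urllib.request


def is_alive(url, timeout=5.0):
    """Return True if an HTTP GET to url comes back with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connection, timeout, HTTP error status, malformed URL:
        # all of these count as "not alive" for a last-resort check.
        return False


def watch(check, restart, interval=30.0, max_failures=3,
          sleep=time.sleep, rounds=None):
    """Call check() every `interval` seconds.

    After `max_failures` consecutive failed checks, call restart() and
    start counting again. Runs forever unless `rounds` is given (which
    is mainly useful for testing). Returns the number of restarts.
    """
    failures = 0
    restarts = 0
    done = 0
    while rounds is None or done < rounds:
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        done += 1
        sleep(interval)
    return restarts


# Hypothetical wiring; the URL and the restart command are placeholders:
#
#   watch(
#       check=lambda: is_alive("http://localhost:8080/"),
#       restart=lambda: subprocess.run(["systemctl", "restart", "myapp"]),
#   )
```

Requiring a few consecutive failures before restarting is a design choice, not a necessity: it keeps one slow response from bouncing the whole application. The watchdog has to run as its own process (cron, a service manager, whatever is handy) so that it survives when the application doesn't.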

We need to upgrade the hardware for Cinemon, particularly the RAM and processor, so it's not that surprising that it ran out of RAM when I let it run for a few solid days last week. What was surprising was that it ran out so quickly.

The graph shows that memory usage rose as expected for 12 hours, stayed flat for almost 12 hours just under the memory limit of the machine, then the machine suddenly started swapping like mad and ran out of RAM with no (logged) events occurring. I'd guess it was someone running one of the heavier web queries, but you'd think I'd have heard from someone who ran something and then found the server totally offline for 2 days.

Anyway, now have a catcher-of-last-resort for the demo that sends me email if the server winds up getting rebooted. Still want to know why the memory spiked so high. Might be able to rework whatever was going on at the time to be less memory-intensive.
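The email part can be as small as the sketch below. Again, this is an illustration rather than the exact code on the demo box: it assumes a mail server listening on localhost port 25, and the function names, subject line and addresses are invented for the example.

```python
import smtplib
from email.message import EmailMessage


def build_restart_notice(host, reason, sender, recipient):
    """Build (but do not send) a plain-text 'server was restarted' email."""
    msg = EmailMessage()
    msg["Subject"] = f"[watchdog] {host} was restarted"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The watchdog restarted {host}.\nReason: {reason}\n")
    return msg


def send_notice(msg, smtp_host="localhost", smtp_port=25):
    """Send the message through an SMTP server (assumed: local, no auth)."""
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(msg)


# Hypothetical usage, with placeholder addresses:
#
#   notice = build_restart_notice("demo-server", "no response to ping",
#                                 "watchdog@example.com", "me@example.com")
#   send_notice(notice)
```

Keeping the message building separate from the sending means the wording can be checked without a mail server around.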