I've been watching our client's network through Cinemon for the last three hours, trying to figure out how to make it easier to "see into" what's going on (it looks like they've had a serious outage and are now coming back online). No great insights yet, though not knowing what the failure actually was makes it harder to see where I could be presenting the data better.
I did notice a few problems that I'm going to fix now that the crisis seems to be largely resolved. Most are just formatting issues, but one property has been mislabelled, which makes the statistics look seriously off. I also figured out a particular visualisation that needs to be done for the flat-hierarchic-stats-views (need a decent name for that).
At one point there was a slowdown in Cinemon's response time; I'm not sure whether there were just a lot of users or whether the intensified scanning during the "transitional" periods was slowing it down. Response time was probably over 5 seconds for that one request. Not acceptable.
Still, I'm finding it fascinating to be able to look into the network in so many ways as it fails and then recovers. We should be simulating failures more in the testing phase, as it's only when things are failing that you start to notice details such as the high micro-reflection values in certain trunks of the network and consider flagging them.