Event monitoring tools for proactive network management
NNM Event Management is a proactive network management tool that lets administrators view the status of all network devices from a centralized dashboard and receive alerts about potential problems, while unimportant issues are filtered out.
When managing a large IT network, network managers receive many alerts every day about irregular device statuses, such as high CPU usage, high memory usage, heavy traffic or insufficient bandwidth. These alerts keep administrators aware of the overall health of the network. The network manager’s job is to use the data from all of these alerts to analyze the overall status of the network, resolve potential problems and plan for the future. However, alerts become a nuisance when they are sent repeatedly and too often, and they can lead to false alarms.
NNM Event Management was developed to monitor all events in the corporate IT infrastructure efficiently and to prevent false alarms and overly frequent alerts. Features such as event suppression let the software collect events that occur repeatedly and display a one-line alert rather than notifying the network manager each time the events occur. Where multiple alerts would otherwise be issued, the software lets the network manager hold alerts until certain conditions are met, for example until CPU usage has stayed above 80% for longer than five consecutive minutes. This prevents clutter on the dashboard and lets the network manager focus only on the alerts that matter.
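The hold condition in the CPU example can be sketched in a few lines of Python. This is only an illustration of the idea, not NNM's implementation: the function name `first_sustained_breach`, the `(timestamp, value)` sample format and the parameter names are all assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def first_sustained_breach(
    samples: Iterable[Tuple[float, float]],
    threshold: float = 80.0,
    hold_seconds: float = 300.0,
) -> Optional[float]:
    """Return the timestamp at which a held alert would be released, or None.

    samples: (timestamp_in_seconds, cpu_percent) pairs in time order
    (a hypothetical format chosen for this sketch).
    """
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold:
            # Remember when the current run of high readings began.
            if breach_start is None:
                breach_start = timestamp
            # Release the alert only once the run has lasted longer than the hold time.
            if timestamp - breach_start > hold_seconds:
                return timestamp
        else:
            # Any reading at or below the threshold resets the run.
            breach_start = None
    return None
```

With one sample per minute, a short spike that drops back below the threshold never releases an alert, while readings that stay above 80% release it once the breach has lasted more than five minutes.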
In addition, NNM Event Management features event correlation and root cause analysis tools. Consider an organization with an office that has two WAN links connecting it to the headquarters. If the two links go down on separate occasions, the alerts are issued at separate times and risk becoming lost among the many other alerts issued by the system. The network manager could then miss the important alerts, and the office would be disconnected from the headquarters for an extended period. NNM Event Management lets the network manager create “rules” so that if both WAN links are down, an alert is sent saying that the particular office has been disconnected from the headquarters.
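A correlation rule of this kind can be sketched as a check that every link of an office is down at the same time. The function `disconnected_sites`, the dictionary layout and the link names below are hypothetical and chosen only for illustration; they do not describe how rules are defined in NNM Event Management.

```python
from typing import Dict, List


def disconnected_sites(
    site_links: Dict[str, List[str]],
    link_is_up: Dict[str, bool],
) -> List[str]:
    """Return one alert message per site whose links are all down."""
    alerts = []
    for site, links in site_links.items():
        # Links with no reported status are treated as up (an assumption of this sketch).
        if links and all(not link_is_up.get(link, True) for link in links):
            alerts.append(f"{site} is disconnected from headquarters")
    return alerts
```

For an office with links `link1` and `link2`, the sketch stays silent while only one of them is down and produces a single "disconnected" alert once both are down.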
Data on network latency problems is also recorded and stored for future reference. This helps administrators investigate reports of delays that occurred in the past and implement preventive measures so that the issues do not resurface.
The event suppression feature sums up repeated logs of events and problems and displays them as a single line in the Admin View. The result is an uncluttered screen on which logs indicating more important problems, such as Link Down, are easy to notice. Some switches or routers in the network generate an event every 30 seconds. This feature therefore allows executives to see a summary of all alarms, which simplifies the analysis.
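The summarizing step can be sketched as follows. This is a minimal illustration that assumes each event is a `(device, message)` pair; the function name `summarize_repeats` and the output format are invented for the example and are not taken from the product.

```python
from typing import Dict, Iterable, List, Tuple


def summarize_repeats(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Collapse repeated (device, message) events into one line each."""
    counts: Dict[Tuple[str, str], int] = {}
    for device, message in events:
        key = (device, message)
        counts[key] = counts.get(key, 0) + 1
    # Dictionaries keep insertion order, so lines appear in first-seen order.
    return [
        f"{device}: {message} (x{count})"
        for (device, message), count in counts.items()
    ]
```

A device that reports the same event every 30 seconds then occupies one line with a counter instead of hundreds of lines, which leaves a single Link Down entry clearly visible.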
This feature helps to prioritize events so that the most important problems can be solved first. For example, if the threshold for an interface alarm is set to 10 and the alarm occurs more often than that, the network is identified as unstable. This allows administrators to fix the problem at an early stage, before customers detect or experience it. This feature is useful for organizations that place a high priority on data security, such as banks.
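The threshold check in this example can be sketched as a simple count per interface. The text does not say over which time window alarms are counted, so the sketch below counts whatever batch of alarms it is given; the function name `unstable_interfaces` and the `limit` parameter are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def unstable_interfaces(alarms: Iterable[str], limit: int = 10) -> List[str]:
    """Return the interfaces whose alarm count exceeds the configured limit.

    alarms: one interface name per alarm occurrence (hypothetical format).
    """
    counts = Counter(alarms)
    return sorted(name for name, count in counts.items() if count > limit)
```

An interface with exactly 10 alarms is still within the setting in this sketch; only the eleventh alarm marks it as unstable.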
Root Cause Analysis
This feature helps in analyzing the source of a problem that is causing event alarms.
Automatic Event Clearing
Events in the network mostly occur in pairs, i.e. a problem and its recovery. This feature helps customers to see which problems have been solved, thereby reducing the administrator’s workload.
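The pairing of problem and recovery events can be sketched as follows. The event tuple layout, the state labels `"problem"` and `"recovery"`, and the function name `outstanding_problems` are assumptions for this example only.

```python
from typing import Dict, Iterable, Tuple

# (timestamp, device, problem name, state) -- a hypothetical event layout.
Event = Tuple[float, str, str, str]


def outstanding_problems(events: Iterable[Event]) -> Dict[Tuple[str, str], float]:
    """Return the problems that have not yet been cleared by a recovery event.

    The result maps (device, problem name) to the time the problem was first seen.
    """
    open_problems: Dict[Tuple[str, str], float] = {}
    for timestamp, device, name, state in events:
        key = (device, name)
        if state == "problem":
            # Keep the earliest timestamp if the same problem repeats.
            open_problems.setdefault(key, timestamp)
        elif state == "recovery":
            # A recovery clears the matching problem, if one is open.
            open_problems.pop(key, None)
    return open_problems
```

Only problems without a matching recovery remain in the result, so the administrator sees at a glance what still needs attention.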
This feature helps administrators and engineers to solve problems more efficiently. This is essential because some alarms are far more important than others. Leaving an important problem unsolved could lead to more severe problems later that have a high impact on the network. The tool can perform root cause analysis and lets users create rules for event correlation, so that they are alerted to a problem as soon as it occurs.