Service Monitor
based on the xPactor Framework's functions
What is it?
Traditionally, IT monitoring has been split into classic, largely independent disciplines such as network monitoring and system monitoring (including application and database monitoring). The focus was mostly on system-oriented infrastructure monitoring, aiming for the most comprehensive and detailed monitoring of individual components.
Modern services are complex and use ever less dedicated infrastructure (SaaS, PaaS). With infrastructure-based monitoring, the effects of disturbances and impairments on a given service cannot easily be ascertained. The impact they have on the fulfillment of the service provider’s contractual obligations is even less clear. Service providers therefore struggle to see whether they actually fulfill their contractual obligations, and cannot assess the value of the services at risk.
Most currently deployed infrastructure-monitoring solutions were not designed to bridge the gap between infrastructure and service, and hence cannot do so in a simple manner.
Our approach is to model the services and their component elements, together with the impact chains and dependencies. Each element has defined rules and assigned actions. An element is stimulated by events from the IT monitoring, and the effects are cascaded through the dependency tree up to the commercial service. There the effects (the impact) are visualized intelligently to facilitate assessment.
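The approach described above can be illustrated with a minimal sketch. It is not the Service Monitor's actual implementation: the class and method names (Element, set_status, recompute), the example service "Webshop" and the rule "an element is as bad as its worst child" are all assumptions made for illustration. Only the status names RED, ORANGE and GREEN come from the functional specification.

```python
from enum import IntEnum
from typing import Callable, List, Optional


class Status(IntEnum):
    # Ordered by severity so that max() returns the worst status.
    GREEN = 0
    ORANGE = 1
    RED = 2


class Element:
    """One node of the service model (hypothetical structure)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = Status.GREEN
        self.parent: Optional["Element"] = None
        self.children: List["Element"] = []
        # "Assigned actions": callbacks run whenever this element changes status.
        self.actions: List[Callable[["Element"], None]] = []

    def add(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child

    def set_status(self, new: Status) -> None:
        if new == self.status:
            return  # no change, so the cascade stops here
        self.status = new
        for action in self.actions:
            action(self)
        if self.parent is not None:
            self.parent.recompute()

    def recompute(self) -> None:
        # Assumed rule: an element is as bad as its worst child.
        self.set_status(max(child.status for child in self.children))


log: List[str] = []
service = Element("Webshop")  # root = the commercial service
service.actions.append(lambda e: log.append(f"{e.name} is {e.status.name}"))
database = service.add(Element("Database"))
network = service.add(Element("Network"))

# An event from the IT monitoring stimulates a leaf element.
database.set_status(Status.RED)
```

In this sketch a single event on the leaf "Database" is enough to change the status of the commercial service at the root and to trigger the action assigned to it. The cascade stops as soon as an element's status does not change, which keeps the work per event small in large trees.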
- It takes an integrated view, from the contract (the obligation to the customer) all the way down to the components / resources that provide the service components
- Each service provider structures services and service delivery processes differently, and not all are equally controllable
- Networks, servers, databases, etc.
- People and Skills
- Licenses and rights processes, roles and responsibilities
- The market is unsure, challenges abound but the solutions are not self-evident
- The new “cloud” is scary -> many business-critical issues are not resolved (security, continuity, responsibility, etc.)
- Lack of best practice approaches
- Tool vendors promise much but cannot really deliver
- How should the gap between service and infrastructure be bridged?
- Megatrends force service providers to realign control of service delivery
- cloud and virtualization force a separation of service and infrastructure
- building and operating infrastructure requires a high capital outlay, especially if usage can be sourced cheaper as a service
- constant control of service delivery is a must
- Separation into infrastructure management and service management
- Service management guarantees the benefits to the client
- Infrastructure management guarantees the function to the service provider
- Processes connect the two worlds
- build a common view across companies, departmental boundaries and responsibilities
- ease troubleshooting
- Processes separate the two worlds
- Available tools originate in the infrastructure monitoring world, and are based on technologies and products used there.
- they follow a bottom-up approach
- are complicated to use for rule definition and assignment
- they are expensive and cumbersome
- always show a technological bias
- have difficulty modelling virtual structures
- missing functionality
- cannot cope with different views of individual infrastructure elements (e.g. location-based view, contractual view, product management perspective, SLA view, etc.)
- do not place the customer in focus, but rather the infrastructure
- cannot combine / correlate different information content (e.g. events from probes with current SLA status)
Critical success factors for service monitoring
As a service provider you must always be able to answer the following questions:
WHO needs and WHO uses WHAT?
WHAT is the current (service-) state?
WHAT has been agreed?
WHERE is the service handed over?
HOW is the Service configured?
WHO pays WHAT?
WHO is doing WHAT NOW?
Solution must be robust
- No data loss
- short maintenance times
- data security
Solution must perform
- quick loading of new service or capacity models: minutes for models with 50,000 or more elements
- data aggregation in big models should happen in “near real time”: aggregation and propagation within seconds, even for big models
Solution must be easy to use
- a few clicks to reach relevant data
- results must be comprehensible, the calculation understandable
Solution must scale
- from small to large trees and data volumes
Solution is based on open interfaces
- Usage of data communication standards
- minimal number of interfaces
Functional specifications of the Service Monitor
- core functionality modelled and operational (prototype exists)
- propagation of status changes in a dependency tree (from leaf to root, propagation rules: AND, OR, ORYELLOW)
- REST-Interface for element (Service Assets) stimulation
- status changes to RED, ORANGE or GREEN
- screens for function control and core functionality explanation
- alarm console (infrastructure events through REST), service alarms
- birds-eye view: visualization of the entire forest (neutral color = all clear, elements colored = alarms or events exist)
- drill-down option into the tree and the elements
- triggering of actions based on set rules, using an external interface
- integration with an Enterprise Service Bus
- fully integrated and presentable online model
- increase data quantity that can be handled (number of events and elements)
- update and load new structures as a master plan / service model
- mapping “External Services” and shared components including linking of differing trees (recursion)
- manual / automated triggering of activities
- ticket submission
- send tweet / RSS
- changing the element behavior
- blacklist / test capability
- activate / de-activate individual elements
- user and role management
- application within the application
- application data elements
- additional interfaces
- additional dashboard
- additional propagation- and rule sets
- architecture change to allow generalisation of rules
- adaptation and visualisation of multiple element statuses (alarm status and SLA status)
- individualised branding
- installability of the system
- additional computational functionality per element
- usage of service times
- change correlation
- translation tables for mapping of external to internal events
- usage of broadcast events
- targeted stimulation of elements in one or more trees (what-if scenarios)
- grouping function for weighting and grouping of entry signals
- extended reporting abilities
- personalisation of dashboards
- extended DB interfaces
- GUI to adapt loaded dependency trees
- 2 versions per year (Main, Update)
- 1 user conference per year
- advisory board
- quarterly webcasts
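The functional specifications list leaf-to-root propagation with the rules AND and OR, and targeted stimulation of elements for what-if scenarios. The following minimal sketch shows how the two could fit together. The text does not define the rules, so the readings used here are assumptions: AND means all children are required (the worst child wins), OR means the children are redundant (the best child wins). ORYELLOW is left out because its meaning is not stated. The tree, the element names and the functions combine, evaluate and what_if are hypothetical.

```python
from typing import Dict, List

SEVERITY = {"GREEN": 0, "ORANGE": 1, "RED": 2}


def combine(rule: str, child_states: List[str]) -> str:
    """Combine child statuses under an assumed reading of the rule names."""
    if rule == "AND":  # all children are required: worst child wins
        return max(child_states, key=SEVERITY.__getitem__)
    if rule == "OR":  # children are redundant: best child wins
        return min(child_states, key=SEVERITY.__getitem__)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical model: each inner node names its rule and its children.
TREE = {
    "Mail service": ("AND", ["Mail cluster", "Uplink"]),
    "Mail cluster": ("OR", ["Mail server A", "Mail server B"]),
}


def evaluate(node: str, leaf_states: Dict[str, str]) -> str:
    """Compute the status of a node from the leaf statuses, leaf to root."""
    if node not in TREE:
        return leaf_states.get(node, "GREEN")  # unknown leaves count as GREEN
    rule, children = TREE[node]
    return combine(rule, [evaluate(child, leaf_states) for child in children])


def what_if(live: Dict[str, str], changes: Dict[str, str], root: str) -> str:
    """Evaluate the root with hypothetical changes, leaving `live` untouched."""
    return evaluate(root, {**live, **changes})


live = {"Mail server A": "GREEN", "Mail server B": "GREEN", "Uplink": "GREEN"}
one_down = what_if(live, {"Mail server A": "RED"}, "Mail service")
both_down = what_if(
    live, {"Mail server A": "RED", "Mail server B": "RED"}, "Mail service"
)
```

Under these assumptions the loss of one redundant mail server leaves the service GREEN, while the loss of both turns it RED. Because what_if works on a merged copy of the leaf statuses, a scenario can be explored without touching the live model.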