Nuts and Bolts

NICs and NOCs

        Network Operations Center
                Monitoring - Outages and Intermittent Faults
                Trouble ticket responses
                Statistics on Traffic
                Statistics on problems
        Network Information Center
                Information and Documentation
                Training
                Packaging
        Service Center as a Front Door

Support Model

        Ground zero
        Service Center
        Technical Expertise

Help Layering

        Self-help
        generalist
        specialist
        expert
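The help layers can be read as an escalation ladder: a problem moves up one layer only when the current layer cannot resolve it. A minimal sketch, assuming a hypothetical predicate `can_resolve(layer, problem)` that stands in for "this layer fixed it":

```python
from typing import Callable, Optional

# The four help layers, cheapest first (names taken from the outline above).
LAYERS = ["self-help", "generalist", "specialist", "expert"]

def escalate(problem: str, can_resolve: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first layer that resolves the problem, or None if none does.

    can_resolve is a hypothetical stand-in for the human judgment at each layer.
    """
    for layer in LAYERS:
        if can_resolve(layer, problem):
            return layer
    return None  # nobody could fix it: time to re-engineer

# Example: only a specialist can fix this one.
print(escalate("BGP flap", lambda layer, p: layer == "specialist"))  # specialist
```

Ordering the layers cheapest-first is the point: most problems should stop at self-help or the generalist, leaving experts free for the rest.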

Service Center Issues

        resolving most problems
        tracking
        analyzing
        accounting
        reengineering

Service metrics

        MTTF (mean time to failure)
        MTTR (mean time to repair)
        Open tickets
        Mean time to close tickets
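These metrics fall out of simple outage and ticket records. A minimal sketch, assuming times are plain numbers of hours and that each outage is a (failed, restored) pair; MTTF is taken here as the average up-time from one restore to the next failure. The `Ticket` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

@dataclass
class Ticket:
    opened: float            # hours since an arbitrary reference time (assumed unit)
    closed: Optional[float]  # None while the ticket is still open

def mttr(outages: List[Tuple[float, float]]) -> float:
    """Mean time to repair: average of (restored - failed) over all outages."""
    return mean(up - down for down, up in outages)

def mttf(outages: List[Tuple[float, float]]) -> float:
    """Mean time to failure: average gap from a restore to the next failure.

    Needs at least two outages; statistics.mean raises StatisticsError otherwise.
    """
    ordered = sorted(outages)
    return mean(nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:]))

def open_tickets(tickets: List[Ticket]) -> int:
    return sum(1 for t in tickets if t.closed is None)

def mean_time_to_close(tickets: List[Ticket]) -> float:
    """Average open-to-close time, counting only tickets that are closed."""
    return mean(t.closed - t.opened for t in tickets if t.closed is not None)

outages = [(10.0, 12.0), (50.0, 51.0), (100.0, 103.0)]
print(mttr(outages), mttf(outages))  # 2.0 43.5
```

The sample outages are made up for illustration. Note that mean time to close ignores open tickets, so it should always be read alongside the open-ticket count.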

Diagnostic processes - SOP (standard operating procedure)

        isolation
        identification
        interaction
        resolution
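Isolation along a network path can be done by bisection rather than hop by hop. A minimal sketch, assuming a hypothetical `reachable(hop)` test (a ping, say) and that failures are monotonic along the path: once one hop stops answering, every hop beyond it is unreachable too. Hop names are illustrative only.

```python
from typing import Callable, List, Optional

def isolate(path: List[str], reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first hop on `path` that does not respond, or None if all do.

    Binary search: about log2(len(path)) probes instead of one per hop.
    """
    lo, hi = 0, len(path)  # invariant: path[:lo] reachable, path[hi:] unreachable
    while lo < hi:
        mid = (lo + hi) // 2
        if reachable(path[mid]):
            lo = mid + 1   # everything up to mid answers; look further out
        else:
            hi = mid       # mid is dead; the first failure is at or before it
    return path[lo] if lo < len(path) else None

path = ["noc-router", "regional-hub", "csu-dsu", "site-x-router", "site-x-lan"]
down = {"site-x-router", "site-x-lan"}
print(isolate(path, lambda hop: hop not in down))  # site-x-router
```

If the monotonic assumption does not hold (for example, a filtered hop that drops pings but forwards traffic), bisection can point at the wrong hop, so confirm the result before moving on to identification.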

Standard NOC problem: Site X is down

        Notification of Problem
                by display on screen
                by page
                by phone from site
        Acknowledge the problem so others know it's being worked on
        Open a trouble ticket
        Run checksite
                checks upstream site/link/LAN
                uses pings and SNMP queries to the upstream site to see if the interface is up
        If the line is down or you can't tell - call the end site
        Call the upstream site
        try basic loops and loopbacks
        call circuit provider
        check routing locally
        check routing nationally
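The decision after checksite can be written down as a small triage rule. A minimal sketch, not the actual checksite tool: it assumes the probe results arrive as a ping result for the upstream side plus the IF-MIB `ifOperStatus` value of the upstream interface toward site X (1 = up, 2 = down), or None when SNMP got no answer. The branch for a line that looks up is an assumption layered on top of the SOP order above.

```python
from typing import List, Optional

IF_OPER_UP = 1  # IF-MIB ifOperStatus value for "up"

# SOP order from the outline, used when the line is down or its state is unknown.
SOP_LINE_DOWN = ["call end site", "call upstream site", "try loops and loopbacks",
                 "call circuit provider", "check routing locally",
                 "check routing nationally"]

def next_steps(upstream_ping_ok: bool, if_oper_status: Optional[int]) -> List[str]:
    """Suggest what to do after a checksite-style probe (hypothetical helper)."""
    if upstream_ping_ok and if_oper_status == IF_OPER_UP:
        # Line appears up (assumption): suspect routing or the site's LAN first.
        return ["check routing locally", "check routing nationally", "call end site"]
    # Line is down, or we can't tell: follow the SOP order.
    return list(SOP_LINE_DOWN)

print(next_steps(True, None)[0])  # call end site
```

Keeping the SOP order in one list makes it easy to review and change without touching the decision logic.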

Standard NOC problem: Performance or Intermittent

Notification

Problem Solving

        to solve problems - process
        to solve problems - clues
        to avoid problems

Problem-solving processes

        stay cool
        think logically
        be systematic
        keep notes as you go
        avoid preconceived notions
        have a plan of attack - always TEST TO A GOAL
        what you do, undo.  what you undo, do.
        anticipate the possibility of several problems
        if it was working, what changed?
        keep the processes simple.
        the problem may be outside your own network or equipment
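"Keep notes as you go" and "what you do, undo" combine naturally into a change log that records each change together with how to reverse it. A minimal sketch; the `ChangeLog` class and the configuration dictionary are hypothetical stand-ins for real device changes.

```python
from typing import Callable, List, Tuple

class ChangeLog:
    """Notes-as-you-go log where every change is recorded with its undo."""

    def __init__(self) -> None:
        self._changes: List[Tuple[str, Callable[[], None]]] = []
        self.notes: List[str] = []

    def do(self, description: str, apply: Callable[[], None],
           undo: Callable[[], None]) -> None:
        apply()
        self._changes.append((description, undo))
        self.notes.append(f"did: {description}")

    def undo_all(self) -> None:
        # Undo in reverse order so each step is reversed against the state it created.
        while self._changes:
            description, undo = self._changes.pop()
            undo()
            self.notes.append(f"undid: {description}")

cfg = {"mtu": 1500, "duplex": "auto"}
log = ChangeLog()
log.do("set mtu 1400", lambda: cfg.__setitem__("mtu", 1400),
       lambda: cfg.__setitem__("mtu", 1500))
log.undo_all()
print(cfg)  # {'mtu': 1500, 'duplex': 'auto'}
```

If a change did not fix the problem, `undo_all` puts things back the way they were, so one failed experiment does not turn into a second problem.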

Clues:

        hardware problems are less circumstantial and more random
        software problems have a repetitive quality
        who's affected?
        where there's smoke, there's fire
        what's still working? - test what's not broken
        what does normal look like?
        look at the physical things first.
        use your standard diagnostic suite.
        usually only one thing breaks at a time.
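Knowing what normal looks like means having a baseline to compare against. A minimal sketch, assuming round-trip-time samples in milliseconds and a simple rule that flags anything more than k population standard deviations from the baseline mean; the sample values and k = 3 are illustrative choices, not measurements.

```python
from statistics import mean, pstdev
from typing import List, Tuple

def baseline(samples: List[float]) -> Tuple[float, float]:
    """Summarize 'normal' as (mean, population standard deviation)."""
    return mean(samples), pstdev(samples)

def abnormal(samples: List[float], base: Tuple[float, float],
             k: float = 3.0) -> List[float]:
    """Return samples more than k standard deviations from the baseline mean.

    With a zero-spread baseline, any deviation at all is flagged.
    """
    mu, sigma = base
    return [x for x in samples if abs(x - mu) > k * sigma]

normal_rtt = [20.0, 22.0, 21.0, 19.0, 18.0, 20.0]  # made-up baseline RTTs (ms)
print(abnormal([21.0, 19.0, 35.0, 20.0], baseline(normal_rtt)))  # [35.0]
```

The baseline has to be collected while things are healthy; taken during an incident, it just makes the broken state look normal.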

Avoid the problem

        keep the spares current
        determine normal
        test beyond the limits
        manage user expectations
        write down the lessons