Aviation Forum: Aviation Translation, Civil Aviation English Translation, Flight Translation

Title: Comparative Risk Assessment Form

Author: 帅哥    Time: 2008-12-21 21:09:33     Title: Comparative Risk Assessment Form

FAA System Safety Handbook, Appendix B: Comparative Risk Assessment (CRA) Form, December 30, 2000

Appendix B: Comparative Risk Assessment Form

SEC TRACKING No: The number assigned to the CRA by the FAA System Engineering Council (SEC).
CRA Title: Title as assigned by the FAA SEC.
SYSTEM: The system affected by the change, e.g., the National Airspace System.
Initial Date: Date initiated.
SEC date: Date first reviewed by the SEC.
REFERENCES: A short list of references. A long list can be continued on a separate page.

SSE INFORMATION
SSE Name/Title: Name and title of the person who performed the assessment or led the team.
Location: Address and office symbol of the SSE.
Telephone No.:

SUMMARY OF HAZARD CLASSIFICATION (worst credible case; see the Hazard List below for individual risk assessments)
Option A (Baseline): Place the highest risk assessment code for the baseline here.
Proposed Change Option(s) B-X: Place the highest risk assessment code for the alternatives here.

DESCRIPTION OF (Option A) BASELINE AND PROPOSED CHANGE(s)
Option A: Describe the system under study in terms of the 5M Model discussed in Chapter 2. Describe the baseline (or no-change) system and each alternative. This section can be continued in an appendix if it does not fit into this area. Avoid too much detail, but include enough that the decision-maker can understand the risk associated with each alternative.
SEVERITY:
1 CATASTROPHIC - Death, system or aircraft loss, permanent total disability
2 HAZARDOUS - Severe injury or major aircraft or system damage
3 MAJOR - Minor injury or minor aircraft or system damage
4 MINOR - Less than minor injury or aircraft or system damage
5 NO SAFETY EFFECT

PROBABILITY:
A PROBABLE - Likely to occur in lifetime of each system (> 1E-5)
B REMOTE - Possible for each item, several for system (< 1E-5)
C EXTREMELY REMOTE - Unlikely for item, may occur few in system (< 1E-7)
D EXTREMELY IMPROBABLE - So unlikely, not expected in system (< 1E-9)

Risk matrix (PROBABILITY across, SEVERITY down):
SEVERITY    A    B    C    D
1
2
3
4
5           No risk

HAZARD LIST
List the hazard conditions here. Enter the risk assessment code (RAC) for each hazard and alternative. RAC columns: Baseline (Option A), Option B, Option C, Option D, Option E.

1 Loss of communication between air traffic controllers and aircraft (flight essential). RAC: A=1D, B=1D, C=1C, D=1C, E=1B
2 Loss of communication between air traffic controllers in different domains (ARTCC to ARTCC, ARTCC to TRACON, etc.). RAC: A=1D
3 Loss of communication between air traffic controllers and flight service (flight plans, etc.)
4 Loss of communication between air traffic and ground controllers and vehicles in the airport movement area
5 Loss of the means for operator and flight service to communicate information relative to planned flight
6 Loss of the capability to detect, classify, locate, and communicate adverse weather such as thunderstorms, rain and snow showers, lightning, windshear, tornadoes, icing, low visibility or ceilings, turbulence, hail, fog, etc.
7 Loss of navigation functions providing aircrew with independently determined 3D present position of the aircraft, defined routes, destination(s), and navigation solution (course, distance) to destination.
8 Loss of air traffic control determination of 3D location, velocity vector, and identity of each aircraft operating in a domain.
9 Loss of air traffic control determination of location, identity, and velocity vector of each participating vehicle operating in the airport movement area domain.
10 Loss of approach guidance to runway. Precision: horizontal and vertical guidance; Nonprecision: horizontal guidance, vertical procedures.
11 Loss of ground vehicle or aircraft operator independent determination of present position, destination(s), and navigation solution on the airport movement area.
12 Hazardous runway surface precludes safe takeoff or touchdown and rollout.

SAFETY ASSESSMENT SUMMARY (Conclusions/Recommendations)
Summarize your conclusions: which option is best (and which are second, third, etc.) and why. Include enough detail to communicate appropriately with the audience.

Recommendations: Provide additional controls to further mitigate or eliminate the risks. Follow the safety order of precedence, i.e., (1) eliminate/mitigate by design, (2) incorporate safety features, (3) provide warnings, and (4) procedures/training. (See Chapter 4 for further elaboration of the Safety Order of Precedence.) Define SSE requirements for reducing the risk of the design/option(s).

HAZARD CLASSIFICATION RATIONALE
Complete one of these sheets for each hazard.

1 Hazard: Loss of communication between air traffic controllers and aircraft

Summarization
Summarize the risk assessments for hazard No. 1 for each alternative that was examined.
Baseline Option A
Severity: 1-Catastrophic
Probability: E-Improbable
Assessment: Medium Risk

Option B
Severity: NA
Probability: NA
Assessment: NA

Severity
Rationale for Severity: In this section, explain how you arrived at the hazard severity. This is where you convince the skeptics that you were logical and objective.

The hazard is a component of the hazardous conditions required for NMAC, CFIT, WXHZ, NLA, and RIAs. For the baseline NAS, the severity of the “loss of communication” hazard is highly dependent upon the environmental conditions surrounding the event, and communication is therefore categorized as a flight essential function of the NAS. In a “day, VFR, low density” environment the severity is very low, resulting in minor effects. In a night/IFR high-density environment, the occurrence of this hazard has a good chance of becoming catastrophic. The reason is that the purpose of this communication system is to provide aircraft in a region of airspace with direction, clearance, and other services provided by Air Traffic Control (ATC). In an environment of low outside visibility and many aircraft, this function becomes critically important to air vehicle separation. The following points highlight the severity:

Air Traffic Controllers (ATCs) are able to observe wide volumes of space using airspace surveillance systems. These systems enable the ATCs to observe the location, velocity, and sometimes the identity of the aircraft detected by their systems. The ATCs are trained to direct the flow of traffic safely to prevent midair collisions and to provide flight following, approach clearances, and emergency assistance.

Loss of the entire communication system would result in the rapid onset of chaos as approaching aircraft attempt to land and en route aircraft converge on navigation waypoints and facilities. The risk of a midair collision is high in these conditions.
In the event that a loss of communication occurs, complex emergency procedures are established for IFR and VFR aircraft. The procedures are necessarily complex and, if followed, should result in a safe landing, but once initiated they can be difficult to follow, especially for a single pilot in IFR. The AIM states: “Radio communications are a critical link in the ATC system. The link can be a strong bond between pilot and controller or it can be broken with surprising speed and disastrous results”. [1]

Probability
Rationale for Probability: Use this section to explain how you derived the probability. This may be quantitative or qualitative. In general, the higher risk items will require more quantitative analysis than low or medium risk hazards. The example below is qualitative.

Many controls exist to preclude this hazard from occurring:
- Multiple radios, both in the aircraft and in the ATC facility, provide redundant communication channels from aircraft to ATC.
- In the event of failure, multiple facilities can be used, including FSS, other ARTCC, TRACON, or ATCC, and even airborne telephones.
- Planning systems assist in keeping aircraft at different altitudes or routes.
- Emergency procedures exist to ensure that an aircraft in “lost communication” will not converge on another aircraft’s flight path.

[1] Federal Aviation Administration. (1995). Airman’s Information Manual. Para. 4-2-1.
Severity Definitions

Catastrophic: Results in multiple fatalities and/or loss of the system.

Hazardous: Reduces the capability of the system or the operator's ability to cope with adverse conditions to the extent that there would be:
- Large reduction in safety margin or functional capability
- Crew physical distress/excessive workload such that operators cannot be relied upon to perform required tasks accurately or completely
- Serious or fatal injury to a small number of occupants of aircraft (except operators)
- Fatal injury to ground personnel and/or general public

Major: Reduces the capability of the system or the operators to cope with adverse operating conditions to the extent that there would be:
- Significant reduction in safety margin or functional capability
- Significant increase in operator workload
- Conditions impairing operator efficiency or creating significant discomfort
- Physical distress to occupants of aircraft (except operator), including injuries
- Major occupant illness and/or major environmental damage, and/or major property damage

Minor: Does not significantly reduce system safety. Actions required by operators are well within their capabilities. Includes:
- Slight reduction in safety margin or functional capabilities
- Slight increase in workload, such as routine flight plan changes
- Some physical discomfort to occupants of aircraft (except operators)

No Safety Effect: Has no effect on safety.
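The Summary of Hazard Classification asks for the highest risk assessment code (RAC) per option, taken over the whole Hazard List. A minimal Python sketch of that bookkeeping follows, using the example codes entered for hazards 1 and 2 above. The ranking rule (compare severity first, then probability, with 1 and A as worst) is an assumption made for illustration, since the form does not state how to order codes; the names `HAZARD_RACS`, `rac_rank`, and `worst_rac` are hypothetical.

```python
# Example RACs from the Hazard List: hazard number -> {option: RAC}.
# A RAC is a severity digit (1 = Catastrophic ... 5 = No Safety Effect)
# followed by a probability letter (A = Probable ... D = Extremely Improbable).
HAZARD_RACS = {
    1: {"A": "1D", "B": "1D", "C": "1C", "D": "1C", "E": "1B"},
    2: {"A": "1D"},
}


def rac_rank(rac):
    """Sort key; a smaller key means a worse code.

    ASSUMPTION: severity dominates probability when comparing two codes.
    """
    severity, probability = int(rac[0]), rac[1]
    return (severity, probability)


def worst_rac(hazard_racs):
    """Return {option: worst RAC} over all hazards that carry a code for that option."""
    worst = {}
    for codes in hazard_racs.values():
        for option, rac in codes.items():
            if option not in worst or rac_rank(rac) < rac_rank(worst[option]):
                worst[option] = rac
    return worst


summary = worst_rac(HAZARD_RACS)
print(summary)
```

An organization that ranks codes through a risk matrix with High/Medium/Low cells would replace `rac_rank` with a lookup into that matrix; the rest of the sketch would stay the same.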
Author: 帅哥    Time: 2008-12-21 21:10:05     Title: Government References (

FAA System Safety Handbook, Appendix C: Related Readings in Aviation System Safety, December 30, 2000

Appendix C: REFERENCES

GOVERNMENT REFERENCES
FAA Order 1810, Acquisition Policy
FAA Order 8040.4, FAA Safety Risk Management
FAA Advisory Circular 25.1309 (Draft), System Design and Analysis, January 28, 1998
RTCA-DO 178B, Software Considerations In Airborne Systems And Equipment Certification, December 1, 1992
COMDTINST M411502D, System Acquisition Manual, December 27, 1994
DODD 5000.1, Defense Acquisition, March 15, 1996
DOD 5000.2R, Mandatory Procedures for Major Defense Acquisition Programs and Major Automated Information Systems, March 15, 1996
DOD-STD 2167A, Military Standard Defense System Software Development, February 29, 1988
MIL-STD 882D, System Safety Program Requirements, February 10, 2000
MIL-STD 498, Software Development and Documentation, December 5, 1994
MIL-HDBK-217A, “Reliability Prediction of Electronic Equipment,” 1982.
MIL-STD-1629A, “Procedures for Performing a Failure Mode, Effects and Criticality Analysis,” November 1980.
MIL-STD-1472D, “Human Engineering Design Criteria for Military Systems, Equipment and Facilities,” 14 March 1989.
NSS 1740.13, Interim Software Safety Standard, June 1994
29 CFR 1910.119, Process Safety Management, U.S. Government Printing Office, July 1992.
Department of the Air Force, Software Technology Support Center, Guidelines for Successful Acquisition and Management of Software-Intensive Systems: Weapon Systems, Command and Control Systems, Management Information Systems, Version-2, June 1996, Volumes 1 and 2
AFISC SSH 1-1, Software System Safety Handbook, September 5, 1985
Department of Defense, AF Inspections and Safety Center (now the AF Safety Agency), AFIC SSH 1-1, “Software System Safety,” September 1985.
Department of Labor, 29 CFR 1910, “OSHA Regulations for General Industry,” July 1992.
Department of Labor, 29 CFR 1910.119, “Process Safety Management of Highly Hazardous Chemicals,” Federal Register, 24 February 1992.
Department of Labor, 29 CFR 1926, “OSHA Regulations for Construction Industry,” July 1992.
Department of Labor, OSHA 3133, “Process Safety Management Guidelines for Compliance,” 1992.
Department of Labor, OSHA Instructions CPL 2-2.45A, Compliance Guidelines and Enforcement Procedures, September 1992.
Department of Transportation, DOT P 5800.5, “Emergency Response Guidebook,” 1990.
Environmental Protection Agency, 1989d, Exposure Factors Handbook, EPA/600/8-89/043, Office of Health and Environmental Assessment, Washington, DC, 1989.
Environmental Protection Agency, 1990a, Guidance for Data Usability in Risk Assessment, EPA/540/G-90/008, Office of Emergency and Remedial Response, Washington, DC, 1990.

COMMERCIAL REFERENCES
ACGIH, “Guide for Control of Laser Hazards,” American Conference of Government Industrial Hygienists, 1990.
American Society for Testing and Materials (ASTM), 1916 Race Street, Philadelphia, PA 19103
ASTM STP762, “Fire Risk Assessment,” American Society for Testing Materials, 1980.
EIA-6B, G-48, Electronic Industries Association, System Safety Engineering In Software Development, 1990
IEC 61508: International Electrotechnical Commission.
Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, December 1997
IEC 1508 (Draft), International Electrotechnical Commission, Functional Safety; Safety-Related System, June 1995
IEEE STD 1228, Institute of Electrical and Electronics Engineers, Inc., Standard For Software Safety Plans, 1994
IEEE STD 829, Institute of Electrical and Electronics Engineers, Inc., Standard for Software Test Documentation, 1983
IEEE STD 830, Institute of Electrical and Electronics Engineers, Inc., Guide to Software Requirements Specification, 1984
IEEE STD 1012, Institute of Electrical and Electronics Engineers, Inc., Standard for Software Verification and Validation Plans, 1987
ISO 12207-1, International Standards Organization, Information Technology-Software, 1994
Joint Software System Safety Committee, “Software System Safety Handbook”, December 1999
NASA NSTS 22254, “Methodology for Conduct of NSTS Hazard Analyses,” May 1987.
National Fire Protection Association, “Flammable and Combustible Liquids Code.”
National Fire Protection Association, “Hazardous Chemical Handbook”
National Fire Protection Association, “Properties of Flammable Liquids, Gases and Solids”.
National Fire Protection Association, “Fire Protection Handbook.”
Nuclear Regulatory Commission NRC, “Safety/Risk Analysis Methodology”, April 12, 1993.
Joint Services Computer Resources Management Group, “Software System Safety Handbook: A Technical and Managerial Team Approach”, Published on Compact Disc, December 1999.
Society of Automotive Engineers, Aerospace Recommended Practice 4754: “Certification Considerations for Highly Integrated or Complex Aircraft Systems”, November 1996.
Society of Automotive Engineers, Aerospace Recommended Practice 4761: “Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment”, December 1996.
System Safety Society: System Safety Analysis Handbook, July 1997.

INDIVIDUAL REFERENCES
Ang, A.H.S., and Tang, W.H., “Probability Concept in Engineering Planning and Design”, Vol. II, John Wiley and Sons, 1984.
Anderson, D. R., Dennis J. Sweeney, Thomas A. Williams, “An Introduction to Management Science Quantitative Approaches to Decision Making.” West Publishing Co., 1976.
Bahr, N. J., “System Safety Engineering and Risk Assessment: A Practical Approach”, Taylor and Francis, 1997.
Benner, L., “Guide 7: A Guide for Using Energy Trace and Barrier Analysis with the STEP Investigation System”, Events Analysis, Inc., Oakton, Va., 1985.
Briscoe, G.J., “Risk Management Guide”, EG&G Idaho, Inc. SSDC-11, June 1997.
Brown, M., L., “Software Systems Safety and Human Error”, Proceedings: COMPASS 1988
Brown, M., L., “What is Software Safety and Whose Fault Is It Anyway?” Proceedings: COMPASS 1987
Brown, M., L., “Applications of Commercially Developed Software in Safety Critical Systems”, Proceedings of Parari ’99, November 1999
Bozarth, J. D., Software Safety Requirement Derivation and Verification, Hazard Prevention, Q1, 1998
Card, D.N. and Schultz, D.J., “Implementing a Software Safety Program”, Proceedings: COMPASS 1987
Clark, R., Benner, L. and White, L. M., “Risk Assessment Techniques Manual,” Transportation Safety Institute, March 1987, Oklahoma City, OK.
Clemens, P.L., “A Compendium of Hazard Identification and Evaluation Techniques for System Safety Application,” Hazard Prevention, March/April, 1982.
Cooper, J.A., “Fuzzy-Algebra Uncertainty Analysis,” Journal of Intelligent and Fuzzy Systems, Vol. 2, No. 4, 1994.
Connolly, B., “Software Safety Goal Verification Using Fault Tree Techniques: A Critically Ill Patient Monitor Example”, Proceedings: COMPASS 1989
De Santo, B., “A Methodology for Analyzing Avionics Software Safety”, Proceedings: COMPASS 1988
Dunn, R., Ullman, R., “Quality Assurance For Computer Software”, McGraw Hill, 1982
Forrest, M., and McGoldrick, Brendan, “Realistic Attributes of Various Software Safety Methodologies”, Proceedings: 9th International System Safety Society, 1989
Hammer, W., R., “Identifying Hazards in Weapon Systems - The Checklist Approach”, Proceedings: Parari ’97, Canberra, Australia
Hammer, Willie, “Occupational Safety Management and Engineering”, 2nd Ed., Prentice-Hall, Inc, Englewood Cliffs, NJ, 1981.
Heinrich, H.W., Petersen, D., Roos, N., “Industrial Accident Prevention: A Safety Management Approach”, McGraw-Hill, 5th Ed., 1980.
Johnson, W.G., “MORT - The Management Oversight and Risk Tree,” SAN 821-2, U.S. Atomic Energy Commission, 12 February 1973.
Kije, L.T., “Residual Risk,” Rusee Press, 1963.
Kjos, K., “Development of an Expert System for System Safety Analysis”, Proceedings: 8th International System Safety Conference, Volume II.
Klir, G.J., Yuan, B., “Fuzzy Sets and Fuzzy Logic: Theory and Applications”, Prentice Hall P T R, 1995.
Kroemer, K.H.E., Kroemer, H.J., Kroemer-Elbert, K.E., “Engineering Physiology: Bases of Human Factors/Ergonomics”, 2nd Ed., Van Nostrand Reinhold, 1990.
Lawrence, J.D., “Design Factors for Safety-Critical Software”, NUREG/CR-6294, Lawrence Livermore National Laboratory, November 1994
Lawrence, J.D., “Survey of Industry Methods for Producing Highly Reliable Software”, NUREG/CR-6278, Lawrence Livermore National Laboratory, November 1994.
Leveson, N., G., “SAFEWARE; System Safety and Computers, A Guide to Preventing Accidents and Losses Caused By Technology”, Addison Wesley, 1995
Leveson, N., G., “Software Safety: Why, What, and How”, Computing Surveys, Vol. 18, No. 2, June 1986.
Littlewood, B. and Strigini, L., “The Risks of Software”, Scientific American, November 1992.
Mattern, S.F. Capt., “Defining Software Requirements for Safety-Critical Functions”, Proceedings: 12th International System Safety Conference, 1994.
Mills, H., D., “Engineering Discipline for Software Procurement”, Proceedings: COMPASS 1987.
Moriarty, Brian and Roland, Harold, E., “System Safety Engineering and Management”, Second Edition, John Wiley & Sons, 1990.
Ozkaya, N., Nordin, M., “Fundamentals of Biomechanics: Equilibrium, Motion, and Deformation”, Van Nostrand Reinhold, 1991.
Raheja, Dev, G., “Assurance Technologies: Principles and Practices”, McGraw-Hill, Inc., 1991.
Rodger, W.P., “Introduction to System Safety Engineering”, John Wiley and Sons.
Russo, Leonard, “Identification, Integration, and Tracking of Software System Safety Requirements”, Proceedings: 12th International System Safety Conference, 1994.
Saaty, T.L., “The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation”, 2nd Ed., RWS Publications, 1996.
Stephenson, Joe, “System Safety 2000: A Practical Guide for Planning, Managing, and Conducting System Safety Programs”, Van Nostrand Reinhold, 1991.
Tarrants, William, E., “The Measurement of Safety Performance”, Garland STPM Press, 1980.

OTHER REFERENCES
DEF(AUST) 5679, Army Standardization (ASA), “The Procurement Of Computer-Based Safety Critical Systems”, May 1999
UK Ministry of Defense. Interim DEF STAN 00-54: “Requirements for Safety Related Electronic Hardware in Defense Equipment”, April 1999.
UK Ministry of Defense.
Defense Standard 00-55: “Requirements for Safety Related Software in Defense Equipment”, Issue 2, 1997
UK Ministry of Defense. Defense Standard 00-56: “Safety Management Requirements for Defense Systems”, Issue 2, 1996
International Electrotechnical Commission, IEC 61508, “Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems”, draft 61508-2 Ed 1.0, 1998
Author: 帅哥    Time: 2008-12-21 21:10:34     Title: Structural Analysis and Formal Methods

FAA System Safety Handbook, Appendix D, December 30, 2000

Appendix D: Structured Analysis and Formal Methods

D.1 Structured Analysis and Formal Methods

Structured Analysis became popular in the 1980s and is still used by many. The analysis consists of interpreting the system concept (or real world) in data and control terminology, that is, as data flow diagrams (DFDs). The flow of data and control from bubble to data store to bubble can be very hard to track, and the number of bubbles can become extremely large. One approach is to first define the events from the outside world that require the system to react and assign a bubble to each event; bubbles that need to interact are then connected until the system is defined. This can be rather overwhelming, so the bubbles are usually grouped into higher-level bubbles. Data dictionaries are needed to describe the data and command flows, and a process specification is needed to capture the transaction/transformation information.

The problems have been:
1) choosing bubbles appropriately;
2) partitioning those bubbles in a meaningful and mutually agreed upon manner;
3) the size of the documentation needed to understand the data flows;
4) the approach is still strongly functional in nature and thus subject to frequent change;
5) though “data” flow is emphasized, “data” modeling is not, so there is little understanding of just what the subject matter of the system is; and
6) not only is it hard for the customer to follow how the concept is mapped into these data flows and bubbles, it has also been very hard for the designers, who must shift the DFD organization into an implementable format.

Information Modeling, using entity-relationship diagrams, is really a forerunner of OOA. The analysis first finds objects in the problem space, describes them with attributes, adds relationships, refines them into super- and sub-types, and then defines associative objects.
Some normalization then generally occurs. Information modeling is thought to fall short of true OOA in that, according to Peter Coad and Edward Yourdon: 1) services, or processing requirements, for each object are not addressed; 2) inheritance is not specifically identified; 3) poor interface structures (messaging) exist between objects; and 4) classification and assembly of the structures are not used as the predominant method for determining the system’s objects.

This handbook presents in detail the two most promising new methods of structured analysis and design: Object-Oriented and Formal Methods (FM). OOA/OOD and FM can incorporate the best from each of the above methods and can be used effectively in conjunction with each other. Lutz and Ampo described their successful experience of using OOD combined with Formal Methods as follows: “For the target applications, object-oriented modeling offered several advantages as an initial step in developing formal specifications. This reduced the effort in producing an initial formal specification. We also found that the object-oriented models did not always represent the “why,” of the requirements, i.e., the underlying intent or strategy of the software. In contrast, the formal specification often clearly revealed the intent of the requirements.”

D.2 Object Oriented Analysis and Design

Object Oriented Design (OOD) is gaining increasing acceptance worldwide. OOD methods fall short of full Formal Methods because they generally do not include logic engines or theorem provers. But they are more widely used than Formal Methods, and a large infrastructure of tools and expertise is readily available to support practical OOD usage.

OOA/OOD is the new paradigm and is viewed by many as the best solution to most problems.
Some of the advantages of modeling the real world as objects are that 1) it is thought to follow a more natural human thinking process and 2) objects, if properly chosen, are the most stable perspective of the real-world problem space and can be more resilient to change, because the functions/services and data and commands/messages are isolated and hidden from the overall system. For example, while over the course of the development life cycle the number, as well as the types, of functions (e.g., turn camera 1 on, download sensor data, ignite starter, fire engine 3) may change, the basic objects (e.g., cameras, sensors, starter, engines, operator) needed to create a system usually are constant. That is, while there may now be three cameras instead of two, the new Camera-3 is just an instance of the basic object ‘camera’. Or, while an infrared camera may now be the type needed, there is still a ‘camera’; the power, warm-up time, and data storage may differ, but all of that is kept isolated (hidden) from the rest of the system.

OOA incorporates the principles of abstraction, information hiding, and inheritance, together with a method of organizing the problem space by using the three most “human” means of classification. These combined principles, if properly applied, establish a more modular, bounded, stable, and understandable software system. These aspects of OOA should make a system created under this method more robust and less susceptible to changes, properties which help create a safer software system design.

Abstraction refers to concentrating on only certain aspects of a complex problem, system, idea, or situation in order to better comprehend that portion. The analyst focuses on the similar characteristics of the system objects that are most important to them. Then, at a later time, the analyst can address other objects and their desired attributes, or examine the details of an object and deal with each in more depth.
Data abstraction is used by OOA to create the primary organization for thinking and specification, in that the objects are first selected from a certain perspective and then each object is defined in detail. An object is defined by the attributes it has and the functions it performs on those attributes. An abstraction can be viewed, as per Shaw, as “a simplified description, or specification, of a system that emphasizes some of the system’s details or properties while suppressing others. A good abstraction is one that emphasizes details that are significant to the reader or user and suppresses details that are, at least for the moment, immaterial or diversionary”.

Information hiding also helps manage complexity in that it allows encapsulation of requirements that might be subject to change. In addition, it helps to isolate the rest of the system from some object-specific design decisions. Thus, the rest of the software system sees only what is absolutely necessary of the inner workings of any object.

Inheritance “defines a relationship among classes [objects], wherein one class shares the structure or behavior defined in one or more classes... Inheritance thus represents a hierarchy of abstractions, in which a subclass [object] inherits from one or more superclasses [ancestor objects]. Typically, a subclass augments or redefines the existing structure and behavior of its superclasses”.

Classification theory states that humans normally organize their thinking by:
- looking at an object and comparing its attributes to those experienced before (e.g., looking at a cat, humans tend to think of its size, color, temperament, etc. in relation to past experience with cats);
- distinguishing between an entire object and its component parts (e.g., a rose bush versus its roots, flowers, leaves, thorns, stems, etc.);
- classifying objects into distinct and separate groups (e.g., trees, grass, cows, cats, politicians).
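The camera example and the principles of information hiding and inheritance described above can be sketched in Python. This is only an illustration under stated assumptions: the class names, method names, and timing values are hypothetical and do not come from the handbook, and Python's leading-underscore convention stands in for "hidden" details.

```python
class Camera:
    """Basic 'camera' object: attributes plus the functions that act on them."""

    def __init__(self, name, warmup_s=0.0):
        self.name = name
        self._warmup_s = warmup_s  # hidden detail (underscore = internal by convention)
        self._on = False

    def turn_on(self):
        """Public service used by the rest of the system; returns the delay in seconds."""
        self._on = True
        return self._startup_delay()

    def _startup_delay(self):
        return self._warmup_s

    def is_on(self):
        return self._on


class InfraredCamera(Camera):
    """Specialized camera: inherits structure and behavior, redefines the start-up delay."""

    def __init__(self, name, warmup_s=30.0, settle_s=5.0):
        super().__init__(name, warmup_s)
        self._settle_s = settle_s  # HYPOTHETICAL extra settling time for the sensor

    def _startup_delay(self):
        return self._warmup_s + self._settle_s


def power_up(cameras):
    """Rest of the system: depends only on turn_on(), not on any camera internals."""
    return {cam.name: cam.turn_on() for cam in cameras}


# Adding a third camera, or swapping in an infrared one, leaves power_up() unchanged.
cameras = [Camera("Camera-1"), Camera("Camera-2"), InfraredCamera("Camera-3")]
delays = power_up(cameras)
print(delays)
```

The point of the design is that `power_up` never changes when the number or type of cameras changes, which is the kind of stability the text attributes to well-chosen objects.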
In OOA, the first step of organization is to take the problem space and render it into objects and their attributes (abstraction). The second step of organization is into Assembly Structures, where an object and its parts are considered. The third form of organization of the problem space is into Classification Structures, during which the problem space is examined for generalized and specialized instances of objects (inheritance). For example, in a railway system the objects could be engines (provide power to pull cars), cars (provide storage for cargo), tracks (provide a pathway for trains to follow/ride on), switches (provide direction changing), stations (places to exchange cargo), etc. Then you would look at the Assembly Structure of cars and determine what was important about their component parts: their wheels, floor construction, coupling mechanism, siding, etc. Finally, the Classification Structure of cars could be into cattle, passenger, grain, refrigerated, and volatile liquid cars.

The purpose of all this classification is to provide modularity, which partitions the system into well-defined boundaries that can be individually and independently understood, designed, and revised. However, despite “classification theory”, choosing what objects represent a system is not always that straightforward. In addition, each analyst or designer will have their own abstraction, or view of the system, which must be resolved. OO does provide a structured approach to software system design and can be very useful in helping to bring about a safer, more reliable system.

D.3 Formal Methods - Specification Development

“Formal Methods (FM) consists of a set of techniques and tools based on mathematical modeling and formal logic that are used to specify and verify requirements and designs for computer systems and software.” While Formal Methods (FM) are not widely used in US industry, FM has gained some acceptance in Europe.
A considerable learning curve must be surmounted by newcomers, which can be expensive. Once this hurdle is surmounted successfully, some users find that FM can reduce overall development life-cycle cost by eliminating many costly defects prior to coding.

WHY ARE FORMAL METHODS NECESSARY?

A digital system may fail as a result of either physical component failure or design errors. The validation of an ultra-reliable system must deal with both of these potential sources of error. Well-known techniques exist for handling physical component failure; these techniques use redundancy and voting. The reliability assessment problem in the presence of physical faults is based upon Markov modeling techniques and is well understood. The design error problem is a much greater threat. Unfortunately, no scientifically justifiable defense against this threat is currently used in practice. Three basic strategies are advocated for dealing with design errors:
1. Testing (lots of it)
2. Design Diversity (i.e., software fault-tolerance: N-version programming, recovery blocks, etc.)
3. Fault/Failure Avoidance (i.e., formal specification/verification, automatic program synthesis, reusable modules)

The problem with life testing is that in order to measure ultra-reliability one must test for exorbitant amounts of time. For example, to measure a 10^-9 probability of failure for a 1-hour mission, one must test for more than 114,000 years.

Many advocate design diversity as a means to overcome the limitations of testing. The basic idea is to use separate design/implementation teams to produce multiple versions from the same specification. Then, non-exact threshold voters are used to mask the effect of a design error in one of the versions. The hope is that the design flaws will manifest errors independently or nearly so.
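The life-testing figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes, as a rough rule of thumb not spelled out in the text, that observing a failure probability p per mission takes on the order of 1/p missions run back to back on a single system; the function name `life_test_years` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def life_test_years(failure_prob_per_mission, mission_hours=1.0):
    """Years of continuous testing needed to run about 1/p missions.

    ASSUMPTION: a single system is tested serially; demonstrating the
    figure with statistical confidence would take longer still.
    """
    missions = 1.0 / failure_prob_per_mission
    return missions * mission_hours / HOURS_PER_YEAR


years = life_test_years(1e-9)
print(f"{years:,.0f} years")
```

With p = 10^-9 and 1-hour missions this gives roughly 10^9 hours, which is consistent with the "more than 114,000 years" stated in the text.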
By assuming independence one can obtain ultra-reliable-level estimates of reliability even though the individual versions have failure rates on the order of 10^-4. Unfortunately, the independence assumption has been rejected at the 99% confidence level in several experiments for low reliability software. Furthermore, the independence assumption cannot ever be validated for high reliability software because of the exorbitant test times required. If one cannot assume independence then one must measure correlations. This is infeasible as well: it requires as much testing time as life-testing the system, because the correlations must be in the ultra-reliable region in order for the system to be ultra-reliable. Therefore, it is not possible, within feasible amounts of testing time, to establish that design diversity achieves ultrareliability. Consequently, design diversity can create an illusion of ultra-reliability without actually providing it.

It is felt that formal methods currently offer the only intellectually defensible method for handling the design fault problem. Because the often quoted 1 - 10^-9 reliability is well beyond the range of quantification, there is no choice but to develop life-critical systems in the most rigorous manner available to us, which is the use of formal methods.

WHAT ARE FORMAL METHODS?

Traditional engineering disciplines rely heavily on mathematical models and calculation to make judgments about designs. For example, aeronautical engineers make extensive use of computational fluid dynamics (CFD) to calculate and predict how particular airframe designs will behave in flight. We use the term formal methods to refer to the variety of mathematical modeling techniques that are applicable to computer system (software and hardware) design. That is, formal methods is the applied mathematics of computer system engineering and, when properly applied, can serve a role in computer system design analogous to the role CFD serves in aeronautical design.
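The arithmetic behind the life-testing and design-diversity arguments above can be checked with a short sketch. It is illustrative only: the function names are hypothetical, the voter is assumed to be a simple majority voter, and the second function is valid only under the independence assumption that the text says has been rejected experimentally.

```python
from math import comb

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def life_test_years(failure_probability, mission_hours=1.0):
    """Test time, in years, over which about one failure would be expected."""
    return mission_hours / failure_probability / HOURS_PER_YEAR


def majority_failure_probability(p, n_versions):
    """Probability that a majority of n versions fail together.

    ASSUMPTION: version failures are statistically independent, which is
    exactly the assumption the handbook text warns cannot be validated.
    """
    needed = n_versions // 2 + 1
    return sum(
        comb(n_versions, k) * p**k * (1 - p) ** (n_versions - k)
        for k in range(needed, n_versions + 1)
    )


if __name__ == "__main__":
    years = life_test_years(1e-9)
    print(f"Life test for 1e-9 per 1-hour mission: {years:,.0f} years")
    for n in (3, 5):
        estimate = majority_failure_probability(1e-4, n)
        print(f"{n} versions at 1e-4 each, if independent: {estimate:.2e}")
```

Under these assumptions the first figure comes out a little above 114,000 years, matching the text, and the second shows how quickly a paper estimate drops once independence is assumed. The estimate says nothing about whether the assumption holds.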
Formal methods may be used to specify and model the behavior of a system and to mathematically verify that the system design and implementation satisfy system functional and safety properties. These specifications, models, and verifications may be done using a variety of techniques and with various degrees of rigor. The following is an imperfect, but useful, taxonomy of the degrees of rigor in formal methods:

Level-1: Formal specification of all or part of the system.
Level-2: Formal specification at two or more levels of abstraction and paper and pencil proofs that the detailed specification implies the more abstract specification.
Level-3: Formal proofs checked by a mechanical theorem prover.

Level 1 represents the use of mathematical logic or a specification language that has a formal semantics to specify the system. This can be done at several levels of abstraction. For example, one level might enumerate the required abstract properties of the system, while another level describes an implementation that is algorithmic in style. Level 2 formal methods goes beyond Level 1 by developing pencil-and-paper proofs that the more concrete levels logically imply the more abstract, property-oriented levels. This is usually done in the manner illustrated below. Level 3 is the most rigorous application of formal methods. Here one uses a semi-automatic theorem prover to make sure that all of the proofs are valid. The Level 3 process of convincing a mechanical prover is really a process of developing an argument for an ultimate skeptic who must be shown every detail.

Formal methods is not an all-or-nothing approach. The application of formal methods to only the most critical portions of a system is a pragmatic and useful strategy.
Although a complete formal verification of a large complex system is impractical at this time, a great increase in confidence in the system can be obtained by the use of formal methods at key locations in the system.

D.3.1 Formal Inspections of Specifications

Formal inspections and formal analysis are different. Formal Inspections should be performed within every major step of the software development process. Formal Inspections, while valuable within each design phase or cycle, have the most impact when applied early in the life of a project, especially the requirements specification and definition stages of a project. Studies have shown that the majority of all faults/failures, including those that impinge on safety, come from missing or misunderstood requirements. Formal Inspection greatly improves the communication within a project and enhances understanding of the system while scrubbing out many of the major errors/defects.

For the Formal Inspections of software requirements, the inspection team should include representatives from Systems Engineering, Operations, Software Design and Code, Software Product Assurance, Safety, and any other system function that software will control or monitor. It is very important that software safety be involved in the Formal Inspections. It is also very helpful to have inspection checklists for each phase of development that reflect both generic and project specific criteria. The requirements discussed in this section and in Robyn R. Lutz's paper "Targeting Safety-Related Errors During Software Requirements Analysis" will greatly aid in establishing this checklist. Also, the checklists provided in the NASA Software Formal Inspections Guidebook are helpful.

D.3.2 Timing, Throughput And Sizing Analysis

Timing and sizing analysis for safety critical functions evaluates software requirements that relate to execution time and memory allocation. Timing and sizing analysis focuses on program constraints.
Typical constraint requirements are maximum execution time and maximum memory usage. The safety organization should evaluate the adequacy and feasibility of safety critical timing and sizing requirements. These analyses also evaluate whether adequate resources have been allocated in each case, under worst case scenarios. For example, will I/O channels be overloaded by many error messages, preventing safety critical features from operating? Quantifying timing/sizing resource requirements can be very difficult. Estimates can be based on the actual parameters of similar existing systems. Items to consider include:

· memory usage versus availability;
· I/O channel usage (load) versus capacity and availability;
· execution times versus CPU load and availability;
· sampling rates versus rates of change of physical parameters.

In many cases it is difficult to predict the amount of computing resources required. Hence, making use of past experience is important.

D.3.3 Memory usage versus availability

Assessing memory usage can be based on previous experience of software development if there is sufficient confidence. More detailed estimates should evaluate the size of the code to be stored in the memory, and the additional space required for storing data and scratchpad space for storing interim and final results of computations. Memory estimates in early program phases can be inaccurate, and the estimates should be updated and based on prototype codes and simulations before they become realistic.

Dynamic Memory Allocation can be viewed as either a practical memory run time solution or as a nightmare for assuring proper timing and usage of critical data. Any suggestion of Dynamic Memory Allocation, common in OOD, C++ environments, should be examined very carefully, even in “noncritical” functional modules.
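The "usage versus availability" comparisons listed above can be recorded as a simple worst-case budget check. The following is a minimal sketch, not a method taken from the handbook: the `ResourceBudget` class, the 20% required margin, and the example figures are all hypothetical and would be replaced by project-specific estimates.

```python
from dataclasses import dataclass


@dataclass
class ResourceBudget:
    """One 'usage versus availability' line item (hypothetical structure)."""

    name: str
    worst_case_usage: float  # estimated peak demand, in the resource's own units
    capacity: float  # what the platform can supply, same units
    required_margin: float = 0.2  # ASSUMED 20% reserve; set per project

    def utilization(self):
        return self.worst_case_usage / self.capacity

    def is_adequate(self):
        # Adequate only if worst-case demand still leaves the required reserve.
        return self.utilization() <= 1.0 - self.required_margin


def inadequate(budgets):
    """Names of the items whose worst-case usage eats into the reserve."""
    return [b.name for b in budgets if not b.is_adequate()]


if __name__ == "__main__":
    # Illustrative numbers only.
    budgets = [
        ResourceBudget("memory (KB)", worst_case_usage=700, capacity=1024),
        ResourceBudget("I/O channel (msgs/s)", worst_case_usage=95, capacity=100),
        ResourceBudget("CPU (ms per 100 ms frame)", worst_case_usage=60, capacity=100),
    ]
    for b in budgets:
        status = "ok" if b.is_adequate() else "REVIEW"
        print(f"{b.name}: {b.utilization():.0%} of capacity [{status}]")
```

Keeping the estimates in one table makes it easier to update them as prototype code and simulations replace the early guesses, which is the refinement the text recommends.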
D.3.3.1 I/O channel usage (Load) versus capacity and availability

Address I/O for science data collection, housekeeping and control. Evaluate resource conflicts between science data collection and safety critical data availability. During failure events, I/O channels can be overloaded by error messages, and these important messages can be lost or overwritten (e.g. the British “Piper Alpha” offshore oil platform disaster). Possible solutions include additional modules designed to capture, correlate and manage lower level error messages; alternatively, errors can be passed up through the calling routines until they reach a level which can handle the problem, thus only passing on critical faults or combinations of faults that may lead to a failure.

Execution times versus CPU load and availability

Investigate time variations of CPU load, determine circumstances of peak load and whether it is acceptable. Consider multi-tasking effects. Note that excessive multi-tasking can result in system instability leading to “crashes”.

D.3.3.2 Sampling rates versus rates of change of physical parameters

Analysis should address the validity of the system performance models used, together with simulation and test data, if available.
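One way to picture the "capture and manage lower level error messages" idea from the I/O channel discussion is a fixed-size fault log that sheds the least severe entries first when it overflows. This is a sketch under stated assumptions, not a design from the handbook: the `BoundedFaultLog` class, the integer severity scale (higher means more critical), and the eviction rule are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Message:
    severity: int  # ASSUMED scale: higher number = more critical
    seq: int  # arrival order, used to break ties
    text: str = field(compare=False)


class BoundedFaultLog:
    """Fixed-capacity log that keeps the most severe messages under overload."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._messages = []
        self._seq = count()
        self.dropped = 0

    def post(self, severity, text):
        """Store a message; return False if it had to be discarded."""
        msg = Message(severity, next(self._seq), text)
        if len(self._messages) < self.capacity:
            self._messages.append(msg)
            return True
        # Log is full: something must be lost. Pick the least severe retained
        # message (the oldest one among equals) as the candidate to evict.
        victim = min(self._messages)
        self.dropped += 1
        if msg.severity > victim.severity:
            self._messages.remove(victim)
            self._messages.append(msg)
            return True
        return False  # the new message is no more severe than anything kept

    def retained(self):
        """Texts of the retained messages, in arrival order."""
        return [m.text for m in sorted(self._messages, key=lambda m: m.seq)]


if __name__ == "__main__":
    log = BoundedFaultLog(capacity=2)
    log.post(1, "sensor 7 retry")
    log.post(1, "sensor 8 retry")
    log.post(3, "pressure relief valve stuck")
    print(log.retained(), "dropped:", log.dropped)
```

The `dropped` counter matters as much as the buffer: an overload that silently discards messages is the failure mode the text warns about, so the loss itself should be visible.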
Author: 帅哥    Time: 2008-12-21 21:12:13     Title: System Safety Principles

FAA System Safety Handbook, Appendix E: System Safety Principles December 30, 2000 E-1

Appendix E System Safety Principles

System Safety Principles

• System safety is a basic requirement of the total system.
• System safety must be planned
  - Integrated and comprehensive safety engineering effort
  - Interrelated, sequential, and continuing effort
  - Plan must influence facilities, equipment, procedures, and personnel
  - Applicable to all program phases
  - Covers transportation and logistics support
  - Covers storage, packaging, and handling
  - Covers Non-Development Items (NDI).
• MA provides management of system safety effort. Managerial and technical procedures to be used must be submitted for MA approval.
  - Resolves conflicts between safety and other design requirements
  - Resolves conflicts between associate contractors.
• Design safety precedence:
  - Design to minimum hazard
  - Use safety devices
  - Use warning devices
  - Use special procedures.
• System Safety requirements must be consistent with other program requirements. Performance, cost, etc., requirements may have priority over safety requirements.
• System analyses are basic tools for systematically developing design specifications. The ultimate measure of safety is not the scope of analysis but satisfied requirements.
  - Analyses are performed to:
    § Identify hazards and corrective actions
    § Review safety considerations in tradeoffs
    § Determine/evaluate safety design requirements
    § Determine/evaluate operational, test, logistics requirements
    § Validate qualitative/quantitative requirements have been met.
  - Analyses are hazard analyses, not safety analyses.
• Level of risk assumption and criteria are an inherent part of risk management.
• Safety Management
  - Defines functions, authority, and interrelationships
  - Exercises appropriate controls.
• Degree of safety effort and achievements are directly dependent upon management emphasis by the FAA and contractors.
• Results of safety effort depend upon MA clearly stating safety objectives/requirements.
• MA responsibilities:
  - Plan, organize, and implement SSP
  - Establish safety requirements for system design
  - State safety requirements in contract
  - Requirements for activities in Statement of Work (SOW)
  - Review and ensure an adequate and complete system safety program plan (SSPP)
  - Supply historical data
  - Review contractor system safety effort/data
  - Ensure specifications are updated with test analyses results
  - Establish and operate system safety groups.
• Software hazard analyses are a flow-down requirements process followed by an upward-flow verification process.
• Four elements of an effective SSP:
  - Planned approach to accomplish tasks
  - Qualified people
  - Authority to implement tasks through all levels of management
  - Appropriate manning/funding.
Author: 帅哥    Time: 2008-12-21 21:12:47     Title: ORM Details and Examples

FAA System Safety Handbook, Appendix F December 30, 2000 F-1

Appendix F ORM Details and Examples

1.0 HAZARD IDENTIFICATION TOOLS, DETAILS AND EXAMPLES

Chapter 15 summarizes the Operational Risk Management methodology. This Appendix provides examples of those tools, as they are applied to the ORM process:

· Hazard Identification
· Risk Assessment
· Risk Control Option Analysis
· Risk Control Decisions
· Risk Control Implementation
· Supervision and Review

1.1 PRIMARY HAZARD IDENTIFICATION TOOLS

The seven tools described in this appendix are considered the basic set of hazard identification tools to be applied on a day-to-day basis in organizations at all levels. These tools have been chosen for the following reasons:

· They are simple to use, though they require some training.
· They have been proven effective.
· Widespread application has demonstrated they can and will be used by operators and will consistently be perceived as positive.
· As a group, they complement each other, blending the intuitive and experiential with the more structured and rigorous.
· They are well supported with worksheets and job aids.

In an organization with a mature ORM culture, the use of these tools by all personnel will be regarded as the natural course of events. The norm will be “Why would I even consider exposing myself and others to the risks of this activity before I have identified the hazards involved using the best procedures or designs available?” The following pages describe each tool using a standard format with models and examples.

1.1.1 THE OPERATIONS ANALYSIS AND FLOW DIAGRAM

FORMAL NAME: The Operations Analysis

ALTERNATIVE NAMES: The flow diagram, flow chart, operation timeline

PURPOSE: The Operations Analysis (OA) provides an itemized sequence of events or a flow diagram depicting the major events of an operation. This assures that all elements of the operation are evaluated as potential sources of risk.
This analysis overcomes a major weakness of traditional risk management, which tends to focus effort on one or two aspects of an operation that are intuitively identified as risky, often to the exclusion of other aspects that may actually be riskier. The Operations Analysis also guides the allocation of risk management resources over time as an operation unfolds event by event in a systematic manner.

APPLICATION: The Operations Analysis or flow diagram is used in nearly all risk management applications, including the most time-critical situations. It responds to the key risk management question “What am I facing here and from where can risk arise?”

METHOD: Whenever possible, the Operations Analysis is taken directly from the planning of the operation. It is difficult to imagine planning an operation without identifying the key events in a time sequence. If for some reason such a list is not available, the analyst creates it using the best available understanding of the operation. The best practice is to break down the operation into time-sequenced segments strongly related by tasks and activities. Normally, this is well above the detail of individual tasks. It may be appropriate to break down aspects of an operation that carry obviously higher risk into more detail than less risky areas. The product of an OA is a compilation of the major events of an operation in sequence, with or without time checks. An alternative to the Operations Analysis is the flow diagram. Commonly used symbols are provided at Figure 1.1.1A. Putting the steps of the process on index cards or sticky-back note paper allows the diagram to be rearranged without erasing and redrawing, thus encouraging contributions.
Figure 1.1.1A Example Flow Chart Symbols (symbol graphics not reproduced)

SYMBOL REPRESENTS       EXAMPLE
START                   RECEIVE TASKING; BEGIN TRIP; OPEN CHECKLIST
ACTIVITY                OPERATION PLANNING; START CAR; STEP ONE IN CHECKLIST
DECISION POINT (OR)     YES/NO; APPROVE/DISAPPROVE; PASS/FAIL
FORK / SPLIT (AND)      PREPOSITION VEHICLES AND SUPPLIES; RELEASE CLUTCH AND PRESS ACCELERATOR; OBSERVE FLIGHT CONTROLS WHILE MOVING STICK
END                     FINAL REPORT; ARRIVE AT DESTINATION; AIRCRAFT ACCEPTED

RESOURCES: The key resources for the Operations Analysis are the operational planners. Using their operational layout will facilitate the integration of risk controls in the main operational plan and will eliminate the expenditure of duplicate resources on this aspect of hazard identification.

COMMENTS: Look back on your own experience. How many times have you been surprised or seen others surprised because they overlooked possible sources of problems? The OA is the key to minimizing this source of accidents.

THE PLANNING PHASE

· Initial Intelligence Received (Maps, Facility Lists, Environment, Etc.)
· Advance Party Dispatched
· Advance Party Data Received
· Deployment Planning Underway
· Deployment Preparations Initiated
· Initial Operation Planning Underway
· Contingency Planning Underway

If more detail and a more structured examination of the operational flow are desired, the flow diagram can be used. This diagram will add information through the use of graphic symbols. A flow diagram of the planning phase above might be developed as illustrated in Figure 1.1.1B below.

Figure 1.1.1B Example Flow Diagram
[Flow diagram. Block labels: Start; Intelligence Tasks; Gather initial Intelligence; Dispatch advance team; Deployment planning; Initial planning; Contingency planning; Plans complete.]

The flow diagram can be used as an ORM planning tool. Indicate ORM actions in connection with each activity.
[Figure 1.1.1B annotations: Get ORM data; Protect the Team]

1.1.2 THE PRELIMINARY HAZARD ANALYSIS

FORMAL NAME: Preliminary Hazard Analysis

ALTERNATIVE NAMES: The PHA, the PHL

PURPOSE: The PHA provides an initial overview of the hazards present in the overall flow of the operation. It provides a hazard assessment that is broad, but usually not deep. The key idea of the PHA is to consider the risk inherent to every aspect of an operation. The PHA helps overcome the tendency to focus immediately on risk in one aspect of an operation, sometimes at the expense of overlooking more serious issues elsewhere in the operation. The PHA will often serve as the hazard identification process when risk is low or routine. In higher risk operations, it serves to focus and prioritize follow-on hazard analyses by displaying the full range of risk issues.

APPLICATION: The PHA is used in nearly all risk management applications except the most time-critical. Its broad scope is an excellent guide to the identification of issues that may require more detailed hazard identification tools.

METHOD: The PHA is usually based on the Operations Analysis or flow diagram, taking each event in turn from it. Analysts apply their experience and intuition, use reference publications and standards of various kinds, and consult with personnel who may have useful input. The extent of the effort is dictated by resource and time limitations, and by the estimate of the degree of overall risk inherent in the operation. Hazards that are detected are often listed directly on a copy of the Operations Analysis as shown at Figure 1.1.2A. Alternatively, a more formal PHA format such as the worksheet shown at Figure 1.1.2B can be used. The completed PHA is used to identify hazards requiring more in-depth hazard identification, or it may lead directly to the remaining five steps of the ORM process if hazard levels are judged to be low.
Key to the effectiveness of the PHA is assuring that all events of the operation are covered.

Figure 1.1.2A Building the PHA directly From the Operations Analysis Flow Diagram
Operational Phase: List the operational phases vertically down the page. Be sure to leave plenty of space on the worksheet between each phase to allow several hazards to be noted for each phase.
Hazards: List the hazards noted for each operational phase. Strive for detail within the limits imposed by time.

RESOURCES: The two key resources for the PHA are the expertise of personnel actually experienced in the operation and the body of regulations, standards, and instructions that may be available. The PHA can be accomplished in small groups to broaden the ... A copy of a PHA accomplished for an earlier similar operation would aid in the process.

COMMENTS: The PHA is relatively easy to use and takes little time. Its significant power to impact risk arises from the forced consideration of risk in all phases of an operation. This means that a key to success is to link the PHA closely to the Operations Analysis.

EXAMPLES: The following (Figure 1.1.2B) is an example of a PHA.

Figure 1.1.2B Example PHA

MOVING A HEAVY PIECE OF EQUIPMENT

The example below uses an operation analysis for moving a heavy piece of equipment as the start point and illustrates the process of building the PHA directly from the Operations Analysis.

Operation: Move a 3-ton machine from one building to another.
Start Point: The machine is in its original position in building A
End Point: The machine is in its new position in building B

ACTIVITY / EVENT: Raise the machine to permit positioning of the forklift
HAZARD:
· Machine overturns due to imbalance
· Machine overturns due to failure of lifting device
· Machine drops on person or equipment due to failure of lifting device or improper placement (person lifting device)
· Machine strikes overhead obstacle
· Machine is damaged by the lifting process

ACTIVITY / EVENT: Position the forklift
HAZARD:
· Forklift strikes the machine
· Forklift strikes other items in the area

ACTIVITY / EVENT: Lift the machine
HAZARD:
· Machine strikes overhead obstacle
· Lift fails due to mechanical failure (damage to machine, objects, or people)
· Machine overturns due to imbalance

ACTIVITY / EVENT: Move machine to the truck
HAZARD:
· Instability due to rough surface or weather condition
· Operator error causes load instability
· The load shifts

ACTIVITY / EVENT: Place machine on the truck
HAZARD:
· Improper tiedown produces instability
· Truck overloaded or improper load distribution

ACTIVITY / EVENT: Drive truck to building B
HAZARD:
· Vehicle accident during the move
· Poor driving technique produces instability
· Instability due to road condition

ACTIVITY / EVENT: Remove machine from the truck
HAZARD:
· Same factors as “Move it to the truck”

ACTIVITY / EVENT: Place machine in proper position in building B
HAZARD:
· Same factors as “Raise the machine” except focused on lowering the machine

1.1.3 THE "WHAT IF" TOOL

FORMAL NAME: The "What If" tool

ALTERNATIVE NAMES: None.

PURPOSE: The "What If" tool is one of the most powerful hazard identification tools. As in the case of the Scenario Process tool, it is designed to add structure to the intuitive and experiential expertise of operational personnel. The "What If" tool is especially effective in capturing hazard data about failure modes that may create hazards. It is somewhat more structured than the PHA. Because of its ease of use, it is probably the single most practical and effective tool for use by operational personnel.
APPLICATION: The "What If" tool should be used in most hazard identification applications, including many time-critical applications. A classic use of the "What If" tool is as the first tool used after the Operations Analysis and the PHA. For example, the PHA reveals an area of hazard that needs additional investigation. The best single tool to further investigate that area will be the "What If" tool. The user will zoom in on the particular area of concern, add detail to the OA in this area, and then use the "What If" procedure to identify the hazards.

METHOD: Ensure that participants have a thorough knowledge of the anticipated flow of the operation. Visualize the expected flow of events in time sequence from the beginning to the end of the operation. Select a segment of the operation on which to focus. Visualize the selected segment with "Murphy" injected. Make a conscious effort to visualize hazards. Ask, "What if various failures occurred or problems arose?" Add hazards and their causes to your hazard list and assess them based on probability and severity.

The "What If" analysis can be expanded to further explore the hazards in an operation by developing short scenarios that reflect the worst credible outcome from the compound effects of multiple hazards in the operation. Follow these guidelines in writing scenarios:

· Target length is 5 or 6 sentences, 60 words
· Don't dwell on grammatical details
· Include elements of Mission, Man, Machine, Management, and Media
· Start with history
· Encourage imagination and intuition
· Carry the scenario to the worst credible outcome
· Use a single person or group to edit

RESOURCES: A key resource for the "What If" tool is the Operations Analysis. It may be desirable to add detail to it in the area to be targeted by the "What If" analysis. However, in most cases an OA can be used as-is, if it is available. The "What If" tool is specifically designed to be used by personnel actually involved in an operation.
Therefore, the most critical "What If" resource is the involvement of operators and their first-line supervisors. Because of the tool's effectiveness, dynamic character, and ease of application, these personnel are generally quite willing to support the "What If" process.

COMMENTS: The "What If" tool is so effective that the Occupational Safety and Health Administration (OSHA) has designated it as one of six tools from among which activities facing catastrophic risk situations must choose under the mandatory hazard analysis provisions of the process safety standard.

EXAMPLES: Following (Figure 1.1.3A) is an extract from the typical output from the "What If" tool.

Figure 1.1.3A Example What If Analysis

Situation: Picture a group of 3 operational employees informally applying the round robin procedure for the "What If" tool to a task to move a multi-ton machine from one location to another. A part of the discussion might go as follows:

Joe: What if the machine tips over and falls, breaking the electrical wires that run within the walls behind it?
Bill: What if it strikes the welding manifolds located on the wall on the West Side? (This illustrates “piggybacking” as Bill produces a variation of the hazard initially presented by Joe).
Mary: What if the floor fails due to the concentration of weight on the base of the lifting device?
Joe: What if the point on the machine used to lift it is damaged by the lift?
Bill: What if there are electrical, air pressure hoses, or other attachments to the machine that are not properly neutralized?
Mary: What if the lock out/tag out is not properly applied to energy sources servicing the machine?
And so on....
Note: The list above might, for example, be broken down as follows:

Group 1: Machine falling hazards
Group 2: Weight induced failures
Group 3: Machine disconnect and preparation hazards

These related groups of hazards are then subjected to the remaining five steps of the ORM process.

1.1.4 THE SCENARIO PROCESS TOOL

FORMAL NAME: The Scenario Process tool

ALTERNATIVE NAMES: The mental movie tool.

PURPOSE: The Scenario Process tool is a time-tested procedure to identify hazards by visualizing them. It is designed to capture, in a structured manner, the intuitive and experiential expertise of personnel involved in planning or executing an operation. It is especially useful in connecting individual hazards into situations that might actually occur. It is also used to visualize the worst credible outcome of one or more related hazards, and is therefore an important contributor to the risk assessment process.

APPLICATION: The Scenario Process tool should be used in most hazard identification applications, including some time-critical applications. In the time-critical mode, it is indeed one of the few practical tools, in that the user can quickly form a “mental movie” of the flow of events immediately ahead and the associated hazards.

METHOD: The user of the Scenario Process tool attempts to visualize the flow of events in an operation. This is often described as constructing a “mental movie”. It is often effective to close the eyes, relax and let the images flow. Usually the best procedure is to use the flow of events established in the OA. An effective method is to visualize the flow of events twice. The first time, see the events as they are intended to flow. The next time, inject “Murphy” at every possible turn. As hazards are visualized, they are recorded for further action. Some good guidelines for the development of scenarios are as follows: Limit them to 60 words or less.
Don’t get tied up in grammatical excellence (in fact they don’t have to be recorded at all). Use historical experience but avoid embarrassing anyone. Encourage imagination (this helps identify risks that have not been previously encountered). Carry scenarios to the worst credible event.

RESOURCES: The key resource for the Scenario Process tool is the Operations Analysis. It provides the script for the flow of events that will be visualized. Using the tool does not require a specialist. Operational personnel leading or actually performing the task being assessed are key resources for the OA. Using this tool is often entertaining and dynamic, and it often motivates even the most junior personnel in the organization.

COMMENTS: A special value of the Scenario Process tool is its ability to link two or more individual hazards developed using other tools into an operation-relevant scenario.

EXAMPLES: Following is an example (Figure 1.1.4A) of how the Scenario Process tool might be used in an operational situation.

Figure 1.1.4A Example Machine Movement Scenario

FROM MACHINE MOVEMENT EXAMPLE: As the machine was being jacked up to permit placement of the forklift, the fitting that was the lift point on the machine broke. The machine tilted in that direction and fell over, striking the nearby wall. This in turn broke a fuel gas line in the wall. The gas was turned off as a precaution, but the blow to the metal line caused the valve to which it was attached to break, releasing gas into the atmosphere. The gas quickly reached the motor of a nearby fan (not explosion proof) and a small explosion followed. Several personnel were badly burned and that entire section of the shop was badly damaged. The shop was out of action for 3 weeks.

1.1.5 THE LOGIC DIAGRAM

FORMAL NAME: The Logic Diagram

ALTERNATIVE NAMES: The Logic Tree

PURPOSE: The Logic Diagram is intended to provide considerable structure and detail as a primary hazard identification procedure. Its graphic structure is an excellent means of capturing and correlating the hazard data produced by the other primary tools. Because of its graphic display, it can also be an effective hazard-briefing tool. The more structured and logical nature of the Logic Diagram adds substantial depth to the hazard identification process to complement the other more intuitive and experiential tools. Finally, an important purpose of the Logic Diagram is to establish the connectivity and linkages that often exist between hazards. It does this very effectively through its tree-like structure.

APPLICATION: Because it is more structured, the Logic Diagram requires considerable time and effort to accomplish. Following the principles of ORM, its use will be more limited than the other primary tools. This means limiting its use to higher risk issues. By its nature it is also most effective with more complicated operations in which several hazards may be interlinked in various ways. Because it is more complicated than the other primary tools, it requires more practice, and may not appeal to all operational personnel. However, in an organizational climate committed to ORM excellence, the Logic Diagram will be a welcomed and often used addition to the hazard identification toolbox.

METHOD: There are three types of Logic Diagrams. These are the:

Positive diagram. This variation is designed to highlight the factors that must be in place if risk is to be effectively controlled in the operation. It works from a safe outcome back to the factors that must be in place to produce it.

Event diagram. This variation focuses on an individual operational event (often a failure or hazard identified using the "What If" tool) and examines the possible consequences of the event. It works from an event that may produce risk and shows what the loss outcomes of the event may be.

Negative diagram.
This variation selects a loss event and then analyzes the various hazards that could combine to produce that loss. It works from an actual or possible loss and identifies what factors could produce it.

All of the various Logic Diagram options can be applied either to an actual operating system or to one being planned. Of course, the best time for application is in the planning stages of the operational lifecycle.

All of the Logic Diagram options begin with a top block. In the case of the positive diagram, this is a desired outcome; in the case of the event diagram, this is an operations event or contingency possibility; in the case of the negative diagram, it is a loss event. When working with a positive or negative diagram, the user then reasons out the factors that could produce the top event. These are entered on the next line of blocks. With the event diagram, the user lists the possible results of the event being analyzed. The conditions that could produce the factors on the second line are then considered, and they are entered on the third line.

The goal is to be as logical as possible when constructing Logic Diagrams, but it is more important to keep the hazard identification goal in mind than to construct a masterpiece of logical thinking. Therefore, a Logic Diagram should be a worksheet with lots of changes and variations marked on it. With the addition of a chalkboard or flip chart, it becomes an excellent group tool.

Figure 1.1.5A below is a generic diagram, and it is followed by a simplified example of each of the types of Logic Diagrams (Figures 1.1.5B, 1.1.5C, 1.1.5D).

Figure 1.1.5A Generic Logic Diagram
[Tree diagram: an EVENT block linked to PRIMARY CAUSE blocks, which link to SUPPORTING CAUSE blocks, which link to ROOT CAUSE blocks.]

Figure 1.1.5B Positive Event Logic Diagram
ETC.
ETC.
TIEDOWN PROPERLY ACCOMPLISHED
CLEAR PROCEDURES
GOOD MOTIVATION
GOOD TRAINING
CONTAINER STAYS ON VEHICLE

Figure 1.1.5C Risk Event Diagram
FORKLIFT PROCEDURES VIOLATED-EXCEEDED LIFT CAPACITY
ETC.
LIFT MECHANISM FAILS, LIFT FAILS
ETC.
LOAD BOUNCES TO THE GROUND
CONTAINER RUPTURES, CHEMICAL AGENT LEAKS

Figure 1.1.5D Negative Event Logic Diagram
CONTAINER FALLS OFF VEHICLE & RUPTURES
ETC.
FAILURE OF TIEDOWN GEAR
ETC.
FAILURE TO INSPECT & TEST TIEDOWNS IAW PROCEDURES
VARIOUS ROOT CAUSES
ETC.

RESOURCES: All of the other primary tools are key resources for the Logic Diagram, as it can correlate hazards that they generate. If available, a safety professional may be an effective facilitator for the Logic Diagram process.

COMMENTS: The Logic Diagram is the most comprehensive tool available among the primary procedures. Compared to other approaches to hazard identification, it will substantially increase the quantity and quality of hazards identified.

EXAMPLE: Figure 1.1.5E illustrates how a negative diagram could be constructed for moving a heavy piece of equipment.

Figure 1.1.5E Example Negative Diagram
Machine fails when raised by the forklift
Machine strikes an overhead obstacle and tilts
The load shifts due to lift point or failure to secure
Improper operator technique (jerky, bad technique)
Load is too heavy for the forklift
Mechanical failure of the forklift
The machine breaks at the point of lift

Each of these items may be taken to a third level.
For example:

The Logic Diagram pulls together all sources of hazards and displays them in a graphic format that clarifies the risk issues.

1.1.6 THE CHANGE ANALYSIS

FORMAL NAME: The Change Analysis
ALTERNATIVE NAMES: None

PURPOSE: Change is an important source of risk in operational processes. Figure 1.1.6A illustrates this causal relationship.

Figure 1.1.6A Change Causation
Introduce Change -> System Impacted -> Stress is Created -> Risk Controls Overcome -> Risk Increases -> Losses Increase

Some changes are planned, but many others occur incrementally over time, without any conscious direction. The Change Analysis is intended to analyze the hazard implications of either planned or incremental changes. The Change Analysis helps to focus only on the changed aspects of the operation, thus eliminating the need to reanalyze the total operation just because a change has occurred in one area. The Change Analysis is also used to detect the occurrence of change. By periodically comparing current procedures with previous ones, unplanned changes are identified and clearly defined. Finally, Change Analysis is an important accident investigation tool. Because many incidents/accidents are due to the injection of change into systems, an important investigative objective is to identify these changes using the Change Analysis procedure.

APPLICATION: Change analysis should be routinely used in the following situations.

Whenever significant changes are planned in operations in which there is significant operational risk of any kind. An example is the decision to conduct a certain type of operation at night that has heretofore only been done in daylight.
Periodically in any important operation, to detect the occurrence of unplanned changes.
As an accident investigation tool.
As the only hazard identification tool required when an operational area has already been subjected to in-depth hazard analysis. The Change Analysis will reveal whether any elements exist in the current operations that were not considered in the previous in-depth analysis.

METHOD: The Change Analysis is best accomplished using a format such as the sample worksheet shown at Figure 1.1.6B. The factors in the column on the left side of this tool are intended as a comprehensive change checklist.

Figure 1.1.6B Sample Change Analysis Worksheet

Target: ________________________________ Date: ______________________

Columns: FACTORS | EVALUATED SITUATION | COMPARABLE SITUATION | DIFFERENCE | SIGNIFICANCE

Factors (rows):
WHAT: Objects; Energy; Defects; Protective Devices
WHERE: On the object; In the process; Place
WHEN: In time; In the process
WHO: Operator; Fellow worker; Supervisor; Others
TASK: Goal; Procedure; Quality
WORKING CONDITIONS: Environmental; Overtime; Schedule; Delays
TRIGGER EVENT
MANAGERIAL CONTROLS: Control Chain; Hazard Analysis; Monitoring; Risk Review

To use the worksheet: The user starts at the top of the column, compares the current situation to a previous situation, and identifies any change in any of the factors. When used in an accident investigation, the accident situation is compared to a previous baseline. The significance of detected changes can be evaluated intuitively, or the changes can be subjected to "What If", Logic Diagram, Scenario, or other specialized analyses.

RESOURCES: Experienced operational personnel are a key resource for the Change Analysis tool. Those who have long-term involvement in an operational process must help define the “comparable situation.” Another important resource is the documentation of process flows and task analyses. Large numbers of such analyses have been completed in recent years in connection with quality improvement and reengineering projects.
These materials are excellent definitions of the baseline against which change can be evaluated.

COMMENTS: In organizations with mature ORM processes, most, if not all, higher risk activities will have been subjected to thorough ORM applications and the resulting risk controls will have been incorporated into operational guidance. In these situations, the majority of day-to-day ORM activity will be the application of Change Analysis to determine if the operation has any unique aspects that have not been previously analyzed.

1.1.7 THE CAUSE AND EFFECT TOOL

FORMAL NAME: The Cause and Effect Tool
ALTERNATIVE NAMES: The cause and effect diagram, the fishbone tool, the Ishikawa Diagram

PURPOSE: The Cause and Effect Tool is a variation of the Logic Tree tool and is used in the same hazard identification role as the general Logic Diagram. The particular advantage of the Cause and Effect Tool is its origin in the quality management process. Because it is widely used there, thousands of personnel have already been trained in it and require little additional training to apply it to the problem of detecting risk.

APPLICATION: The Cause and Effect Tool will be effective in organizations that have had some success with the quality initiative. It should be used in the same manner as the Logic Diagram and can be applied in both a positive and negative variation.

METHOD: The Cause and Effect diagram is a Logic Diagram with a significant variation. It provides more structure than the Logic Diagram through the branches that give it one of its alternate names, the fishbone diagram. The user can tailor the basic “bones” based upon special characteristics of the operation being analyzed. Either a positive or negative outcome block is designated at the right side of the diagram. Using the structure of the diagram, the user completes the diagram by adding causal factors in either the “M” or “P” structure.
Using branches off the basic entries, additional hazards can be added. The Cause and Effect diagram should be used in a team setting whenever possible.

RESOURCES: There are many publications describing in great detail how to use cause and effect diagrams. [1]

COMMENTS:

EXAMPLES: An example of the Cause and Effect Tool in action is illustrated at Figure 1.1.7A.

[1] K. Ishikawa, Guide to Quality Control, Quality Resources, White Plains, New York, 12th Printing 1994.

Figure 1.1.7A Example of Cause and Effect

SITUATION: The supervisor of an aircraft maintenance operation has been receiving reports from Quality Assurance over the last six months regarding tools left in aircraft after maintenance. The supervisor has followed up, but each case has involved a different individual, and his spot checks seem to indicate good compliance with tool control procedures. He decides to use a cause and effect diagram to consider all the possible sources of the tool control problem. The supervisor develops the cause and effect diagram with the help of two or three of his best maintenance personnel in a group application.

NOTE: Tool control is one of the areas where 99% performance is not adequate. That would mean one in a hundred tools is misplaced. The standard must be that among the tens (or hundreds) of thousands of individual uses of tools over a year, not one is misplaced.
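The arithmetic behind the note can be sketched as follows. The annual figure of 100,000 tool uses is an assumed value chosen from the "tens (or hundreds) of thousands" range mentioned in the note, and the function name is hypothetical; neither comes from the handbook.

```python
def expected_misplaced(tool_uses: int, compliance_rate: float) -> float:
    """Expected number of misplaced tools at a given compliance rate."""
    return tool_uses * (1.0 - compliance_rate)


# Assumed annual volume, for illustration only.
annual_tool_uses = 100_000

for rate in (0.99, 0.999, 0.9999):
    misplaced = expected_misplaced(annual_tool_uses, rate)
    print(f"{rate:.2%} compliance -> about {misplaced:.0f} misplaced tools per year")
```

Even a compliance rate that sounds high leaves a large absolute number of misplaced tools at this volume, which is why the note sets the standard at zero.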
[Negative cause and effect diagram. Outcome block: Tool misplaced. Main bones: Methods; Human; Materials; Machinery. Causal factors shown:]
Motivation weak (reward, discipline)
OI incomplete (lacks detail)
Training weak (procedures, consequences)
Tool check procedures weak
Supervision weak (checks)
Management emphasis light
No tool boards, cutouts
Many small, hard to see tools
Many places to lose tools in aircraft

[Positive cause and effect diagram. Outcome block: Strong Motivation. Main bones: People; Procedures; Policies; Plant. Factors shown:]
Participate in development of new procedures
Collective & individual awards
Self & coworker observation
Detailed OI
Quick feedback on mistakes
Good matrices
Commitment to excellence
Strong sustained emphasis
Extensive use of toolboard cutouts

Using the positive diagram as a guide, the supervisor and working group apply all possible and practical options developed from it.

1.2 THE SPECIALTY HAZARD IDENTIFICATION TOOLS

The tools that follow are designed to augment the primary tools described in part 1.1. These tools have several advantages:

They can be used by nearly everyone in the organization, though some may require either training or professional facilitation.
Each tool provides a capability not fully realized in any of the primary tools.
They use the tools of the less formal safety program to support the ORM process.
They are well supported with forms, job aids, and models.
Their effectiveness has been proven.

In an organization with a mature ORM process, all personnel will be aware of the existence of these specialty tools and capable of recognizing the need for their application. While not everyone will be comfortable using every procedure, a number of people within the organization will have experience applying one or another of them.

1.2.1 THE HAZARD AND OPERABILITY TOOL

FORMAL NAME: The Hazard and Operability Tool
ALTERNATIVE NAMES: The HAZOP analysis

PURPOSE: The special role of the HAZOP is hazard analysis of completely new operations.
In these situations, traditional intuitive and experiential hazard identification procedures are especially weak. This lack of experience hobbles tools such as the "What If" and Scenario Process tools, which rely heavily on experienced operational personnel. The HAZOP deliberately maximizes structure and minimizes the need for experience to increase its usefulness in these situations.

APPLICATION: The HAZOP should be considered when a completely new process or procedure is going to be undertaken. The issue should be one where there is significant risk, because the HAZOP demands significant expenditure of effort and may not be cost effective if used against low risk issues. The HAZOP is also useful when an operator or leader senses that “something is wrong” but can’t identify it. The HAZOP will dig very deeply into the operation to identify what that “something” is.

METHOD: The HAZOP is the most highly structured of the hazard identification procedures. It uses a standard set of guide terms (Figure 1.2.1A) which are then linked in every possible way with a tailored set of process terms (for example “flow”). The process terms are developed directly from the actual process or from the Operations Analysis. The two words together, for example “no” (a guideword) and “flow” (a process term), describe a deviation. These deviations are then evaluated to see if a meaningful hazard is indicated. If so, the hazard is entered in the hazard inventory for further evaluation. Because of its rigid process, the HAZOP is especially suitable for one-person hazard identification efforts.

Figure 1.2.1A Standard HAZOP Guidewords
NO
MORE
LESS
REVERSE
LATE
EARLY

Note: This basic set of guidewords should be all that are needed for all applications. Nevertheless, when useful, specialized terms can be added to the list. In less complex applications only some of the terms may be needed.
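The pairing of guidewords with process terms can be sketched in a few lines. This is a minimal illustration, not part of the handbook: the guidewords come from Figure 1.2.1A, while the process terms ("flow", "pressure") and the function name are assumptions made for the example.

```python
from itertools import product

# Standard guidewords from Figure 1.2.1A.
GUIDEWORDS = ["NO", "MORE", "LESS", "REVERSE", "LATE", "EARLY"]


def hazop_deviations(process_terms, guidewords=GUIDEWORDS):
    """Pair every guideword with every process term to list candidate deviations."""
    return [f"{guideword} {term}" for term, guideword in product(process_terms, guidewords)]


# Hypothetical process terms; in practice these come from the actual
# process or from the Operations Analysis.
process_terms = ["flow", "pressure"]

deviations = hazop_deviations(process_terms)
for deviation in deviations:
    print(deviation)
```

The code only enumerates the candidate deviations. Deciding whether each one indicates a meaningful hazard, and entering it in the hazard inventory, remains a judgment made by the analyst.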
RESOURCES: There are few resources available to assist with HAZOP; none are really needed.

COMMENTS: The HAZOP is highly structured and often time-consuming. Nevertheless, in its special role, this tool works very effectively. OSHA selected it for inclusion in the set of six mandated procedures of the OSHA process safety standard.

1.2.2 THE MAPPING TOOL

FORMAL NAME: The Mapping Tool
ALTERNATIVE NAMES: Map analysis

PURPOSE: The map analysis is designed to use terrain maps and other system models and schematics to identify both things at risk and the sources of hazards. Properly applied, the tool will reveal the following:

Task elements at risk
The sources of risk
The extent of the risk (proximity)
Potential barriers between hazard sources and operational assets

APPLICATION: The Mapping Tool can be used in a variety of situations. The explosive quantity-distance criteria are a classic example of map analysis. The location of the flammable storage is plotted and then the distance to various vulnerable locations (inhabited buildings, highways, etc.) is determined. The same principles can be extended to any facility. We can use a diagram of a maintenance shop to note the location of hazards such as gases, pressure vessels, flammables, etc. Key assets can also be plotted. Then hazardous interactions are noted and the layout of the facility can be optimized in terms of risk reduction.

METHOD: The Mapping Tool requires some creativity to realize its full potential. The starting point is a map, facility layout, or equipment schematic. The locations of hazard sources are noted. The easiest way to detect these sources is to locate energy sources, since all hazards involve the unwanted release of energy. Figure 1.2.2A lists the kinds of energy to look for. Mark the locations of these sources on the map or diagram.
Then, keeping the operation in mind, locate the personnel, equipment, and facilities that the various potentially hazardous energy sources could impact. Note these potentially hazardous links and enter them in the hazard inventory for risk management.

Figure 1.2.2A Major Types of Energy
Electrical
Kinetic (moving mass, e.g. a vehicle, a machine part, a bullet)
Potential (not moving mass, e.g. a heavy object suspended overhead)
Chemical (e.g. explosives, corrosive materials)
Noise and Vibration
Thermal (heat)
Radiation (non-ionizing, e.g. microwave, and ionizing, e.g. nuclear radiation, x-rays)
Pressure (air, hydraulic, water)

RESOURCES: Maps can convey a great deal of information, but cannot replace the value of an on-site assessment. Similarly, when working with an equipment schematic or a facility layout, there is no substitute for an on-site inspection of the equipment or survey of the facility.

COMMENTS: The map analysis is valuable in itself, but it is also excellent input for many other tools such as the Interface Analysis, Energy Trace and Barrier Analysis, and Change Analysis.

EXAMPLE: The following example (Figure 1.2.2B) illustrates the use of a facility schematic that focuses on the energy sources there, as might be accomplished in support of an Energy Trace and Barrier Analysis.

SITUATION: A team has been assigned the task of renovating an older facility for use as a museum for historical aviation memorabilia. They evaluate the facility layout (schematic below). By evaluating the potential energy sources presented in this schematic, it is possible to identify hazards that may be created by the operations to be conducted.
Figure 1.2.2B Example Map Analysis
[Simplified Facility Diagram. FACILITY ENERGY SOURCES labeled on the diagram:]
Electrical throughout
Main electrical distribution
Area beneath suspended item
Area of paints & flammables storage
Pneumatic lines for old mail distribution
Areas with old Gas lines for Medical
Areas of former Medical

1.2.3 THE INTERFACE ANALYSIS

FORMAL NAME: The Interface Analysis
ALTERNATIVE NAMES: Interface Hazard Analysis

PURPOSE: The Interface Analysis is intended to uncover the hazardous linkages or interfaces between seemingly unrelated activities. For example, suppose we plan to build a new facility. What hazards may be created for other operations during construction and after the facility is operational? The Interface Analysis reveals these hazards by focusing on energy exchanges. By looking at these potential energy transfers between two different activities, we can often detect hazards that are difficult to detect in any other way.

APPLICATION: An Interface Analysis should be conducted any time a new activity is being introduced and there is any chance at all that unfavorable interaction could occur. A good cue to the need for an Interface Analysis is the use of either the Change Analysis (indicating the injection of something new) or the map analysis (with the possibility of interactions).

METHOD: The Interface Analysis is normally based on an outline such as the one illustrated at Figure 1.2.3A. The outline provides a list of potential energy types and guides the consideration of the potential interactions. A determination is made whether a particular type of energy is present and then whether there is potential for that form of energy to adversely affect other activities. As in all aspects of hazard identification, the creation of a good Operations Analysis is vital.
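The two-part determination described in the method (is the energy present, and could it adversely affect other activities) can be sketched as a simple screening pass. This is an illustrative sketch only: the energy type names follow the Interface Analysis worksheet, but the class, the function, and the yes/no answers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnergyCheck:
    energy_type: str
    present: bool            # Is this form of energy present in the new activity?
    can_affect_others: bool  # Could it adversely affect other activities?


def interface_hazards(checks):
    """Return the energy types that pass both determinations."""
    return [c.energy_type for c in checks if c.present and c.can_affect_others]


# Hypothetical answers for a new activity, for illustration only.
checks = [
    EnergyCheck("Kinetic", present=True, can_affect_others=True),
    EnergyCheck("Electromagnetic", present=True, can_affect_others=False),
    EnergyCheck("Radiation", present=False, can_affect_others=False),
    EnergyCheck("Chemical", present=True, can_affect_others=True),
]

print(interface_hazards(checks))
```

Energy types that pass both checks are candidates for the hazard inventory. The screening does not replace discussion among personnel from all of the involved activities.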
Figure 1.2.3A The Interface Analysis Worksheet
Energy Element: Kinetic (objects in motion); Electromagnetic (microwave, radio, laser); Radiation (radioactive, x-ray); Chemical; Other
Personnel Element: Personnel moving from one area to another
Equipment Element: Machines and material moving from one area to another
Supply/materiel Element: Intentional movement from one area to another; Unintentional movement from one area to another
Product Element: Movement of product from one area to another
Information Element: Flow of information from one area to another or interference (i.e. jamming)
Bio-material Element: Infectious materials (virus, bacteria, etc.); Wildlife; Odors

RESOURCES: Interface Analyses are best accomplished when personnel from all of the involved activities participate, so that hazards and interfaces in both directions can be effectively and knowledgeably addressed. A safety office representative can also be useful in advising on the types and characteristics of energy transfers that are possible.

COMMENTS: The lessons of the past indicate that we should give serious attention to use of the Interface Analysis. Nearly anyone who has been involved in operations for any length of time can relate stories of overlooked interfaces that have had serious adverse consequences.

EXAMPLES: An Interface Analysis using the general outline is shown below.

Figure 1.2.3B Example Interface Analysis

SITUATION: Construction of a heavy equipment maintenance facility is planned for the periphery of the complex at a major facility. This is a major complex costing over $2,000,000 and requiring about eight months to complete. The objective is to detect interface issues in both directions. Notice that the analysis reveals a variety of interface issues that need to be thought through carefully.
Energy Interface:
Movement of heavy construction equipment
Movement of heavy building supplies
Movement of heavy equipment for repair
Possible hazmat storage/use at the facility

Personnel Interface:
Movement of construction personnel (vehicle or pedestrian) through base area
Movement of repair facility personnel through base area
Possible movement of base personnel (vehicular or pedestrian) near or through the facility

Equipment Interface:
Movement of equipment as indicated above

Supply Interface:
Possible movement of hazmat through base area
Possible movement of fuels and gases
Supply flow for maintenance area through base area

Product Interface:
Movement of equipment for repair by tow truck or heavy equipment transport through the base area

Information Interface:
Damage to buried or overhead wires during construction or movement of equipment
Possible electromagnetic interference due to maintenance testing, arcing, etc.

Biomaterial Interface: None

1.2.4 THE ACCIDENT/INCIDENT ANALYSIS

FORMAL NAME: The Accident/Incident Analysis
ALTERNATIVE NAMES: The accident analysis

PURPOSE: Most organizations have accumulated extensive, detailed databases that are gold mines of risk data. The purpose of the analysis is to apply this data to the prevention of future accidents or incidents.

APPLICATION: Every organization should complete an operation incident analysis annually. The objective is to update the understanding of current trends and causal factors. The analysis should be completed for each organizational component that is likely to have unique factors.

METHOD: The analysis can be approached in many ways.
The process generally builds a database of the factors listed below, which serves as the basis for identifying the risk drivers. Typical factors to examine include the following:

Activity at the time of the accident
Distribution of incidents among personnel
Accident locations
Distribution of incidents by sub-unit
Patterns of unsafe acts or conditions

RESOURCES: The analysis relies upon a relatively complete and accurate database. The FAA's system safety office (ASY) may have the needed data. That office can also provide assistance in the analysis process. System Safety personnel may have already completed analyses of similar activities or may be able to suggest the most productive areas for initial analysis.

COMMENTS: The data in databases has been acquired the hard way, through the painful and costly mistakes of hundreds of individuals. By taking full advantage of this information, the analysis process can be more realistic, efficient, and thorough, and can thereby prevent the same accidents and incidents from occurring over and over again.

1.2.5 THE INTERVIEW TOOL

FORMAL NAME: The Interview Tool
ALTERNATIVE NAMES: None

PURPOSE: Often the most knowledgeable personnel in the area of risk are those who operate the system. They see the problems and often think about potential solutions. The purpose of the Interview Tool is to capture the experience of these personnel in ways that are efficient and positive for them. Properly implemented, the Interview Tool can be among the most valuable hazard identification tools.

APPLICATION: Every organization can use the Interview Tool in one form or another.

METHOD: The Interview Tool’s great strength is versatility. Figure 1.2.5A illustrates the many options available to collect interview data. Key to all of these is to create a situation in which interviewees feel free to honestly report what they know, without fear of any adverse consequences.
This means absolute confidentiality must be assured by not using names in connection with data.

Figure 1.2.5A Interview Tool Alternatives
Direct interviews with operational personnel
Supervisors interview their subordinates and report results
Questionnaire interviews are completed and returned
Group interview sessions (several personnel at one time)
Hazards reported formally
Coworkers interview each other

RESOURCES: It is possible to operate the interview process facility-wide with the data being supplied to individual units. Hazard interviews can also be integrated into other interview activities. For example, counseling sessions could include a hazard interview segment. In these ways, the expertise and resource demands of the Interview Tool can be minimized.

COMMENTS: The key source of risk is human error. Of all the hazard identification tools, the Interview Tool is potentially the most effective at capturing human error data.

EXAMPLES: Figure 1.2.5B illustrates several variations of the Interview Tool.

Figure 1.2.5B Example Exit Interview Format

Name (optional) _____________________________ Organization _____________________

1. Describe below incidents, near misses or close calls that you have experienced or seen since you have been in this organization. State the location and nature (i.e. what happened and why) of the incident. If you can’t think of an incident, then describe two hazards you have observed.

INCIDENT 1: Location: _____________________________________________________
What happened and why? ____________________________________________________
__________________________________________________________________________

INCIDENT 2: Location: _____________________________________________________
What happened and why? ____________________________________________________
__________________________________________________________________________

2.
What do you think other personnel can do to eliminate these problems?

Personnel: _________________________________________________________________
Incident 1 _________________________________________________________________
Incident 2 _________________________________________________________________

Supervisors: _______________________________________________________________
Incident 1 _________________________________________________________________
Incident 2 _________________________________________________________________

Top Leadership: ____________________________________________________________
Incident 1 _________________________________________________________________
Incident 2 _________________________________________________________________

1.2.6 THE INSPECTION TOOL

FORMAL NAME: The Inspection Tool
ALTERNATIVE NAMES: The survey tool

PURPOSE: Inspections have two primary purposes. (1) The detection of hazards. Inspections accomplish this through the direct observation of operations. The process is aided by the existence of detailed standards against which operations can be compared. The OSHA standards and various national standards organizations provide good examples. (2) To evaluate the degree of compliance with established risk controls. When inspections are targeted at management and safety management processes, they are usually called surveys. These surveys assess the effectiveness of management procedures by evaluating status against some survey criteria or standard. Inspections are also important as accountability tools and can be turned into important training opportunities.

APPLICATION: Inspections and surveys are used in the risk management process in much the same manner as in traditional safety programs.
Where the traditional approach may require that all facilities are inspected on the same frequency schedule, the ORM concept might dictate that high-risk activities be inspected ten or more times as frequently as lower risk operations, and that some of the lowest risk operations be inspected once every five years or so. The degree of risk drives the frequency and depth of the inspections and surveys.

METHOD: There are many methods of conducting inspections. From a risk management point of view the key is focusing upon what will be inspected. The first step in effective inspections is the selection of inspection criteria and the development of a checklist or protocol. This must be risk-based. Commercial protocols are available that contain criteria validated to be connected with safety excellence. Alternatively, excellent criteria can be developed using incident databases and the results of other hazard identification tools such as the Operations Analysis and Logic Diagrams. Some of these have been computerized to facilitate entry and processing of data. Once criteria are developed, a schedule is created and inspections are begun. The inspection itself must be as positive an experience as possible for the people whose activity is being inspected. Personnel performing inspections should be carefully trained, not only in the technical processes involved, but also in human relations. During inspections, the ORM concept encourages another departure from traditional inspection practices: recording all observations of key behaviors, safe as well as unsafe, rather than only the unsafe ones. This makes it possible to evaluate the trend in organization performance by calculating the percentage of unsafe (non-standard) versus safe (meet or exceed standard) observations. Once the observations are made, the data must be carefully entered in the overall hazard inventory database. Once in the database, the data can be analyzed as part of the overall body of data or as a mini-database composed of inspection findings only.
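The percentage calculation described in the method can be sketched as follows. The function name and the quarterly observation counts are hypothetical, chosen only to show how a trend becomes visible once safe and unsafe observations are both counted.

```python
def unsafe_rate(unsafe: int, total: int) -> float:
    """Percentage of observations that were unsafe (non-standard)."""
    if total <= 0:
        raise ValueError("total observations must be positive")
    if not 0 <= unsafe <= total:
        raise ValueError("unsafe count must be between 0 and total")
    return 100.0 * unsafe / total


# Hypothetical inspection results: (period, unsafe observations, total observations).
inspections = [("Q1", 12, 150), ("Q2", 9, 180), ("Q3", 5, 200)]

for period, unsafe, total in inspections:
    print(f"{period}: {unsafe_rate(unsafe, total):.1f}% unsafe ({unsafe} of {total})")
```

Because each rate is normalized by the number of observations in its period, a period with more inspection effort does not look worse simply because more was observed.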
RESOURCES: There are many inspection criteria, checklists and related job aids available commercially. Many have been tailored for specific types of organizations and activities. The System Safety Office can be a valuable resource in the development of criteria and can provide technical support in the form of interpretations, procedural guidance, and correlation of data.

COMMENTS: Inspections and surveys have long track records of success in detecting hazards and reducing risk. However, they have been criticized as being inconsistent with modern management practice because they are a form of “downstream” quality control. By the time a hazard is detected by an inspection, it may already have caused loss. The ORM approach to inspections emphasizes focus on the higher risks within the organization and emphasizes the use of management and safety program surveys that detect the underlying causes of hazards, rather than the hazards themselves.

EXAMPLES: Conventional inspections normally involve seeking and recording unsafe acts or conditions. The number of these may reflect either the number of unsafe acts or conditions occurring in the organization or the extent of the effort expended to find hazards. Thus, conventional inspections are not a reliable indicator of the extent of risk. To change the nature of the process, it is often only necessary to record the total number of observations made of key behaviors, then determine the number of unsafe behaviors. This yields a rate of “unsafeness” that is independent of the number of observations made.

1.2.7 THE JOB HAZARD ANALYSIS

FORMAL NAME: The Job Hazard Analysis
ALTERNATIVE NAMES: The task analysis, job safety analysis, JHA, JSA

PURPOSE: The purpose of the Job Hazard Analysis (JHA) is to examine in detail the safety considerations of a single job.
A variation of the JHA called a task analysis focuses on a single task, i.e., some smaller segment of a “job.”
APPLICATION: Some organizations have established the goal of completing a JHA on every job in the organization. If this can be accomplished cost effectively, it is worthwhile. Certainly, the higher risk jobs in an organization warrant application of the JHA procedure. Within the risk management approach, it is important that such a plan be accomplished by beginning with the most significant risk areas first.
METHOD: The JHA is best accomplished using an outline similar to the one illustrated at Figure 1.2.7A. As shown in the illustration, the job is broken down into its individual steps. Jobs that involve many quite different tasks should be handled by analyzing each major task separately. The illustration considers risks both to the workers involved and to the system, as well as risk controls for both. Tools such as the Scenario and "What If" tools can contribute to the identification of potential hazards. There are two alternative ways to accomplish the JHA process. A safety professional can complete the process by asking questions of the workers and supervisors involved. Alternatively, supervisors could be trained in the JHA process and directed to analyze the jobs they supervise.
Figure 1.2.7A Sample Job Hazard Analysis Format
Job Safety Analysis
Header fields: Job Title or Operation; Page __ of __; JSA Number; Job Series/AFSC; Supervisor; Organization Symbol; Location/Building Number; Shop Title; Reviewed By; Required and/or Recommended Personal Protective Equipment; Approved By
Columns: SEQUENCE OF BASIC JOB STEPS | POTENTIAL HAZARDS, UNSAFE ACTS OR CONDITIONS | RECOMMENDED ACTION OR PROCEDURE
RESOURCES: The System Safety Office has personnel trained in detail in the JHA process who can serve as consultants, and may have videos that walk a person through the process.
COMMENTS: The JHA is risk management.
The concept of completing in-depth hazard assessments of all jobs involving significant risk with the active participation of the personnel doing the work is an ideal model of ORM in action.

1.2.8 THE OPPORTUNITY ASSESSMENT
FORMAL NAME: The Opportunity Assessment
ALTERNATIVE NAMES: The opportunity-risk tool
PURPOSE: The Opportunity Assessment is intended to identify opportunities to expand the capabilities of the organization and/or to significantly reduce the operational cost of risk control procedures. Either of these possibilities means expanded capabilities.
APPLICATION: Organizations should systematically assess their capabilities on a regular basis, especially in critical areas. The Opportunity Assessment can be one of the most useful tools in this process and therefore should be completed on all important operations and then be periodically updated.
METHOD: The Opportunity Assessment involves five key steps as outlined at Figure 1.2.9A. In Step 1, operational areas that would benefit substantially from expanded capabilities are identified and prioritized. Additionally, areas where risk controls are consuming extensive resources or are otherwise constraining operational capabilities are listed and prioritized. Step 2 involves the analysis of the specific risk-related barriers that are limiting the desired expanded performance or causing the significant expense. This is a critical step. Only by identifying the risk issues precisely can focused effort be brought to bear to overcome them. Step 3 attacks the barriers by using the risk management process. This normally involves reassessment of the hazards, application of improved risk controls, improved implementation of existing controls, or a combination of these options. Step 4 is used when available risk management procedures don’t appear to offer any breakthrough possibilities.
In these cases the organization must seek out new ORM tools using benchmarking procedures or, if necessary, innovate new procedures. Step 5 involves the exploitation of any breakthroughs achieved by pushing the operational limits or cost saving until a new barrier is reached. The cycle then repeats and a process of continuous improvement begins.
Figure 1.2.9A Opportunity Analysis Steps
Step 1. Review key operations to identify opportunities for enhancement. Prioritize.
Step 2. In areas where opportunities exist, analyze for risk barriers.
Step 3. When barriers are found, apply the ORM process.
Step 4. When available ORM processes can’t break through, innovate!
Step 5. When a barrier is breached, push through until a new barrier is reached.
RESOURCES: The Opportunity Assessment depends upon a detailed understanding of operational processes so that barriers can be identified. An effective Opportunity Assessment will necessarily involve operations experts.

1.3 THE ADVANCED HAZARD IDENTIFICATION TOOLS
The five tools that follow are advanced hazard identification tools designed to support strategic hazard analysis of higher risk and critical operations. These advanced tools are often essential when in-depth hazard identification is needed. They provide the mechanism needed to push the limits of current hazard identification technology. For example, the Management Oversight and Risk Tree (MORT) represents the full-time efforts of dozens of experts over decades to fully develop an understanding of all of the sources of hazards. As might be expected, these tools are complex and require significant training to use. Full proficiency also requires experience in using them. They are best reserved for use by loss control professionals. Those with an engineering, scientific, or other technical background are certainly capable of using these tools with a little read-in.
Even though professionals use the tools, much of the data that must be fed into the procedures must come from operators. In an organization with a mature ORM culture, all personnel in the organization will be aware that higher risk justifies more extensive hazard identification. They will feel comfortable calling for help from loss control professionals, knowing that these individuals have the advanced tools needed to cope with the most serious situations. These advanced tools will play a key role in the mature ORM culture in helping the organization reach its hazard identification goal: No significant hazard undetected.

1.3.1 THE ENERGY TRACE AND BARRIER ANALYSIS
FORMAL NAME: The Energy Trace and Barrier Analysis
ALTERNATIVE NAMES: Abnormal energy exchange
PURPOSE: The Energy Trace and Barrier Analysis (ETBA) is a procedure intended to detect hazards by focusing in detail on the presence of energy in a system and the barriers for controlling that energy. It is conceptually similar to the Interface Analysis in its focus on energy forms, but is considerably more thorough and systematic.
APPLICATION: The ETBA is intended for use by system safety professionals and is targeted against higher risk operations, especially those involving large amounts of energy or a wide variety of energy types. The method is used extensively in the acquisition of new systems and other complex systems.
METHOD: The ETBA involves 5 basic steps as shown at Figure 1.3.1A. Step 1 is the identification of the types of energy found in the system. It often requires considerable expertise to detect the presence of the types of energy listed at Figure 1.3.1B. Step 2 is the trace step. Once identified as present, the point of origin of a particular type of energy must be determined and then the flow of that energy through the system must be traced. In Step 3 the barriers to the unwanted release of that energy must be analyzed.
For example, electrical energy is usually moved in wires with an insulated covering. In Step 4 the risk of barrier failure and the unwanted release of the energy are assessed. Finally, in Step 5, risk control options are considered and selected.
Figure 1.3.1A ETBA Steps
Step 1. Identify the types of energy present in the system
Step 2. Locate energy origin and trace the flow
Step 3. Identify and evaluate barriers (mechanisms to confine the energy)
Step 4. Determine the risk (the potential for hazardous energy to escape control and damage something significant)
Step 5. Develop improved controls and implement as appropriate
Figure 1.3.1B Types of Energy
Electrical
Kinetic (moving mass, e.g. a vehicle, a machine part, a bullet)
Potential (not moving mass, e.g. a heavy object suspended overhead)
Chemical (e.g. explosives, corrosive materials)
Noise and Vibration
Thermal (heat)
Radiation (non-ionizing, e.g. microwave, and ionizing, e.g. nuclear radiation, x-rays)
Pressure (air, hydraulic, water)
RESOURCES: This tool requires sophisticated understanding of the technical characteristics of systems and of the various energy types and barriers. Availability of a safety professional, especially a safety engineer or other professional engineer, is important.
COMMENTS: Most accidents involve the unwanted release of one kind of energy or another. This fact makes the ETBA a powerful hazard identification tool. When the risk stakes are high and the system is complex, the ETBA is a must-have.
EXAMPLES: A simplified example of the ETBA procedure is provided at Figure 1.3.1C.
Figure 1.3.1C Example ETBA
Scenario: The supervisor of a maintenance facility has just investigated a serious incident involving one of his personnel who received a serious shock while using a portable power drill in the maintenance area. The tool involved used a standard three-prong plug.
Investigation revealed that the tool and the receptacle were both functioning properly. The individual was shocked when he was holding the tool and made contact with a piece of metal electrical conduit (it one his drill was plugged into) that had become energized as a result of an internal fault. As a result the current flowed through the individual to the tool and through the grounded tool to ground, resulting in the severe shock. The supervisor decides to fully assess the control of electrical energy in this area.
Option 1. Three-prong tool. Electrical energy flows from the source through an insulated wire, to the tool, to a single insulated electric motor. In the event of an internal fault the flow is from the case of the tool through the ground wire to ground through the grounded third prong through a properly grounded receptacle. Hazards: Receptacle not properly grounded, third prong removed, person provides a path of lower resistance, break in any of the ground paths (case, cord, plug, and receptacle). These hazards are serious in terms of the frequency encountered in the work environment and might be expected to be present in 10% or more of cases.
Option 2. Double insulated tool. The tool is not grounded. Protection is provided by double insulating the complete flow of electrical energy at all points in the tool. In the event of an internal fault, there are two layers of insulation protection between the fault and the person, preventing shorting through the user. Hazards: If the double layers of insulation are damaged as a result of extended use, rough handling, or repair/maintenance activity, the double insulation barrier can be compromised. In the absence of a fully effective tool inspection and replacement program such damage is not an unusual situation.
Option 3. Ground Fault Circuit Interrupters. Either of the above types of tools is used (double insulated is preferred).
Electrical energy flows as described above in both the normal and fault situations. However, in the event of a fault (or any other cause of a potential differential in the circuit), the condition is detected almost instantly and the circuit is opened, preventing the flow of dangerous amounts of current. Because no dangerous amount of current can flow, the individual using the tool is in no danger of shock. Circuit interrupters fail no more often than about 1 time in 10,000, and when they do fail, most failure modes are fail-safe. Ground fault circuit interrupters are inexpensive to purchase and relatively easy to install. In this case, the best option is very likely to be the use of the circuit interrupter in connection with either Option 1 or 2, with Option 2 preferred. This combination for all practical purposes eliminates the possibility of electric shock and injury/death as a result of using portable power tools.

1.3.2 THE FAULT TREE ANALYSIS
FORMAL NAME: The Fault Tree Analysis
ALTERNATIVE NAMES: The logic tree
PURPOSE: The Fault Tree Analysis (FTA) is a hazard identification tool based on the negative type Logic Diagram. The FTA adds several dimensions to the basic logic tree. The most important of these additions are the use of symbols to add information to the trees and the possibility of adding quantitative risk data to the diagrams. With these additions, the FTA adds substantial hazard identification value to the basic Logic Diagram previously discussed.
APPLICATION: Because of its relative complexity and detail, it is normally not cost effective to use the FTA against risks assessed below the level of extremely high or high. The method is used extensively in the acquisition of new systems and other complex systems where, due to the complexity and criticality of the system, the tool is a must.
METHOD: The FTA is constructed exactly like a negative Logic Diagram except that the symbols depicted in Figure 1.3.2A are used.
Figure 1.3.2A Key Fault Tree Analysis Symbols
The output event. Identification of a particular event in the sequence of an operation.
A basic event. An event, usually a malfunction, for which further causes are not normally sought.
A normal event. An event in an operational sequence that is within expected performance standards.
An “AND” gate. Requires all of the below connected events to occur before the above connected event can occur.
An “OR” gate. Any one of the events can independently cause the event placed above the OR gate.
An undeveloped event. This is an event not developed because of lack of information or the event lacks significance.
Transfer symbols. These symbols transfer the user to another part of the diagram. These symbols are used to eliminate the need to repeat identical analyses that have been completed in connection with another part of the fault tree.
RESOURCES: The System Safety Office is the best source of information regarding Fault Tree Analysis. Like the other advanced tools, the FTA will involve the consultation of a safety professional or engineer trained in the use of the tool. If the probabilistic aspects are added, it will also require a database capable of supplying the detailed data needed.
COMMENTS: The FTA is one of the few hazard identification procedures that will support quantification when the necessary data resources are available.
EXAMPLE: A brief example of the FTA is provided at Figure 1.3.2B. It illustrates how an event may be traced to specific causes that can be very precisely identified at the lowest levels.
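The AND and OR gate definitions above can be sketched in Python as a small tree evaluator. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, and the arrangement of the storeroom events under one AND gate and two OR gates is an assumed reading of the storeroom fire example, not a reproduction of the handbook's diagram.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BasicEvent:
    """A basic event: further causes are not sought; it either occurred or not."""
    name: str
    occurred: bool


@dataclass
class Gate:
    """An AND or OR gate joining lower events to the event placed above it."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]]


def occurs(node: Union[Gate, BasicEvent]) -> bool:
    """Evaluate whether the event represented by this node occurs."""
    if isinstance(node, BasicEvent):
        return node.occurred
    results = [occurs(child) for child in node.inputs]
    if node.kind == "AND":
        return all(results)  # every connected lower event is required
    if node.kind == "OR":
        return any(results)  # any single lower event is sufficient
    raise ValueError(f"unknown gate kind: {node.kind!r}")


def storeroom_tree(stored: bool, leak: bool, spark: bool, thermal: bool) -> Gate:
    """Build an assumed, simplified version of the storeroom fire tree."""
    combustibles = Gate("Combustibles in storeroom", "OR", [
        BasicEvent("Combustibles stored in storeroom", stored),
        BasicEvent("Combustibles leak into storeroom", leak),
    ])
    ignition = Gate("Ignition source in storeroom", "OR", [
        BasicEvent("Electrical spark occurs", spark),
        BasicEvent("Direct thermal energy present", thermal),
    ])
    return Gate("Fire occurs in storeroom", "AND", [combustibles, ignition])


# Combustibles are present but there is no ignition source: the top event does not occur.
print(occurs(storeroom_tree(stored=True, leak=False, spark=False, thermal=False)))
```

A quantified tree would carry probabilities in place of booleans, which requires the failure data the RESOURCES paragraph mentions.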
Figure 1.3.2B Example of Fault Tree Analysis
[Fault tree diagram. Event blocks, in the order they appear: Fire Occurs in Storeroom; Combustibles Stored in Storeroom; Ignition Source in Storeroom; Stock Material Degrades to Combustible State; Electrical Spark Occurs; Direct Thermal Energy Present; Radiant Thermal Energy Raises; Combustibles Leak into Storeroom; Combustibles Stored in Storeroom; Airflow < Critical Value. Gates shown: And, Or, Or.]

1.3.3 THE FAILURE MODES AND EFFECTS ANALYSIS
FORMAL NAME: The Failure Modes and Effects Analysis
ALTERNATIVE NAMES: The FMEA
PURPOSE: The Failure Modes and Effects Analysis (FMEA) is designed to evaluate the impact due to the failure of various system components. A brief example of FMEA illustrating this purpose is the analysis of the impact of the failure of the communications component (radio, landline, computer, etc.) of a system on the overall operation. The focus of the FMEA is on how such a failure could occur (failure mode) and the impact of such a failure (effects).
APPLICATION: The FMEA is generally regarded as a reliability tool but most operational personnel can use the tool effectively. The FMEA can be thought of as a more detailed “What If” analysis. It is especially useful in contingency planning, where it is used to evaluate the impact of various possible failures (contingencies). The FMEA can be used in place of the "What If" analysis when greater detail is needed or it can be used to examine the impact of hazards developed using the "What If" tool in much greater detail.
METHOD: The FMEA uses a worksheet similar to the one illustrated at Figure 1.3.3A. As noted on the sample worksheet, a specific component of the system to be analyzed is identified. Several components can be analyzed, and each may fail in more than one way. For example, a rotating part might freeze up, explode, break up, slow down, or even reverse direction.
Each of these failure modes may have differing impacts on connected components and the overall system. The worksheet calls for an assessment of the probability of each identified failure mode.
Figure 1.3.3A Sample Failure Modes and Effects Analysis Worksheet
FAILURE MODES AND EFFECTS ANALYSIS Page ___ of ___ Pages
System _________________________ Date _______________
Subsystem _____________________ Analyst _____________
Columns: Component Description | Failure Mode | Effects on Other Components | Effects on System | RAC or Hazard Category | Failure Frequency | Effects Probability | Remarks
RESOURCES: The best source of more detailed information on the FMEA is the System Safety Office.
EXAMPLES: An example of the FMEA is provided at Figure 1.3.3B.
Figure 1.3.3B Example FMEA
Situation: The manager of a major facility is concerned about the possible impact of the failure of the landline communications system that provides the sole communications capability at the site. The decision is made to do a Failure Modes and Effects Analysis. An extract from the resulting FMEA is shown below.
Component | Function | Failure Mode & Cause | Failure Effect on Higher Item | Failure Effect on System | Probability | Corrective Action
Landline wire | Comm | Cut - natural cause, falling tree, etc. | Comm system down | Cease Fire | Probable | Clear natural obstacle from around wires
Wire | | Cut - unrelated operational activities | Comm system down | Cease Fire | Probable | Warn all operations of wire placement
Wire | | Line failure | Comm system down | Cease Fire | Probable | Placement of wires; proper grounding
Wire | | Cut - vandals & thieves | Comm system down | Cease Fire | Unlikely | Placement of wires; area security

1.3.4 THE MULTI-LINEAR EVENTS SEQUENCING TOOL
FORMAL NAME: The Multi-linear Events Sequencing Tool
ALTERNATIVE NAMES: The timeline tool, the sequential time event plot (STEP) 2
PURPOSE: The Multi-linear Events Sequencing Tool (MES) is a specialized hazard identification procedure designed to detect hazards arising from the time relationship of various operational activities. The MES detects situations in which either the absolute or relative timing of events may create risk. For example, an operational planner may have crammed too many events into a single period of time, creating a task overload problem for the personnel involved. Alternatively, the MES may reveal that two or more events in an operational plan conflict because a person or piece of equipment is required for both but obviously cannot be in two places at once. The MES can be used as a hazard identification tool or as an incident investigation tool.
APPLICATION: The MES is usually considered a loss prevention method, but the MES worksheet simplifies the process to the point that a motivated individual can effectively use it. The MES should be used any time that risk levels are significant and when timing and/or time relationships may be a source of risk. It is an essential tool when the time relationships are relatively complex.
METHOD: The MES uses a worksheet similar to the one illustrated at Figure 1.3.4A. The sample worksheet displays the timeline of the operation across the top and the “actors” (people or things) down the left side.
The flow of events is displayed on the worksheet, showing the relationship between the actors on a time basis. Once the operation is displayed on the worksheet, the sources of risk will be evident as the flow is examined.
2 K. Hendrick and L. Benner, Investigating Accidents with STEP, Marcel Dekker, New York, 1988.
Figure 1.3.4A Multi-linear Events Sequencing Form
Across the top: Timeline (time units in seconds or minutes as needed). Down the left side: Actors (people or things involved in the process).
RESOURCES: The best source for more detailed information on the MES is the System Safety staff. As with the other advanced tools, using the MES will normally involve consultation with a safety professional familiar with its application.
COMMENTS: The MES is unique in its role of examining the time-risk implications of operations.

1.3.5 THE MANAGEMENT OVERSIGHT AND RISK TREE
FORMAL NAME: The Management Oversight and Risk Tree
ALTERNATIVE NAMES: The MORT
PURPOSE: The Management Oversight and Risk Tree (MORT) uses a series of charts developed and perfected over several years by the Department of Energy in connection with their nuclear safety programs. Each chart identifies a potential operating or management level hazard that might be present in an operation. The attention to detail characteristic of MORT is illustrated by the fact that the full MORT diagram or tree contains more than 10,000 blocks. Even the simplest MORT chart contains over 300 blocks. The full application of MORT is a time-consuming and costly venture. The basic MORT chart with about 300 blocks can be routinely used as a check on the other hazard identification tools. By reviewing the major headings of the MORT chart, an analyst will often be reminded of a type of hazard that was overlooked in the initial analysis. The MORT diagram is also very effective in assuring attention to the underlying management root causes of hazards.
APPLICATION: Full application of MORT is reserved for the highest risks and most operation-critical activities because of the time and expense required. MORT generally requires a specially trained loss control professional to assure proper application.
METHOD: MORT is accomplished using the MORT diagrams, of which there are several levels available. The most comprehensive, with about 10,000 blocks, fills a book. There is an intermediate diagram with about 1500 blocks, and a basic diagram with about 300. It is possible to tailor a MORT diagram by choosing various branches of the tree and using only those segments. The MORT is essentially a negative tree, so the process begins by placing an undesired loss event at the top of the diagram used. The user then systematically responds to the issues posed by the diagram. All aspects of the diagram are considered and the “less than adequate” (LTA) blocks are highlighted for risk control action.
RESOURCES: The best source of information on MORT is the System Safety Office.
COMMENTS: The MORT diagram is an elaborate negative Logic Diagram. The difference is primarily that the MORT diagram is already filled out for the user, allowing a person to identify the contributory factors for a given undesirable event. Since the MORT is very detailed, as mentioned above, a person can identify basic causes for essentially any type of event.
EXAMPLES: The top blocks of the MORT diagram are displayed at Figure 1.3.5A.
Figure 1.3.5A Example MORT Section
Accidental Losses
  Oversights & Omissions
    Operational System Factors LTA
    Management System Factors LTA
  Assumed Risk

2.0 RISK ASSESSMENT TOOLS, DETAILS, AND EXAMPLES
Introduction. This section contains an example of assessing risk, using a risk assessment matrix (Figure 2.1A). The easiest way to understand the application of the matrix is to apply it. The reasoning used in constructing the matrix in the example below is provided.
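Before the worked example, the mechanics of a matrix lookup can be sketched in Python. The probability and severity category names follow the risk assessment matrix used in this section. The individual cell values are an assumption (a layout commonly published for ORM matrices); the example in this section confirms only that Critical combined with Occasional gives High. The function and variable names are hypothetical.

```python
PROBABILITIES = ["Frequent", "Likely", "Occasional", "Seldom", "Unlikely"]

# ASSUMED cell values, one list per severity row, ordered as PROBABILITIES.
# Only the Critical/Occasional cell ("High") is confirmed by the worked example.
MATRIX = {
    "Catastrophic": ["Extremely High", "Extremely High", "High", "High", "Medium"],
    "Critical":     ["Extremely High", "High", "High", "Medium", "Low"],
    "Moderate":     ["High", "Medium", "Medium", "Low", "Low"],
    "Negligible":   ["Medium", "Low", "Low", "Low", "Low"],
}


def risk_level(severity: str, probability: str) -> str:
    """Look up the risk level where a severity row meets a probability column."""
    if severity not in MATRIX:
        raise ValueError(f"unknown severity: {severity!r}")
    if probability not in PROBABILITIES:
        raise ValueError(f"unknown probability: {probability!r}")
    return MATRIX[severity][PROBABILITIES.index(probability)]


# The machine-moving example settles on Critical severity and Occasional probability.
print(risk_level("Critical", "Occasional"))
```

The lookup itself is trivial; as the limitations discussed later in this section point out, the hard part is choosing the two categories consistently.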
Example. The example below demonstrates the application of the matrix to the risk associated with moving a heavy piece of machinery.
Risk to be assessed: The risk of the machine falling over and injuring personnel.
Probability assessment: The following paragraphs illustrate the thinking process that might be followed in developing the probability segment of the risk assessment:
Use previous experience and the database, if available. “We moved a similar machine once before and although it did not fall over, there were some close calls. This machine is not as easy to secure as that machine and has a higher center of gravity and poses an even greater chance of falling. The base safety office indicates that there was an accident about 18 months ago that involved a similar operation. An individual received a broken leg in that case.”
Use the output of the hazard analysis process. “Our hazard analysis shows that there are several steps in the machine movement process where the machine is vulnerable to falling. Furthermore, there are several different types of contributory hazards that could cause the machine to fall. Both these factors increase the probability of falling.”
Consider expert opinion. “My experienced manager feels that there is a real danger of the machine falling.”
Consider your own intuition and judgment. “My gut feeling is that there is a real possibility we could lose control of this machine and topple it. The fact that we rarely move machines quite like this one increases the probability of trouble.”
Refer to the matrix terms. “Hmmm, the decision seems to be between likely and occasional. I understand likely to mean that the machine is likely to fall, meaning a pretty high probability. Certainly there is a real chance it may fall, but if we are careful, there should be no problem. I am going to select Occasional as the best option from the matrix.”
Severity assessment.
The following illustrates the thinking process that might occur in selecting the severity portion of the risk assessment matrix for the machine falling risk:
Identify likely outcomes. “If the machine falls, it will crush whatever it lands on. Such an injury will almost certainly be severe. Because of the height of the machine, it can easily fall on a person’s head and body with almost certain fatal results. There are also a variety of different crushing injuries, especially of the feet, even if the machine falls only a short distance.”
Identify the most likely outcomes. “Because of the weight of the machine, a severe injury is almost certain. Because people are fairly agile and the falling machine gives a little warning that it is falling, death is not likely.”
Consider factors other than injuries. “We identified several equipment and facility items at risk. Most of these we have guarded, but some are still vulnerable. If the machine falls, nobody can do anything to protect these items. It would take a couple of days at least to get us back in full production.”
Refer to the matrix (see Figure 2.1A). “Let’s see, any injury is likely to be severe, but a fatality is not very probable. Property damage could be expensive and could cost us a lot of production time. Considering both factors, I think that critical is the best choice.”
Combine probability and severity in the matrix. The thinking process should be as follows: “The probability category occasional is in the middle of the matrix (refer to the matrix below). I go down until it meets the critical category coming from the left side. The result is a high rating.
I notice that it is among the lower high ratings but it is still high.”
Figure 2.1A Risk Assessment Matrix
[Matrix figure. Columns (Probability): A Frequent, B Likely, C Occasional, D Seldom, E Unlikely. Rows (Severity): I Catastrophic, II Critical, III Moderate, IV Negligible. Risk levels shown in the cells: Extremely High, High, Medium, Low.]
Limitations and concerns with the use of the matrix. As you followed the scenario above, you may have noted that there are some problems involved in using the matrix. These include the following:
Subjectivity. There are at least two dimensions of subjectivity involved in the use of the matrix. The first is in the interpretation of the matrix categories. Your interpretation of the term “critical” may be quite different from mine. The second is in the interpretation of the risk. If a few weeks ago I saw a machine much like the one to be moved fall over and crush a person to death, I might have a greater tendency to rate both the probability and severity higher than someone who did not have such an experience. If time and resources permit, averaging the ratings of several personnel can reduce this variation.
Inconsistency. The subjectivity described above naturally leads to some inconsistency. A risk rated very high in one organization may only have a high rating in another. This becomes a real problem if the two risks are competing for a limited pot of risk control resources (as they always are). There will be real motivation to inflate risk assessments to enhance competitiveness for limited resources.

3.0 RISK CONTROL OPTION ANALYSIS TOOLS, DETAILS, AND EXAMPLES
3.1 BASIC RISK CONTROL OPTIONS
Major risk control options and examples of each are as follows:
Reject a risk. We can and should refuse to take a risk if the overall costs of the risk exceed its benefits. For example, a planner may review the risks associated with a specific operation or task.
After assessing all the advantages and evaluating the increased risk associated with it, even after application of all available risk controls, he decides the benefits do not outweigh the expected risk costs and the organization is better off in the long run not doing the operation or task.
Avoiding risk altogether requires canceling or delaying the job or operation, but is an option that is rarely exercised due to operational importance. However, it may be possible to avoid specific risks: risks associated with a night operation may be avoided by planning the operation for daytime; likewise, thunderstorms can be avoided by changing the route of flight.
Delaying a risk. It may be possible to delay a risk. If there is no time deadline or other operational benefit to speedy accomplishment of a risky task, then it is often desirable to delay the acceptance of the risk. During the delay, the situation may change and the requirement to accept the risk may go away. During the delay additional risk control options may become available for one reason or another (resources become available, new technology becomes available, etc.), thereby reducing the overall risk.
Risk transference does not change probability or severity of the risk, but it may decrease the probability or severity of the risk actually experienced by the individual or organization accomplishing the activity. At a minimum, the risk to the original individual or organization is greatly decreased or eliminated because the possible losses or costs are shifted to another entity.
Risk is commonly spread out by either increasing the exposure distance or by lengthening the time between exposure events. Aircraft may be parked so that an explosion or fire in one aircraft will not propagate to others. Risk may also be spread over a group of personnel by rotating the personnel involved in a high-risk operation.
Compensate for a risk. We can create a redundant capability in certain special circumstances.
Flight control redundancy is an example of an engineering or design redundancy. Another example is to plan for a backup, so that when a critical piece of equipment or other asset is damaged or destroyed we have capabilities available to bring on line to continue the operation.

Risk can be reduced. The overall goal of risk management is to plan operations or design systems that do not contain hazards and risks. However, the nature of most complex operations and systems makes it impossible or impractical to design them completely risk-free. As hazard analyses are performed, hazards will be identified that will require resolution. To be effective, risk management strategies must address the components of risk: probability, severity, or exposure. A proven order of precedence for dealing with hazards and reducing the resulting risks is:

Plan or Design for Minimum Risk. From the first, plan the operation or design the system to eliminate risks. Without hazards there is no probability, severity or exposure. If an identified risk cannot be eliminated, reduce the associated risk to an acceptable level. For example, flight control components can be designed so that they cannot be incorrectly connected during maintenance operations.

Incorporate Safety Devices. If identified hazards cannot be eliminated or their associated risk adequately reduced by modifying the operation or system elements or their inputs, that risk should be reduced to an acceptable level through the use of safety design features or devices. Safety devices can affect probability and reduce severity: an automobile seat belt doesn’t prevent a collision but reduces the severity of injuries.

Provide Warning Devices. When planning, system design, and safety devices cannot effectively eliminate identified hazards or adequately reduce associated risk, warning devices should be used to detect the condition and alert personnel to the hazard.
As an example, aircraft could be retrofitted with a low altitude ground collision warning system to reduce the risk of controlled flight into the ground. Warning signals and their application should be standardized and should be designed to minimize the probability of incorrect personnel reaction to the signals. Flashing red lights and sirens are common warning devices that most people understand.

Develop Procedures and Training. Where it is impractical to eliminate hazards through design selection or adequately reduce the associated risk with safety and warning devices, procedures and training should be used. A warning system by itself may not be effective without training or procedures required to respond to the hazardous condition. The greater the human contribution to the functioning of the system or involvement in the operational process, the greater the chance for variability. However, if the system is well designed and the operation well planned, the only remaining risk reduction strategies may be procedures and training. Emergency procedure training and disaster preparedness exercises improve human response to hazardous situations.

In most cases it will not be possible to eliminate safety risk entirely, but it will be possible to significantly reduce it. There are many risk reduction options available. Examples are included in the next section.

3.1.1 THE RISK CONTROL OPTIONS MATRIX

The sample risk control options matrix, illustrated at Figure 3.1.1A, is designed to develop a detailed and comprehensive list of risk control options. These options are listed in priority order of preference, all things being equal; therefore, start at the top and consider each option in turn. Add those controls that appear suitable and practical to a list of potential options. Examples of control options for each are suggested in Figure 3.1.1B. Many of the options may be applied at more than one level.
For example, the training option may be applied to operators, supervisors, more senior leaders, or staff personnel.

Figure 3.1.1A Sample Risk Control Options Matrix
Columns: OPTIONS, OPERATOR, LEADER, STAFF, MGR. The option rows, by group, are:
ENGINEER (Energy Mgt): Limit Energy; Substitute Safer Form; Prevent Buildup; Prevent Release; Provide Slow Release; Rechannel/separate In Time/Space; Provide Special Maint of Controls
GUARD: On Source; Barrier Between; On Human or Object; Raise Threshold (harden)
IMPROVE TASK DESIGN: Sequence of Events (Flow); Timing (within tasks, between tasks); Human-Machine Interface/Ergonomics; Simplify Tasks; Reduce Task Loads (physical, mental, emotional); Backout Options
LIMIT EXPOSURE: Number of People or Items; Time; Iterations
SELECTION OF PERSONNEL: Mental Criteria; Emotional Criteria; Physical Criteria; Experience
TRAIN AND EDUCATE: Core Tasks (especially critical tasks); Leader Tasks; Emergency/Contingency Tasks; Safety Tasks; Rehearsals
WARN: Signs/Color Coding; Audio/Visual Alarms; Briefings
MOTIVATE: Measurable Standards; Essential Accountability; Positive/negative Incentives; Competition; Demonstrations of Effects
REDUCE EFFECTS: Emergency Equipment; Rescue Capabilities; Emergency Medical Care; Emergency Procedures; Damage Control Procedures/Plans; Backups/Redundant Capabilities
REHABILITATE: Personnel; Facilities/equipment; Operational Capabilities

Figure 3.1.1B Example Risk Control Options Matrix
Columns: OPTIONS, SOME EXAMPLES
ENGINEER (Energy Mgt.)
Limit Energy: Lower voltages, small amounts of explosives, reduced heights, reduced speeds
Substitute Safer Form: Use air power, less hazardous chemicals, more stable explosives/chemicals
Prevent Buildup: Use automatic cutoffs, blowout panels, limit momentum, governors
Prevent Release: Containment, double/triple containment
Provide Slow Release: Use pressure relief valves, energy absorbing materials
Rechannel/separate in Time/Space: Automatic processes, deviators, barriers, distance
Provide Special Maint of Controls: Special procedures, special checks/audits
GUARD
On Source: Fire suppression systems, energy absorbing systems (crash walls, etc.)
Barrier Between: Revetments, walls, distance
On Human or Object: Personal protective equipment, energy absorbing materials
Raise Threshold (harden): Acclimatization, over-design, reinforcement, physical conditioning
IMPROVE TASK DESIGN
Sequence of Events (Flow): Put tough tasks first before fatigue, don’t schedule several tough tasks in a row
Timing (within tasks, between tasks): Allow sufficient time to perform, to practice. Allow adequate time between tasks
Human-Machine Interface/Ergonomics: Assure equipment fits the people, and effective ergonomic design
Simplify Tasks: Provide job aids, reduce steps, provide tools such as lifters and communications aids
Reduce Task Loads (physical, mental, emotional): Set weight limits; automate mental calculations and some monitoring tasks. Avoid excessive stress, provide breaks, vacations, and spread risk among many
Backout Options: Establish points where process reversal is possible when hazard is detected
LIMIT EXPOSURE
Number of People or Items: Only expose essential personnel & things
Time: Minimize the time of exposure. Don’t bring the explosives until the last minute
Iterations: Don’t do it as often
SELECTION OF PERSONNEL
Mental Criteria: Essential basic intelligence, and essential skills and proficiency
Emotional Criteria: Essential stability and maturity
Physical Criteria: Essential strength, motor skills, endurance, size
Experience: Demonstrated performance abilities
TRAIN AND EDUCATE
Core Tasks (especially critical tasks): Define critical minimum abilities, train, test and score
Leader Tasks: Define essential leader tasks and standards, train, test and score
Emergency/Contingency Tasks: Define, assign, train, verify ability
Safety Tasks: Hazard identification, risk controls, maintenance of standards
Rehearsals: Validate processes, validate skills, verify interfaces
WARN
Signs/Color Coding: Warning signs, instruction signs, traffic signs
Audio/Visual Alarms: Bells, flares, flashing lights, klaxons, whistles
Briefings: Refresher warnings, demonstrate hazards, refresh training
MOTIVATE
Measurable Standards: Define minimum acceptable risk controls, see that tasks are assigned
Essential Accountability: Check performance at an essential level of frequency and detail
Positive/negative Incentives: Meaningful individual & group rewards, punishment
Competition: Healthy individual and group competition on a fair basis
Demonstrations of Effects: Graphic, dynamic, but tasteful demonstrations of effects of unsafe acts
REDUCE EFFECTS
Emergency Equipment: Fire extinguishers, first aid materials, spill containment materials
Rescue Capabilities: A rescue squad, rescue equipment, helicopter rescue
Emergency Medical Care: Trained first aid personnel, medical facilities
Emergency Damage Control Procedures: Emergency responses for anticipated contingencies, coordinating agencies
Backups/Redundant Capabilities: Alternate ways to continue the operation if primaries are lost
REHABILITATE
Personnel: Rehabilitation services restore confidence
Facilities/equipment: Get key elements back in service
Operational Capabilities: Focus on restoration of the operation

4.0 MAKE CONTROL DECISIONS TOOLS, DETAILS, AND EXAMPLES

Introduction. Making control decisions includes the basic options (reject, transfer, spread, etc.) as well as a comprehensive list of risk reduction options generated through use of the risk control options matrix by a decision-maker. The decision-making organization requires a procedure to establish, as a matter of routine, who should make various levels of risk decisions. Finally, after the best available set of risk controls is selected, the decision-maker will make a final go/no-go decision.

Developing a decision-making process and system: Risk decision-making should be scrutinized in a risk decision system. This system will produce the following benefits:
· Promptly get decisions to the right decision-makers
· Create a trail of accountability
· Assure that risk decisions involving comparable levels of risk are generally made at comparable levels of management
· Assure timely decisions
· Explicitly provide for the flexibility in the decision-making process required by the nature of operations.

A decision matrix is an important part of a good decision-making system. Such matrices are normally tied directly to the risk assessment process.

Selecting the best combination of risk controls: This process can be made as simple as intuitively choosing what appears to be the best control or group of controls, or so complex that it justifies the use of the most sophisticated decision-making tools available. For most risks involving moderate levels of risk and relatively small investments in risk controls, the intuitive method is fully satisfactory. Guidelines for intuitive decisions are:

Don’t select control options to produce the lowest level of risk; select the combination yielding the most operationally supportive level of risk.
This means keeping in mind the need to take appropriate risks when they are necessary for improved performance.

Be aware that some risk controls are incompatible. In some cases using risk control A will cancel the effect of risk control B. Obviously using both A and B is wasting resources. For example, a fully effective machine guard may make it completely unnecessary to use personal protective equipment such as goggles and face shields. Using both will waste resources and impose a burden on operators.

Be aware that some risk controls reinforce each other. For example, a strong enforcement program to discipline violators of safety rules will be complemented by a positive incentive program to reward safe performance. The impact of the two coordinated together will usually be stronger than the sum of their impacts.

Evaluate full costs versus full benefits. Try to evaluate all the benefits of a risk and evaluate them against all of the costs of the risk control package. Traditionally, this comparison has been limited to comparisons of the incident/accident costs versus the safety function costs.

When it is supportive, choose redundant risk controls to protect against risk in-depth. Keep in mind the objective is not risk control; it is optimum risk control.

Selecting risk controls when risks are high and risk control costs are important - cost benefit assessment. In these cases, the stakes are high enough to justify application of more formal decision-making processes. All of the tools existing in the management science of decision-making apply to the process of risk decision-making. Two of these tools should be used routinely and deserve space in this publication. The first is cost benefit assessment, a simplified variation of cost benefit analysis.
Cost benefit analysis is a science in itself; however, it can be simplified sufficiently for routine use in risk management decisionmaking even at the lowest organizational levels. Some fiscal accuracy will be lost in this process of simplification, but the result of the application will be a much better selection of risk controls than if the procedures were not used. Budget personnel are usually trained in these procedures and can add value to the application. The process involves the following steps:

Step 1. Measure the full, lifecycle costs of the risk controls to include all costs to all involved parties. For example, a motorcycle helmet standard should account for the fact that each operator will need to pay for a helmet.
Step 2. Develop the best possible estimate of the likely lifecycle benefits of the risk control package to include any non-safety benefits expressed as a dollar estimate. For example, an ergonomics program can be expected to produce significant productivity benefits in addition to a reduction in cumulative trauma injuries.
Step 3. Let your budget experts fine-tune your efforts.
Step 4. Develop the cost benefit ratio. You are seeking the best possible benefit-to-cost ratio, but at least 2 to 1.
Step 5. Fine-tune the risk control package to achieve an improved “bang for the buck”. The example at Figure 4.1A illustrates this process of fine-tuning applied to an ergonomics training course (risk control).

Figure 4.1A Example Maximizing Bang for the Buck

Anyone can throw money at a problem. A manager finds the optimum level of resources producing an optimum level of effectiveness, i.e. maximum bang for the buck. Consider an ergonomics training program involving training 400 supervisors from across the entire organization in a 4-hour (3 hours training, 1 hour admin) ergonomics training course that will cost $30,500 including student time.
Ergonomics losses have been averaging $300,000 per year and estimates are that the risk control will reduce this loss by 10% or $30,000. On the basis of a cost benefit assessment over the next year (ignoring any out-year considerations), this risk control appears to have an unfavorable one-year cost benefit ratio, i.e. $30,000 in benefit versus a $30,500 investment, a $500 loss. Apparently it is not a sound investment on a one-year basis. This is particularly true when we consider that most decision-makers will want the comfort of a 2 or 3 to 1 cost benefit ratio to ensure a positive outcome. Can this project be turned into a winner? We can make it a winner if we are able to draw on risk information concerning ergonomics injuries/illnesses from loss control office data, risk management concepts, and a useful tool called “Pareto’s Law”. Pareto’s Law, as previously mentioned, essentially states that 80% of most problems can be found in 20% of the exposure. For example, 80% of all traffic accidents might involve only 20% of the driver population. We can use this law, guided by our injury/illness data, to turn the training program into a solid winner. Here is what we might do.

Step 1. Let’s assume that Pareto’s Law applies to the distribution of ergonomics problems within this organization. If so, then 80% of the ergonomics problem can be found in 20% of our exposures. Our data can tell us which 20%. We can then target the 20% (80 students) of the original 400 students that are accounting for 80% of our ergonomics costs ($240,000).

Step 2. Let’s also assume that Pareto’s Law applies to the importance of tasks that we intend to teach in the training course. If the three hours of training included 10 tasks, let’s assume that two of those tasks (20%) will in fact account for 80% of the benefit of the course. Again our data should be able to indicate this. Let’s also assume that, by good luck, each of these two tasks takes only the same time to teach as each of the other eight.
We might now decide to teach only these two tasks, which will require only 36 minutes (20% of 180 minutes). We will still retain 80% of the $240,000 target value, or $192,000.

Step 3. Since the training now only requires 36 minutes, we will modify our training procedure to conduct the training in the workshops rather than in a classroom. This reduces our admin time from 1 hour (wash up, travel, get there well before it actually starts, and return to work) to 4 minutes. Our total training time is now 40 minutes.

Summary. We are still targeting $192,000 of the original $300,000 annual loss, but our cost factor is now 80 employees for 40 minutes at $15/hour ($800 of student time), with our teaching cost cut to 1/5th of the $6000 (80 students instead of 400), which is $1200. We still have our staff cost (the remaining $500 implied by the totals), so the total cost of the project is now $2500. We will still get the 10% reduction in the remaining $192,000 that we are still targeting, which totals $19,200. Our cost benefit ratio is now a robust 7.68 to 1 ($19,200 divided by $2500). If all goes well with the initial training and we actually demonstrate a 20% loss reduction, we may choose to expand the training to the next riskiest 20% of our 400 personnel, which should also produce a very positive return.

Selecting risk controls when risks are high and risk control costs are important - use of decision matrices. An excellent tool for evaluating various risk control options is the decision matrix. On the vertical dimension of the matrix we list the operation supportive characteristics we are looking for in risk controls. Across the top of the matrix we list the various risk control options (individual options or packages of options). Then we rate each control option on a scale of 1 (very low) to 10 (very high) in each of the desirable characteristics. If we choose to, we can weight each desirable characteristic based on its operational significance and calculate the weighted score (illustrated below).
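The weighted scoring just described is simple enough to check in a few lines. The following is a minimal sketch in Python and is not part of the handbook: it multiplies each 1-to-10 rating by its factor weight and sums the products for each option, using the weights and ratings of the sample matrix at Figure 4.1B. The names WEIGHTS, RATINGS, and weighted_totals are illustrative assumptions.

```python
# Factor weights transcribed from the sample decision matrix (Figure 4.1B).
WEIGHTS = {
    "Low cost": 5,
    "Easy to implement": 4,
    "Positive operator involvement": 5,
    "Consistent with culture": 3,
    "Easy to integrate": 3,
    "Easy to measure": 2,
    "Low risk (sure to succeed)": 3,
}

# Ratings (1 = very low, 10 = very high) for options #1 through #6.
RATINGS = {
    "Low cost": [9, 6, 4, 5, 8, 8],
    "Easy to implement": [10, 7, 5, 6, 8, 8],
    "Positive operator involvement": [8, 2, 1, 6, 3, 7],
    "Consistent with culture": [10, 2, 9, 6, 6, 6],
    "Easy to integrate": [9, 5, 6, 7, 6, 5],
    "Easy to measure": [10, 10, 10, 8, 8, 5],
    "Low risk (sure to succeed)": [9, 9, 10, 2, 4, 5],
}


def weighted_totals(weights, ratings):
    """Return one weighted total per option: sum of rating x weight."""
    n_options = len(next(iter(ratings.values())))
    totals = [0] * n_options
    for factor, row in ratings.items():
        for i, rating in enumerate(row):
            totals[i] += weights[factor] * rating
    return totals


totals = weighted_totals(WEIGHTS, RATINGS)

# Option numbers ordered from strongest to weakest; ties keep matrix order.
ranking = sorted(range(1, len(totals) + 1),
                 key=lambda option: totals[option - 1],
                 reverse=True)

print(totals)
print(ranking)
```

The totals come out as 229, 136, 140, 140, 151, and 165, matching the TOTALS row of the figure, with option #1 ranked first. Setting every weight to 1 gives the unweighted comparison, since the handbook notes that weighting is optional.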
All things being the same, the options with the higher scores are the stronger options. A generic illustration is provided at Figure 4.1B.

Figure 4.1B Sample Decision Matrix
(each cell shows rating/weighted score, where weighted score = rating x weight)

                                         RISK CONTROL OPTIONS/PACKAGES
RATING FACTOR                  WEIGHT*   #1      #2      #3      #4      #5      #6
Low Cost                       5         9/45    6/30    4/20    5/25    8/40    8/40
Easy to implement              4         10/40   7/28    5/20    6/24    8/32    8/32
Positive Operator involvement  5         8/40    2/10    1/5     6/30    3/15    7/35
Consistent with Culture        3         10/30   2/6     9/27    6/18    6/18    6/18
Easy to integrate              3         9/27    5/15    6/18    7/21    6/18    5/15
Easy to measure                2         10/20   10/20   10/20   8/16    8/16    5/10
Low risk (sure to succeed)     3         9/27    9/27    10/30   2/6     4/12    5/15
TOTALS                                   229     136     140     140     151     165

* Weighting is optional and is designed to reflect the relative importance of the various factors.

Summary. It is not unusual for a risk control package to cost hundreds of thousands of dollars and even millions over time. Millions of dollars and critical operations may be at risk. The expenditure of several tens of thousands of dollars to get the decision right is sound management practice and good risk management.

5.0 RISK CONTROL IMPLEMENTATION TOOLS AND DETAILS

5.1 Introduction

Figure 5.1A summarizes a Risk Control Implementation model. It is based on accountability being an essential element of risk management success. Organizations and individuals must be held accountable for the risk decisions and actions that they take or the risk control motivation is minimized. The model depicted at Figure 5.1A is the basis of positive accountability and strong risk control behavior.

Figure 5.1A Implementation Model

5.2 Applying the model

The example below illustrates each step in the model applied to the sometimes-difficult task of assuring that personnel consistently wear and use their protective clothing and equipment.
The steps of the model should be applied as follows:

5.2.1 Identify key tasks

This step, while obvious, is critical: the key tasks must be defined with enough accuracy that effective accountability is justified. In our example regarding use of protective clothing and equipment, it is essential to identify exactly when the use of such items is required. Is it when I enter the door of a work area? When I approach a machine? How close? What about on the loading dock? Exactly what items are to be worn? Is there any specific way that they should be worn? I can be wearing ear plugs but incorrectly have them stuck in the outer ear, producing little or no noise reduction benefit. Does this meet the requirement? The task needs to be defined with sufficient precision that personnel know what is expected of them and that what is expected of them produces the risk control desired. It is also important that the task be made as simple, pleasant, and trouble free as possible. In this way we significantly increase the ease with which the rest of the process proceeds.

5.2.2 Assign key tasks

Personnel need to know clearly what is expected of them, especially if they are going to be held accountable for the task. This is normally not difficult. The task can be included in job descriptions, operating instructions, or in the task procedures contained in manuals. It can be very effectively embedded in training. In less structured situations, it can be a clear verbal order or directive. It is important that the assignment of the task include the specifics of what is expected.

5.2.3 Measure performance

The task needs to include at least a basic level of measurement. It is important to note that measurement does not need to include every time the behavior is displayed.
It is often perfectly practical to sample performance only once in a large number of actions, perhaps as few as one in several hundred actions, as long as the sample is a random example of routine behavior. Often the only one who needs to do the measuring is the individual responsible for the behavior. In other situations, the supervisor or an outside auditor may need to do the observing. Performance is compared to the standard, which should have been communicated to the responsible individual. This step of the process is the rigorous application of the old adage that “What is monitored (or measured) and checked gets done.”

[Figure 5.1A box labels: ID Key Tasks; Assign Key Tasks; Measure Performance; Reward Correct Safe Behavior]

5.2.4 Reward correct behavior and correct inadequate behavior

The emphasis should clearly be on reinforcing correct behavior. Reinforcement means any action that increases the likelihood that the person will display the desired behavior again. It can be as informal as a pat on the back or as formal as a major award or cash incentive. Correcting inadequate behavior should be done whenever inadequate behavior is observed. The special case of punishment should only be used when all other means of producing the desired behavior have failed.

5.2.5 Risk control performance

If the steps outlined above have been accomplished correctly, the result will be consistent success in controlling risk. Note that the difficulty and unpleasantness of the task will dictate the extent of the rewards and corrective actions required. The harder the task for whatever reason, the more powerful the rewards and corrective actions needed will be. It is important to make risk control tasks as uncomplicated and pleasant as possible.

6.0 SUPERVISE AND REVIEW DETAILS AND EXAMPLES

Management involves moving a task or an organization toward a goal. To move toward a goal you must have three things.
You must have a goal, you must know where you are in relation to that goal, and you must have a plan to reach it. An effective set of risk matrices provides two of the elements. In regard to ORM, indicators should provide information concerning the success or lack of success of controls intended to mitigate a risk. These indicators could focus on those key areas identified during the assessment as being critical to minimizing a serious risk area. Additionally, matrices may be developed to generically identify operations/areas where ORM efforts are needed.

The following is a representative set of risk measures that a maintenance shop leader could use to assess the progress of the shop toward the goal of improving safety performance. Similar indicators could be developed in the areas of environment, fire prevention, security, and other loss control areas.

The tool control effectiveness index. Establish key indicators of tool control program effectiveness (percentage of tool checks completed, items found by QA, score on knowledge quiz regarding control procedures, etc.). All that is needed is a sampling of data in one or more of these areas. If more than one area is sampled, the scores can be weighted if desired and rolled up into a single tool control index by averaging them. See Figure 6.1A for the example.

Figure 6.1A Example Tool Control Effectiveness Measurement
The percent of tool checks completed is 94%.
Items found by QA: items were found in 2% of QA inspections (98% were to standard).
Tool control quiz score is 88%.
If all items are weighted equally ((94 + 98 + 88) divided by 3 = 93.3), then 93.3 is this quarter’s tool control safety index. Of course, in this index, high scores are desirable.

The protective clothing and equipment risk index. This index measures the effectiveness with which shop personnel are using required protective clothing and equipment.
Data are collected by making spot observations periodically during the workday. Data are recorded on a check sheet and are rolled up monthly. The index is the percentage of safe observations out of the total number of observations made, as illustrated at Figure 6.1B.

Figure 6.1B Example Safety Observation Measurement
TOTAL OBSERVATIONS: 27
SAFE OBSERVATIONS: 21
The protective clothing and equipment safety index is 78 (21 divided by 27 = 78%). In this index high scores are desirable.

The emergency procedures index. This index measures the readiness of the shop to respond to various emergencies such as fires, injuries, and hazmat releases. It is made up of a compilation of indicators as shown at Figure 6.1C. A high score is desirable.

Figure 6.1C Example Emergency Procedures Measurement
Scores on emergency procedure quizzes
Percentage of emergency equipment on hand and fully operational
Scores on emergency response drills indicating speed, correct procedures, and other effectiveness indicators

The quality assurance score. This score measures a defined set of maintenance indicators tailored to the particular type of aircraft serviced. Quality Assurance (QA) personnel record deviations in these target areas as a percentage of total observations made. The specific types of deviations are noted. The score is the percentage of positive observations, with a high score being desirable. Secondary scores could be developed for each type of deviation if desired.

The overall index. Any combination of the indicators previously mentioned, along with others as desired, can be rolled up into an overall index for the maintenance facility as illustrated at Figure 6.1D.

Figure 6.1D Example Overall Measurement

Once the data has been collected and analyzed, the results need to be provided to the unit. With this information the unit will be able to concentrate its efforts on those areas where improvement would produce the greatest gain.

Summary.
It is not difficult to set up useful and effective measures of operational risk, particularly once the key risks have been identified during a risk assessment. Additionally, the workload associated with such indicators can be minimized by using data already collected and by collecting the data as an integrated routine aspect of operational processes.

Overall measurement example (Figure 6.1D):
Tool control safety index: 93.3
Protective clothing and equipment safety index: 78.0
Emergency procedures index: 88.4
Quality Assurance Score: 97.9
TOTAL: 357.6 OR AVERAGE: 89.4
This index is the overall safety index for the maintenance facility. The goal is to push toward 100% or a maximum score of 400. This index would be used in our accountability procedures to measure performance and establish the basis for rewards or corrective action.
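The index arithmetic in Figures 6.1A, 6.1B, and 6.1D can be reproduced with a short script. What follows is a minimal sketch in Python under stated assumptions, not a procedure from the handbook: the function names equal_weight_index and observation_index are hypothetical, all component scores are weighted equally, and the emergency procedures index (88.4) and quality assurance score (97.9) are taken as given because their underlying data are not shown.

```python
def equal_weight_index(percent_scores):
    """Average several percentage scores with equal weights."""
    return sum(percent_scores) / len(percent_scores)


def observation_index(safe_observations, total_observations):
    """Percentage of spot observations that were safe."""
    return 100.0 * safe_observations / total_observations


# Figure 6.1A: 94% of tool checks completed, 98% of QA inspections to
# standard (items were found in 2%), and a quiz score of 88%.
tool_control = round(equal_weight_index([94, 98, 88]), 1)

# Figure 6.1B: 21 safe observations out of 27, rounded to a whole percent
# as the figure does.
protective_equipment = float(round(observation_index(21, 27)))

# Taken as given from Figure 6.1D (assumption: no recalculation possible).
emergency_procedures = 88.4
quality_assurance = 97.9

scores = [tool_control, protective_equipment,
          emergency_procedures, quality_assurance]
total = round(sum(scores), 1)            # maximum possible is 400
average = round(total / len(scores), 1)  # maximum possible is 100

print(tool_control, protective_equipment, total, average)
```

Run as written, this gives a tool control index of 93.3, a protective clothing and equipment index of 78.0, a total of 357.6, and an average of 89.4, in agreement with the figures. If some indicators matter more than others, the plain average could be replaced by a weighted one; the handbook leaves that choice to the organization.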
Author: 帅哥    Time: 2008-12-21 21:13:32     Title: Order 8040.4

Appendix G FAA ORDER 8040.4

ORDER 8040.4
U.S. DEPARTMENT OF TRANSPORTATION, FEDERAL AVIATION ADMINISTRATION
6/26/98
SUBJ: SAFETY RISK MANAGEMENT
Distribution: A-WXYZ-2; A-FOF-0 (Ltd)    Initiated by: ASY300

1. PURPOSE. This order establishes the safety risk management policy and prescribes procedures for implementing safety risk management as a decision making tool within the Federal Aviation Administration (FAA). This order establishes the Safety Risk Management Committee.

2. DISTRIBUTION. This order is distributed to the division level in the Washington headquarters, regions, and centers, with limited distribution to all field offices and facilities.

3. DEFINITIONS. Appendix 1, Definitions, contains definitions used in this order.

4. SCOPE. This order requires the application of a flexible but formalized safety risk management process for all high-consequence decisions, except in situations deemed by the Administrator to be an emergency. A high-consequence decision is one that either creates or could be reasonably estimated to result in a statistical increase or decrease, as determined by the program office, in personal injuries and/or loss of life and health, a change in property values, loss of or damage to property, costs or savings, or other economic impacts valued at $100,000,000 or more per annum. The objective of this policy is to formalize a common sense approach to risk management and safety risk analysis/assessment in FAA decisionmaking. This order is not intended to interfere with regulatory processes and activities. Each program office will interpret, establish, and execute the policy contained herein consistent with its role and responsibility. The Safety Risk Management Committee will consist of technical personnel with risk assessment expertise and be available for guidance across all FAA programs.

5. SAFETY RISK MANAGEMENT POLICY.
The FAA shall use a formal, disciplined, and documented decisionmaking process to address safety risks in relation to high-consequence decisions impacting the complete product life cycle. The critical information resulting from a safety risk management process can thereby be effectively communicated in an objective and unbiased manner to decisionmakers, and from decisionmakers to the public. All decisionmaking authorities within the FAA shall maintain safety risk management expertise appropriate to their operations, and shall perform and document the safety risk management process prior to issuing the high-consequence decision. The choice of methodologies to support risk management efforts remains the responsibility of each program office. The decisionmaking authority shall determine the documentation format. The approach to safety risk management is composed of the following steps:

a. Plan. A case-specific plan for risk analysis and risk assessment shall be predetermined in adequate detail for appropriate review and agreement by the decisionmaking authority prior to commitment of resources. The plan shall additionally describe criteria for acceptable risk.

b. Hazard Identification. The specific safety hazard or list of hazards to be addressed by the safety risk management plan shall be explicitly identified to prevent ambiguity in subsequent analysis and assessment.

c. Analysis. Both elements of risk (hazard severity and likelihood of occurrence) shall be characterized. The inability to quantify and/or lack of historical data on a particular hazard does not exclude the hazard from this requirement. If the seriousness of a hazard can be expected to increase over the effective life of the decision, this should be noted. Additionally, both elements should be estimated for each hazard being analyzed, even if historical and/or quantitative data is not available.

d. Assessment.
The combined impact of the risk elements in paragraph 5c shall be compared to acceptability criteria and the results provided for decisionmaking.

e. Decision. The risk management decision shall consider the risk assessment results conducted in accordance with paragraph 5d. Risk assessment results may be used to compare and contrast alternative options.

6. PRINCIPLES FOR SAFETY RISK ASSESSMENT AND RISK CHARACTERIZATION. In characterizing risk, one must comply with each of the following:

a. General. Safety risk assessments, to the maximum extent feasible:
(1) Are scientifically objective.
(2) Are unbiased.
(3) Include all relevant data available.
(4) Employ default or conservative assumptions only if situation-specific information is not reasonably available. The basis of these assumptions must be clearly identified.
(5) Distinguish clearly as to what risks would be affected by the decision and what risks would not.
(6) Are reasonably detailed and accurate.
(7) Relate to current risk or the risk resulting from not adopting the proposal being considered.
(8) Allow for unknown and/or unquantifiable risks.

b. Principles. The principles to be applied when preparing safety risk assessments are:
(1) Each risk assessment should first analyze the two elements of risk: severity of the hazard and likelihood of occurrence. Risk assessment is then performed by comparing the combined effect of their characteristics to acceptable criteria as determined in the plan (paragraph 5a).
(2) A risk assessment may be qualitative and/or quantitative. To the maximum extent practicable, these risk assessments will be quantitative.
(3) The selection of a risk assessment methodology should be flexible.
(4) Basic assumptions should be documented or, if only bounds can be estimated reliably, the range encompassed should be described.
(5) Significant risk assessment assumptions, inferences, or models should:
(a) Describe any model used in the risk assessment and make explicit the assumptions incorporated in the model.
(b) Identify any policy or value judgments.
(c) Explain the basis for choices.
(d) Indicate the extent that the model and the assumptions incorporated have been validated by or conflict with empirical data.
(6) All safety risk assessments should include or summarize the information of paragraphs 6a(3) and 6a(4) as well as 6b(4) and 6b(5). This record should be maintained by the organization performing the assessment in accordance with Order 1350.15B, Records Organization, Transfer, and Destruction Standards.

7. ANALYSIS OF RISK REDUCTION BENEFITS AND COSTS. For each high-consequence decision, the following tasks shall be performed:

a. Compare the results of a risk assessment for each risk-reduction alternative considered, including no action, in order to rank each risk assessment for decisionmaking purposes. The assessment will consider future conditions, e.g., increased traffic volume.

b. Assess the costs and the safety risk reduction or other benefits associated with implementation of, and compliance with, an alternative under final consideration.

8. SUBSTITUTION RISKS. Safety risk assessments of proposed changes to high-consequence decisions shall include a statement of substitution risks. Substitution risks shall be included in the risk assessment documentation.

9. SAFETY RISK MANAGEMENT COMMITTEE. This order establishes the Safety Risk Management Committee. Appendix 2, Safety Risk Management Committee, contains the committee charter. The committee shall provide a service to any FAA organization for safety risk management planning, as outlined in appendix 2, when requested by the responsible program office. It also meets periodically (e.g., two to four times per year) to exchange risk management ideas and information.
The committee will provide advice and counsel to the Office of System Safety, the Assistant Administrator for System Safety, and other management officials when requested.

Jane F. Garvey
Administrator

APPENDIX 1. DEFINITIONS.

1. COSTS. Direct and indirect costs to the United States Government, State, local, and tribal governments, international trade impacts, and the private sector.

2. EMERGENCY. A circumstance that requires immediate action to be taken.

3. HAZARD. Condition, event, or circumstance that could lead to or contribute to an unplanned or undesired event.

4. HAZARD IDENTIFICATION. Identification of a substance, activity, or condition as potentially posing a risk to human health or safety.

5. HIGH-CONSEQUENCE DECISION. Decision that either creates or could be reasonably estimated to result in a statistical increase or decrease in personal injuries and/or loss of life and health, a change in property values, loss of or damage to property, costs or savings, or other economic impacts valued at $100,000,000 or more per annum.

6. PRODUCT LIFE CYCLE. The entire sequence from precertification activities through those associated with removal from service.

7. MISHAP. Unplanned event, or series of events, that results in death, injury, occupational illness, or damage to or loss of equipment or property.

8. RISK. Expression of the impact of an undesired event in terms of event severity and event likelihood.

9. RISK ASSESSMENT.
a. Process of identifying hazards and quantifying or qualifying the degree of risk they pose for exposed individuals, populations, or resources; and/or
b. Document containing the explanation of how the assessment process is applied to individual activities or conditions.

10. RISK CHARACTERIZATION. Identification or evaluation of the two components of risk, i.e., undesired event severity and likelihood of occurrence.

11. RISK MANAGEMENT.
Management activity ensuring that risk is identified and eliminated or controlled within established program risk parameters.

12. SAFETY RISK. Expression of the probability and impact of an undesired event in terms of hazard severity and hazard likelihood.

13. SUBSTITUTION RISK. Additional risk to human health or safety, to include property risk, from an action designed to reduce some other risk(s).

APPENDIX 2. SAFETY RISK MANAGEMENT COMMITTEE

1. PURPOSE. The Safety Risk Management Committee provides a communication and support team to supplement the overall risk analysis capability and efficiency of key FAA organizations.

2. RESPONSIBILITIES. The Committee supports FAA safety risk management activities. It provides advice and guidance, upon request from responsible program offices, to help them fulfill their authority and responsibility to incorporate safety risk management as a decisionmaking tool. It serves as an internal vehicle for risk management process communication, for coordination of risk analysis methods, and for use of common practices where appropriate. This includes, but is not limited to:

a. Continuing the internal exchange of risk management information among key FAA organizations.

b. Fostering the exchange of risk management ideas and information with other government agencies and industry to avoid duplication of effort.

c. Providing risk analysis/management advice and guidance.

d. Identifying and recommending needed enhancements to FAA risk analysis/management capabilities and/or efficiencies upon request.

e. Maintaining a risk management resources directory that includes:
(1) FAA risk methodologies productively employed,
(2) Specific internal risk analysis/management expertise by methodology or tool and organizational contact point(s), and
(3) A central contact point for resource identification assistance.

f.
Encouraging the establishment of an international directory of aviation safety information resources via the Internet.

g. Assisting in the identification of suitable risk analysis tools and initiating appropriate training in the use of these tools.

3. COMPOSITION. The Safety Risk Management Committee is composed of safety and risk management professionals representing all Associate/Assistant Administrators and the Offices of the Chief Counsel, Civil Rights, Government and Industry Affairs, and Public Affairs. The Assistant Administrator for System Safety will designate an individual to chair the committee. The chairperson is responsible for providing written notice of all meetings to committee members and, in coordination with the executive secretary, keeping minutes of the meetings.

4. ASSIGNMENTS. The Safety Risk Management Committee may form ad hoc working groups to address specific issues when requested by the responsible program office. Composition of those working groups will consist of member representatives from across the FAA. Working groups will be disbanded upon completion of their task. The Office of System Safety shall provide the position of executive secretary of the committee. The Office of System Safety shall also furnish other administrative support.

5. FUNDING. Resources for support staff and working group activities will be provided as determined by the Assistant Administrator for System Safety. Unless otherwise stated, each member is responsible for his/her own costs associated with committee membership.
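
The analysis, assessment, and comparison steps of the Order (paragraphs 5c, 5d, and 7a) can be illustrated with a minimal Python sketch. It is not part of Order 8040.4. The category names are borrowed from the CRA form in Appendix B; the numeric scores, the choice to multiply severity by likelihood, the acceptability threshold, and every identifier (Hazard, risk_score, assess, rank_alternatives) are assumptions made purely for illustration. The Order leaves the choice of methodology to each program office.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales: Order 8040.4 prescribes no numeric values.
SEVERITY = {"minor": 1, "major": 2, "hazardous": 3, "catastrophic": 4}
LIKELIHOOD = {
    "extremely improbable": 1,
    "extremely remote": 2,
    "remote": 3,
    "probable": 4,
}


@dataclass(frozen=True)
class Hazard:
    name: str
    severity: str    # key of SEVERITY
    likelihood: str  # key of LIKELIHOOD


def risk_score(hazard):
    """Paragraph 5c: combine both elements of risk (assumed: a simple product)."""
    return SEVERITY[hazard.severity] * LIKELIHOOD[hazard.likelihood]


def assess(hazards, acceptable_score):
    """Paragraph 5d: compare the worst combined risk to the plan's criterion."""
    worst = max(risk_score(h) for h in hazards)
    return worst, worst <= acceptable_score


def rank_alternatives(alternatives, acceptable_score):
    """Paragraph 7a: assess every alternative, including no action, and rank them."""
    results = []
    for name, hazards in alternatives.items():
        worst, acceptable = assess(hazards, acceptable_score)
        results.append((name, worst, acceptable))
    return sorted(results, key=lambda item: item[1])  # lowest risk first


# Illustrative data only, not taken from any real assessment.
alternatives = {
    "no action": [
        Hazard("loss of air-ground communication", "catastrophic", "extremely remote"),
    ],
    "option B": [
        Hazard("loss of air-ground communication", "catastrophic", "extremely improbable"),
    ],
}

ranked = rank_alternatives(alternatives, acceptable_score=4)
for name, worst, acceptable in ranked:
    print(name, worst, "acceptable" if acceptable else "not acceptable")
```

A product of ordinal scores is a crude stand-in for the risk matrix on the CRA form. A real plan (paragraph 5a) would state its own acceptability criteria, and paragraph 5c still requires both elements to be estimated when quantitative data is lacking.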
Author: 帅哥    Time: 2008-12-21 21:14:06     Title: Standard Practice for System Safety

APPENDIX H
MIL-STD-882D

NOT MEASUREMENT SENSITIVE
MIL-STD-882D
10 February 2000
SUPERSEDING MIL-STD-882C
19 January 1993

DEPARTMENT OF DEFENSE
STANDARD PRACTICE FOR SYSTEM SAFETY

AMSC N/A AREA SAFT

FOREWORD

1. This standard is approved for use by all Departments and Agencies within the Department of Defense (DoD).

2. The DoD is committed to protecting: private and public personnel from accidental death, injury, or occupational illness; weapon systems, equipment, material, and facilities from accidental destruction or damage; and public property while executing its mission of national defense. Within mission requirements, the DoD will also ensure that the quality of the environment is protected to the maximum extent practical. The DoD has implemented environmental, safety, and health efforts to meet these objectives. Integral to these efforts is the use of a system safety approach to manage the risk of mishaps associated with DoD operations. A key objective of the DoD system safety approach is to include mishap risk management consistent with mission requirements, in technology development by design for DoD systems, subsystems, equipment, facilities, and their interfaces and operation. The DoD goal is zero mishaps.

3. This standard practice addresses an approach (a standard practice normally identified as system safety) useful in the management of environmental, safety, and health mishap risks encountered in the development, test, production, use, and disposal of DoD systems, subsystems, equipment, and facilities. The approach described herein conforms to the acquisition procedures in DoD Regulation 5000.2-R and provides a consistent means of evaluating identified mishap risks. Mishap risk must be identified, evaluated, and mitigated to a level acceptable (as defined by the system user or customer) to the appropriate authority, and compliant with federal laws and regulations, Executive Orders, treaties, and agreements.
Program trade studies associated with mitigating mishap risk must consider total life cycle cost in any decision. Residual mishap risk associated with an individual system must be reported to and accepted by the appropriate authority as defined in DoD Regulation 5000.2-R. When MIL-STD-882 is required in a solicitation or contract and no specific references are included, then only those requirements presented in section 4 are applicable.

4. This revision applies the tenets of acquisition reform to system safety in Government procurement. A joint Government/Industrial process team oversaw this revision. The Government Electronic and Information Technology Association (GEIA), G-48 committee on system safety represented industry on the process action team. System safety information (e.g., system safety tasks, commonly used approaches, etc.) associated with previous versions of this standard are in the Defense Acquisition Deskbook (see 6.8). This standard practice is no longer the source for any safety-related data item descriptions (DIDs).

5. Address beneficial comments (recommendations, additions, and deletions) and any pertinent information that may be of use in improving this document to: HQ Air Force Materiel Command (SES), 4375 Chidlaw Road, Wright-Patterson AFB, OH 45433-5006. Use the Standardization Document Improvement Proposal (DD Form 1426) appearing at the end of this document or by letter or electronic mail.

CONTENTS

FOREWORD
1. SCOPE
1.1 Scope
2. APPLICABLE DOCUMENTS
3. DEFINITIONS
3.1 Acronyms used in this standard
3.2 Definitions
3.2.1 Acquisition program
3.2.2 Developer
3.2.3 Hazard
3.2.4 Hazardous material
3.2.5 Life cycle
3.2.6 Mishap
3.2.7 Mishap risk
3.2.8 Program manager
3.2.9 Residual mishap risk
3.2.10 Safety
3.2.11 Subsystem
3.2.12 System
3.2.13 System safety
3.2.14 System safety engineering
4. GENERAL REQUIREMENTS
4.1 Documentation of the system safety approach
4.2 Identification of hazards
4.3 Assessment of mishap risk
4.4 Identification of mishap risk mitigation measures
4.5 Reduction of mishap risk to an acceptable level
4.6 Verification of mishap risk reduction
4.7 Review of hazards and acceptance of residual mishap risk by the appropriate authority
4.8 Tracking of hazards and residual mishap risk
5. DETAILED REQUIREMENTS
6. NOTES
6.1 Intended use
6.2 Data requirements
6.3 Subject term (key words) listing
6.4 Definitions used in this standard
6.5 International standardization agreements
6.6 Explosive hazard classification and characteristic data
6.7 Use of system safety data in certification and other specialized safety approvals
6.8 DoD acquisition practices
6.9 Identification of changes

APPENDIXES
A Guidance for implementation of system safety efforts

CONCLUDING MATERIAL

TABLES
A-I. Suggested mishap severity categories
A-II. Suggested mishap probability levels
A-III. Example mishap risk assessment values
A-IV. Example mishap risk categories and mishap risk acceptance levels

1. SCOPE

1.1 Scope. This document outlines a standard practice for conducting system safety.
The system safety practice as defined herein conforms to the acquisition procedures in DoD Regulation 5000.2-R and provides a consistent means of evaluating identified risks. Mishap risk must be identified, evaluated, and mitigated to a level acceptable (as defined by the system user or customer) to the appropriate authority and compliant with federal (and state where applicable) laws and regulations, Executive Orders, treaties, and agreements. Program trade studies associated with mitigating mishap risk must consider total life cycle cost in any decision. When requiring MIL-STD-882 in a solicitation or contract and no specific paragraphs of this standard are identified, then apply only those requirements presented in section 4.

2. APPLICABLE DOCUMENTS

Sections 3, 4, and 5 of this standard contain no applicable documents. This section does not include documents cited in other sections of this standard or recommended for additional information or as examples.

3. DEFINITIONS

3.1 Acronyms used in this standard. The acronyms used in this standard are defined as follows:
a. AMSDL Acquisition Management System & Data Requirement List
b. ANSI American National Standard Institute
c. DID Data Item Description
d. DoD Department of Defense
e. ESH Environmental, Safety, and Health
f. GEIA Government Electronic & Information Technology Association
g. MAIS Major Automated Information System
h. MDAP Major Defense Acquisition Program
i. USAF United States Air Force

3.2 Definitions. Within this document, the following definitions apply (see 6.4):

3.2.1 Acquisition program. A directed, funded effort designed to provide a new, improved, or continuing system in response to a validated operational need.

3.2.2 Developer. The individual or organization assigned responsibility for a development effort. Developers can be either internal to the government or contractors.

3.2.3 Hazard.
Any real or potential condition that can cause injury, illness, or death to personnel; damage to or loss of a system, equipment or property; or damage to the environment.

3.2.4 Hazardous material. Any substance that, due to its chemical, physical, or biological nature, causes safety, public health, or environmental concerns that would require an elevated level of effort to manage.

3.2.5 Life cycle. All phases of the system's life including design, research, development, test and evaluation, production, deployment (inventory), operations and support, and disposal.

3.2.6 Mishap. An unplanned event or series of events resulting in death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.

3.2.7 Mishap risk. An expression of the impact and possibility of a mishap in terms of potential mishap severity and probability of occurrence.

3.2.8 Program Manager (PM). A government official who is responsible for managing an acquisition program. Also, a general term of reference to those organizations directed by individual managers, exercising authority over the planning, direction, and control of tasks and associated functions essential for support of designated systems. This term will normally be used in lieu of any other titles, e.g., system support manager, weapon program manager, system manager, and project manager.

3.2.9 Residual mishap risk. The remaining mishap risk that exists after all mitigation techniques have been implemented or exhausted, in accordance with the system safety design order of precedence (see 4.4).

3.2.10 Safety. Freedom from those conditions that can cause death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.

3.2.11 Subsystem. A grouping of items satisfying a logical group of functions within a particular system.

3.2.12 System.
An integrated composite of people, products, and processes that provide a capability to satisfy a stated need or objective.

3.2.13 System safety. The application of engineering and management principles, criteria, and techniques to achieve acceptable mishap risk, within the constraints of operational effectiveness and suitability, time, and cost, throughout all phases of the system life cycle.

3.2.14 System safety engineering. An engineering discipline that employs specialized professional knowledge and skills in applying scientific and engineering principles, criteria, and techniques to identify and eliminate hazards, in order to reduce the associated mishap risk.

4. GENERAL REQUIREMENTS

This section defines the system safety requirements to perform throughout the life cycle for any system, new development, upgrade, modification, resolution of deficiencies, or technology development. When properly applied, these requirements should ensure the identification and understanding of all known hazards and their associated risks; and mishap risk eliminated or reduced to acceptable levels. The objective of system safety is to achieve acceptable mishap risk through a systematic approach of hazard analysis, risk assessment, and risk management. This document delineates the minimum mandatory requirements for an acceptable system safety program for any DoD system. When MIL-STD-882 is required in a solicitation or contract, but no specific references are included, then only the requirements in this section are applicable. System safety requirements consist of the following:

4.1 Documentation of the system safety approach. Document the developer's and program manager's approved system safety engineering approach. This documentation shall:
a. Describe the program’s implementation using the requirements herein. Include identification of each hazard analysis and mishap risk assessment process used.
b.
Include information on system safety integration into the overall program structure.
c. Define how hazards and residual mishap risk are communicated to and accepted by the appropriate risk acceptance authority (see 4.7) and how hazards and residual mishap risk will be tracked (see 4.8).

4.2 Identification of hazards. Identify hazards through a systematic hazard analysis process encompassing detailed analysis of system hardware and software, the environment (in which the system will exist), and the intended use or application. Consider and use historical hazard and mishap data, including lessons learned from other systems. Identification of hazards is a responsibility of all program members. During hazard identification, consider hazards that could occur over the system life cycle.

4.3 Assessment of mishap risk. Assess the severity and probability of the mishap risk associated with each identified hazard, i.e., determine the potential negative impact of the hazard on personnel, facilities, equipment, operations, the public, and the environment, as well as on the system itself. The tables in Appendix A are to be used unless otherwise specified.

4.4 Identification of mishap risk mitigation measures. Identify potential mishap risk mitigation alternatives and the expected effectiveness of each alternative or method. Mishap risk mitigation is an iterative process that culminates when the residual mishap risk has been reduced to a level acceptable to the appropriate authority. The system safety design order of precedence for mitigating identified hazards is:
a. Eliminate hazards through design selection. If unable to eliminate an identified hazard, reduce the associated mishap risk to an acceptable level through design selection.
b. Incorporate safety devices. If unable to eliminate the hazard through design selection, reduce the mishap risk to an acceptable level using protective safety features or devices.
c. Provide warning devices.
If safety devices do not adequately lower the mishap risk of the hazard, include a detection and warning system to alert personnel to the particular hazard.
d. Develop procedures and training. Where it is impractical to eliminate hazards through design selection or to reduce the associated risk to an acceptable level with safety and warning devices, incorporate special procedures and training. Procedures may include the use of personal protective equipment. For hazards assigned Catastrophic or Critical mishap severity categories, avoid using warning, caution, or other written advisory as the only risk reduction method.

4.5 Reduction of mishap risk to an acceptable level. Reduce the mishap risk through a mitigation approach mutually agreed to by both the developer and the program manager. Communicate residual mishap risk and hazards to the associated test effort for verification.

4.6 Verification of mishap risk reduction. Verify the mishap risk reduction and mitigation through appropriate analysis, testing, or inspection. Document the determined residual mishap risk. Report all new hazards identified during testing to the program manager and the developer.

4.7 Review of hazards and acceptance of residual mishap risk by the appropriate authority. Notify the program manager of identified hazards and residual mishap risk. Unless otherwise specified, the suggested tables A-I through A-III of the appendix will be used to rank residual risk. The program manager shall ensure that remaining hazards and residual mishap risk are reviewed and accepted by the appropriate risk acceptance authority (ref. table A-IV). The appropriate risk acceptance authority will include the system user in the mishap risk review. The appropriate risk acceptance authority shall formally acknowledge and document acceptance of hazards and residual mishap risk.

4.8 Tracking of hazards, their closures, and residual mishap risk. Track hazards, their closure actions, and the residual mishap risk.
Maintain a tracking system that includes hazards, their closure actions, and residual mishap risk throughout the system life cycle. The program manager shall keep the system user advised of the hazards and residual mishap risk.

5. DETAILED REQUIREMENTS

Program managers shall identify in the solicitation and system specification any specific system safety engineering requirements including risk assessment and acceptance, unique classifications and certifications (see 6.6 and 6.7), or any mishap reduction needs unique to their program. Additional information in developing program specific requirements is located in Appendix A.

6. NOTES

(This section contains information of a general or explanatory nature that may be helpful, but is not mandatory.)

6.1 Intended use. This standard establishes a common basis for expectations of a properly executed system safety effort.

6.2 Data requirements. Hazard analysis data may be obtained from contracted sources by citing DI-MISC-80508, Technical Report - Study/Services. When it is necessary to obtain data, list the applicable Data Item Descriptions (DIDs) on the Contract Data Requirements List (DD Form 1423), except where the DoD Federal Acquisition Regulation Supplement exempts the requirement for a DD Form 1423. The developer and the program manager are encouraged to negotiate access to internal development data when hard copies are not necessary. They are also encouraged to request that any type of safety plan required to be provided by the contractor, be submitted with the proposal. It is further requested that any of the below listed data items be condensed into the statement of work and the resulting data delivered in one general type scientific report.
Current DIDs that may be applicable to a system safety effort (check DoD 5010.12-L, Acquisition Management Systems and Data Requirements Control List (AMSDL) for the most current version before using) include:

DID Number       DID Title
DI-MISC-80043    Ammunition Data Card
DI-SAFT-80101    System Safety Hazard Analysis Report
DI-SAFT-80102    Safety Assessment Report
DI-SAFT-80103    Engineering Change Proposal System Safety Report
DI-SAFT-80104    Waiver or Deviation System Safety Report
DI-SAFT-80105    System Safety Program Progress Report
DI-SAFT-80106    Occupational Health Hazard Assessment
DI-SAFT-80184    Radiation Hazard Control Procedures
DI-MISC-80508    Technical Report - Study/Services
DI-SAFT-80931    Explosive Ordnance Disposal Data
DI-SAFT-81065    Safety Studies Report
DI-SAFT-81066    Safety Studies Plan
DI-ADMN-81250    Conference Minutes
DI-SAFT-81299    Explosive Hazard Classification Data
DI-SAFT-81300    Mishap Risk Assessment Report
DI-ILSS-81495    Failure Mode, Effects, Criticality Analysis Report

6.3 Subject term (key word) listing.
Environmental
Hazard
Mishap
Mishap probability levels
Mishap risk
Mishap severity categories
Occupational Health
Residual mishap risk
System safety engineering

6.4 Definitions used in this standard. The definitions at 3.2 may be different from those used in other specialty areas. One must carefully check the specific definition of a term in question for its area of origination before applying the approach described in this document.

6.5 International standardization agreements. Certain provisions of this standard are the subject of international standardization agreements (AIR STD 20/23B, Safety Design Requirements for Airborne Dispenser Weapons, and STANAG No. 3786, Safety Design Requirements for Airborne Dispenser Weapons).
When proposing amendment, revision, or cancellation of this standard that might modify the international agreement concerned, the preparing activity will take appropriate action through international standardization channels, including departmental standardization offices, to change the agreement or make other appropriate accommodations.

6.6 Explosive hazard classification and characteristic data. Any new or modified item of munitions or of an explosive nature that will be transported to or stored at a DoD installation or facility must first obtain an interim or final explosive hazard classification. The system safety effort should provide the data necessary for the program manager to obtain the necessary classification(s). These data should include identification of safety hazards involved in handling, shipping, and storage related to production, use, and disposal of the item.

6.7 Use of system safety data in certification and other specialized safety approvals. Hazard analyses are often required for many related certifications and specialized reviews. Examples of activities requiring data generated during a system safety effort include:
a. Federal Aviation Administration airworthiness certification of designs and modifications
b. DoD airworthiness determination
c. Nuclear and non-nuclear munitions certification
d. Flight readiness reviews
e. Flight test safety review board reviews
f. Nuclear Regulatory Commission licensing
g. Department of Energy certification

Special safety-related approval authorities include USAF Radioisotope Committee, Weapon System Explosive Safety Review Board (Navy), Non-Nuclear Weapons and Explosives Safety Board (NNWESB), Army Fuze Safety Review Board, Triservice Laser Safety Review Board, and the DoD Explosive Safety Board. Acquisition agencies should ensure that appropriate service safety agency approvals are obtained prior to use of new or modified weapons systems in an operational or test environment.
6.8 DoD acquisition practices. Information on DoD acquisition practices is presented in the Defense Acquisition Deskbook available from the Deskbook Joint Program Office, Wright-Patterson Air Force Base, Ohio. Nothing in the referenced information is considered additive to the requirements provided in this standard.

6.9 Identification of changes. Due to the extent of the changes, marginal notations are not used in this revision to identify changes with respect to the previous issue.

MIL-STD-882D APPENDIX A
GUIDANCE FOR IMPLEMENTATION OF A SYSTEM SAFETY EFFORT

A.1 SCOPE

A.1.1 Scope. This appendix provides rationale and guidance to fit the needs of most system safety efforts. It includes further explanation of the effort and activities available to meet the requirements described in section 4 of this standard. This appendix is not a mandatory part of this standard and is not to be included in solicitations by reference. However, program managers may extract portions of this appendix for inclusion in requirement documents and solicitations.

A.2 APPLICABLE DOCUMENTS

A.2.1 General. The documents listed in this section are referenced in sections A.3, A.4, and A.5. This section does not include documents cited in other sections of this appendix or recommended for additional information or as examples.

A.2.2 Government documents.

A.2.2.1 Specifications, standards, and handbooks. This section is not applicable to this appendix.

A.2.2.2 Other Government documents, drawings, and publications. The following other Government document forms a part of this document to the extent specified herein. Unless otherwise specified, the issue is that cited in the solicitation.
DoD 5000.2-R | Mandatory Procedures for Major Defense Acquisition Programs (MDAPs) and Major Automated Information System (MAIS) Acquisition Programs
(Copies of DoD 5000.2-R are available from the Washington Headquarters Services, Directives and Records Branch (Directives Section), Washington, DC or from the DoD Acquisition Deskbook).

A.2.3 Non-Government publications. This section is not applicable to this appendix.

A.2.4 Order of precedence. Since this appendix is not mandatory, in event of a conflict between the text of this appendix and the reference cited herein, the text of the reference takes precedence. Nothing in this appendix supersedes applicable laws and regulations unless a specific exemption has been obtained.

A.3 DEFINITIONS

A.3.1 Acronyms used in this appendix. No additional acronyms are used in this appendix.

A.3.2 Definitions. Additional definitions that apply to this appendix:

A.3.2.1 Development agreement. The formal documentation of the agreed-upon tasks that the developer will execute for the program manager. For a commercial developer, this agreement usually is in the form of a written contract.

A.3.2.2 Fail-safe. A design feature that ensures the system remains safe, or in the event of a failure, causes the system to revert to a state that will not cause a mishap.

A.3.2.3 Health hazard assessment. The application of biomedical knowledge and principles to identify and eliminate or control health hazards associated with systems in direct support of the life-cycle management of materiel items.

A.3.2.4 Mishap probability. The aggregate probability of occurrence of the individual events/hazards that might create a specific mishap.

A.3.2.5 Mishap probability levels.
An arbitrary categorization that provides a qualitative measure of the most reasonable likelihood of occurrence of a mishap resulting from personnel error, environmental conditions, design inadequacies, procedural deficiencies, or system, subsystem, or component failure or malfunction.

A.3.2.6 Mishap risk assessment. The process of characterizing hazards within risk areas and critical technical processes, analyzing them for their potential mishap severity and probabilities of occurrence, and prioritizing them for risk mitigation actions.

A.3.2.7 Mishap risk categories. An arbitrary categorization of mishap risk assessment values often used to generate specific action such as mandatory reporting of certain hazards to management for action, or formal acceptance of the associated mishap risk.

A.3.2.8 Mishap severity. An assessment of the consequences of the most reasonable credible mishap that could be caused by a specific hazard.

A.3.2.9 Mishap severity category. An arbitrary categorization that provides a qualitative measure of the most reasonable credible mishap resulting from personnel error, environmental conditions, design inadequacies, procedural deficiencies, or system, subsystem, or component failure or malfunction.

A.3.2.10 Safety critical. A term applied to any condition, event, operation, process, or item whose proper recognition, control, performance, or tolerance is essential to safe system operation and support (e.g., safety critical function, safety critical path, or safety critical component).

A.3.2.11 System safety management. All plans and actions taken to identify, assess, mitigate, and continuously track, control, and document environmental, safety, and health mishap risks encountered in the development, test, acquisition, use, and disposal of DoD weapon systems, subsystems, equipment, and facilities.

A.4 GENERAL REQUIREMENTS

A.4.1 General.
System safety applies engineering and management principles, criteria, and techniques to achieve acceptable mishap risk, within the constraints of operational effectiveness, time, and cost, throughout all phases of the system life cycle. It draws upon professional knowledge and specialized skills in the mathematical, physical, and scientific disciplines, together with the principles and methods of engineering design and analysis, to specify and evaluate the environmental, safety, and health mishap risk associated with a system. Experience indicates that the degree of safety achieved in a system is directly dependent upon the emphasis given. The program manager and the developer must apply this emphasis during all phases of the system's life cycle. A safe design is a prerequisite for safe operations, with the goal being to produce an inherently safe product that will have the minimum safety-imposed operational restrictions.

A.4.1.1 System safety in environmental and health hazard management. DoD 5000.2-R has directed the integration of environmental, safety, and health hazard management into the systems engineering process. While environmental and health hazard management are normally associated with the application of statutory direction and requirements, the management of mishap risk associated with actual environmental and health hazards is directly addressed by the system safety approach. Therefore, environmental and health hazards can be analyzed and managed with the same tools as any other hazard, whether they affect equipment, the environment, or personnel.

A.4.2 Purpose (see 1.1). All DoD program managers shall establish and execute programs that manage the probability and severity of all hazards for their systems (DoD 5000.2-R). Provision for system safety requirements and effort as defined by this standard should be included in all applicable contracts negotiated by DoD.
These contracts include those negotiated within each DoD agency, by one DoD agency for another, and by DoD for other Government agencies. In addition, each DoD in-house program will address system safety.

A.4.2.1 Solicitations and contracts. Apply the requirements of section 4 to acquisitions. Incorporate MIL-STD-882 in the list of contractual compliance documents, and include the potential of a developer to execute section 4 requirements as source selection evaluation criteria. Developers are encouraged to submit with their proposal a preliminary plan that describes the system safety effort required for the requested program. When directed by the program manager, attach this preliminary plan to the contract or reference it within the statement of work so that it becomes the basis for a contractual system safety program.

A.4.3 System safety planning. Before formally documenting the system safety approach, the program manager, in concert with systems engineering and associated system safety professionals, must determine what system safety effort is necessary to meet program and regulatory requirements. This effort will be built around the requirements set forth in section 4 and includes developing a planned approach for safety task accomplishment, providing qualified people to accomplish the tasks, establishing the authority for implementing the safety tasks through all levels of management, and allocating appropriate resources to ensure that the safety tasks are completed.

A.4.3.1 System safety planning subtasks. System safety planning subtasks should:
a. Establish specific safety performance requirements (see A.4.3.2) based on overall program requirements and system user inputs.
b. Establish a system safety organization or function and the required lines of communication with associated organizations (government and contractor).
Establish interfaces between system safety and other functional elements of the program, as well as with other safety and engineering disciplines (such as nuclear, range, explosive, chemical, and biological). Designate the organizational unit responsible for executing each safety task. Establish the authority for resolution of identified hazards.
c. Establish system safety milestones and relate these to major program milestones, program element responsibility, and required inputs and outputs.
d. Establish an incident alerting/notification, investigation, and reporting process, to include notification of the program manager.
e. Establish an acceptable level of mishap risk, mishap probability and severity thresholds, and documentation requirements (including but not limited to hazards and residual mishap risk).
f. Establish an approach and methodology for reporting to the program manager the following minimum information:
(1) Safety critical characteristics and features.
(2) Operating, maintenance, and overhaul safety requirements.
(3) Measures used to eliminate or mitigate hazards.
(4) Acquisition management of hazardous materials.
g. Establish the method for the formal acceptance and documenting of residual mishap risks and the associated hazards.
h. Establish the method for communicating hazards, the associated risks, and residual mishap risk to the system user.
i. Specify requirements for other specialized safety approvals (e.g., nuclear, range, explosive, chemical, biological, electromagnetic radiation, and lasers) as necessary (reference 6.6 and 6.7).

A.4.3.2 Safety performance requirements. These are the general safety requirements needed to meet the core program objectives. The more closely these requirements relate to a given program, the more easily the designers can incorporate them into the system.
In the appropriate system specifications, incorporate the safety performance requirements that are applicable, and the specific risk levels considered acceptable for the system. Acceptable risk levels can be defined in terms of: a hazard category developed through a mishap risk assessment matrix; an overall system mishap rate; demonstration of controls required to preclude unacceptable conditions; satisfaction of specified standards and regulatory requirements; or other suitable mishap risk assessment procedures. Listed below are examples of safety performance statements.
a. Quantitative requirements. Quantitative requirements are usually expressed as a failure or mishap rate, such as "The catastrophic system mishap rate shall not exceed x.xx × 10^-y per operational hour."
b. Mishap risk requirements. Mishap risk requirements could be expressed as "No hazards assigned a Catastrophic mishap severity are acceptable." Mishap risk requirements could also be expressed as a level defined by a mishap risk assessment (see A.4.4.3.2.3), such as "No Category 3 or higher mishap risks are acceptable."
c. Standardization requirements. Standardization requirements are expressed relative to a known standard that is relevant to the system being developed. Examples include: "The system will comply with the laws of the State of XXXXX and be operable on the highways of the State of XXXXX" or "The system will be designed to meet ANSI Std XXX as a minimum."

A.4.3.3 Safety design requirements. The program manager, in concert with the chief engineer and utilizing systems engineering and associated system safety professionals, should establish specific safety design requirements for the overall system. The objective of safety design requirements is to achieve acceptable mishap risk through a systematic application of design guidance from standards, specifications, regulations, design handbooks, safety design checklists, and other sources.
Review these for safety design parameters and acceptance criteria applicable to the system. Safety design requirements derived from the selected parameters, as well as any associated acceptance criteria, are included in the system specification. Expand these requirements and criteria for inclusion in the associated follow-on or lower level specifications. See general safety system design requirements below.
a. Hazardous material use is minimized, eliminated, or associated mishap risks are reduced through design, including material selection or substitution. When using potentially hazardous materials, select those materials that pose the least risk throughout the life cycle of the system.
b. Hazardous substances, components, and operations are isolated from other activities, areas, personnel, and incompatible materials.
c. Equipment is located so that access during operations, servicing, repair, or adjustment minimizes personnel exposure to hazards (e.g., hazardous substances, high voltage, electromagnetic radiation, and cutting and puncturing surfaces).
d. Protect power sources, controls, and critical components of redundant subsystems by physical separation or shielding, or by other acceptable methods.
f. Consider safety devices that will minimize mishap risk (e.g., interlocks, redundancy, fail safe design, system protection, fire suppression, and protective measures such as clothing, equipment, devices, and procedures) for hazards that cannot be eliminated. Make provisions for periodic functional checks of safety devices when applicable.
g. System disposal (including explosive ordnance disposal) and demilitarization are considered in the design.
h. Implement warning signals to minimize the probability of incorrect personnel reaction to those signals, and standardize within like types of systems.
i.
Provide warning and cautionary notes in assembly, operation, and maintenance instructions; and provide distinctive markings on hazardous components, equipment, and facilities to ensure personnel and equipment protection when no alternate design approach can eliminate a hazard. Use standard warning and cautionary notations where multiple applications occur. Standardize notations in accordance with commonly accepted commercial practice or, if none exists, normal military procedures. Do not use warning, caution, or other written advisory as the only risk reduction method for hazards assigned to Catastrophic or Critical mishap severity categories.
j. Safety critical tasks may require personnel proficiency; if so, the developer should propose a proficiency certification process to be used.
k. Severity of injury or damage to equipment or the environment as a result of a mishap is minimized.
l. Inadequate or overly restrictive requirements regarding safety are not included in the system specification.
m. Acceptable risk is achieved in implementing new technology, materials, or designs in an item’s production, test, and operation. Changes to design, configuration, production, or mission requirements (including any resulting system modifications and upgrades, retrofits, insertions of new technologies or materials, or use of new production or test techniques) are accomplished in a manner that maintains an acceptable level of mishap risk. Changes to the environment in which the system operates are analyzed to identify and mitigate any resulting hazards or changes in mishap risks.

A.4.3.3.1 Some program managers include the following conditions in their solicitation, system specification, or contract as requirements for the system design. These condition statements are used optionally as supplemental requirements based on specific program needs.

A.4.3.3.1.1 Unacceptable conditions.
The following safety critical conditions are considered unacceptable for development efforts. Positive action and verified implementation are required to reduce the mishap risk associated with these situations to a level acceptable to the program manager.
a. Single component failure, common mode failure, human error, or a design feature that could cause a mishap of Catastrophic or Critical mishap severity categories.
b. Dual independent component failures, dual independent human errors, or a combination of a component failure and a human error involving safety critical command and control functions, which could cause a mishap of Catastrophic or Critical mishap severity categories.
c. Generation of hazardous radiation or energy, when no provisions have been made to protect personnel or sensitive subsystems from damage or adverse effects.
d. Packaging or handling procedures and characteristics that could cause a mishap for which no controls have been provided to protect personnel or sensitive equipment.
e. Hazard categories that are specified as unacceptable in the development agreement.

A.4.3.3.1.2 Acceptable conditions. The following approaches are considered acceptable for correcting unacceptable conditions and will require no further analysis once mitigating actions are implemented and verified.
a. For non-safety critical command and control functions: a system design that requires two or more independent human errors, or that requires two or more independent failures, or a combination of independent failure and human error.
b. For safety critical command and control functions: a system design that requires at least three independent failures, or three independent human errors, or a combination of three independent failures and human errors.
c. System designs that positively prevent errors in assembly, installation, or connections that could result in a mishap.
d.
System designs that positively prevent damage propagation from one component to another or prevent sufficient energy propagation to cause a mishap.
e. System design limitations on operation, interaction, or sequencing that preclude occurrence of a mishap.
f. System designs that provide an approved safety factor, or a fixed design allowance that limits, to an acceptable level, possibilities of structural failure or release of energy sufficient to cause a mishap.
g. System designs that control energy build-up that could potentially cause a mishap (e.g., fuses, relief valves, or electrical explosion proofing).
h. System designs where component failure can be temporarily tolerated because of residual strength or alternate operating paths, so that operations can continue with a reduced but acceptable safety margin.
i. System designs that positively alert the controlling personnel to a hazardous situation where the capability for operator reaction has been provided.
j. System designs that limit or control the use of hazardous materials.

A.4.3.4 Elements of an effective system safety effort. Elements of an effective system safety effort include:
a. Management is always aware of the mishap risks associated with the system, and formally documents this awareness. Hazards associated with the system are identified, assessed, tracked, monitored, and the associated risks are either eliminated or controlled to an acceptable level throughout the life cycle. Identify and archive those actions taken to eliminate or reduce mishap risk for tracking and lessons learned purposes.
b. Historical hazard and mishap data, including lessons learned from other systems, are considered and used.
c. Environmental protection, safety, and occupational health, consistent with mission requirements, are designed into the system in a timely, cost-effective manner.
Inclusion of the appropriate safety features is accomplished during the applicable phases of the system life cycle.
d. Mishap risk resulting from harmful environmental conditions (e.g., temperature, pressure, noise, toxicity, acceleration, and vibration) and human error in system operation and support is minimized.
e. System users are kept abreast of the safety of the system and included in the safety decision process.

A.4.4 System safety engineering effort. As stated in section 4, a system safety engineering effort consists of eight main requirements. The following paragraphs provide further descriptions on what efforts are typically expected due to each of the system safety requirements listed in section 4.

A.4.4.1 Documentation of the system safety approach. The documentation of the system safety approach should describe the planned tasks and activities of system safety management and system engineering required to identify, evaluate, and eliminate or control hazards, or to reduce the residual mishap risk to a level acceptable throughout the system life cycle. The documentation should describe, as a minimum, the four elements of an effective system safety effort: a planned approach for task accomplishment, qualified people to accomplish tasks, the authority to implement tasks through all levels of management, and the appropriate commitment of resources (both manning and funding) to ensure that safety tasks are completed. Specifically, the documentation should:
a. Describe the scope of the overall system program and the related system safety effort. Define system safety program milestones. Relate these to major program milestones, program element responsibility, and required inputs and outputs.
b. Describe the safety tasks and activities of system safety management and engineering. Describe the interrelationships between system safety and other functional elements of the program.
List the other program requirements and tasks applicable to system safety and reference where they are specified or described. Include the organizational relationships between other functional elements having responsibility for tasks with system safety impacts and the system safety management and engineering organization including the review and approval authority of those tasks.
c. Describe specific analysis techniques and formats to be used in qualitative or quantitative assessments of hazards, their causes, and effects.
d. Describe the process through which management decisions will be made (for example, timely notification of unacceptable risks, necessary action, incidents or malfunctions, waivers to safety requirements, and program deviations). Include a description of how residual mishap risk is formally accepted and how this acceptance is documented.
e. Describe the mishap risk assessment procedures, including the mishap severity categories, mishap probability levels, and the system safety design order of precedence that should be followed to satisfy the safety requirements of the program. State any qualitative or quantitative measures of safety to be used for mishap risk assessment including a description of the acceptable and unacceptable risk levels (if applicable). Include system safety definitions that modify, deviate from, or are in addition to those in this standard or generally accepted by the system safety community (see Defense Acquisition Deskbook and System Safety Society’s System Safety Analysis Handbook) (see A.6.1).
f. Describe how resolution and action relative to system safety will be implemented at the program management level possessing resolution authority.
g. Describe the verification (e.g., test, analysis, demonstration, or inspection) requirements for ensuring that safety is adequately attained.
Identify any certification requirements for software, safety devices, or other special safety features (e.g., render safe and emergency disposal procedures).
h. Describe the mishap or incident notification, investigation, and reporting process for the program, including notification of the program manager.
i. Describe the approach for collecting and processing pertinent historical hazard, mishap, and safety lessons learned data. Include a description of how a system hazard log is developed and kept current (see A.4.4.8.1).
j. Describe how the user is kept abreast of residual mishap risk and the associated hazards.

A.4.4.2 Identification of hazards. Identify hazards through a systematic hazard analysis process encompassing detailed analysis of system hardware and software, the environment (in which the system will exist), and the intended usage or application. Historical hazard and mishap data, including lessons learned from other systems, are considered and used.

A.4.4.2.1 Approaches for identifying hazards. Numerous approaches have been developed and used to identify system hazards. A key aspect of many of these approaches is empowering the design engineer with the authority to design safe systems and the responsibility to identify to program management the hazards associated with the design. Hazard identification approaches often include using system users in the effort. Commonly used approaches for identifying hazards can be found in the Defense Acquisition Deskbook and System Safety Society’s System Safety Analysis Handbook (see A.6.1).

A.4.4.3 Assessment of mishap risk. Assess the severity and probability of the mishap risk associated with each identified hazard, i.e., determine the potential impact of the hazard on personnel, facilities, equipment, operations, the public, or environment, as well as on the system itself. Other factors, such as numbers of persons exposed, may also be used to assess risk.
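The hazard log mentioned in A.4.4.1 item i and the severity and probability assessment called for in A.4.4.3 can be pictured as a simple record structure. The following Python sketch is only an illustration under stated assumptions: the class names, field names, and methods are hypothetical and are not defined by this standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HazardRecord:
    """One hazard log entry. All field names are hypothetical."""
    hazard_id: str
    description: str
    severity: Optional[str] = None      # e.g. "Critical"; None until assessed
    probability: Optional[str] = None   # e.g. "Remote"; None until assessed
    mitigations: List[str] = field(default_factory=list)

    def is_assessed(self) -> bool:
        """True once both severity and probability have been assigned."""
        return self.severity is not None and self.probability is not None


class HazardLog:
    """Keeps hazard records keyed by identifier (illustrative only)."""

    def __init__(self) -> None:
        self._records: Dict[str, HazardRecord] = {}

    def add(self, record: HazardRecord) -> None:
        # Reject duplicates so each hazard is tracked exactly once.
        if record.hazard_id in self._records:
            raise ValueError(f"duplicate hazard id: {record.hazard_id}")
        self._records[record.hazard_id] = record

    def assess(self, hazard_id: str, severity: str, probability: str) -> None:
        # Record the outcome of the mishap risk assessment for one hazard.
        record = self._records[hazard_id]
        record.severity = severity
        record.probability = probability

    def unassessed(self) -> List[HazardRecord]:
        # Hazards that were identified but still need an assessment.
        return [r for r in self._records.values() if not r.is_assessed()]
```

Keeping identification (add) separate from assessment (assess) mirrors the order of the two requirements, and the unassessed query gives a quick check that no identified hazard has been left without a severity and probability.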
A.4.4.3.1 Mishap risk assessment tools. To determine what actions to take to eliminate or control identified hazards, a system of determining the level of mishap risk involved must be developed. A good mishap risk assessment tool will enable decision makers to properly understand the level of mishap risk involved, relative to what it will cost in schedule and dollars to reduce that mishap risk to an acceptable level.

A.4.4.3.2 Tool development. The key to developing most mishap risk assessment tools is the characterization of mishap risks by mishap severity and mishap probability. Since the highest system safety design order of precedence is to eliminate hazards by design, a mishap risk assessment procedure considering only mishap severity will generally suffice during the early design phase to minimize the system’s mishap risks (for example, just don’t use hazardous or toxic material in the design). When all hazards cannot be eliminated during the early design phase, a mishap risk assessment procedure based upon the mishap probability as well as the mishap severity provides a resultant mishap risk assessment. The assessment is used to establish priorities for corrective action, resolution of identified hazards, and notification to management of the mishap risks. The information provided here is a suggested tool and set of definitions that can be used. Program managers can develop tools and definitions appropriate to their individual programs.

A.4.4.3.2.1 Mishap severity. Mishap severity categories are defined to provide a qualitative measure of the most reasonable credible mishap resulting from personnel error, environmental conditions, design inadequacies, procedural deficiencies, or system, subsystem, or component failure or malfunction. Suggested mishap severity categories are shown in Table A-I.
The dollar values shown in this table should be established on a system by system basis depending on the size of the system being considered to reflect the level of concern.

TABLE A-I. Suggested mishap severity categories.

Description | Category | Environmental, Safety, and Health Result Criteria
Catastrophic | I | Could result in death, permanent total disability, loss exceeding $1M, or irreversible severe environmental damage that violates law or regulation.
Critical | II | Could result in permanent partial disability, injuries or occupational illness that may result in hospitalization of at least three personnel, loss exceeding $200K but less than $1M, or reversible environmental damage causing a violation of law or regulation.
Marginal | III | Could result in injury or occupational illness resulting in one or more lost work day(s), loss exceeding $10K but less than $200K, or mitigable environmental damage without violation of law or regulation where restoration activities can be accomplished.
Negligible | IV | Could result in injury or illness not resulting in a lost work day, loss exceeding $2K but less than $10K, or minimal environmental damage not violating law or regulation.

NOTE: These mishap severity categories provide guidance to a wide variety of programs. However, adaptation to a particular program is generally required to provide a mutual understanding between the program manager and the developer as to the meaning of the terms used in the category definitions. Other risk assessment techniques may be used provided that the user approves them.

A.4.4.3.2.2 Mishap probability. Mishap probability is the probability that a mishap will occur during the planned life expectancy of the system. It can be described in terms of potential occurrences per unit of time, events, population, items, or activity. Assigning a quantitative mishap probability to a potential design or procedural hazard is generally not possible early in the design process.
At that stage, a qualitative mishap probability may be derived from research, analysis, and evaluation of historical safety data from similar systems. Supporting rationale for assigning a mishap probability is documented in hazard analysis reports. Suggested qualitative mishap probability levels are shown in Table A-II.

TABLE A-II. Suggested mishap probability levels.

Description* | Level | Specific Individual Item | Fleet or Inventory**
Frequent | A | Likely to occur often in the life of an item, with a probability of occurrence greater than 10^-1 in that life. | Continuously experienced.
Probable | B | Will occur several times in the life of an item, with a probability of occurrence less than 10^-1 but greater than 10^-2 in that life. | Will occur frequently.
Occasional | C | Likely to occur some time in the life of an item, with a probability of occurrence less than 10^-2 but greater than 10^-3 in that life. | Will occur several times.
Remote | D | Unlikely but possible to occur in the life of an item, with a probability of occurrence less than 10^-3 but greater than 10^-6 in that life. | Unlikely, but can reasonably be expected to occur.
Improbable | E | So unlikely, it can be assumed occurrence may not be experienced, with a probability of occurrence less than 10^-6 in that life. | Unlikely to occur, but possible.

*Definitions of descriptive words may have to be modified based on quantity of items involved.
**The expected size of the fleet or inventory should be defined prior to accomplishing an assessment of the system.

A.4.4.3.2.3 Mishap risk assessment. Mishap risk classification by mishap severity and mishap probability can be performed by using a mishap risk assessment matrix. This assessment allows one to assign a mishap risk assessment value to a hazard based on its mishap severity and its mishap probability. This value is then often used to rank different hazards as to their associated mishap risks.
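Where a quantitative estimate is available, the thresholds in Table A-II can be applied mechanically. The Python sketch below is one possible reading, not part of the standard: the function name is hypothetical, and because the table does not say which level applies when a probability falls exactly on a boundary (for example exactly 10^-1), the code assumes the less frequent level in that case.

```python
# Level letters and descriptions as listed in Table A-II.
LEVEL_NAMES = {
    "A": "Frequent",
    "B": "Probable",
    "C": "Occasional",
    "D": "Remote",
    "E": "Improbable",
}


def probability_level(p: float) -> str:
    """Map a probability of occurrence over an item's life to a level letter.

    Assumption (not stated in the table): a value exactly equal to a
    threshold is assigned to the less frequent of the two adjacent levels.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p > 1e-1:
        return "A"
    if p > 1e-2:
        return "B"
    if p > 1e-3:
        return "C"
    if p > 1e-6:
        return "D"
    return "E"
```

For example, `LEVEL_NAMES[probability_level(5e-3)]` evaluates to "Occasional". As the first footnote to the table cautions, the descriptive words may need to be modified for a particular program, so these thresholds should be treated as adjustable.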
An example of a mishap risk assessment matrix is shown at Table A-III.

TABLE A-III. Example mishap risk assessment values.

PROBABILITY \ SEVERITY | Catastrophic | Critical | Marginal | Negligible
Frequent | 1 | 3 | 7 | 13
Probable | 2 | 5 | 9 | 16
Occasional | 4 | 6 | 11 | 18
Remote | 8 | 10 | 14 | 19
Improbable | 12 | 15 | 17 | 20

A.4.4.3.2.4 Mishap risk categories. Mishap risk assessment values are often used in grouping individual hazards into mishap risk categories. Mishap risk categories are then used to generate specific action such as mandatory reporting of certain hazards to management for action or formal acceptance of the associated mishap risk. Table A-IV includes an example listing of mishap risk categories and the associated assessment values. In the example, the system management has determined that mishap risk assessment values 1 through 5 constitute “High” risk while values 6 through 9 constitute “Serious” risk.

TABLE A-IV. Example mishap risk categories and mishap risk acceptance levels.

Mishap Risk Assessment Value | Mishap Risk Category | Mishap Risk Acceptance Level
1 – 5 | High | Component Acquisition Executive
6 – 9 | Serious | Program Executive Officer
10 – 17 | Medium | Program Manager
18 – 20 | Low | As directed

*Representative mishap risk acceptance levels are shown in the above table. Mishap risk acceptance is discussed in paragraph A.4.4.7. The using organization must be consulted by the corresponding levels of program management prior to mishap risk acceptance.

A.4.4.3.2.5 Mishap risk impact. The mishap risk impact is assessed, as necessary, using other factors to discriminate between hazards having the same mishap risk value. One might discriminate between hazards with the same mishap risk assessment value in terms of mission capabilities, or social, economic, and political factors. Program management will closely consult with the using organization on the decisions used to prioritize resulting actions.

A.4.4.3.3 Mishap risk assessment approaches.
Commonly used approaches for assessing mishap risk can be found in the Defense Acquisition Deskbook and System Safety Society’s System Safety Analysis Handbook (see A.6.1).

A.4.4.4 Identification of mishap risk mitigation measures. Identify potential mishap risk mitigation alternatives and the expected effectiveness of each alternative or method. Mishap risk mitigation is an iterative process that culminates when the residual mishap risk has been reduced to a level acceptable to the appropriate authority.

A.4.4.4.1 Prioritize hazards for corrective action. Hazards should be prioritized so that corrective action efforts can be focused on the most serious hazards first. A categorization of hazards may be conducted according to the mishap risk potential they present.

A.4.4.4.2 System safety design order of precedence (see 4.4). The ultimate goal of a system safety program is to design systems that contain no hazards. However, since the nature of most complex systems makes it impossible or impractical to design them completely hazard-free, a successful system safety program often provides a system design where there exist no hazards resulting in an unacceptable level of mishap risk. As hazard analyses are performed, hazards will be identified that will require resolution. The system safety design order of precedence defines the order to be followed for satisfying system safety requirements and reducing risks. The alternatives for eliminating the specific hazard or controlling its associated risk are evaluated so that an acceptable method for mishap risk reduction can be agreed to.

A.4.4.5 Reduction of mishap risk to an acceptable level. Reduce the system mishap risk through a mitigation approach mutually agreed to by the developer, program manager and the using organization.

A.4.4.5.1 Communication with associated test efforts. Residual mishap risk and associated hazards must be communicated to the system test efforts for verification.
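The ranking described in A.4.4.3.2.3 through A.4.4.4.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the values are the example ones from Tables A-III and A-IV, which a real program would tailor, and the `Hazard` record, the function names, and the three sample hazards are hypothetical and do not come from the standard.

```python
from dataclasses import dataclass

# Example values from Table A-III, indexed as MATRIX[probability][severity column].
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")
MATRIX = {
    "Frequent":   (1, 3, 7, 13),
    "Probable":   (2, 5, 9, 16),
    "Occasional": (4, 6, 11, 18),
    "Remote":     (8, 10, 14, 19),
    "Improbable": (12, 15, 17, 20),
}

# Example groupings from Table A-IV: (lowest value, highest value, category, acceptance level).
CATEGORIES = (
    (1, 5, "High", "Component Acquisition Executive"),
    (6, 9, "Serious", "Program Executive Officer"),
    (10, 17, "Medium", "Program Manager"),
    (18, 20, "Low", "As directed"),
)


@dataclass(frozen=True)
class Hazard:
    """Hypothetical hazard record; the field names are assumptions for this sketch."""
    name: str
    severity: str
    probability: str


def risk_value(hazard):
    """Look up the mishap risk assessment value (1 is the highest risk)."""
    return MATRIX[hazard.probability][SEVERITIES.index(hazard.severity)]


def risk_category(value):
    """Map an assessment value to its example category and acceptance level."""
    for low, high, category, authority in CATEGORIES:
        if low <= value <= high:
            return category, authority
    raise ValueError(f"assessment value out of range: {value}")


def prioritize(hazards):
    """Order hazards so the lowest assessment value (most serious) comes first."""
    return sorted(hazards, key=risk_value)


# Placeholder hazards, invented purely to exercise the lookup.
hazards = [
    Hazard("placeholder hazard A", "Marginal", "Probable"),
    Hazard("placeholder hazard B", "Catastrophic", "Remote"),
    Hazard("placeholder hazard C", "Negligible", "Frequent"),
]

for h in prioritize(hazards):
    value = risk_value(h)
    category, authority = risk_category(value)
    print(f"{value:2d}  {category:8s}{authority:34s}{h.name}")
```

Python's `sorted` is stable, so hazards with equal assessment values keep their input order. Separating such ties is the job of the mishap risk impact factors in A.4.4.3.2.5, which this sketch does not model.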
A.4.4.6 Verification of mishap risk reduction. Verify the mishap risk reduction and mitigation through appropriate analysis, testing, or inspection. Document the determined residual mishap risk. The program manager must ensure that the selected mitigation approaches will result in the expected residual mishap risk. To provide this assurance, the system test effort should verify the performance of the mitigation actions. New hazards identified during testing must be reported to the program manager and the developer.

A.4.4.6.1 Testing for a safe design. Tests and demonstrations must be defined to validate selected safety features of the system. Test or demonstrate safety critical equipment and procedures to determine the mishap severity or to establish the margin of safety of the design. Consider induced or simulated failures to demonstrate the failure mode and acceptability of safety critical equipment. When it cannot be analytically determined whether the corrective action taken will adequately control a hazard, conduct safety tests to evaluate the effectiveness of the controls. Where costs for safety testing would be prohibitive, safety characteristics or procedures may be verified by engineering analyses, analogy, laboratory test, functional mockups, or subscale/model simulation. Integrate testing of safety systems into appropriate system test and demonstration plans to the maximum extent possible.

A.4.4.6.2 Conducting safe testing. The program manager must ensure that test teams are familiar with mishap risks of the system. Test plans, procedures, and test results for all tests including design verification, operational evaluation, production acceptance, and shelf-life validation should be reviewed to ensure that:
a. Safety is adequately demonstrated.
b. The testing will be conducted in a safe manner.
c. All additional hazards introduced by testing procedures, instrumentation, test hardware, and test environment are properly identified and controlled.

A.4.4.6.3 Communication of new hazards identified during testing. Testing organizations must ensure that hazards and safety discrepancies discovered during testing are communicated to the program manager and the developer.

A.4.4.7 Review and acceptance of residual mishap risk by the appropriate authority. Notify the program manager of identified hazards and residual mishap risk. For long duration programs, incremental or periodic reporting should be used.

A.4.4.7.1 Residual mishap risk. The mishap risk that remains after all planned mishap risk management measures have been implemented is considered residual mishap risk. Residual mishap risk is documented along with the reason(s) for incomplete mitigation.

A.4.4.7.2 Residual mishap risk management. The program manager must know what residual mishap risk exists in the system being acquired. For significant mishap risks, the program manager is required to elevate reporting of residual mishap risk to higher levels of appropriate authority (such as the Program Executive Officer or Component Acquisition Executive) for action or acceptance. The program manager is encouraged to apply additional resources or other remedies to help the developer satisfactorily resolve hazards providing significant mishap risk. Table A-IV includes an example of a mishap risk acceptance level matrix based on the mishap risk assessment value and mishap risk category.

A.4.4.7.3 Residual mishap risk acceptance. The program manager is responsible for formally documenting the acceptance of the residual mishap risk of the system by the appropriate authority. The program manager should update this residual mishap risk and the associated hazards to reflect changes/modifications in the system or its use.
The program manager and using organization should jointly determine the updated residual mishap risk prior to acceptance of the risk and system hazards by the risk acceptance authority, and should document the agreement between the user and the risk acceptance authority.

A.4.4.8 Tracking hazards and residual mishap risk. Track hazards, their closures, and residual mishap risk. A tracking system for hazards, their closures, and residual mishap risk must be maintained throughout the system life cycle. The program manager must keep the system user apprised of system hazards and residual mishap risk.

A.4.4.8.1 Process for tracking of hazards and residual mishap risk. Each system must have a current log of identified hazards and residual mishap risk, including an assessment of the residual mishap risk (see A.4.4.7). As changes are integrated into the system, this log is updated to incorporate added or changed hazards and the associated residual mishap risk. The Government must formally acknowledge acceptance of system hazards and residual mishap risk. Users will be kept informed of hazards and residual mishap risk associated with their systems.

A.4.4.8.1.1 Developer responsibilities for communications, acceptance, and tracking of hazards and residual mishap risk. The developer (see 3.2.2) is responsible for communicating information to the program manager on system hazards and residual mishap risk, including any unusual consequences and costs associated with hazard mitigation. After attempting to eliminate or mitigate system hazards, the developer will formally document and notify the program manager of all hazards breaching thresholds set in the safety design criteria. At the same time, the developer will also communicate the system residual mishap risk.

A.4.4.8.1.2 Program manager responsibilities for communications, acceptance, and tracking of hazards and residual mishap risk.
The program manager is responsible for maintaining a log of all identified hazards and residual mishap risk for the system. The program manager will communicate known hazards and associated risks of the system to all system developers and users. As changes are integrated into the system, the program manager shall update this log to incorporate added or changed hazards and the residual mishap risk identified by the developer. The program manager is also responsible for informing system developers about the program manager’s expectations for handling of newly discovered hazards. The program manager will evaluate new hazards and the resulting residual mishap risk, and either recommend further action to mitigate the hazards, or formally document the acceptance of these hazards and residual mishap risk. The program manager will evaluate the hazards and associated residual mishap risk in close consultation and coordination with the ultimate end user, to assure that the context of the user requirements, potential mission capability, and the operational environment are adequately addressed. Copies of the documentation of the hazard and risk acceptance will be provided to both the developer and the system user. Hazards for which the program manager accepts responsibility for mitigation will also be included in the formal documentation. For example, if the program manager decides to execute a special training program to mitigate a potentially hazardous situation, this approach will be documented in the formal response to the developer. Residual mishap risk and hazards must be communicated to system test efforts for verification.

A.5 SPECIFIC REQUIREMENTS

A.5.1 Program manager responsibilities. The program manager must ensure that all types of hazards are identified, evaluated, and mitigated to a level compliant with acquisition management policy, federal (and state where applicable) laws and regulations, Executive Orders, treaties, and agreements.
The program manager should:
A.5.1.1 Establish, plan, organize, implement, and maintain an effective system safety effort that is integrated into all life cycle phases.
A.5.1.2 Ensure that system safety planning is documented to provide all program participants with visibility into how the system safety effort is to be conducted.
A.5.1.3 Establish definitive safety requirements for the procurement, development, and sustainment of the system. The requirements should be set forth clearly in the appropriate system specifications and contractual documents.
A.5.1.4 Provide historical safety data to developers.
A.5.1.5 Monitor the developer’s system safety activities and review and approve delivered data in a timely manner, if applicable, to ensure adequate performance and compliance with safety requirements.
A.5.1.6 Ensure that the appropriate system specifications are updated to reflect results of analyses, tests, and evaluations.
A.5.1.7 Evaluate new lessons learned for inclusion into appropriate databases and submit recommendations to the responsible organization.
A.5.1.8 Establish system safety teams to assist the program manager in developing and implementing a system safety effort.
A.5.1.9 Provide technical data on Government-furnished Equipment or Government-furnished Property to enable the developer to accomplish the defined tasks.
A.5.1.10 Document acceptance of residual mishap risk and associated hazards.
A.5.1.11 Keep the system users apprised of system hazards and residual mishap risk.
A.5.1.12 Ensure the program meets the intent of the latest MIL-STD 882.
A.5.1.13 Ensure adequate resources are available to support the program system safety effort.
A.5.1.14 Ensure system safety technical and managerial personnel are qualified and certified for the job.

A.6 NOTES

A.6.1 DoD acquisition practices and safety analysis techniques.
Information on DoD acquisition practices and safety analysis techniques is available at the referenced Internet sites. Nothing in the referenced information is considered binding or additive to the requirements provided in this standard.

A.6.1.1 Defense Acquisition Deskbook. Wright-Patterson Air Force Base, Ohio: Deskbook Joint Program Office.

A.6.1.2 System Safety Analysis Handbook. Unionville, VA: System Safety Society.

CONCLUDING MATERIAL

Custodians:
Army - AV
Navy - AS
Air Force - 40

Preparing activity:
Air Force - 40
Project SAFT - 0038

Reviewing activities:
Army - AR, AT, CR, MI
Navy - EC, OS, SA, SH
Air Force - 10, 11, 13, 19

STANDARDIZATION DOCUMENT IMPROVEMENT PROPOSAL

INSTRUCTIONS
1. The preparing activity must complete blocks 1, 2, 3, and 8. In block 1, both the document number and revision letter should be given.
2. The submitter of this form must complete blocks 4, 5, 6, and 7, and send to preparing activity.
3. The preparing activity must provide a reply within 30 days from receipt of the form.

NOTE: This form may not be used to request copies of documents, nor to request waivers, or clarification of requirements on current contracts. Comments submitted on this form do not constitute or imply authorization to waive any portion of the referenced document(s) or to amend contractual requirements.

I RECOMMEND A CHANGE:
1. DOCUMENT NUMBER: MIL-STD-882
2. DOCUMENT DATE (YYYYMMDD): 20000210
3. DOCUMENT TITLE: System Safety
4. NATURE OF CHANGE (Identify paragraph number and include proposed rewrite, if possible. Attach extra sheets as needed.)
5. REASON FOR RECOMMENDATION
6. SUBMITTER
a. NAME (Last, First, Middle Initial)
b. ORGANIZATION
c. ADDRESS (Include zip code)
d. TELEPHONE (Include Area Code): (1) Commercial (2) DSN (if applicable)
7. DATE SUBMITTED (YYYYMMDD)
8. PREPARING ACTIVITY
a. NAME: Headquarters, Air Force Materiel Command, System Safety Division
b. TELEPHONE (Include Area Code): (1) Commercial (937) 257-6007 (2) DSN 787-6007
c. ADDRESS (Include Zip Code): HQ AFMC/SES, 4375 Chidlaw Road, Wright Patterson AFB, Ohio 45433-5006

IF YOU DO NOT RECEIVE A REPLY WITHIN 45 DAYS, CONTACT: Defense Standardization Program Office (DLSC-LM), 8725 John J. Kingman Road, Suite 2533, Fort Belvoir, Virginia 22060-6621. Telephone 703 767-6888, DSN 427-6888.

DD Form 1426, FEB 1999 (EG). PREVIOUS EDITION IS OBSOLETE. WHS/DIOR, Feb 99



