航空 posted on 2010-4-6 23:20:49

Survey of EUROCONTROL Safety Risk Management


民航 posted on 2010-4-7 14:02:44

A Survey of the EUROCONTROL Approach to Safety Risk Management
R. H. Pierce, MSc.; CSE International Ltd; Flixborough, UK

Keywords: ATM, ESARR, hazard, risk, separation (loss of)

Abstract

This paper takes a critical look at two of the current ESARRs, namely ESARR 3, which covers safety management systems, and ESARR 4, which covers risk tolerability. ESARR 4 defines a series of hazard severities related to degrees of loss of separation (including the most serious, a mid-air collision, which attracts a numeric target level of safety), but in practice these are difficult to apply since any failure can lead to an accident with some (small) probability. Various ways of solving this problem and deriving useful equipment failure rate targets are discussed in the paper, including statistical risk modelling and event tree analysis.

Introduction

EUROCONTROL is an organisation set up by agreement between European governments to further the safety of air navigation within Europe. EUROCONTROL is not synonymous with the European Union and has a much wider membership. One component of EUROCONTROL is the Safety Regulation Commission (SRC), which has the task of defining a common regulatory standard for Air Traffic Services (ATS) providers.

To this end the SRC, acting through its technical support component the Safety Regulation Unit (SRU), has established a set of EUROCONTROL Safety Regulatory Requirements (ESARRs) which all national ATS regulators are intended to apply in their own jurisdiction.
There are six ESARRs currently promulgated.
• ESARR 1: Safety Oversight in ATM
• ESARR 2: Incident Reporting and Assessment
• ESARR 3: Use of Safety Management Systems
• ESARR 4: Risk Assessment and Mitigation in ATM
• ESARR 5: Personnel Requirements
• ESARR 6: Software in ATM Systems.

This paper takes a critical look at two of these ESARRs, namely ESARR 3, which covers safety management systems, and ESARR 4, which covers risk tolerability. ESARR 4 defines a series of hazard severities related to degrees of loss of separation (including the most serious, a mid-air collision), but in practice these are difficult to apply since any failure can lead to an accident with some (small) probability. Various ways of solving this problem and deriving useful equipment failure rate targets are discussed in the paper, including statistical risk modelling and event tree analysis.

Hazards and Risks in ATM

ATM systems are probably unique among plant and transportation control systems in that the human being (the Air Traffic Control Officer or ATCO, as he or she is known in Europe) provides the control function, while the equipment provides information and communications services to support the ATCO in this task. This is in contrast to other industries, where the equipment (control and protection systems) provides the control function with the human being acting in a monitoring or supervisory role.
ATCOs provide a continuous control service, again in contrast to most industries, where human intervention is the exception rather than the rule (even in a railway signalling system, where the signaller has to request routes for trains on a frequent basis, the signalling system or “interlocking” is designed to prevent unsafe actions from being carried out).

The main hazards that an ATM system can create at the interface between the air traffic control centre and the aircraft are:
• Loss of separation between aircraft,
• Loss of separation between aircraft and terrain.

There are other issues, for example failure to advise the aircrew of adverse weather (such as windshear) or providing incorrect meteorological data (altimeter pressure settings), but these may be regarded as causal factors leading to one of the top level hazards. Loss of separation is the main concern of most ATC centres and of ESARR 4 (ref. 1). Loss of separation can lead to catastrophic consequences, the most notorious recent case being the mid-air collision near Überlingen in southern Germany in July 2002 (ref. 2). In other cases, such as the close approach of two wide-body aircraft over Japan in 2001, a number of passenger and cabin crew injuries were caused either by wake turbulence or violent avoiding action.

Standard separation between aircraft in Europe is 5 nm horizontally or 1000 ft vertically where a radar surveillance service is available (greater separations are applied where there is no radar cover, and a 3 nm separation is often applied on final approach). Erosion of separation is regarded as an incident reportable to the regulator.
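
As a rough illustration of the radar separation minima quoted above, the check for an infringement can be sketched in a few lines of Python. The `Position` type, the flat x/y coordinates and the function name are assumptions made for this sketch and are not taken from any EUROCONTROL specification; separation is treated as lost only when the horizontal and the vertical minimum are both infringed.

```python
import math
from dataclasses import dataclass

# Radar separation minima quoted in the text (Europe, radar service available).
HORIZONTAL_MIN_NM = 5.0
VERTICAL_MIN_FT = 1000.0


@dataclass
class Position:
    """Hypothetical flat-earth position: x/y in nautical miles, altitude in feet."""
    x_nm: float
    y_nm: float
    alt_ft: float


def separation_lost(a: Position, b: Position,
                    horizontal_min_nm: float = HORIZONTAL_MIN_NM,
                    vertical_min_ft: float = VERTICAL_MIN_FT) -> bool:
    """Return True when both the horizontal and the vertical minimum are infringed."""
    horizontal_nm = math.hypot(a.x_nm - b.x_nm, a.y_nm - b.y_nm)
    vertical_ft = abs(a.alt_ft - b.alt_ft)
    return horizontal_nm < horizontal_min_nm and vertical_ft < vertical_min_ft


# Two aircraft 3 nm apart: 500 ft of vertical spacing is an infringement,
# 1000 ft is not.
print(separation_lost(Position(0, 0, 30000), Position(3, 0, 30500)))
print(separation_lost(Position(0, 0, 30000), Position(3, 0, 31000)))
```

Passing `horizontal_min_nm=3.0` would model the reduced minimum often applied on final approach.
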
It will be clear to the reader that not all losses of separation are hazardous; in fact there is a continuum of severity between a minor technical infringement of separation and a very close approach or collision.

Causal factors for loss of separation can be divided into ATCO error, pilot error or equipment failure. As might be expected, the contribution of human error to losses of separation is dominant. Estimates of the proportion of ATM-related incidents caused directly by human error, as opposed to those caused by equipment failure, range from 95% to 98% or even more. However, we should note that a major contributory factor to the Überlingen accident was the severely (intentionally) degraded mode in which the ATM equipment was operating on the evening of the accident, which was not properly understood by the ATCO in control.

ESARR 3

ESARR 3 (ref. 3) requires all ATS providers to implement a safety management system (SMS) and sets out requirements for the topics that the SMS must address. The safety objective of the ATM service is stated to be “while providing an ATM service, the principal safety objective is to minimise the ATM contribution to the risk of an aircraft accident so far as reasonably practicable”.

Risk reduction so far as reasonably practicable is exactly equivalent to the principle that risks should be As Low As Reasonably Practicable (ALARP). The ALARP principle originated in the United Kingdom and is well explained in the UK Health and Safety Executive document Reducing Risks, Protecting People (ref. 4). In a nutshell, ALARP calls for risk reduction measures to be taken until any further expenditure would be out of proportion to the gain achieved. Of course, the risk must be tolerable in the first place before the ALARP principle is applied: one cannot put an intolerably unsafe system into service on the ground that it is not reasonably practicable to improve it.
The degree of expenditure called for by ALARP will depend on the residual risk. If this is towards the upper end of the tolerability band then considerable expenditure may be called for; if the risk is already low then only modest improvements may be required.

In practice, the ALARP principle is usually implemented by means of a hazard-risk index or risk classification scheme, which will be familiar to many safety engineers. A typical risk classification scheme applied to ATM equipment is shown in the following matrix (application of such a scheme to human error is controversial and will not be considered further in this paper).

The hazard severity classes have the following summarised interpretations:

| Severity class | Interpretation |
|---|---|
| 1 | Inability to provide any form of air traffic control service |
| 2 | Ability to provide an air traffic control service is severely compromised for a significant period of time |
| 3 | Ability to provide an air traffic control service is impaired for a significant period of time |
| 4 | No immediate effect on safety but persistence may cause a loss of safety margins |

Table 1 - ATM Equipment Hazard Severities

| Hazard occurrence rate (/h) | Severity class 4 | Severity class 3 | Severity class 2 | Severity class 1 |
|---|---|---|---|---|
| > 10^-3 | C | A | A | A |
| 10^-3 to 10^-4 | D | B | A | A |
| 10^-4 to 10^-5 | D | C | B | A |
| 10^-5 to 10^-6 | D | D | C | B |
| 10^-6 to 10^-7 | D | D | D | C |
| < 10^-7 | D | D | D | D |

Table 2 - ATM Equipment Risk Classification Matrix

In this case, risk class A is intolerable, classes B and C are in the ALARP region, while class D is acceptable and there is no need for further application of the ALARP principle. Under the ALARP principle, risk class D should be the target for all systems, and a higher risk is only acceptable if class D cannot reasonably be achieved.

Any ATM hazard can lead to a loss of separation and to a finite, although generally very small, probability of a mid-air collision - an accident with multiple fatalities.
It is therefore not useful to classify ATM equipment hazard severity by means of the expected harm, since in this case all hazards would be classed as Catastrophic in severity. This would lead to unreasonably stringent failure rate targets being necessary to achieve a risk class D, and would not distinguish hazards that in practice are of differing severity and require more or less stringent control measures. A more useful classification is to consider the effect of the hazard on the ability of the ATCOs to maintain a safe air traffic service (and therefore indirectly on the probability of a loss of separation and mid-air collision). This approach has been used by a number of ATM service providers in Europe for a number of years, and is expressed in Table 1.

The approach taken by ESARR 4, and the difficulties it presents, are discussed in the next section.

ESARR 4 Requirements for Risk Control

ESARR 4 defines five levels of hazard severity:

| Severity class | Description |
|---|---|
| 1 | Accidents, including mid-air collisions or controlled flight into terrain |
| 2 | Serious incidents |
| 3 | Major incidents |
| 4 | Significant incidents |
| 5 | No immediate effect on safety |

Table 3 - ESARR 4 Incident and Hazard Severities

Apart from Class 1 incidents, the other classes are not defined in detail but examples are given of the effect of the hazard on operations. For example, a Class 2 incident is one which involves “large reduction in separation … without crew or ATC fully controlling the situation or able to recover from the situation” or “one or more aircraft deviating from their intended clearance so that abrupt manoeuvre is required to avoid collision …”. These can only be examples, because “provision of an erroneous ATC clearance such that abrupt manoeuvre is required to avoid collision …” would be an equally valid example of a Class 2 hazard.
The initiating event for the sequence that led to the Überlingen accident was in fact an incorrect flight level clearance.

The ESARR 4 hazard descriptions are consistent with those in ESARR 2, which is concerned with establishing a common framework for incident data collection and reporting.

A severity 1 occurrence is by definition not a hazard but an accident. For such occurrences, ESARR 4 sets out a Target Level of Safety (TLS) in terms of accident rates (mishaps involving harm to human beings) caused by ATM factors. The TLS is 1.55 × 10^-8 accidents per flight hour. This may be thought of as the upper limit of tolerability for accidents resulting from the ATM hazards discussed in the previous section. Application of the ALARP principle from ESARR 3, however, would imply that service providers should attempt to do better than the TLS. This is in fact stated indirectly in ESARR 4 in the sentence “As a necessary complement to demonstrating that these quantitative objectives are met, additional safety management considerations shall be applied so that more safety is added to the ATM system whenever reasonable”. This appears to be a rather convoluted re-statement of the ALARP principle.

Tolerable occurrence rates of Class 2 to Class 4 hazards are not given in the present version of ESARR 4, although it is intended that this information should be included when sufficient incident data has been collected to establish the relationship between the occurrence rate of incidents of various classes and accidents. Individual ATM service providers are thus left with the problem of deciding for themselves what maximum tolerable probabilities to assign to the other hazard severities. However, the provision of an overall TLS is a valuable first step in setting quantitative safety objectives.

It should be noted in this regard that the TLS is set in terms of flight hours.
Any ATM centre should re-express the TLS in terms of units which are appropriate to it, given its traffic levels and the average time for which a flight is under the control of that centre (for an en-route centre in Europe, this is typically about 20 minutes). If such a centre handles one million flights per year, the TLS for that centre in terms of accidents per operating hour would be close to 6 × 10^-7 (one million flights of 20 minutes each amount to about 333,000 flight hours per year, or roughly 38 flight hours for every hour of continuous operation).

As was discussed earlier, it is very difficult to use the ESARR 4 risk classification scheme directly to set equipment hazard rate targets, for the simple reason that it is not possible to examine an equipment failure (or even a human error) and state immediately that it will lead to any of the above incident severity classes. Indeed, ESARR 4 recognises this by stating that the risk classification scheme only applies to overall safety performance at the national level and is not directly applicable to the classification of individual hazards. It is therefore necessary to devise a scheme for setting safety requirements which is consistent with ESARR 4 and with the ALARP principle. Furthermore, when considering ATM equipment we must remember that the TLS has to be apportioned between equipment and human contributions. Because the proportion of ATM incidents due to human error is very high, as discussed above, only about 5% of the TLS can be allocated to equipment. So the TLS for equipment hazards needs to be something like 7.75 × 10^-10 per flight hour (or 3 × 10^-8 per operating hour).

How is this problem to be solved? Various methods can be used, including consequence modelling and calibration of a suitable risk classification matrix. These are discussed in the next two sections.

A point to note here is that ESARR 4 does not allow certain systems classed as “safety nets” to be taken into account in determining quantitative safety requirements for ATM systems or equipment.
A case in point is the Traffic Alert and Collision Avoidance System or TCAS. This is a system which detects the presence of another aircraft and warns the crew if it is likely to approach too closely; if both aircraft are TCAS equipped the systems negotiate a resolution advisory (RA) which the aircrew should obey. TCAS is a very effective system in that it frequently prevents infringements of separation turning into more serious incidents. However, in the Überlingen accident it was a contributory factor in the accident sequence since the RA issued to one of the aircraft contradicted the instruction of the ATCO on duty and the pilot chose to obey the ATCO rather than the TCAS. It seems reasonable therefore not to take credit for this system.

Event Tree Modelling

One approach to the application of ESARR 4 to equipment is to carry out detailed consequence modelling for each ATM equipment failure hazard to determine the range of possible outcomes in terms of their severity class, and the relative probability of each outcome. Using this information a safety requirement can be stated which would ensure that the tolerable hazard rates at the ESARR 4 level would be met.

The event tree method is well suited for this purpose. The initiating event is the hazard (which will be some equipment failure mode such as loss of the radar display). The subsequent events in the tree represent the success or failure of the various mitigating factors (barriers to escalation as they are sometimes known) that stand between the hazard at this level and the various outcomes, expressed as ESARR 4 hazard severities.

The following example of an event tree illustrates how the method might be used.
The system being considered is an electronic flight data display for airport control towers, which gives the tower controller information such as the destination, callsign, SSR transponder code, take-off time and departure route of each aircraft. The system is intended to assist the controller by deciding the sequence of departures and the time at which each aircraft can be cleared to take off. A crucial factor in this calculation is the wake turbulence category of each aircraft. It is well known that aircraft create severe vortices behind them when taking off, and this can affect the stability of the following aircraft. A certain time has therefore to be left between each aircraft to let the resulting turbulence subside.

Aircraft are divided into four wake turbulence categories depending on their weight, and there are mandatory time separations to be applied between take-offs depending on the wake turbulence categories of the leading and following aircraft. The hazard to be considered is “calculation of incorrect take-off times with respect to wake turbulence categories”. In this case the hazard could lead directly to an accident; for example, a light turboprop aircraft taking off a minute behind a Boeing 747 could easily lose stability and crash into the ground.

The event tree for this scenario is given in Figure 1 below. This is not intended to be a complete analysis of the situation but is reasonably representative of the method; however, the actual success probabilities are purely conjectural.

Figure 1 - Event Tree for Accident Scenario

There are some problems with using an event tree in this way. The first is that assigning probabilities to the intermediate events is often difficult. For example, it may be hard to assign a credible success probability to an event involving human behaviour such as the detection of an anomalous situation, or the application of a corrective action.
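
The arithmetic behind an event tree of this kind can be sketched as follows. The barrier names, success probabilities and initiating rate are invented for illustration (they are not taken from Figure 1), and the tree is simplified to a single chain in which each barrier, if it works, ends the sequence at a given ESARR 4 severity class.

```python
import math


def evaluate_event_tree(initiating_rate, barriers, worst_outcome):
    """Propagate an initiating hazard rate through a chain of barriers.

    barriers: list of (name, success_probability, severity_class_if_success).
    Returns a dict mapping ESARR 4 severity class -> occurrence rate.
    """
    rates = {}
    remaining = initiating_rate  # rate of sequences that defeated every barrier so far
    for _name, p_success, severity in barriers:
        rates[severity] = rates.get(severity, 0.0) + remaining * p_success
        remaining *= 1.0 - p_success
    # Sequences that defeat every barrier end in the worst outcome.
    rates[worst_outcome] = rates.get(worst_outcome, 0.0) + remaining
    return rates


def max_initiating_rate(target_rate, barriers):
    """Largest hazard rate for which the worst outcome stays within target_rate."""
    p_all_fail = math.prod(1.0 - p for _name, p, _severity in barriers)
    return target_rate / p_all_fail


# Hypothetical barriers for the wake turbulence hazard (all numbers conjectural).
BARRIERS = [
    ("controller notices the incorrect take-off time", 0.9, 5),
    ("pilot queries the clearance", 0.5, 4),
    ("following aircraft recovers from the wake encounter", 0.99, 3),
]

rates = evaluate_event_tree(1e-4, BARRIERS, worst_outcome=1)
for severity in sorted(rates):
    print(f"severity class {severity}: {rates[severity]:.2e} per hour")
```

Running the tree in reverse with `max_initiating_rate` turns a tolerable accident rate into a tolerable hazard rate, which is the form a safety requirement on the equipment would take. The result is only as credible as the barrier probabilities fed into it.
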
Unless good human error statistics have been kept, which is not often the case, it may be necessary to rely on expert judgement, which can be contentious especially if the situation being analysed is only infrequently encountered. In other cases it may be necessary to carry out detailed separation infringement or collision probability modelling to determine the relative likelihood of the ESARR 4 severity classes. An example of collision probability modelling using simulation and expert judgement is given in an airspace risk assessment conducted by Airservices Australia (ref. 5), although this was not concerned with assigning safety requirements to ATM equipment.

The other problem is simply that developing an event tree or other consequence model for every equipment failure mode can be very time-consuming.

Calibrated Risk Classification Matrix

In this approach, a conventional risk classification matrix as shown in Table 2 is used, but the hazard rate figures are adjusted to be consistent with the TLS for the ATS unit in question (converted to units of accidents per operating hour). Once this calibration of the risk classification matrix has been achieved, individual projects can use it to derive safety requirements in terms of hazard occurrence rates without performing detailed modelling.

A simple method of calibration is to consider a Class 1 hazard at the equipment level (inability to provide any form of ATC service, see Table 1) and decide the relative likelihood of an ESARR 4 severity class 1 incident, in other words an aircraft accident.
This will then set the maximum tolerable occurrence rate for an equipment Class 1 hazard (the maximum occurrence rate which will still achieve a Class B risk). Hazards of lesser severity are then assigned successively less stringent targets, usually relaxed by an order of magnitude for each severity class, as shown in Table 2.

Even if an equipment Class 1 hazard occurs (which is normally a complete loss of communications between ATC and aircraft), and the aircraft are left to proceed without instructions from ATC, there is still only a very small chance that a mid-air collision will occur. Informally, this is mainly because the sky is a big place and aircraft are relatively small, and because the aircraft will generally be properly separated before the system failure occurs. Aircrew can also mitigate the collision risk by other means such as contacting other ATM service providers (such as major airports or adjacent centres) for emergency traffic information, and increased vigilance. They can in practice also rely on TCAS but as noted above this cannot be taken into account. Modelling work and incident (AIRPROX) statistics indicate that there are at least two orders of magnitude of mitigation between the Class 1 failure and a mid-air collision, and possibly as much as three orders of magnitude.

As with all risk classification and safety requirements derivation schemes, it is inappropriate to consider the risk from each hazard individually, because the contribution to the overall achieved incident and accident rate is the sum of the incident rates from each hazard (assuming that all the hazards can occur independently). Treating hazards individually in this way is sometimes referred to as “salami slicing risk”. Generally, some assumption is made about the total number of such hazards that could occur, and the failure rate bands in the risk classification matrix are set accordingly.
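
The chain of arithmetic behind such a calibration can be sketched as below, using the figures quoted in this paper (a TLS of 1.55 × 10^-8 per flight hour, one million flights a year, 20 minutes per flight, a 5% equipment share, and two to three orders of magnitude of mitigation). The function names, the assumption of continuous 24-hour operation and the even split of the budget across `n_hazards` are choices made for this sketch only.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: the centre operates around the clock


def tls_per_operating_hour(tls_per_flight_hour, flights_per_year, hours_per_flight):
    """Convert a per-flight-hour TLS into accidents per centre operating hour."""
    flight_hours_per_year = flights_per_year * hours_per_flight
    return tls_per_flight_hour * flight_hours_per_year / HOURS_PER_YEAR


def max_hazard_rate(accident_budget_per_hour, mitigation_factor, n_hazards=1):
    """Tolerable occurrence rate per hazard.

    mitigation_factor is the number of hazard occurrences per resulting accident
    (100 for two orders of magnitude of mitigation); n_hazards shares the budget
    evenly between hazards of the same class (a simplifying assumption).
    """
    return accident_budget_per_hour * mitigation_factor / n_hazards


centre_tls = tls_per_operating_hour(1.55e-8, 1_000_000, 20 / 60)
equipment_tls = 0.05 * centre_tls  # roughly 5% of the risk budget goes to equipment

print(f"centre TLS:    {centre_tls:.1e} accidents per operating hour")
print(f"equipment TLS: {equipment_tls:.1e} accidents per operating hour")
for factor in (100, 1000):
    rate = max_hazard_rate(equipment_tls, factor)
    print(f"Class 1 hazard, mitigation x{factor}: {rate:.1e} per operating hour")
```

On these assumptions the tolerable rate for a single Class 1 equipment hazard falls between roughly 3 × 10^-6 and 3 × 10^-5 per operating hour, the same order as the top of the class B band for severity class 1 in Table 2. The `n_hazards` argument is where an assumption about the total number of hazards enters the calculation.
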
For example, it may be assumed that there is only one system which could cause a Class 1 failure, and a further 10 systems each with 5 Class 2 and 5 Class 3 hazards. However, this kind of assumption needs to be re-visited regularly to cater for changes to the equipment in use, as discussed in the next section.

Altering the balance of risk between equipment and people

As ATM equipment becomes more advanced and offers ATC staff more tools to manage traffic, an increasing number of hazards can be created by equipment failures simply due to the number of new functions that are provided and could then fail. This could result in an increased contribution to the risk budget from equipment failures, and in the end may require a re-calibration of the risk classification matrix. However, the new features and functions may well improve human error rates, and therefore reduce the overall contribution from human error. Since human error contributes something like 95% of the overall risk, any improvements in this area should result in an overall risk reduction even if the equipment contribution rises somewhat.

Conclusions

The EUROCONTROL requirements in ESARR 3 and ESARR 4 set out the overall risk tolerability framework for ATM in Europe. Although ESARR 4 is very valuable in providing a numeric target for the ATM contribution to aircraft accidents, it does not provide detailed requirements for the control of less severe hazards, and individual ATS units must apply it with care and thought to their own situations. Moreover, ESARR 3 and 4 also call for risk to be reduced ALARP. Proper application of ESARRs 3 and 4 should therefore result in a European ATM system which is very safe, and equally safe regardless of which country’s airspace is being traversed.

References

1. EUROCONTROL SRC, ESARR 4, Risk Assessment and Mitigation in ATM, Edition 1.0, 2001.
2. Bundesstelle für Flugunfalluntersuchung, Investigation Report AX001-1-2/03. Braunschweig, 2004.
3. EUROCONTROL SRC, ESARR 3, Use of Safety Management Systems by ATM Service Providers, Edition 1.0, 2000.
4. Health and Safety Executive, Reducing Risks, Protecting People: HSE’s Decision-Making Process. London: 2001.
5. Airservices Australia, Airspace Risk Assessment, Class E over Class D Towers, Version 1.0, 2004, http://www.airservicesaustralia.com/pilotcentre/nas/.

Biography

R. H. Pierce, MSc., Consulting Engineer, CSE International Ltd, Glanford House, Flixborough, Scunthorpe DN15 8SN, UK. Telephone - +44 1724 862169, facsimile - +44 1724 856256, e-mail - ron.pierce@cse-international.com.

Mr. Pierce has extensive experience in software engineering topics (compilers, program analysis tools and software engineering methods). He has over 12 years’ experience in software and system safety assessment for industry domains including air traffic management and railway control and signalling systems.

涟漪雨 posted on 2010-11-11 10:04:00

Studying this carefully!

braveofwind posted on 2011-2-17 19:44:38

Need to take a look at this

topgun008 posted on 2011-5-9 08:41:55

eurocontrol SMS

kmlihe posted on 2015-7-4 14:10:47

thank you very much

buaawu posted on 2016-3-22 14:44:58

flight operations duties