### **Optimal Series-Parallel Topology of Multi-State System With ** ### **Two Failure Modes** **Gregory Levitin** The Israel Electric Corporation Ltd, Haifa
**Abstract ** When considering systems built from units with two failure modes, changes in system topology that increase system reliability in open mode can decrease system reliability in closed mode and vise versa. For the given set of units composing a system there always exists a configuration which provides maximal reliability for the system experiencing both failure modes. In this paper, we suggest an algorithm for finding optimal series-parallel topology (configuration) for systems consisting of units characterized by different reliability and nominal performance rates. Such systems are multi-state because they can have different levels of output performance depending on the combination of units available at the moment. Reliability is defined as the probability of satisfaction of given constraints imposed on system performance in both modes. The procedure developed to solve this problem is based on the use of a universal moment generating function (UMGF) for the fast evaluation of multi-state system reliability and a genetic algorithm (GA) for optimization. Basic UMGF technique operators are developed for two different types of systems based, respectively, on transmitting capacity and on processing time. Solution encoding for GA implementation is developed and basic GA procedures and parameters are chosen for the given problem. Examples of the optimization of series-parallel configurations for systems of both types are presented.
*Keywords:* Multi-state system; Two failure modes; Optimal configuration; Universal moment generating function; Genetic algorithm.
*Acronyms* MSS multi-state system STFM system with two failure modes UMGF universal moment generating function GA genetic algorithm OPD output performance distribution ### Notation N number of units in a MSS G_{x} random output performance rate of the entire MSS in mode x, x{o,c}, where o and c denote open and closed modes respectively G_{xk} output performance rate of the entire MSS in mode x at state k p_{xk} probability that a system is at state k in mode x g_{x} nominal performance rate of MSS unit in mode x performance rate of MSS unit in fault state in mode x d_{j} random output performance rate of the j-th MSS unit q_{xj} probability that the j-th unit of MSS is in operational condition in mode x W_{x} system demand in mode x F_{x}(G_{x},W_{x}) function representing the relation between MSS performance rate and demand in mode x Q_{x} probability of system fault in mode x u_{x}(z) u-function representing a subsystem (unit) performance distribution in mode x U_{x}(z) the u-function of the entire MSS in mode x _{} composition operator over u-functions ^{x}_{cj} function determining performance rate for a group of units for MSS of type j in mode x, c{s,p}, where s and p denote series connection and parallel connection respectively, j{f,t}, where f and t denote transmitted flow model and operation time model respectively. ( U_{x}(z),F_{x},W_{x}) operator over MSS u-function which determines probability Pr{ F(G_{x},W_{x})0} in mode x **1. Introduction** Systems with two failure modes consist of units that can fail in either of two different modes. For example, switching systems can fail to close when commanded to close and can also fail to open when commanded to open. Typical examples of switching devices with two failure modes are fluid flow valves and electronic diodes. The study of factors influencing the reliability of STFM started as early as in the 50's [1-3] and still attracts interest of researchers [4,5]. Several works were devoted to optimizing STFM structure subject to different constraints [6-14]. In 1984 Barlow and Heidtman [13] suggested using a generating function method for computing k-out-of-n reliability of STFM. In 1995 Yokota et al. first used a genetic algorithm for STFM optimization [14]. The aforementioned works consider only reliability characteristics of units composing the system (so called binary-state model). In many practical cases, some measures of unit (system) performance must be taken into account. For example, fluid-transmitting capacity is an important characteristic of a system containing fluid valves, while operating time is crucial when a system of electronic switches is considered. When one deals with a system consisting of units with different performance and reliability characteristics, the failure effect will be essentially different for units with different nominal performance rates. A system can have different levels of output performance depending on the combination of units available at any given moment. Therefore, the system should be considered to be multi-state. The multi-state STFM model is more realistic than the binary-state one because in many cases one has to consider not pure switching systems, but systems consisting of different production equipment having alarm switches that operate in cases of contingency. Characteristics of the alarm switches can differ for different types of equipment. When the reliability of such a multi-state STFM is evaluated, it is important to estimate the impact of each unit on the system output performance. The measures of multi-state STFM reliability and an algorithm for their evaluation were suggested in [15]. Consider a system of switching units connected in a series-parallel network with a designated source and sink. The system fails in the closed mode if there exists at least one path between source and sink, which consists only of units experiencing closed-mode failure. The system fails in the open mode if every path between source and sink includes at least one unit in the open-mode failure state. The duality of roles of parallel and series connection of units in the two operation modes creates a situation in which any change in system configuration, while increasing system reliability in an open mode, decreases it in a closed mode and vice versa [16,17]. There exist two types of STFM structure optimization problem. The first one is an extension of the well-known redundancy optimization problem. In this problem, one has to determine the number of parallel elements (wits identical functionality) for each system component when the system structure (topology of the reliability block-diagram representing interaction of the components) is given. The algorithms for solving this problem were studied in [6,8,10,14] without respect to the element performances. The redundancy optimization method for the multi-state STFM was developed in [15]. The second problem is to find the configuration (topology of the reliability block-diagram) for a given set of elements that provides the greatest possible system reliability. This problem was formulated and solved in [12] for the binary-state STFM (without respect to the element performances). This paper suggests an algorithm for solving the optimal system configuration problem for the multi-state STFM. The problem of optimizing series-parallel system configuration formulated in [12] is the following: Given a fixed number of identical units with two failure modes that function independently, find the series-parallel configuration of these units that provides maximal system reliability. In this paper, the problem is formulated as follows: Find the series-parallel configuration of a given number of statistically independent units that provides maximal system reliability when the units can experience two failure modes and are characterized by different performance rates and reliability indices. To solve this generalized problem we use the method for evaluating the availability of multi-state series-parallel STFM suggested in [15]. This method, based on universal moment generating functions, proved to be convenient for numeric implementation and effective at solving problems of MSS redundancy and maintenance optimization [19-22], as well as importance analysis [23]. Unlike fault-tree analysis, the UMGF method provides an opportunity to treat systems with the same topologies but with different measures of system performance in the same way. To find the optimal system topology a genetic algorithm (GA) is used which is based on principles of evolution. In order to implement the GA we suggest a method for the system topology encoding which is more compact than one presented in [12]. We also adapt the main GA procedures to make the optimization procedure effective. Section 2 of this paper briefly describes multi-state STFM reliability measures and the technique used for evaluating the availability of a given series-parallel configuration when the performance rate and availability of its units are known for open and closed modes. Two models of MSS are considered according to measures of system performance: transmitted flow and operation time. Section 3 describes the topology encoding method, the used optimization approach and its adaptation to the formulated problem. In the fourth section, illustrative examples are presented in which optimal configurations are found for both of the considered types of multi-state STFM.
**2. Multi-state system availability and its estimation method** *2.1. Measures of multi-state STFM reliability* When applied to multi-state STFM, reliability is considered to be a measure of the ability of a system to meet the demand (required performance level) at each mode. The system fails in open mode if the condition is not satisfied, and it fails in the closed mode if the condition is not satisfied, where G_{o}(t) and G_{c}(t) are output performance of the MSS at time t* *in its open and closed modes, and W_{o} and W_{c} are* *required system output performances in its open and closed modes, respectively. Since the failures in open and closed modes, which have conditional probabilities and , (1) respectively, are mutually exclusive events and probabilities of the both modes are equal to 0.5 (each command to close is followed by command to open and vice versa), the entire system reliability can be defined as R(t)=1-0.5(Q_{o}(t)-Q_{c}(t)). (2) For renewable MSS, a multi-state stationary (steady-state) availability can be introduced as after enough time has passed for this probability to become constant. In the steady-state, system output performance can have different values in open and closed modes: G_{o}(t){G_{o1},…,G_{oI}} and G_{c}(t){G_{c1},…,G_{cJ}}, where I and J are the total numbers of possible different system states in these modes. The distributions of state probabilities are: (3) Two pairs of vectors (**G**_{o},**p**_{o}) and (**G**_{c},**p**_{c}) define system output performance distribution (OPD) in open and closed modes respectively: **G**_{o}={G_{oi}, 1iI}, **p**_{o}={p_{oi}, 1iI} and** G**_{c}={G_{cj}, 1jJ}, **p**_{c}={p_{oj}, 1jJ}. In order to obtain MSS stationary availability, one has to determine fault probabilities in both modes as (4) and apply equation (2). *2.2. The u-function Representation of System (Unit) Output Performance Distribution.* The procedure used in this paper for MSS availability evaluation is based on the universal moment generating function (u-function or u-transform) technique, which was introduced in [18] and which proved to be very effective for high dimension combinatorial problems. The detailed description of the universal generating function applications to two types of important practical systems can be found in [15, 19]. The u-function extends the widely known ordinary moment generating function, which is used for evaluation of reliability of STFM consisting of units with identical performance rates [13]. The universal moment generating function of a random discrete variable X is defined as a polynomial (9) where the discrete random variable X has J possible values and p_{j} is the probability that X is equal to x_{j}. In order to evaluate the probability that the random variable X satisfies condition F(X,X^{*})<0 (for example X-X^{*}<0), one should calculate the sum of the coefficients of polynomial u(z) for every term with x_{j}, satisfying the condition: (10) This can be done using the following operator over u(z): (11) where for individual term : In our case, the polynomial u(z) can define performance rate distributions, i.e. it represents all the possible states of the system by relating the probabilities of each state p_{ok }(or p_{ck})_{ }to the _{ }performance rate G_{ok} (or G_{ck}) of the system at that state. Note that the system OPDs in open and closed modes defined by the pairs of vectors {G_{oi}, 1iI}, {p_{oi}, 1iI} and {G_{cj}, 1jJ}, {p_{cj}, 1jJ} can now be represented as , (12) respectively, and the probability that the conditions and are met (system fault probability) can be determined by operators and . If only units with total failures are considered and each unit has nominal performance rate g, performance rate in fault state and availability q in the given mode, then OPD (**G**,**p**) of a system consisting of the single unit in this mode is (16) The individual u-function of such a unit has only two terms and can be defined as (17) In order to obtain the u-function of a system containing a number of units with given individual u-functions, composition operators are introduced. These operators determine the subsystem u-function expressed as polynomial u(z) for a group of units using simple algebraic operations over individual u-functions of units. All the composition operators take the form (18) and satisfy the following conditions: , (19) for arbitrary k. The function (^{.}) in composition operators expresses the entire performance rate of a subsystem consisting of different units in terms of the individual performance rates of the units. The definition of the function (^{.}) strictly depends on the type of connection between the units in the reliability logic-diagram, i.e. on the topology of the logic-diagram representing the subsystem. The function (^{.}) also depends on the physical nature of system performance rate and on the nature of the interaction of units. In this paper, we consider two types of system performance measures: transmitted flow and operation time. *2.3. Transmitted Flow Model* For a pair of units connected in parallel the total transmitting capacity of the system is equal to the sum of the capacities of its units. Therefore, the function (^{.}) in composition operator takes the form: ^{o}_{pf}(d_{1},d_{2})= ^{c}_{pf}(d_{1},d_{2})=d_{1}+d_{2}. (20) For a pair of units connected in series, the unit with the least capacity becomes the bottleneck of the system. In this case, the function (^{.}) takes the form: ^{o}_{sf}(d_{1},d_{2})= ^{c}_{sf}(d_{1},d_{2})=min(d_{1},d_{2}). (21) Applying the _{} operators with corresponding functions (^{.}) in sequence and using the rules (19), one can obtain the u-function for an arbitrary number of units connected in series or in parallel. Combining the two operators, one can obtain a u-function representing the performance distribution of a system with arbitrary series-parallel topology in both modes. The examples of using composition operators to obtain a MSS OPD can be found in [15, 23]. In order to determine the u-function of an individual flow transmitting unit with total failures (for example, a fluid flow valve) in the closed mode, one should note that in the operational state that has probability q_{c}, the unit should transmit nominal flow f (g=f) and in the failure state the unit fails to transmit any flow (=0). Therefore, according to (17), the u-function of the unit takes the form: u_{c}(z)=q_{c}z^{f}+(1-q_{c})z^{0}. (22) In the open mode, the unit has to prevent flow transmission through the system. If the unit succeeds (with probability q_{o}), the flow is 0 (g=0), and if it fails, the flow is equal to its nominal value in the closed mode (=f). The u-function of the unit in the open mode takes the form: u_{o}(z)=q_{o}z^{0}+(1-q_{o})z^{f}. (23) Having u-functions of individual units and applying corresponding composition operators _{}, one obtains the u-functions U_{c}(z) and U_{o}(z) of the entire system. To determine system reliability, one has to define conditions of system failure and success. For the flow transmitting system, it is natural to require that in its closed mode the amount of flow should exceed some specified value W_{c}, while in the open mode it should not exceed a value W_{o}. Therefore, the conditions of system failure are F_{c}(G_{c},W_{c})= G_{c} -W_{c}<0 and F_{o}(G_{o},W_{o})=W_{o}-G_{o}<0. (26) With these conditions one can easily evaluate system availability using operators and and Eq. (2). *2.4. Operation Time Model* Consider a component (subsystem) consisting of two units connected in parallel. The component disconnection is completed only when the both units, including the slowest one, are opened. Therefore, the operation time of a pair of units in the open mode is equal to the greatest of the operation times of the units. The function (^{.}) in composition operator for the open mode takes the form: ^{o}_{pt}(d_{1},d_{2})=max{d_{1},d_{2}}. (27) For a pair of units connected in series the first disconnected unit disconnects the subsystem in the open mode. Therefore, the function (^{.}) takes the form: ^{o}_{st}(d_{1},d_{2})=min(d_{1},d_{2}). (28) If two units are connected in parallel within a component, the first connected unit makes the component connected. Therefore, the operation time of a pair of units in closed mode is equal to the shortest of the operation times of the units. The function (^{.}) in composition operator for the closed mode takes the form: ^{c}_{pt}(d_{1},d_{2})=min{d_{1},d_{2}}. (29) For a pair of units connected in series, both units, including the slowest one, should be connected to make the system connected in the closed mode Therefore, the corresponding function (^{.}) takes the form: ^{c}_{st}(d_{1},d_{2})=max(d_{1},d_{2}). (30) To determine the u-function of an individual unit with total failures (for example, electronic diode) in closed and open modes, note that the unit operates in times t_{c} and t_{o} with probabilities q_{c }and q_{o}, respectively. If the unit fails to operate, its operation time is equal to infinity (). Therefore, according to (17), the u-functions of the unit for the two modes takes the form: . (31) Applying corresponding composition operators _{} over the u-functions of individual units, one obtains the u-functions representing OPDs of the entire system in both modes. For a system in which operation time is the crucial factor, it is natural to require that in its closed and open modes the operation times should not exceed values W_{c} and W_{o}, respectively. The conditions of system failure are F_{c}(G_{c},W_{c})= W_{c}-G_{c}<0 and F_{o}(G_{o},W_{o})=W_{o}-G_{o}<0. (34) With these conditions one can easily evaluate system availability using corresponding operators (11) and Eq. (2). Note that instead of value , one can use any value , where >max{W_{o},W_{c}}.
**Optimization technique** Finding the optimal series-parallel configuration even for moderate number of units is a complicated combinatorial optimization problem having an enormous number of possible solutions. An exhaustive examination of all these solutions is not realistic, considering reasonable time limitations. As in most combinatorial optimization problems, the quality of a given solution is the only information available during the search for the optimal solution. Therefore, a heuristic search algorithm is needed which uses only estimates of solution quality and which does not require derivative information to determine the next direction of the search. The recently developed family of genetic algorithms is based on the simple principle of evolutionary search in solution space. GAs have been proven to be effective optimization tools for a large number of applications. Successful applications of GAs in reliability engineering are reported in [14,19-22,24-30]. It is recognized that GAs have the theoretical property of global convergence [31]. Despite the fact that their convergence reliability and convergence velocity are contradictory, for most practical, moderately sized combinatorial problems, the proper choice of GA parameters allows solutions close enough to the optimal one to be obtained in a short time.
*Genetic Algorithm* Basic notions of GAs are originally inspired by biological genetics. GAs operate with "chromosomal" representation of solutions, where crossover, mutation and selection procedures are applied. "Chromosomal" representation requires the solution to be coded as a finite length string. Unlike various constructive optimization algorithms that use sophisticated methods to obtain a good singular solution, the GA deals with a set of solutions (population) and tends to manipulate each solution in the simplest manner. A brief introduction to genetic algorithms is presented in [32]. More detailed information on GAs can be found in Goldberg’s comprehensive book [33], and recent developments in GA theory and practice can be found in books [30,31].** **The steady state version of the GA used in this paper was developed by Whitley [34]. As reported in [35] this version, named GENITOR, outperforms the basic “generational” GA. The structure of steady state GA is as follows: |