Implementing a Fuzzy Relational Database and Querying System With Community Defined Membership Values
Smita Dattatri and Karen Joy November 5, 2004
Table of Contents
I. Introduction II. Theoretical Basis for Fuzzy Relational Databases III. Fuzzy Queries
IV. Project Description, Design and Implementation V. Future Work VI. References
VII. Appendix I. Introduction Conventional relational database systems dominate the database market. It could be argued, however, that since these are based on “crisp” data, the resulting data model bears little resemblance to reality. Crisp data refers to data that is precise and is without vagueness; fuzzy data allows imprecise and approximate data values that lie near some known value. Fuzzy databases and fuzzy queries, which utilize imprecise data, may be the preferred modeling alternative for some situations as they more closely represent how humans conceptualize reality through their perceptions, thinking and language. The purpose of this paper is to present an overview of both fuzzy relational databases and fuzzy queries and then to describe a database implementation that was designed to retrieve images using these constructs. Recommendations for future development on this implementation include expanding the functioning level of the current prototype and exploring both aggregate querying and machine learning to implement “smart” database retrievals.
II. Theoretical Basis for Fuzzy Relational Databases Fuzzy relational databases extend the conventional relational database model to allow for representation of imprecise data. In general, each value in a crisp relational database is taken from a specified attribute domain and is strongly typed and thus, the data is essentially homogeneous across all rows in the relation. First normal form enforces atomicity of these values. Fuzzy relational databases, however, may allow heterogeneous data for an attribute. Here values may be non-atomic and non compliant with first normal form as well. To establish its theoretical validity, fuzzy relational database theory is based on fuzzy set theory, which is extended from classical set theory. Zadeh (1965) is credited with developing both fuzzy logic and fuzzy set theory as a way to model the imprecision and uncertainty that is inherent in both the world and language. Most traditional formal modeling tools are crisp and deterministic and assume certainty. (Bedi, et. al., 2002) Classical propositions are required to be either true or false. However, this precision poorly represents how humans experience the world and valuable information is lost when using this modeling approach. In fuzzy theory, Zadeh argued that rather than a dichotomous, true/false edge, set boundaries and the determination of set membership is actually a matter of degree. Movement in, out and between sets is gradual, not abrupt as it is in crisp set theory. Fuzzy propositions have degrees of truth and are not absolutes. This powerful paradigm shift yields a more realistic and precise model. To illustrate, the United States government defines an “adult” as someone who has reached his or her 18^{th} birthday. When conceived of in terms of sets under the classical model, a person’s lifetime is consequently divided into the two equivalence classes, non-adulthood and adulthood. Other than establishing that someone is eighteen years or older, this model provides little extra information. An eighteen year old and an eighty two year old fall into the same set, yet it is likely that in most ways, both perceived and in reality, these individuals are quite different and the valuable information about these differences is lost. Further, a seventeen year old is likely quite similar to the eighteen year old but he/she would not be included in the same set. Under the fuzzy model, however, each individual has a degree of membership of adulthood. Here the transition from non-adulthood into adulthood can be modeled as a gradual process so that, for instance, a mid-teen would have a very low, near zero, membership value but a thirty year old would have a nearly full, or full, membership value. “Adulthood” would be represented by a curve that spanned a lifetime representing different membership values along the curve. Zadeh argued that this model actually represents human experience more accurately. Most people would probably agree that being an “adult” involves much more than reaching a chronological milestone and that conceptually, it refers to attributes such as maturity in judgment and values, experience, physical and mental competence and so forth. The fuzzy model captures these and also allows for representation of additional information in gray areas when, for instance, a person is of chronological age but is at a non-adult functioning level in other capacities (e.g., due to mental impairments, aging processes, etc.). Classical crisp sets are described by a characteristic function that maps an element from the universal set onto {0,1}. Given the universal set, U, a set S is described by its characteristic function CharS as S = {x| x Î U and CharS(x) = 1}. Thus, the characteristic functions of all set members take a value of 1; elements taking a value of 0 would not be included in the set. Fuzzy sets extend the characteristic function and use a membership function, mF(x). mF(x) is a characteristic function that most commonly maps the (crisp) universal set onto the real numbers [0,1]. The range value for x is its membership value m. Instead of being definitively in or out of the set, m gives a grade of membership. It indicates to what degree the element exhibits the quality that the set describes. That is, it corresponds to the degree of the truthfulness of the proposition "Element X is in ." In the classical relational model, null values can be used to indicate imprecision or uncertainty of information. In the fuzzy model, m represents the imprecision. The membership function is often conceptualized as an indicator of the degree of possibility that an element has a particular membership value. That is, possible values that an element can assume form a possibility distribution. The membership function is depicted graphically and simple membership functions can be modeled using a triangle or trapezoid. A triangular membership function is used when some approximate data value exists. This possibility distribution is triangular with the apex for X at m = 1 and a parameter, d, showing the range within which the “close” values lie. For example, consider the set COST = {$1K, $5K, $10K, $25K, $50K, $100K}. (Petry, 1996, p. 52) From COST the set HIGH COST may be defined. To show the degree of membership as a possibility distribution, elements are conventionally written in the form m/set element. The set HIGH COST = {0.0/$1K, .125/$5K, .5/$10K, .8/$25K, .9/$50K, 1.0/$100K}. Thus, the possibility of $1K's membership in the set HIGH COST is 0.00 but the possibility of $50K's membership in HIGH COST is a robust 0.90. Figure 1 shows a representation of the approximate value $5000 using a triangular membership function with a range of $500.
1
m
0 $4.5K $5K $5.5K Figure 1. Triangular Membership Function
A trapezoidal membership function is used to represent fuzzy variables. A fuzzy or linguistic variable is one in which fuzzy sets representing linguistic concepts are used to define the states of the variables. Graphically, these have a rectangular region for a range of values where m = 1 and triangular regions outside the rectangle where m = [0,1). Figure 2 shows the use of fuzzy variables. (Klir & Yuan, 1995, p.15) Temperatures ranging from 0-60 are divided into states and each state is a fuzzy set representing the linguistic concepts Very Low, Low, Medium, High and Very High temperatures. Using fuzzy variables allows for the gradual transition and overlap between the states.
“Very Low” “Low” “Medium” “High” “Very High” m=1
m=0 0 5 10 15 20 25 35 45 50 60 Temperature Figure 2. Temperature represented with fuzzy variables.
In the next example, the linguistic variable tall’s membership function is defined as: 0, If height(x) < 5 ft. tall(x) = (height – 5 ft.)/2 ft., If 5 ft. <= height <= 7 ft. 1, If height(x) > 7 ft. Tall’s membership function is shown graphically in Figure 3. 1 m 0 5 ft. 7 ft. Figure 3. Graphic Representation of Tall(x)
As Table 1 shows, each person is then assigned a degree of membership into the fuzzy subset tall using this function. Person | Height | Degree of Tallness Billy3’2”0.00 Yoke5’5”0.21 Erik5’10”0.42 Kareem7’2”1.0
Table 1. Degree of Tallness Using Tall(x) Function Thus, the degree of truth of the statement “Erik is tall” is 0.42. (Example modified from Carnegie Mellon University’s Computer Science Department Newsgroup) The support set and alpha cuts are used to further refine membership. The support set is a subset of the universal set and it includes only the members with m > 0. In the COST example above the element $1K is not included in the support set of HIGH COST. As Li & Liu (1990) describe, the alpha cut of a set creates an "ideal set" by establishing a threshold value of m for inclusion. Elements farther from this ideal set exhibit smaller degrees of membership. Those elements that are within the ideal set have a membership value of 1. In this example, using an alpha cut value of 0.50 on HIGH COST, the set “HIGH COST.50” = {$10K, $25K, $50K, $100K}.
Fuzzy Set Operations Fuzzy set theory must have operations equivalent to those in crisp set theory. The extension principle, introduced by Zadeh (1975), allows for the generalization of Boolean logic of classical mathematical crisp sets to fuzzy sets. Operations are extended to fuzzy sets in this manner using the membership function. Given fuzzy sets A, B with membership functions mA(X), mB(X) respectively, the following operations are defined: Fuzzy Set Equality: A = B <=> mA(X) = mB(X), that is, two fuzzy sets are equivalent when their membership functions are equivalent. Fuzzy Set Containment: B Ê A <=> mB(X) >= mA(X), that is, B contains A when B's membership function is greater than A's membership function. Fuzzy Set Complement: ØA = {X/(1 - mA(X))} Unlike crisp sets where C intersect ØC yields the empty set, a fuzzy set and its complement are not necessarily exclusive sets and may overlap.
Several functions have been used to define set union and intersection. However, Bellman and Giertz (1973) showed that only max and min have these following desirable properties: 1. The grade of membership in a compound fuzzy set depends only on the grade of membership in the subset that formed it; 2. Are commutative, associative and distributive; 3. Are continuous and non-decreasing with respect to each of their arguments; 4. Min{1,1} = 1 and max {0,0} = 0. Using these properties of min/max, union and intersection are defined as: Fuzzy Set Union: A union B <=> m (A union B)(X) = Max (mA(X), mB(X)) Fuzzy Set Intersection: A intersect B <=> m (A intersect B)(X) = Min (mA(X), mB(X)) Using these defined operations it can be shown that properties that hold for crisp set operations such as commutativity, associativity and DeMorgan's law are true for fuzzy sets as well. (Petry, 1996) Fuzzy sets also use several operators that do not exist in crisp set theory. The concentration, dilation and intensification operators are used for linguistic hedges. Linguistic hedges are linguistic terms that modify other linguistic terms. Terms such as extremely, not so much and slightly are examples of linguistic hedges. These are used to modify a fuzzy proposition or truth-value. For example, to modify the linguistic term "wealthy," hedges may be used to express the ideas "extremely wealthy" or "somewhat wealthy." Linguistic hedges are meaningless and thus non applicable with crisp predicates (Klir & Yuan, 1995). For example, if a car's color is red in a crisp set, it is meaningless to use hedges to suggest a car is "slightly red" or "extremely red". Linguistic terms are discussed further below. Concentration, dilation and intensification are defined as: Fuzzy Set Concentration, Con(A): m (Con(A)x) = (mA(X)) ^ p Concentration reduces the membership values proportionally more for weaker members. Fuzzy Set Dilation, Dil(A): m (Dil(A)x) = (mA(X)) ^ (1/p) Dilation increases the membership values proportionally more for weaker members. Fuzzy Set Intensification, Int(A) Intensification sets a m level boundary and increases the membership values of those members that are within the boundary and reduces those that are outside the boundary. Thus, the contrast between members and non-members is intensified. The values used for p are chosen by consensus. For instance, the hedge "very" may use concentration with a p value of 2, whereas the hedge "extremely" may use a p value of 3 or more. To illustrate, concentration is used to express "very", i.e., Very(A) = Con(A). Using the set HIGH COST = {0.0/$1K, .125/$5K, .5/$10K, .8/$25K, .9/$50K, 1.0/$100K} from above, the concentration function with a p value of 2 on HIGH COST yields the set VERY HIGH COST = {.016/$5K, .25/$10K, .64/$25K, .81/$50K, 1.0/$100K}. Thus, weaker members were weakened further by use of this function. Fuzzy Relations The definition of a fuzzy relation is generalized from Codd’s definition of a traditional relation. (Codd, 1970) In the classical relational model, a relation r with attributes {A1, A2,. . .,An} is a subset of the Cartesian product of the domains of A1, A2,..., An. R consists of a set of tuples and each tuple is a true proposition that fully satisfies the relation. An implied identity relation that creates equivalent classes over the domain eliminates redundant tuples. A fuzzy relation, fr, similarly is a set of fuzzy tuples taken from the Cartesian product of the domains of fuzzy attributes. Here a tuple exists in fr when its membership value m meets a specified truth value, 0<=m<=1. Thus, each tuple satisfies the relation to some degree. In crisp relational databases, an identity relation forms equivalence classes, effectively removing redundant tuples. Fuzzy relations replace the identity relation with a similarity relation to eliminate redundancy. In fact, the identity relation used in classical relations is a special case of the similarity relation employed by the fuzzy model. (Zadeh, 1970) Fuzzy relations may be characterized by type. A fuzzy equivalence (or alpha-similar) relation is a relation with the properties of reflexivity, symmetry and transitivity. It divides the fuzzy relation into similarity classes for each element, X, in the set, grouping the elements into those that are similar to others to some specified degree. A similarity class is defined as a fuzzy set where the membership value of any of the set members represents the degree of similarity of that element with X. (Klir & Yuan, 1995) A fuzzy equivalence relation allows a relation to capture the imprecision of fuzziness and allows the comparison of fuzzy domain values as well. A fuzzy proximity (or alpha-proximate) relation is a relation where the properties of reflexivity and symmetry hold but the restriction of transitivity is relaxed. A fuzzy proximity relation also divides the relation into classes but unlike similarity relations where the boundaries of the similarity classes are static, here the partitions may change as the data values change. Without the restrictions of the transitive property, a relation’s representation of data is more flexible and thus, potentially is more accurate. (Petry, 1996) Integrity Constraints and Functional Dependencies in Fuzzy Relations As with conventional relational databases, fuzzy databases enforce functional dependencies during data insertion and update operations in order to maintain data integrity. Chen (1998) suggests that functional dependencies in fuzzy relations raise some additional issues. For example, what does it mean to say that X determines Y when X and Y are imprecise values? How is the degree of the proposition determined when the degree of truth belongs to [0,1] instead of {0,1}? Or, to what degree does X determine Y when different pairs from the same tuples yield different truth-values? To take these issues into account a general form of fuzzy functional dependencies exists that uses a measure of closeness. Closeness refers to the degree of similarity of same-set elements in contrast to their dissimilarity to elements of other sets. Thus, to say that X functionally determines Y means that close Y-values correspond to close X-values to some specified degree Q, Q Î [0,1]. This is commonly written X ® QY. Armstong’s axioms are also extended from the classical relational model allowing the inference of additional integrity constraints using reflexivity, augmentation and transitivity. (Chen, 1998) The concept of primary and foreign keys is also extended to fuzzy relations. A Q-key can be defined which determines each of the values in a tuple. Using the Q-key allows the extension of entity and referential integrity. (Chen, 1998) In the classical relational model, entity integrity requires a key value to be non-null. In the fuzzy model, however, the key may be defined to include imprecise values. Thus, referential integrity in the fuzzy model is based on a closeness match rather than a precise match between values. The normal forms are also extendable from the classical relational model to the fuzzy model to further enhance integrity. (See Chen, 1998, for a thorough treatment of the fuzzy normal forms.) Fuzzy Relational Database Implementation Issues As discussed above, an attribute can be described as a possibility distribution where each possible value of an attribute is written with its membership value and all of the elements form one set. (See Table 2.) However, using sets to describe an attribute challenges the first normal form of the relational model of relational database theory; using multi-valued attributes can complicate update and retrieval operations. Date (2004) recommends these be avoided because a simpler structure allows simpler database operations as well. (See Table 3.) In the current project implementation, a simple logical design wherein each of the possible attribute values and membership values was stored in a unique tuple allowed for storage and retrieval of fuzzy information without creating these additional issues. It is worth emphasizing again that storing the membership value—the measure of the truthfulness of the proposition—actually gives a proposition greater precision and strengthens its truthfulness and thus, it really isn’t fuzzy at all. (Date, 2003) | |