An Introduction and Brief Examination of Object-Oriented Data Modeling
By Nick Cummings
Table of Contents
1. Introduction
2. Background
2.1 Object-Oriented (OO) Paradigm
2.1.1 Object-Oriented Data Model Concepts
2.1.2 Benefits of OOP
2.2 Why Current Data Modeling Is Insufficient for Object Storage
2.3 Object-Oriented Data Modeling History
3. Object-Oriented Data Modeling
3.1 Object Persistence
3.2 Pure OO Data Model
3.2.1 Current Database Relationships
3.2.2 Relationships Implied by Inheritance and “Swizzling”
3.2.3 Relationship Integrity
3.3 Hybrid O/R Data Model
3.3.1 How Can the Classical Relational Model Support New Kinds of Data?
3.3.2 Domain = Object Class
3.3.3 Problems with SQL
4. Real World Applications
4.1 Pure OODBM
4.2 O/RDBM
4.3 RDBM: IBM’s DB2
5. Conclusion
Imagine you are the owner of a telephone service provider. To manage your company effectively you need to keep track of some specific details accurately. For starters, you will need information on your customers, including but not limited to: name, address (state, zip code, street, apartment number, etc.), phone number, number of lines, type of service, and credit card information. You must also keep track of similar information about your employees.
Assuming you have 10,000 customers and 500 employees, and only 10 attributes of information for each, you already have 105,000 separate, yet related, pieces of data that must be stored and managed. This does not take into account financial information (transactions, in-house expenses, etc.), office information (addresses, who is in which office, office phone numbers, etc.), extended services (DSL, long distance, call waiting), or detailed customer/employee information, and it still represents a tiny telephone service provider.
As you can see, a business must store and organize many details about itself, so data management is an important and potentially daunting task. The emerging problem is the need to store and manage complex data structures. This can be seen in application design, which has moved from procedural architectures to object-oriented ones; as business and application software becomes more complicated, so must the means of storing its data.
Fundamentally, a data model is designed to mimic real-life situations in order to store data accurately, but there is a problem: real life is more complex than the data current management systems can support. Future databases will need to handle much more sophisticated types of data than current commercial systems typically do. These sophisticated data structures are a necessity for real-world data modeling and data requirements. As a result, “one of the slowest changing areas in computing today [is changing]… Right now database designers are dealing with the emergence of a new logical data model – the object oriented data model…”. The object-oriented data model (OODM) is ideal for real-world situations because it is specifically designed to look at data before defining program logic. As a result, the program is designed to fit the data rather than making concessions in the data structures to fit the program logic. This allows us to identify program objects, discussed later, which are analogous to real-world objects.
To understand the specifics behind OODM it is best to look at the two technologies that led to its development: the object-oriented paradigm and current relational data modeling. Lastly, we examine previous object-oriented database management systems (OODBMS) to show how varied an approach one can take when trying to create an OODM.
The Object-Oriented Paradigm is the basis behind object-oriented programming (OOP) and represents a new approach to program organization. When taking an object-oriented approach, the goal is to combine data and the functions that operate on that data into a single unit, called an object. An OO program is, typically, a number of objects that communicate with each other using one another’s member functions. We will discuss later how this approach simplifies writing, debugging, and maintenance, and gives the program a closer relation to the real world.
Before we begin talking about the OO paradigm we look at something that has nothing to do with programming at all. This case in point, given by Jan Harrington, identifies some fundamental aspects behind the OO Paradigm.
First, assume you are trying to teach a person how to play some various card games. Your first set of instructions is for the form of Solitaire called Klondike. You decide to break the instructions for the game into two parts: information about the game, and possible questions your student may have. So you produce a set of instructions that looks like: 
Information about the Game:
Dealing: Left to Right – First Pass: First card face up, six cards face down…
Playing: One or Three cards can be turned up at a time. As encountered, put aces on top...
Winning: All cards built on top of their aces.
Questions to Ask:
What is the name of the game?
Read Name section.
How many decks do I need?
Read Decks section.
How do I deal the Game?
Read Dealing section.
How do I play the Game?
Read Playing section.
How do I win the Game?
Read Winning section.
The next game that you write instructions for is “Go Fish” using the same pattern for this set of instructions as used for Klondike. Next, using the same pattern, you write the instructions for “War”. Although the three games have very different rules of play, and goals, they can all be described using the same pattern.
This example is very similar to an Object-Oriented approach, and we can use it to identify some key data concepts: Objects, Classes, Encapsulation, and Inheritance.
Objects: An object is a self-contained element used by the program. In this case: each individual game (i.e.: the Klondike Object).
Classes: “A class is… a description of a number of similar objects” , the logical structure of an object. In this case: the template used for each card game.
(Template for the CARDGAME Class)
Information about the Game: Class Variables
Questions to Ask: Class Methods
What is the name of the game?
Read Name section.
How many decks do I need?
Read Decks section.
How do I deal the Game?
Read Dealing section.
How do I play the Game?
Read Playing section.
How do I win the Game?
Read Winning section.
Encapsulation: The object, having both attributes and methods, is complete in itself. External programs know nothing of its internal structure and do not need to.
Inheritance: Many objects have methods in common. Inheritance is used to reduce duplication.
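These four concepts can be sketched in code. Below is a minimal Java sketch of the card-game example; all class and method names are invented for illustration, not taken from any real library:

```java
// A class is the template; each game is an object (an instance of the class).
// The fields are encapsulated: outside code can only use the public methods
// (the "questions to ask").
abstract class CardGame {
    private final String name;   // "Information about the Game"
    private final int decks;

    protected CardGame(String name, int decks) {
        this.name = name;
        this.decks = decks;
    }

    // "Questions to Ask" become methods shared by every game via inheritance.
    public String getName() { return name; }
    public int getDecks()   { return decks; }

    // Each game must supply its own answer to this question.
    public abstract String howToWin();
}

// Klondike inherits the common structure and fills in its own rules.
class Klondike extends CardGame {
    Klondike() { super("Klondike", 1); }
    public String howToWin() { return "All cards built on top of their aces."; }
}

class GoFish extends CardGame {
    GoFish() { super("Go Fish", 1); }
    public String howToWin() { return "Collect the most sets of four."; }
}
```

Here Klondike and GoFish are two object instances derived from the one CARDGAME template, exactly as in the instruction-sheet analogy.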
In our example we identified the objects: various card games. Then we designed a general template, class, for all card games that would fit the information we wanted. Finally, we filled in the information for the different games. Our finished product is a general template for all card games, and three object instances (Klondike, Go Fish, War) all with their own individual information. But how does this approach help us?
The Object-Oriented Paradigm was developed because of limitations discovered in earlier approaches to programming. Robert Lafore identifies these limitations in procedural and structured programming through examples of increasingly sized programs.
Languages such as C, Pascal, and FORTRAN are procedural: each statement in the program tells the computer to do something, so the program is a list of instructions. Structured programming is similar to procedural programming but divides the program into functions, each with a defined purpose and an interface to other functions. However, the principle is the same: a grouping of components that execute a list of instructions.
As these lists grow larger and larger they become increasingly difficult to maintain or even comprehend. Robert Lafore states the procedural and structured paradigm’s flaw rests in two related problems: First - potentially unlimited access to global data, and Second – unrelated functions and data:
Unrestricted Access – there are two kinds of data in a procedural program: local data, hidden inside a function and used exclusively by that function; and global data, which can be accessed by any function in the program.
The problem is, whenever more than one function accesses the same data it must be made global. As the program is modified and updated it is possible that these connections will become varied and numerous. As the program grows larger, the number of different connections increases. Now, suppose we need to change a data value from a short to a long: every function that accesses the data must be identified and appropriately changed. Any inconsistencies could have consequences throughout the program.
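The short-to-long scenario can be sketched in a few lines of Java (the counter and its names are hypothetical). With a global, every function in the program is a potential use site to review; with the field hidden inside a class, the type change is confined to that one class:

```java
// Procedural style: a global that any function may read or write.
// Changing its type (say, from short to long) forces a review of
// every function in the program that might touch it.
class GlobalsStyle {
    static long lineCount = 0;   // was a short; every caller must be checked

    static void addLine() { lineCount++; }
}

// OO style: the data is private, so only LineCounter's own methods
// touch it; a type change stays local to this class.
class LineCounter {
    private long count = 0;      // type change is invisible to callers

    public void addLine()  { count++; }
    public long getCount() { return count; }
}
```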
Unrelated Functions and Data – this can also be thought of as inaccurate “real-world modeling”. The arrangement of separate data and functions in procedural programming is a poor model of the real world. In an OO program there is a close correspondence between the real-life objects being modeled and the objects in the program. A “card game” object in the OO world represents that card game in the real world; everything about a card game is included in its class description. This gives us an excellent way to conceptualize our program, and it is a huge leap over procedural programming, which can have global variables and functions distributed all over the place.
The object-oriented approach represents a powerful way to cope with these kinds of problems and, in turn, with complexity. The benefits of the object-oriented approach are numerous, but two stand out.
As a result of these organizational and real-world benefits, an OO program can be easily conceptualized. Additionally, since each object is complete in itself, you know exactly which functions access which data: the object’s functions access the object’s data, and nothing else can access it. This simplifies writing, debugging, and maintenance because everything is clearly mapped in the classes. You can see how each object accesses its own data and how objects interact, as opposed to tracing how the program as a whole executes. If you are not convinced of the benefits of an OO approach, you will just have to try it out yourself.
The relational data model was first introduced by E. F. Codd in 1970. Relational data modeling is now the industry standard for database design and is the foundation of the majority of today’s database management systems (DBMS). Data is stored in relations (two-dimensional tables) that adhere to the following characteristics:
Example of relation:
The real-world connection between relations is represented using foreign keys: an attribute of relation B is placed into relation A as a foreign key. This associates every row of A with a row of B. For example, let B be buildings and A be apartments: put the building name into a row of A, and that particular apartment is associated with exactly one building, while one building can be associated with many apartments.
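The building/apartment pattern can be sketched as plain records (the field names here are hypothetical). Note that the apartment carries only the building’s key, never a direct reference to a building object:

```java
// Relational style: APARTMENT carries a foreign key (the building's name)
// rather than an in-memory reference to a Building object.
class Building {
    final String name;                 // primary key of the BUILDING relation
    Building(String name) { this.name = name; }
}

class Apartment {
    final int number;
    final String buildingName;         // foreign key referencing Building.name
    Apartment(int number, String buildingName) {
        this.number = number;
        this.buildingName = buildingName;
    }
}
```

Many Apartment rows may carry the same buildingName (the "many" side), but each apartment names exactly one building (the "one" side).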
Not all relations that follow the minimum definition are effective or appropriately structured. For some relations changing data can cause modification anomalies (the situation that exists when inserting or deleting a row from a table inserts or deletes, respectively, facts about two or more themes ). Normalization, a technique for checking the quality of a relational design and applying additional tables or constraints, is used to eliminate such anomalies.
As you can see relational models must adhere to very strict guidelines. Due to the complexity of the objects we want to store, our current relational database management systems are a poor choice for storage: pointers need to be switched to permanent IDs, relational structures must be defined to represent objects, and code must be written to interface with the DBMS . These additional steps are precisely what make current relational database management systems insufficient for object support.
Now that the reader has a fundamental understanding of the two guiding points behind OODM we look at some of the recent attempts in this field:
As established, the goal behind this new data model is the ability to store complex data structures that current database management systems are unable to support. This requirement is called object persistence and can be achieved in any number of ways. However, all techniques for achieving object persistence fall into two fundamental categories: “pure” OODMs based on the object-oriented paradigm (pure object-oriented data models), and hybrids of current relational data models with object support (object/relational data models, O/RDM). The two examined here are a “pure” OODM identified by Jan Harrington and an O/RDM developed by C. J. Date and Hugh Darwen.
The goal behind OODM is the storage of complex objects. This seems like a simple thing since we are already dealing with applications that are designed for storage, but as we will see there are a number of intermediary steps that must be done in order to handle complex objects outside of an OO world. First, we examine what data needs to be managed in order for an object to be stored.
In order to store an object we must store all data known about it. Then we store, in a separate place to be accessed by other objects and derived classes, all the inherent methods. By this point we are already asking far more from our relational model than it is designed to support without making concessions. Finally we must store all the relations this object has with any and all other objects. Now we are asking for more than current OO Programming can handle.
In an OO program, objects use pointers to reference one another. However, a pointer is an in-memory address and is therefore only available during execution; the pointer disappears when the program terminates. A technique known as “swizzling” is used to convert this in-memory pointer into a unique, permanent identifier for the object. To store a single object, then, we must store its data and its methods, and swizzle its pointers into permanent identifiers.
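Swizzling can be pictured as assigning each object a stable identifier the first time it is saved; in memory we keep references, but on disk we would store only these identifiers. A minimal sketch (the OID scheme and class name are invented for illustration):

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Assigns each object a permanent identifier (OID) on first save.
// The in-memory pointer is transient; the OID is what gets stored.
class OidRegistry {
    private final Map<Object, Long> oids = new IdentityHashMap<>();
    private long next = 1;

    // "Swizzle out": convert an in-memory reference into a permanent ID.
    public long oidFor(Object obj) {
        return oids.computeIfAbsent(obj, o -> next++);
    }
}
```

Loading would perform the reverse mapping, turning stored OIDs back into live references.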
Now we look at the steps necessary to achieve object persistence using traditional file processing, a relational DBMS, and an ideal OODBMS.
Traditional file Processing:
An ideal OODBMS operates in the OO Architecture and eliminates the need to define an intermediary structure between objects and storage methods. It is also not necessary to incorporate additional languages to handle short-comings with object management or swizzling. Furthermore, since the OODBMS has an OO architecture it is much easier to use in conjunction with other OO languages. Ideally, all that is needed to handle object support in an OO Architectural Database is an appropriately coded “save” method.
The first model is identified by Jan Harrington. Harrington states that the OODM is a direct outgrowth of the object-oriented paradigm; therefore, the entity objects used by OO programs are directly analogous to the database entities used by a “pure” OODBMS. Using the object-oriented paradigm as a guiding tool, Harrington identifies key points to address in order to construct an OODM.
In order for our new model to be effective it must, at a minimum, be able to model the traditional relationships supported by current DBMSs. The “pure” OODM handles these, and other relations, using an object identifier.
An object identifier is a hidden, permanent (as long as the object is part of the database) internal identifier, assigned and used only by the DBMS for each individual object. Harrington’s model is general, so the object identifier can be anything from a memory address to the location of a file. The use of these object identifiers implies that the only way to query or traverse the database is through predefined identifiers; therefore, the “pure” OODM is navigational. This limits programmers and users to predefined relations, but in a well-designed database it may increase performance (using object identifiers to locate data is faster than joining tables).
Now, we see how the “Pure” OODM utilizes these Object Identifiers to accomplish One-to-Many and Many-to-Many relationships:
1-N: “One Building has many Apartments”
N-M: “A CUSTOMER may buy many products and a PRODUCT may be purchased by many customers.”
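One way to picture these two cases is with record sketches in which relationship attributes hold OIDs rather than foreign-key values (all names here are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// 1-N: a Building stores the OIDs of its many Apartments,
// and each Apartment stores the single OID of its Building.
class BuildingRec {
    long oid;
    Set<Long> apartmentOids = new HashSet<>();
}

class ApartmentRec {
    long oid;
    long buildingOid;                            // exactly one building
}

// N-M: both sides hold a set of OIDs. No intermediate table is
// required, because an attribute may hold arbitrarily complex data.
class CustomerRec {
    long oid;
    Set<Long> productOids = new HashSet<>();
}

class ProductRec {
    long oid;
    Set<Long> customerOids = new HashSet<>();
}
```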
These relationships in the OO world are very similar to those in the Relational world; the difference is, here we are taking advantage of the Object Identifier. Since we need to store objects of arbitrary complexity we automatically require very general storage: meaning we can store pretty much anything as an attribute. This allows us to store large files such as multimedia, object methods, and other complicated data - i.e.: Object Identifiers. Now Harrington’s theoretical model is capable of: storing very complicated data with no restrictions to what can be stored in an attribute, swizzling using Object Identifiers, and all Relational Model relationships. Next we see how relations implied by Object-Orientation are handled.
Now that we have bridged the gap between our OODM and the current RDM, we must extend the model to handle another form of relationship, implied by inheritance. Harrington’s model supports single and multiple inheritance through interfaces. Since interfaces are all that is inherited, the behavior of the parent is the only thing passed on. This causes a problem with multiple inheritance, since no two interfaces in the same hierarchy may have the same name; as such, method overloading is not supported. The way the OODM handles relationships implied by inheritance is similar to the way it handles the previously defined relations: the object identifier is adapted at the class level, and each class holds the identifiers of its derived classes and vice versa.
“Is A”: SALARIED/ PART-TIME ‘is an’ EMPLOYEE
As you can see the Object Identifiers are the trick to all of these relations. By adding support for more complicated data structures and a simple dedicated identifier attribute on the class and object level Harrington is able to make connections between complex objects, and store them. This model fully accomplishes object persistence but let us examine how Harrington makes these identifiers synch up.
To make sure the identifiers representing object relations match we use a technique similar to referential integrity that Harrington calls: Relationship Integrity.
Referential integrity is the set of rules that specify what should take place when an insert, update, or delete happens on a parent or child. For example, in order for an employee to be a member of the Furniture Department, the Furniture Department must exist. When deleting the Furniture Department we must also delete all employees of the department; otherwise we have employees of a non-existent department.
Relationship integrity is the way Harrington checks that the object identifiers on both sides of a relationship match. This is accomplished using inverse relationships: there is an attribute in the base class, EMPLOYEE.children, and an attribute in the derived class, CHILD.parent. The OODBMS offers syntax, similar to the constraints used in referential integrity, that specifies where the inverse object identifier should appear. This code might look like:
children : (set) CHILD
    inverse is CHILD.parent
parent : EMPLOYEE
    inverse is EMPLOYEE.children
Just as it is the responsibility of the relational database designer to specify referential integrity, it is the job of the OO database designers to enforce relationship integrity using inverse relations.
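In application code, keeping the two ends in step amounts to updating both sides in a single operation, so the inverse can never disagree with the forward direction. A sketch (class and method names are illustrative, not Harrington's):

```java
import java.util.HashSet;
import java.util.Set;

// Maintaining an inverse relationship: adding a Child to an Employee
// sets both directions at once, so the two sides always match.
class Employee {
    final Set<Child> children = new HashSet<>();

    void addChild(Child c) {
        children.add(c);   // forward direction: EMPLOYEE.children
        c.parent = this;   // inverse direction: CHILD.parent, kept in sync
    }
}

class Child {
    Employee parent;       // inverse is Employee.children
}
```

An OODBMS enforcing relationship integrity would perform this pairing automatically from the declared `inverse is` clauses.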
Before we get started discussing Date and Darwen’s theoretical model I must first warn you that the relational world that Date and Darwen work within is very specific. Many prescriptions, proscriptions, suggestions, clarifications of terminology and a defined “new relational algebra” outline the relational world that Date and Darwen work in. What you must understand is that while this world is very specific there is one fundamental finding that validates their work.
This second model is an extension of the traditional relational model to include object support. Date and Darwen take this approach for a very logical reason: “it would be unthinkable to walk away from so many years of solid relational research and development”. The goal Date and Darwen attempt to achieve is therefore a dramatic extension of the range of data that can be kept and manipulated in our database. But how can we support new kinds of data within the classical relational framework?
In order to support new kinds of data in our classical framework Date and Darwen identify a crucial question: “What concept is it in the relational world that is the counterpart to the concept object class in the object world?”
Earlier we defined Object Class as the template used to derive all objects of that class. Since an object class is the basis for all objects, if we were to find the counterpart to Object Class in the relational world we would then be able to find the counterpart to Object. Furthermore, if the counterpart to object is already part of the relational world it is possible that the relational model already has some form of support.
This will be Date and Darwen’s approach to the O/RDM. Once we have found the closest connection to Objects in the relational world, we can figure out how to store and manage them.
We mentioned earlier that an object class is, fundamentally, a user-defined data type. Therefore we could consider numbers, in general, to be a very simple class, and an integer could be an object of this class. Our integer object has attributes (the value of the integer) and methods (operators such as = and +). Here is how the code might look if INTEGER were a class with methods setvalue() and add():
int x = 3;    corresponds to    INTEGER X; X.setvalue(3);
x + 3;        corresponds to    X.add(3);
Also note that our integer class is only accessible through these methods and its attributes can only be altered using the same methods. So Object Class is a data type, but how does this help us? It helps because “a domain is nothing more nor less than a data type – possibly… system-defined like INTEGER or CHAR, more generally user-defined like PART#, QTY, [or LASTNAME]” 
A domain is a named set of all the possible values that an attribute can have. For example, given a domain LASTNAME, every LastName field would draw its values from this domain; here, our domain is nothing more than a named string data type. In the relational world that Date and Darwen operate in there is no limitation on the complexity of a domain. It would therefore be possible to have a domain ADDRESS(street, city, apt. number, zip code, …), so that every ADDRESS field within a table would be an instance of the ADDRESS domain, and operators similar to those defined for INTEGER could be defined for these more complicated domains. (Multiplying addresses would make no sense, but given PART# and QTY objects, we should be able to multiply the two to get a shipment weight.)
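The idea that a domain is a full data type with its own operators can be sketched with two small value types. The names and the weight-per-unit interpretation are assumptions made for illustration, not part of Date and Darwen's text:

```java
// A domain is a data type of arbitrary complexity whose values are
// manipulated only through the operators defined for it (encapsulation).
final class Qty {
    private final int value;
    Qty(int value) { this.value = value; }
    int asInt() { return value; }
}

final class UnitWeight {
    private final double kg;                 // weight of one part, in kg
    UnitWeight(double kg) { this.kg = kg; }

    // An operator defined for the type: unit weight times quantity
    // yields a shipment weight.
    double shipmentWeight(Qty q) { return kg * q.asInt(); }
}
```

Outside code cannot reach into `value` or `kg`; it can only use the declared operators, just as the quoted definition requires.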
Both domains and object classes, then, can be considered data types, each with attributes (values) and methods (operators), and each encapsulated (a string does not need to know about an integer in order to function appropriately). We can therefore say that Domain is the answer to our question, which is exactly what Date and Darwen conclude; they go on to define both object class and domain as:
“…a data-type, system or user-defined, of arbitrary internal complexity, whose values are manipulable solely by means of the operators defined for the type in question.” 
Now we have reached the most important part of Date and Darwen’s model. In the end we find that the counterpart to the object class in the relational world is the domain. Therefore, if a relational DBMS implemented the relational model correctly (as defined by Date and Darwen) it would already have object support. According to Date and Darwen, our problem is not in our data model but in our management software.
As the previous section identified, the hybrid data model can be achieved through the “correct” implementation of the relational data model (as defined by Date and Darwen). If current commercial DBMSs are designed to implement the relational data model, then where is the flaw? Since SQL is the most prominent, the authors use it to identify a major flaw: no support for domains beyond simple data types such as images, dates, times, numbers (int, long, short), and strings.
Ideally, the SQL relational standard should be able to store and manage data of any complexity in every attribute for an object instance. Since the general variation of SQL is unable to store complicated objects without additional work, Date and Darwen see it as being unsuitable for future Relational Database Modeling.
Now we have identified, essentially, three different data models that can be considered for database management systems (“Pure” OO, Object/Relational, Relational), but which is better? Here we will compare the three models and how they fare in real-world situations.
First is the “Pure” object-oriented model. Since these models are quite new we look at hypothetical implementations as well as some commercial attempts.
Using Harrington’s hypothetical model in a test case called “Mighty-Mite Motors”, we try to reengineer the database system so that it uses an OODM. At the end of the test case we identify some noticeable areas where the OODBMS excels and struggles. For starters, the OO approach avoids data duplication: the table MATERIALS_NEEDED(Qty) appears to know nothing about itself (not even what material is needed), but in actuality the object identifiers hidden from the user know exactly which other objects are related to each MATERIALS_NEEDED instance. Second, this technique allows us to reuse structural elements. For example, define an object class ADDRESS(street, city, state, zip code, …) and use it to define the address of every customer: CUSTOMER(Name, ADDRESS, …). The structure of the ADDRESS class is then used in every instance of a CUSTOMER; this helps organize the database and allows us to reuse useful templates.
A problem with this approach was identified earlier but emerges in this example. When creating direct many-to-many relationships it is possible to lose precision. In this example it occurs when relating features to models, many features can be on one model and one feature may belong to many models. It is possible to associate the wrong feature with the wrong model. This forces us to create an additional table between Features and Models for clarity.
In general, an OO data model is ideal for situations that deal with complicated data rather than a high transaction volume. In fact, it is very well suited to fields such as artificial intelligence and computer-aided design and manufacturing; J.P. Morgan uses an OODM to model financial instruments (e.g., derivatives and bonds), and airplane constructors use OODMs to directly manage aircraft parts. It suits basically any industry that has to store and manage very complicated data, but not an overwhelming amount of it. Finally, the OODM excels in its natural relation to the OO architecture; because of this, an OODBMS is easily used from any OO programming language with only negligible compromises, if any.
Although Date and Darwen established that a true relational database is sufficient for storing objects, it is also argued that current relational DBMSs are not true implementations of the relational model. For that reason we scrutinize the design and performance of SQLJ and JDBC (commercial object/relational database interfaces) in real-world applications.
SQLJ is a language designed and standardized to embed SQL support in Java. SQLJ consists of special syntax for embedding SQL statements, a translator that converts SQLJ to Java, and an execution model that executes the SQL commands via an underlying application program interface (API), commonly JDBC.
JDBC, based on a previous standard for database programming (ODBC), defines a set of interfaces for SQL-based database access from Java. When executing SQL statements you call methods in the JDBC library using Java. The SQL-99 standard (a proposed O/R standard) introduces SQL object types, which are comparable to the objects and classes used in Java. JDBC offers two ways to make this connection between SQL object types and Java objects/classes in order to accomplish object support.
By default, JDBC maps SQL objects and object references to the generic java.sql.Struct, which has methods to get the attributes present in the SQL object. Alternatively, SQL object types can be mapped to custom Java classes using the interface java.sql.SQLData. This alternative offers the methods readSQL() and writeSQL(), which allow you to specify how data is read and written between the Java object and the database. Finally, JDBC offers a special Java interface called ResultSet, used to represent tabular data.
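A custom mapping via java.sql.SQLData might look like the sketch below. The SQL object type ADDRESS and its two attributes are assumptions for illustration; the readSQL/writeSQL signatures are the real JDBC interface:

```java
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Maps a hypothetical SQL object type ADDRESS(street, city) to a Java class.
// A driver configured with this mapping calls readSQL/writeSQL instead of
// returning the generic java.sql.Struct.
class Address implements SQLData {
    public String street;
    public String city;
    private String sqlTypeName = "ADDRESS";

    @Override
    public String getSQLTypeName() { return sqlTypeName; }

    @Override
    public void readSQL(SQLInput in, String typeName) throws SQLException {
        sqlTypeName = typeName;
        street = in.readString();   // attributes are read in declaration order
        city = in.readString();
    }

    @Override
    public void writeSQL(SQLOutput out) throws SQLException {
        out.writeString(street);    // written back in the same order
        out.writeString(city);
    }
}
```

Registering Address in the connection's type map (Connection.setTypeMap) would let the driver hand back Address objects directly from queries.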
Performance concerns when using SQLJ and JDBC arise due to the variety of interfaces used to access the database (there are JDBC drivers for almost every conceivable DBMS) and various ways of handling SQL statements and batching. The problem of having multiple interfaces just makes the whole process all the more complex but is not really a technical problem.
The problem with SQL statements lies in the two different approaches to coding them. The first is a ‘prepared statement’, which verifies its metadata once; a plain ‘statement’ verifies it on every execution. Another issue is batching. Batching sends multiple SQL statements to the server at once, but some DBMS products limit the batching that you may do (e.g., Oracle only supports batching for prepared statements). The problem with batching is similar to the problem with statements: do you batch statements or prepared statements together? Which offers the best performance increase for which JDBC drivers?
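Combining the two ideas, a prepared statement that batches its inserts might be sketched as follows. The table and column names are hypothetical, and a live Connection is assumed, so this illustrates the shape of the API rather than a complete program:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class CustomerBatchInsert {
    // With a PreparedStatement the SQL is parsed and its metadata
    // verified once; each addBatch() only supplies new parameter values.
    static final String INSERT_SQL =
        "INSERT INTO CUSTOMER (NAME) VALUES (?)";

    static int[] insertAll(Connection con, List<String> names)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();            // queue locally, do not send yet
            }
            return ps.executeBatch();     // one round trip to the server
        }
    }
}
```

A plain Statement would instead re-verify the SQL text on every execution, which is exactly the per-statement overhead the batching and prepared-statement choices try to avoid.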
All these problems with SQLJ and JDBC are surmountable but add to the complexity of the process as a whole. In general these models are naturally well-suited for high transaction volumes, just like relational designs, and are capable of object support in a more complicated/restrictive manner than a “Pure” OODM.
Finally we examine a relational data model that is currently in use. We are primarily concerned with how it responds to the demands of increased data complexity, interaction with Java (primarily since the previous two were judged on this), and its overall contribution to real-world applications. The RDBMS that is considered is IBM’s DB2.
DB2 has been implemented in a variety of different forms in order to be usable by databases of various size and complexity (including drivers with support for objects).
DB2, as a relational model, accommodates increased data complexity through the use of Multi-Dimensional Clustering (MDC) tables.
A Multi-Dimensional Clustering (MDC) Table allows the database designer to organize data across multiple dimensions. For Example: Assume you want to organize data based on product sold, location sold, and sale date. In a normal relational model you might have: PRODUCT(Name,