This paper introduces the basic concepts of Agile Database Techniques, and effective ways to go about the Data-Oriented aspects of Object Oriented software development. My presentation stresses the differences between Data Modeling and Object-Oriented Modeling and the different skills and philosophies that all IT professionals require to be effective in a Data-Oriented approach. Also, I will focus on mapping objects to Relational Databases, how to deal with Legacy Data issues, and the cost of Data Model changes within the two technologies.
Everyone who works on software-based systems knows that there is often tension between developers and data professionals. This is no secret; the two groups simply have different visions and priorities. Developers are often focused on the specific needs of a single project and often strive to work as much as possible in isolation from the rest of the organization. Database administrators (DBAs) focus on the databases that they are responsible for, often minimizing changes to them. Data administrators and data architects focus on the overall data needs of the enterprise, sometimes to the virtual exclusion of the immediate needs of project teams. One of the purposes of this independent study is to find strategies to resolve this conflict so that all IT professionals can work together effectively.
The first step for any organization should be to recognize that it has a serious problem with respect to how its application developers and data professionals work together. Scott W. Ambler, in his book Agile Database Techniques, suggests some interesting ideas to take into consideration when solving this problem. He says: “Data is one of several important aspects of software-based systems. Most, if not all, applications are based on moving, utilizing, or otherwise manipulating some kind of data, after all.” He adds: “Agile software developers recognize the importance of working together effectively with others and will act accordingly. They have the humility to respect and appreciate the view and abilities of others. The attitude that ‘my group is the center of the universe and everyone has to conform to our vision and follow our process’ doesn’t work well.”
It is important to define and review the differences between the two technologies: Object-Oriented (OO) and Relational Database. Relational databases were proposed to separate the physical storage of data from its conceptual representation, based on the concept that the data for an application can be stored in rows of one or more tables. A row of data in one table is often associated with, or related to, one or more rows of data in another table. An Object-Oriented database, on the other hand, is based on the concept that systems should be built from interacting components called objects. An object typically has two components: state (value) and behavior (operations). It is somewhat similar to a program variable in a programming language, except that it will typically have a complex internal data structure as well as specialized operations defined by the programmer.
Another difference between these two technologies is how they identify data. Relational databases use a meaningful data item that identifies each row uniquely. The key that is selected to identify tuples uniquely within a relation (or table) is called the primary key. A relation in a database always has a primary key, which is a unique combination of one or more table columns that identifies each record uniquely. Objects, on the other hand, must be assigned a meaningless but unique object identifier (OID). An OID is a single attribute that uniquely identifies an object. The value of an OID is not visible to the external user, but it is used internally by the system to identify each object uniquely and to create and manage inter-object references. An OID can be assigned to program variables of the appropriate type when needed. The main property required of an OID is that it be immutable; that is, the OID value of a particular object should not change. This preserves the identity of the real-world object being represented. Hence, an OO database system must have some mechanism for generating OIDs and preserving the immutability property. It is also desirable that each OID be used only once; that is, even if an object is removed from the database, its OID should not be assigned to another object. These two properties imply that the OID should not depend on any attribute values of the object, since the value of an attribute may change. It is also generally considered inappropriate to base the OID on the physical address of the object in storage, since the physical address can change after a physical reorganization of the database. The main difference between keys and OIDs is that keys are business-domain attributes, whereas OIDs are always artificial.
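The OID properties described above (meaningless, immutable, never reused, independent of attribute values and of storage location) can be illustrated with a minimal Python sketch. It is not how any particular OO database generates identifiers; the helper `new_oid` and the `Person` class are hypothetical, and a random UUID is only one possible mechanism, unique in practice rather than by construction.

```python
import uuid
from dataclasses import dataclass, field


def new_oid() -> str:
    """Return a fresh, meaningless identifier (hypothetical helper)."""
    # A random UUID depends neither on the object's attribute values nor on
    # where the object is stored, and a value is never handed out again.
    return uuid.uuid4().hex


@dataclass
class Person:
    name: str
    # Assigned once at creation time; it is not a constructor argument.
    _oid: str = field(default_factory=new_oid, init=False, repr=False)

    @property
    def oid(self) -> str:
        # Read-only: no setter is defined, so assigning to .oid fails.
        return self._oid


person = Person("Ana")
identity = person.oid
person.name = "Ana Maria"       # business attributes may change ...
assert person.oid == identity   # ... while the identity stays the same
```

A business key such as a student number would, by contrast, be an ordinary attribute like `name`, visible to users and open to change.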
Inheritance is another important difference. In Enhanced Entity-Relationship (EER) modeling, type inheritance is an important concept associated with subclasses. An entity is defined by the attributes it possesses and the relationship types in which it participates. If the entity is in a subclass, then it represents the same real-world object as in the superclass, but it may possess subclass-specific attributes as well as those associated with the superclass. In other words, an entity that is a member of a subclass inherits all the attributes of the entity as a member of the superclass. The entity also inherits all the relationships in which the superclass participates.
In contrast, in the Object-Oriented model two concepts exist for specifying object types: interfaces and classes. In addition, two types of inheritance relationships exist: behavior inheritance and state inheritance. As mentioned before, behavior refers to operations, and state refers to values or properties (attributes and relationships).
It is worth defining these concepts to gain a better understanding of what inheritance means in OO. An interface is a specification of the abstract behavior of an object type; it specifies the operation signatures. Although an interface may have state properties (attributes and relationships) as part of its specification, these cannot be inherited from the interface. An interface is also noninstantiable; that is, one cannot create objects that correspond to an interface definition. A class is a specification of both the abstract behavior and the abstract state of an object type, and it is instantiable; that is, one can create individual object instances corresponding to a class definition. A subclass inherits all the behavior and state of its superclass. Because interfaces are noninstantiable, they are mainly used to specify abstract operations that can be inherited by classes or by other interfaces. This is called behavior inheritance.
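The distinction between interfaces and classes can be illustrated in Python, with the caveat that Python has no dedicated interface construct: an abstract base class that declares only abstract methods is used here as a stand-in. The names `Enrollable`, `Person`, and `Student` are hypothetical.

```python
from abc import ABC, abstractmethod


class Enrollable(ABC):
    """Stand-in for an interface: operation signatures only, no state."""

    @abstractmethod
    def enroll(self, course: str) -> None:
        ...


class Person:
    """A class: specifies both state (name) and behavior (greet)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def greet(self) -> str:
        return f"Hello, {self.name}"


class Student(Person, Enrollable):
    """Inherits state and behavior from Person, behavior only from Enrollable."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.courses = []

    def enroll(self, course: str) -> None:
        self.courses.append(course)


student = Student("Ana")
student.enroll("Databases")
```

Calling `Enrollable()` raises `TypeError`, which mirrors the noninstantiable nature of an interface, whereas `Person` and `Student` can be instantiated freely.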
Considering all these differences, mapping between the paradigms is an important issue for modern data developers, but there is an “impedance mismatch” between the Object model and the Relational model. The Object model is focused on building applications out of objects that have both data and behavior, whereas the Relational model is focused on storing data. This impedance mismatch comes into play when you look at the preferred approach to access: with the Object-Oriented model one traverses objects via their relationships, whereas with the Relational model one duplicates data to join the rows in tables. This fundamental difference results in a less than ideal combination of the two paradigms. Therefore, one of the secrets of success in mapping objects to Relational databases is to understand both paradigms and their differences, and then make intelligent trade-offs based on that knowledge and some fundamental techniques. Agile data developers suggest three such mapping techniques: mapping attributes to columns, implementing inheritance in a Relational database (mapping classes to tables), and implementing and mapping relationships.
Mapping attributes to columns. A class attribute will map to zero or more columns in a Relational database. It is important to remember that not all attributes are persistent. For example, an Invoice class may have a Grand_Total attribute that is used by its instances for calculation purposes but that is not saved to the database. Furthermore, some object attributes are objects in their own right; for example, a Course object has an instance of Text_Book as an attribute, which maps to several columns in the database. (In fact, chances are that the Text_Book class will map to one or more tables in its own right.) The important thing is that this is a recursive definition: at some point the attribute will be mapped to zero or more columns. It is also possible for several attributes to map to a single column in a table. For example, a class representing a U.S. zip code may have three numeric attributes, one representing each of the sections in a full zip code, whereas the zip code may be stored as a single column in an address table.
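The two less obvious cases in this paragraph, an attribute that maps to zero columns and several attributes that map to one column, can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the table layout, the helper functions, the attribute names `part1` to `part3`, and the use of strings for the zip code sections (chosen so that leading zeros are preserved) are not taken from the source.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ZipCode:
    # Three object attributes (illustrative names) stored in ONE column.
    part1: str
    part2: str
    part3: str

    def to_column(self) -> str:
        return "-".join((self.part1, self.part2, self.part3))

    @classmethod
    def from_column(cls, value: str) -> "ZipCode":
        return cls(*value.split("-"))


@dataclass
class Invoice:
    oid: int
    subtotal: float
    tax: float
    ship_to: ZipCode

    @property
    def grand_total(self) -> float:
        # Transient attribute: computed on demand, mapped to zero columns.
        return self.subtotal + self.tax


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE invoice ("
        " invoice_oid INTEGER PRIMARY KEY,"
        " subtotal REAL NOT NULL,"
        " tax REAL NOT NULL,"
        " ship_to_zip TEXT NOT NULL)"
    )


def save(conn: sqlite3.Connection, invoice: Invoice) -> None:
    conn.execute(
        "INSERT INTO invoice VALUES (?, ?, ?, ?)",
        (invoice.oid, invoice.subtotal, invoice.tax, invoice.ship_to.to_column()),
    )


def load(conn: sqlite3.Connection, oid: int) -> Invoice:
    row = conn.execute(
        "SELECT invoice_oid, subtotal, tax, ship_to_zip"
        " FROM invoice WHERE invoice_oid = ?",
        (oid,),
    ).fetchone()
    return Invoice(row[0], row[1], row[2], ZipCode.from_column(row[3]))
```

Note that the `invoice` table has no `grand_total` column; the value is rebuilt from `subtotal` and `tax` whenever an `Invoice` is loaded.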
Implementing inheritance in a Relational database. This technique basically boils down to figuring out how to organize the inherited attributes within your persistence model. The way in which we resolve this challenge can have a major impact on the system design. There are three fundamental solutions for mapping inheritance into a Relational database, and all of them have advantages and disadvantages.
To understand these fundamental techniques, let us keep in mind a simple class hierarchy in which two classes, Student and Professor, inherit from a Person class. The three solutions are:
- Using one data entity for an entire class hierarchy. With this approach, we map an entire hierarchy into one data entity, where all the attributes of all the classes in the hierarchy are stored. A specific column (PersonOID) is introduced as the primary key of the table; OIDs are used in all three solutions, both for consistency and because they are the best approach for assigning keys to data entities. The advantages of this approach are that it is simple, that polymorphism is supported when a person changes roles, and that ad-hoc reporting (reporting performed for the specific purposes of a small group of users, who commonly write the reports themselves) is very easy because all of the personal data you need is found in one table. The disadvantages are that every time a new attribute is added anywhere in the hierarchy, a new column must be added to the table, and that a mistake made while adding a single attribute could affect all the classes within the hierarchy, not just the class that got the new attribute. This approach also potentially wastes a lot of space in the database.
- Using one data entity per concrete class. With this approach, each data entity includes both the attributes of the class it represents and the attributes that the class inherits. There would therefore be data entities corresponding to both the Student class and the Professor class, because they are concrete. The Person class is abstract, so it would not have an entity/table in the database. Each of the data entities is assigned its own primary key: StudentOID and ProfessorOID, respectively. The advantage of this approach is that it is still easy to perform ad-hoc reporting, given that all the data you need about a single class is stored in only one table. However, there are several disadvantages. First, when you modify a class you also have to modify its table and the tables of any of its subclasses, which is a lot of work. Second, whenever an object changes its role, for instance from student to professor, you need to copy the data into the appropriate table and assign it a new OID, which is again a lot of work. Third, it is difficult to support multiple roles and still maintain data integrity.
- Using one data entity per class. With this approach you create one table per class, the columns of which are the OID and the attributes that are specific to that class. In other words, PersonOID is used as the primary key for all three data entities. An interesting feature is that the PersonOID column in both Professor and Student is assigned two stereotypes, because it serves as both a primary key and a foreign key to Person (something to take into account when discussing the persistence model). The advantage of this approach is that it conforms best to Object-Oriented concepts. It supports polymorphism very well, as you have records in the appropriate tables for each role that an object might have. It is also very easy to modify superclasses and add new subclasses, because you merely need to modify or add one table. The disadvantages of this approach are, first, that there are many tables in the database, one for each class; second, that it takes longer to read and write data because you have to access multiple tables; and third, that ad-hoc reporting on your database is difficult unless you add views to simulate the desired tables.
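The three approaches above can be compared side by side as SQLite schemas built from Python. This is a sketch under assumptions rather than a prescribed design: the attributes `name`, `major`, and `salary`, the `person_type` discriminator column in the single-table variant, and the helper names are all illustrative and do not come from the source.

```python
import sqlite3

# 1. One data entity for the entire hierarchy.
SINGLE_TABLE = """
CREATE TABLE person (
    person_oid  INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL,   -- assumed discriminator: 'student' or 'professor'
    name        TEXT NOT NULL,
    major       TEXT,            -- Student only, NULL otherwise (wasted space)
    salary      REAL             -- Professor only, NULL otherwise
);
"""

# 2. One data entity per concrete class; Person is abstract and has no table.
TABLE_PER_CONCRETE_CLASS = """
CREATE TABLE student (
    student_oid INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,   -- inherited attribute, repeated in each table
    major       TEXT
);
CREATE TABLE professor (
    professor_oid INTEGER PRIMARY KEY,
    name          TEXT NOT NULL, -- inherited attribute, repeated in each table
    salary        REAL
);
"""

# 3. One data entity per class; person_oid is both primary and foreign key.
TABLE_PER_CLASS = """
CREATE TABLE person (
    person_oid INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE student (
    person_oid INTEGER PRIMARY KEY REFERENCES person (person_oid),
    major      TEXT
);
CREATE TABLE professor (
    person_oid INTEGER PRIMARY KEY REFERENCES person (person_oid),
    salary     REAL
);
"""


def build(ddl: str) -> sqlite3.Connection:
    """Create an in-memory database from one of the schemas above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    return conn


def read_student_table_per_class(conn: sqlite3.Connection, oid: int):
    # Reading one Student needs a join because its data spans two tables.
    return conn.execute(
        "SELECT p.name, s.major FROM person p"
        " JOIN student s ON s.person_oid = p.person_oid"
        " WHERE p.person_oid = ?",
        (oid,),
    ).fetchone()
```

The trade-offs listed above show up directly: the first schema needs nullable columns for every subclass attribute, the second repeats `name` in each table, and the third requires a join (or a view) to reassemble a single object.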
Implementing and mapping relationships. In addition to mapping inheritance, you must also map the relationships that an object is involved in, so that they can be restored at a later date. Before doing so, it is crucial to understand the differences between the types of object relationships in order to map them efficiently. These are association, aggregation, and composition.
From a database perspective, the only difference between association and aggregation/composition relationships is how tightly the objects are bound to each other. With aggregation and composition, you usually want to read in the part when you read in the whole, whereas with an association, which is used to model other types of object relationships, it is not always as obvious what you need to do. In addition, there are two aspects of object relationships that you need to be concerned with when mapping: multiplicity and directionality.
As a general rule with relationship mapping, you should keep the multiplicity the same in most cases, and using a consistent key strategy within your database can greatly simplify your relationship mapping efforts. It is therefore fundamental to understand the differences between Object-Oriented and Relational databases that make it difficult to map classes to tables. It would be a mistake to let your existing database schemas or data models drive the development of your object models. I have only begun to compare and contrast the architectural trade-offs of these approaches and how they can affect the design phase of an application.
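A small sketch may make the multiplicity rule concrete. It uses Python's `sqlite3` module and assumes two hypothetical relationships that are not in the source: a professor teaches many courses (one-to-many, kept as a foreign key on the "many" side) and students enroll in many courses (many-to-many, which a relational schema has to express through an associative table). Every table uses the same OID-style key, in line with the consistent key strategy mentioned above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE professor (
    professor_oid INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE course (
    course_oid    INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    -- one-to-many: each course row points at its single professor
    professor_oid INTEGER NOT NULL REFERENCES professor (professor_oid)
);
CREATE TABLE student (
    student_oid INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- many-to-many: one row per (student, course) pair
CREATE TABLE enrollment (
    student_oid INTEGER NOT NULL REFERENCES student (student_oid),
    course_oid  INTEGER NOT NULL REFERENCES course (course_oid),
    PRIMARY KEY (student_oid, course_oid)
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO professor VALUES (1, 'Dr. Lee')")
    conn.executemany(
        "INSERT INTO course VALUES (?, ?, ?)",
        [(10, "Databases", 1), (11, "Algorithms", 1)],
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?)", [(100, "Ana"), (101, "Ben")]
    )
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?)", [(100, 10), (100, 11), (101, 10)]
    )
    return conn


def courses_taught_by(conn: sqlite3.Connection, professor_oid: int) -> list:
    rows = conn.execute(
        "SELECT title FROM course WHERE professor_oid = ? ORDER BY title",
        (professor_oid,),
    )
    return [title for (title,) in rows]


def courses_taken_by(conn: sqlite3.Connection, student_oid: int) -> list:
    # Following the relationship means joining through the associative table.
    rows = conn.execute(
        "SELECT c.title FROM enrollment e"
        " JOIN course c ON c.course_oid = e.course_oid"
        " WHERE e.student_oid = ? ORDER BY c.title",
        (student_oid,),
    )
    return [title for (title,) in rows]
```

In an object model, the same questions would be answered by traversing references such as a professor's list of courses; here they become foreign-key lookups and joins, which is the impedance mismatch in miniature.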
One of the principal aims of working with legacy data is to facilitate comprehension of the internal structure and behavior of a data schema, but as database technology evolves, legacy DBMSs will gradually be replaced by new offerings. Nevertheless, many mainframe systems are running databases that date back 15 or more years. By definition, a legacy data source is any file, database, or software asset (such as a web service or a business application) that supplies or produces data and that has already been deployed. Legacy data is often difficult to work with because of a combination of quality, design, architecture, and/or political issues. Unfortunately, due to degradation, legacy systems very often provide low quality levels and, as a consequence, their maintenance becomes very costly. The goal of the agile database approach is to expose both application developers and Agile DBAs to the realities of working with legacy data.
It is important for data developers and Agile DBAs to have a general knowledge of what legacy data resources exist, where they are, and the problems involved with them. Common legacy data sources include existing relational databases, as well as hierarchical, network, object, dimensional, and object/relational databases, XML documents, and flat files. Legacy applications that have been wrapped, and legacy services such as Web services or CICS (Customer Information Control System) transactions, can also provide access to existing information. The reality is that legacy data often does not provide the full range of information required by the team, because the data does not reflect their new requirements. It is also often constrained by the other applications that work with it, and those constraints are then put on the team; this reduces the team’s flexibility, because they cannot easily manipulate the source data schema to reflect the needs of their object schema. Beyond these problems, data developers and Agile DBAs need to be aware of fundamental design problems: legacy data sources are rarely perfect and often suffer from significant challenges. These design problems may be the result of poor database design in the first place, perhaps because the designers did not have a very good understanding of data modeling; or the initial design of a data source may have been very good, but over time its quality degraded as ill-advised schema changes were made. Whatever the case, the agile database approach focuses on working closely with application programmers to overcome these problems.
Working with legacy data is a difficult task. Unfortunately, we all have to do it, so it is better to accept this fact, gain the skills that we need to succeed, and then get on with the work. In the book “Agile Database Techniques,” Scott W. Ambler introduces a collection of strategies and technologies for working with legacy data. The first one is to avoid working with it if possible. He asks, “Why needlessly suffer problems?” If you create your own stand-alone database, you can create a database schema that reflects your actual needs; if you accept legacy data as is, your team chooses to access the data directly, without a conversion effort. A big-design-up-front approach to development tends to force legacy schemas on the team. If your database schema is instead created in an iterative and incremental manner, data professionals can also work in this manner, but they must choose to do so. If this is the case, they will write only the data-oriented code that they require for the business requirement that they are currently working on. Their data-oriented code will therefore grow and evolve in an iterative and incremental fashion, just as the code for the rest of the application evolves.
We now have all the major pieces needed to discuss and analyze object-relational possibilities. We have discussed object modeling and relational modeling, mapping objects to Relational databases, and some issues with legacy data. It is now time to take into account the implementation of the needed functionality. How much of the integrated model is required will affect performance, maintenance, and cost.
Programmers working with Relational Database Management Systems (RDBMSs) sometimes spend more than 25 percent of their coding time mapping program objects to the database, said Akmal Chaudhri, a database expert and visiting scholar at The City University in London. “If you’re modeling a Boeing 747 with an Object DBMS, the relationships between aircraft parts are directly managed [by the database],” he said. With a relational database, you have to decompose the aircraft into tables and then join the tables when you need to reconstruct the aircraft. By contrast, “The result for Object Oriented Database Management Systems (OODBMSs) is less code to develop, reducing development time, and maintenance costs.” The reduced cost of maintenance alone makes the introduction of an Object Oriented database a great idea.
One of the major forces that any data developer should take into account is performance. Object Oriented databases can store data sets in their entirety and thus typically run faster than relational databases, which must break data sets into parts for storage within tables and then reassemble them in response to queries. In addition, OO databases can automatically cache data in the client application’s memory, thereby eliminating extra calls to the DBMS’s back end and speeding up responses.
I have attempted to outline the goal of the Agile Database methodology, which is to define strategies that IT professionals can apply in a wide variety of situations to work together effectively on the data aspects of software systems. Agile software development is real and it is here to stay. If data professionals are to remain relevant, they must adopt techniques that reflect the realities of modern development, and that includes working in an agile and evolutionary manner. In this presentation I explored techniques for evolutionary database development, including effective ways to go about the Data-Oriented aspects of Object Oriented software development; the differences between Data Modeling and Object-Oriented Modeling; mapping objects to Relational Databases; techniques for working with Legacy Data issues; and the cost of Data Model changes within the two technologies. There is no doubt that object oriented ideas are here to stay and that they have influenced the traditional relational model. The merging of the two schools of thought leads to an intermediate solution, the Object-Relational Database Management System (ORDBMS), which combines advantages of both models. We are therefore bound to move toward a similar object model on the database side in order to model the complete system more logically and meaningfully. Although the present belongs to the RDBMS, the future of the OODBMS can be clearly envisioned.
Ambler, Scott W. Agile Database Techniques. Indiana: Wiley Publishing, Inc. 2003.
Connolly, Thomas, and Carolyn Begg. Database Systems. 4th ed. London: Addison
Elmasri, Ramez, and Shamkant B. Navathe. Fundamentals of Database Systems. 4th ed. New York: Addison Wesley, 2003.
Ambler, Scott W. “Mapping Objects to Relational Databases.” Software Development Oct. 1995: 63-72.
Keller, Wolfgang. “Mapping Objects to Tables.” Proceedings EuroPLoP. 2005.
Bianchi, Alessandro, and Danilo Caivano. “Iterative Legacy Systems,” IEEE Software, March 2003.
Larman, Craig. Agile & Iterative Development. Boston: Addison Wesley, 2004.