Introduction to Database Design

Introduction to Database Design Dr. Kanda Runapongsa ([email protected]) Dept of Computer Engineering Khon Kaen University. 2 Overview ... Physical d...

0 downloads 106 Views 271KB Size
Introduction to Database Design Dr. Kanda Runapongsa ([email protected]) Dept of Computer Engineering Khon Kaen University

Overview What

are the steps in designing a database? Why is the ER model used to create an initial design? What are the main concepts in the ER model? What are guidelines for using the ER model effectively? 2

Database Design Process  The

process can be divided into six steps  The ER model is most relevant to the first three  Requirement

Analysis  Conceptual database design  Logical database design  Schema refinement  Physical database design  Application and security design 3

Requirement Analysis  Understand

what data is to be stored  Know what applications must be built on top of the database  Understand what operations are most frequent and subject to performance requirements  Summary: Find out what the users want from the database 4

Conceptual Database Design  Create

a simple description of the data that closely matches how users and developers think of the data  This step uses the ER model which is one of several high-level or semantic, data models used in the database design  The initial design must be translated into a data model supported by a commercial system

5

Logical Database Design  Choose

a DBMS to implement the database design  Convert the conceptual database design into a database schema in the data model of the chosen DBMS  Here, we consider only relation DBMSs  The

task is to convert an ER schema into a relational database schema 6

Schema Refinement  Analyze

the collection of relations in the relational database schema to identify potential problems and to refine it  Schema refinement can be guided by some elegant and powerful theory  For

example, we can apply the theory of normalizing relations to reduce the redundancy of data stored in the database

7

Physical Database Design  Consider

typical expected workloads that the database must support and further refine the database design to ensure that it meets desired performance criteria  For example, we may build indexes on some tables and cluster some tables into the same location on the disk 8

Application and Security Design  Any

software project that involves a DBMS must consider aspects of the application that go beyond the database itself  We must describe the role of each entity in every process as part of a complete workflow for that task  For each role, we must identify the parts of the database that must be accessible and the parts of the database that must not be accessible

9

Conceptual Design  What

are the entities and relationships in the enterprise?  What information about these entities and relationships should we store in the database?  What are the integrity constraint or business rules that hold?  A database schema in the ER model can be represented by using ER diagrams  We can map an ER diagram into a relational schema 10

ER Model Basics: Entity (1/2)  Entity:

a real-world object distinguishable from other objects  An entity is described (in DB) using a set of attributes  Entity set: a collection of similar entities  All

entities in an entity set have the same set of attributes; this is what we mean by similar  Each entity set has a key  Each attribute has a domain

11

ER Model Basics: Entity (2/2) A

key is a minimal set of attributes whose values uniquely identify an entity in the set  There could be more than one candidate key  If

so, we designate one of them as the primary key  Which one should we design? 12

Example: Entity  ER

diagram of an entity “Employees”

ssn

name

lot

Employees

The key of this entity is “ssn” which is underlined 

Entity set “Employees” ssn name lot 111 A 12 222 B 24  Entity “111 A 12” is similar to Entity “222 B 24” since both of them are described using the same set of 

13

ER Model Basics: Relationship (1/2) A

relationship is an association among two or more entities, e.g., Khun Mana works in the department of Computer Engineering  Relationship Set: Collection of similar relationships

14

ER Model Basics: Relationship (2/2)  An n-ary relationship set R relates n entities sets E1…En; each relationship in R involves entities E1, …, En  Same

entity set could participate in different relationship sets  Employees

works in which department  Employees supervise which other employees  Same

entity different “roles” in same set

 Employees

supervise subordinates  Employees are supervised by supervisors 15

Example1: Relationships A

relationship between “Employees” and “Departments” since name ssn

dname

lot Employees

 “since”

did Works_In

budget

Departments

is a descriptive attribute of this relationship  A relationship must be uniquely identified by the participating entities 16

Example2: Relationships Same

entity set could participate in different relationship sets, or in different “roles” in same set. name ssn

Manages

lot

Employees Employees

Works_In

Departments

supervisor

super-visee

Reports_To 17

Key Constraints (1/2)  Consider

Works_In: An employee can work in many departments; a department can have many employees‟  In contrast, each dept has at most one manager, according to the key constraint in Manages  Many kinds of relationships One-to-one  One-to-many  Many-to-one  Many-to-many 

18

Key Constraints (2/2) since name

ssn

dname lot

Employees

1-to-1

did

Manages

1-to Many

budget

Departments

Many-to-1

Many-to-Many

19

Participation Constraints 

Does every department have a manager?  If

so, this is a participation constraint: the participation of Departments in Manager is said to be total (vs. partial)  Each employee works in at least one department and that each department has at least one employee  the participation of both Employees and Departments in Works_In is total  A participation that is not total is said to be partial

20

Example: Participation Constraints Employees and Departments have total participation for relationship “Works_In”  For relationship “Manages”, Departments have total participation but Employees have partial participation 

since

name

ssn

dname did

lot Employees

Manages

budget Departments

Works_In

since

21

Weak Entities A

weak entity can be identified uniquely only by considering the primary key of another (owner) entity  Owner

entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities)  Weak entity set must have total participation in this identifying relationship set 22

Example: Weak Entities  “Employees”

is an owner entity set and “Dependents” is a weak entity set  “Dependents” have total participation in the relationship name ssn

lot

Employees

cost

Policy

pname

age

Dependents

23

ISA („is a‟) Hierarchies (1/3)  In

superclass-subclass in C++, attributes in superclass are inherited to subclass  If we declare A ISA B, every A entity is also considered to be a B entity  Hourly_Emps ISA Employees  Overlap constraints: Can Chai be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)

24

ISA („is a‟) Hierarchies (2/3)  Covering

constraints: Does every Employees entity also have to be an Hourly_Emps or a Contact_Emps entity? (Yes/no)  Reasons for using ISA:  To

add descriptive attributes specific to a subclass  To identify entities that participate in a relationship 25

ISA („is a‟) Hierarchies (3/3) name ssn

lot

Employees

hourly_wages

hours_worked ISA contractid

Hourly_Emps

Contract_Emps 26

Aggregation  Used

when we have to model a relationship involving (entity sets and) a relationship set  Aggregation

allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships

 Aggregation

vs. ternary relationship

 Monitors

is a distinct relationship, with a descriptive attribute  Also, can say that each sponsorship is monitored by at most one employee

27

Example: Aggregation name

ssn

lot Employees

Monitors

since

started_on pid

pbudget Projects

until

Sponsors

dname did

budget Departments 28

Conceptual Design Using the ER Model  Design

choices:

 Should

a concept be modeled as an entity or an attribute?  Should a concept be modeled as an entity or a relationship?  Identifying relationships: Binary or ternary? Aggregation?  Constraints

in the ER model:

A

lot of data semantics can (and should) be captured 29

Entity vs. Attribute (1/2)  Should

address be an attribute of Employees or an entity (connected to Employees by a relationship)?  Depends upon the use we want to make of address information, and the semantics of the data If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued)  If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic) 

30

Entity vs. Attribute (2/2) 



Works_In4 does not allow an employee to name work in a department ssn for two or more periods. Employees Similar to the problem of wanting to record several addresses for an employee: We want to record several

values of the descriptive attributes for each instance of this relationship.

Accomplished by introducing new entity set, Duration.

from

to dname

lot

did Works_In4

budget

Departments

name

dname

ssn

lot

Employees

from

did Works_In4

Duration

budget

Departments

to 31

Entity vs. Relationships 



First ER diagram OK if a manager gets a separate name discretionary ssn budget for each dept. Employees What if a manager gets a discretionary name budget that covers ssn all managed depts? 



Redundancy: dbudget stored for each dept managed by manager. Misleading: Suggests dbudget associated with department-mgr combination.

since

dbudget

lot

dname

did

budget Departments

Manages2

lot dname

since did

Employees

Manages2

budget Departments

ISA

Managers

dbudget

This fixes the problem! 32

Binary vs. Ternary Relationships (1/3) name ssn

pname

lot Employees

Dependents

Covers

Bad design

age

Policies policyid

cost

name

pname

ssn

lot

age

Dependents

Employees Purchaser

Beneficiary

Better design policyid

Policies cost

33

Binary vs. Ternary Relationships (2/3)  If

each policy is owned by just 1 employee, and each dependent is tied to the covering policy, the first diagram is inaccurate  What are the additional constraints in the 2nd diagram?  Each

policy is owned by at most one employee  Each dependent is covered by at most one policy 34

Binary vs. Ternary Relationships (3/3)  An

example in the other direction: a ternary relation Contracts relates entity sets Parts, Departments, and Suppliers, and has descriptive attribute qty.  No combination of binary relationships is an adequate substitute: S

„can-supply‟ P, D „needs‟ P, and D „deals-with‟ S does not imply that D has agreed to buy P from S  How do we record qty?

35

Summary of ER (1/4)  Conceptual

design follows requirement analysis  Yields

a high-level description of data to be stored

 ER

model popular for conceptual design  Constructs

are expressive, close to the way people think about their applications

 Basic

constructs: entities, relationships, and attributes (of entities and relationships)

36

Summary of ER (2/4)  Some

additional constructs: weak entities, ISA hierarchies, and aggregation  Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies  Some

constraints (notably, functional dependencies) cannot be expressed in the ER model

37

Summary of ER (3/4)  ER

design is subjective. There are often many ways to model a given scenario!  Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:  Entity

vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation

38

Summary of ER (4/4)  Ensuring

good database design: resulting relational schema should be analyzed and refined further  Functional

dependency information and normalization techniques are especially useful

39

Reference  Database

Management System 3rd, R. Ramakrishnan and J. Gehrke

40