ࡱ> QSPY UEbjbjWW *n==+A)]BBBBBBBVVVV8dV 822HHHHHH  $> 2 BHHHHH BBHH2HBHBH VVBBBBH : %BB H, bku=VV^| Database Design Definitions Data: Meaningful facts, text, graphics, images, sound, video segments. Database: An organized collection of logically related data. Information: Data processed to be useful in decision making. Definition of a Database A persistent collection of structured data which may be geographically distributed & may be accessed by 1 or more users concurrently from a number of locations. A database is a collection of information supporting specific activities of the organisation, specific members of the organisation or particular applications used by the organisation. A database models a part of the real world. It is no a random assortment of information or a collection of data without an aim. Databases are collections of interrelated files. The key word is interrelated. The records in each file must allow for relationships (think of them as pointers) to the records in other files. Definition of a Database Management System A DBMS is a data storage and retrieval system which permits data to be stored non-redundantly while making it appear to the user as if the data is well-integrated. Advantages of a Database Approach Minimal Data Redundancy/Improved Consistency Data Integration Multiple Relationships Data Independence/Reduced Maintenance Improved Data Sharing Increased Application Development Productivity Enforcement of Standards Improved Data Quality (Constraints) Better Data Accessibility/ Responsiveness Security, Backup/Recovery, Concurrency Pros: The principal advantage of a database is the ability to share the same data across multiple applications and systems. Database technology offers the advantage of storing data in flexible formats. Databases allow the use of the data in ways not originally specified by the end-users - data independence. The database scope can even be extended without impacting existing programs that use it. New fields and record types can be added to the database without affecting current programs. Costs & Risks of a Database Approach New, Specialized Personnel Installation Management Cost and Complexity Conversion Costs Need for Explicit Backup and Recovery Organizational Conflict Cons: Database technology is more complex than file technology. Special software, called a database management system (DBMS), is required. A DBMS is still somewhat slower than file technology. Database technology requires a significant investment. The cost of developing databases is higher because analysts and programmers must learn how to use the DBMS. In order to achieve the benefits of database technology, analysts and database specialists must adhere to rigorous design principles. Another potential problem with the database approach is the increased vulnerability inherent in the use of shared data. To fully exploit the advantages of database technology, a database must be carefully designed. The end product is called a database schema, a technical blueprint of the database. Database design translates the data models that were developed for the system users during the definition phase, into data structures supported by the chosen database technology. Subsequent to database design, system builders will construct those data structures using the language and tools of the chosen database technology. Range of Database Applications Personal Database e.g. Workgroup Database e.g. Department Database e.g. Enterprise Database e.g. Components of a Database Environment CASE Tools Repository Database Management System (DBMS) Database Application Programs User Interface Data Administrators System Developers End Users Evolution of Database Systems Hierarchical and Network (Legacy) Relational Object-oriented Object-relational Starting Point for a Creating a Database Structure Logical Data Structure Entity Relationship Diagram Information Model Whatever you know it as Mapping to Database Structure is pretty obvious but there is more to designing a good database than the initial structure Mapping Entity -> Table Attribute -> Column Relationship -> Primary/Foreign Keys There are also a number of other less obvious concerns Security Backup/Recovery Database Sizing & Performance Indexing Full Table Scan Searching a table using non-key values will always result in a full table scan A full table scan is where the database ( or DBMS ) has to test each row in the table to see if it meets the selection criteria This behaviour is typical of a reporting process An index can be created to speed up the search & the performance the whole system less processing time is needed & so it can be allocated elsewhere An index is as the name suggests a lookup similar to that in a book only it isnt page numbers that are returned but physical disk addresses Overhead of an Index In the same way as the index of a book takes up a number of pages, an index also takes up disk space. There is a trade off between the space required by the index & the accuracy/detail it contains And so, for regular, scheduled processing of this type an index is highly desirable while for ad hoc reporting, the cost of disk space does not always out-weigh the performance benefit An Index is not obvious from an ERD. Typically, it is the task of a DBA to design such facilities Backup & Recovery Backup A number of different methods of backing up a database exist A copy may be stored on another file system mirror image ideally, they will be held apart, geographically to avoid natural disasters but they may only be apart in terms of the hardware on which they reside The database may be written to a backup tape & stored in a safe place Typically, companies agree to store backups of each others data A backup should be taken regularly depending on the amount of time needed to make the backup ; the amount of database traffic ; the storage requirements of the backup ; the cost ( downtime, loss of service, etc. ) should be backup need to be restored see below The scheduled times between backups vary from database to database, organisation to organisation Daily ideal, given high database traffic - overnight Weekly acceptable given medium to low database traffic - weekends Monthly risky given any amount of traffic month end Annually an organisation with this approach deserves everything it gets 365 days to choose from Recovery As well as backing up the database content the transaction logs should also be backed-up & re-initialised A transaction log is a record ( in sequence ) of every database update, insert & delete that has happened since the last backup occurred If the database/hard-drive is corrupted or crashes the database can be rolled back from the current position to a more stable point ; or using a previous backup & the transaction logs, the database can be rolled forward to a given point before the crash/corruption happened Overhead of Transaction Logs They are maintained by the DBMS when a change to the database occurs processing is slowed as the data must be written to the database & to the transaction log They have a cost in terms of disk space obviously, from the above, they shouldnt be stored on the same HD as the database itself Selective Rollback/forward Given a run-away process ( A ) the changes made to the database by A can be undone by using the transaction logs Once the process and/or the data has been repaired, the process can be re-run Database Sizing A Forgotten Consideration The disk space required to store the database during everyday operation should be calculated To do this, the behaviour of the whole system should be understood Room for expected growth in the medium term ( 2-5 years ) should be included Table Spaces Some tables will grow rapidly e.g. payments in a banking system while others will grow much more slowly e.g. a table of postcodes Each table is given its own space to grow within Tables nearing 100% usage of the allocated space should be identified & a strategy for resolving the situation identified archive or delete unwanted data increase the space allocated to the table Index Sizing As the number of rows in a table increases the space required to maintain an accurate & efficient index also grows A decision should be taken to reduce the accuracy of the index ; or increase the space available to the index Archiving Archiving is still the most common method of dealing with an ageing database Old information is written to another medium ( CD, tape, video ) for storage & removed from the database structure At a later date, it may be imported back into the database if the need arises off-line archive A replica of the database ( containing archived data ) may be stored elsewhere physically or logically ( on another file system ) Access to this archive database will be possible but it will not hinder the performance of the current database in any way Monitoring Inefficiencies can easily creep into a database e.g. a column used to store a persons name is defined as String( 50 ) during the design phase, it was decided that this was a reasonable value However, 5 years later, the person table contains 500K rows & the longest string stored in that column is 35 ( 50 35 ) = 15 characters = 15 bytes per row Knee-jerk reaction Change the column to string( 35 ) Sods Law Better to set it to 37 & live with the wasted space If 75% or less of the rows only use 30 characters or less, may be a viable option to change the column to variable length string & live with the cost of storing the string length Remember this is only 1 column in 1 table Other string columns may yield additional benefits Similar approaches can be taken to columns using other data types does that monetary column really need to be 3dp? does that time column really need to be accurate to 1/1000th of a second Database Security Definition Database Security is concerned with access & protection against intrusion Access to the Database Username & password No access granted unless the users identifies themselves using valid details Access to the Data A database can contain a multitude of tables A typical user will only require access to a limited number of those tables to do their job access should be restricted to those needed Similarly, a user may be restricted in which rows/columns (s)he may view/change Row Level Access A bank clerk shouldnt be able to modify their own account but they may be able to modify other peoples as part of their job Column Level Access An HR clerk shouldnt be able to see employees personal details ( either their own or colleagues ) this access is only required by the HR Officer but they do need to be able to manipulate home addresses Clerk will not have access to such things as Salary, Sick Level, Appraisal Info Roles of Database Users Roles are created which provide access to the appropriate database objects Users gain access to those objects by being assigned a role A user may be assigned 1 or more roles not always a good idea Approaches to Security Pessimistic View The role starts life with absolutely no privileges Each privilege is consciously given The role matures until the correct level of access is achieved Optimistic View Opposite of above Privileges are defensively taken away Tends to be in response to unfavourable access from an inappropriate source Roles 2 Types A Role ( A ) may inherit the abilities or constraints of another role ( B ) & develop them further by adding or denying additional access concentric circles Similar roles may have similar levels of access but to different areas within the database sets Data Security Data Integrity is concerned with maintenance of the accuracy & consistency of the database against operational realities Example Some Background Process A monitors & controls the use of raw materials on a production line, maintaining the number of units held in the stock room reads a column value & updates it with a new, calculated value using the amount read from the database & a value supplied from elsewhere, representing the number of units moved from stores to the production line Process B monitors & controls the delivery of raw materials from a supplier, maintaining the number of units held in the stock room reads a column value & updates it with a new calculated value using the amount read from the database & a value supplied from elsewhere, representing the number of units delivered to stores Both are on-line processes they are running continuously throughout the day looping, waiting for user input Both are autonomous they have their own task to perform & do so religiously without communicating with other on-line processes Example Sequence of Events A delivery arrives from the supplier but to make room for it in stores, an amount is moved to the production line Process A reads the value held in stores - 200 Process B reads the value held in stores - 200 Process A reduces the value held in stores by that supplied to the production line ( 150 ) value in stores = 50 Process B increases the value held in stores by the number of units delivered ( 1000 ) value held in stores = 1200 Locking To get around this problem, we use locking Process A would select the value from the database while at the same time indicating to the database that it intends to update the value the DBMS will then limit the access to this piece of data. When A writes the value it has calculated back to the database, the lock can be released & B can then process this new value. A number of levels of locking exist as well a number of types Database Locking The whole database is locked no other process can gain access to any part of the database e.g. a backup routine While a backup is in progress, nothing should be accessing the database Table Locking All access to the table is denied no joins, etc. to the table can take place e.g. a script that is updating the table structure While the structure changes is taking place, the contents of the table ( & the indexes built on that table ) could be unreliable Page Locking Permission is denied to any process attempting to access table rows held on a page e.g. an update cursor will lock any rows meeting the selection criteria A batch process will update groups of rows in a table Row Locking Permission is denied to any process attempting to access a specific row e.g. a user may have selected a row for update using a GUI form Others are unable to update the row while it is locked elsewhere Dead-Lock A very undesirable situation 2 processes ( A & B ) must lock 2 database objects - table, page, row ( 1 & 2 ) to complete a database transaction A locks 1, B locks 2 A tries to lock 2 & B tries to lock 1 Both fail because the elements are locked from elsewhere Both will keep failing until 1 of the locks is released Solution Defensive programming - Timeout Either the processes ( A or B ) or the DBMS should recognise the deadlock & release the locks currently held Another attempt can be made later to complete the transaction Locks can be released voluntarily ( by the process ) or by force ( in the case of the DBMS ) Concurrency Multi-User System From the above it can be seen that a major advantage of a database system is access to information in a structured & controlled fashion However the biggest risk is striking the balance between usefulness & control Types of Locking Lock for Update The value is read from the database & will be used to calculate a new value to replace it The lock is needed to ensure the new value is accurate Typically used in batch processing Exclusive Read The value is need perform a lengthy calculation The lock is needed to ensure that the end result of the calculation is accurate If, way thru the calculation, the value was replaced, the end result would be inaccurate. Typically used in batch processing & over-night or financial reporting Dirty-Read The value is needed but the accuracy of the value does not have to be guaranteed Typically used in reports run through-out the day The process cannot wait on the data being released from the lock Using a dirty-read method, a process can read the contents of a locked object This method should be used with extreme caution !! do you really want to take an inaccurate value from the database Database Constraints Another method of ensuring data integrity They are similar to a foreign key where the key value must exist in another table This type of constraint limits the number of values the database column can hold e.g. the age of a person must be between 0 & 120, a persons gender they must be Male or Female (Alex Fleming, 2001 Page  PAGE 6 "i   ) , p    l{ %8=QVijo(Gk*m*+E-E.EGEHENEźźź0J j0JU j H*mH nH 6B*CJ$hmH nH B*CJ$hmH nH 6B*CJ(hmH nH B*CJ(hmH nH 5B*CJ(hmH nH  hmH nH  hmH nH mH nH Ac%k  M f & F & Fc%k  M f Y ztnhe]UM[           M c       ft       D  # p    Y  p  : K q  M xP & F & F & F>  p  : K q  M xPY %8=QVjo}ume][XUUUU    !          $  Z         ]     2D       !Y %8=QVjo )Git & F>o )Git&W-6Fdm}P!;ES_;sNd d 0 K k !!6!!!#"0"O"q""""a#####/$N$u$a&W-6Fdm}P!;ES_;sNd d  0 K k !!6!!!#"0"O"q""""a#####/$N$u$$$$u$$$$j%%Q&&&''(6(I(w((^))))0*z***** +Y+l++#,s,,---6.N.../,/=/p/////0h0x001J1}1112 2+22{33444.5C5555@6K6q6|6667!777789D9U999::l:: ;-;;;; <a$j%%Q&&&''(6(I(w((^))))0*z***** +Y+l++#,s,s,,---6.N.../,/=/p/////0h0x001J1}1112 2+2+22{33444.5C5555@6K6q6|6667!777789D9U999:::l:: ;-;;;; <R<<<<<<q===>(>H>>>P?\?n?~?? <R<<<<<<q===>(>H>>>P?\?n?~??@H@Y@i@@@A,A\AA BPB[BBBCmCCC"DtDD+E,E-EOEPERESETEUE2?@H@Y@i@@@A,A\AA BPB[BBBCmCCC"DtDD+E,E-ERESETEddNEOEPEQETEUEmH nH 0J j0JUTEUE(&P . A!"#$% [0@0NormalnP^n6D@!"D Heading 1$$@&a$:>*CJ KH >@12> Heading 2@&^5CJ@@@ Heading 3$7x@&^76CJ,, Heading 4$@&<A@< Default Paragraph Font,,Header  9r 4 @4Footer p#CJ&)@& Page Number(U@!( Hyperlink>*B*UAn(((+NEUE$5  $s,+2:?TEUE%')+,-/01246 ou$ <UE&(*.3#%+!t+ASAVA abM9N9*A+A+A-AGAQARASAVA Alex Fleming*C:\WebPages\lectures\Checked\DB-Design.doc Alex Fleming0C:\WebPages\lectures\Checked\Database-Design.doc Alex Fleming1C:\WebPages\afleming\lectures\database-design.doc Alex Fleming8C:\WINDOWS\TEMP\AutoRecovery save of database-design.asd Alex Fleming1C:\WebPages\afleming\lectures\database-design.doc Alex Fleming1C:\WebPages\afleming\lectures\database-design.doc Alex Fleming8C:\WINDOWS\TEMP\AutoRecovery save of database-design.asd Alex Fleming8C:\WINDOWS\TEMP\AutoRecovery save of database-design.asd Alex Fleming8C:\WINDOWS\TEMP\AutoRecovery save of database-design.asd Alex Fleming1C:\WebPages\afleming\lectures\database-design.doc|^`.*|ttHtttDttt @CJ@OJQJo(" t`@CJ8OJQJo(" Tt`@CJ8OJQJo( t`e@CJ$OJQJo(lt @CJOJQJo(8Pt @CJ$OJQJo(" t @CJOJQJo(<  @ddsddUA@@G:Times New Roman5Symbol3& :ArialEMonotype Sorts;Wingdings"hL\d\fPDm 5rj/f"0dAe8DATABASE DESIGN Alex Fleming Alex Fleming RhubarB26 Oh+'0 , H T ` lxDATABASE DESIGNATA Alex FlemingGNlexlex Normal.dotg Alex FlemingGN3exMicrosoft Word 8.0@^в@h ʪ@h@a=m 5 ՜.+,D՜.+,< hp|  ArAj DATABASE DESIGN Title 6> _PID_GUIDAN{4D6332C0-211A-11D4-8F0E-0000E8314749}  !"#$%&'()*+,-./012345679:;<=>?ABCDEFGIJKLMNORRoot Entry FRu=T1Table8WordDocument*nSummaryInformation(@DocumentSummaryInformation8HCompObjjObjectPoolRu=Ru=  FMicrosoft Word Document MSWordDocWord.Document.89q