Advanced Database Design
1 DatabaseAPI
1.1 Call level interface
(1) Library of DBMS specificinterface functions
(2) A vendor-specific applicationprogramming interface
1.2 ODBC and JDBC
(1) ODBC: independent of anyprogramming language, database system or operating system
(2) JDBC:
3-tier: JDBC-ODBC bridge (client-midware-server)
2-tier: JDBC (client-server)
2 StoredProcedures and Functions
2.1 PSM (Persistent Stored Modules)
(1) Functions and procedures aredefined and executed in database
(2) Stored procedures haveside-effect that update DB and are not allowed in queries
(3) Stored functions have noside-effect that update DB and are allowed in queries
2.2 Reasons for using stored functionsand queries
(1) Save communication time
(2) Put data centered code close toserver
(3) Make consistent DB updates
(4) Make 2-tier solution ratherthan application server
2.3 Deterministic and Not Deterministic
(1) Deterministic: functions are guaranteedto produce the same results every time they are invoked with the same set ofinput value
2.4 Cursor
(1) A cursor is a variable thatrepresents the current position in the stream over the tuples returned by aquery. The stream itself is called a scan
(2) Iterations over a scan: theusual way to use a cursor over a scan is to create a loop with “Fetch”statement and do something with the variable bound by “fetch”. By this way, itallows iteration over arbitrary large data sets without using memory
3 ActiveDatabase
3.1 Conventional model (passive)
(1) Clients control databaseupdates
(2) Real world problem: buy/sellstocks when price is below/above threshold
(3) Application needs toperiodically poll the DBMS: frequent polling-> expensive; infrequentpolling->might miss the right time to react
3.2 Active database
(1) ADBMS: recognize predefinedsituation; trigger predefined actions when situation occurs.
(2) ADBMS rules:
ECA: when event occurs if condition holds do execute action
Data: “Old” table (before update); “New” table (after update)
3.3 Materialized views
(1) Regular view: a regular view isa virtual table, which is not stored in the database but computed a query usingthe table which is issued
(2) Materialized view: amaterialized view is a master table which is automatically materialized by theDBMS when there are updates on any of the table in its view definition. E.g. “Creatematerialized view department as select dno, sum(salary) as total from employee”.Materialized view will be automatically maintained when updating table employee
3.4 Trigger and assertion
(1) Trigger:
E.g.
CREATETRIGGER CheckExceed
BEFOREUPDATE ON account
FOREACH ROW
BEGIN
DECLAREspecialty CONDITION FOR SQLSTATE '45000';
DECLAREoverdraftAmount INTEGER;
SEToverdraftAmount=ABS(new.balance)-old.credit;
IF(overdraftAmount>0 AND overdraftAmount/old.credit < 0.1 ANDnew.balance<0) THEN
INSERT INTO OverDraft VALUES(new.number,now(),overdraftAmount);
ELSEIF (new.balance<0 andoverdraftAmount/old.credit >= 0.1) THEN
SET new.balance=old.balance;
SIGNAL specialty
SET MESSAGE_TEXT = 'Cannot withdrawmore than 10% overdraft!';
ENDIF;
END$
(2) Assertion:
E.g.
Create assertion salary_constraints
check(not exist
(select* from employee e, employee m department d
wheree.salary>m.salary and e.dno=d.dno and d.mgrssn=m.ssn)
)
4 DatabaseIndex
4.1 File
(1) A file is a sequence ofrecords, where each record is a collection of data values representing a tuple
(2) Notion: file descriptor (fileheader) and block factor (block size)
(3) Operations: open, find,findnext, read, modify, insert, delete, recognize
4.2 Stream access to database operators
(1) The result of internal databaseoperator is usually represented as scans of tuples
(2) SQL queries are translated byquery optimizer into execution plans
(3) Basic operators defined overscan objects: open, next, eos, close
(4) Physical algebra operatorsusing scans: segment_scan, index_scan, sort
4.3 Ordered and unordered files
(1) Unordered: linear search;insertion is efficient; do not scale; heap file
(2) Ordered: binary search; readingrecords is efficient; scale; index sequential file
4.4 Hash files
(1) Collisions: an overflow file iskept for storing such records; overflow records that hash to each bucket can belinked together
(2) Disadvantage: mix number ofbuckets; ordered access on hash key is not efficient
4.5 Index
4.5.1 Definition: Indexes are datastructures used to speed up access to sets of records sored in a database.
4.5.2 Types of indexes:
(1) Primary index:
Defined on ordered files; data files are ordered on one or severalkey fields; include one index entry for each block
(2) Clustering index/Non-clusteringindex:
Defined on ordered/unordered files; data files are ordered/unorderedon non-key field
(3) Secondary index:
A secondary index provides a secondary means of accessing a file forwhich some primary access already exists;
It is a non-clustering index since the indexed records are notordered by the index keys
4.5.3 Dense & sparse index
(1) Dense: it has index entries forevery key-value in the data file. A secondary index is usually dense.
(2) Sparse: it has index entriesfor only some of search values. A primary index is usually sparse.
4.6 B- tree and B+ tree
(1) Nodes in B- tree are uniform: ((key,value, subtree),…,(key,value, subtree)), there is no difference between theleft node and the intermediate nodes
(2) Nodes in B+ tree:
Intermediate node: ((key, subtree),…, (key, subtree))
Left node: ((key, value),…, (key,value))
5 IntegrityConstraints
5.1 Domain constraints
Attribute values for attribute A shall be atomic values from domain (A)
5.2 Key constraints
Candidate keys for a relation must be unique
5.3 Entity integrity constraints
No primary key is allowed to have a null value
5.4 Referential integrityconstraints
A tuple that refers to another tuple in another relation must referto an existing tuple
5.5 Syntax constraints
6 DataSecurity and Authorization
6.1 Access control
(1) User accounts and passwords toprevent access to the system itself
(2) Login session
(3) Audit trail (extended log infoabout users’ data interaction)
(4) Discretionary access controlissuing privileges to users for access rights to certain data
6.2 Privilege-based mechanism
6.2.1 common method: grant and revokeprivileges
6.2.2 two types of privilege: theaccount level and relation level
6.2.3 access matrix
(1) M[s,o]->p, where s (subject)are rows in matrix and o (object) are columns in the matrix and p is theprivilege type
(2) E.g
Subject/object | relation | tuple | column | view | procedure |
users | read | write | read | select | update |
accounts | write | read | write | update | select |
programs | read | read | write | update | select |
6.2.4 authorization using views
(1) create views for some users
(2) grant privilege on a viewinstead of original table
(3) user cannot access originaltable but only the created view
6.2.5 statistical database security
(1) it’s mainly used to producestatistic on various populations
(2) It’s used to prohibit theretrieval of individual data but only allows queries that involve statisticalaggregated functions.
7 ObjectStores and Object-relational Database System
7.1 Current database problems
(1) Logical database design:semantics may disappear or be blurred when data is translated from extended ERmodel to less expressive relational data model
(2) Physical database design: newapplication may require data and index structure that are not supported by theDBMS.
7.2 Extending database
7.2.1 User defined types andfunctions: the compiled UDFs are executed inside the DBMS server. System needsto dynamically load compiled code into server.
7.2.2 Regular DBMS indexes allow low dimensionalityindexing
(1) Data structure for indexing:
Hash table-> D=0;
B- Tree->D=1;
KD- Tree->D>1;
(2) Two ways to use index
Using the explicit index & using the transparent index
(3) DBMS kernel extension
Call external index manager using foreign functions & utilizeexisting DBMS structure in procedures and queries.
8 No-sqlDatabase
8.1 No-SQL database examples
Key/value pair, web document database
8.2 No-SQL database characteristics
(1) Highly distributed andparalleled architectures
(2) Highly scalable systems bycompromised constancy
(3) New query languages for newapplications
8.3 Map Reduce
8.3.1 Highly scalable implementation
(1) Parallel batch processing
(2) Based on a scalable file system
(3) Over large amount of datastored in different files
8.3.2 Map reduce stages
(1) Input data:
System component that reads files from scalable file system andsends to map functions applied in parallel
(2) Map function:
Applied in parallel on many different files;
Do some computation;
Emit key/value pair as result;
Result stored by map reduce system as files
(3) Partition function:
Partition output key/value pairs from map function into groups ofkey/value pairs to be reduced in parallel
(4) Reduce function:
Iterates over set of key/value pairs to produce a reduced set ofkey/value pairs stored in the file system
9 DistributedDatabase
9.1 Distributed System
(1) A distributed system is anumber of autonomous computers communicating over a network with software forintegrated tasks
9.2 Distributed database
9.2.1 Transparency
(1) Database is transparently seenfrom application as one sql database
(2) Unaware of replication; automaticpropagation and recover
(3) Distributed query processor automaticallytranslate global query to queries at different sites
9.2.2 Design:
Manual fragmentation and replication of data tables
9.2.3 Definition:
A distributed database is a collection of multiple, logically interrelateddatabases distributed over a computer network
9.2.4 DDBMS:
It’s a software system that permits the management of DDB and makesthe distribution transparent to the users
9.2.5 Data replication
(1) Pros: improve read performanceand availability
(2) Cons: update must be propagatedto a majority of the replicas; temporary inconsistency.
9.2.6 Data fragmentation
(1) Pros: improve read performanceand update performance (local update)
(2) Cons: availability depends on reliableconnections; global updates require advanced synchronization
9.2.7 Correctness of fragmentation
(1) Completeness: assume table Rhas fragments R1, R2, … , Rn. It must be possible to find every tuple of R insome fragments Ri
(2) Reconstructablity: it must be possible to reconstruct the original relation with some relation operators. (vertical: join on key; Horizontal: union all)
(3) Distinctness: data items of R should only occur in exactly one fragment.
9.3 Parallel database
(1) Definition: use parallelprocessing in cluster of computer nodes running parallel data servers
(2) Features: automatic fragmentand replicas of data; transparent both for users including schema transparency.
9.4 Heterogeneous database
9.4.1 views, stored procedures,triggers to harmonize different schemas in different databases
9.4.2 It has one global uniformschema describing all its data managed by a DDBMS
9.4.3 Two kinds of heterogeneousdatabases
(1) Federated databases:
Collections of autonomous databases partially integrated by acontrol federated schema
(2) Mediator:
Middleware to provide schemas and queries to views of looselycoupled databases and other data sources.
10 DataStream and Management Systems
10.1 Data stream
(1) Motivation: too much data to bestored on disks, so would like to query data directly in the streams
10.2 Streams vs tables
(1) Potentially infinite in size
(2) Ordered
(3) Stop condition indicates whenstreams end
(4) Often very high stream data volumeand rate
(5) Real time delivery, quality of service
(6) Active query model, continuous queries
10.3 Continuous query
(1) CQs are turned on and run untilstop condition true
(2) CQs return unbounded datastream as results
(3) CQs operate usually monotone,cannot re-read stream
(4) CQs are often based on time-stampof stream elements
(5) CQs are specified over streamwindows.
10.4 Stream window
10.4.1 Window size based on
(1) Number of elements, a countingwindow
(2) Time stamp, a time window
(3) Period, a landmark window
10.4.2 Window stride
(1) A tumbling window
e.g. 10 elements for a 10-element window
(2) A sliding window
e.g. 2 elements for a 10-element window
(3) A sampling window
e.g. 100 elements for a 10-element window
10.4.3 Window and query
(1) The CQ language needs windowoperator to collect stream tuples into stream of windows of tuples; stride andsize are parameters
10.4.4 Stream join
(1) Stream join: streams can bemerge-joined on its order attribute. E.g. time stamp.
(2) Window join: make joins overthe current windows in database;
Operator: approximate