1. (a) What are Cursors, Stored Procedures and Triggers ? Explain each with the help of an example code.
Answer : -
A cursor is a temporary work area created in the system memory when a SQL statement is executed. A cursor contains information on a select statement and the rows of data accessed by it.
This temporary work area is used to store the data retrieved from the database, and manipulate this data. A cursor can hold more than one row, but can process only one row at a time. The set of rows the cursor holds is called the active set.
There are two types of cursors in PL/SQL :
Implicit Cursors - These are created by default when DML statements like, INSERT, UPDATE, and DELETE statements are executed. They are also created when a SELECT statement that returns just one row is executed.
Explicit Cursors - They must be created when you are executing a SELECT statement that returns more than one row. Even though the cursor stores multiple records, only one record can be processed at a time, which is called as current row. When you fetch a row the current row position moves to next row.
Example - Copy data from "old_accounts" table to "new_accounts" table
Old Table - old_accounts (account_number, customer_id, branch_id, balance)
New Table - new_accounts (account_number, customer_id, balance)
CREATE OR REPLACE PROCEDURE copy_table AS
CURSOR account_cursor IS SELECT * FROM old_accounts;
FETCH account_cursor INTO account_row;
EXIT WHEN account_cursor%NOTFOUND;
INSER INTO new_accounts VALUES(account_row.account_number, account_row.customer_id, account_row.balance);
A stored procedure is a prepared SQL code that you can save, so the code can be reused over and over again.
So if you have an SQL query that you write over and over again, save it as a stored procedure, and then just call it to execute it.
You can also pass parameters to a stored procedure, so that the stored procedure can act based on the parameter value(s) that is passed.
Example - Create a procedure to transfer amount from account A to account B.
account (account_number, customer_id, branch_id, balance)
CREATE OR REPLACE PROCEDURE amount_transfer
(debit_account IN NUMBER, credit_account IN NUMBER, amount IN NUMBER) AS
UPDATE account SET balance=balance-amount WHERE account_number=debit_account;
UPDATE account SET balance=balance+amount WHERE account_number=credit_account;
Run the procedure "transaction.sql" using the following command -
Execute this procedure using the following command -
SQL> exec amount_transfer(1005508022,1005540032,5000)
Triggers are stored programs, which are automatically executed or fired when some events occur. Triggers are, in fact, written to be executed in response to any of the following events:
Example - Write a trigger for a bank which show the transaction details of an account when transaction is made from this account.
account (account_number, customer_id, branch_id, balance)
CREATE OR REPLACE TRIGGER transaction
AFTER UPDATE OF balance ON Account
FOR EACH ROW
WHEN (NEW.balance > 0)
IF :NEW.balance > :OLD.balance THEN
amount := :NEW.balance - :OLD.balance;
dbms_output.put_line('Amount Credit : ' || amount);
amount := :OLD.balance - :NEW.balance;
dbms_output.put_line('Amount Debit : ' || amount);
dbms_output.put_line('Old Balance : ' || :OLD.balance);
dbms_output.put_line('New Balance : ' || :NEW.balance);
1. (b) Differentiate between Logical and Physical Database design.
Answer : -
Logical Database Design
A logical data model describes the data in as much detail as possible, without regard to how they will be physical implemented in the database. The steps for designing the logical data model are as follows :
Physical Database Design
Physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. The steps for physical data model design are as follows :
1. (c) What is a View ? Explain any two strategies for implementing the Views.
Answer : - In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
A View can either have all the rows of a table or specific rows based on certain condition.
CREATE or REPLACE VIEW view_name AS SELECT column_name(s) FROM table_name(s) WHERE condition;
A view can be dropped using a DROP statement as :
DROP VIEW view_name;
There are many advantages using view :
Security - Each user can be given permission to access the database only through a small set of views that contain the specific data the user is authorized to see, thus restricting the user's access to stored data.
Query Simplicity - A view can draw data from several different tables and present it as a single table, turning multi-table queries into single-table queries against the view.
Consistency - A view can present a consistent, unchanged image of the structure of the database, even if the underlying source tables are split, restructured, or renamed.
Data Integrity - If data is accessed and entered through a view, the DBMS can automatically check the data to ensure that it meets the specified integrity constraints.
1. (d) What is Hash join ? How is Hash join between two relations computed ? Explain the algorithm and cost calculation for simple hash join.
Answer : -
1. (e) Differentiate between a database management system and a data warehouse. What is the need for a data warehouse ?
Answer : -
Reasons You Need a Data Warehouse
Ensure consistency - Data warehouses are programmed to apply a uniform format to all collected data, which makes it easier for corporate decision-makers to analyze and share data insights with their colleagues around the globe. Standardizing data from different sources also reduces the risk of error in interpretation and improves overall accuracy.
Make better business decisions - Successful business leaders develop data-driven strategies and rarely make decisions without consulting the facts. Data warehousing improves the speed and efficiency of accessing different data sets and makes it easier for corporate decision-makers to derive insights that will guide the business and marketing strategies that set them apart from their competitors.
Improve their bottom line - Data warehouse platforms allow business leaders to quickly access their organization's historical activities and evaluate initiatives that have been successful — or unsuccessful — in the past. This allows executives to see where they can adjust their strategy to decrease costs, maximize efficiency and increase sales to improve their bottom line.
1. (f) What is granularity of data ? How does granularity of data items affect the performance of concurrency control ?
Answer : - Granularity could be defined as any entity whose data fields could be sub divided. For example, when you take an entity called as a person. A person’s identity could be further divided into following :
Name, Address, Gender, City, State, Country etc.
This means that a person as an entity has high granularity. Since there are many sub divisions of the same entity.
But if we were to choose gender - this could simply be maximum 3 values - Male, Female and Transgender. This means that Gender as a field/attribute has low granularity.
All concurrency control techniques assumed that the database was formed of a number of named data items. A database item could be chosen to be one of the following :
The granularity can affect the performance of concurrency control and recovery.
Granularity Level Considerations for Locking
The size of data items is often called the data item granularity. Fine granularity refers to small item size, whereas coarse granularity refers to large item size.
With larger data item size, the degree of concurrency permitted lowers. For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that contains B because a lock is associated with the whole data item (block). Now, if another transaction S wants to lock a different record C that happens to reside in the same block X in a conflicting lock mode, it is forced to wait. If the data item size was a single record, transaction S would be able to proceed, because it would be locking a different data item (record).
On the other hand, the smaller the data item size is, the more the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by the lock manager. More lock and unlock operations will be performed, causing a higher overhead. In addition, more storage space will be required for the lock table. For timestamps, storage is required for the read_TS and write_TS for each data item, and there will be similar overhead for handling a large number of items.
The data item size depends on the types of transactions involved. If a typical transaction accesses a small number of records, it is advantageous to have the data item granularity be one record. On the other hand, if a transaction typically accesses many records in the same file, it may be better to have block or file granularity so that the transaction will consider all those records as one (or a few) data items.
1. (g) Differentiate between Centralized DBMS and Distributed DBMS.
Answer : -
|A type of database that contains a single database located at one location in the network||A type of database that contains two or more database files located at different locations in the network|
|Managing, updating and taking in backups of data is easier because there is only one database file||As there are multiple database files in a distributed database, it requires time to synchronize data|
|Requires time for accessing data because multiple users access the database file||Speed in accessing the data is higher because the data is retrieved from the nearest database file|
|If the database fails, the user do not have access to a database||If one database fails, the users can still access other database files|
1. (h) What is Timestamp Ordering ? Explain timestamp based protocol for serializable schedule.
Answer : - In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories −
The most commonly used concurrency protocol is the timestamp based protocol. This protocol uses either system time or logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at 0002 clock time would be older than all other transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger and the priority would be given to the older one.
The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write' operation on a data.
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. This is the responsibility of the protocol system that the conflicting pair of tasks should be executed according to the timestamp values of the transactions.
Timestamp ordering protocol works as follows −
If a transaction Ti issues a read(X) operation −
If a transaction Ti issues a write(X) operation −
This algorithm uses three rules to enforce Serializability
Rule 1 - Given two transactions T1 and T2, if T1 is reading the data item which T2 is writing, then T1’s execution phase cannot overlap with T2’s commit phase. T2 can commit only after T1 has finished execution.
Rule 2 - Given two transactions T1 and T2, if T1 is writing the data item that T2 is reading, then T1’s commit phase cannot overlap with T2’s execution phase. T2 can start executing only after T1 has already committed.
Rule 3 - Given two transactions T1 and T2, if T1 is writing the data item which T2 is also writing, then T1’s commit phase cannot overlap with T2’s commit phase. Tj can start to commit only after T1 has already committed.
2. (a) What is Data Mining ? How does Data Mining differ from OLTP ? Discuss Classification as a technique for data mining
Answer : - There is a huge amount of data available in the Information Industry. This data is of no use until it is converted into useful information. It is necessary to analyze this huge amount of data and extract useful information from it.
Data mining is a process used by companies to turn raw data into useful information.
Data Mining Process
The data mining process breaks down into five steps. First, organizations collect data and load it into their data warehouses. Next, they store and manage the data, either on in-house servers or the cloud. Business analysts, management teams and information technology professionals access the data and determine how they want to organize it. Then, application software sorts the data based on the user's results, and finally, the end user presents the data in an easy-to-share format, such as a graph or table.
Online Transaction Processing (OLTP)
OLTP is an Online Transaction Processing system. The main focus of OLTP system is to record the current Update, Insertion and Deletion while transaction. The OLTP queries are simpler and short and hence require less time in processing, and also requires less space.
OLTP database gets updated frequently. It may happen that a transaction in OLTP fails in middle, which may effect data integrity. So, it has to take special care of data integrity. OLTP database has normalized tables (3NF).
The best example for OLTP system is an ATM, in which using short transactions we modify the status of our account. OLTP system becomes the source of data for OLAP.
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.
A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time. In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.
2. (b) What are Data Marts ? Briefly discuss the significance of data marts.
Answer : - A data warehouse (DWH) is a system used to store information for use in data analysis and reporting. Data marts are areas of a data warehouses used to store information needed by a single department or even by an individual user. (Think of the DWH as a building, and data marts as offices inside the building.)
Significance of Data Marts
2. (c) What is XML ? How does XML differ from HTML ? What are the advantages of XML ? Create an XML schema for a list of students and their marks.
Answer : -
3. (a) Differentiate between Two-phase commit protocol and Three-phase commit protocol in distributed databases. "The three-phase commit protocol increases the system's availability and does not allow transactions to remain blocked until a failure is repaired." Justify the statement
Answer : - When a transaction happens in distributed database it must satisfy the ACID properties but in distributed environment a transaction may decide to commit at some sites and abort at other sites resulting in violation of atomicity property [survey of commit] to overcome this problem commit protocols are used.
Distributed database systems use distributed commit protocols to ensure the atomicity across the sites, when a transaction execute at multiple sites it must either be committed at all the sites or aborted at all the sites instead a transaction committed at one and aborted at another.
In distributed database system a transaction may access data from several locations (sites), each site has a local transaction manager (TM) responsible for maintaining a log for recovery purpose and a transaction coordinator (TC) used for execution of transaction originates at the site, distributing the transaction at appropriate sites and coordinating the execution of transaction at multiple sites.
Two–Phase Commit (2PC) Protocol
Coordinator sends a Prepare message along with the transaction to all participants and asks each one of them to cast their vote for commit or abort.
If participant can commit the transaction Vote-Commit is send to the coordinator and if participant cannot commit Vote-Abort is send to the coordinator.
Decision for commit or abort is taken by the coordinator in this phase. If Vote-Commit is received from all the participants then Global-Commit is send to all the participants and if at least one Vote-Abort is received then coordinator send Global-Abort to all those voted for commit.
Coordinator ask for acknowledgement (ACK) from participants .If a participant receives Global-Commit, it commit the transaction and ACK is send to the coordinator .In case participant receives Global-Abort it abort the transaction.
Three-Phase Commit (3PC) Protocol
Coordinator sends a Prepare message along with the transaction to all participants and asks each one of them to cast their vote for commit or abort
If participant can commit the transaction Vote-Commit is send to the coordinator and if participant can not commit Vote-Abort is send to the coordinator.
Coordinator waits for replies from all participants, if all participants have send Vote-Commit then coordinator sends Prepare-to-Commit to all. If any of participant has send Vote-Abort, coordinator sends Global-Abort to all those who have sent Vote-Commit.
Participants wait for reply from coordinator, if Prepare-to-Commit is received participants enter into Pre-Commit state and send Ready-to-Commit to coordinator.
Coordinator waits for replies from participants, if all participants send Ready-to-Commit coordinator sends Global-Commit to all participants. If coordinator does not receives Ready-to-Commit message from a participant within pre-defined time , it concludes that participant has voted for commit earlier but it unavailable due to some reason so coordinator sends Global-Commit to all the remaining participants.
On receiving Global-Commit participants commit the transaction and send acknowledgment to coordinator.
3. (b) What are Seinantic Databases ? List the features of semantic databases. Explain the process of searching the knowledge in semantic databases.
Answer : -
3. (c) What is a Data Dictionary ? List some features of a data dictionary. What are the various approaches to implement a Distributed Database Catalogue ?
Answer : - Data Dictionary is defined as the set of information which is responsible for describing the contents, format, and structure of a database and the relationship between its elements. The data dictionary are basically used to control access to and manipulation of the database.
A data dictionary contains :
Various Approaches to Implement a Distributed Database Catalogue
Distributed Nature of Organizational Units − Most organizations in the current times are subdivided into multiple units that are physically distributed over the globe. Each unit requires its own set of local data. Thus, the overall database of the organization becomes distributed.
Need for Sharing of Data − The multiple organizational units often need to communicate with each other and share their data and resources. This demands common databases or replicated databases that should be used in a synchronized manner.
Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) work upon diversified systems which may have common data. Distributed database systems aid both these processing by providing synchronized data.
Database Recovery − One of the common techniques used in DDBMS is replication of data across different sites. Replication of data automatically helps in data recovery if database in any site is damaged. Users can access data from other sites while the damaged site is being reconstructed. Thus, database failure may become almost inconspicuous to users.
Support for Multiple Application Software − Most organizations use a variety of application software each with its specific database support. DDBMS provides a uniform functionality for using the same data among different platforms.
4. (a) What is Shadow Paging ? Illustrate with an example. Give the advantages and disadvantages of shadow paging.
Answer : -
Shadow paging is an alternative to transaction-log based recovery techniques.
Here, the database considered as made up of fixed size disk blocks, called pages. These pages mapped to physical storage using a table, called page table.
The page table indexed by a page number of the database. The information about physical pages, in which database pages are stored, is kept in this page table.
This technique is similar to paging technique used by Operating Systems to allocate memory, particularly to manage virtual memory.
Execution of Transaction
During the execution of the transaction, two-page tables maintained.
Current Page Table - Used to access data items during transaction execution.
Shadow Page Table - Original page table, and not get modified during transaction execution.
At the start of the transaction, both tables same and point· to same pages.
The shadow page table never changed and used to restore the database in case of any failure occurs. However, current page table entries may change during transaction execution, as it used to record all updates made to the database.
When the transaction completes, the current page table becomes shadow page table. At this time, it considered that the transaction has committed.
As shown in this figure, two pages – page 2 & 5 – affected by a transaction and copied to new physical pages. The current page table points to these pages.
The shadow page table continues to point to old pages which are not changed by the transaction. So, this table and pages used for undoing the transaction.
4. (b) Define Multi-valued Dependency. What is Trivial Multi-valued Dependency ? State the fourth normal form.
Answer : -
4. (c) Explain any one clustering technique for data mining.
Answer : -
4. (d) What is Query Optimization ? Briefly discuss the techniques of Query Optimization with suitable examples.
Answer : - The query optimizer is built-in database software that determines the most efficient method for a SQL statement to access requested data.
The goal of query optimization is to reduce the system resources required to fulfill a query, and ultimately provide the user with the correct result set faster.
Considered the following tables :
Student (enrolment, s_name, s_age, p_code)
Programme (p_code, p_name,)
Suppose we have a query to retrieve the students with age 25 and studying in programme MCA. We can write this query in two different ways.
FROM Student AS st, Programme AS prog
WHERE st.p_code=prog.p_code AND st.s_age=25 AND prog.p_code='MCA';
FROM (SELECT * FROM Student WHERE s_age=25) AS st, (SELECT * FROM Programme WHERE p_code='MCA') AS prog
Here both the queries will return same result. But when we observe them closely we can see that first query will join the two tables first and then applies the filters. That means, it traverses whole table to join, hence the number of records involved is more. But the second query, applies the filters on each table first. This reduces the number of records on each table, hence the cost in this case is comparatively less.
5. (a) Explain the following :
Answer : -
Dynamic SQL is a programming methodology for generating and running statements at run-time. It is mainly used to write the general-purpose and flexible programs where the SQL statements will be created and executed at run-time based on the requirement.
5. (b) What are Mobile Databases ? List the characteristics and challenges of mobile databases.
Answer : - Mobile computing devices (example - smartphones and PDAs) store and share data over a mobile network, or a database which is actually stored by the mobile device. This could be a list of contacts, price information, distance travelled, or any other information.
Many applications require the ability to download information from an information repository and operate on this information even when out of range or disconnected. An example of this is your contacts and calendar on the phone. In this scenario, a user would require access to update information from files in the home directories on a server or customer records from a database. This type of access and work load generated by such users is different from the traditional workloads seen in client–server systems of today.
Mobile users must be able to work without a network connection due to poor or even non-existent connections. A cache could be maintained to hold recently accessed data and transactions so that they are not lost due to connection failure. Users might not require access to truly live data, only recently modified data, and uploading of changing might be deferred until reconnected.
Challenges of Mobile Database System
Limited Resources - The CPU power and storage of mobile devices is continuously increasing. However, they are far behind non-mobile systems such as servers on the Internet. Due to the size of the database, limited CPU power and storage capacity, mobile device need to perform simple operations on local data available in cache. Limited storage capacity also makes it difficult to cache entire databases to a mobile device.
Power Consumption - The most prominent limitation of mobile device is power. These devices rely entirely on battery power. Combined with the compact size of many mobile devices, this often means unusually expensive batteries must be used to obtain the necessary battery life.
Disconnection - Weather, terrain, and the range from the nearest signal point can all interfere with signal reception. Reception in tunnels, some buildings, and rural areas is still poor. Interaction between a mobile device and a database is directly affected by the device’s network connectivity. The two solutions approach to this disconnection challenges are : 1) Prevent disconnections, 2) Cope with disconnections. For mobile computers, allowing disconnections to happen and recovering from them is the better solution for asynchronous operation caching and reconciliation.
Insufficient Bandwidth - Mobile access is generally slower than direct cable connections. Using technologies such as GPRS and EDGE, and more recently 3G networks, bandwidth has been increased but still less compared to the wired network. Asymmetry problem is faced when bandwidth in the downstream direction is often much greater than bandwidth in the upstream direction.
Limited Storage - Due to mobility and portability, the sizes of memory and hard drive are smaller than the ones in the wired network. The consequences of this are less stored/cached/replicated data, fewer installed applications, and more communication.
5. (c) Explain Join Dependency with the help of an example. To which normal form does it correspond ? "Functional Dependencies and Multivalued Dependencies are special types of Join Dependencies." Justify the statement.
Answer : -
IGNOU Assignment Solution
IGNOU Question Paper Solution
WHAT WE DO
Provide BCA, MCA Projects
Provide Assignment & Question Paper Solution