SQL & PL/SQL: GROUPING DATA

Tuesday, 28 February 2017

GROUPING DATA

* GROUP BY clause

* Group functions

* HAVING clause

* ROLLUP and CUBE

In this chapter, we will see how to group data to get summary information. So far we have seen the WHERE and ORDER BY clauses of the SELECT command. In this chapter, we will see the GROUP BY and HAVING clauses. We will also see how to use group functions and two extensions to GROUP BY that were introduced in Oracle8i: ROLLUP and CUBE.

GROUP BY Clause

Sometimes we have to group the rows of a table by a particular column and perform certain operations on each group. For example, you may want to display the number of students in each batch. For this we have to group the rows of the STUDENTS table on the column BCODE. This creates one group for each unique batch code. Then, using the group function COUNT, we can count the number of rows (equivalent to the number of students) in each group.

Now let us see how this task can be accomplished. The following SELECT command groups the rows of the STUDENTS table on BCODE using the GROUP BY clause. It then counts the number of students in each batch using the group function COUNT.

select bcode, count(rollno)
from students
group by bcode;

BCODE COUNT(ROLLNO)
----- -------------
b1                2
b2                3
b3                2
b4                1
b5                3

We will see one more example. Here we display the total amount of fee paid by each student. For this we use the PAYMENTS table and the group function SUM. Here is the required query.

select rollno, sum(amount) from payments group by rollno;

   ROLLNO SUM(AMOUNT)
--------- -----------
        1        4500
        2        4500
        3        5000
        4        5000
        5        5000
        6        3500
        7        3500
        8        4000
        9        3000
       10        6500
       11        4500

While grouping data, the following points are to be taken into account.

* Find out the table from which the data is to be taken.

* Find out the column or columns on which the data is to be grouped.

* Find out the group function that is to be used to get the required aggregate.

The GROUP BY clause specifies the column(s) on which the rows of the table are to be grouped. It divides the rows into groups based on the column(s) given in the GROUP BY clause. For example, in the second query above, the rows of the PAYMENTS table are grouped on the value of the column ROLLNO. All rows that contain the same roll number are taken as one group, so the number of groups equals the number of unique roll numbers in the table.
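The grouping described above can be tried without an Oracle instance. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for an Oracle session. The STUDENTS rows are made up for illustration and are not the data behind the outputs shown in this chapter.

```python
import sqlite3

# Hypothetical sample data: (rollno, bcode) pairs.
rows = [(1, "b1"), (2, "b1"), (3, "b2"), (4, "b2"), (5, "b2"), (6, "b3")]

conn = sqlite3.connect(":memory:")
conn.execute("create table students (rollno integer, bcode text)")
conn.executemany("insert into students values (?, ?)", rows)

# One output row per distinct BCODE; COUNT is evaluated once per group.
# ORDER BY is added only to make the printed order deterministic.
counts = conn.execute(
    "select bcode, count(rollno) from students group by bcode order by bcode"
).fetchall()

for bcode, n in counts:
    print(bcode, n)
```

Three distinct batch codes go in, so three groups come out, each with its own count.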

Points to remember

It is important to remember that only the following can appear in the SELECT list while you are grouping data.

* The columns that are given in the GROUP BY clause.

* Group functions.

Group Functions

Group functions are functions that perform operations on groups of rows. Table 1 lists the commonly used group functions. The general syntax of a group function is given below.

group-function ([DISTINCT] column)

If the DISTINCT option is used, only distinct (unique) values are taken into account. The following query is a first attempt to count the number of students who have paid some amount.

select count(rollno) from payments;

COUNT(ROLLNO)
-------------

But this count includes duplicates. If the same student has made two payments, that roll number is counted twice. What if we want to count how many students have made some payment, without counting duplicates? The answer is to use DISTINCT inside the group function, as shown in the next query.

select count(DISTINCT rollno) from payments;

COUNT(DISTINCTROLLNO)
---------------------

The DISTINCT option makes the COUNT function count only distinct values rather than all values.

Note: COUNT(*) counts the number of rows in each group. The asterisk (*) refers to the entire row. If you give a column name instead, COUNT counts the number of non-null values in that column.
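To make the difference between the three forms of COUNT concrete, here is a small sketch using Python's sqlite3 module in place of Oracle. The PAYMENTS rows are hypothetical: one student pays twice, and one row has no roll number at all.

```python
import sqlite3

# Hypothetical payments: roll number 1 pays twice, one row has a null roll number.
rows = [(1, 2000), (1, 2500), (2, 4500), (3, 5000), (None, 1000)]

conn = sqlite3.connect(":memory:")
conn.execute("create table payments (rollno integer, amount integer)")
conn.executemany("insert into payments values (?, ?)", rows)

# count(*)               -> every row
# count(rollno)          -> rows where rollno is not null
# count(distinct rollno) -> different non-null roll numbers
all_rows, non_null, distinct = conn.execute(
    "select count(*), count(rollno), count(distinct rollno) from payments"
).fetchone()

print(all_rows, non_null, distinct)  # 5 4 3
```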

Group Function	Description
SUM	Returns the sum of the given column.
AVG	Returns the average of the given column.
MIN	Returns the minimum value in the given column.
MAX	Returns the maximum value in the given column.
STDDEV	Returns the standard deviation of the values in the given column.
VARIANCE	Returns the variance of the values in the given column.
COUNT	Returns the number of non-null values in the given column. Returns the number of rows if * is given instead of a column.

Table 1: GROUP functions.

The following query displays each subject and the date on which the most recent batch of that subject started.

select ccode, max(stdate)
from batches
group by ccode;

CCODE MAX(STDAT
----- ---------
asp   15-JAN-01
c     20-JAN-01
java  05-APR-01
ora   15-AUG-01
vbnet 12-JUL-01
xml   02-MAR-01

It is also possible to use group functions without a GROUP BY clause. In this case all the selected rows are taken as a single group, and the group function operates on the entire set of selected rows.

The following query will display the highest course fee.

select max(fee) from courses;

 MAX(FEE)
---------
     5500

GROUP BY and WHERE clause

If a WHERE clause is used along with GROUP BY, the WHERE clause is executed first. Only the rows selected by the WHERE clause are then grouped by the GROUP BY clause. The following query shows the number of batches that started for each subject in the last 3 months.

select ccode, count(*) count
from batches
where months_between(sysdate,stdate) <= 3
group by ccode;

CCODE     COUNT
----- ---------
ora           1
vbnet         1

In the above query, first rows that satisfy the given condition are retrieved. Then these rows are passed to GROUP BY clause for grouping.

Note: The GROUP BY clause must follow the WHERE clause if both are used together.

The WHERE clause selects the rows that are to be used for grouping.
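The effect of filtering before grouping can be seen in the following sketch, which uses Python's sqlite3 module as a stand-in for Oracle. The BATCHES rows are hypothetical, the dates are stored as ISO-format text, and a fixed cutoff date replaces the months_between(sysdate, stdate) arithmetic so that the result is repeatable.

```python
import sqlite3

# Hypothetical batches: (ccode, stdate) with ISO-format dates stored as text.
rows = [
    ("ora", "2001-01-12"), ("ora", "2001-08-15"),
    ("vbnet", "2001-07-12"), ("c", "2001-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (ccode text, stdate text)")
conn.executemany("insert into batches values (?, ?)", rows)

# WHERE removes the older rows first; only the survivors are grouped.
recent = conn.execute(
    """select ccode, count(*)
       from batches
       where stdate >= ?
       group by ccode
       order by ccode""",
    ("2001-06-01",),
).fetchall()

print(recent)  # [('ora', 1), ('vbnet', 1)]
```

Course ora has two batches in the table, but only one of them passes the WHERE condition, so its count is 1. Course c disappears completely because none of its rows reach the grouping step.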

Grouping on more than one column

It is possible to group rows on more than one column. In this case, the first column is used to group the data, then within each of those groups the rows are grouped again on the second column, and so on.

The following query displays the date on which the last batch was taken by each faculty for each subject.

select fcode, ccode, max(stdate)
from batches
group by fcode, ccode;

FCODE CCODE MAX(STDAT
----- ----- ---------
da    asp   15-JAN-01
da    vbnet 12-JUL-01
dh    xml   02-MAR-01
gk    ora   12-JAN-01
hs    c     20-JAN-01
hs    java  05-APR-01
jc    ora   15-AUG-01
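As a runnable illustration of grouping on two columns, the sketch below uses Python's sqlite3 module with a few hypothetical BATCHES rows. Each distinct (fcode, ccode) pair forms its own group.

```python
import sqlite3

# Hypothetical batches: (fcode, ccode, stdate) with ISO-format dates as text.
rows = [
    ("da", "asp", "2001-01-15"),
    ("da", "vbnet", "2001-03-01"),
    ("da", "vbnet", "2001-07-12"),
    ("hs", "c", "2001-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (fcode text, ccode text, stdate text)")
conn.executemany("insert into batches values (?, ?, ?)", rows)

# Faculty da has two groups (asp and vbnet); max(stdate) is taken per group.
latest = conn.execute(
    """select fcode, ccode, max(stdate)
       from batches
       group by fcode, ccode
       order by fcode, ccode"""
).fetchall()

for row in latest:
    print(*row)
```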

HAVING Clause

What if you have to select only a few groups, based on a condition that uses the result of a group function? For example, what if you have to display only the batches that have more than 10 students? The HAVING clause is used for this purpose: it filters groups based on the given condition. The following example shows the courses that have more than one batch.

select ccode
from batches
group by ccode
having count(*) > 1;

CCODE
-----
ora

In the above example, the rows of the BATCHES table are first grouped on CCODE. Then the query selects the groups that contain more than one row. The latter is achieved using the HAVING clause, which is specifically meant to select groups based on a given condition.

Here is a list of important points to note:

* The WHERE clause checks conditions based on column values and expressions related to individual rows. It cannot be used with conditions related to groups.

* The HAVING clause is designed to evaluate conditions that are based on group functions such as SUM and COUNT.

* The HAVING clause cannot be used for conditions that are not related to groups, that is, conditions on columns that are neither in the GROUP BY clause nor inside a group function.

The following examples illustrate these points.

SQL> select ccode
  2  from batches
  3  where count(*) > 1
  4  group by ccode;
where count(*) > 1
ERROR at line 3:
ORA-00934: group function is not allowed here

The above example returns an error because group functions cannot be used in the WHERE clause.

SQL> select ccode, count(*)
  2  from batches
  3  group by ccode
  4  having enddate is null;
having enddate is null
ERROR at line 4:
ORA-00979: not a GROUP BY expression

In the above example, the HAVING clause contains an ordinary row-level condition on ENDDATE, a column that is neither in the GROUP BY clause nor inside a group function. Oracle therefore returns an error saying that the expression is not a GROUP BY expression.

The following query is valid as HAVING clause is used with a condition that is related to group by expression – CCODE.

SQL> select ccode, count(*)
  2  from batches
  3  group by ccode
  4  having length(ccode) > 3;

CCODE  COUNT(*)
----- ---------
java          1
vbnet         1
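The first of these rules can be reproduced outside Oracle. The sketch below uses Python's sqlite3 module with hypothetical BATCHES rows. SQLite also refuses a group function in WHERE, although its error text is different from ORA-00934. The ORA-00979 case is not shown because SQLite is more permissive than Oracle about ungrouped columns in HAVING.

```python
import sqlite3

# Hypothetical batches: only the course code matters here.
rows = [("ora",), ("ora",), ("java",), ("c",)]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (ccode text)")
conn.executemany("insert into batches values (?)", rows)

# A group function in HAVING filters whole groups.
multi = conn.execute(
    "select ccode from batches group by ccode having count(*) > 1"
).fetchall()
print(multi)  # [('ora',)]

# The same condition in WHERE is rejected, because WHERE is evaluated
# row by row, before any groups exist.
try:
    conn.execute("select ccode from batches where count(*) > 1 group by ccode")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print("WHERE with count(*) failed:", where_error is not None)
```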

Using WHERE and HAVING together

It is possible to use both WHERE and HAVING clauses in the same query. When the two are used together, the WHERE clause is given first and the HAVING clause after it.

Oracle first selects rows based on the WHERE clause. After that it groups the selected rows. Then groups are selected based on the HAVING clause.

The following query displays the courses for which more than one batch started in the last six months.

select ccode, count(*)
from batches
where months_between(sysdate,stdate) <= 6
group by ccode
having count(*) > 1;

CCODE  COUNT(*)
----- ---------
ora           2

ORDER BY with GROUP BY

The ORDER BY clause can be used along with the GROUP BY clause to order the results. When used, ORDER BY must be the last clause in the query.

select fcode, count(*)
from course_faculty
group by fcode
order by count(*);

FCODE  COUNT(*)
----- ---------
dh            1
gk            1
kl            1
jc            1
sw            1
da            2
hs            2
jj            3

Note: When ORDER BY is used with the GROUP BY clause, ORDER BY should refer to either a GROUP BY column or a group function.

Order of execution

Here is the order in which Oracle executes the different clauses of a SELECT command.

* Selects rows based on the WHERE clause.

* Groups rows based on the GROUP BY clause.

* Calculates results for each group.

* Eliminates groups based on the HAVING clause.

* Uses ORDER BY to order the results.
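The steps listed above can be followed in a single query. This is a sketch using Python's sqlite3 module with hypothetical BATCHES rows. A fixed cutoff date stands in for the sysdate-based condition, and the SQL comments mark which step each clause corresponds to.

```python
import sqlite3

# Hypothetical batches: (ccode, stdate) with ISO-format dates as text.
rows = [
    ("ora", "2001-01-12"), ("ora", "2001-07-01"), ("ora", "2001-08-15"),
    ("vbnet", "2001-06-20"), ("vbnet", "2001-07-12"),
    ("java", "2001-07-05"),
    ("c", "2001-01-20"), ("c", "2001-02-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (ccode text, stdate text)")
conn.executemany("insert into batches values (?, ?)", rows)

result = conn.execute(
    """select ccode, count(*)
       from batches
       where stdate >= '2001-06-01'   -- 1. keep only recent rows
       group by ccode                 -- 2. form groups, 3. count each group
       having count(*) > 1            -- 4. drop groups with a single batch
       order by count(*) desc, ccode  -- 5. sort what is left"""
).fetchall()

print(result)  # [('ora', 2), ('vbnet', 2)]
```

Course c has two batches in the table, but both are removed by WHERE before grouping, so HAVING never sees them. Course java survives WHERE with one row and is then removed by HAVING.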

ROLLUP and CUBE

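Oracle8i enhanced grouping by adding ROLLUP and CUBE to the GROUP BY clause. These two provide subtotals along multiple dimensions.

The following is the syntax of ROLLUP.

SELECT ...
GROUP BY ROLLUP (columns);

ROLLUP creates subtotals that go from the most detailed level up to a grand total. It moves from right to left through the list of columns given in ROLLUP. For example, ROLLUP (ccode, fcode) produces the rows grouped by (ccode, fcode), then a subtotal for each ccode, and finally a grand total.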

Before we use ROLLUP, let us see the result of simple group by clause using BATCHES table.

select ccode, fcode, count(*)
from batches
group by ccode, fcode;

CCODE FCODE  COUNT(*)
----- ----- ---------
asp   da            1
c     hs            1
c     kl            1
java  hs            1
ora   gk            2
ora   kl            1
vbnet da            2
xml   dh            1

With a normal GROUP BY clause we get the count for each combination of course and faculty. But what if you also want the total number of batches for each course, and the total number of batches overall?

The following ROLLUP creates the required aggregates.

select ccode, fcode, count(*)
from batches
group by rollup(ccode, fcode);

CCODE FCODE  COUNT(*)
----- ----- ---------
asp   da            1
asp                 1
c     hs            1
c     kl            1
c                   2
java  hs            1
java                1
ora   gk            2
ora   kl            1
ora                 3
vbnet da            2
vbnet               2
xml   dh            1
xml                 1
                   10

Apart from the rows that come from GROUP BY, ROLLUP generates new rows that display the number of batches for each course and also the total number of batches.

In a row that shows the total number of batches of a single course, CCODE contains the course code and FCODE contains null. In the row that shows the total number of batches, both CCODE and FCODE contain null.

The following query will display a meaningful value for these columns.

select nvl(ccode,'ALL courses'), nvl(fcode,'All faculty'), count(*)
from batches
group by rollup(ccode, fcode);

NVL(CCODE,' NVL(FCODE,'  COUNT(*)
----------- ----------- ---------
asp         da                  1
asp         All faculty         1
c           hs                  1
c           kl                  1
c           All faculty         2
java        hs                  1
java        All faculty         1
ora         gk                  2
ora         kl                  1
ora         All faculty         3
vbnet       da                  2
vbnet       All faculty         2
xml         dh                  1
xml         All faculty         1
ALL courses All faculty        10
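One way to see what ROLLUP computes is to build the same rows by hand. The sketch below uses Python's sqlite3 module, and SQLite has no ROLLUP, so the three grouping levels of ROLLUP (ccode, fcode) are written as separate queries joined with UNION ALL. This is an emulation for illustration, not how Oracle evaluates ROLLUP internally, and the BATCHES rows are hypothetical.

```python
import sqlite3

# Hypothetical batches: (ccode, fcode).
rows = [("ora", "gk"), ("ora", "gk"), ("ora", "kl"), ("c", "hs")]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (ccode text, fcode text)")
conn.executemany("insert into batches values (?, ?)", rows)

# Level 1: detail rows. Level 2: one subtotal per course. Level 3: grand total.
rollup_rows = conn.execute(
    """select ccode, fcode, count(*) from batches group by ccode, fcode
       union all
       select ccode, null, count(*) from batches group by ccode
       union all
       select null, null, count(*) from batches"""
).fetchall()

# Sort so each subtotal follows its detail rows and the grand total comes last.
rollup_rows.sort(key=lambda r: (r[0] is None, r[0] or "", r[1] is None, r[1] or ""))

for ccode, fcode, n in rollup_rows:
    print(ccode or "ALL courses", fcode or "All faculty", n)
```

The generated rows carry None (null) in the rolled-up columns, which is why the printing step substitutes a label in the same way NVL does in the query above.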

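GROUPING function

The GROUPING function returns 1 if the null in the given column was generated by ROLLUP (or CUBE) to represent a subtotal, and 0 otherwise. Unlike NVL, it can therefore tell a generated null apart from a null that is actually stored in the column. We can use the DECODE and GROUPING functions to get the same result as above.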

select decode(grouping(ccode),1,'ALL courses',ccode) ccode,
       decode(grouping(fcode),1,'All faculty',fcode) fcode,
       count(*) count
from batches
group by rollup(ccode, fcode);

CCODE       FCODE           COUNT
----------- ----------- ---------
asp         da                  1
asp         All faculty         1
c           hs                  1
c           kl                  1
c           All faculty         2
java        hs                  1
java        All faculty         1
ora         gk                  2
ora         kl                  1
ora         All faculty         3
vbnet       da                  2
vbnet       All faculty         2
xml         dh                  1
xml         All faculty         1
ALL courses All faculty        10
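The reason a flag such as GROUPING is useful shows up when the grouped column can itself be null. The sketch below uses Python's sqlite3 module, which has no GROUPING function, so a literal 0/1 column named is_subtotal (a hypothetical name) plays that role in a hand-written UNION ALL. The BATCHES rows are made up, and one batch deliberately has no faculty assigned.

```python
import sqlite3

# Hypothetical batches; the last one has no faculty assigned (a stored null).
rows = [("ora", "gk"), ("ora", "kl"), ("ora", None)]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (ccode text, fcode text)")
conn.executemany("insert into batches values (?, ?)", rows)

# is_subtotal imitates GROUPING(fcode): 0 on grouped rows, 1 on the subtotal row.
flagged = conn.execute(
    """select ccode, fcode, 0 as is_subtotal, count(*)
       from batches group by ccode, fcode
       union all
       select ccode, null, 1, count(*)
       from batches group by ccode"""
).fetchall()


def label(fcode, is_subtotal):
    """Label a subtotal row differently from a row whose fcode is really null."""
    if is_subtotal:
        return "All faculty"
    return fcode if fcode is not None else "(unassigned)"


labelled = sorted((ccode, label(fcode, flag), n) for ccode, fcode, flag, n in flagged)

for row in labelled:
    print(*row)
```

Both the unassigned batch and the subtotal row have a null FCODE. A label based only on the null value, as with NVL, would print "All faculty" for both, while the flag keeps them apart.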

CUBE

CUBE generates the same subtotals as ROLLUP plus a few more: it provides all possible subtotals. For example, in the previous output of ROLLUP we got the number of batches taken by each faculty for each course, the number of batches for each course, and the total number of batches.

Apart from generating all of these, CUBE also generates subtotals for each faculty. That means it also provides the number of batches taken by each faculty across all courses.

select decode(grouping(ccode),1,'ALL courses',ccode) ccode,
       decode(grouping(fcode),1,'All faculty',fcode) fcode,
       count(*) count
from batches
group by cube(ccode, fcode);

CCODE       FCODE           COUNT
----------- ----------- ---------
asp         da                  1
asp         All faculty         1
c           hs                  1
c           kl                  1
c           All faculty         2
java        hs                  1
java        All faculty         1
ora         gk                  2
ora         kl                  1
ora         All faculty         3
vbnet       da                  2
vbnet       All faculty         2
xml         dh                  1
xml         All faculty         1
ALL courses da                  3
ALL courses dh                  1
ALL courses gk                  2
ALL courses hs                  2
ALL courses kl                  2
ALL courses All faculty        10

CUBE adds five more rows to the output of ROLLUP. Each of these new rows displays the total number of batches taken by one faculty across all courses.
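The extra rows that CUBE adds can be reproduced by hand in the same way as for ROLLUP. The sketch below uses Python's sqlite3 module, where CUBE is not available, and writes the four grouping levels of CUBE (ccode, fcode) as a UNION ALL. It is an emulation for illustration only, and the BATCHES rows are hypothetical.

```python
import sqlite3

# Hypothetical batches: (ccode, fcode).
rows = [("ora", "gk"), ("ora", "kl"), ("c", "kl")]

conn = sqlite3.connect(":memory:")
conn.execute("create table batches (ccode text, fcode text)")
conn.executemany("insert into batches values (?, ?)", rows)

# The first, second and fourth queries are what ROLLUP(ccode, fcode) gives.
# The third query, one subtotal per faculty, is what CUBE adds on top.
cube_rows = conn.execute(
    """select ccode, fcode, count(*) from batches group by ccode, fcode
       union all
       select ccode, null, count(*) from batches group by ccode
       union all
       select null, fcode, count(*) from batches group by fcode
       union all
       select null, null, count(*) from batches"""
).fetchall()

# Pick out only the per-faculty subtotal rows.
per_faculty = sorted((f, n) for c, f, n in cube_rows if c is None and f is not None)

print(len(cube_rows))  # 8
print(per_faculty)     # [('gk', 1), ('kl', 2)]
```

With two grouping columns there are four grouping levels. Here they contribute 3 detail rows, 2 course subtotals, 2 faculty subtotals and 1 grand total.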

Summary

The GROUP BY clause groups the rows of a table based on the given columns. Group functions calculate aggregates, such as the average, for each group. The HAVING clause filters groups based on the result of a group function. Oracle executes WHERE first, then GROUP BY, then HAVING, and finally ORDER BY. ROLLUP and CUBE extend GROUP BY to add subtotals and a grand total to the result.
