Tuesday, 28 February 2017

GROUPING DATA

*  GROUP BY clause
*  Group functions
*  HAVING clause
*  ROLLUP and CUBE

In this chapter, we will see how to group data to get summary information. We have so far seen WHERE, ORDER BY clauses of SELECT command. In this chapter, we will see GROUP BY and HAVING clauses.  We will also see how to use group functions and two new function introduced in Oracle8i  - ROLLUP and CUBE.

GROUP BY Clause
Some times, we have to group the rows of the table by a particular column and perform certain operations on each group. For example, you may want to display the number of students in each batch.  For this we have to group rows of STUDENTS table based on the column BCODE. This will create groups – one for each unique batch code.  Then using group function COUNT we can count the number of rows (equivalent to number of students) in each group.

Now let us see how the above-mentioned task can be accomplished. The following SELECT command groups rows of STUDENTS table on BCODE using GROUP BY clause. Then it counts number of students we have in each batch using group function COUNT.

select  bcode, count(rollno)
from  students
group by bcode;

BCODE COUNT(ROLLNO)
----- -------------
b1                2
b2                3
b3                2
b4                1
b5                3


We will see one more example. In this we will display the total amount of fee paid by each student. For this we use PAYMENTS table and group function SUM. Here is the required query.


select  rollno, sum(amount) from payments group by rollno;

   ROLLNO SUM(AMOUNT)
--------- -----------
        1        4500
        2        4500
        3        5000
        4        5000
        5        5000
        6        3500
        7        3500
        8        4000
        9        3000
       10        6500
       11        4500

While you are grouping the data the following points are to be taken in to account.

q  Find out the table from where the data is to be taken.
q  Find out the column or columns on which the data is to be grouped.
q  Find out the group function that is to be used to get required aggregate.

GROUP BY clause is used to specify the column(s) on which the rows of the table are to be grouped.  It means it divides the rows into different groups based on the column(s) that are given in GROUP BY clause. For example, in the above example, the rows of STUDENTS table are to be grouped based on the value of column ROLLNO. So all rows that contain the same roll number will be taken as one group. Like this the number of groups will be equivalent to the number of unique roll numbers in the table.

2.            Points to remember
It is important to remember that only the following can be selected in SELECT while you are grouping the data.

q  The columns that are given in GROUP BY clause can be selected.
q  Group function


Group Functions
Group functions are the functions that are used to perform operations on groups.  Table 1 lists all the available group functions. The general syntax of group functions is given below.

3.       group-function (DISTINCT  column)

If DISTINCT option is used then only distinct (unique) values are taken into account.  The following query is used to count the number of students who have paid some amount.


select  count(rollno) from payments;

COUNT(ROLLNO)
-------------
           17

But the count includes the duplicates. That means, if the same student has made the payment twice then his number will be counted for twice.  But what if I want to count how many students have made some payment without taking duplicates?. The answer is DISTINCT clause in the group function as shown in the next query.


select  count(DISTINCT rollno) from payments;

COUNT(DISTINCTROLLNO)
---------------------
                   11


The DISTINCT clause is making COUNT function count only distinct value and not all values.


4.            Note: COUNT(*) counts the number of rows  in each  group. Asterisk (*) refers to the entire row. You can use any column name instead to count the number of not null values in that column.



Group Function
Description
SUM
Returns the sum of the given column.
AVG
Returns the average of the given column.
MIN
Returns  the minimum value in the given column.
MAX
Returns  the maximum  value in the given column.
STDDEV
Returns  the standard deviation of the values in the given column.
VAR
Returns  the variance of the values in the given column..
COUNT
Returns the number of not null values in the given column. Returns number of rows, if  * is given instead of column.
Table 1: GROUP functions.

The following query displays the subject and date on which the most recent batch of that subject has started.

select  ccode,max(stdate)
from batches
group by ccode;

CCODE MAX(STDAT
----- ---------
asp   15-JAN-01
c     20-JAN-01
java  05-APR-01
ora   15-AUG-01
vbnet 12-JUL-01
xml   02-MAR-01

It is also possible to use group functions without using GROUP BY clause. In this case the entire selected rows will be taken as a single group and group function performs the operation on the entire set of selected rows.

The following query will display the highest course fee.

select max(fee) from courses;

 MAX(FEE)
---------
            5500

5.            GROUP BY and WHERE clause
If WHERE clause is used along with GROUP BY then WHERE clause is executed first.  The rows that are selected by WHERE clause will be grouped by GROUP BY clause. The following query shows the number of batches that started for each subject in the last 3 months.

select  ccode, count(*) count
from  batches
where   months_between(sysdate,stdate) <= 3
group by ccode;

CCODE     COUNT
----- ---------
ora           1
vbnet         1

In the above query, first rows that satisfy the given condition are retrieved. Then these rows are passed to GROUP BY clause for grouping.

6.            Note: GROUP BY clause should follow WHERE clause, if both are used together.


WHERE clause is used to select the rows that are to be used for grouping.

7.            Grouping on more than one column
It is possible to group rows on more than one column. In this case, the first columns is used to group the data, and then within that group records are again grouped based on second column, and so on.

The following query displays the  date on which the last batch  was taken by each faculty for each subject.

select  fcode, ccode, max(stdate)
from  batches
group by  fcode,ccode;

FCODE CCODE MAX(STDAT
----- ----- ---------
da    asp   15-JAN-01
da    vbnet 12-JUL-01
dh    xml   02-MAR-01
gk    ora   12-JAN-01
hs    c     20-JAN-01
hs    java  05-APR-01
jc    ora   15-AUG-01

HAVING Clause
What if  you have to select  only  a   few groups   based on the condition  that is using the result of one of the  group functions?.  For example, what if  you have to display  the batches that have more than 10 students. HAVING clause is used for this purpose. It is used to filter groups based on the given condition. The following example shows the courses that have got more than 1 batch.


select  ccode
from batches
group by  ccode
having count(*) > 1;

CCODE
-----
ora

In the above example, first rows of BATCHES table are grouped on CCODE. Then the query selects the groups that contain more than 1 row.  The later is achieved using HAVING clause, which is specially meant to select groups based on the given condition.


Here is a list of important points that are to be noted:

q  WHERE clause can be used to check for conditions based on values of columns and expressions related to individual rows. It cannot be used with conditions related to groups.
q  HAVING clause is specially designed to evaluate the conditions that are based on group functions such as SUM, and COUNT.
q  HAVING clause cannot be used for conditions that are not related to groups.

The following few example will illustrate the above-mentioned important points.

SQL> select  ccode
  2  from batches
  3  where count(*) > 1
  4  group by ccode;
where count(*) > 1
      *
ERROR at line 3:
ORA-00934: group function is not allowed here


The above example returns error because WHERE clause cannot be used with group functions.

SQL> select  ccode, count(*)
  2  from batches
  3  group by ccode
  4  having  enddate is null;
having  enddate is null
        *
ERROR at line 4:
ORA-00979: not a GROUP BY expression

In the above example as we tried to use normal condition using HAVING clause Oracle returned an error saying that the given expression is not acceptable.

The following query is valid as HAVING clause is used with a condition that is related to group by expression – CCODE.

SQL> select ccode, count(*)
  2  from batches
  3  group by ccode
  4  having  length(ccode) > 3;

CCODE  COUNT(*)
----- ---------
java          1
vbnet         1


8.            Using WHERE and HAVING together
It is possible to use both WHERE and HAVING clauses together. When these two clauses are used together, first WHERE clause is to be given then HAVING clause.

Oracle first selects rows based on WHERE clause. After that it groups the selected data. Then groups are selected based on HAVING clause.

The following query displays the courses that we started for more than once in the last six months.

select ccode, count(*)
from batches
where  months_between(sysdate,stdate) <= 6
group by ccode
having  count(*) > 1;

CCODE  COUNT(*)
----- ---------
ora           2


9.            ORDER BY with GROUP BY
ORDER BY clause can be used along with GROUP BY clause to order the results. When used ORDER BY clause must be the last clause to be used in the query.

select fcode, count(*)
from course_faculty
group by fcode
order by  count(*);

FCODE  COUNT(*)
----- ---------
dh            1
gk            1
kl            1
jc            1
sw            1
da            2
hs            2
jj            3


10.          Note: When ORDER BY is used with GROUP BY clause, ORDER BY should have either the group by column or group function.

11.         Order of execution
Here is the order Oracle uses to execute different clauses given in SELECT command.

q  Selects rows based on WHERE clause.
q  Groups rows based on GROUP BY clause.
q  Calculates results for each group.
q  Eliminates groups based on HAVING clause.
q  Then uses ORDER BY to order the results.

ROLLUP and CUBE
Oracle8i enhanced grouping by adding ROLLUP and CUBE. These two provide totals at multiple dimensions.

The following is the syntax of ROLLUP.

12.     SELECT ...
13.     GROUP BY ROLLUP (columns);

ROLLUP creates sub totals from most detailed to grand total.  It moves from right to left in the list of columns given in ROLLUP.

Before we use ROLLUP, let us see the result of simple group by clause using BATCHES table.

select ccode,fcode, count(*)
from  batches
group by ccode,fcode;


CCODE FCODE  COUNT(*)
----- ----- ---------
asp   da            1
c     hs              1
c     kl              1
java  hs            1
ora   gk            2
ora   kl              1
vbnet da            2
xml   dh            1

In case of normal GROUP BY clause we get count of each course and faculty. But what if you want to get total number of batches for each course and also the total number of batches.

The following ROLLUP creates the required aggregates.


select ccode,fcode, count(*)
from  batches
group by rollup(ccode,fcode);

CCODE FCODE  COUNT(*)
----- ----- ---------
asp   da            1
asp                  1
c     hs              1
c     kl              1
c                      2
java  hs             1
java                 1
ora   gk            2
ora   kl             1
ora                  3
vbnet da           2
vbnet               2
xml   dh             1
xml                   1
                       10

Apart from rows that come from GROUP BY, ROLLUP generates new rows that display the number of batches for each course and also total number of batches.

In case of total number of batches of a single course the CCODE is contains course code and FCODE contains null. In case of total number of batches both CCODE and FCODE contain null value.

The following query will display a meaningful value for these columns.

select nvl(ccode,'ALL courses'), nvl(fcode,'All faculty'), count(*)
from  batches
group by rollup(ccode,fcode);

NVL(CCODE,' NVL(FCODE,'  COUNT(*)
----------- ----------- ---------
asp         da                      1
asp         All faculty          1
c           hs                       1
c           kl                        1
c           All faculty            2
java        hs                      1
java        All faculty          1
ora         gk                      2
ora         kl                       1
ora         All faculty           3
vbnet       da                     2
vbnet       All faculty          2
xml         dh                      1
xml         All faculty            1
ALL courses All faculty    10

14.         GROUPING function
This will return a value of 1 if the column’s value is generated by ROLLUP. So we can use DECODE and GROUPING functions to get the same result as the above.

select decode(grouping(ccode),1,'ALL courses',ccode)  ccode,
       decode(grouping(fcode),1,'All faculty',fcode)  fcode,
       count(*) count
from  batches
group by rollup(ccode,fcode);

CCODE       FCODE           COUNT
----------- ----------- ---------
asp         da                    1
asp         All faculty         1
c           hs                       1
c           kl                       1
c           All faculty           2
java        hs                     1
java        All faculty          1
ora         gk                     2
ora         kl                      1
ora         All faculty          3
vbnet       da                     2
vbnet       All faculty         2
xml         dh                      1
xml         All faculty          1
ALL courses All faculty   10

CUBE
This generates the same subtotals as ROLLUP and plus a few more. This provides all possible subtotals.  For example in the previous output of ROLLUP we got the number of batches taken by each faculty for each course, number of batches for each course and total number of bathes.

CUBE apart from generating all these can also generate subtotals for each faculty. That means it provides number of batches taken by each faculty also.

select decode(grouping(ccode),1,'ALL courses',ccode)  ccode,
       decode(grouping(fcode),1,'All faculty',fcode)  fcode,
       count(*) count
from  batches
group by cube(ccode,fcode);

CCODE       FCODE           COUNT
----------- ----------- ---------
asp         da                          1
asp         All faculty              1
c           hs                           1
c           kl                           1
c           All faculty               2
java        hs                          1
java        All faculty              1
ora         gk                         2
ora         kl                          1
ora         All faculty              3
vbnet       da                        2
vbnet       All faculty             2
xml         dh                         1
xml         All faculty              1
ALL courses da                   3
ALL courses dh                   1
ALL courses gk                   2
ALL courses hs                   2
ALL courses kl                    2
ALL courses All faculty       10

CUBE adds five more rows to the output of ROLLUP. Each of these new rows is to display the total number of batches taken by each faculty for all courses.
 
Summary

GROUP BY clause is used to group the rows of the table based on the given columns. Group functions can be used to calculate aggregates like average of each group. HAVING clause is used to filter groups based on the result of group function. Oracle executes first WHERE then GROUP BY then HAVING and finally ORDER BY.

No comments:

Post a Comment

SQL Interview Questions and Answers

There is given sql interview questions and answers that has been asked in many companies. For PL/SQL interview questions, visit our next...