Social Icons

SAMPLE Function

TeradataWiki-teradata SAMPLE Function
The teradata SAMPLE function is used to generate samples of data from a table or view.
It can be done in two ways.
  • SAMPLE n - Will results a sample of n rows.
  • If the number n is greater than the number of rows in the table, the sample will consist of the number of rows in the table.
SAMPLE n - where n is a decimal value less than 1.00 and greater than .00

The SAMPLE function allows sampling of data based on:
  • A percentage of a table.
  • An actual number of rows
Example 1:
SELECT Employee_number
FROM Employee
SAMPLE 10;

Result
Employee_number
---------------
1003
1023
1004
1015
 801
1065
1022
1002
1018
1005


Example 2:
SELECT Employee_number
FROM Employee
SAMPLE  .25
ORDER BY 1;

In the above example, 25% of the rows of the employee table are to be returned. The employee table has 26 rows.
26 * .25 = 6.50 = 7 rows in the sample

Result:

Employee_number
---------------
1003
1023
1001
1054
1004
1015
1012

Note:
  • 7 rows out of 26 are returned.
  • Fractional results greater than .4999 generate an added row.
  • 25% of 26 = 6.5 which rounds to 7.

Using SAMPLEID

SAMPLEID is an extension for Sample to get multiple set in a single query.
The SAMPLEID may be selected, used for ordering, or used as a column in a new table.

Example 1:
Lets consider department table has 9 rows and get three samples from the department table, one with 25% of the rows, another with 25% and a third with 50%.

SELECT department_number
    ,sampleid
FROM department
SAMPLE .25, .25, .50
ORDER BY sampleid;

Result:
department_number     SampleId
       -----------------         -----------
              301                     1
              403                     1
              402                     2
              201                     2
              100                     3
              501                     3
              302                     3
              401                     3
              600                     3

Note that all 9 of the 9 rows of the department table are returned.
This is due to the individual calculations.
Consider the following calculations

9 *.25 = 2.25 = 2
9 *.25 = 2.25 = 2
9 *.50 = 4.50 = 5
                     -----
                       9

Example 2
Get three samples from the department table, one with 27% of the rows, another with 35% and a third with 2%.

SELECT department_number
     ,SAMPLEID
FROM department
SAMPLE .27, .35, .02
ORDER BY SAMPLEID;

Result
department_number     SampleId
-----------------                -----------
              402                     1
              403                     1
              100                     2
              302                     2
              401                     2

Note:
  • The first two samples are able to return rows.
  • The last sample is too small to return a row.

Example 3
Get three samples from the department table, one with 3 rows, another with 5 and a third with 8.

SELECT department_number
    ,sampleid
FROM department
SAMPLE  3, 5, 8
ORDER BY sampleid;

 *** Query completed. 9 rows found. 2 columns returned.
 *** Warning: 7473 Requested sample is larger than table rows.

Result

department_number     SampleId
-----------------               -----------
              501                      1
              402                      1
              403                      1
              100                      2
              302                      2
              301                      2
              401                      2
              201                      2
              600                      3

Note:
  • Because the rows are not repeated to different sample sets, the supply of rows is exhausted before the third set can be completed.
  • This results in a warning that there were not enough rows to populate all samples as requested.
  • This warning is seen in the BTEQ environment, but not in ODBC.
  • The third sample gets the only remaining row.

No comments:

Post a Comment