## Pages

### SAMPLE Function

The teradata SAMPLE function is used to generate samples of data from a table or view.
It can be done in two ways.
• SAMPLE n - Will results a sample of n rows.
• If the number n is greater than the number of rows in the table, the sample will consist of the number of rows in the table.
SAMPLE n - where n is a decimal value less than 1.00 and greater than .00

The SAMPLE function allows sampling of data based on:
• A percentage of a table.
• An actual number of rows
Example 1:
SELECT Employee_number
FROM Employee
SAMPLE 10;

Result
Employee_number
---------------
1003
1023
1004
1015
801
1065
1022
1002
1018
1005

Example 2:
SELECT Employee_number
FROM Employee
SAMPLE  .25
ORDER BY 1;

In the above example, 25% of the rows of the employee table are to be returned. The employee table has 26 rows.
26 * .25 = 6.50 = 7 rows in the sample

Result:

Employee_number
---------------
1003
1023
1001
1054
1004
1015
1012

Note:
• 7 rows out of 26 are returned.
• Fractional results greater than .4999 generate an added row.
• 25% of 26 = 6.5 which rounds to 7.

Using SAMPLEID

SAMPLEID is an extension for Sample to get multiple set in a single query.
The SAMPLEID may be selected, used for ordering, or used as a column in a new table.

Example 1:
Lets consider department table has 9 rows and get three samples from the department table, one with 25% of the rows, another with 25% and a third with 50%.

SELECT department_number
,sampleid
FROM department
SAMPLE .25, .25, .50
ORDER BY sampleid;

Result:
department_number     SampleId
-----------------         -----------
301                     1
403                     1
402                     2
201                     2
100                     3
501                     3
302                     3
401                     3
600                     3

Note that all 9 of the 9 rows of the department table are returned.
This is due to the individual calculations.
Consider the following calculations

9 *.25 = 2.25 = 2
9 *.25 = 2.25 = 2
9 *.50 = 4.50 = 5
-----
9

Example 2
Get three samples from the department table, one with 27% of the rows, another with 35% and a third with 2%.

SELECT department_number
,SAMPLEID
FROM department
SAMPLE .27, .35, .02
ORDER BY SAMPLEID;

Result
department_number     SampleId
-----------------                -----------
402                     1
403                     1
100                     2
302                     2
401                     2

Note:
• The first two samples are able to return rows.
• The last sample is too small to return a row.

Example 3
Get three samples from the department table, one with 3 rows, another with 5 and a third with 8.

SELECT department_number
,sampleid
FROM department
SAMPLE  3, 5, 8
ORDER BY sampleid;

*** Query completed. 9 rows found. 2 columns returned.
*** Warning: 7473 Requested sample is larger than table rows.

Result

department_number     SampleId
-----------------               -----------
501                      1
402                      1
403                      1
100                      2
302                      2
301                      2
401                      2
201                      2
600                      3

Note:
• Because the rows are not repeated to different sample sets, the supply of rows is exhausted before the third set can be completed.
• This results in a warning that there were not enough rows to populate all samples as requested.
• This warning is seen in the BTEQ environment, but not in ODBC.
• The third sample gets the only remaining row.