Social Icons

Interview Questions Part-1

What are the different  types of locks in Teradata?
Exclusive Lock:
Exclusive locks are only applied to databases or tables, never to rows.
They are the most restrictive type of lock; all other users are locked out.
Exclusive locks are used rarely, most often when structural changes are being made to the database.

Read Lock:
Read locks are used to ensure consistency during read operations.
Several users may hold concurrent read locks on the same data, during which no modification of the data is permitted.

Write Lock:Write locks enable users to modify data while locking out all other users except readers not concerned about data consistency (Access lock readers).
Until a Write lock is released, no new read or write locks are allowed.

Access Lock:Access locks can be specified by users who are not concerned about data consistency.
The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small single row changes. Access locks are sometimes called stale read locks, i.e. you may get stale data that hasnot been updated

Lock types are automatically applied based on the SQL command:
      SELECT -->applies a Read lock
      UPDATE --> applies a Write lock
      CREATE TABLE --> applies an Exclusive lock


What is multiple statement processing?
Multiple statement processing increases the performance when loading into large tables. All statements are sent to parser simultaneously and all statements are executed parallel.
In sql Assistance, once you place all queries and press F5. All will be executed parallel. Make sure each sql statement ends semicolon(;).


What is TENACITY and SLEEP ?
TENACITY to specify the number of hours FastLoad will try to establish a connection. The default is no tenacity. The statement must be placed before LOGON.

SLEEP to specify the number of minutes FastLoad waits before retrying a logon.The default is 6 minutes. The statement must be placed before LOGON.


What is soft  Referential Integrity and Batch  Referential Integrity?
Soft  Referential Integrity:
It provides a mechanism to allow user-specified Referential Integrity (RI) constraints that are not
enforced by the database.
Enables optimization techniques such as Join Elimination.

Batch  Referential Integrity:
Tests an entire insert, delete, or update batch operation for referential integrity. If insertion, deletion, or update of any row in the batch violates referential integrity, then parsing engine software rolls back the entire batch and returns an abort message.

What Teradata statistics ?

Statistics collection is essential for the optimal performance of the Teradata query optimizer. 
The query optimizer relies on statistics to help it determine the best way to access data. 
Statistics also help the optimizer ascertain how many rows exist in tables being queried and predict how many rows will qualify for given conditions. 
Lack of statistics, or out-dated statistics, might result in the optimizer choosing a less-than-optimal method for accessing data tables.

1: Once a collect stats is done on the table(on index or column) where is this information stored so that the optimizer can refer this?
Ans: Collected statistics are stored in DBC.TVFields or DBC.Indexes. However, you cannot query these two tables. 

2: How often collect stats has to be made for a table that is frequently updated?
Answer: You need to refresh stats when 5 to 10% of table's rows have changed. Collect stats could be pretty resource consuming for large tables. So it is always advisable to schedule the job at off peak period and normally after approximately 10% of data changes. 

3: Once a collect stats has been done on the table how can i be sure that the optimizer is considering this before execution ? i.e; until the next collect stats has been done will the optimizer refer this?

Ans: Yes, optimizer will use stats data for query execution plan if available. That's why stale stats is dangerous as that may mislead the optimizer. 

What is a HOT AMP?
When the workload is not distributed across all the AMPs, only a few AMPs end up overburdened with the work. This is a hot AMP condition.
This typically occurs when the volume of data you are dealing with is high and 
(a). You are trying to retrieve the data in a TERADATA table which is not well distributed across the AMPs on the system (bad Primary Index) 
(b). When you are trying to join on column with highly non unique values

(c). When you apply the DISTINCT operator on a column with highly non unique values

What are different  SPACES available in Teradata?
 There are mainly 3 types of spaces available.
 Perm Space 
 Temp Space
 Spool space

Perm Space :All databases have a defined upper limit of permanent space. Permanent space is used for storing the data rows of tables. Perm space is not pre-allocated. It        represents a maximum limit. 

Spool Space :All databases also have an upper limit of spool space. If there is no limit defined for a particular database or user, limits are inherited from parents. 
Theoretically, a user could use all unallocated pace in the system for their query. Spool space is temporary space used to hold intermediate query results or formatted answer sets to queries. Once the query is complete, the spool space is released. 
Example: You have a database with total disk space of 100GB. You have 10GB of user data and an additional 10GB of overhead. What is the maximum amount of spool space available for queries? 

Answer: 80GB. All of the remaining space in the system is available for spool

Temp Space :The third type of space is temporary space. Temp space is used for Global and Volatile temporary tables, and these results remain available to the user until the session is terminated. Tables created in temp space will survive a restart. 
How Does Teradata Store Rows?
  • Teradata uses hash partitioning and distribution to randomly and evenly distribute data across all AMPs.
  • The rows of every table are distributed among all AMPs - and ideally will be evenly distributed among all AMPs.
  • Each AMP is responsible for a subset of the rows of each table. 
  • Evenly distributed tables result in evenly distributed workloads. 

Which one will take care when an AMP goes down?

Down amp recovery journal will start when AMP goes down to restore the data for the down amp.

Fall back is like it has redundant data,if one amp goes down in the cluster also it wont affect your queries.the query will use data from fall back rows.the down amp wont be updated use the data from fall back.

Which one will take care when a NODE goes down?
In the event of node failure, all virtual processors can migrate to another available node in the clique. 

All nodes in the clique must have access to the same disk arrays

What is the use of EXPLIN plan?

The EXPLAIN facility allows you to preview how Teradata will execute a requested query. It returns a summary of the steps the Teradata RDBMS would perform to execute the request.
EXPLAIN also discloses the strategy and access method to be used, how many rows will be involved, and its cost in minutes and seconds. Use EXPLAIN to evaluate a query performance
and to develop an alternative processing strategy that may be more efficient. EXPLAIN works on any SQL request. The request is fully parsed and optimized, but not run. The complete plan is returned  to the user in readable English statements. 

EXPLAIN provides information about locking, sorting, row selection criteria, join strategy and conditions, access method, and parallel step processing.  EXPLAIN is useful for performance tuning, debugging, pre-validation of requests, and for technical training.


What COALESCE function?

The newer ANSI standard COALESCE can also convert a NULL to a zero. However, it can convert a NULL value to any data value as well.
The COALESCE searches a value list, ranging from one to many values, and returns the first Non-NULL value it finds.
At the same time, it returns a NULL if all values in the list are NULL.
To use the COALESCE, the SQL must pass the name of a column to the function. The data in the column is then compared for a NULL. Although one column name is all that is required, normally more than one column is   normally passed to it. Additionally, a literal value, which is never NULL, can be returned to provide a default value if   all of the previous column values are NULL

The syntax for the COALESCE follows:

SELECT        COALESCE ( [, ] )
              ,( COALESCE([,] ) )
When will you create PPI and when will you create secondary indexes?
Partitioned Primary Indexes are Created so as to divide the table onto partitions based on Range or Values as Required. This is effective for Larger Tables partitioned on the Date and integer columns. There is no extra Overhead on the System (no Spl Tables Created ect )

Secondary Indexes are created on the table for an alternate way to access data. This is the second fastest method to retrieve data from a table next to the primary index. Sub tables are created. 

PPI and secondary indexes do not perform full table scans but they access only a defined st of data in the AMP's.

When you chose primary index and when will you choose secondary index?
Primary index will be chosen at the time of table creation. This will help us in data distribution, data retrieval and join operations.
Secondary indexes can be created and dropped at any time. They are used as an alternate path to access data other than the primary index.

Related Posts:
Interview Questions Part-2                                                                     Interview Questions Part-3

No comments:

Post a Comment