Total Page Preview:   000000001701

Data warehouse Interview Questions Answers for freshers and experienced

1. What is a Fact Table?
  •  The primary table in a dimensional model where the numerical performance measurements (or facts) of the business are stored so they can be summarized to provide information about the history of the operation of an organization.
  • We use the term fact to represent a business measure. The level of granularity defines the grain of the fact table.
2. What is a Dimension Table?
  • Dimension tables are highly denormalized tables that contain the textual descriptions of the business and facts in their fact table.
  • Since it is not uncommon for a dimension table to have 50 to 100 attributes and dimension tables tend to be relatively shallow in terms of the number of rows, they are also called a wide table.
  • A dimension table has to have a surrogate key as its primary key and has to have a business/alternate key to link between the OLTP and OLAP.
3. What are the types of Measures?
  • Additive: measures that can be added across all dimensions (cost, sales).
  • Semi-Additive: measures that can be added across few dimensions and not with others.
  • Non-Additive: measures that cannot be added across all dimensions (stock rates).
4 What is a Star Schema?
  • It is a data warehouse design where all the dimensions tables in the warehouse are directly connected to the fact table.
  • The number of foreign keys in the fact table is equal to the number of dimensions. ? It is a simple design and hence faster query.
5. What is a Snowflake Schema?
  • It is a data warehouse design where at least one or more multiple dimensions are further normalized. ? Number of dimensions > number of fact table foreign keys
  • Normalization reduces redundancy so storage wise it is better but querying can be affected due to the excessive joins that need to be performed.
6. What is granularity?
  • The lowest level of information that is stored in the fact table. ? Usually determined by the time dimension table.
  • The best granularity level would be per transaction but it would require a lot of memory.
7. What is a Surrogate Key?
  • It is a system generated key that is an identity column with the initial value and incremental value and ensures the uniqueness of the data in the dimension table.
  • Every dimension table must have a surrogate key to identify each record!!!
8. What are some advantages of using the Surrogate Key in a Data Warehouse?
1. Using a SK, you can separate the Data Warehouse and the OLTP: to integrate data coming from heterogeneous sources, we need to differentiate between similar business keys from the OLTP. The
keys in OLTP are the alternate key (business key).
2. Performance: The fact table will have a composite key. If surrogate keys are used, then in the fact table, we will have integers for its foreign keys.
This requires less storage than VARCHAR.
The queries will run faster when you join on integers rather than VARCHAR.
The partitioning done on SK will be faster as these are in sequence.
3. Historical Preservation: A data warehouse acts as a repository of historical data so there will be various versions of the same record and in order to differentiate between them, we need a SK then we can keep the history of data.
4. Special Situations (Late Arriving Dimension): Fact table has a record that doesn’t have a match yet in the dimension table. Surrogate key usage enables the use of such a ‘not found’ record as a SK is not dependent on the
ETL process.
9. What is the datatype difference between a fact and dimension tables?
1. Fact Tables
  1. They hold numeric data.
  2. They contain measures.
  3. They are deep.
2. Dimensional Tables
  1. They hold textual data.
  2. They contain attributes of their fact tables.
  3. They are wide.

10. How many types of dimension tables available in data warehouse?

11. What is your strategy for the incremental load?
Answer: The combination of different techniques for the incremental load in my previous projects; time stamps, CDC (Change Data Capture), MERGE statement and CHECKSUM() in TSQL, LEFT OUTER JOIN, TRIGGER, the Lookup Transformation in SSIS.
12. What is CDC? advantages and disadvantages.
CDC (Change Data Capture) is a method to capture data changes, such as INSERT, UPDATE and DELETE, happening in a source table by reading transaction log files. Using CDC in the process of an incremental  load, you are going to be able to store the changes in a SQL table, enabling us to apply the changes to a target table incrementally.
In data warehousing, CDC is used for propagating changes in the source system into your data warehouse, updating dimensions in a data mart, propagating standing data changes into your data warehouse and such.
The advantages of CDC are:
  •  It is almost real time ETL.
  •  It can handle small volume of data.
  •  It can be more efficient than replication.
  •  It can be auditable.
  •  It can be used to configurable clean up.
Disadvantages of CDC are:
  •  Lots of change tables and functions
  •  Bad for big changes e.g. truncate & reload Optimization of CDC:
  •  Stop the capture job during load
  •  When applying changes to target, it is ideal to use merge.
13. What is the difference between a connection and session ?
Connection: It is the number of instance connected to the database. An instance is modelized soon as the application is open again.
Session: A session run queries.In one connection, it allowed multiple sessions for one connection.
14. What are all different types of collation sensitivity?
Answer: Following are different types of collation sensitivity -
  • Case Sensitivity - A and a and B and b.
  • Accent Sensitivity.
  • Kana Sensitivity - Japanese Kana characters.
  • Width Sensitivity - Single byte character and double byte character.
15. What is CLAUSE?
Answer:  SQL clause is defined to limit the result set by providing condition to the query. This usually filters some rows from the whole set of records.
Example - Query that has WHERE condition Query that has HAVING condition.
16. What is Union, minus and Interact commands?
  • UNION operator is used to combine the results of two tables, and it eliminates duplicate rows from the tables.
  • MINUS operator is used to return rows from the first query but not from the second query. Matching records of first and second query and other rows from the first query will be displayed as a result set.
  • INTERSECT operator is used to return rows returned by both the queries.
17. How to select unique records from a table?
Answer:  Select unique records from a table by using DISTINCT keyword.
Select DISTINCT StudentID, StudentName from Student.
18. How to remove duplicate rows from table?
Step 1: Selecting Duplicate rows from table:
Select rollno FROM Student WHERE ROWID <>
(Select max (rowid) from Student b where rollno=b.rollno);
Step 2: Delete duplicate rows:
Delete FROM Student WHERE ROWID <>
(Select max (rowid) from Student b where rollno=b.rollno);
19. How to calculate number of rows in table without using count function?
Answer: Select table_name, num_rows from user_tables where table_name=’Employee’;
Tip: User needs to use the system tables for the same. So using user_tables user will get the number of rows in the table
 20. What is the difference between a connection and session ?
Connection: It is the number of instance connected to the database. An instance is modelized soon as the application is open again.
Session: A session run queries.In one connection, it allowed multiple sessions for one connection.

Thank You

About Author

Brijesh Kumar

Database Developer

I have more then 6 years Experience in Microsoft Technologies - SQL Server Database, ETL Azure Cloud - Azure SQL Database, CosmosDB, Azure Data Factory, PowerBI, Web Job, Azure Function, Azure Storage, Web Apps, Powershall and Database Migration On-Premise to Azure Cloud.
LinkedIn :





Facebook User: