MSSQL DBA Interview Questions

 MSSQL DBA Interview Questions 


SECTION 1 — SQL SERVER ARCHITECTURE & INTERNALS (30 Questions)

  1. Explain SQL Server’s architecture: SQLOS, storage engine, relational engine, query processor.
  2. What happens internally when SQL Server receives a query?
  3. Describe how worker threads, schedulers, and tasks work together.
  4. What is the role of SOS_SCHEDULER and non-preemptive scheduling in SQL Server?
  5. Explain buffer pool internals and lazy writer behavior.
  6. What are memory clerks, and how do they help in memory troubleshooting?
  7. Explain the difference between latch and lock.
  8. What are spinlocks? When do they become bottlenecks?
  9. What is the role of Resource Monitor in SQL Server?
  10. Explain NUMA architecture optimization in SQL Server.
  11. What is a checkpoint and what triggers it?
  12. Explain how the transaction log ensures durability.
  13. What is a log block and log sequence number (LSN)?
  14. Where is the plan cache stored and how is it cleaned up?
  15. Explain Query Optimizer phases.
  16. What is the “probe residual” in execution plans?
  17. How do memory grants work internally?
  18. What is Columnstore batch mode execution?
  19. How does SQL Server handle hash spills?
  20. What does the cardinality estimator do?
  21. What changed in the new CE vs legacy CE?
  22. What is parameter sniffing and why does it occur?
  23. Explain auto soft-NUMA and when it helps.
  24. What is DBCC MEMORYSTATUS used for?
  25. Explain the SQL Server page structure (8 KB pages, IAM, GAM, PFS).
  26. What is the purpose of the differential bitmap?
  27. What are ghost records and how are they cleaned?
  28. What is the Virtual Address Space (VAS) reservation?
  29. Explain timetable for lock escalation.
  30. What is the difference between synchronous & asynchronous I/O in SQL Server?

🟦 SECTION 2 — STORAGE ENGINE, FILES, LOG, RECOVERY MODELS (25 Questions)

  1. What causes VLF fragmentation? How do you fix it?
  2. How does SQL Server manage log reuse (log_reuse_wait_desc)?
  3. Why does the log grow even if the database is in SIMPLE recovery?
  4. What is the difference between physical and logical log?
  5. How is proportional fill algorithm used across data files?
  6. What are the best practices for designing filegroups?
  7. How does instant file initialization work?
  8. What is Accelerated Database Recovery (ADR)?
  9. What is the crash recovery cycle?
  10. How do you diagnose a database stuck in "In Recovery"?
  11. What conditions cause a file to go “offline”?
  12. What is differential backup base mismatch?
  13. How do torn page protections work?
  14. What is the difference between row-overflow and LOB storage?
  15. How do FILESTREAM and FileTable differ?
  16. Explain TempDB’s role in version store.
  17. How much log is generated for different operations (minimal logging)?
  18. When can minimal logging occur under BULK_LOGGED mode?
  19. What is a page restore?
  20. What happens during transaction rollback in SQL Server?
  21. What causes the log to be in a “recovery pending” state?
  22. How do you detect and fix corrupted page header?
  23. What is log truncation and how does it differ from shrinking?
  24. How are extents allocated?
  25. What is the IAM chain?

🟦 SECTION 3 — INDEXING, STATISTICS & COLUMNSTORE (35 Questions)

  1. What is the difference between clustered and nonclustered indexes physically?
  2. When should a heap be used instead of a clustered index?
  3. What are forwarded records and when do they occur?
  4. Explain the impact of page split and how to prevent it.
  5. What is fill factor and how do you choose the value?
  6. What is index fragmentation? What fragmentation really matters?
  7. Why may SQL Server ignore an index?
  8. How do statistics impact execution plans?
  9. What causes stale statistics?
  10. How does auto-update statistics work for large tables?
  11. When should AUTO_CREATE_STATS / AUTO_UPDATE_STATS be OFF?
  12. What is synchronous statistics update?
  13. How do filtered indexes work?
  14. How do filtered statistics differ from filtered indexes?
  15. What is the importance of key column order?
  16. What is the bookmark lookup?
  17. How does covering index reduce lookups?
  18. What causes index spool operations?
  19. What are missing index recommendations? Why are they risky?
  20. Explain hash indexes in memory-optimized tables.
  21. What is the difference between columnstore rowgroup & delta rowgroup?
  22. What is the significance of rowgroup trim threshold?
  23. Why do columnstore indexes switch to row mode execution?
  24. How does columnstore compression work?
  25. What causes dictionary corruption in columnstore?
  26. How do segment elimination and batch mode work?
  27. Explain online index rebuild internals.
  28. When is REORGANIZE better than REBUILD?
  29. Why can rebuilding indexes hurt performance sometimes?
  30. How do you detect unused indexes?
  31. How do you detect duplicate indexes?
  32. When should you use computed column indexes?

🟦 SECTION 4 — QUERY OPTIMIZATION & PERFORMANCE TUNING (35 Questions)

  1. What are the most important DMVs for performance troubleshooting?
  2. Explain SOS_SCHEDULER_YIELD waits.
  3. What causes THREADPOOL exhaustion?
  4. What causes ASYNC_NETWORK_IO waits?
  5. What is the difference between CPU-bound vs IO-bound workload?
  6. How to detect memory spills using execution plans?
  7. What is the significance of query_hash and query_plan_hash?
  8. How do you detect plans causing parameter sniffing?
  9. What is DBCC FREEPROCCACHE? When is it dangerous?
  10. How do you find top resource-consuming queries?
  11. How does Query Store capture regressions?
  12. What are key Query Store views for performance analysis?
  13. Compare forced parameterization vs optimize for ad hoc workloads.
  14. What is Cardinality Estimator feedback?
  15. What is a plan regression?
  16. How do multi-statement TVFs affect performance?
  17. How do scalar UDFs affect performance pre-2019 and post-2019?
  18. What is adaptive joins?
  19. What is memory grant feedback?
  20. How does row mode vs batch mode differ?
  21. How do you identify missing indexes with actual execution plans?
  22. Why does SQL Server choose nested loop join even when table is huge?
  23. What causes HASH MATCH (spills)?
  24. How does table variable behavior differ from temp tables?
  25. What is the impact of implicit conversions?
  26. Why do data type mismatches break index seeks?
  27. How do OPTION(RECOMPILE) and OPTION(OPTIMIZE FOR …) differ?
  28. What are XEvent sessions useful for?
  29. How to detect memory pressure using Ring Buffers?
  30. Explain physical join types and when each is used.
  31. What is feedback-driven optimization in SQL 2022?
  32. How do window functions affect performance?
  33. How do TOP and OFFSET-FETCH influence plans?

🟦 SECTION 5 — LOCKING, BLOCKING, DEADLOCKS & ISOLATION (25 Questions)

  1. Compare READ COMMITTED vs RCSI vs SNAPSHOT.
  2. What is TempDB version store?
  3. How do you identify the longest blocking chain?
  4. What is lock escalation and how to prevent it?
  5. What is a schema modification lock?
  6. What is a latch wait? How to diagnose?
  7. How do you fix PAGELATCH_UP contention?
  8. What is the difference between XACT_STATE and @@TRANCOUNT?
  9. How do you capture deadlocks via Extended Events?
  10. Why do deadlocks increase after adding indexes?
  11. How do you fix deadlocks caused by range locks?
  12. What causes SERIALIZABLE isolation deadlocks?
  13. What is an update conflict in snapshot isolation?
  14. What is the difference between LCK_M_S, LCK_M_X, LCK_M_IX?
  15. What is trace flag 1222 used for?
  16. What causes key-range locks?
  17. What is lock compatibility matrix?
  18. What is the impact of NOLOCK?
  19. Why do NOLOCK reads still block on schema locks?
  20. What is a retention lock?
  21. What causes HADR_SYNC_COMMIT waits?
  22. How does long open transactions impact log truncation?
  23. Why tempdb metadata can lock under heavy load?
  24. Detecting latch vs lock contention differences.

🟦 SECTION 6 — BACKUP / RESTORE / PITR / HIGH AVAILABILITY (30 Questions)

  1. What breaks the backup chain?
  2. When do differential backups stop being useful?
  3. How do striped backups improve performance?
  4. Explain tail-log backup and when it's needed.
  5. Restore sequence for full + diff + log backups.
  6. What is a piecemeal restore?
  7. What is TDE and why is certificate backup mandatory?
  8. How to restore a Page Restore scenario?
  9. How do you recover when log backups are missing?
  10. How to restore msdb without losing SQL Agent jobs?
  11. Explain Always On AG architecture.
  12. When does automatic seeding fail?
  13. What is synchronous commit vs synchronous-commit with automatic failover?
  14. What is primary replica vs secondary vs read-intent replica?
  15. Why does AG cause longer transactions under sync commit?
  16. What causes redo queue to grow?
  17. What are distributed AGs and when needed?
  18. How to diagnose AG failover issues?
  19. Difference between AG vs Log Shipping.
  20. Difference between AG vs Database Mirroring.
  21. What is quorum and node weighting?
  22. What causes AG “Not Synchronizing” errors?
  23. Explain automatic failover vs manual failover scenarios.
  24. How do you test AG failover without downtime?
  25. Log shipping: what causes secondary to lag?
  26. How to perform DR drill for 10TB database?

🟦 SECTION 7 — CORRUPTION, DBCC CHECKDB, REPAIR (25 Questions)

  1. Types of corruption SQL Server can detect.
  2. DBCC CHECKDB phases.
  3. How does DBCC CHECKDB detect physical corruption?
  4. What causes page checksum errors?
  5. When does torn page occur?
  6. What do you do when CHECKDB shows “repair_allow_data_loss”?
  7. What is DBCC CHECKFILEGROUP?
  8. How to recover a corrupt nonclustered index?
  9. How to interpret error 824 vs 825?
  10. What causes 824 errors?
  11. How do you isolate corruption to specific table/index?
  12. How do you restore only one table after corruption?
  13. How do you detect corruption early via alerts?
  14. How much overhead does CHECKDB introduce?
  15. Why does corruption often appear after power failure?
  16. What are “suspect pages”?
  17. How to view suspect pages from msdb?
  18. What is torn page detection?
  19. Why CHECKDB runs slower on large CLR or LOB objects?
  20. How do you detect corruption caused by storage subsystem?

🟦 SECTION 8 — TEMPDB, MEMORY, STORAGE, I/O (20 Questions)

  1. How to design optimal tempdb layout?
  2. What causes tempdb PAGELATCH contention?
  3. What is allocation bottleneck?
  4. How do you diagnose disk bottlenecks using DMVs?
  5. How to detect excessive checkpoint I/O storms?
  6. How to diagnose WRITELOG waits?
  7. What is indirect checkpoint?
  8. What is Buffer Pool Extension?
  9. How do you configure SQL Server for SSD/NVMe?
  10. How do you find which queries spill to tempdb?

🟦 SECTION 9 — SQL AGENT, AUTOMATION, MONITORING (15 Questions)

  1. What causes SQL Agent jobs to hang?
  2. How do you secure SQL Agent proxies?
  3. How to detect failed SQL Agent subsystems?
  4. How does SQL Agent behave in AG failover scenarios?
  5. Why do SQL jobs disappear after reinstalling SSMS?
  6. How to capture job execution time metrics?
  7. How do you audit job history automatically?
  8. How to implement retry logic for failed jobs?
  9. How do you handle SSIS package failures at scale?
  10. How do you monitor SQL Server using Extended Events?

🟦 SECTION 10 — SQL SERVER ON LINUX, DOCKER, KUBERNETES (15 Questions)

  1. Differences between SQL on Linux vs Windows.
  2. How does Active Directory authentication work on Linux?
  3. How do you configure persistent storage for SQL containers?
  4. What is mssql-conf?
  5. What are the limitations of SQL on Linux?
  6. How do you configure HADR on Linux-only clusters?
  7. How do you monitor SQL on containers?
  8. Why do permissions often fail on SQL Linux file paths?
  9. How do you diagnose I/O latency on Linux SQL Server?
  10. How do you secure SQL Server in Kubernetes?

🟦 SECTION 11 — CLOUD SQL SERVER (AZURE SQL / AWS RDS / MI) — 20 Questions

  1. Differences between SQL Server on VM vs Azure SQL DB.
  2. Why does Azure SQL throttle CPU even if low load?
  3. What is DTU vs vCore?
  4. How do you troubleshoot slow RDS failover?
  5. How do you secure SQL in Azure with Key Vault?
  6. How do long running transactions behave in cloud SQL?
  7. What causes log IO throttling in RDS SQL?
  8. What is the difference between Geo-replication vs Failover Groups?
  9. What limitations Azure SQL DB has vs SQL Server on VM?
  10. How do you migrate on-prem SQL to Azure SQL MI with near zero downtime?

🟦 SECTION 12 — REAL-WORLD SCENARIOS ROUND (MUST-KNOW) (70 Questions)

  1. SQL Server CPU 95% during peak — step-by-step diagnosis?
  2. Query slow only during business hours — where do you start?
  3. A log file is growing continuously — how do you stop it safely?
  4. Always On secondary lagging by hours — diagnosis steps?
  5. AG failover occurred but clients cannot reconnect — why?
  6. SQL Server taking 20 minutes to restart — what's happening internally?
  7. Index rebuild job increased blocking — what went wrong?
  8. Query uses index scan instead of seek — find the cause?
  9. Large ETL causing tempdb to explode — how to fix?
  10. SQL Server reports deadlocks every minute — root cause analysis?
  11. A table has millions of forwarded records — approach to fix?
  12. Statistics update caused plan regression — what's next?
  13. A query’s estimated rows = 1 but actual = millions — cause?
  14. Heavy waits on WRITELOG — how to tune?
  15. Always On automatic seeding keeps failing — diagnosis?
  16. AG replica stuck in “Not Synchronizing” — how to fix?
  17. Blocking chain has 120 sessions — approach?
  18. Tempdb 100% full — what immediate actions do you take?
  19. SQL Agent jobs stopped executing — where to check?
  20. Columnstore index performance degraded — why?
  21. High memory utilization but low workload — cause?
  22. SQL Server using only 50% CPU even under load — investigate?
  23. Query performs differently on primary vs secondary — why?
  24. Slow cross-database queries — approach?
  25. Database in “Recovery Pending” — how do you fix?
  26. Sudden plan regression after Cumulative Update — what to check?
  27. LOG SHIPPING COPY job failing — next steps?
  28. DBCC CHECKDB takes 6 hours — optimize?
  29. Log chain broken due to accidental full backup — handle?
  30. AG not failing over during outage — why?
  31. Frequent threadpool waits — diagnosis?
  32. Blocking caused by ghost cleanup — why?
  33. Backup performance degraded — how to fix?
  34. Deadlocks increased after schema changes — why?
  35. High PFS latch waits — what to do?
  36. Slow deletes due to page splits — fix?
  37. Slow merge replication — diagnosis?
  38. SQL Server memory dump generated — next steps?
  39. Columnstore delta stores huge — cause?
  40. Slow metadata queries on sys tables — why?
  41. SQL Server freezing during autoshrink — impact?
  42. Stale statistics on partitioned tables — fix?
  43. Foreign key cascading causing slow writes — how to redesign?
  44. Server fails to register Kerberos SPN — fix?
  45. SSIS package memory leak — troubleshooting?
  46. SQL Server login storms from app pool recycle — detect?
  47. Server shows “non-yielding scheduler” — analysis?
  48. Long waits on HADR_SYNC_COMMIT — root cause?
  49. Very slow linked server queries — fix?
  50. TempDB frequent autogrowth events — reason?
  51. Rebuild index blocking RCSI readers — why?
  52. Stack dump due to spinlock — diagnosis?
  53. Bulk insert extremely slow — analyze bottleneck?
  54. Weekly patching causes slow restart — root cause analysis?
  55. High latch waits on allocation maps — fix?
  56. Rewriting query fixes plan regression — why?
  57. SQL Server high memory but low workload — cause?
  58. High latency on data file reads — detect and fix?
  59. Wrap-around of identity column — safe repair?
  60. Credential corruption — resolve?
  61. SQL Agent crashes after Windows patch — root cause?
  62. Always On listener not registering DNS — troubleshoot?
  63. Temp table contention under parallel workloads — solve?
  64. Application using NOLOCK causing phantom reads — explain?
  65. Auto stats update kills plan stability — fix?
  66. Non-yielding resource monitor — diagnosis?
  67. Slow execution after cardinality estimator change — fix?
  68. Corrupted nonclustered index — recover?
  69. Identity jumped by 1000 — reason?
  70. Stored procedure slow first time, fast second time — why?

Comments