SQL SELECT with COUNT: Syntax, Examples, and Guide
Source: DigitalOcean
By Safa Mulani and Vinayak Baranwal

A SELECT query with COUNT(...) returns how many rows or values satisfy the query. The three forms are COUNT(*) for total rows, COUNT(expression) for non-NULL values of an expression, and COUNT(DISTINCT expression) for unique non-NULL values. COUNT pairs with FROM, WHERE, GROUP BY, and HAVING to answer questions like “how many completed orders does each customer have.” This tutorial covers syntax, NULL handling, performance trade-offs, conditional counting with CASE WHEN, joins, subqueries, and dialect-specific behavior on MySQL 8.x, PostgreSQL 15+, SQL Server 2022, and Oracle 19c.

Every example runs against a shared two-table schema you can copy into your own database. Run this DDL and DML once. NULL status and amount values demonstrate NULL handling, shared cities support GROUP BY examples, and customer Hank has no orders to demonstrate LEFT JOIN behavior. The seed script uses plain string date literals so it runs unchanged on MySQL 8.x, PostgreSQL 15+, SQL Server 2022, and Oracle 19c. See also SQL JOINs and SUM, AVG, and COUNT.

Use COUNT when you need to know how many rows exist, how many non-NULL values a column has, or how many unique values appear. It is the most common aggregate function in SQL and shows up in dashboards, validation queries, pagination logic, and reporting jobs across every relational database. It runs after WHERE filters, pairs cleanly with window frames, and still trips teams when someone mixes the three forms without checking NULL rules first.

COUNT has three forms. Pick the form that matches the question you are asking. A rule that saves debugging time later: filters belong in WHERE when applied before aggregation, or in HAVING when applied after GROUP BY.

The short answer: COUNT(*) counts rows regardless of NULL, COUNT(column) skips rows where that column is NULL, and COUNT(DISTINCT column) skips both NULLs and duplicates.
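As a sketch, here is one seed script consistent with every count the tutorial cites (8 customers, two per city, Bob’s NULL status, Hank with no orders, 13 orders, a NULL amount on order 102, order 103 at exactly 50, and Carol’s three orders 105, 106, and 113). The table and column names follow the queries discussed below; the specific names, dates, and amounts are assumptions:

```sql
-- Sample schema: two tables, portable string date literals.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50),
    city        VARCHAR(50),
    status      VARCHAR(20)          -- NULL for Bob, to demonstrate COUNT(column)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,                 -- references customers.customer_id
    order_date  DATE,
    amount      DECIMAL(10, 2),      -- NULL on order 102
    status      VARCHAR(20)          -- 'completed', 'pending', or 'cancelled'
);

INSERT INTO customers VALUES (1, 'Alice', 'Austin',  'active');
INSERT INTO customers VALUES (2, 'Bob',   'Austin',  NULL);
INSERT INTO customers VALUES (3, 'Carol', 'Boston',  'active');
INSERT INTO customers VALUES (4, 'Dave',  'Boston',  'active');
INSERT INTO customers VALUES (5, 'Eve',   'Chicago', 'active');
INSERT INTO customers VALUES (6, 'Frank', 'Chicago', 'active');
INSERT INTO customers VALUES (7, 'Grace', 'Denver',  'active');
INSERT INTO customers VALUES (8, 'Hank',  'Denver',  'active');  -- no orders

INSERT INTO orders VALUES (101, 1, '2024-10-10', 120.00, 'completed');
INSERT INTO orders VALUES (102, 2, '2024-09-20', NULL,   'cancelled');  -- NULL amount
INSERT INTO orders VALUES (103, 4, '2024-10-12',  50.00, 'completed');  -- exactly 50
INSERT INTO orders VALUES (104, 2, '2024-09-25',  80.00, 'completed');
INSERT INTO orders VALUES (105, 3, '2024-10-01', 200.00, 'completed');  -- Carol
INSERT INTO orders VALUES (106, 3, '2024-10-02',  95.00, 'completed');  -- Carol again, same status
INSERT INTO orders VALUES (107, 1, '2024-10-08',  35.00, 'pending');
INSERT INTO orders VALUES (108, 5, '2024-09-30',  60.00, 'completed');
INSERT INTO orders VALUES (109, 5, '2024-10-05',  45.00, 'cancelled');
INSERT INTO orders VALUES (110, 6, '2024-10-11',  75.00, 'completed');
INSERT INTO orders VALUES (111, 7, '2024-10-09',  30.00, 'pending');
INSERT INTO orders VALUES (112, 7, '2024-09-15',  55.00, 'cancelled');
INSERT INTO orders VALUES (113, 3, '2024-10-13',  25.00, 'pending');    -- Carol's third order
```

This gives 7 completed orders, 3 pending, 3 cancelled, and exactly one duplicate (customer_id, status) pair: Carol’s two completed orders.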
If one screen shows 8 customers and another shows 7, compare whether each query used COUNT(*) versus COUNT(status) before you chase ghosts in the warehouse. Run the query against the sample schema: customer_rows is 8 because there are 8 rows in customers, and non_null_status is 7 because Bob’s status is NULL, so COUNT(status) skips that row.

Legacy Oracle and DB2 codebases often use COUNT(1); the literal is never NULL, so it counts every row just like COUNT(*). Note: COUNT(*) and COUNT(1) produce the same query plan in MySQL 8.x, PostgreSQL 15+, SQL Server 2019+, and Oracle 19c+. Both express row cardinality without inspecting column payloads. The historical belief that COUNT(1) is faster traces back to an Oracle 7 optimizer quirk fixed decades ago. Use COUNT(*) in new code; it is the SQL standard form.

A common follow-up question: how do you count rows where a column is NULL? COUNT cannot do this directly, but two patterns work: a CASE expression inside COUNT, or subtracting COUNT(column) from COUNT(*). Both queries produce the same output. CASE scales to multiple columns; subtraction stays shorter with selective indexes.

One query can ask all three questions of the same orders table at once: how many rows total, how many have a non-NULL amount, and how many unique buyers placed orders. all_orders is 13 because there are 13 rows in orders. orders_with_amount is 12 because order 102 has a NULL amount, so COUNT(amount) skips it. distinct_buyers is 7 because seven different customers placed orders (Hank has none).

The short answer on performance: COUNT(*) and COUNT(1) are fast and equivalent. COUNT(column) is similar but skips NULLs. COUNT(DISTINCT column) is the expensive one because the planner has to deduplicate before counting, and that cost grows superlinearly with row count when no covering index is available. COUNT(*) and COUNT(1) both ask for row cardinality without reading column payloads. Planners on MySQL 8.x, PostgreSQL 15+, SQL Server 2019+, and Oracle 19c+ pick the same plans for the two and usually prefer the smallest index that answers the question.
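Sketched against the sample schema, the queries described above look like this (the column aliases follow the output names cited in the text; the stated results are shown as comments):

```sql
-- COUNT(*) versus COUNT(column) on customers:
SELECT COUNT(*)      AS customer_rows,     -- 8: every row
       COUNT(status) AS non_null_status    -- 7: Bob's NULL status is skipped
FROM customers;

-- Counting the NULLs themselves, two equivalent patterns:
SELECT COUNT(CASE WHEN status IS NULL THEN 1 END) AS null_status   -- 1
FROM customers;

SELECT COUNT(*) - COUNT(status) AS null_status                     -- 1
FROM customers;

-- All three COUNT forms against orders in a single scan:
SELECT COUNT(*)                    AS all_orders,         -- 13
       COUNT(amount)               AS orders_with_amount, -- 12: order 102 has NULL amount
       COUNT(DISTINCT customer_id) AS distinct_buyers     -- 7: Hank never ordered
FROM orders;
```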
Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN FORMAT=TREE (MySQL 8.x) when a large-table COUNT(*) suddenly regresses. Index-only paths help when the column is indexed, and NULL rules mean a COUNT(column) result can sit below COUNT(*). The win over COUNT(*) shows up mainly on very wide rows and large tables.

For COUNT(DISTINCT ...), deduplication forces a sort or hash. Without a covering index, expect minutes on hundred-million-row scans. Three strategies help when COUNT(DISTINCT) hurts: add a covering index over the counted column, pre-aggregate the distinct counts into a rollup table, or switch to an approximate counter such as APPROX_COUNT_DISTINCT or the PostgreSQL hll extension. Note: EXPLAIN ANALYZE SELECT COUNT(DISTINCT customer_id) FROM orders; shows whether PostgreSQL chose Aggregate -> Sort (indexed) or Aggregate -> HashAggregate (heap heavy). Read the plan before tuning.

WHERE filters rows before aggregation, so COUNT only sees rows that match the predicate. The query below counts how many orders have a status of 'completed': seven of the thirteen rows in orders have status = 'completed'; the other six are split between 'pending' and 'cancelled'.

Combine predicates with AND and OR to count rows that satisfy compound conditions. The query below counts completed orders with amount greater than 50: six of the seven completed orders clear the threshold. Order 103 is excluded because its amount is exactly 50, and the predicate uses strict greater-than.

Dashboards often ask how many events landed in the last week. Use CURRENT_DATE (PostgreSQL, Oracle), CURDATE() (MySQL), or GETDATE() (SQL Server) inside the predicate. All three queries return the same result against the sample data when the current date is 2024-10-13. Add a B-tree index on order_date if this predicate runs hot in production.

GROUP BY emits one row per distinct value in the grouping column, with COUNT reporting the row count for each group. The query below counts customers per city: each city shows 2 because the seed data deliberately places two customers per city.
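Sketches of the filtering and grouping queries described above, assuming the sample schema (the date-window form is PostgreSQL syntax; the exact date arithmetic varies by dialect):

```sql
-- Single predicate: completed orders only.
SELECT COUNT(*) FROM orders
WHERE status = 'completed';                    -- 7

-- Compound predicate: completed AND strictly greater than 50.
SELECT COUNT(*) FROM orders
WHERE status = 'completed' AND amount > 50;    -- 6: order 103 sits at exactly 50

-- Last-seven-days window (PostgreSQL form; use CURDATE() with INTERVAL on
-- MySQL, or DATEADD with GETDATE() on SQL Server).
SELECT COUNT(*) FROM orders
WHERE order_date >= CURRENT_DATE - 7;

-- One row per city with its customer count.
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;                                 -- 2 per city in the seed data
```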
On a real dataset the counts would vary, and ORDER BY count DESC is the common pattern for ranking groups by size.

HAVING filters after grouping, unlike WHERE, which filters raw rows before aggregation. Joining customers to orders and filtering on the join count shows the difference clearly: Chicago and Denver are excluded because their order counts (3 and 2) fall below the HAVING threshold.

DISTINCT removes duplicate values before COUNT evaluates the set. COUNT(DISTINCT column) counts how many unique non-NULL values appear in a column. Use it when the question is “how many different X are there,” not “how many rows reference X.” The query below counts how many distinct cities appear in customers: eight customers share four cities (Austin, Boston, Chicago, Denver), so DISTINCT collapses the duplicates and COUNT returns 4.

To count distinct combinations of two or more columns, wrap a DISTINCT projection in a derived table and count the result. The orders table has multiple rows per customer, so (customer_id, status) pairs actually deduplicate, which makes them good for showing the pattern: thirteen rows collapse to twelve distinct (customer_id, status) pairs (Carol’s duplicate 'completed' rows merge).

Standard SQL only allows one expression inside COUNT(DISTINCT ...). PostgreSQL and Oracle reject the multi-column form outright. SQL Server and MySQL allow it but with caveats around NULL handling that change between versions. The derived-table form is portable and behaves consistently everywhere.

COUNT(DISTINCT ...) and the derived-table form both pay deduplication cost. Indexes that cover every column inside DISTINCT keep sorts cheap; otherwise expect hash or external sort plans. Confirm with EXPLAIN ANALYZE (PostgreSQL), EXPLAIN FORMAT=TREE (MySQL 8.x), or SET STATISTICS PROFILE ON (SQL Server).

COUNT(CASE WHEN ... THEN 1 END) feeds COUNT a non-NULL marker only when the predicate passes, which lets one query report several conditional totals in a single table scan.
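Sketches of the HAVING and DISTINCT queries described above, assuming the sample schema (the HAVING threshold of 4 is an assumption consistent with the stated exclusions):

```sql
-- Keep only cities whose joined order count reaches the threshold.
SELECT c.city, COUNT(*) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.city
HAVING COUNT(*) >= 4;    -- Chicago (3) and Denver (2) drop out

-- How many different cities appear?
SELECT COUNT(DISTINCT city) FROM customers;    -- 4

-- Distinct multi-column combinations via a derived table (portable form).
SELECT COUNT(*) AS distinct_pairs
FROM (SELECT DISTINCT customer_id, status FROM orders) pairs;   -- 12
```

The derived table is aliased without AS so the same statement runs on Oracle as well.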
The query below produces a status breakdown across all orders. The three counts add up to 13, which matches COUNT(*) FROM orders. Running this as three separate WHERE-filtered queries would scan the table three times; the CASE WHEN form scans once.

Each CASE arm can reference a different column or combine predicates with AND and OR, so a single table scan can produce several conditional totals at once. The pattern below counts completed orders that have an amount alongside pending orders that are missing one.

Joins multiply rows before COUNT runs. If you forget that, every dashboard looks fine in QA and drifts in production. Counting customers while joining to orders to filter on status is the textbook mistake: that 7 is the number of completed order rows, not distinct customers. Carol alone contributes two of those rows because she has two 'completed' orders. The fix is COUNT(DISTINCT) on the dimension key. When a one-to-many join feeds an aggregate, decide whether you care about rows on the many side (COUNT(*), COUNT(many_table.id)) or identities on the one side (COUNT(DISTINCT one_table.id)). Mixing the two ships quiet bugs.

INNER JOIN keeps only customers who have at least one matching order, so its row totals differ from outer-join variants on the same data. The query below groups by customer and counts each one’s orders: Hank does not appear in the output because he has no rows in orders. INNER JOIN drops him entirely. The next section shows LEFT JOIN, which keeps Hank and forces a decision about how to count him.

LEFT JOIN keeps customers without orders. COUNT(*) counts the padded row where the right side is NULL; COUNT(o.order_id) ignores NULL order ids. Warning: after a LEFT JOIN, COUNT(*) counts the joined row even when all right-side columns are NULL; COUNT(o.order_id) counts only matched orders. Mixing the two forms changes totals for customers without orders. Look at Hank’s row.
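The join pitfalls above can be sketched as follows (the alias completed_customers is illustrative; rows_after_join and matched_orders follow the output names the text cites):

```sql
-- The fan-out mistake: this counts completed order rows, not customers.
SELECT COUNT(*) AS completed_customers    -- 7, but Carol contributes two rows
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed';

-- The fix: count distinct identities on the one side of the join.
SELECT COUNT(DISTINCT c.customer_id) AS completed_customers
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed';

-- LEFT JOIN: COUNT(*) counts Hank's padded row; COUNT(o.order_id) does not.
SELECT c.name,
       COUNT(*)          AS rows_after_join,   -- 1 for Hank
       COUNT(o.order_id) AS matched_orders     -- 0 for Hank
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name;
```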
rows_after_join is 1 because the LEFT JOIN produced a single padded row for him with all orders columns set to NULL. matched_orders is 0 because COUNT(o.order_id) skips that NULL. Picking the wrong form silently shifts Hank’s total between zero and one, which is how dashboards drift from reality.

A correlated subquery runs once per outer row and uses COUNT to compare each customer to their own aggregate. The query below returns customers who have more than two orders: Carol is the only match because she has three orders (105, 106, 113). Every other customer has two or fewer. Correlated subqueries are easy to write but expensive at scale because the inner query repeats per outer row; the FAQ at the end of this tutorial covers when to switch to EXISTS instead. Derived tables expose aggregates to outer WHERE clauses cleanly.

Note: APPROX_COUNT_DISTINCT and PostgreSQL hll trade accuracy for speed; keep them out of ledgers that require exact balances.

COUNT(*) on InnoDB is cheap when the planner can walk a narrow secondary index instead of the clustered primary tree. Clustered leaves hold full rows; secondary leaves hold keys plus pointers, so COUNT(*) often prefers the smallest secondary index on wide tables. Use EXPLAIN to confirm which index the planner picked. In the sample output (columns vary slightly by MySQL release), the key column shows the index the planner chose. A non-PRIMARY entry like idx_customer_id means the secondary-index shortcut fired. Using index in the Extra column confirms the engine answered the query directly from the index without touching row data.

If you are maintaining a legacy MySQL schema and COUNT(*) returned instantly on a billion-row table, check the storage engine before assuming the optimizer is doing something clever: MyISAM cached the exact row count in the table header and returned COUNT(*) without scanning anything. InnoDB does not, because MVCC means the “true” row count depends on the calling transaction’s snapshot.
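The correlated subquery described above can be sketched as:

```sql
-- Customers with more than two orders; the inner COUNT re-runs per outer row.
SELECT c.name
FROM customers c
WHERE (SELECT COUNT(*)
       FROM orders o
       WHERE o.customer_id = c.customer_id) > 2;   -- only Carol (orders 105, 106, 113)
```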
Migrations from MyISAM to InnoDB are where teams first notice their dashboard totals slowing down overnight, and the engine column in information_schema.tables is the fastest way to confirm the cause.

PostgreSQL supports COUNT as a window function via COUNT(*) OVER (PARTITION BY ...). Unlike GROUP BY, which collapses each partition into a single row, the window form keeps every detail row and adds the partition count alongside it. This is what you want when a report needs both per-row data and per-group totals in the same result set: every row keeps its full detail and gains a customers_in_city column showing the partition total. The value is 2 for every row because each city has two customers in the seed data; on real data the column would vary by partition.

For heavy distinct workloads on PostgreSQL, the hll extension trades exact answers for constant memory. Install it once per database with CREATE EXTENSION hll;. The query below approximates the distinct buyer count using HyperLogLog. The three nested calls hash each customer_id, aggregate the hashes into an hll sketch, then read the cardinality estimate from the sketch. The result matches the exact COUNT(DISTINCT customer_id) because HyperLogLog falls back to linear counting at small cardinalities. On larger datasets, expect roughly 2% error at constant low-kilobyte memory. Use hll for dashboards and telemetry; keep billing and reconciliation on exact COUNT(DISTINCT ...).

Oracle matches PostgreSQL on basic COUNT(*) OVER (PARTITION BY ...), so focus on habits you only see on Oracle: DUAL, uppercase identifiers unless quoted, and richer analytic frames. DUAL is Oracle’s built-in single-row table. It is the standard way to evaluate an expression without touching real data, which makes it useful for smoke-testing logic in stored procedures and migration scripts. COUNT(*) against DUAL always returns 1 because DUAL always has exactly one row.
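Sketches of the three dialect features described above: the window-function count, the hll estimate, and the DUAL smoke test (the hll function names follow the postgresql-hll extension; approx_buyers is an illustrative alias):

```sql
-- Window form: every detail row keeps the per-partition total alongside it.
SELECT name, city,
       COUNT(*) OVER (PARTITION BY city) AS customers_in_city   -- 2 on the seed data
FROM customers;

-- HyperLogLog estimate of distinct buyers (requires CREATE EXTENSION hll;).
SELECT hll_cardinality(hll_add_agg(hll_hash_integer(customer_id))) AS approx_buyers
FROM orders;   -- matches the exact count of 7 at this tiny cardinality

-- Oracle's built-in single-row table.
SELECT COUNT(*) FROM dual;   -- always 1
```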
The query shows up in real codebases as a way to verify that a connection works and that a procedure compiles.

Oracle also exposes COUNT as an analytic function with explicit frame clauses, which lets you build running totals row by row. The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells the engine to count from the first row of the partition up through the current row, ordered by order_date. Unquoted identifiers surface as uppercase (ORDER_ID).

For approximate cardinality, Oracle 12c (12.1.0.2) and later expose APPROX_COUNT_DISTINCT(column) directly, including 19c and 23c. Use it when exact COUNT(DISTINCT ...) runs too long, and materialize rollups when the same approximate count powers repeated dashboard queries.

SQL Server 2019 introduced APPROX_COUNT_DISTINCT as a built-in HyperLogLog-based alternative to COUNT(DISTINCT ...). It is the right tool when a dashboard needs distinct counts on tables in the hundreds of millions of rows and can tolerate roughly 2% error in exchange for constant memory and predictable response time. The result is exactly 7 here because SQL Server, like PostgreSQL hll, switches to linear counting for small cardinalities. On a billion-row table the result would be within roughly 2% of the true distinct count and would return in seconds rather than minutes. Reach for this in monitoring and capacity-planning queries; keep exact COUNT(DISTINCT ...) for anything that ends up on a financial report. Docs: COUNT, APPROX_COUNT_DISTINCT.

FAQ. What is the difference between COUNT(*) and COUNT(column_name)? COUNT(*) counts every row in the result set, including rows with NULL values in any column. COUNT(column_name) counts only rows where that column is not NULL. Use COUNT(*) for whole-row totals.

How does COUNT handle NULL? COUNT(*) counts rows that contain NULL somewhere. COUNT(column_name) skips NULL in that column. COUNT(DISTINCT column_name) drops NULL and duplicates before counting.

How do you count rows that match a condition? Use WHERE before COUNT, for example SELECT COUNT(*) FROM orders WHERE status = 'completed';.
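Sketches of the analytic running count and the approximate distinct count described above (running_orders and approx_buyers are illustrative aliases):

```sql
-- Running order count per customer, ordered by date (Oracle analytic frame).
SELECT order_id, customer_id, order_date,
       COUNT(*) OVER (PARTITION BY customer_id
                      ORDER BY order_date
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_orders
FROM orders;

-- Approximate distinct buyers (Oracle 12.1.0.2+ and SQL Server 2019+).
SELECT APPROX_COUNT_DISTINCT(customer_id) AS approx_buyers   -- exactly 7 here; ~2% error at scale
FROM orders;
```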
How do you count rows under several conditions in one query? Nest CASE WHEN inside COUNT.

How does COUNT interact with GROUP BY? GROUP BY defines partitions; COUNT returns one total per group. Non-aggregated SELECT columns must repeat in GROUP BY or sit inside aggregates.

Is COUNT(DISTINCT ...) slower than COUNT(*)? COUNT(*) skips deduplication. COUNT(DISTINCT column) pays sort or hash costs unless a covering index helps.

Can joins inflate COUNT? Yes. One-to-many joins duplicate rows before grouping. With LEFT JOIN, use COUNT(*) only when unmatched dimension rows should register as one padded row; otherwise count a non-NULL fact key.

How do you count unique values? Run SELECT COUNT(DISTINCT column_name) FROM table_name; for unique non-NULL values. Syntax matches across MySQL 8.x, PostgreSQL 15+, SQL Server 2022, and Oracle 19c.

Is COUNT portable across databases? Core COUNT forms match for ANSI-shaped queries. Engines diverge on window syntax, APPROX_COUNT_DISTINCT, InnoDB plans for bare COUNT(*), and PostgreSQL hll.

When should you use EXISTS instead of COUNT? Prefer EXISTS when you only need a yes or no. WHERE (SELECT COUNT(*) ...) > 0 always walks every match; EXISTS stops at the first hit. Both return seven customers here (everyone except Hank). The gap widens on wide fact tables.

What does COUNT return on empty input? COUNT returns 0 on empty input; SUM, AVG, MIN, and MAX return NULL. Drop redundant COALESCE wrappers around COUNT.

This tutorial covered the three forms of the COUNT aggregate function and the NULL and duplicate rules that distinguish them. It walked through filtering before aggregation with WHERE, filtering grouped results with HAVING, per-group totals with GROUP BY, conditional counting with CASE WHEN, fan-out behavior across INNER JOIN and LEFT JOIN, subquery and derived-table patterns, and dialect-specific behavior on MySQL 8.x, PostgreSQL 15+, SQL Server 2022, and Oracle 19c, including approximate-count helpers like APPROX_COUNT_DISTINCT and the PostgreSQL hll extension.
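The EXISTS-versus-COUNT comparison from the FAQ above can be sketched as:

```sql
-- Walks every matching order before comparing:
SELECT c.name
FROM customers c
WHERE (SELECT COUNT(*) FROM orders o
       WHERE o.customer_id = c.customer_id) > 0;     -- seven customers (everyone but Hank)

-- Stops at the first matching order:
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o
              WHERE o.customer_id = c.customer_id);  -- the same seven customers
```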
You can now choose the right COUNT form for any question, count NULL values directly, avoid duplicate overcounting after one-to-many joins, distinguish row totals from distinct-entity totals at a join boundary, read planner output before tuning slow COUNT(DISTINCT ...) queries, and move between portable ANSI SQL and engine-specific helpers without surprises. To go deeper, read the GROUP BY, JOIN, and DISTINCT tutorials, and rehearse the fundamentals in An Introduction to Queries in MySQL. When you are ready to run these patterns against real workloads, a DigitalOcean Managed Database keeps practice and test traffic isolated from production.