Tuesday, June 16, 2015

SQL Server v2012 - Running Totals by Date - SUM() OVER()

In a previous post, I provided an example for calculating running totals on orders traded, using a correlated subquery and the OVER() clause, which was introduced in v2005. You'll see that here: http://www.sqlfingers.com/2014/10/calculate-running-totals-in-sql-server.html

In this post I am going to show you a much sexier version of the same, using the SUM() OVER() Windows functions, introduced in v2012. Remarkably simple, take a look:

     /* to  load our data */
     DECLARE @Orders TABLE(OrderID INT,OrderDate DATETIME,OrderAmount INT)

     INSERT @Orders VALUES
          (1,'06/15/2015 09:30',20.00),
           (2,'06/15/2015 09:35',22.125),
          (3,'06/15/2015 10:01',50.00),
          (4,'06/15/2015 12:32',-10.00),
          (5,'06/15/2015 13:55',12.75),
          (6,'06/16/2015 09:30',20.00),
          (7,'06/16/2015 09:00',22.25),
          (8,'06/16/2015 10:17',-5.00),
          (9,'06/16/2015 11:44',100.23),
           (10,'06/16/2015 12:36',77.00),
          (11,'06/16/2015 13:22',20.00),
          (12,'06/16/2015 14:58',20.25);

     /* single query */
     SELECT 
          OrderDate,
          OrderAmount,
          RunningTotal = SUM(OrderAmount) OVER (
       PARTITION BY DAY(OrderDate) 
       ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) 
     FROM 
         @Orders
    ORDER BY 
         OrderID;

Here is our result-set:

















Two things to note;  First, if you look at the RunningTotal from 6/15 to 6/16, you will see the total ends at the end of one day, and begins anew the next. I did this by partitioning by DAY(OrderDate). If you want to calculate the running total from the beginning to the end of your orders, across all of your trade dates, then you would use this statement:

      SUM(OrderAmount) OVER (ORDER BY OrderDate ROWS UNBOUNDED PRECEDING)

These are the results, where you'll see the Running Total is not partitioned by date:
















Secondly, a key factor of these new Windows functions is 'framing', which lets us further define our collection using ROWS or RANGE. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which just means to calculate all previous rows up to your current row of data. Similarly, you can use ROWS in your frame, as I have in my first query. The results are the same, but the optimizer actually creates the work table in memory... and improves the performance!!

Not that big of a deal with a 12 record sample, but you can still see it. Run this query in comparison to the top one. SET STATISTICS IO on both, and you will see what I'm saying. Exactly the same results, but better performance using ROWS rather than RANGE.

     SELECT 
          OrderDate,
      OrderAmount,
          RunningTotal SUM(OrderAmount) OVER (
PARTITION BY DAY(OrderDate
ORDER BY OrderDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)  
     FROM 
          @Orders
     ORDER BY 
          OrderDate;

STATISTICS IO for ROWS:
(12 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#B7361CD1'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

STATISTICS IO for RANGE:
(12 row(s) affected)
Table 'Worktable'. Scan count 14, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#B54DD45F'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

No comments:

Post a Comment