Monday, October 13, 2014

More SQL Server Aggregate functions

I mentioned that I would try to get back in here with some more examples using SQL Server's Aggregate functions. I think this is a good one. One of the requests I used to get again and again, was the current net position for all symbols traded on the Equity exchanges. The Ops desk (the Team supporting the traders) needed to know if positions were open for any symbol across these exchanges:    ISLD, BRUT, ARCA, CHX, NSDQ, NYSE.

To handle that, I just created this procedure which was run on the back-end, and would alert them if an open position was found.

  CREATE PROCEDURE dbo.usp_NetPosBySymbol
  AS
  SET NOCOUNT ON
  /*
   Returns the net position by symbol, across the equity exchanges, if exists.
   EXEC dbo.usp_NetPosBySymbol 

  Auth:  Me
  Date:  10/13/2014
  */
  BEGIN
        -- use this method to avoid an IN list in the WHERE clause
DECLARE @Exchanges TABLE (ExchangeName VARCHAR(15))
INSERT @Exchanges (ExchangeName)
VALUES 
('ARCA'),
('BRUT'),
('CHX'),
('DNET'),
('ISLD'),
('NSDQ'),
('NYSE');

IF EXISTS(
SELECT buys-sells AS Difference
FROM (
SELECT Symbol,SUM(Quantity) [buys] 
FROM dbo.Orders o
WHERE Side = 'B' 
         AND EXISTS(
                    SELECT 1 FROM @Exchanges e WHERE e.ExchangeName = o.Exchange)
GROUP BY Symbol) buys JOIN
(
SELECT Symbol,SUM(Quantity) [sells]
FROM dbo.Orders o 
WHERE Side = 'S' 
AND EXISTS(
                     SELECT 1 FROM @Exchanges e WHERE e.ExchangeName = o.Exchange)
GROUP BY Symbol) sells
         ON buys.Symbol = sells.Symbol
WHERE 
  buys <> sells)

BEGIN
SELECT buys.Symbol,Buys,Sells,buys-sells AS Difference
FROM(
SELECT Symbol,SUM(Quantity) [buys] 
FROM dbo.Orders o
WHERE Side = 'B' 
           AND EXISTS(
                      SELECT 1 FROM @Exchanges e WHERE e.ExchangeName = o.Exchange)
GROUP BY Symbol) buys JOIN
(
SELECT Symbol,SUM(Quantity) [sells]
FROM dbo.Orders o 
WHERE Side = 'S' 
             AND EXISTS(
                       SELECT 1 FROM @Exchanges e WHERE e.ExchangeName = o.Exchange)
GROUP BY Symbol) sells
              ON buys.Symbol = sells.Symbol
WHERE 
                   buys <> sells
ORDER BY 
                   buys.Symbol
       END
    END

  SET NOCOUNT OFF
  GO

We're still using the Orders data that we wrote in this post, SQL Aggregate functions, and these are our results:

   Symbol   Buys  Sells  Difference
   BEAS      55    100     -45
   YHOO      5   105     -55

That is just a simple example using SUM.  We first do an existence check, and then we return the net position back to the front end, if exists.

See this for more information, and for a couple more examples using the SUM function:
     http://msdn.microsoft.com/en-us/library/ms187810.aspx

SQL Aggregate functions for count & quantity traded

You're managing a lot of trading data, and you've been asked to provide some statistics back to the trading desks. There are numerous different types of statistics that you could provide, all using SQL Server aggregate functions such as COUNT, SUM, AVG, etc. Today I am going to show a very simple example for reporting how many buys and sells they are trading on each symbol.

First, we need an Orders table that all of trades are written to.  Very simple design, like this:

CREATE TABLE dbo.Orders (
OrderID INT IDENTITY(1,1),
TraderID CHAR (12) NOT NULL,
Exchange CHAR (8) NOT NULL,
Symbol CHAR (32) NOT NULL,
Side CHAR (1) NOT NULL,
Quantity BIGINT NOT NULL ,
Price DECIMAL(18, 8) NOT NULL ,
OrderDate DATETIME NOT NULL 
) ON YourDB_Data
GO

I should note, MUCH better practices would be a numeric identifier for Exchange, Symbol and even Side. But... then I'd have to do all kinds of JOINs to get the data back to you meaningfully. For this example, we can ignore the performance impediments of character strings vs integers.  ;-)

For the insert into Orders, I am only going to use a small chunk of orders by only one trader, John Doe.

-- load some orders
SET IDENTITY_INSERT Orders ON
INSERT dbo.Orders (OrderID,TraderID,Exchange,Symbol,Side,Quantity,Price,OrderDate)
VALUES
(1,10,'NYSE','AAPL','B',55,12.45,'10/13/2012 09:30'),
(2,10,'NSDQ','YHOO','B',50,61.75,'10/13/2012 10:30'),
(3,10,'CME','AAPL','B',5,17.50,'10/13/2012 09:00'),
(4,10,'NYSE','DDM9','S',5,1.75,'10/13/2012 08:30'),
(5,10,'NSDQ','ABS','B',3,66.502,'10/13/2012 09:35'),
(61,10,'NYSE','BEAS','B',55,1.75,'10/13/2012 12:20'),
(7,10,'NSDQ','BEAS','S',100,1.025,'10/13/2012 12:55'),
(8,10,'CME','CAG','B',10,11.53,'10/13/2012 13:30'),
(9,10,'CSX','YHOO','S',10,17.705,'10/13/2012 16:55'), 
(10,10,'NYSE','YHOO','S',50,10.45,'10/13/2012 10:30'),
(11,10,'NSDQ','YHOO','S',55,60.75,'10/13/2012 11:30'),
(12,10,'CME','YHOO','S',5,19.50,'10/13/2012 10:00');
SET IDENTITY_INSERT Orders OFF

Ok. We've got some order data for trader John Doe. With the query below, you will return a count for how many orders he's traded, and how much quantity was for buys vs sells. Basically, counts and quantities for each trader, by symbol and side ('side' is trading speak for buy or sell).

  DECLARE 
      @OrderDate DATE = '10/13/2012',
@TraderID INT =  10

  -- collect the trade count (#) and Qty for each trader, by symbol
  SELECT 
CONVERT(VARCHAR(10),o.OrderDate,101)  [Date],
t.TraderName,
o.Symbol,
COUNT(*) [#Trades],
LTRIM(RTRIM(LEFT(CONVERT(CHAR(16),CAST(SUM(CASE WHEN o.Side = 'b'
          THEN o.Quantity ELSE 0 END) AS MONEY),1),13))) [BuyQty],
LTRIM(RTRIM(LEFT(CONVERT(CHAR(16),CAST(SUM(CASE WHEN o.Side = 's'
          THEN o.Quantity ELSE 0 END) AS MONEY),1),13))) [SellQty]
  FROM 
dbo.Orders o INNER JOIN dbo.Traders t
         ON o.TraderID = t.TraderID
  WHERE 
CONVERT(VARCHAR(10),o.OrderDate,101) = @OrderDate
AND o.TraderID = @TraderID
  GROUP BY 
CONVERT(VARCHAR(10),o.OrderDate,101),
t.TraderName,
o.Symbol
  ORDER BY
CONVERT(VARCHAR(10),o.OrderDate,101),
o.Symbol

The results:

     OrderDate   Trader   Symbol   #Trades  BuyQty   SellQty
     10/13/2012  John Doe   AAPL     2      60        0
     10/13/2012  John Doe    ABS     1       3        0
     10/13/2012  John Doe   BEAS     2      55      100
     10/13/2012  John Doe    CAG     1      10        0
     10/13/2012  John Doe   DDM9     1       0        5
     10/13/2012  John Doe   YHOO     5      50      120

I added these two statements after the one above just to help me confirm that it's accurate. Looking at the raw and the agg data side by side makes it easier to validate:

    SELECT COUNT(*) FROM dbo.Orders WHERE TraderID = 10 AND Symbol = 'AAPL'
    SELECT * FROM dbo.Orders WHERE TraderID = 10 AND Symbol = 'ABS' AND Side = 'b'

Like I said, there are many different ways to report the data using SQL Server's aggregate functions. This is just one example. I'll try to come back soon with more. Until then, take a look at this, for more detail:


Calculate running totals in SQL Server

You want to show details for the individual orders traded (or transactions), but you also need to show the total orders traded by each trader, in the same result set. This means you need to calculate running totals. There are numerous ways to calculate running totals with tSQL, but you've got to be cautious, or your performance could be less than acceptable. In this post, I'm giving you just a quick example for calculating the running total by using a correlated sub query... and we're going to return it for each trader, by date.

-- first let's create a table for our traders
IF OBJECT_ID('tempdb..#traders') <> 0
DROP TABLE #traders;
CREATE TABLE #Traders (
    TraderID INT IDENTITY(1,1),
    TraderName VARCHAR(45)
 )
SET IDENTITY_INSERT #Traders ON
INSERT #Traders (TraderID,TraderName)
VALUES (10, 'John Doe'),
    (11, 'Jane Smith');
SET IDENTITY_INSERT #Traders OFF;

-- create our orders table 
CREATE TABLE #orders (
OrderID INT IDENTITY(1,1), 
OrderDate DATETIME
TraderID INT,
Quantity INT,
RunningTotal INT NULL )

-- populate #orders
SET IDENTITY_INSERT #orders ON
INSERT #orders (OrderID,OrderDate,TraderID,Quantity)
VALUES (1,'10/12/2014 09:00', 10, 100),
            (2, '10/12/2014 11:00', 10, 200),
           (3, '10/12/2014 12:00', 10, 300),
      (4, '10/12/2014 14:00', 11, 400),
           (5, '10/12/2014 14:30', 11, 500),
          (6,'10/13/2014 09:00', 10, 100),
          (7, '10/13/2014 11:00', 10, 90),
          (8, '10/13/2014 12:00', 11, 300),
          (9, '10/13/2014 14:00', 11, 4500),
        (10, '10/13/2014 14:30', 11, 500);

-- take a look at the orders data that you just loaded
SELECT * FROM #orders

   OrderID   OrderDate             TraderID Quantity RunningTotal
     1    2014-10-12 09:00:00.000   10       100       NULL
     2    2014-10-12 11:00:00.000   10       200       NULL
     3    2014-10-12 12:00:00.000   10       300       NULL
     4    2014-10-12 14:00:00.000   11       400       NULL 
     5    2014-10-12 14:30:00.000   11       500       NULL
     6    2014-10-13 09:00:00.000   10       100       NULL
   .............................
   ..................

Now we'll use a correlated subquery to bring it back out for each trader, and each order date.

   -- bring back the running total
   SELECT
          CONVERT(CHAR(10),o.OrderDate,101) [OrderDate],
          t.TraderName,
          o.Quantity,
         (
              SELECT SUM(Quantity) FROM #orders 
              WHERE OrderID <= O.OrderID 
              AND CONVERT(CHAR(10),OrderDate,101) = CONVERT(CHAR(10),o.OrderDate,101)
              AND TraderID = o.TraderID
         ) [Running Total] 
    FROM
          #orders o INNER JOIN #Traders t
          ON o.TraderID = t.TraderID
    ORDER BY
       OrderDate,
       t.TraderName


Here's your full result-set :

    OrderDate    TraderName   Quantity  RunningTotal
    10/12/2014 Jane Smith 400 400
    10/12/2014 Jane Smith 500 900
    10/12/2014 John Doe 100 100
    10/12/2014 John Doe 200 300
    10/12/2014 John Doe 300 600
    10/13/2014 Jane Smith 300 300
    10/13/2014 Jane Smith 4500 4800
    10/13/2014 Jane Smith 500 5300
    10/13/2014 John Doe 100 100
    10/13/2014 John Doe 90 190


That's it!  As I said, there are numerous ways to calculate running totals. There are new functions with the newer releases, such as the OVER clause that became available in v2005.  Here is a quick example using the OVER clause:

   -- using the OVER clause
   SELECT 
      OrderDate,
      Quantity,
      SUM(Quantity) OVER (ORDER BY o.OrderID) [RunningTotal]
   FROM 
      #orders o
   ORDER BY 
      o.OrderID;


Here is a little more information on the correlated subquery:
 http://technet.microsoft.com/en-us/library/ms187638(v=sql.105).aspx

And take a look at this for more detail on the OVER clause:
  http://msdn.microsoft.com/en-us/library/ms189461(v=sql.105).aspx



Sunday, October 5, 2014

SQL query for the total PnL for each trader, by symbol

I am cleaning up my laptop tonight, preparing to buy a new one... hopefully before this one dies. Today I found something that I had intended to include the other day when I posted this example for calculating sums across two tables: http://www.sqlfingers.com/2014/09/calculate-sum-across-two-tables.html

Just like that example, this was something the same customer needed very quickly after the close, to report the PnL for each trader account, by symbol.  I've trimmed it down a bit, including only the objects necessary to collect the PnL, but here it is: 

   -- first let's create a table for our traders
   CREATE TABLE dbo.Traders (
      TraderID INT IDENTITY(1,1),
      TraderName VARCHAR(45)
   )
   SET IDENTITY_INSERT dbo.Traders ON
   INSERT dbo.Traders (TraderID,TraderName)
   VALUES (10,'John Doe'),
              (11,'Jane Smith');
   SET IDENTITY_INSERT dbo.Traders OFF;

-- now let's create our daily trades table
CREATE TABLE dbo.Trades (
    TraderID INT,
    TradeID INT,
    Symbol VARCHAR(35),
    Volume BIGINT,
    Side CHAR(1)
)
   -- write some trades
   INSERT dbo.Trades (TraderID,TradeID,Symbol,Volume,Side)
   VALUES ( 10,1, 'AAPL', 10000,1),
                 ( 10,2, 'AAPL', 10000,2),
( 10,3, 'AAPL', -10000,2);

   -- check your data
   SELECT * FROM dbo.Trades

Ok.  Let's say closing price is 1.5.  To return the eod(end-of-day) PnL for each trader by symbol, we'd multiply each symbol's net volume by the closing price.  In this case, it's just AAPL:

  -- declarations
   DECLARE 
@price MONEY = 1.5,
@pnl MONEY

   -- collect the @pnl for AAPL
   SELECT @pnl = @price * SUM(Volume) 
   FROM dbo.Trades 
   WHERE Symbol = 'AAPL'

   -- bring it back out for each trader  - in this example we only have one trader... :-).   
   SELECT t1.TraderName, t2.Symbol, @pnl [PnL]
   FROM dbo.Traders t1 INNER JOIN dbo.Trades t2
          ON t1.TraderID = t2.TraderID
   GROUP BY t1.TraderName, t2.Symbol

Your results:

   TraderName   Symbol    PnL     
   John Doe     AAPL   15000.00

There you go.  Nothing too elaborate... just a SQL query for the closing profit and loss for each trader, by symbol.

I've mentioned before that I have virtually lived with the stock market data for almost 20 years. I have miles of tSQL; current, daily and month-to-date statistics, message ratios, PnL, open order reports, clearing... and more.  If any of you are managing trader volumes and pnl, you know what a pain it can be!  I have decided to share some of this, so you may begin seeing more logic like the one above. Let me know if you're looking for something that you don't see. I may have it already, or I can help you put it together.


List all empty tables in your SQL Server database

Here’s a simple query to find all empty tables in your SQL Server database.  We are using the dm_db_partition_stats DMV, which returns page and row-count information for every partition in the current database.  

   /*   Use the dm_db_partition_stats DMV to list all empty tables in your database.   */
   ;WITH Empty AS     
   (
    SELECT 
OBJECT_NAME(OBJECT_ID) [Table],
SUM(row_count) [Records]
    FROM 
sys.dm_db_partition_stats      
    WHERE 
index_id = 0 OR index_id = 1      
    GROUP BY 
OBJECT_ID      
   )      

   -- pull it back out
   SELECT [Table],Records 
   FROM [Empty]      
   WHERE [Records] = 0

   -- your results
   Table           Records
   sysclones          0
   sysseobjvalues     0
   syspriorities      0
   sysdbfrag       0
   sysfgfrag       0
   .........
   ......
   ..

This will provide you a little more information on the sys.dm_db_partition_stats DMV:

Please remember, this is not guaranteed to be an accurate record count.  These counts are based on what is cached in memory, or stored on disk in varied system tables. Something I often refer people to is this post from Kalen Delaney. She shows different methods to get the row counts, and makes us ask ourselves what 'accurate' really means:

Wait. What about the tables that aren't used anymore, but are not empty?  We often have unused tables in the database that still contain data. In my book, if they're not used, get rid of them. Thus, preserving server resources for what is required, rather than old/unused objects that are no longer needed.




Saturday, September 27, 2014

Calculate a SUM across two tables

One of my customers called me in a panic last night.  They are a prop trader based in Chicago, and the bulk of their data is the trades that they execute throughout the day. Well, Friday was a pretty crazy day in the market -- stocks rose very high after the government's 2nd quarter growth estimate. Unfortunately, their systems queued up crazy, and they had to do a lot of manual adjustments post trade.  Not uncommon, and also not very fun.  

Anyway, because they struggled to handle the data-flow, they had to split their traffic into two target tables.  This was good intraday, because they got to trade all the way to the close.  At the end of the day, however, they needed one volume count -- not two.

That's all this is.  I've used their problem as an example.  This is just quick means to calculate a SUM across two tables. 


   -- Trader table, so we can translate the trdID
   CREATE TABLE dbo.Trader (
     trdID INT IDENTITY (1,1),
     trdName VARCHAR(25),
     trdDesk VARCHAR(25)
   )
   SET IDENTITY_INSERT dbo.Trader ON
   INSERT dbo.Trader (trdID,trdName,trdDesk)
   SELECT 1, 'John Doe','Equities'
   UNION
   SELECT 12, 'Edgar Smith','Forex'
   UNION
   SELECT 207, 'Jane Peters','Fixed Income'
   UNION
   SELECT 208, 'Jack Black','Futures'

   -- Their 1st trade table
   CREATE TABLE #table1 (trdDate DATE,trdID INT,Volume BIGINT)
   INSERT #table1 (trdDate,trdID,Volume)
   SELECT '9/26/2014',12,18005
   UNION
   SELECT '9/26/2014',1,100
   UNION
   SELECT '9/26/2014',207,12500
   UNION
   SELECT '9/26/2014',208,750

   -- Their 2nd trade table
   CREATE TABLE #table2 (trdDate DATE,trdID INT,Volume BIGINT)
   INSERT #table2 (trdDate,trdID,Volume)
   SELECT '9/26/2014',12,100
   UNION
   SELECT '9/26/2014',1,5500
   UNION
   SELECT '9/26/2014',207,1000
   UNION
   SELECT '9/26/2014',208,0

-- See data from both tables
SELECT * FROM #table1
SELECT * FROM #table2

      -- results
   trdDate        trdID    Volume
   2014-09-26 1 100
   2014-09-26 12 18005
   2014-09-26 207     12500
   2014-09-26 208 750

   trdDate        trdID    Volume
   2014-09-26 1 5500
   2014-09-26 12         100
   2014-09-26 207 1000
   2014-09-26 208 0


   -- Now we'll pull it all back out in one result-set, with the volumes summed
   SELECT trdDate,Trader,SUM(Volume) [Vol Traded]
   FROM
   (
       SELECT t1.trdDate,t2.trdName [Trader],t1.Volume
       FROM #table1 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
       UNION ALL
       SELECT t1.trdDate,t2.trdName [Trader],t1.Volume
       FROM #table2 t1 INNER JOIN dbo.Trader t2
     ON t1.trdID = t2.trdID
   ) t
   GROUP BY trdDate,Trader
   ORDER BY trdDate,Trader


Our final result-set, volumes from both tables were summed into a single 'Vol Traded' --

   trdDate         Trader          Vol Traded
   2014-09-26    Edgar Smith        18105
   2014-09-26    Jack Black           750
   2014-09-26    Jane Peters        13500
   2014-09-26    John Doe            5600

Sunday, September 21, 2014

SQL Server Virtual Log Files

Each SQL Server database transaction log is composed of one or more physical files. Internal to each of these physical files are structures known as Virtual Log Files, or VLFs. Having too many or too few VLFs will impact the performance of your databases.

In this post I am just giving you a quick method to return the VLF count for each of your databases.  I will come back in the near future to help you determine whether you have too many or too few VLFs, and to provide you a method for correcting them.


/*
VLF count retrieval - 
Each transaction log file is divided logically into smaller segments called virtual log files.  The number of VLFs in each database will grow based on the autogrowth settings for the log file,  and how often transactions write to disk.   Too many VLFs will slow your log backups, and can   even slow down database recovery.

The VLF count is normal and expected in every database.  Larger VLF counts, however, are an 
impediment, and must be cleaned up.

This query returns the VLF count per database.   */


USE master;

-- Variables
DECLARE 
@query VARCHAR(555), -- updated for those incredibly long database names
@dbname SYSNAME,
@filecount INT
  
-- Table variable 
  DECLARE @databases Table (dbname SYSNAME)  
  INSERT @databases  (DBNAME)
  SELECT name
  FROM sys.databases 
  WHERE state = 0  --  << only online databases
  
-- Table variable for results  
  DECLARE @Results Table  (DBNAME SYSNAME, VLFCount INT)
 
  DECLARE @MajorVersion INT
  SET @MajorVersion
LEFT(CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(max)),CHARINDEX('.',CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(max)))-1

-- Table variable to capture DBCC loginfo output 
  IF @MajorVersion < 11 -- vSQL2012 
  BEGIN
      DECLARE @dbccloginfoA table  
      (  
        FileID tinyint,  
        FileSize bigint,  
        StartOffset bigint,  
        FSeqno int,  
        [Status] tinyint,  
        Parity tinyint,  
        CreateLSN numeric(25,0)  
      )  

     WHILE EXISTS(SELECT TOP 1 DBNAME FROM @databases)  
     BEGIN
        SET @dbname = (SELECT TOP 1 DBNAME FROM @databases)
        SET @query = 'DBCC LOGINFO (' + '''' + @dbname + ''') '  

        INSERT @dbccloginfoA (FileID,FileSize,StartOffset,FSeqno,Status,Parity,CreateLSN)
        EXEC (@query)  

        SET @filecount = @@rowcount  

        INSERT @Results (DBNAME,VLFCount)
        VALUES (@DBNAME, @filecount)

        DELETE @databases 
        WHERE dbname = @dbname  

      END
  END
  ELSE 
  BEGIN 
      DECLARE @dbccloginfoB TABLE
      (  
          RecoveryUnitID int
          FileID tinyint,  
          FileSize bigint,  
          StartOffset bigint,  
          FSeqno int,  
          [Status] tinyint,  
          Parity tinyint,  
          CreateLSN numeric(25,0)  
      ) 

    WHILE EXISTS(SELECT TOP 1 DBNAME FROM @databases)
    BEGIN
        SET @dbname = (SELECT TOP 1 DBNAME FROM @databases)
        SET @query = 'DBCC LOGINFO (' + '''' + @dbname + ''') '  

        INSERT @dbccloginfoB (RecoveryUnitId,FileID,FileSize,StartOffset,FSeqno,Status,Parity,CreateLSN
        EXEC (@query)

        SET @filecount = @@rowcount

        INSERT @Results
        VALUES (@dbname, @filecount)  

        DELETE @databases WHERE dbname = @dbname
    END
  END 
  
-- output results
SELECT
dbname [Database], 
VLFCount [VLF Count]
FROM 
@Results
ORDER BY
dbname


Take a look at each of these links for much more information regarding the VLF's:
     Transaction Log Physical Architecture
     High Virtual Log File (VLF) Count
     Transaction Log VLFs – too many or too few?