Saturday, September 27, 2014

Calculate a SUM across two tables

One of my customers called me in a panic last night.  They are a prop trader based in Chicago, and the bulk of their data is the trades they execute throughout the day. Well, Friday was a pretty crazy day in the market -- stocks rallied sharply after the government's second-quarter growth estimate. Unfortunately, their systems queued up badly, and they had to do a lot of manual adjustments post-trade.  Not uncommon, and also not very fun.

Anyway, because they struggled to handle the data-flow, they had to split their traffic into two target tables.  This was good intraday, because they got to trade all the way to the close.  At the end of the day, however, they needed one volume count -- not two.

That's all this is.  I've used their problem as an example.  It's just a quick means to calculate a SUM across two tables.


   -- Trader table, so we can translate the trdID
   CREATE TABLE dbo.Trader (
     trdID INT IDENTITY (1,1),
     trdName VARCHAR(25),
     trdDesk VARCHAR(25)
   )
   SET IDENTITY_INSERT dbo.Trader ON
   INSERT dbo.Trader (trdID,trdName,trdDesk)
   SELECT 1, 'John Doe','Equities'
   UNION
   SELECT 12, 'Edgar Smith','Forex'
   UNION
   SELECT 207, 'Jane Peters','Fixed Income'
   UNION
   SELECT 208, 'Jack Black','Futures'
   SET IDENTITY_INSERT dbo.Trader OFF

   -- Their 1st trade table
   CREATE TABLE #table1 (trdDate DATE,trdID INT,Volume BIGINT)
   INSERT #table1 (trdDate,trdID,Volume)
   SELECT '9/26/2014',12,18005
   UNION
   SELECT '9/26/2014',1,100
   UNION
   SELECT '9/26/2014',207,12500
   UNION
   SELECT '9/26/2014',208,750

   -- Their 2nd trade table
   CREATE TABLE #table2 (trdDate DATE,trdID INT,Volume BIGINT)
   INSERT #table2 (trdDate,trdID,Volume)
   SELECT '9/26/2014',12,100
   UNION
   SELECT '9/26/2014',1,5500
   UNION
   SELECT '9/26/2014',207,1000
   UNION
   SELECT '9/26/2014',208,0

-- See data from both tables
SELECT * FROM #table1
SELECT * FROM #table2

      -- results
   trdDate      trdID   Volume
   2014-09-26   1       100
   2014-09-26   12      18005
   2014-09-26   207     12500
   2014-09-26   208     750

   trdDate      trdID   Volume
   2014-09-26   1       5500
   2014-09-26   12      100
   2014-09-26   207     1000
   2014-09-26   208     0


   -- Now we'll pull it all back out in one result-set, with the volumes summed
   SELECT trdDate,Trader,SUM(Volume) [Vol Traded]
   FROM
   (
       SELECT t1.trdDate,t2.trdName [Trader],t1.Volume
       FROM #table1 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
       UNION ALL
       SELECT t1.trdDate,t2.trdName [Trader],t1.Volume
       FROM #table2 t1 INNER JOIN dbo.Trader t2
          ON t1.trdID = t2.trdID
   ) t
   GROUP BY trdDate,Trader
   ORDER BY trdDate,Trader


Our final result-set, with the volumes from both tables summed into a single 'Vol Traded' --

   trdDate         Trader          Vol Traded
   2014-09-26    Edgar Smith        18105
   2014-09-26    Jack Black           750
   2014-09-26    Jane Peters        13500
   2014-09-26    John Doe            5600
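
If they had also wanted the totals rolled up by desk rather than by trader, the same pattern works.  This is just a variation on the query above, grouping on trdDesk instead of trdName:

   SELECT trdDate,Desk,SUM(Volume) [Vol Traded]
   FROM
   (
       SELECT t1.trdDate,t2.trdDesk [Desk],t1.Volume
       FROM #table1 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
       UNION ALL
       SELECT t1.trdDate,t2.trdDesk [Desk],t1.Volume
       FROM #table2 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
   ) t
   GROUP BY trdDate,Desk
   ORDER BY trdDate,Desk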

Sunday, September 21, 2014

SQL Server Virtual Log Files

Each SQL Server database transaction log is composed of one or more physical files. Internal to each of these physical files are structures known as Virtual Log Files, or VLFs. Having too many or too few VLFs will impact the performance of your databases.

In this post I am just giving you a quick method to return the VLF count for each of your databases.  I will come back in the near future to help you determine whether you have too many or too few VLFs, and to provide you a method for correcting them.


/*
VLF count retrieval -
Each transaction log file is divided logically into smaller segments called virtual log files.
The number of VLFs in each database will grow based on the autogrowth settings for the log file,
and how often transactions write to disk.  Too many VLFs will slow your log backups, and can
even slow down database recovery.

Some number of VLFs is normal and expected in every database.  An excessive VLF count, however,
is an impediment, and should be cleaned up.

This query returns the VLF count per database.  */


USE master;

-- Variables
DECLARE 
@query VARCHAR(555), -- updated for those incredibly long database names
@dbname SYSNAME,
@filecount INT
  
-- Table variable 
  DECLARE @databases Table (dbname SYSNAME)  
  INSERT @databases  (DBNAME)
  SELECT name
  FROM sys.databases 
  WHERE state = 0  --  << only online databases
  
-- Table variable for results  
  DECLARE @Results Table  (DBNAME SYSNAME, VLFCount INT)
 
  DECLARE @MajorVersion INT
  SET @MajorVersion =
      LEFT(CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(max)),
           CHARINDEX('.',CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(max))) - 1)

-- Table variable to capture DBCC loginfo output 
  IF @MajorVersion < 11 -- earlier than SQL Server 2012 (no RecoveryUnitID column in DBCC LOGINFO output)
  BEGIN
      DECLARE @dbccloginfoA table  
      (  
        FileID tinyint,  
        FileSize bigint,  
        StartOffset bigint,  
        FSeqno int,  
        [Status] tinyint,  
        Parity tinyint,  
        CreateLSN numeric(25,0)  
      )  

     WHILE EXISTS(SELECT TOP 1 DBNAME FROM @databases)  
     BEGIN
        SET @dbname = (SELECT TOP 1 DBNAME FROM @databases)
        SET @query = 'DBCC LOGINFO (' + '''' + @dbname + ''') '  

        INSERT @dbccloginfoA (FileID,FileSize,StartOffset,FSeqno,Status,Parity,CreateLSN)
        EXEC (@query)  

        SET @filecount = @@rowcount  

        INSERT @Results (DBNAME,VLFCount)
        VALUES (@DBNAME, @filecount)

        DELETE @databases 
        WHERE dbname = @dbname  

      END
  END
  ELSE 
  BEGIN 
      DECLARE @dbccloginfoB TABLE
      (  
          RecoveryUnitID int,
          FileID tinyint,  
          FileSize bigint,  
          StartOffset bigint,  
          FSeqno int,  
          [Status] tinyint,  
          Parity tinyint,  
          CreateLSN numeric(25,0)  
      ) 

    WHILE EXISTS(SELECT TOP 1 DBNAME FROM @databases)
    BEGIN
        SET @dbname = (SELECT TOP 1 DBNAME FROM @databases)
        SET @query = 'DBCC LOGINFO (' + '''' + @dbname + ''') '  

        INSERT @dbccloginfoB (RecoveryUnitId,FileID,FileSize,StartOffset,FSeqno,Status,Parity,CreateLSN)
        EXEC (@query)

        SET @filecount = @@rowcount

        INSERT @Results
        VALUES (@dbname, @filecount)  

        DELETE @databases WHERE dbname = @dbname
    END
  END 
  
-- output results
SELECT
dbname [Database], 
VLFCount [VLF Count]
FROM 
@Results
ORDER BY
dbname
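
Since I mentioned cleanup -- as a preview of that future post, the usual way to bring an excessive VLF count back down is to back up the log, shrink the log file, and then grow it back out to its intended size in one or a few large increments.  This is only a minimal sketch, assuming a database named YourDBName with a log file named YourDBName_log, and sizes that fit your environment:

   -- Minimal sketch only.  Assumes YourDBName / YourDBName_log, and that a log backup
   -- has just been taken so the shrink can actually reclaim the inactive VLFs.
   USE YourDBName;
   DBCC SHRINKFILE (YourDBName_log, 1024);                 -- shrink the log as far as it will go
   ALTER DATABASE YourDBName
   MODIFY FILE (NAME = YourDBName_log, SIZE = 8192MB);     -- regrow in one large step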


Take a look at each of these links for much more information regarding VLFs:
     Transaction Log Physical Architecture
     High Virtual Log File (VLF) Count
     Transaction Log VLFs – too many or too few?

Friday, September 19, 2014

How to find the model name or serial number of a Windows server

Ok, ok...  Not necessarily tSQL, but definitely quite useful!  Today, for example, I was evaluating a customer's server, attempting to answer the question -- is the server adequate for the firm's enterprise reporting solution?  As-is, not even close.  But, I wanted to know the model of the machine, so that I could determine whether we could add more disk, and what our other resource limitations were.

Easy peasy.  Open a command prompt on your server and type the following to return your computer model:

         wmic csproduct get name

Or, type in this at the command prompt to return the serial number for the server:
       
         wmic bios get serialnumber

Why is this helpful to you, as a DBA?  Well, with that information, you can get online and look up the server specs.  In this case, it's an HP ProLiant SL390s G7, which DOES allow for additional drives, based on the model, be it 1U, 2U or 4U.  For example, if it is a 4U, you can have up to eight 2.5" hot plug hard drive bays.
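
If you'd rather stay inside SSMS, you can pull some of the same basics from within SQL Server itself.  A quick sketch -- column availability in the DMVs varies a bit between versions, so verify these on your build:

         -- CPU and scheduler counts straight from the DMVs
         SELECT cpu_count, hyperthread_ratio, scheduler_count
         FROM sys.dm_os_sys_info;

         -- Host and edition information
         SELECT SERVERPROPERTY('MachineName') [Machine], SERVERPROPERTY('Edition') [Edition];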



In my book, as a DBA you need to know your hardware layer just as much as you do the software layer.  This is a little tip that will help you do exactly that.

Monday, September 15, 2014

Using data to support the business, or to drive the decisions?

This is probably one of the best articles I have read in some time.  Written by Daniel Waisberg, published this month on www.thinkwithgoogle.com ~

No need to re-write it.  Please read it, see how Daniel suggests that we scope the project, answer a few very key questions, and build the framework.  Somewhat ironic is the comparison to architecture.  I actually began my studies years ago as an architect for homes, and he is right!  Turning abstract design into reality is very much like structuring the data model.  I think back to countless hours of reporting and data analysis within the trading industry.  It wasn't about the trade, or a specific data point.  It was about the pre and the post, or the trending and the patterns.  For example, the real-time PnL, or the daily, month-to-date and even yearly statistics.  Every time range provided a very different view of the trading activity.  I learned the market and the individual product definitions.  I then provided one awareness to the traders, and another to upper management.  That insight was used to predict gains (and losses), and drive the trade.

Anyway, I was very impressed with Daniel's article, and I wanted to share it here as well.  I have been on both sides of the fence, so to speak -- supporting business operations with the data, and driving the business decisions with it.  The former is overly complicated, very problematic, and not nearly as advantageous as it needs to be.  Electronic trading, or any other industry -- the latter is where we want to be.  I am certain.

Modeling our data with meaning.  Thinking with it, and gaining from it.  What else is there?

Friday, September 5, 2014

What are SQL Server Statistics?

Statistics store information about the distribution of data in your tables.  The optimizer uses this metadata to estimate the selectivity of a given statement, and create the optimal query plan.  To manage SQL Server efficiently, you really need a good understanding of the statistics.  In this post I will just provide some information that may help you to better understand, and manage, your statistics.

Statistics are created automatically on the index key columns when an index is created, and the optimizer also creates them automatically when AUTO_CREATE_STATISTICS is enabled.  Auto-created stats names include the column name and object ID in hexadecimal format, like this:     _WA_Sys_00000002_79FD19BE
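
If you want to see which of those auto-created stats exist in your database, sys.stats flags them for you:

      -- List the auto-created (_WA_Sys_...) statistics in the current database
      SELECT OBJECT_NAME(object_id) [Table], name [Statistic]
      FROM sys.stats
      WHERE auto_created = 1
      ORDER BY [Table];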


Statistics can also be created manually, via the CREATE STATISTICS statement... though I, personally, have never had the need.  The optimizer generally does quite fine on its own.
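
If you do ever need one, the syntax is simple enough.  A small example against AdventureWorks' Production.WorkOrder -- the statistic name and column choice here are just for illustration:

      -- Manually create a statistics object on a single column
      CREATE STATISTICS ST_WorkOrder_StartDate
      ON Production.WorkOrder (StartDate)
      WITH FULLSCAN;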

That's how they are created, but where are they stored?  Statistics are tracked in the legacy sysindexes view, and we can also find additional information within the sys.stats and sys.indexes metadata views.

By default, the statistics are updated automatically.  This occurs when roughly 20% of the rows in a table (plus 500) have changed.  It's important to know that this threshold can sometimes be a little high on the larger tables.  For this reason, it's not a bad idea to create a custom routine specific to the larger tables alone.  Or, even to just update the stats manually on those larger tables, on a periodic basis.
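
Updating the stats on a big table manually is a one-liner.  Again using AdventureWorks' Production.WorkOrder as a stand-in for your large table:

      -- Update every statistics object on one large table, scanning all rows
      UPDATE STATISTICS Production.WorkOrder WITH FULLSCAN;

      -- Or sample, if a full scan is too heavy for your maintenance window
      UPDATE STATISTICS Production.WorkOrder WITH SAMPLE 25 PERCENT;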

How do you know if your statistics are being updated?  Run this statement to see if AUTO UPDATE is enabled:

       EXEC sp_helpdb YourDBName

The status line in the result-set should include 'IsAutoUpdateStatistics'.  If it doesn't, your statistics are not being updated... ouch.  You'll want to correct that with this next statement, but remember -- if stats haven't been updated for a while, this 1st run can be pretty time-consuming, and invasive.  Be sure to do this after hours:

      USE YourDBName;
      -- Enable Auto Update of Statistics
      ALTER DATABASE YourDBName
      SET AUTO_UPDATE_STATISTICS ON;
      GO
      -- Update Statistics for whole database 
      EXEC sp_updatestats
      GO
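
To verify the change took, DATABASEPROPERTYEX gives you a clean yes/no:

      -- Returns 1 when auto update of statistics is enabled
      SELECT DATABASEPROPERTYEX('YourDBName','IsAutoUpdateStatistics') [AutoUpdateStats];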

So, how does SQL Server actually use the statistics?  The optimizer uses this metadata to estimate how many rows each operator in the query plan will operate on.  That estimation is used to calculate the cost of each operator, and identify the cheapest query plan for the statement you are running.  Outdated, inaccurate or missing statistics can dramatically impact this process.  If the optimizer cannot accurately estimate this cost, you are going to see substantial performance degradation.
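
If you want to see what the optimizer is actually working from, DBCC SHOW_STATISTICS returns the header, density vector and histogram for a given statistic.  A quick example against the AdventureWorks table used later in this post -- the statistic name shown here is an assumption, yours will differ:

      -- Header, density vector and histogram for one statistic
      DBCC SHOW_STATISTICS ('Production.WorkOrder', PK_WorkOrder_WorkOrderID);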

Performance not quite what it should be?  Well... when were your statistics last updated? Real easy to check:  

     SELECT 
          OBJECT_NAME(object_id) [Table],
          STATS_DATE(object_id, stats_id) [LastUpdated],
          name [Statistic]
     FROM 
         sys.stats
    WHERE 
         OBJECT_NAME(object_id) NOT LIKE 'sys%'
    ORDER BY 
         STATS_DATE(object_id, stats_id) DESC

    TIP:   Keep that one in your quick list. Somebody says 'why is it so slow?'.... start by 
             checking the stats.

Alright, so how do you read your stats?  I've listed several different methods below.  Run them on your system, take a look at your output.  This should help to better familiarize yourself with the statistics overall.  

     -- using sys.partitions and sys.indexes
     SELECT
          OBJECT_NAME(p.object_id) [Table],
          p.index_id [IndexID],
          i.name [Index],
          p.rows [#Rows]
     FROM
          sys.partitions p INNER JOIN sys.indexes i
             ON p.object_id = i.object_id
             AND p.index_id = i.index_id
     WHERE
          p.object_id = OBJECT_ID(N'Production.WorkOrder');

     -- using sys.dm_db_index_physical_stats
     SELECT
          OBJECT_NAME(ips.object_id) [Table],
          ips.index_id [IndexID],
          i.name [Index],
          ips.record_count [#Rows]
     FROM
          sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks2012'),
               OBJECT_ID(N'Production.WorkOrder'),NULL,NULL,'DETAILED') ips
          INNER JOIN sys.indexes i
             ON ips.object_id = i.object_id
             AND ips.index_id = i.index_id
     WHERE
          ips.index_level = 0;


     -- using sysindexes, sysobjects, sysusers, and information_schema.tables
     SELECT
          schemas.name [Schema],
          tbls.name [Table],
          i.name [Index],
          i.id [TableID],
          i.indid [IndexID],
          i.groupid [GroupID],
          i.rowmodctr [ModifiedRows],
          (SELECT MAX(rowcnt)
           FROM sysindexes i2
           WHERE i.id = i2.id
           AND i2.indid < 2) [RowCnt],
          CONVERT(DECIMAL(18,8),
               CONVERT(DECIMAL(18,8),i.rowmodctr) /
               CONVERT(DECIMAL(18,8),
                    (SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2))) [ModifiedPct],
          STATS_DATE(i.id, i.indid) [LastStatsUpdate],
          'False' [Processed]
     FROM
          sysindexes i INNER JOIN sysobjects tbls
             ON i.id = tbls.id INNER JOIN sysusers schemas
                ON tbls.uid = schemas.uid INNER JOIN information_schema.tables tl
                   ON tbls.name = tl.table_name
                   AND schemas.name = tl.table_schema
                   AND tl.table_type = 'BASE TABLE'
     WHERE
          0 < i.indid
          AND i.indid < 255
          AND table_schema <> 'sys'
          AND i.rowmodctr <> 0
          AND (SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2) > 0


     -- using sys.stats and sys.dm_db_stats_properties
     SELECT
          st.object_id [TableID],
          OBJECT_NAME(st.object_id) [Table],
          st.name [Statistic],
          STATS_DATE(st.object_id, st.stats_id) [LastUpdated],
          sp.modification_counter [RowsModified]
     FROM
          sys.stats st CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp
     WHERE
          STATS_DATE(st.object_id, st.stats_id) <= DATEADD(DAY,-1,GETDATE())
          AND sp.modification_counter > 0
          AND OBJECTPROPERTY(st.object_id, 'IsUserTable') = 1


That's it for now.  It's pretty simple -- if your statistics are out of date, or missing, the optimizer is not able to make optimal query plan choices.  This will impede query performance, in some cases very dramatically.  I have given you a lot of choices for viewing your statistics, and I have even shown an actual statistics update.  I encourage you to take a look at Ola Hallengren's database maintenance solution, which includes a good piece for maintaining SQL Server statistics.   See here:

       https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html
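
For reference, the statistics piece of his solution is driven through the IndexOptimize procedure.  The parameter names below are from memory of his scripts, so verify them against the current version before you schedule anything:

       -- Update only the statistics that have been modified, across all user databases,
       -- skipping the index maintenance portion entirely (fragmentation actions set to NULL)
       EXECUTE dbo.IndexOptimize
            @Databases = 'USER_DATABASES',
            @FragmentationLow = NULL,
            @FragmentationMedium = NULL,
            @FragmentationHigh = NULL,
            @UpdateStatistics = 'ALL',
            @OnlyModifiedStatistics = 'Y';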

Another good reference is at Brent Ozar's site, the sp_Blitz script.  Among many other things, this procedure will tell you where your auto stats updates are disabled.  I encourage you to take a look at this page, but -- don't stop with the statistics.  You can use sp_Blitz to perform a very fast, yet hugely informative health check on your systems. Check it out.

      http://www.brentozar.com/blitz/auto-update-stats-disabled/


Lastly, take a look here for more in depth information regarding the statistics:
   
      http://msdn.microsoft.com/en-us/library/ms190397.aspx