Saturday, September 27, 2014

Calculate a SUM across two tables

One of my customers called me in a panic last night.  They are a prop trader based in Chicago, and the bulk of their data is the trades they execute throughout the day. Well, Friday was a pretty crazy day in the market -- stocks rallied sharply after the government's second-quarter growth estimate. Unfortunately, their systems queued up badly, and they had to do a lot of manual adjustments post-trade.  Not uncommon, and also not very fun.

Anyway, because they struggled to handle the data-flow, they had to split their traffic into two target tables.  This was good intraday, because they got to trade all the way to the close.  At the end of the day, however, they needed one volume count -- not two.

That's all this is.  I've used their problem as an example.  It's just a quick means to calculate a SUM across two tables.


   -- Trader table, so we can translate the trdID
   CREATE TABLE dbo.Trader (
     trdID INT IDENTITY (1,1),
     trdName VARCHAR(25),
     trdDesk VARCHAR(25)
   )
   SET IDENTITY_INSERT dbo.Trader ON
   INSERT dbo.Trader (trdID,trdName,trdDesk)
   SELECT 1, 'John Doe','Equities'
   UNION
   SELECT 12, 'Edgar Smith','Forex'
   UNION
   SELECT 207, 'Jane Peters','Fixed Income'
   UNION
   SELECT 208, 'Jack Black','Futures'
   SET IDENTITY_INSERT dbo.Trader OFF

   -- Their 1st trade table
   CREATE TABLE #table1 (trdDate DATE,trdID INT,Volume BIGINT)
   INSERT #table1 (trdDate,trdID,Volume)
   SELECT '9/26/2014',12,18005
   UNION
   SELECT '9/26/2014',1,100
   UNION
   SELECT '9/26/2014',207,12500
   UNION
   SELECT '9/26/2014',208,750

   -- Their 2nd trade table
   CREATE TABLE #table2 (trdDate DATE,trdID INT,Volume BIGINT)
   INSERT #table2 (trdDate,trdID,Volume)
   SELECT '9/26/2014',12,100
   UNION
   SELECT '9/26/2014',1,5500
   UNION
   SELECT '9/26/2014',207,1000
   UNION
   SELECT '9/26/2014',208,0

-- See data from both tables
SELECT * FROM #table1
SELECT * FROM #table2

      -- results
   trdDate      trdID   Volume
   2014-09-26   1       100
   2014-09-26   12      18005
   2014-09-26   207     12500
   2014-09-26   208     750

   trdDate      trdID   Volume
   2014-09-26   1       5500
   2014-09-26   12      100
   2014-09-26   207     1000
   2014-09-26   208     0


   -- Now we'll pull it all back out in one result-set, with the volumes summed
   SELECT trdDate,Trader,SUM(Volume) [Vol Traded]
   FROM
   (
       SELECT t1.trdDate,t2.trdName [Trader],t1.Volume
       FROM #table1 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
       UNION ALL
       SELECT t1.trdDate,t2.trdName [Trader],t1.Volume
       FROM #table2 t1 INNER JOIN dbo.Trader t2
          ON t1.trdID = t2.trdID
   ) t
   GROUP BY trdDate,Trader
   ORDER BY trdDate,Trader


Our final result-set, with the volumes from both tables summed into a single 'Vol Traded' --

   trdDate         Trader          Vol Traded
   2014-09-26    Edgar Smith        18105
   2014-09-26    Jack Black           750
   2014-09-26    Jane Peters        13500
   2014-09-26    John Doe            5600
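
If they had also wanted the totals rolled up by desk rather than by trader, the same pattern works.  This is just a variation on the query above, grouping on trdDesk instead of trdName:

   SELECT trdDate,Desk,SUM(Volume) [Vol Traded]
   FROM
   (
       SELECT t1.trdDate,t2.trdDesk [Desk],t1.Volume
       FROM #table1 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
       UNION ALL
       SELECT t1.trdDate,t2.trdDesk [Desk],t1.Volume
       FROM #table2 t1 INNER JOIN dbo.Trader t2
         ON t1.trdID = t2.trdID
   ) t
   GROUP BY trdDate,Desk
   ORDER BY trdDate,Desk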

Sunday, September 21, 2014

SQL Server Virtual Log Files

Each SQL Server database transaction log is composed of one or more physical files. Internal to each of these physical files are structures known as Virtual Log Files, or VLFs. Having too many or too few VLFs will impact the performance of your databases.

In this post I am just giving you a quick method to return the VLF count for each of your databases.  I will come back in the near future to help you determine whether you have too many or too few VLFs, and to provide you a method for correcting them.


/*
VLF count retrieval -
Each transaction log file is divided logically into smaller segments called virtual log files.
The number of VLFs in each database will grow based on the autogrowth settings for the log file,
and how often transactions write to disk.  Too many VLFs will slow your log backups, and can
even slow down database recovery.

Some number of VLFs is normal and expected in every database.  An excessive VLF count, however,
is an impediment, and should be cleaned up.

This query returns the VLF count per database.  */


USE master;

-- Variables
DECLARE 
@query VARCHAR(555), -- updated for those incredibly long database names
@dbname SYSNAME,
@filecount INT
  
-- Table variable 
  DECLARE @databases Table (dbname SYSNAME)  
  INSERT @databases  (DBNAME)
  SELECT name
  FROM sys.databases 
  WHERE state = 0  --  << only online databases
  
-- Table variable for results  
  DECLARE @Results Table  (DBNAME SYSNAME, VLFCount INT)
 
  DECLARE @MajorVersion INT
  SET @MajorVersion =
      LEFT(CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(max)),
           CHARINDEX('.',CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(max))) - 1)

-- Table variable to capture DBCC loginfo output 
  IF @MajorVersion < 11 -- earlier than SQL Server 2012 (no RecoveryUnitID column in DBCC LOGINFO output)
  BEGIN
      DECLARE @dbccloginfoA table  
      (  
        FileID tinyint,  
        FileSize bigint,  
        StartOffset bigint,  
        FSeqno int,  
        [Status] tinyint,  
        Parity tinyint,  
        CreateLSN numeric(25,0)  
      )  

     WHILE EXISTS(SELECT TOP 1 DBNAME FROM @databases)  
     BEGIN
        SET @dbname = (SELECT TOP 1 DBNAME FROM @databases)
        SET @query = 'DBCC LOGINFO (' + '''' + @dbname + ''') '  

        INSERT @dbccloginfoA (FileID,FileSize,StartOffset,FSeqno,Status,Parity,CreateLSN)
        EXEC (@query)  

        SET @filecount = @@rowcount  

        INSERT @Results (DBNAME,VLFCount)
        VALUES (@DBNAME, @filecount)

        DELETE @databases 
        WHERE dbname = @dbname  

      END
  END
  ELSE 
  BEGIN 
      DECLARE @dbccloginfoB TABLE
      (  
          RecoveryUnitID int,
          FileID tinyint,  
          FileSize bigint,  
          StartOffset bigint,  
          FSeqno int,  
          [Status] tinyint,  
          Parity tinyint,  
          CreateLSN numeric(25,0)  
      ) 

    WHILE EXISTS(SELECT TOP 1 DBNAME FROM @databases)
    BEGIN
        SET @dbname = (SELECT TOP 1 DBNAME FROM @databases)
        SET @query = 'DBCC LOGINFO (' + '''' + @dbname + ''') '  

        INSERT @dbccloginfoB (RecoveryUnitId,FileID,FileSize,StartOffset,FSeqno,Status,Parity,CreateLSN)
        EXEC (@query)

        SET @filecount = @@rowcount

        INSERT @Results
        VALUES (@dbname, @filecount)  

        DELETE @databases WHERE dbname = @dbname
    END
  END 
  
-- output results
SELECT
dbname [Database], 
VLFCount [VLF Count]
FROM 
@Results
ORDER BY
dbname
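
Since I mentioned cleanup -- as a preview of that future post, the usual way to bring an excessive VLF count back down is to back up the log, shrink the log file, and then grow it back out to its intended size in one or a few large increments.  This is only a minimal sketch, assuming a database named YourDBName with a log file named YourDBName_log, and sizes that fit your environment:

   -- Minimal sketch only.  Assumes YourDBName / YourDBName_log, and that a log backup
   -- has just been taken so the shrink can actually reclaim the inactive VLFs.
   USE YourDBName;
   DBCC SHRINKFILE (YourDBName_log, 1024);                 -- shrink the log as far as it will go
   ALTER DATABASE YourDBName
   MODIFY FILE (NAME = YourDBName_log, SIZE = 8192MB);     -- regrow in one large step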


Take a look at each of these links for much more information regarding VLFs:
     Transaction Log Physical Architecture
     High Virtual Log File (VLF) Count
     Transaction Log VLFs – too many or too few?

Friday, September 19, 2014

How to find the model name or serial number of a Windows server

Ok, ok...  Not necessarily tSQL, but definitely quite useful!  Today, for example, I was evaluating a customer's server, attempting to answer the question -- is the server adequate for the firm's enterprise reporting solution?  As-is, not even close.  But, I wanted to know the model of the machine, so that I could determine whether we could add more disk, and what our other resource limitations were.

Easy peasy.  Open a command prompt on your server and type the following to return your computer model:

         wmic csproduct get name

Or, type in this at the command prompt to return the serial number for the server:
       
         wmic bios get serialnumber

Why is this helpful to you, as a DBA?  Well, with that information, you can get online and look up the server specs.  In this case, it's an HP ProLiant SL390s G7, which DOES allow for additional drives, based on the model, be it 1U, 2U or 4U.  For example, if it is a 4U, you can have up to eight 2.5" hot plug hard drive bays.
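
If you'd rather stay inside SSMS, you can pull some of the same basics from within SQL Server itself.  A quick sketch -- column availability in the DMVs varies a bit between versions, so verify these on your build:

         -- CPU and scheduler counts straight from the DMVs
         SELECT cpu_count, hyperthread_ratio, scheduler_count
         FROM sys.dm_os_sys_info;

         -- Host and edition information
         SELECT SERVERPROPERTY('MachineName') [Machine], SERVERPROPERTY('Edition') [Edition];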



In my book, as a DBA you need to know your hardware layer just as much as you do the software layer.  This is a little tip that will help you do exactly that.

Monday, September 15, 2014

Using data to support the business, or to drive the decisions?

This is probably one of the best articles I have read in some time.  Written by Daniel Waisberg, published this month on www.thinkwithgoogle.com ~

No need to re-write it.  Please read it, see how Daniel suggests that we scope the project, answer a few very key questions, and build the framework.  Somewhat ironic is the comparison to architecture.  I actually began my studies years ago as an architect for homes, and he is right!  Turning abstract design into reality is very much like structuring the data model.  I think back to countless hours of reporting and data analysis within the trading industry.  It wasn't about the trade, or a specific data point.  It was about the pre and the post, or the trending and the patterns.  For example, the real-time PnL, or the daily, month-to-date and even yearly statistics.  Every time range provided a very different view of the trading activity.  I learned the market and the individual product definitions.  I then provided one awareness to the traders, and another to upper management.  That insight was used to predict gains (and losses), and drive the trade.

Anyway, I was very impressed with Daniel's article, and I wanted to share it here as well.  I have been on both sides of the fence, so to speak -- supporting business operations with the data, and driving the business decisions with it.  The former is overly complicated, very problematic, and not nearly as advantageous as it needs to be.  Electronic trading, or any other industry -- the latter is where we want to be.  I am certain.

Modeling our data with meaning.  Thinking with it, and gaining from it.  What else is there?

Friday, September 5, 2014

What are SQL Server Statistics?

Statistics store information about the distribution of data in your tables.  The optimizer uses this metadata to estimate the selectivity of a given statement, and create the optimal query plan.  To manage SQL Server efficiently, you really need a good understanding of the statistics.  In this post I will just provide some information that may help you to better understand, and manage, your statistics.

Statistics are created automatically on the index key columns when an index is created, and the optimizer also creates them automatically when AUTO_CREATE_STATISTICS is enabled.  Auto-created stats names include the column name and object ID in hexadecimal format, like this:     _WA_Sys_00000002_79FD19BE
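
If you want to see which of those auto-created stats exist in your database, sys.stats flags them for you:

      -- List the auto-created (_WA_Sys_...) statistics in the current database
      SELECT OBJECT_NAME(object_id) [Table], name [Statistic]
      FROM sys.stats
      WHERE auto_created = 1
      ORDER BY [Table];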


Statistics can also be created manually, via the CREATE STATISTICS statement... though I, personally, have never had the need.  The optimizer generally does quite fine on its own.
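
If you do ever need one, the syntax is simple enough.  A small example against AdventureWorks' Production.WorkOrder -- the statistic name and column choice here are just for illustration:

      -- Manually create a statistics object on a single column
      CREATE STATISTICS ST_WorkOrder_StartDate
      ON Production.WorkOrder (StartDate)
      WITH FULLSCAN;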

That's how they are created, but where are they stored?  Statistics are tracked in the legacy sysindexes view, and we can also find additional information within the sys.stats and sys.indexes metadata views.

By default, the statistics are updated automatically.  This occurs when roughly 20% of the rows in a table (plus 500) have changed.  It's important to know that this threshold can sometimes be a little high on the larger tables.  For this reason, it's not a bad idea to create a custom routine specific to the larger tables alone.  Or, even to just update the stats manually on those larger tables, on a periodic basis.
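
Updating the stats on a big table manually is a one-liner.  Again using AdventureWorks' Production.WorkOrder as a stand-in for your large table:

      -- Update every statistics object on one large table, scanning all rows
      UPDATE STATISTICS Production.WorkOrder WITH FULLSCAN;

      -- Or sample, if a full scan is too heavy for your maintenance window
      UPDATE STATISTICS Production.WorkOrder WITH SAMPLE 25 PERCENT;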

How do you know if your statistics are being updated?  Run this statement to see if AUTO UPDATE is enabled:

       EXEC sp_helpdb YourDBName

The status line in the result-set should include 'IsAutoUpdateStatistics'.  If it doesn't, your statistics are not being updated... ouch.  You'll want to correct that with this next statement, but remember -- if stats haven't been updated for a while, this 1st run can be pretty time-consuming, and invasive.  Be sure to do this after hours:

      USE YourDBName;
      -- Enable Auto Update of Statistics
      ALTER DATABASE YourDBName
      SET AUTO_UPDATE_STATISTICS ON;
      GO
      -- Update Statistics for whole database 
      EXEC sp_updatestats
      GO
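
To verify the change took, DATABASEPROPERTYEX gives you a clean yes/no:

      -- Returns 1 when auto update of statistics is enabled
      SELECT DATABASEPROPERTYEX('YourDBName','IsAutoUpdateStatistics') [AutoUpdateStats];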

So, how does SQL Server actually use the statistics?  The optimizer uses this metadata to estimate how many rows each operator in the query plan will operate on.  That estimation is used to calculate the cost of each operator, and identify the cheapest query plan for the statement you are running.  Outdated, inaccurate or missing statistics can dramatically impact this process.  If the optimizer cannot accurately estimate this cost, you are going to see substantial performance degradation.
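
If you want to see what the optimizer is actually working from, DBCC SHOW_STATISTICS returns the header, density vector and histogram for a given statistic.  A quick example against the AdventureWorks table used later in this post -- the statistic name shown here is an assumption, yours will differ:

      -- Header, density vector and histogram for one statistic
      DBCC SHOW_STATISTICS ('Production.WorkOrder', PK_WorkOrder_WorkOrderID);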

Performance not quite what it should be?  Well... when were your statistics last updated? Real easy to check:  

     SELECT 
          OBJECT_NAME(object_id) [Table],
          STATS_DATE(object_id, stats_id) [LastUpdated],
          name [Statistic]
     FROM 
         sys.stats
    WHERE 
         OBJECT_NAME(object_id) NOT LIKE 'sys%'
    ORDER BY 
         STATS_DATE(object_id, stats_id) DESC

    TIP:   Keep that one in your quick list. Somebody says 'why is it so slow?'.... start by 
             checking the stats.

Alright, so how do you read your stats?  I've listed several different methods below.  Run them on your system, take a look at your output.  This should help to better familiarize yourself with the statistics overall.  

     -- using sys.partitions and sys.indexes
     SELECT
          OBJECT_NAME(p.object_id) [Table],
          p.index_id [IndexID],
          i.name [Index],
          p.rows [#Rows]
     FROM
          sys.partitions p INNER JOIN sys.indexes i
             ON p.object_id = i.object_id
             AND p.index_id = i.index_id
     WHERE
          p.object_id = OBJECT_ID(N'Production.WorkOrder');

     -- using sys.dm_db_index_physical_stats
     SELECT
          OBJECT_NAME(ips.object_id) [Table],
          ips.index_id [IndexID],
          i.name [Index],
          ips.record_count [#Rows]
     FROM
          sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks2012'),
               OBJECT_ID(N'Production.WorkOrder'),NULL,NULL,'DETAILED') ips
          INNER JOIN sys.indexes i
             ON ips.object_id = i.object_id
             AND ips.index_id = i.index_id
     WHERE
          ips.index_level = 0;


     -- using sysindexes, sysobjects, sysusers, and information_schema.tables
     SELECT
          schemas.name [Schema],
          tbls.name [Table],
          i.name [Index],
          i.id [TableID],
          i.indid [IndexID],
          i.groupid [GroupID],
          i.rowmodctr [ModifiedRows],
          (SELECT MAX(rowcnt)
           FROM sysindexes i2
           WHERE i.id = i2.id
           AND i2.indid < 2) [RowCnt],
          CONVERT(DECIMAL(18,8),
               CONVERT(DECIMAL(18,8),i.rowmodctr) /
               CONVERT(DECIMAL(18,8),
                    (SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2))) [ModifiedPct],
          STATS_DATE(i.id, i.indid) [LastStatsUpdate],
          'False' [Processed]
     FROM
          sysindexes i INNER JOIN sysobjects tbls
             ON i.id = tbls.id INNER JOIN sysusers schemas
                ON tbls.uid = schemas.uid INNER JOIN information_schema.tables tl
                   ON tbls.name = tl.table_name
                   AND schemas.name = tl.table_schema
                   AND tl.table_type = 'BASE TABLE'
     WHERE
          0 < i.indid
          AND i.indid < 255
          AND table_schema <> 'sys'
          AND i.rowmodctr <> 0
          AND (SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2) > 0


     -- using sys.stats and sys.dm_db_stats_properties
     SELECT
          st.object_id [TableID],
          OBJECT_NAME(st.object_id) [Table],
          st.name [Statistic],
          STATS_DATE(st.object_id, st.stats_id) [LastUpdated],
          sp.modification_counter [RowsModified]
     FROM
          sys.stats st CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp
     WHERE
          STATS_DATE(st.object_id, st.stats_id) <= DATEADD(DAY,-1,GETDATE())
          AND sp.modification_counter > 0
          AND OBJECTPROPERTY(st.object_id, 'IsUserTable') = 1


That's it for now.  It's pretty simple -- if your statistics are out of date, or missing, the optimizer is not able to make optimal query plan choices.  This will impede query performance, in some cases very dramatically.  I have given you a lot of choices for viewing your statistics, and I have even shown an actual statistics update.  I encourage you to take a look at Ola Hallengren's database maintenance solution, which includes a good piece for maintaining SQL Server statistics.   See here:

       https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html
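
For reference, the statistics piece of his solution is driven through the IndexOptimize procedure.  The parameter names below are from memory of his scripts, so verify them against the current version before you schedule anything:

       -- Update only the statistics that have been modified, across all user databases,
       -- skipping the index maintenance portion entirely (fragmentation actions set to NULL)
       EXECUTE dbo.IndexOptimize
            @Databases = 'USER_DATABASES',
            @FragmentationLow = NULL,
            @FragmentationMedium = NULL,
            @FragmentationHigh = NULL,
            @UpdateStatistics = 'ALL',
            @OnlyModifiedStatistics = 'Y';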

Another good reference is at Brent Ozar's site, the sp_Blitz script.  Among many other things, this procedure will tell you where your auto stats updates are disabled.  I encourage you to take a look at this page, but -- don't stop with the statistics.  You can use sp_Blitz to perform a very fast, yet hugely informative health check on your systems. Check it out.

      http://www.brentozar.com/blitz/auto-update-stats-disabled/


Lastly, take a look here for more in depth information regarding the statistics:
   
      http://msdn.microsoft.com/en-us/library/ms190397.aspx