Saturday, April 30, 2011

Disable an Index

The option to disable an index became available in v2005.  There are multiple reasons for disabling an index, but I only do it for a large bulk upload, or to determine whether the index is necessary, and even useful at all.  I want to show you quickly how to disable and re-enable an index, but there are some key points to be aware of:

  • Disabling a clustered index on a view will DELETE the data.  ouch!
  • Disable a clustered index, the data is still present, but it cannot be used.  This is because the leaf level of the clustered index is the actual table data itself. 
  • Disable a non-clustered index, the index actually remains on disk, and there is no physical disk change at all.  
  • A primary key cannot be disabled if the associated table is published within transactional replication.  (Makes sense.)
  • To ENABLE the index, it has to be rebuilt.  This is because when we DISABLE an index, the definition remains in the meta data, but the index updates are not performed.  It has to be rebuilt, and brought current, before it can be enabled.

Indexes can be disabled and enabled within the SSMS gui, but this is the tSQL method:

  DISABLE Index:
ALTER INDEX IndexName ON dbo.TableName DISABLE;

 ENABLE Index:
   ALTER INDEX IndexName ON dbo.TableName REBUILD;
  (IndexKeyName ASC)

This is just a quick reference, but I encourage you to also read these guidelines:
Guidelines for Disabling Indexes and Constraints

Wednesday, April 27, 2011

Find LOWERCASE values

Most of us use default SQL Server collation, which is SQL_Latin1_General_CP1_CI_AS, and is case insensitive.  That means that 'aa' is treated the same as 'AA'.  Unfortunately, even with this collation, we often wind up with situations where we need to actually differentiate between the two values.  Check this out:

USE MyDatabase;

CREATE TABLE dbo.ShortLocates (
  LocateDate DATE,Symbol VARCHAR(25),Shares INT
INSERT dbo.ShortLocates (LocateDate,Symbol,Shares)
SELECT '04/27/2010','aa',50000
SELECT '04/27/2011','AA',50000

SELECT * FROM dbo.ShortLocates
WHERE Symbol = 'aa'

    LocateDate     Symbol   Shares

   2010-04-27         aa     50000
   2011-04-27        AA     50000

Try this to differentiate between lower and upper case:

SELECT LocateDate,Symbol,Shares
FROM dbo.ShortLocates

   LocateDate     Symbol   Shares

   2010-04-27         aa     50000

Friday, April 22, 2011

Concatenate values with FOR XML PATH

In this example I will show you how to use the FOR XML PATH clause to perform row concatenation, in v2005 and v2008.  Let's first create a 'Media' table, and insert a few records:

CREATE TABLE dbo.Media (
  media_id INT PRIMARY KEY,
  media_type VARCHAR(35)

INSERT dbo.Media VALUES (1,'Catalog');
INSERT dbo.Media VALUES (2,'Book');
INSERT dbo.Media VALUES (3,'Magazine');
INSERT dbo.Media VALUES (4,'DVD');
INSERT dbo.Media VALUES (5,'Poster');

Of course, our data is in the table in this format:

  media_id       media_type
   1                   Catalog
   2                   Book
   3                   Magazine
   4                   DVD
   5                   Poster

We want to return all media type values back in a single line, so we just select media_type from the table, and order the result set by media_type, using FOR XML PATH.  We don't want to actually generate the XML, so we place an empty element in the PATH, and because we've added the comma separator, it forces SQL to concatenate the XML as a plain string:

   SELECT ', ' + media_type
   FROM dbo.Media
   ORDER BY media_type
   FOR XML PATH('');

Results:             , Book, Catalog, DVD, Magazine, Poster

Now we just need to use the STUFF function to get rid of that first comma, like this:

    SELECT ', ' + media_type
    FROM dbo.Media
    ORDER BY media_type
    FOR XML PATH('')),1, 1, '');

Results:               Book, Catalog, DVD, Magazine, Poster

Wednesday, April 20, 2011

SSIS File System Copy Task, Sample

One of my contracts uses an online backup solution to handle backups. Let's call it 'BackupSoftware'. It freezes I/O on the database files, and it has no way of distinguishing between old and new backup files. Basically, it attempts to collect ALL files every time it runs, and then fails due to storage limitations.  

I decided to alter my database maintenance routine to work around this problem.  The SQL backups were performed in the last step of the existing routine, and I decided I'd need two more steps to ensure the backup client only targets one day of files, at any time:

  New steps to maintenance job:
    1) Delete any/all files in NewDirectory
    2) Copy the most current .bak/.trn files into this directory

I wanted to keep things simple, and just add another step to my current SQL Server Agent maintenance job, so I decided to use SQL Server Integration Services.

First, I developed this statement to find the most current .bak files, and any associated .trn files:  Most recent backup files

It works great to find them, but how do we then COPY those files into the NewDirectory?  I'm a big fan of keeping it in SQL, so I chose the SSIS File System Copy task, which is written to manipulate files and directories in the file system.   See here, File System Task.

Let me provide a little detail on the package, and then I'll let you take it from there.  You can download the package here, and test it out yourself: CopyFiles.dtsx

These are the components in the control flow:
  One File System Delete directory content task, 'NewDirectory Purge'
  One SQL Task, 'Identify Current Files'
  One File System Copy task, 'Copy Files to NewDirectory', which resides in a Foreach Loop Container
The package has three variables:
  target_directory, string, C:\MSSQL\Backup\NewDirectory\
  physical_device_name, object, System.Object
  current_file, string, ''  (no kidding, that is what it is)

The Most recent backup files statement is the SQLStatement in my SQL Task.
The SQL Task editor has no parameter mapping (because there are no input parms).
The SQL Task Editor result set, Result Name = 0, Variable Name = User::physical_device_name.

The Foreach Loop enumerator is Foreach ADO Enumerator, the configuration has User::physical_device_name as the ADO object source variable, and 'Rows in the first table' as the enumeration mode.  The User::current_file is defined within the Foreach Loop variable mappings as variable, Index = 0.

The File System Task in the Foreach Loop is Operation = Copy File.
IsDestinationPathVariable = True, the DestinationVariable = User::target_directory.
IsSourcePathVariable = True, the SourceVariable = User::current_file.
Initally my File System Task errored with this:
    'Variable "current_file" is used as a source or destination and is empty'

This was because my current_file variable had no value, and didn't know how to get started.  I assumed it was going to get its values from the Foreach Loop, and didn't think anything else was necessary.  When I changed it to '', the validator was happy enough to start going through the loop.  Didn't matter what the value was, because once it started, it got the real filenames, and kept moving through the loop. 
I hope my details are in the right order, and make sense.  There are good annotations in the package, so it may be best just to open it up and take a look.  I think this is a very simple, yet effective way to manipulate the file system, through SSIS (SQL Server Integration Services).  

Please check it out, and let me know if you have any questions or problems.  Please also let me know if I can provide more assistance on this, or any of your other SQL Server projects.

Tuesday, April 19, 2011

Execute sp_spaceused on ALL tables

Per BOL, sp_spaceused displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.  Very handy, but unfortunately, it has to be executed selectively, one table at a time.  It can be run without the tablename (@objname), but it will return different, general details for the entire database.  This is just a handy method to run it for ALL tables in the database:

   tablename    sysname,
   [rows]       int,
   reserved_S   varchar(32),
   data_S       varchar(32),
   index_size_S varchar(32),
   unused_S     varchar(32),
   reserved_KB  as cast(left(reserved_s,charindex(' ',reserved_s)-1) as int),
   data_KB as cast(left(data_S,charindex(' ',data_S)-1) as int),
   index_size_KB as cast(left(index_size_S,charindex('',index_size_S)-1) as int),
   unused_KB as cast(left(unused_S,charindex(' ',unused_S)-1) as int)

  INSERT #Space(tablename,[rows],reserved_s,data_s,index_size_s,unused_s)
  EXEC sp_msforeachtable 'exec sp_spaceused [?]'

  SELECT tablename,[rows],reserved_KB,data_KB,index_size_KB,unused_KB
  FROM #space
  ORDER BY reserved_KB DESC

  DROP TABLE #space

These are the first few results when run against AdventureWorks:

   tablename              rows    reserved_KB   data_KB     index_size_KB     unused_KB 
   Individual                            18484      80984    29608     50616      760 
   SalesOrderDetail                 121317   19528     12216       6560      752 
   TransactionHistory               113443   12336       7840       3904       592 
   SalesOrderHeader               31465     10280       6888      2992        400 
   TransactionHistoryArchive    89253       9856       6176      3136        544 
   WorkOrderRouting               67131       8464       6816      1248        400  

Locate the most current database backup files

I can't count the times I've needed to go to find my most current .bak file, or even .trn file.  Maybe it was a regularly scheduled restore of the most recent database backup, or I just needed to copy database .bak files from one location to another.  (File System Copy Task)  As you know, you don't always have the timestamp in the database and transaction log backup file names.  This is handy little piece to quickly identify the most recent backup files:

USE msdb;

   database_name varchar(35),latest_date datetime )
INSERT @Latest (database_name,latest_date)
SELECT database_name,MAX(backup_finish_date) [latest] 
FROM dbo.backupset
GROUP BY database_name
ORDER BY database_name

    dbo.backupset b JOIN dbo.backupmediafamily m 
    ON b.media_set_id = m.media_set_id JOIN @Latest l
      ON b.database_name = l.database_name
      AND b.backup_finish_date = l.latest_date

NOTE:  See that reference in the JOIN to b.backup_finish_date = l.latest_date.  As is, that will only return the files associated with the most current backupset.backup_finish_date.  That won't be adequate if you have database AND transaction log backups.  In that case, you'd only return the most current .trn file.  I replaced it with this, in order to retrieve the most current .bak AND associated .trn files, for all databases, per instance:

   AND CONVERT(VARCHAR(10),b.backup_finish_date,121) =     

Tuesday, April 12, 2011

Tablediff.exe (Compare Two Datasources)

Need to compare two tables and determine what is missing from table #2?  Or, maybe you’ve got two tables on two different servers, and they need to be in synch?  The Tablediff utility was introduced with SQL v2005, and it can be used to reconcile data between two tables, or even views.  Tablediff.exe can be used to report variances in data AND schema.  Per BOL, it is used to compare the data in two tables for non-convergence, and is particularly useful for troubleshooting non-convergence in a replication topology.

It can be used to perform several different comparison type tasks.  You can look at the data row by row, or you can just compare the row counts.  You can perform just the schema comparison, or the data AND schema.  In my opinion, the best feature of tablediff is that you can generate the script to actually reconcile the differences between the datasources.

I’ve read that it was actually intended for replication, but it can be easily applied to any situation where you’re trying to compare data, schema, or both.  I can’t even begin to count the hours I have spent on ‘recon’ logic within the Chicago HFT (high frequency trading)  arena…  L.

Tablediff.exe can be found at \Program Files\Microsoft SQL Server\90\COM\TableDiff.exe, where SQL Server is installed.  This example will compare two tables in two different databases on two different servers, and it will generate the file that can ultimately be used to reconcile the differences:

"C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe" -sourceserver sourceservername
                                                                                        -sourcedatabase sourcedatabase
                                                                                        -sourcetable sourcetable
                                                                                        -destinationserver destinationservername 
                                                                                        -destinationdatabase destinationdatabase
                                                                                        -destinationtable destinationtable
                                                                                        -et ChangeScript
                                                                                        -f C:\Temp\ChangeScript.sql

Please take a look at BOL for more information:

Forced Parameterization

Forced Parameterization is a fabulous tool for those DBAs fighting all of the ad-hoc coming from the middle tier.  Those developers continually insisting that it wasn’t them…  ;-).

Forced parameterization is one of the best on the list of SQL Server improvements, in my opinion.  It became available in v2005, and it allows SQL to take advantage of query plan reuse, and reduce the overhead required by the compilation of repeat query executions.  For cases with high query compilations, this option can help response time and performance within your applications.  It is very useful in systems with a lot of ad-hoc (non-procedural executions), and it can reduce the memory required by the procedure cache.

I, personally, have seen improvements greater than 28%, simply due to this option.  It allows queries to be parameterized, and cached for reuse.  It practically proceduralizes them!  This reduces the overhead typically used by these queries, by replacing the literal values with parameters. 


Forced Parameterization can be enabled with the above statement, and also manually, within SSMS\Database Properties\Miscellaneous – Parameterization.  It is typically enabled at the database level, but it can also be enabled it within individual queries, rather than the entire database.  You would do this by setting a plan guide for the query you are targeting. 

I won’t do it justice, so please take a look at Forced Parameterization in Books Online, and let me know what you think: