Thursday, May 5, 2011

CURSOR through tables & columns

I recently started a new contract, and inherited a new set of databases.  The firm never really had a DBA in place, so I needed to evaluate the overall object (and data) design.  Many, many moons ago, I developed a procedure for this type of assessment.  I analyze the table definition, and the data within each table, in order to confirm whether the table design actually suits the data being stored.

First, we will look at the min, max and average data lengths of the existing data, in comparison to the actual column datalength definition.  See here:

/*  GETS MIN,MAX,AVG DATALENGTHS OF ALL COLUMNS IN A GIVEN TABLE.  */
DECLARE @table SYSNAME
SET @table = 'customer'
SET NOCOUNT ON;

DECLARE @i INT
DECLARE @cnt INT
DECLARE @str VARCHAR(8000)

DECLARE @column_name VARCHAR(50)
SET @i=0
SELECT IDENTITY(SMALLINT,1,1) AS col_num, @table TableName, column_name,null [MIN],NULL [MAX], NULL [AVG],data_type,character_maximum_length into #t 
FROM information_schema.columns 
WHERE table_name=@table 

SELECT @cnt=count(column_name) FROM #t
WHILE @i< @cnt+1
BEGIN
 SELECT @column_name=column_name FROM #t WHERE col_num=@i
 SET @str = 'UPDATE #t  
 SET [MIN]=a.[MIN],
  [MAX]=a.[MAX],
  [AVG]=a.[AVG] 
 FROM 
(SELECT min(len(isnull(' + @column_name+ ',0))) [MIN] , 
MAX(len(isnull(' + @column_name+ ',0))) [MAX],AVG(len(isnull(' + @column_name+ ',0)))  [AVG] from ' + @table + ' ) a  , #t b  where b.col_num='+ cast(@i as varchar)

 EXEC (@str)
 SET @i=@i+1
END

SELECT * FROM #t

SET NOCOUNT OFF;

DROP TABLE #t

In the above, I am reviewing the 'customer' table.  I have returned only a few of the attributes, which you will see were created all of the same data type and length (yuck):

col_num   tablename     column_name    min     max     avg     datatype  character_maximum_length
1 customer   MerchantID             5        5       5 varchar 255
2 customer              MerchStoreID         5      11       6 varchar 255
3 customer              CustID                     7       8        7 varchar 255
4 customer              CustSSN                 6       9        8 varchar 255
5 customer              CustName              8      38      14 varchar 255

As you will see, the min, max and avg lengths are returned, and you can see your VARCHAR(255) datalengths are just a bit extreme.  Even more, take a look at that MerchantID --- all values are a datalength of 5.  Wouldn't this be better typed as CHAR(5)?  The overhead of the variable datatype is just not worth it here, where all values are of the same datalength.

Basically, I use the above method to assess the current table definition, and the data within.  Another step that is necessary is to check for NULL or blank values.  I often find attributes where all values are NULL.  In that case, why keep the column?  If you don't use it, get rid of it.  Unused attributes are not free -- the physical data storage as well as the data manipulation around the NULLs.  I use this method to cursor through each table column, returning counts on any records that are found to be NULL:

SET NOCOUNT ON;


USE MyDatabase;
DECLARE @tbl VARCHAR(35),@col VARCHAR(1000),@sql NVARCHAR(4000)


DECLARE Cur1 CURSOR FAST_FORWARD FOR
SELECT o.name tbl, c.name col
FROM sys.sysobjects o INNER JOIN sys.syscolumns c ON o.id = c.id
WHERE o.xtype='U'
ORDER BY o.name


OPEN Cur1
FETCH NEXT FROM Cur1 INTO @tbl, @col
WHILE @@fetch_status = 0
BEGIN


SET @sql = '
IF(SELECT COUNT(*) FROM dbo.'+@tbl+' WITH (NOLOCK) WHERE '+@col+' IS NULL) > 0
SELECT COUNT(*) AS '''+@tbl+','+@col+''' FROM dbo.'+@tbl+' WITH (NOLOCK) WHERE '+@col+' IS NULL
'


   PRINT @sql
-- EXECUTE sp_executesql @sql --UNCOMMENT TO ACTUALLY RUN
FETCH NEXT FROM Cur1 INTO @tbl, @col
END
CLOSE Cur1
DEALLOCATE Cur1


SET NOCOUNT OFF;


The way it is written, you will print the output, rather than actually execute it, like this:

  IF(SELECT COUNT(*) FROM dbo.company WHERE companyID IS NULL) > 0
  SELECT COUNT(*) AS 'company,companyID' FROM dbo.company WHERE companyID IS NULL

  IF(SELECT COUNT(*) FROM dbo.company WHERE name IS NULL) > 0
  SELECT COUNT(*) AS 'company,name' FROM dbo.company WHERE name IS NULL

You can either copy/paste that into another window, to run it, or you can uncomment the EXEC statement, and execute again.  I encourage you to try it out in the dev bed, or after hours.  Let me know if you have any questions.

No comments:

Post a Comment