I recently started a new contract, and inherited a new set of databases. The firm never really had a DBA in place, so I needed to evaluate the overall object (and data) design. Many, many moons ago, I developed a procedure for this type of assessment. I analyze the table definition, and the data within each table, in order to confirm whether the table design actually suits the data being stored.
First, we will look at the min, max and average data lengths of the existing data, in comparison to the actual column datalength definition. See here:
/* GETS MIN,MAX,AVG DATALENGTHS OF ALL COLUMNS IN A GIVEN TABLE. */
DECLARE @table SYSNAME
SET @table = 'customer'
SET NOCOUNT ON;
DECLARE @i INT
DECLARE @cnt INT
DECLARE @str VARCHAR(8000)
DECLARE @column_name VARCHAR(50)
SET @i=0
SELECT IDENTITY(SMALLINT,1,1) AS col_num, @table TableName, column_name,null [MIN],NULL [MAX], NULL [AVG],data_type,character_maximum_length into #t
FROM information_schema.columns
WHERE table_name=@table
SELECT @cnt=count(column_name) FROM #t
WHILE @i< @cnt+1
BEGIN
SELECT @column_name=column_name FROM #t WHERE col_num=@i
SET @str = 'UPDATE #t
SET [MIN]=a.[MIN],
[MAX]=a.[MAX],
[AVG]=a.[AVG]
FROM
(SELECT min(len(isnull(' + @column_name+ ',0))) [MIN] ,
MAX(len(isnull(' + @column_name+ ',0))) [MAX],AVG(len(isnull(' + @column_name+ ',0))) [AVG] from ' + @table + ' ) a , #t b where b.col_num='+ cast(@i as varchar)
EXEC (@str)
SET @i=@i+1
END
SELECT * FROM #t
SET NOCOUNT OFF;
DROP TABLE #t
In the above, I am reviewing the 'customer' table. I have returned only a few of the attributes, which you will see were created all of the same data type and length (yuck):
col_num tablename column_name min max avg datatype character_maximum_length
1 customer MerchantID 5 5 5 varchar 2552 customer MerchStoreID 5 11 6 varchar 255
3 customer CustID 7 8 7 varchar 255
4 customer CustSSN 6 9 8 varchar 255
5 customer CustName 8 38 14 varchar 255
As you will see, the min, max and avg lengths are returned, and you can see your VARCHAR(255) datalengths are just a bit extreme. Even more, take a look at that MerchantID --- all values are a datalength of 5. Wouldn't this be better typed as CHAR(5)? The overhead of the variable datatype is just not worth it here, where all values are of the same datalength.
Basically, I use the above method to assess the current table definition, and the data within. Another step that is necessary is to check for NULL or blank values. I often find attributes where all values are NULL. In that case, why keep the column? If you don't use it, get rid of it. Unused attributes are not free -- the physical data storage as well as the data manipulation around the NULLs. I use this method to cursor through each table column, returning counts on any records that are found to be NULL:
SET NOCOUNT ON;
USE MyDatabase;
DECLARE @tbl VARCHAR(35),@col VARCHAR(1000),@sql NVARCHAR(4000)
DECLARE Cur1 CURSOR FAST_FORWARD FOR
SELECT o.name tbl, c.name col
FROM sys.sysobjects o INNER JOIN sys.syscolumns c ON o.id = c.id
WHERE o.xtype='U'
ORDER BY o.name
OPEN Cur1
FETCH NEXT FROM Cur1 INTO @tbl, @col
WHILE @@fetch_status = 0
BEGIN
SET @sql = '
IF(SELECT COUNT(*) FROM dbo.'+@tbl+' WITH (NOLOCK) WHERE '+@col+' IS NULL) > 0
SELECT COUNT(*) AS '''+@tbl+','+@col+''' FROM dbo.'+@tbl+' WITH (NOLOCK) WHERE '+@col+' IS NULL
'
PRINT @sql
-- EXECUTE sp_executesql @sql --UNCOMMENT TO ACTUALLY RUN
FETCH NEXT FROM Cur1 INTO @tbl, @col
END
CLOSE Cur1
DEALLOCATE Cur1
SET NOCOUNT OFF;
The way it is written, you will print the output, rather than actually execute it, like this:
IF(SELECT COUNT(*) FROM dbo.company WHERE companyID IS NULL) > 0
SELECT COUNT(*) AS 'company,companyID' FROM dbo.company WHERE companyID IS NULL
IF(SELECT COUNT(*) FROM dbo.company WHERE name IS NULL) > 0
SELECT COUNT(*) AS 'company,name' FROM dbo.company WHERE name IS NULL
You can either copy/paste that into another window, to run it, or you can uncomment the EXEC statement, and execute again. I encourage you to try it out in the dev bed, or after hours. Let me know if you have any questions.
No comments:
Post a Comment