Sunday, March 6, 2016

How to dis-aggregate summarized data in SQL Server ?

Now that is just weird to me.  Why would you want to 'dis-aggregate' data? Who wants to look at the details?  If you're not an accountant, I mean... For example, instead of this:







You want to see this:











Turns out it's fairly common to need to see the details behind a summarized report. This is a quick example for dis-aggregating data. 

-- load your summarized data
CREATE TABLE #aggregated (
  [filename] VARCHAR(25),downloads INT,downloaded date,data1 VARCHAR(15),
  data2 VARCHAR(15),data3 VARCHAR(15),processed bit
)
INSERT #aggregated ([filename],downloads,downloaded,data1,data2,data3,processed)
VALUES
       ('filename1.txt',5,'11/15/2015','md1','md2','md3',0),
       ('filename1.txt',79,'12/31/2015','md1','md2','md3',0),
       ('filename1.txt',24,'1/15/2016','md1','md2','md3',0),
       ('filename2.xlsx',49,'10/15/2015','md5','string','string2',0),
       ('filename2.xlsx',25,'2/15/2016','md6','string','string3',0),
       ('filename2.xlsx',39,'12/15/2015','md7','string','string4',0),
       ('filename2.xlsx',8,'1/15/2016','md','string','string5',0);

-- table for the disaggregated details
CREATE TABLE #disaggregated (
  [filename] VARCHAR(25),[action] CHAR(8),downloaded date,data1 VARCHAR(15),
  data2 VARCHAR(15), data3 VARCHAR(15)
)

-- set based approach (rather than cursor)
WHILE EXISTS(SELECT 1 FROM #aggregated WHERE processed = 0)
BEGIN
   DECLARE
      @next INT = (SELECT TOP 1 downloads FROM #aggregated WHERE processed = 0),
      @count INT = 0
      
      WHILE @count < @next
      BEGIN
         INSERT #disaggregated ([filename],[action],downloaded,data1,data2,data3)
         SELECT [filename],'download',downloaded,data1,data2,data3
         FROM #aggregated
         WHERE downloads = @next

         SELECT @count = @count + 1
     END

     UPDATE #aggregated SET processed = 1 WHERE downloads = @next
END

Take a look at the details in your your #disaggregated table:
















Or just run this to see that the count of what we just disaggregated corresponds to the numeric 'downloads' value in the aggregated data set.

--  query to show the counts of what was inserted
SELECT a.[filename],a.downloads,count(b.[action]) [details]
FROM #aggregated a INNER JOIN #disaggregated b
  ON a.[filename] = b.[filename]
  AND a.downloaded = b.downloaded
  AND a.data1 = b.data1
  AND a.data2 = b.data2
  AND a.data3 = b.data3
GROUP BY
  a.[filename],
  a.downloads

Should return this:











Hope it helps!  Let me know what you think.

No comments:

Post a Comment