Wednesday, March 7, 2012

Question about an aggregate query

Hi All
Using SQL Server 2000.
Given an hypothetical table (this is similar to a table that exists but
I am omitting details such as contraints and indexes and other fields
for simplicity) of:
Table1
[Field_Index] {int32, unique, not null}
[Field_A] {float, not null, (-999)}
[Field_B] {float, not null, (-999)}
[Field_C] {float, not null, (-999)}
Say I want to create an aggregate view such as the following (I
actually want to use AVG but for simplicity will use SUM):
SELECT SUM([Field_A]) AS SUM_A FROM Table1 WHERE [Field_Index] IN
(1,2,3) AND [Field_A] <> -999;
I am using -999 to mark "bad" data. This works fine and dandy as
expected.
Suppose now I have data that looks like this:
[Field_Index] [Field_A] [Field_B] [Field_C]
1 1 10 100
2 -999 20 200
3 2 -999 300
Now my query above would return SUM_A=3. Absolutely correct.
BUT, Say I want to execute this query:
SELECT SUM([Field_A]) AS SUM_A, SUM([Field_B]) AS SUM_B, SUM([Field_B])
AS SUM_B FROM Table1 WHERE [Field_Index] IN (1,2,3) AND ?;
I have placed the (?) because I am not sure what to do. In words I
want per column the sum of all rows that <> -999 to give the following
output in one results:
3 30 600
My question is how can I do this? If I use a where clause then I can
either "AND" or "OR" the wheres which would result in:
"AND"
1 10 100
"OR"
Not sure since I don't have access or sql server here at work, but I
think this would not work either.
Now, I have many fields (about 200) that I would want to aggregate this
over.
Can anyone provide suggestions or comments?
thanks
dabenTry using a "case" expression.
select
sum(case when field_a != -999 then field_a else 0 end) as sum_a,
sum(case when field_b != -999 then field_b else 0 end) as sum_b,
sum(case when field_c != -999 then field_c else 0 end) as sum_c
from
table1
WHERE
[Field_Index] IN (1,2,3)
go
AMB
"daben" wrote:

> Hi All
> Using SQL Server 2000.
> Given an hypothetical table (this is similar to a table that exists but
> I am omitting details such as contraints and indexes and other fields
> for simplicity) of:
> Table1
> [Field_Index] {int32, unique, not null}
> [Field_A] {float, not null, (-999)}
> [Field_B] {float, not null, (-999)}
> [Field_C] {float, not null, (-999)}
>
> Say I want to create an aggregate view such as the following (I
> actually want to use AVG but for simplicity will use SUM):
> SELECT SUM([Field_A]) AS SUM_A FROM Table1 WHERE [Field_Index] IN
> (1,2,3) AND [Field_A] <> -999;
> I am using -999 to mark "bad" data. This works fine and dandy as
> expected.
> Suppose now I have data that looks like this:
> [Field_Index] [Field_A] [Field_B] [Field_C]
> 1 1 10 100
> 2 -999 20 200
> 3 2 -999 300
> Now my query above would return SUM_A=3. Absolutely correct.
> BUT, Say I want to execute this query:
> SELECT SUM([Field_A]) AS SUM_A, SUM([Field_B]) AS SUM_B, SUM([Field_B])
> AS SUM_B FROM Table1 WHERE [Field_Index] IN (1,2,3) AND ?;
> I have placed the (?) because I am not sure what to do. In words I
> want per column the sum of all rows that <> -999 to give the following
> output in one results:
> 3 30 600
> My question is how can I do this? If I use a where clause then I can
> either "AND" or "OR" the wheres which would result in:
> "AND"
> 1 10 100
> "OR"
> Not sure since I don't have access or sql server here at work, but I
> think this would not work either.
>
> Now, I have many fields (about 200) that I would want to aggregate this
> over.
> Can anyone provide suggestions or comments?
> thanks
> daben
>|||Daben,
Try:
SELECT AVG(NULLIF(FIELD_A,-999))AS 'AVG FIELD1',AVG(NULLIF(FIELD_B,-999))AS
'AVG FIELD2',AVG(NULLIF(FIELD_C,-999))AS 'AVG FIELD3'
FROM YOURTABLENAME
HTH
Jerry
"daben" <dabenpb@.yahoo.com> wrote in message
news:1129058732.374912.227020@.o13g2000cwo.googlegroups.com...
> Hi All
> Using SQL Server 2000.
> Given an hypothetical table (this is similar to a table that exists but
> I am omitting details such as contraints and indexes and other fields
> for simplicity) of:
> Table1
> [Field_Index] {int32, unique, not null}
> [Field_A] {float, not null, (-999)}
> [Field_B] {float, not null, (-999)}
> [Field_C] {float, not null, (-999)}
>
> Say I want to create an aggregate view such as the following (I
> actually want to use AVG but for simplicity will use SUM):
> SELECT SUM([Field_A]) AS SUM_A FROM Table1 WHERE [Field_Index] IN
> (1,2,3) AND [Field_A] <> -999;
> I am using -999 to mark "bad" data. This works fine and dandy as
> expected.
> Suppose now I have data that looks like this:
> [Field_Index] [Field_A] [Field_B] [Field_C]
> 1 1 10 100
> 2 -999 20 200
> 3 2 -999 300
> Now my query above would return SUM_A=3. Absolutely correct.
> BUT, Say I want to execute this query:
> SELECT SUM([Field_A]) AS SUM_A, SUM([Field_B]) AS SUM_B, SUM([Field_B])
> AS SUM_B FROM Table1 WHERE [Field_Index] IN (1,2,3) AND ?;
> I have placed the (?) because I am not sure what to do. In words I
> want per column the sum of all rows that <> -999 to give the following
> output in one results:
> 3 30 600
> My question is how can I do this? If I use a where clause then I can
> either "AND" or "OR" the wheres which would result in:
> "AND"
> 1 10 100
> "OR"
> Not sure since I don't have access or sql server here at work, but I
> think this would not work either.
>
> Now, I have many fields (about 200) that I would want to aggregate this
> over.
> Can anyone provide suggestions or comments?
> thanks
> daben
>|||Try this:
SELECT
SUM(CASE WHEN col_a <> -999 THEN col_a END),
SUM(CASE WHEN col_b <> -999 THEN col_b END),
SUM(CASE WHEN col_c <> -999 THEN col_c END)
FROM table1 ;
You specified you want an average. Does that mean you want to treat the
-999 values as nulls or zeros for the purposes of the average? That
makes a big difference. Compare the following results:
SELECT
AVG(CASE WHEN col_a <> -999 THEN col_a ELSE 0 END),
AVG(CASE WHEN col_b <> -999 THEN col_b ELSE 0 END),
AVG(CASE WHEN col_c <> -999 THEN col_c ELSE 0 END)
FROM table1 ;
SELECT
AVG(CASE WHEN col_a <> -999 THEN col_a END),
AVG(CASE WHEN col_b <> -999 THEN col_b END),
AVG(CASE WHEN col_c <> -999 THEN col_c END)
FROM table1 ;
That last query can also be written:
SELECT
AVG(NULLIF(col_a,-999)),
AVG(NULLIF(col_b,-999)),
AVG(NULLIF(col_c,-999))
FROM table1 ;
By the way, 200 columns of numerics suggests a very poor table design.
Are you sure this is properly normalized?
Also, it's more conventional to refer to "columns" rather than "fields"
in SQL Server. Fields are usually understood to be a quite different
concept.
David Portas
SQL Server MVP
--|||What about this
I am not sure if it will create problems (I am sure one of the MVP's can
answer that)
Take a look at the code below
--testing
declare @.c float
select @.c =-999
select @.c,replace(Field_A,-999,0)
--try this
SELECT SUM(replace(Field_A,-999,0)) AS SUM_A,
SUM(replace(Field_B,-999,0)) AS SUM_B,
SUM(replace(Field_C,-999,0)) AS SUM_C
FROM Table1
WHERE [Field_Index] IN (1,2,3)
----
--
“I sense many useless updates in you... Useless updates lead to
defragmentation... Defragmentation leads to downtime...Downtime leads to
suffering..Defragmentation is the path to the darkside.. DBCC INDEXDEFRAG an
d
DBCC DBREINDEX are the force...May the force be with you"
-- http://sqlservercode.blogspot.com/
"daben" wrote:

> Hi All
> Using SQL Server 2000.
> Given an hypothetical table (this is similar to a table that exists but
> I am omitting details such as contraints and indexes and other fields
> for simplicity) of:
> Table1
> [Field_Index] {int32, unique, not null}
> [Field_A] {float, not null, (-999)}
> [Field_B] {float, not null, (-999)}
> [Field_C] {float, not null, (-999)}
>
> Say I want to create an aggregate view such as the following (I
> actually want to use AVG but for simplicity will use SUM):
> SELECT SUM([Field_A]) AS SUM_A FROM Table1 WHERE [Field_Index] IN
> (1,2,3) AND [Field_A] <> -999;
> I am using -999 to mark "bad" data. This works fine and dandy as
> expected.
> Suppose now I have data that looks like this:
> [Field_Index] [Field_A] [Field_B] [Field_C]
> 1 1 10 100
> 2 -999 20 200
> 3 2 -999 300
> Now my query above would return SUM_A=3. Absolutely correct.
> BUT, Say I want to execute this query:
> SELECT SUM([Field_A]) AS SUM_A, SUM([Field_B]) AS SUM_B, SUM([Field_B])
> AS SUM_B FROM Table1 WHERE [Field_Index] IN (1,2,3) AND ?;
> I have placed the (?) because I am not sure what to do. In words I
> want per column the sum of all rows that <> -999 to give the following
> output in one results:
> 3 30 600
> My question is how can I do this? If I use a where clause then I can
> either "AND" or "OR" the wheres which would result in:
> "AND"
> 1 10 100
> "OR"
> Not sure since I don't have access or sql server here at work, but I
> think this would not work either.
>
> Now, I have many fields (about 200) that I would want to aggregate this
> over.
> Can anyone provide suggestions or comments?
> thanks
> daben
>|||Hi David
Thanks for the response.
The answer to your first question is; I want to treat -999 values as
Null: ie, I don't want them to influence the average.
In regards to you second statement: Well, yes I am sure I am not
properly normalized. But I think that I am to best that the structure
of the data allows. The data is Earth Science data and a table
represents measurements for a given instrument. Each instrument can
have multiple channels of data: these are the columns. Probably a
little too much info, but the data needs dictate the following type of
logic:
Project Table
Station Table (multiple stations per project)
Event Table (multiple events per station, at its simplest each
event corresponds to a unique measurement suite)
Meas Table Log (each event can be replicated)
Meas Table Data (this is where the data is as columns,
rows are depth in this case or just time)
Now the query I am specifying essentially aggregates an avg of all meas
table data per time for each station. Such as:
Select AVG(Meas_1) FROM tblMeasData (IGNORING JOINS) GROUP BY Project,
Station, Depth WHERE Project='x', Station='y', Depth='z'
I use -999 to indicate data that is essentially bad for whatever
reason. I want to avg all data that is not -999.
daben|||Hi Jerry
Am I correct in interpreting that NULLS will not be included in
averaging then based on your post and the other posts?
ie AVG(2,4,NULL) = 3
NOT 2
daben|||Daben,
Per your data:
Field_Index] [Field_A] [Field_B] [Field_C]
1 1 10 100
2 -999 20 200
3 2 -999 300
The resultset of the query would be:
1 15 200
HTH
Jerry
"daben" <dabenpb@.yahoo.com> wrote in message
news:1129063783.398342.67020@.z14g2000cwz.googlegroups.com...
> Hi Jerry
> Am I correct in interpreting that NULLS will not be included in
> averaging then based on your post and the other posts?
> ie AVG(2,4,NULL) = 3
> NOT 2
> daben
>|||Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. Your personal narrative languagfe is nice and I am sure
that you can read it and compile it, but It is very
hard to debug code when you do not let us see it.
Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files. If someone can guess what you
meant, that is nice, but you are every programmers nightmare. You have
no specs and do not even know the vocabulary for RDBMS.
Yeah, right (snort, snort). Why is there any bad data in the first
place? Missing data I understand and I woudl use a NULL that is
dropped out of an aggregate by definition, but why are you keeping data
that is known to be bad? This insane design will need to use CASE
expressions to clean out the dirty data you never should have in the
first place.
Also why don't you know enough SQL to use "<>" instead of "!=" for
non-equality? And why do you use proprietary square brackets? In 20+
years of SQL programming, I have seldom used FLOAT or REAL data types
-- are you doing an engineering application where that choice would
make sense?|||Um. Where to begin. Not sure why this needed your "input", since it
was absoutely nothing, but to answer a few of your statements in no
particular order:
+ I never used "!=" in any of my posts. If you are going to post a
flame get your facts right.
+ I never said a table was a file.
+ In real world data (ie data collected in the field -- ie out of the
lab, out of the house, in this case definetly not a column) you can
collect data with a sensor and it may be a stream of data with 3
channels (fields, columns, results). Well Channel 1 and 2 can be
"good" but a power spike could make channel 3 bad for that given record
(row, data for that given time). One would obviously not throw away
channels 1 and 2 of good data now would they, but if you left a NULL
there who knows why you did it. So, -999 indicates that one is a smart
monkey and knows for a fact that that piece of data is bad.
+ I believe I started with the statement "Given an hypothetical table
(this is similar to a table that exists but
I am omitting details such as contraints and indexes and other fields
for simplicity) of:" which clearly indicated that this was an academic
question should be treated as such. It seems that noone else had
problems interpretting, maybe it is in the eye of the beholder.
+ I use square brackets to indicate to me that this is a field of
abstract data (as opposed to a value -- the keyword was being
abstract). Clearly you (as well as everyone else) understood what I
meant.
+ As far as float data type. We use float because it is easy for the
scale of our date to have a large range in magnitude. And yes, I
belive once again that I did specify that this was a scientific
application. Maybe you should have spent 20+ years learning to read.
Now, some may be wondering why I even responding to you post. Good
question. The main reason is to answer in case SOMEONE else was
interested. And finally, in case anyone is still interested, it seems
like Joe likes to fly off the hook:
http://groups.google.com/group/micr...b38c28f3f3a22fd
http://groups.google.com/group/micr...b38c28f3f3a22fd

No comments:

Post a Comment