Monday, March 19, 2012

Degenerate dimension vs synthetic lookup

I'm trying to get a handle on a modeling question of sorts. Basically I have a fact table with ~ 10 million rows. This table has a column called lot_id for each of several components (lets say componentn_lot_id to be generic). My componentn_lot_id has ~ 1.5 million distinct values. Theoretically speaking is it better to treat this attribute as a degenerate dimension of the fact table or would it be better to synthesize a lookup by creating a named query like:

select component1_lot_id from fact_table group by 1

and using this as a lookup table for a regular dimension? I have about 12 dimensions like this, so the simplicity of the degenerate dimension method is nice, but seems to have a pretty high price to pay when processing the cube (though the lookup method may also be costly). Are there other issues to keep in mind here?

I'm seeking advice as to rules of thumb (or techniques) that should be used when faced with this decision. I am currently using AS2005.

Thanks,

Keehan

Are you using ROLAP or MOLAP?

For MOLAP storage, very probably there's no difference regarding processing time.

You might want to take some measurements anyway, if only to make sure scalability and sizing requirements are met...

Hope this helps

No comments:

Post a Comment