How is data stored or indexed in Clustered and nonclustered Columnstore Index?

  • hi,

    I would like to know what the underlying data structure of the Nonclustered and clustered Columnstore Index (In-Memory) is.

    I mean the column entries are somehow compressed and loaded into memory in both cases, but how is the data itself structure in the memory?

    Are there indexes created first from columnvalues, or is there a hashtable being constructed from each of the column values ?

    and from the columnvalues, how are the rows built together again? I mean are they numbered sequentielly in parallel for each column? so if I ask 20th entry from column x, then it looks automatically for the 20th column entry in y ?

  • Columnstore Indexes Described

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • I already looked into Microsoft Documentation but that wasnt helpful -.-

    the inner structure of the Columnstore Index and how data in memory is represented is not written there... hope someone can provide

    more sources or informations ....

  • Columnstores aren't in-memory structures. They're disk-based.

    Have you read all the whitepapers on columnstore? If not, hit google, a search for whitepaper columnstore finds them.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • There's not much on the detailed internals of the in-memory indexes yet. The non-clustered index is still a b-tree, sort of. They also call it a bw-tree. The bw stands for buzz word because it's not really a b-tree the way we think of them. There's also the hash indexes. The basic structures there are well defined. Your key is hashed and put into the number of buckets that you define for that index. Define too many buckets, you have a very wide index that gets hurt on performance because it has to look through so much stuff to find your data. Define too few, you get a lot of scans within a bucket because of hash collisions, again, hurting performance.

    But, the nitty-gritty, these are the compression algorithms we're using, kind of information, I haven't seen it.

    I'm addressing just the in-memory stuff. Columnstore is different.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..

    I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...

    I didn't really get behind it yet... I will do some more research about it

  • freddyism00 (9/12/2014)


    hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..

    I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...

    I didn't really get behind it yet... I will do some more research about it

    Yes I agree. I think all the "in memory" references really are not in memory only it is just a different way Microsoft stores the data on disk.

    From article

    http://blogs.msdn.com/b/mvpawardprogram/archive/2014/06/16/updateable-column-store-indexes-in-sql-server-2014.aspx

    The column store indexes are part of Microsoft In-Memory Technologies because they use xVelocity engine for data compression optimization and its implementation is based on a columnar structure such as PowerPivot and SSAS Tabular. Data in column store indexes are organized by column, each memory page stores data from a single column, so each column can be accessed independently. This means that SQL Server Storage Engine will be able to fetch the only columns it needs. In addition, data is highly compressed, so more data will fit in memory and the I/O operations can greatly decrease.

  • freddyism00 (9/12/2014)


    hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..

    I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...

    I didn't really get behind it yet... I will do some more research about it

    Well, they're "in memory" as much as a standard index is there as well. But they're not a part of the in-memory tables and indexes introduced in SQL Server 2014 (formerly code named Hekaton).

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • That's what you get when marketing gets to name features.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ok I think I kind of figured it out, it's sort of memory-optimized.... they compress the columns into segments and store them on disk as BLOB's, the columnstore index itself is in-memory just as usual indexes, but the data itself is still on disk, but when needed in Querying or analysis services, the data (or rather the segment BLOB's to be more precise) can be leveraged partially as needed into memory for further processing, am I right?

    and I totally agree on the marketing part, this is very confusing, and I guess that's not the only thing on Microsoft technologies with confusing names... PowerPivot, PowerPivot, DTS, BIDS, SSDT, SSDT-BI...etc. for beginners this is really hard to go through.. often there is not even a clear documentary or specification about their technology.. like in the SSDT/-BI confusion:

    http://www.jamesserra.com/archive/2012/04/ssdt-installation-confusion/

    If Steve Jobs would have dont this to their customers.. they probably wouldn't exist anymore today

  • Quick note, Alberto Ferrari gave a very good talk on optimizing DAX queries at the last SQLBits where he explained in good details the Vertipaq/xVelocity compression, rowgroups, deltastores etc. Also Stephane Haby wrote a nice blog piece on the matter. Not certain that this what you are after but it's relative and informative.

    😎

  • freddyism00 (9/13/2014)


    ok I think I kind of figured it out, it's sort of memory-optimized....

    Not really.

    they compress the columns into segments and store them on disk as BLOB's, the columnstore index itself is in-memory just as usual indexes, but the data itself is still on disk, but when needed in Querying or analysis services, the data (or rather the segment BLOB's to be more precise) can be leveraged partially as needed into memory for further processing, am I right?

    Normal indexes are read into memory to be processed, they have to be, a computer's CPU can't operate directly on data on disk, it has to be in memory to be processed at all. We don't call normal clustered and nonclustered indexes 'in memory' just because they're read into the buffer pool before processing.

    Stop trying to classify column store as 'in memory' and you'll probably find it easier to understand how it works.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply