Is Big Data good for Data Professionals?

  • Sergiy - Tuesday, March 28, 2017 10:48 PM

Semantics or no semantics - can you answer the simple question? Why is columnar storage (any implementation of it) not updateable?

When ColumnStore tables were first introduced in SQL Server 2012, they weren't updatable, but starting with SQL Server 2014 they support inserts, updates, and deletes just like a traditional RowStore table.
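As a minimal sketch of what that looks like (table and column names here are hypothetical, not from this thread), a clustered columnstore index created with the SQL Server 2014 syntax accepts ordinary DML directly:

```sql
-- Hypothetical example table; names are illustrative only.
CREATE TABLE dbo.SalesFact
(
    SaleID   INT         NOT NULL,
    Product  VARCHAR(50) NOT NULL,
    Quantity INT         NOT NULL,
    SaleDate DATE        NOT NULL
);

-- Convert the table to columnstore storage (SQL Server 2014+ syntax).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact ON dbo.SalesFact;

-- All three DML operations now work against the columnstore table:
INSERT INTO dbo.SalesFact (SaleID, Product, Quantity, SaleDate)
VALUES (1, 'Widget', 10, '2017-03-01');

UPDATE dbo.SalesFact SET Quantity = 12 WHERE SaleID = 1;

DELETE FROM dbo.SalesFact WHERE SaleID = 1;
```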

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Google BigQuery now supports updates as well.

When you add a word to an existing file and then immediately search for that word in the file, you get a positive response.

    This is a true data scan.

    Whatever changed in the data is immediately reflected in the scan output.

    Does it work like that with columnstore updates?

    No.

    Which means - there is no source data scan in Big Data queries.

    The source data needs to be "prepared" before it's made available for scans.

    You don't like the term "normalization"? Ok, let it be "compression".

    You don't like "index"? Ok, let's name it "columnar store".

    Whatever sells.

    _____________
    Code for TallyGenerator

  • Sergiy - Thursday, March 30, 2017 5:03 AM

When you add a word to an existing file and then immediately search for that word in the file, you get a positive response. This is a true data scan. Whatever changed in the data is immediately reflected in the scan output. Does it work like that with columnstore updates? No. Which means - there is no source data scan in Big Data queries. The source data needs to be "prepared" before it's made available for scans. You don't like the term "normalization"? Ok, let it be "compression". You don't like "index"? Ok, let's name it "columnar store". Whatever sells.

When a ColumnStore table is inserted into, updated, or deleted from, the modification is first persisted temporarily to a row-based DeltaStore. Then, depending on the COMPRESSION_DELAY <MINUTES> setting, or when the DeltaStore reaches its full point (about 1 million rows by default), the Tuple Mover process compresses the rows into a ColumnStore rowgroup. Is this what you're referring to? Row modifications held in the DeltaStore are still queryable and seamlessly integrated with the ColumnStore, just with a higher degree of latency.
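A short sketch of the mechanics described above (the table name dbo.SalesFact is hypothetical; COMPRESSION_DELAY requires SQL Server 2016 or later). Freshly modified rows sit in OPEN delta rowgroups until the Tuple Mover compresses them, and the rowgroup DMV lets you watch that happen:

```sql
-- Rebuild the columnstore index with a compression delay of 10 minutes,
-- so hot rows linger in the DeltaStore before being compressed.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact
    ON dbo.SalesFact
    WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 10);

-- Inspect rowgroup states: OPEN / CLOSED rowgroups are the row-based
-- DeltaStore; COMPRESSED rowgroups are columnar segments.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.SalesFact');
```

Queries transparently union the delta rowgroups with the compressed segments, which is why the modified rows are visible immediately even before compression.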

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

However you name it - the source data requires preparation/restructuring after being loaded and before it's made available for "scanning".

    And however you name that preparation process, it's still nothing else but indexing.

Because the BigTable and all its descendants are nothing more than big indexes.

    https://courses.cs.washington.edu/courses/cse444/14sp/lectures/lecture26-bigtable.pdf

You may use the words "map", "key", "block", etc., but they are all other names for "index".

    Big data systems go through your data, normalise it (sorry, should have said "compress") and automatically create for you that index you failed to identify and create in a relational database.

    _____________
    Code for TallyGenerator
