Is Big Data good for Data Professionals?

  • Sergiy - Tuesday, March 28, 2017 10:48 PM

Semantics or no semantics - can you answer the simple question? Why is columnar storage (any implementation of it) not updateable?

When ColumnStore tables were first introduced in SQL Server 2012, they weren't updatable, but starting with SQL Server 2014 they support inserts, updates, and deletes just like a traditional RowStore table.
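As a minimal sketch of what that looks like (table and column names here are hypothetical, not from this thread), a clustered columnstore index created with the SQL Server 2014 syntax accepts ordinary DML directly:

```sql
-- Hypothetical example table; names are illustrative only.
CREATE TABLE dbo.SalesFact
(
    SaleID   INT         NOT NULL,
    Product  VARCHAR(50) NOT NULL,
    Quantity INT         NOT NULL,
    SaleDate DATE        NOT NULL
);

-- Convert the table to columnstore storage (SQL Server 2014+ syntax).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact ON dbo.SalesFact;

-- All three DML operations now work against the columnstore table:
INSERT INTO dbo.SalesFact (SaleID, Product, Quantity, SaleDate)
VALUES (1, 'Widget', 10, '2017-03-01');

UPDATE dbo.SalesFact SET Quantity = 12 WHERE SaleID = 1;

DELETE FROM dbo.SalesFact WHERE SaleID = 1;
```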

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Google BigQuery now supports updates as well.

When you add a word to an existing file and then immediately search for that word in the file, you get a positive response.

    This is a true data scan.

    Whatever changed in the data is immediately reflected in the scan output.

    Does it work like that with columnstore updates?

    No.

    Which means - there is no source data scan in Big Data queries.

    The source data needs to be "prepared" before it's made available for scans.

    You don't like the term "normalization"? Ok, let it be "compression".

    You don't like "index"? Ok, let's name it "columnar store".

    Whatever sells.

    _____________
    Code for TallyGenerator

  • Sergiy - Thursday, March 30, 2017 5:03 AM

When you add a word to an existing file and then immediately search for that word in the file, you get a positive response. This is a true data scan. Whatever changed in the data is immediately reflected in the scan output. Does it work like that with columnstore updates? No. Which means - there is no source data scan in Big Data queries. The source data needs to be "prepared" before it's made available for scans. You don't like the term "normalization"? Ok, let it be "compression". You don't like "index"? Ok, let's name it "columnar store". Whatever sells.

When a ColumnStore table is inserted into, updated, or deleted from, the modification is first persisted temporarily to a row-based DeltaStore. Then, depending on the COMPRESSION_DELAY <MINUTES> setting, or when the DeltaStore reaches its full point (about 1 million rows by default), the Tuple Mover process compresses the rows into a ColumnStore rowgroup. Is this what you're referring to? Row modifications held in the DeltaStore are still queryable and seamlessly integrated with the ColumnStore, just with a higher degree of latency.
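A short sketch of the mechanics described above (the table name dbo.SalesFact is hypothetical; COMPRESSION_DELAY requires SQL Server 2016 or later). Freshly modified rows sit in OPEN delta rowgroups until the Tuple Mover compresses them, and the rowgroup DMV lets you watch that happen:

```sql
-- Rebuild the columnstore index with a compression delay of 10 minutes,
-- so hot rows linger in the DeltaStore before being compressed.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact
    ON dbo.SalesFact
    WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 10);

-- Inspect rowgroup states: OPEN / CLOSED rowgroups are the row-based
-- DeltaStore; COMPRESSED rowgroups are columnar segments.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.SalesFact');
```

Queries transparently union the delta rowgroups with the compressed segments, which is why the modified rows are visible immediately even before compression.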

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

However you name it - the source data requires preparation/restructuring after being loaded and before it's made available for "scanning".

    And however you name that preparation process, it's still nothing else but indexing.

Because the BigTable and all its descendants are nothing more than big indexes.

    https://courses.cs.washington.edu/courses/cse444/14sp/lectures/lecture26-bigtable.pdf

You may use the words "map", "key", "block", etc., but they are all other names for "index".

    Big data systems go through your data, normalise it (sorry, should have said "compress") and automatically create for you that index you failed to identify and create in a relational database.

    _____________
    Code for TallyGenerator
