Serialising tabular and relational data.

The way that we move or store tabular data in files isn't particularly clever. Forget, for a moment, the conceptual gulf between the Relational and Object-oriented view of data, for we can't even dump data to disk in an open-standard way that can subsequently be read reliably by a range of other applications. We can't transfer data from one relational database to another with any sort of elegance and efficiency. We strain to copy data to or from spreadsheet programs or statistical analysis tools. When we do so, we represent data in unreliable ways that would have been met with a grin of familiar recognition by the geeks of our parents' generation. We load data into SQL Server from third-party databases, table by blooming table, using methods that seem futuristic until we look under the covers and find the bulk load engine of BCP clicking along much as it has for the past twenty or so years.

There isn't a good existing standard format for tabular data. The CSV format goes back to the sixties, and is yet to receive an agreed international standard. It can't be reliably used without an accompanying metadata file. There are proposals for a standard for representing fielded text which has everything that is required except for widespread adoption. It is just about possible to use YAML to copy small databases between two different relational systems but it isn't easy and impossibly verbose.

What's required? It must be Unicode-text based rather than binary: There must be no chance of delimiter collision: The metadata must be part of the file: It should be reasonably compact and shouldn't rely on being zipped: The order of columns, as well as the column labels, must be described in the metadata along with a description of the datatype: It should be possible to supply some indication of keys and constraints: It should be able to support binary data and blobs efficiently: It should cope with column values that are XML or JSON: It should, if possible be a superset of an existing de-facto standard so that it can be used even when not explicitly supported. It should be easy to implement. Well, something like that would suit me, though I'd probably add the rather unkind condition that it mustn't use XML in any of its guises.

This isn't a good standard for a database vendor to initiate. The universities would distain such a practical task. Surely, the definition of such a standard is down to the community of users. I'd be fascinated to read other views on the subject.

One Milly-yon IOPS - Database Weekly (Sept 1, 2008)

by Steve Jones

SQLServerCentral.com

Editorial

IBM is testing a new hardware disk array that vastly outperforms any current arrays. What does this mean for the database world?

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2008-09-01

169 reads

Discuss

Solid State Replacements - Database Weekly (August, 18, 2008)

by Steve Jones

SQLServerCentral.com

Editorial

New SSD (Solid State Device) hard drives are infiltrating enterprise servers more and more, but are they a good fit for database instances? Steve Jones comments a bit on some news from this past week.