Defining a Database

  • roger.plowman (12/20/2009)


    Access, for all that it's been maligned, is also a database, and with 90% of the features of SQL server. I've used it to develop and run the backbone of an SMB for nine years now. Tables, indexes, transactions, "stored procedures" (albeit in the front end), and so on.

    SQL Server is simply a more powerful version of the spreadsheet, of Acess. In other words, it's a glorified card catalog and it behooves us to remember that. At the end of the day SQL Server does the same thing a shoebox full of 3x5 cards does. It's just piled higher and deeper is all. (laughing)

    To be pedantic for a moment, no piece of software is a database. The database is simply the collection of data itself. SQL Server is a Database Management System (DBMS). Access is an application development tool. Both very different things from the databases they manage and use. By way of comparison, MS Word is not the same thing as a document, PowerPoint is not a presentation and Excel is not a spreadsheet!

    Much more importantly, there are fundamental differences between a spreadsheet and a table (at least a table in the relational sense). As conventionally understood a spreadsheet is a two dimensional grid in which data is accessed by positional co-ordinates. A relational table (AKA "relation") on the other hand can have any finite, non-negative number of dimensions and its data points (tuples or rows) are accessed by the values of those dimensions and never by position.

    Spreadsheet and relational table are two incompatible concepts. A spreadsheet is of course a database of sorts, but it's obviously not a relational one and it doesn't have much in common with a table except in the superficial sense that both are commonly depicted on the screen in the form of a grid.

  • David Portas (12/20/2009)


    Much more importantly, there are fundamental differences between a spreadsheet and a table (at least a table in the relational sense). As conventionally understood a spreadsheet is a two dimensional grid in which data is accessed by positional co-ordinates. A relational table (AKA "relation") on the other hand can have any finite, non-negative number of dimensions and its data points (tuples or rows) are accessed by the values of those dimensions and never by position.

    Spreadsheet and relational table are two incompatible concepts. A spreadsheet is of course a database of sorts, but it's obviously not a relational one and it doesn't have much in common with a table except in the superficial sense that both are commonly depicted on the screen in the form of a grid.

    Ah, but we aren't talking about relational databases, we're talking about *databases*.

    In point of fact a table is nothing more than a 2D grid of data. It is not multi-dimensional, it is purely 2 dimensional, consisting of rows and columns (or tuples if you like the idea better, although tuples and columns are not completely synonymous).

    A table, in other words, is a primitive. Likewise a spreadsheet, at its simplest, is *also* a 2D grid of data. Both the table and the spreadsheet need two coordiates to define a specific location in the grid. For a spreadsheet it's row and column, for a table it's--row and column. πŸ™‚

    I also deny you do not locate data by position. If, for example, I'm looking for a particular fact, say the salary of a given employee I need both the employee ID (whatever that might be) and the column ID (whatever that might be).

    At the end of the day both employee ID and column ID must map to positional data to locate the actual salary value. The mapping is more elaborate than a spreadsheet's R1C1 coordinate system, but no different in its results.

    If you doubt this, remember that the R/C coordinate system in Excel actually maps to a 1D data stream, and the use of sparse data makes this mapping non-trivial.

    So at the lowest (physical) levels, there really is no difference. And despite mathematicians distaste for physicality, physicality is inescapable in computer science.

  • Take the better example of a table whose key has three dimensions, the attributes (a,b,c), plus three other attributes (d,e,f). How to convert such a relation into a two dimensional grid?

    The only way is to invent some new co-ordinates for each row and column in your grid and assign a piece of data to each pair of co-ordinates. Those co-ordinates don't exist in the table itself. Notice also that you cannot assume that either the attributes or attribute values have any inherent order to them. So each time I remove or insert new tuples you have to make some arbitrary decision about where those tuples should go in your grid. In other words you have to change the fundamental structure in order to squeeze the table into a 2D grid. Of course that N-dimensional to 2-dimensional mapping may exist internally "under the covers" but it is not inherent in the table itself and it doesn't exist at all in the external structure of the data. A table is a by definition the logical structure exposed by the database, not a physical storage structure.

    tuples if you like the idea better, although tuples and columns are not completely synonymous

    They are not synonymous at all. A tuple is a set of named attributes - like a row in SQL, not a column.

  • David Portas (12/21/2009)


    Take the better example of a table whose key has three dimensions, the attributes (a,b,c), plus three other attributes (d,e,f). How to convert such a relation into a two dimensional grid?

    The only way is to invent some new co-ordinates for each row and column in your grid and assign a piece of data to each pair of co-ordinates. Those co-ordinates don't exist in the table itself. Notice also that you cannot assume that either the attributes or attribute values have any inherent order to them. So each time I remove or insert new tuples you have to make some arbitrary decision about where those tuples should go in your grid. In other words you have to change the fundamental structure in order to squeeze the table into a 2D grid. Of course that N-dimensional to 2-dimensional mapping may exist internally "under the covers" but it is not inherent in the table itself and it doesn't exist at all in the external structure of the data. A table is a by definition the logical structure exposed by the database, not a physical storage structure.

    Eh? You have 6 values that stand as a unit. a 1D list of these values (one per row) makes a grid. The fact the (composite) key has 3 elements is irrelevant. The same is true of a 4, 5, or n element composite key. As long as you treat a set of values (a row, a tuple, take your pick) as a unit, and you have a list of those units, you have a grid.

    Also, order *is* important. Or more precisely, identification of each element within the row as to the type of element it is. Row order is irrelevant, column order is not. If you have two rows, those rows must have the same ordering (or at least the same mapping) of meaning to element. If not, you do not have a table, you have a random collection of data. Whether you tag the column ID via metadata or based on physical order, the column structure is still intact--and maps directly to a grid.

    The fact that a spreadsheet makes this mapping (visually) explicit in no way negates the fact it has a mapping identical to that of a table, in a logical sense, ignoring implementation details.

    tuples if you like the idea better, although tuples and columns are not completely synonymous

    They are not synonymous at all. A tuple is a set of named attributes - like a row in SQL, not a column.

    I stand corrected, but a ninety-degree rotation leaves the situation unchanged. πŸ™‚

  • tuples if you like the idea better, although tuples and columns are not completely synonymous

    They are not synonymous at all. A tuple is a set of named attributes - like a row in SQL, not a column.

    I stand corrected, but a ninety-degree rotation leaves the situation unchanged. πŸ™‚

    A tuple is a set. How can you turn a set on its side? I think you need to adjust your two dimensional thinking πŸ™‚

  • The idea in the editorial poll was to make it easier to talk about databases, not harder. Especially with non-database people. I think you're arguing in the opposite direction.

  • I think the earlier point about context is the key. As a manager/developer, when we talk to users, we might say things like

    "we can add that to the database"

    "We don't hold that information in the system" etc etc

    In that context, words like database, application, system etc are all interchangeable. We don't need to specify which physical database we're talking about. If we had the same conversation with a DBA, then possibly we would, although I hope not.

    But how far do you take it. "We can add that column to the database.." hmm, well not really. A column isn't a database, the table you add it to isn't a database either, both are stored in database, true. Actually more than one if you include all of the system db's etc... I could go on.

    Personally I'm happy to help somebody fix the 'excel database', that has all of their departments pricing info on... especially if that led onto me pursuading them to let me turn that into a full database solution.

  • There are now, always have been, and always will be, ambiguities in every human language that has ever existed.

    We've got extensive mechanisms for dealing with this already built into every language we use.

    I say don't worry about it. Linguistic ambiguity is simply part of how our brains work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Context is key here and so is ambiguity or rather ambiguity is a result of using the flexible generic communication form of language.

    I have read (and probably written) technical documents that apply the term database (both with and without the space) to mean different things. The trick is to take additional meaning from the context. In the case of general users (or to highlight they are not the dumb people we sometimes make them out to be - people with expertise in something other than IT) they do not want to know the details and just want to have the high level requirements there (i.e. implemented) including nonfunctional requirements.

    Users always want their data available but rarely speak of redundancy, clustering, mirroring or backups (unless they have old data they want to retrieve).

    It is the same for software development, users want their, yes their, systems flexible to changes in business requirements but try not to discuss the technicalities of Test Driven Design with them as they don't want to know. That's my job (and maybe yours too).

    When I go to the (medical) Doctor's I only want to roughly know what is wrong with me (preferably nothing) and what I must do to remedy the situation. I am not interested why particular drugs or therapy work or not. As long as I haven't got a tupperware box of pills to take every day I am happy to rely of the experts expertise with the details. Occasionally I ask about alternatives but I expect to be spoken to in terms that I would understand.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • roger.plowman (12/20/2009)


    And despite mathematicians distaste for physicality, physicality is inescapable in computer science.

    Until you said that I was 100% with you.

    But as a real mathematician (couple of degrees, professionally qualified as a chartered mathematician, fellow of the main UK professional institution for mathematicians) I just have to point out that mathematicians have no distaste for physicality. We do have a distaste for pointless complexity, so our models often spurn some aspects of physicality - just as a DBA's models do (I have yet to meet a DBA - as opposed to a mathematician - who knows in any detail the cacheing algorithms used in a RAID controller and understands why they are what they are, but accurately modelling that physical process would be, for a typical DBA, pointless complexity - as would detailed modelling of the aerodynamics of disc head flight, despite the fact that his RDBMS depends critically on thaty physical process). That's just applying the KISS principal, and not a distaste for physicality.

    Tom

  • David Portas (12/21/2009)


    A tuple is a set. How can you turn a set on its side? I think you need to adjust your two dimensional thinking πŸ™‚

    Not in any database system I've ever heard of, and certainly not in relational algebra. A tuple is an element of a mapping from names (column names) to values; in T-SQL the column names have a fixed order, so the tuple is also a map from numbers to values. Throw away these properties and you have a set - but it isn't a tuple any more.

    And it is an inherent property of relations that they can be transposed (turned on their sides). In this respect spreadsheets tend to be truer to relational algebra than is MS SQLS.

    Tom

  • SeΓ±orDBA (12/18/2009)


    Although the dictionary term isn't defined in terms of the technology used, when you say "database" to any DBA, programmer, IT guy or enthusiast 99 times out of 100 they are going to think SQL Server or MySQL or Oracle or PostgreSQL or whatever their RDBMS of choice is.

    I'd expect that to be far less that 99% of the time. People are still using network/hierarchical databases, object databases have become quite common, and there's a lot of work going on on functional databases although they certainly aren't mainstream yet. Maybe you would have been right if you had said 88% (actually I doubt it) but no way is 99% right across all DBAs, programmers, IT guys and enthusiasts. You certainly won't find any IDMS or IDMSX DBAs (yes, they do still exist) who assume it means something organised by an RDBMS, nor any of the guys who built the first petabyte database (that was O not R) or the guys who f?irst hit 1TB/h data input rate (that was N not R, and they have to go a lot faster than that now; that was written in Forth and I would call it a network DB but wikipedia calls it an object DB -or rather wikipedia says the firtst 1TB/h input was achieved with an object DB, but wikipedia may think it was someone other than I think it was).

    My own feeling is that when you say "database" to someone other than a died in the wool and blinkered RDBMS fanatic they will take it to mean "an organized body of related information" (which is the dictionary definition that Jan quoted). That's certainly the meaning that I understand when I hear the word.

    Tom

  • Tom.Thomson (2/8/2010)


    David Portas (12/21/2009)


    A tuple is a set. How can you turn a set on its side? I think you need to adjust your two dimensional thinking πŸ™‚

    Not in any database system I've ever heard of, and certainly not in relational algebra. A tuple is an element of a mapping from names (column names) to values; in T-SQL the column names have a fixed order, so the tuple is also a map from numbers to values. Throw away these properties and you have a set - but it isn't a tuple any more.

    And it is an inherent property of relations that they can be transposed (turned on their sides). In this respect spreadsheets tend to be truer to relational algebra than is MS SQLS.

    Fair comment. I was being a little imprecise. A tuple in the relational sense is indeed a mapping over a (possibly empty) set of attributes (or their values). Less formally I don't think it's unreasonable to say that a tuple is a "set of named and typed values". They are set in the sense of a collection without order or multiplicity. SQL obviously is not relational and a SQL row is not a tuple.

    I don't know what you mean by "transposing" relations. I can see that there might be certain transformations possible on relations having attributes of only one type and that some of those might qualify as "transpositions" because attribute values could map to new tuples. I fail to see how that can be a general property for relations with any number of arbitrarily typed attributes. Maybe you could give an example of the type of transformation you mean.

    Most people understand a spreadsheet to be two dimensional, whereas relations are N-dimensional (with any number of possible arbitrary representations in a 2D space). So I don't think it's accurate or helpful to compare relations to spreadsheets.

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply