Managing/Loading Large Amounts of Data

  • We have several terabytes of data broken out into a Windows file structure. The data was added in 80 GB increments: 80 GB files of timestamp data (essentially 8 columns of data and a TON of rows). One of our programmers wrote a program to read each 80 GB file, apply a formula to it, and write the results back out into the Windows file structure. So the data is broken up by type and each file is stored in the proper folder, but the folder structure goes 30-40 folders deep (not sure why yet). The department complains about how slow the process is, and also fears they don't have a proper backup of their data. They also believe it's impossible to store this kind of data in a database...

    I believe differently. Even if it meant we had to use Oracle to store the data, it would still be cheaper than the hardware they are requesting to manipulate the data. I'm thinking this would be cake for SQL Server anyway.

    Cliff notes: Is it complicated/impossible to load 80 GB files with 8 columns of float (decimal) data into SQL Server? This is essentially one table with 8 columns, or 8 tables with 2 columns each (one of which is used as an index). Would the size of the database be around 80 GB? (A sketch of the single-table layout follows this post.)

    Opinions, suggestions, ideas?
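
    For reference, a minimal sketch of the single-table option. All names here (dbo.SensorReadings, ReadingTime, Value1, and so on) are assumptions for illustration; the post only mentions timestamp data plus roughly 8 columns of float/decimal values.

        -- Hypothetical layout for the "1 table with 8 columns" option.
        -- Whether the timestamp counts as one of the 8 columns isn't
        -- stated in the post, so adjust the number of value columns.
        CREATE TABLE dbo.SensorReadings
        (
            ReadingTime datetime NOT NULL,  -- the timestamp
            Value1      float    NOT NULL,  -- float = 8 bytes each;
            Value2      float    NOT NULL,  -- decimal(p,s) would take
            Value3      float    NOT NULL,  -- 5 to 17 bytes depending
            Value4      float    NOT NULL,  -- on the precision chosen
            Value5      float    NOT NULL,
            Value6      float    NOT NULL,
            Value7      float    NOT NULL
        );

        -- A clustered index on the timestamp supports range queries;
        -- it is not declared unique since duplicates may exist.
        CREATE CLUSTERED INDEX CIX_SensorReadings_ReadingTime
            ON dbo.SensorReadings (ReadingTime);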

  • Cliff notes: Is it complicated/impossible to load 80 GB files with 8 columns of float (decimal) data into SQL Server?

    That depends almost entirely on the nature, structure and format of the file. There is nothing complicated about float or decimal (though you do have to be set on which it is).

    This is essentially one table with 8 columns, or 8 tables with 2 columns each (one of which is used as an index). Would the size of the database be around 80 GB?

    Again, that depends on how efficient the current file's storage format is, but you can usually figure overhead of 30% to 80% in SQL Server. The narrower the rows, the higher the storage overhead, typically. (Rough numbers are sketched at the end of this post.)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • That depends almost entirely on the nature, structure and format of the file. There is nothing complicated about float or decimal (though you do have to be set on which it is).

    I have not gotten to see the actual file yet, but let's assume it's float. The file is comma-delimited. (A minimal load script is sketched at the end of this post.)

    Again, that depends on how efficient the current file's storage format is, but you can usually figure overhead of 30% to 80% in SQL Server. The narrower the rows, the higher the storage overhead, typically.

    So you're saying the SQL Server database will be 30-80% larger than the file structure, and the narrower the rows, the more disk space required. But if the rows were wider and fewer tables were involved, I would lower the required disk space but sacrifice performance. Correct?
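
    Assuming the file really is comma-delimited with one reading per line, a minimal load sketch. The path, terminators, and batch size are placeholders, not details from this thread:

        -- Bulk load one comma-delimited file into the wide table.
        -- 'C:\data\file001.csv' is a placeholder; change ROWTERMINATOR
        -- to '\r\n' if the file has Windows line endings.
        BULK INSERT dbo.SensorReadings
        FROM 'C:\data\file001.csv'
        WITH
        (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            TABLOCK,                 -- allows a minimally logged load
                                     -- into a heap or empty table
            BATCHSIZE       = 100000 -- commit in chunks so a failure
                                     -- doesn't roll back all 80 GB
        );

    bcp from the command line or SSIS would do the same job; for files this size it is often faster to load into a heap first and build the clustered index afterward.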

  • grtn316 (8/28/2008)

    But if the rows were wider and fewer tables were involved, I would lower the required disk space but sacrifice performance. Correct?

    No. The issue is how many rows, not how many tables. The more rows, the more storage overhead and the slower it will be. Wider rows are more efficient and faster (assuming that you are going to be using all of the data in them); what you give up is convenience, not performance. (A concrete comparison is sketched after this post.)

    For this kind of data, you may also seriously want to consider a Data Warehouse setup, depending on how it is going to be used (which you haven't told us yet).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

