I Feel Like a Magician

  • Steve Jones - SSC Editor (7/6/2015)...

    I'd agree with this. If you find SSIS difficult, use something else. However, I also think MS should be improving things that are broken, like basic CSV support.

    Without giving us the excuse that CSV isn't a standard.

    Still waiting? http://qa.sqlservercentral.com/Forums/Topic1391654-263-1.aspx#bm1391759 (Feedback is Encouraged).

  • The ` (grave) character makes a very good delimiter. It has no lexical value in any language.
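
    For what it's worth, a minimal PowerShell sketch of round-tripping a file with a backtick delimiter; the file names are invented, and note that the backtick is PowerShell's own escape character, so it has to sit inside single quotes:

    # Re-export a CSV with a backtick delimiter, then read it back (made-up file names).
    Import-Csv -Path .\orders.csv |
        Export-Csv -Path .\orders.bsv -Delimiter '`' -NoTypeInformation

    Import-Csv -Path .\orders.bsv -Delimiter '`' | Select-Object -First 5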

  • Jeff Moden (7/6/2015)


    Steve Jones - SSC Editor (7/6/2015)


    ...the problem is MS is not investing in the tooling or features beyond what's needed to get a sale. That's short term thinking that allows other tools and platforms to gain traction.

    You took the words right out of my mouth. If they spent just one version fixing or bringing up to date the things that need it (and that includes a lot of "Closed - Working As Designed" items), such as fixing the ACE drivers, updating BCP, sorting out the privilege problems associated with BULK INSERT, and addressing a whole lot of other things in the engine and T-SQL itself (seriously!!! the fact that shrink still doesn't work correctly, etc.), then a whole lot more people would have a whole lot better opinion of both MS and SQL Server.

    And, while it's understood that Azure is important to future success, fix the product/customer base that you already have in the stand-alone product and then migrate those fixes to Azure. It's a win-Win-WIN for anyone and everyone involved.

    Man, this is starting to give me flashbacks to my old video game development days, when customers wanted existing bugs and features fixed before we pushed out new content and expansions to the game.

    That's not necessarily a bad thing, and it's not a perfect comparison because we aren't forced into the next expansion or content drop the way a game's players are, but I feel the mentality at Microsoft is likely much the same: they want to put all their focus on what's in front of them rather than what's behind them with SSIS, SSAS and SSRS.

    I almost feel it's deliberate, so that we continue to upgrade. Otherwise, if we were absolutely content with what we have, why would we continue to invest? I could be wrong, as I don't have much experience across all versions of SQL Server or with what's coming down the line in 2016, but it makes sense.

    As for SSIS, I have few problems using it to help with my ETL processing. I don't depend on it much and only use it when I have to. Otherwise, I use T-SQL to do everything I can, and it seems to do a good job for me.

  • quagmired (7/6/2015)


    Eric M Russell (7/6/2015)


    Where I work, we use TAB-delimited files, which is why I'm not familiar with the annoyances regarding the CSV file format and embedded commas.

    What about embedded TABS or NEWLINES? Does it work?

    Probably not, but the files I typically ingest are reference data or flattened transactional records with columns like last name or address. Commas are normal there, which is why I've never understood why CSV gained popularity over TAB-delimited. However, things like tabs and CR+LF are (per documented requirements) not allowed; the data is expected to be conformed, not free-form. Actually, I think that operations, who administer the client-facing FTP site, may have something like a C++ program that makes a pass through the file, cleaning up or setting an exclusion indicator on non-conforming records (a rough PowerShell sketch of that kind of pre-pass follows this post).

    The bottom line is that, rather than tossing out SSIS and all it has to offer, we simply developed a pattern that's been working for us for years. In our case it was a supplemental step that cleans or validates records in the staging environment. If I wanted to spend a rainy day researching it, I could probably find a way to exclude or correct non-conforming records using a built-in SSIS task; perhaps a Conditional Split.

    There are also 3rd party SSIS Data Flow tasks that can handle CSV records with embedded commas the way some people need.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Well, in all fairness, any flat file format that isn't properly generated on the source side is going to break your data flow. As for why CSVs are more popular, it's because they were a more consistent standard; tabs might be a standard now, but on older systems some tools would interpret a tab as 4 spaces and some as 8 spaces, etc. There's a similar issue with line breaks, since for whatever reason MS decided to go with CR+LF while *nix systems use just LF, I believe. That difference can make you cry if you're working on both platforms; the fun part is trying to figure out why the password you're copying and pasting at 3 in the morning from Excel into a Java program isn't working.
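
    As an aside, a small PowerShell sketch of normalizing line endings before a Windows-side import; the file names are made up, and -NoNewline needs PowerShell 5 or later:

    # Read the whole file, convert any LF-only line endings to CR+LF, write it back out.
    $text = Get-Content -Raw -Path .\unix_feed.csv
    $text = $text -replace "`r`n", "`n" -replace "`n", "`r`n"
    Set-Content -Path .\windows_feed.csv -Value $text -NoNewline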

  • ZZartin (7/7/2015)


    Well, in all fairness, any flat file format that isn't properly generated on the source side is going to break your data flow. As for why CSVs are more popular, it's because they were a more consistent standard; tabs might be a standard now, but on older systems some tools would interpret a tab as 4 spaces and some as 8 spaces, etc. There's a similar issue with line breaks, since for whatever reason MS decided to go with CR+LF while *nix systems use just LF, I believe. That difference can make you cry if you're working on both platforms; the fun part is trying to figure out why the password you're copying and pasting at 3 in the morning from Excel into a Java program isn't working.

    Speaking of Excel, it has my vote for the worst possible data interchange format, especially when the source document is edited and supplied directly by the business.

    - Sometimes users will "hide" rows in Excel for convenience, like when they want to see only a subset of data. When they close Excel, they'll re-save the document prior to submitting it to IT. The problem is, hidden rows are excluded by SSIS and perhaps all other applications. Maybe that makes sense, but hiding is not the same as deleting.

    - When a user wants to delete a row, sometimes they will "strike through" text in that row. A row with strike-through text means it's deleted; right? So why did IT import those rows into the database? Well, maybe it looks that way to the user, but an ETL process doesn't see it that way.

    - When a user is entering data into an Excel sheet, they inadvertently code a leading space into a column called DepositAmount. Now, for that specific row, DepositAmount is treated as a character value, so the expression "WHERE DepositAmount > 10000" will always exclude that row, even if the value is " 12500".

    The possibilities for some type of unexpected behaviour are endless. (A small guard against that last one is sketched just after this post.)

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ZZartin (7/7/2015)


    Well, in all fairness, any flat file format that isn't properly generated on the source side is going to break your data flow. As for why CSVs are more popular, it's because they were a more consistent standard; tabs might be a standard now, but on older systems some tools would interpret a tab as 4 spaces and some as 8 spaces, etc. There's a similar issue with line breaks, since for whatever reason MS decided to go with CR+LF while *nix systems use just LF, I believe. That difference can make you cry if you're working on both platforms; the fun part is trying to figure out why the password you're copying and pasting at 3 in the morning from Excel into a Java program isn't working.

    Completely agree, but when the file opens fine in Excel, opens fine in a text editor as CSV, and imports into PostgreSQL fine, why can't SSIS handle it? A very poor implementation.

  • Steve Jones - SSC Editor (7/7/2015)


    ZZartin (7/7/2015)


    Well, in all fairness, any flat file format that isn't properly generated on the source side is going to break your data flow. As for why CSVs are more popular, it's because they were a more consistent standard; tabs might be a standard now, but on older systems some tools would interpret a tab as 4 spaces and some as 8 spaces, etc. There's a similar issue with line breaks, since for whatever reason MS decided to go with CR+LF while *nix systems use just LF, I believe. That difference can make you cry if you're working on both platforms; the fun part is trying to figure out why the password you're copying and pasting at 3 in the morning from Excel into a Java program isn't working.

    Completely agree, but when the file opens fine in Excel, opens fine in a text editor as CSV, and imports into PostgreSQL fine, why can't SSIS handle it? A very poor implementation.

    You'd think that parsing records and columns in a CSV file would be a function of the provider library; the same library used by SSIS, Excel, and Java. I would expect the Data Flow in SSIS to provide only generic plumbing for moving data from one task to another. Perhaps the solution for getting SSIS to properly handle embedded commas in a CSV file is a matter of configuring something under advanced properties in the Flat File data source. (A quick contrast with a generic CSV parser follows this post.)

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Microsoft, your product is mature; it is time to start sweating the small stuff

  • Eric M Russell (7/7/2015)


    You'd think that parsing records and columns in a CSV file would be a function of the provider library; the same library used by SSIS, Excel, and Java. I would expect the Data Flow in SSIS to provide only generic plumbing for moving data from one task to another. Perhaps the solution for getting SSIS to properly handle embedded commas in a CSV file is a matter of configuring something under advanced properties in the Flat File data source.

    I thought so too, after banging my head on the wall for a day and a half. Eventually I found that there are some issues with how SSIS manages this, and quite a few bugs reported. It wasn't worth it for me to debug further. I wrote a PoSh script to open and save each CSV as an Excel sheet, and those imported fine.
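
    For anyone curious, a minimal sketch of that kind of conversion script; it assumes Excel is installed on the box, the folder path is invented, and Steve's actual script will certainly differ:

    # Round-trip every CSV in a folder through Excel COM and save it as .xlsx.
    # 51 is the xlOpenXMLWorkbook save format.
    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $false
    $excel.DisplayAlerts = $false
    try {
        Get-ChildItem -Path 'C:\Imports' -Filter *.csv | ForEach-Object {
            $workbook = $excel.Workbooks.Open($_.FullName)
            $target   = [System.IO.Path]::ChangeExtension($_.FullName, '.xlsx')
            $workbook.SaveAs($target, 51)
            $workbook.Close($false)
        }
    }
    finally {
        $excel.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
    }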

  • David.Poole (7/8/2015)


    Microsoft, your product is mature; it is time to start sweating the small stuff

    + 1 billion! I've emblazoned one of your fine quotes in my head... I think this may be another!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Steve Jones - SSC Editor (7/12/2015)


    I thought so too, after banging my head on the wall for a day and a half. Eventually I found that there are some issues with how SSIS manages this, and quite a few bugs reported. It wasn't worth it for me to debug further. I wrote a PoSh script to open and save each CSV as an Excel sheet, and those imported fine.

    YOU need to write an article on THAT!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (7/12/2015)


    Steve Jones - SSC Editor (7/12/2015)


    I thought so too, after banging my head on the wall for a day and a half. Eventually I found that there are some issues with how SSIS manages this, and quite a few bugs reported. It wasn't worth it for me to debug further. I wrote a PoSh script to open and save each CSV as an Excel sheet, and those imported fine.

    YOU need to write an article on THAT!

    I started, but ran into rabbit holes left and right. I quit taking screenshots when my blood pressure rose 20 points. And I found a few notes by Jamie Thomson and other real SSIS people who noted it was an issue.

  • Steve Jones - SSC Editor (7/6/2015)


    Stephen Hirsch (7/6/2015)


    The problem is that the people who use the tools aren't the ones who buy the tools. People who use tools aren't nearly as important to MS (and, to be fair, to any business) as the people who buy the tools.

    That's reality, but the problem is MS is not investing in the tooling or features beyond what's needed to get a sale. That's short term thinking that allows other tools and platforms to gain traction.

    Great editorial, Steve, thank you!

    I've more or less given up believing MS is going to fix basic user kluges, b/c there's no profit in it. I remember using Quattro Pro in the early 90's and I could easily choose to display the row and column headers of the currently-selected cell in different, high-contrast colors so that I could quickly see the address of the cell or range I selected. Office 2013 finally implemented something like this, using black-on-battleship grey. Whoopee.

    A forehead-pounding ETL problem I encountered on a 2005 Enterprise server a while back was that I could not use OPENROWSET() to read in a pipe-delimited file without editing the registry. Yup, the delimiter couldn't be changed to anything but a comma in T-SQL (unless you wanted to read the entire line as a single string and then shred it, but why should I have to??). Here's a link that saved me: http://www.youdidwhatwithtsql.com/tsql-query-pipe-delimited-text-files-with-openrowset/429/ (For contrast, a one-line read of that kind of file outside the engine is sketched after this post.)

    Now maybe that's been fixed/changed since 2005, but it's not hard to see why the PostgreSQL user was flummoxed by MS's implementation of CSV handling.

    Rich
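
    The contrast mentioned above: outside the engine, a pipe-delimited file needs no registry edits at all. A one-line PowerShell sketch, with an invented path:

    # Read a pipe-delimited file and peek at the first few rows.
    Import-Csv -Path '\\fileserver\inbound\accounts.psv' -Delimiter '|' |
        Select-Object -First 10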
