T-SQL versus Python with Pandas

  • So, a new employee is promoting the idea of migrating from T-SQL to Python with Pandas.  The existing code base is a large set of calculations (hundreds) done using stock market data.  Each new day's market data is stored in a SQL database; a scheduled SQL job then runs the calculations and stores those results back into a SQL database.

    None of these calculations is complex to the point of needing specialised libraries.  They're all written in T-SQL.

    Does anyone have experience with a situation like this?  Can you share it and any advice you might have?

    Many thanks.

  • What's the benefit of moving to Python?  That's the question I would ask. Python is really good at handling some complex string work, and with SQL 2017+, you could run the Python calculations on data sets in-database, without import/export. That might make sense, but I would need some future benefit to justify why this is better in Python. Maybe there are future enhancements that would be better in Python, but those need to be spelled out.
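    On the SQL 2017+ point: Machine Learning Services runs a Python script through sp_execute_external_script, handing the query result to the script as a pandas DataFrame named InputDataSet and reading the result set back from OutputDataSet. A minimal sketch of the Python side only, with a hand-built DataFrame standing in for the query input (the column names are illustrative, not from the original post):

```python
import pandas as pd

# Stand-in for the DataFrame that sp_execute_external_script would
# populate from @input_data_1; columns are made up for the sketch.
InputDataSet = pd.DataFrame({
    "symbol": ["ABC", "ABC", "ABC"],
    "close":  [100.0, 102.0, 99.96],
})

# A typical daily calculation: percentage return from the prior close.
InputDataSet["daily_return"] = (
    InputDataSet.groupby("symbol")["close"].pct_change()
)

# SQL Server reads this variable back as the script's result set.
OutputDataSet = InputDataSet
```

    In the real call, the T-SQL wrapper would supply InputDataSet from a query and insert OutputDataSet into a results table, so no data leaves the server.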

    This sounds like someone wants to use new technology more for the sake of newness or comfort than for actual value. There might be some value, but you'd have to prove to me that spending resources changing something that already works is worth the cost. Time costs money here, and this isn't trivial: dev work, testing work, the risk of issues. Do enough people on call have Python skills to deal with it, or does said employee want to be on call for this?

    In general, I am not usually interested in rewriting things that work without a good reason. If there is technical debt and we are already working here, some refactoring might make sense. If this has limitations we'll hit, then maybe. If it's "just better" or "Python is cool/easier to read/works better", I am usually disinclined to spend the time here.

    I would ask for a business case written on why.

  • +1 for everything Steve wrote.

    If it works, why change it? There need to be some serious benefits in order to justify this.


    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • rchantler wrote:

    So, a new employee is promoting the idea of migrating from T-SQL to Python with Pandas.  The existing code base is a large set of calculations (hundreds) done using stock market data.  Each new day's market data is stored in a SQL database; a scheduled SQL job then runs the calculations and stores those results back into a SQL database. None of these calculations is complex to the point of needing specialised libraries.  They're all written in T-SQL. Does anyone have experience with a situation like this?  Can you share it and any advice you might have? Many thanks.

    I run into such things a lot.  Ask them to prove the ROI with code and, yes, there must be ROI.  Doing something different just because you can isn't worth a hoot, especially when something is already in place and doing things correctly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • There are some potential advantages of Python:

    • Pandas and NumPy are designed for complex numerical calculations, vectors, matrix manipulations, etc. If you use those features you could very likely find they perform better than equivalent T-SQL (obviously YMMV).
    • Python is extremely widely used for financial and quant calculations whereas T-SQL is not. So you should find it easier to hire people with the right domain expertise (I guess that background is where your new employee may come from).
    • Opens the door to a lot of other powerful APIs, like Spark
    • Truly cross-platform so you aren't dependent only on Microsoft stack
    • Object-oriented
    • More and better debug, test and productivity tools
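    As a rough illustration of the first bullet: a windowed calculation that takes an OVER clause in T-SQL is a one-liner in pandas. A small sketch with made-up prices (the data and column names are assumptions for illustration only):

```python
import pandas as pd

# Made-up daily closes; in the scenario above these would come from SQL.
prices = pd.DataFrame({
    "trade_date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "close": [10.0, 11.0, 12.0, 11.0, 13.0],
})

# Equivalent of T-SQL's AVG(close) OVER (ORDER BY trade_date
# ROWS BETWEEN 2 PRECEDING AND CURRENT ROW): a 3-day moving average.
prices["ma3"] = prices["close"].rolling(window=3, min_periods=1).mean()
```

    Whether this actually outperforms the equivalent T-SQL window function depends on data volume and where the data already lives, so it would still need benchmarking against the existing jobs.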

    HTH

  • It would be a resounding no from me.

    I have had to support some custom Python solutions, and keeping them working is an absolute nightmare. Keeping all the dependencies working together can be enormously time-consuming, and when you go looking for solutions, you get arguments on the forums for whatever module you are asking about over why you should or shouldn't be using it, for a solution you neither developed nor wanted in your environment.

    I have also had problems where supporting Python meant adding control characters to your existing jobs, characters you later come back to and wonder what they were for.

    If they are willing to pay for training time on every one of the modules they plan to use, for the entire DBA team, plus training in Python scripting, then go for it if you want to. Otherwise I can easily see you getting held hostage by your developers whenever something goes wrong, with nothing you can do to troubleshoot. I have had entire SQL Servers taken down by the mistakes of Python developers. The most notable was a 200-something-gigabyte single transaction that filled the remaining space in the log drive of an extremely important reporting database and took almost a day to roll back.

    If you need string manipulation, there is probably nothing better than Python that is still widely used. (Perl is better, but it is becoming a rare skill.) I would question whether what is needed can't be done in SSIS, which could easily be supported by anyone with just a little time to tinker.

    If you need complex math, use .NET or Java. Either of them will give you a self-contained solution without a random array of outside dependencies to support.

    and +1 on what Jeff Moden and Steve Jones said.

  • Thank you all for responding.

    In our shop there's now a lot of hand-waving about relative efficiencies, but so far no proof!  We don't have a problem with throughput.  Who wants to solve a non-existent problem?

    I do hope an analyst will find a useful calculation that requires very complex machinations from a Python library that we can implement via SQL 2017 + Python.  That would make a solid business case for that calculation (not the pre-existing ones).  Sign me up for that one.

    But re-doing years of work without a business case?  Total waste of time and money.

  • Thanks to [nova] for your answer that included "Opens the door to a lot of other powerful APIs, like Spark".

    This idea is the one that could get me on board, i.e. if we want to make very complex calculations that need special (Python) libraries.
