Extracting data from an Oracle source with SSIS

  • I am looking for a solution for the following scenario.

    The scenario is:

    I have some 50 Oracle DB tables.

    I have to load the data from Oracle to SQL Server tables through SSIS.

    The loading must happen incrementally; for this I have to take LAST_UPDATED_DATE from my destination table

    and pass it to the OLE DB source to filter the data.

    The data must be filtered at the source itself, as the Oracle tables have lakhs of records, but I want to bring only the records changed since LAST_UPDATED_DATE.

    I have set Run64BitRuntime = False in the SSIS solution properties

    and AlwaysUseDefaultCodePage = True on the OLE DB source.

    I tried the following methods but all of them failed.

    1) Execute SQL Task - takes LAST_UPDATED_DATE from a query; the result set is returned as a single row into User::UdateKey

    Execute SQL Task ----> Data Flow (OLE DB Source --> OLE DB Destination)

    When I tried "select * from sourcetable where last_updated_date > ?", the OLE DB source throws the following error:

    ------------

    Provider cannot derive parameter information and SetParameterInfo has not been called. (Microsoft OLE DB Provider for Oracle)

    2) Then I tried another package-level variable User::SqlQuery with EvaluateAsExpression = True

    and entered "SELECT * FROM TRNG.AP_INVOICES_ALL WHERE LAST_UPDATE_DATE >" + @[User::UdateKey]

    The expression validated,

    but the OLE DB source with the "SQL command from variable" option again throws an error:

    ----

    Error at Data Flow Task [OLE DB Source [1]]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E14.

    An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E14 Description: "ORA-00936: missing expression".

    3) Execute SQL Task -------> Script Task --- Data Flow Task (OLE DB Source ---- OLE DB Destination)

    taking User::UdateKey as a read-only variable and User::SqlQuery as a read/write variable, with the Script Task dynamically assigning the query to the variable:

    Dts.Variables("SqlQuery").Value = "SELECT * FROM AP_INVOICES_ALL WHERE LAST_UPDATE_DATE >" + Dts.Variables("Updatekey").Value

    Dts.TaskResult = ScriptResults.Success

    This also throws an error.

    Please guide me in this regard with an appropriate solution.

    If possible, send me detailed steps.

  • Looks like an Oracle syntax error ("ORA-00936: missing expression").

    That would be the place to start.

    (Un)fortunately, I have no idea about Oracle.



    Lutz
    A pessimist is an optimist with experience.


  • I did this in my previous job using a linked server with a dynamic open query call.

    My recollection is that I had to specify the date format used by Oracle.

    It was something like this.

    WHERE EditDate >= TO_DATE(@DateVariable,'YYYY-MM-DD HH24:MI:SS')
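    In full, the dynamic OPENQUERY call looked roughly like this - a sketch only, with an invented linked server name (ORA_LINK) and table (TABLEX); note how the quotes double up inside the dynamic string:

    ```sql
    -- Sketch: dynamic OPENQUERY against an Oracle linked server (names invented).
    DECLARE @DateVariable VARCHAR(19) = '2012-04-24 00:00:00';
    DECLARE @Sql NVARCHAR(MAX) =
          N'SELECT * FROM OPENQUERY(ORA_LINK, ''SELECT * FROM TABLEX '
        + N'WHERE EditDate >= TO_DATE('''''
        + @DateVariable
        + N''''', ''''YYYY-MM-DD HH24:MI:SS'''')'')';
    EXEC sys.sp_executesql @Sql;
    ```

    The date goes across as text, so TO_DATE with an explicit format mask keeps Oracle from guessing at the session's NLS date format.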

  • "SELECT * FROM table WHERE LAST_UPDATE_DATE =>" TO_DATE(+@UdateKey,''YYYY-MM-DD HH24:MI:SS')

    the expression is going to work

    can you give me correct way? , it helps me a lot

  • Hi Skanth,

    Did you find the solution for this? Please let us know.

    Ritesh

  • Hi,

    I've had to do something similar before - numerous oracle tables, some with millions of records, others with just a few... all with a "last-update-date-time-stamp" field allowing selection of only the records not already loaded.

    My advice, if you are able to do it this way, is to use Oracle's SQL*Plus utility to extract the data from the Oracle tables to text/CSV files (SQL*Plus lets you set the delimiter, field sizes etc.), then BULK INSERT/bcp to load the files.

    We had tried configuring the Oracle box as a linked server and using DTS (it was a while ago!), but extracting the data with SQL*Plus first cut the total run-time of our job to less than 20% of the previous best - it went from about 12 hours down to 2, if memory serves, so definitely worth a test in your environment if you can!

    Here's an example of the relevant SQL*Plus files - I think you could modify them pretty easily to point at one of your Oracle tables, and tweak the config file to get the formats as you need...

    1 - the sqlplus config file:

    (eg C:\SQLPlusScripts\Config\TableX.sql)

    Contents of the file:

    connect user/password@oracle.server
    set echo off;
    set heading off;
    set feedback off;
    set wrap off;
    set pagesize 0;
    set linesize 2000;
    set termout off;
    set arraysize 500;
    set colsep '|||';
    set trimspool on;
    set newpage 0;
    rem set space 0;
    set tab off;
    set numformat 9999999999999.999;
    spool "E:\TableX.txt";
    @"C:\SQLPlusScripts\OracleSQL\TableX.sql"
    spool off;
    exit

    2 - the sql file

    (eg C:\SQLPlusScripts\OracleSQL\TableX.sql)

    Contents of the file:

    select /*+ ALL_ROWS */ * from TABLEX where LASTUPDATEDATE >= 1120424;

    (This file is automatically re-written every day to get the latest loaded date...)

    Then, to run from command line:

    sqlplus /nolog @"C:\SQLPlusScripts\Config\TableX.sql"
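    And on the SQL Server side, loading the extract is a single BULK INSERT - a sketch only, the staging table name is made up and just needs columns matching the extract:

    ```sql
    -- Sketch: load the SQL*Plus extract produced above.
    -- FIELDTERMINATOR matches the colsep set in the config file.
    BULK INSERT dbo.STG_TableX
    FROM 'E:\TableX.txt'
    WITH (FIELDTERMINATOR = '|||', ROWTERMINATOR = '\n', TABLOCK);
    ```
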

    Hope this has helped - it's my first ever reply! 😎

  • If it’s going to be a one-time execution, the BCP / BULK INSERT approach is fine. If it has to be a regularly scheduled task then I would recommend the SSIS or linked server approach.

    Exporting to a text file or CSV doubles the work of the SQL job and would be very painful and time consuming.

  • Hi Dev,

    it would also be feasible to use SSIS (or something else) to automate the whole sqlplus->txt->bcp approach 😀 This is what I did (the "something else" option - a bespoke ETL framework) and I have to say it all works great!

    It's not ideal though, and I do appreciate your point about doubling up activity - this was a misgiving for me too, before I saw the performance improvement for myself! There was also a fair amount of set-up required for my stuff - however, now it's all metadata-driven: automatic generation and execution of all the relevant files/scripts, and much easier to maintain than ever before!

    There are also other considerations, like the (temporary) storage required for the files... however, when moving data between different providers (eg Oracle to SQL) the whole process includes a raft of validations/checks, as well as just shifting the actual data about... not so much with the sqlplus/bcp approach - and when talking about huge amounts of data there are other benefits of sqlplus/bcp (like asynchronous extract/load using multiple clients).

    My original advice stands - if you can try this approach for yourself then do! Potentially saving 80% of the run duration of a large job must make it worth a look πŸ˜€

    Cheers!!

  • Please don’t blame SSIS for poor performance. It can (and should) be optimized. As you already accepted, you ended up with many unnecessary tasks in your approach, which you could have avoided with SSIS alone. If you saw a performance advantage with your approach, then I am very sure the SSIS package was poorly developed.

  • Hi Dev,

    Can you please provide the proper solution, or any link you know of? I have to work with SSIS, not with any other tool.

    Thanks

    Ritesh

  • I pull data from Oracle all of the time.

    The syntax can be "select * from table where last_update_date >= trunc(sysdate)"

    This will get anything changed for the current day,

    or you can use trunc(sysdate - 7) for the last 7 days of data.

    *****

    Give a man a fish, and you've given him a meal. Teach a man to fish and he'll ask to borrow your boat.:-P

  • This is my contribution. This works for me.

    1. Create a LAST_LOAD variable as datetime

    2. Execute a T-SQL command: SELECT LSET FROM [STG].[dbo].[ETL_CONFIG_LOAD] WHERE NAME = 'TABLEA' ..

    and return the result as LSET, saved into LAST_LOAD

    3. Execute a T-SQL command: DELETE FROM STG_TABLEA WHERE PERIOD >= ?  (the parameter is mapped to LAST_LOAD, so it deletes the records at or after the last load - this makes it dynamic)

    4. Create a variable of string type

    5. Create an expression for your variable and set EvaluateAsExpression = True

    Your Oracle code:

    "SELECT

    FIELD1

    ,FIELD2

    ,FIELD3

    ,FIELD4

    ,FIELD5

    ..

    ..

    FROM

    TABLEA

    WHERE

    STAT_DATE_KEY >= TO_CHAR(TO_DATE('" + @[User::LAST_LOAD] +"','YYYY-MM-DD'),'YYYYMMDD')"

    6. Drag an OLE DB Source onto the canvas

    7. Set AlwaysUseDefaultCodePage = True

    8. Use "SQL command from variable" and select the variable
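    At run time, assuming a hypothetical User::LAST_LOAD value of '2012-04-24', the expression in step 5 evaluates to Oracle command text like this:

    ```sql
    -- Hypothetical evaluated command text for User::LAST_LOAD = '2012-04-24'
    SELECT
         FIELD1
        ,FIELD2
        ,FIELD3
        ,FIELD4
        ,FIELD5
    FROM
        TABLEA
    WHERE
        STAT_DATE_KEY >= TO_CHAR(TO_DATE('2012-04-24','YYYY-MM-DD'),'YYYYMMDD');
    ```

    The TO_CHAR/TO_DATE round trip turns the string into a 'YYYYMMDD' value ('20120424' here) so it compares cleanly against the date key column.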

    ..

