Land and process hundreds of files using Synapse

  • Hi fellow cloud people

    We are looking at moving from the land to the cloud

    We have hundreds of files and SQL tables to load into Data Lake Gen2, then process into the staging and presentation zones (I know, the names), and then on to the SQL Data Warehouse.

    We are looking for the best way to move and process these files using Synapse, so that we don't have to write code for each individual file going into the raw zone.

    Then I guess we would have to start coding for each file/table as we progress through the zones into SQL.

    Does Synapse have built-in functionality for this, or do we need a tool? If so, can you recommend or suggest one?

    We are currently an SSIS shop

    Thanks in advance, and keep your head out of the cloud as it may crash 🙂

    Kevin

  • I'll have a go at answering but I'm sure others will have more experience of this than me!

    Synapse is integrated with Data Factory, which is kind of like SSIS in that it can transfer data from source to destination and transform it if necessary. You can use it to iterate through files to load them into the database, and I think it can generate the destination schema as part of the process.
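
    If you'd rather see the lake-to-warehouse hop as code, the dedicated SQL pool also has a COPY statement that picks up many files in one go via wildcards (I believe it's also one of the load methods Data Factory can use under the hood). A rough sketch only; the storage account, container, folder and table names below are all made up:

        -- Load every file in the folder with one statement
        -- ('mydatalake', 'raw' and stg.Sales are made-up names)
        COPY INTO stg.Sales
        FROM 'https://mydatalake.dfs.core.windows.net/raw/sales/2023/01/*.csv'
        WITH (
            FILE_TYPE = 'CSV',
            FIELDTERMINATOR = ',',
            FIRSTROW = 2,  -- skip the header row
            CREDENTIAL = (IDENTITY = 'Managed Identity')
        );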

    For the SQL tables, why not just load them straight into the database rather than going via the data lake?

    If you put your files (guessing they are CSV or something) into a data lake, you can also use external tables and PolyBase to read them. If the format of the files is consistent and the layout of the data lake is thought through (perhaps folders based on week/month, for example), then you can read them all in one go or in batches.
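
    To make that concrete, the external table setup is only a handful of T-SQL statements. This is just a sketch (the account, container, schema and column names are all invented, and on a dedicated pool you'd also need a database-scoped credential for the lake, which I've left out):

        -- Point PolyBase at the lake (hypothetical account/container)
        CREATE EXTERNAL DATA SOURCE LakeRaw
        WITH (TYPE = HADOOP,
              LOCATION = 'abfss://raw@mydatalake.dfs.core.windows.net');

        -- Describe the CSV layout once and reuse it for every feed
        CREATE EXTERNAL FILE FORMAT CsvFile
        WITH (FORMAT_TYPE = DELIMITEDTEXT,
              FORMAT_OPTIONS (FIELD_TERMINATOR = ',',
                              STRING_DELIMITER = '"',
                              FIRST_ROW = 2));

        -- LOCATION is a folder, so every file under it reads as one table
        CREATE EXTERNAL TABLE ext.Sales (
            SaleId   VARCHAR(20),
            SaleDate VARCHAR(20),
            Amount   VARCHAR(30)
        )
        WITH (LOCATION = '/sales/2023/01/',
              DATA_SOURCE = LakeRaw,
              FILE_FORMAT = CsvFile);

        -- Reads all the January files in one go
        SELECT COUNT(*) FROM ext.Sales;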

    If the data needs cleansing or transforming, or is not suitable for PolyBase in some way, then you can use Python in Synapse to code the transformations (or use Data Factory).
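
    And if the shaping is simple enough to stay in SQL rather than Python, the usual dedicated pool pattern is a CTAS over the external table, casting and filtering on the way into staging. Again just a sketch, reusing the made-up ext.Sales from above:

        -- Cast and cleanse raw strings while copying into staging
        CREATE TABLE stg.SalesClean
        WITH (DISTRIBUTION = ROUND_ROBIN)
        AS
        SELECT TRY_CONVERT(INT, SaleId)            AS SaleId,
               TRY_CONVERT(DATE, SaleDate)         AS SaleDate,  -- bad dates become NULL
               TRY_CONVERT(DECIMAL(18, 2), Amount) AS Amount
        FROM ext.Sales
        WHERE SaleId IS NOT NULL;  -- drop rows with no key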

     

  • Hi SSCrazy Eights

    Thank you for your reply to my post!

    I think you have answered my question, kind of.

    It's my fault, the way I wrote the original post.

    We commissioned a consultant to build out our 1st environment and they suggested using ADF (https://github.com/microsoft/azure-data-services-go-fast-codebase)

    to load the hundreds of files (more like 900) into the RAW zone on a daily basis. This tool, they said, would load these files without us having to do anything more, then process them into the next stages, for example STAGING and PRES, for the files that need to go into those zones. My concerns are:

    1. Is this the best tool for the job? For example, how easy is it to use?
    2. How much coding is required?
    3. What is the support like?
    4. etc.

    If ADF can do this for me then fine, I don't need the tool; otherwise, what other tools are out there to manage the processing and coding required to transform the data at the later stages?

    Kevin

