SQL rollback issues keep occurring due to lack of drive space

  • My concern with SSIS is overhead, especially if the package runs on the SQL Server itself. You have SQL Server set to consume all available memory, which isn't a best practice; configure it to leave at least 2-3 GB of physical RAM for Windows.
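
    For example (the 13312 MB figure below is only an illustration; size it to leave 2-3 GB or more for the OS and anything else on the box):

        EXEC sys.sp_configure N'show advanced options', 1;
        RECONFIGURE;
        EXEC sys.sp_configure N'max server memory (MB)', 13312;  -- example value only
        RECONFIGURE;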

    Does your client have any monitoring tools such as Spotlight, Idera, or Red Gate? Can you get a growth trend on the database? Also, I think that SQL with a small batch size will be faster than SSIS.
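
    If there is no monitoring history to pull from, one rough way to get a growth trend is the full backup history in msdb (assuming full backups are taken regularly; the database name below is a placeholder):

        SELECT  database_name,
                CONVERT(date, backup_start_date) AS backup_date,
                CAST(backup_size / 1048576.0 AS decimal(18, 2)) AS backup_size_mb
        FROM    msdb.dbo.backupset
        WHERE   type = 'D'                              -- full backups only
          AND   database_name = N'YourWarehouseDB'      -- placeholder name
        ORDER BY backup_start_date;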

    Finally, is the warehouse on the same machine as the OLTP database in production? (:-O)

  • I would run the package from my local machine if I went that route.

    I had two batches in successfully, and three quarters of the way through the third the lead architect decided she would install a service pack. :sick: This kicked everyone out, of course, so now I have data integrity issues. Problems on top of problems. Yay me!

    By the way, I never mentioned that this is a SQL Server 2008 (not R2) machine with 16 GB total RAM. I can see now what you are saying about the memory, John. I will adjust it now.

    And yes, we are using Spotlight. I hadn't used it until I came here, so I will investigate your suggestion and post results back, once the server comes back online of course. :hehe: I would guess the historical data will be gone at that point, unless Spotlight has some mojo that stores it?

    It will be on SP 3 when it is back up.

    Michael

  • Oh, and the answer to your last question is no. I think I would polish up my resume if that were the case. Not sure I shouldn't as it is. 😀

  • mbrady (1/7/2014)


    And yes, we are using Spotlight. I hadn't used it until I came here, so I will investigate your suggestion and post results back, once the server comes back online of course. :hehe: I would guess the historical data will be gone at that point, unless Spotlight has some mojo that stores it?

    Spotlight has a statistics repository. You configure how much history it keeps.

    Thanks

    John.

  • JohnFTamburo (1/7/2014)


    mbrady (1/7/2014)


    EncounterId is not unique as there can be several charges against a single encounter.

    Is there any identity that is unique? You could cluster on a non-unique ID but the non-clustered indexes on the table would be larger.

    I sympathize with your situation.

    Thanks

    John.

    I'm not sure what overhead you are worried about. Open BIDS on your workstation, build your import/export, and run it locally. All the processing is then done on the workstation, and the only overhead on the server is the query processing. Yes, this will probably be slower than running everything on the server; however, the OP cannot get anything to run on the server now, so running it through SSIS and getting it completed would be better than continually getting an error.

    I think the issues he is having are related to the ROW_NUMBER() function and the OUTPUT clause. Using SSIS avoids all the work SQL Server has to do to calculate ROW_NUMBER() (for the batch size) and to manage the OUTPUT into the other table. With SSIS you can use a Multicast transform to send the data to multiple tables, removing the need for OUTPUT.
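
    To make it concrete, the kind of batching pattern being described looks roughly like the sketch below (all table, column, and variable names here are made up); the numbering and the OUTPUT are repeated for every batch, which is where the extra work comes from:

        DECLARE @BatchStart bigint = 1, @BatchEnd bigint = 1000000;

        WITH numbered AS
        (
            SELECT  EncounterId,
                    ChargeAmount,
                    ROW_NUMBER() OVER (ORDER BY EncounterId) AS rn    -- re-numbers the source every pass
            FROM    dbo.SourceCharges                                 -- hypothetical source table
        )
        INSERT INTO dbo.FactCharges (EncounterId, ChargeAmount)       -- hypothetical target table
        OUTPUT  inserted.EncounterId INTO dbo.LoadAudit (EncounterId) -- tracking table fed by OUTPUT
        SELECT  EncounterId, ChargeAmount
        FROM    numbered
        WHERE   rn BETWEEN @BatchStart AND @BatchEnd;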

    As for disabling the NC indexes first and then rebuilding them: I can tell you from experience that the load will run much faster that way than with the NC indexes enabled. If you were performing incremental loads and the amount of data being loaded were not a high percentage of the total rows, then disabling and rebuilding might not be cost effective. It also reduces the amount of transaction log space required and used during the load, but it will still require a lot of space for the index rebuilds.
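
    For reference, the disable/rebuild pattern is just the following (index and table names are hypothetical). Disable the nonclustered indexes individually rather than with ALL, because ALL would also disable the clustered index and take the table offline:

        -- Before the load: disable each nonclustered index
        ALTER INDEX IX_FactCharges_EncounterId  ON dbo.FactCharges DISABLE;
        ALTER INDEX IX_FactCharges_FacilityCode ON dbo.FactCharges DISABLE;

        -- ... run the load ...

        -- After the load: rebuild everything, which also re-enables the disabled indexes
        ALTER INDEX ALL ON dbo.FactCharges REBUILD;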

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I have the same suspicions you do, Jeffrey, that the functions I had to add to get this to "work" are causing the locking. I have gone back to my original process of chunking based on FacilityCode in ~20,000,000-record increments. I know that is very high, but so far, up until the SP was applied without my knowledge, it has worked. I am also not explicitly defining transactions around these batches. I attribute the fact that it's working again to the mid-day reboot as well as keeping an eye on the transaction log.
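
    In rough terms the process is shaped like the sketch below (table and column names are placeholders, and the real batches group facilities to reach roughly 20 million rows); each INSERT runs as its own implicit transaction, so the log only has to carry one chunk at a time:

        DECLARE @FacilityCode varchar(10);

        DECLARE facility_cursor CURSOR LOCAL FAST_FORWARD FOR
            SELECT DISTINCT FacilityCode FROM dbo.SourceCharges;   -- hypothetical source table

        OPEN facility_cursor;
        FETCH NEXT FROM facility_cursor INTO @FacilityCode;

        WHILE @@FETCH_STATUS = 0
        BEGIN
            INSERT INTO dbo.FactCharges (EncounterId, FacilityCode, ChargeAmount)  -- hypothetical target
            SELECT EncounterId, FacilityCode, ChargeAmount
            FROM   dbo.SourceCharges
            WHERE  FacilityCode = @FacilityCode;

            -- a CHECKPOINT or log backup could go here between chunks, depending on the recovery model

            FETCH NEXT FROM facility_cursor INTO @FacilityCode;
        END;

        CLOSE facility_cursor;
        DEALLOCATE facility_cursor;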

    The loads that are done are incremental; however, I don't know what percentage of records the typical load represents. I know from another guy on the team that this particular table has doubled in size over the past year. I will have to get with a couple of colleagues to determine the approximate number of records the load process takes in. I do know that in PROD it is done daily. What would be the percentage tipping point, in your estimation, of cost vs. benefit to disable and rebuild the indexes? 5%? I just did a check in PROD and there are 3 NCIs there vs. the seven or eight in DEV currently. Also, there are over 430 million records in PROD and QA vs. the ~220 million in DEV, so they are pretty far out of sync (about a year). It will be difficult to gauge index performance benefits based on 50% of the actual records unless I assume a linear benefit.

  • I'm at 63 million and climbing. I'll tell you what, though: I'm definitely going to back this DB up once I get past this initial load. If I can find the space. :w00t:
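
    When I do, it will be something along the lines of the sketch below (database name and path are placeholders; COMPRESSION requires Enterprise Edition on plain 2008), which should keep the backup file as small as possible:

        BACKUP DATABASE YourWarehouseDB                       -- placeholder name
        TO DISK = N'X:\Backups\YourWarehouseDB_full.bak'      -- placeholder path
        WITH COMPRESSION, CHECKSUM, STATS = 10;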

  • mbrady (1/7/2014)


    The loads that are done are incremental; however, I don't know what percentage of records the typical load represents. I know from another guy on the team that this particular table has doubled in size over the past year. I will have to get with a couple of colleagues to determine the approximate number of records the load process takes in. I do know that in PROD it is done daily. What would be the percentage tipping point, in your estimation, of cost vs. benefit to disable and rebuild the indexes? 5%? I just did a check in PROD and there are 3 NCIs there vs. the seven or eight in DEV currently. Also, there are over 430 million records in PROD and QA vs. the ~220 million in DEV, so they are pretty far out of sync (about a year). It will be difficult to gauge index performance benefits based on 50% of the actual records unless I assume a linear benefit.

    I believe that you are past the tipping point and should disable/rebuild or drop/recreate the NCIs after the job is done.

    Again, I ask: is there *any* column that represents a unique identity on this table? If not, can you add a column like table_id bigint identity(1,1) not null to the source table, and the same column without the identity specification to the ETL target, without breaking the app or your logs? If you can wriggle that in, you can dramatically speed up your ETL and control index sizes.
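
    In other words, something like the sketch below (table names are placeholders, and note that adding these columns to tables this size is itself a size-of-data operation, so it needs its own maintenance window and log space):

        -- Surrogate key on the source table:
        ALTER TABLE dbo.SourceCharges
            ADD table_id bigint IDENTITY(1,1) NOT NULL;

        -- Matching column on the ETL target, without the identity property:
        ALTER TABLE dbo.FactCharges
            ADD table_id bigint NULL;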

    Thanks

    John.

  • John,

    My question was about the ongoing iterative loads: whether it would be possible to determine a tipping point for the overall scheme once all my changes are implemented. My immediate goal is to populate this dang table. But thinking past that, I was soliciting/suggesting ideas for disabling the indexes for the daily load once they are built and tuned. Sorry for the confusion.

    After the fact table I'm using for testing is populated, I will start working on creating and tuning indexes for the 4 queries that I identified as costly using Gail's process on Simple-Talk. But I was also letting the group know that I have to consider the impact of indexes, as this table is used daily for ETL loads and is heavily queried.
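
    (As a cross-check, not the Simple-Talk process itself, a quick look at the plan cache along the lines below can confirm which statements are the most expensive; this is the standard sys.dm_exec_query_stats query, nothing specific to this system.)

        SELECT TOP (20)
               qs.total_worker_time,
               qs.total_logical_reads,
               qs.execution_count,
               SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                         ((CASE qs.statement_end_offset
                                WHEN -1 THEN DATALENGTH(st.text)
                                ELSE qs.statement_end_offset
                           END - qs.statement_start_offset) / 2) + 1) AS statement_text
        FROM   sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.total_worker_time DESC;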

    thanks,

    Michael

  • A unique identity on the source table will be a moot point once the test table is populated. I would need to get time with the analysts to determine uniqueness, as I am not as familiar with the data as they are, and that time is not easily obtainable.

  • mbrady (1/7/2014)


    The PRIMARY filegroup is set to Autogrow by 10 GB unrestricted on a drive with 770 GB free space remaining.

    Do you have the local security policy "Perform volume maintenance tasks" set for the SQL Server service account?

    If you don't, a 10 GB file growth will take an extended time (the new space has to be zero-initialized), causing the insert process to time out and start rolling back due to lack of free space in the PRIMARY filegroup.
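
    If you want to confirm whether instant file initialization is actually in effect, one common check (throwaway database name below) is to turn on trace flags 3004 and 3605, create a small test database, and look for "Zeroing" messages in the error log. The log file is always zeroed, so a message for the .ldf is expected either way; a zeroing message for the data file means the policy isn't taking effect.

        DBCC TRACEON (3004, 3605, -1);
        CREATE DATABASE IFI_Check;                       -- throwaway test database
        EXEC sys.xp_readerrorlog 0, 1, N'Zeroing';       -- data-file zeroing messages mean IFI is off
        DROP DATABASE IFI_Check;
        DBCC TRACEOFF (3004, 3605, -1);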

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • And down the rabbit hole we go. I am not able to access the Local Security Policy, the Group Policy Editor, SQL Server Configuration Manager, etc. I have confirmed that others are getting the same errors I am, and I am an admin on this box, as are they. I am going to need to reinstall the .NET Framework, MMC, and SQL Server 2008.

  • As an update:

    All the administrative functions that were down are now back up. I spent the weekend researching and repairing. In the end it took a hotfix from M$ and repairing the PATH environment variable, which had been messed up.

    I applied the recommended memory configuration, and I checked whether the service account had the recommended policy setting; it did.

    The test table is populated and I am off and running.

    In one performance tuning effort that is already on its way to PROD, I took a view that ran in approximately 40 minutes and added 2 key indexes. It now runs in under 5 minutes! The ETL process that calls this query multiple times went from running in 4 hours to 1 hour and 6 minutes.
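
    Purely as an illustration of the kind of change, not the actual definitions (those depend on the real columns in the query plans), a typical index of this sort looks like:

        CREATE NONCLUSTERED INDEX IX_FactCharges_FacilityCode
            ON dbo.FactCharges (FacilityCode, EncounterId)    -- hypothetical key columns
            INCLUDE (ChargeAmount);                           -- hypothetical covering column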

    Obviously there is more work to do, but that is not a bad start for spending a few hours analyzing query plans.

    Thank you all very much for your help(s).
