Thursday, October 21, 2010

Importing Large Amounts of Data

The Problem


I have a system which imports large amounts of data overnight. This will normally be in the low millions of rows. If it fails for a few nights it can then be tens of millions. The problem is space – I don't have enough drive space to allow the transaction log to grow large enough to accommodate more than two million rows of data.


The Solution


Create a variable called @loopRowCounter.

Create another called @loopCount (Int32 works for both).

Use an Execute SQL Task to populate @loopRowCounter with the number of rows in the data source.

You can do this as follows:

Under General:

The SQL command should start like this – SELECT COUNT(*) AS loopRowCount FROM …

Set ResultSet to Single row.

Under Result Set:

Set the result name to loopRowCount and map it to your variable – User::loopRowCounter.

Now you have a variable holding the number of rows in the source.
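For example, the count query might look like this – dbo.ImportSource is a placeholder for whatever your source table is called:

SELECT COUNT(*) AS loopRowCount   -- alias must match the result name above
FROM dbo.ImportSource;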

Move on to a For Loop container.

In InitExpression enter @loopCount = 2000002 – this just seeds a value greater than zero so the first pass runs.

In EvalExpression enter @loopCount > 0 – the loop keeps running while this is true.

In AssignExpression enter @loopCount = @loopRowCounter – inside the container @loopRowCounter will be decremented, and each pass of the loop picks up the new value.

Now use a Data Flow Task, and in the data source use SELECT TOP (500000) …

You will need to check that the rows in the source are not already in the destination – I do a join against the destination table and compare IDs (one way to write this is sketched below).

This will limit each pass of the data flow to 500000 rows.
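A minimal sketch of that source query, assuming dbo.ImportSource and dbo.ImportDestination as placeholder table names and id as the comparable key:

SELECT TOP (500000) s.*
FROM dbo.ImportSource AS s
LEFT JOIN dbo.ImportDestination AS d
    ON d.id = s.id
WHERE d.id IS NULL;   -- only rows not already in the destination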

Next, go to an Execute SQL Task.

Under Parameter Mapping:

Select the variable name User::loopRowCounter, give the parameter a name, and set its data type to LONG with a size of 4.
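Assuming an OLE DB connection (ADO.NET connections name their parameters differently), the mapping row ends up looking like this, with the parameter named by its ordinal position:

Variable Name: User::loopRowCounter
Direction: Input
Data Type: LONG
Parameter Name: 0
Parameter Size: 4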

Under General:

The SQL command should look like this:

checkpoint;

SELECT ? - 500000 AS loopRowCount

Set ResultSet to Single row.

Under Result Set:

Set the result name to loopRowCount and map it to your variable – User::loopRowCounter.
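Assembled with comments on what each piece does – note the assumption that the database is in the simple recovery model, which is what lets the checkpoint free up log space:

-- Flush dirty pages; under the simple recovery model this lets
-- SQL Server truncate the inactive portion of the transaction log.
CHECKPOINT;

-- ? is the mapped parameter carrying User::loopRowCounter;
-- the single-row result is written back to the variable.
SELECT ? - 500000 AS loopRowCount;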

We know how many rows were in the table, and we assigned that value to User::loopRowCounter. We know we have just moved 500000 out, so we subtract 500000 from the parameter, which represents User::loopRowCounter, and the result is then assigned back to User::loopRowCounter.
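As a quick worked example (the row count is made up): suppose the source holds 1,200,000 rows. @loopRowCounter starts at 1,200,000 and drops to 700,000 after the first pass, then 200,000, then -300,000. At that point @loopCount is assigned -300,000, the EvalExpression @loopCount > 0 fails, and the loop exits.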

Now, no matter how many millions of records go through, my log file only has to cope with 500000 rows at a time.
