Wednesday, March 7, 2012

Question about BCP

I have a data file that I am trying to BCP IN into a table. The table has an IDENTITY column as PK.

If the BCP process fails in between (there would be about 500,000 records in the data file) and I restart the process, would the records be overwritten into the table or deleted and re-isnerted if the record already exists? I noticed that it does not create duplicates. So either its over writing the existing records or ignoring them and inserting the new records. I did not find any documenttion regarding this in BOL.

Thanks.

I did a test, found that BCP always tries to append data from file to table. You can use-bbatch_size to specifies the number of rows per batch of data copied. Each batch is copied to the server as one transaction. SQL Server commits or rolls back, in the case of failure, the transaction for every batch. By default, all data in the specified data file is copied in one batch.

And there is a -E switch, which decides whether using IDENTITY values from file, or generate a new unique IDENTITY. So when you use BCP with -E option to import data from file that may bring duplicate key values, an error will be raised saying vilation primary key constraint. You can take a look at this SQL SDK article:

http://msdn.microsoft.com/library/en-us/coprompt/cp_bcp_61et.asp?frame=true

|||

I do use the -E option. I have a stored proc that I run from multiple Query Analyzer windows parallely to transfer the data faster.

Here's how the command builds up:

SET @.str = 'bcp ' + @.db + '.dbo.' + @.table + ' in "' + @.Fileloc + '" -f"C:\mount\backup21\DocPhrase.fmt" -S' + @.server + ' -T -E '

When I run multiple processes, occassionally one of them fails with locking issue. So I re-run the stored proc. I noticed it doesnt complain about existing records. And there are no duplicates too if I run the bcp in multiple times. So I was wondering whether BCP ignores if the record already exists in the table or overwrites it.

|||

ndinakar:

When I run multiple processes, occassionally one of them fails with locking issue. So I re-run the stored proc. I noticed it doesnt complain about existing records. And there are no duplicates too if I run the bcp in multiple times. So I was wondering whether BCP ignores if the record already exists in the table or overwrites it.

Really strange, in my testing BCP always tried to append rows, not ignore, nor overwritel; and if run same bcp multiple times with -E option, the duplicates vilation error will be raised. I use Profiler to trace SQL server, and found actually BCP calls 'insert bulk' command, not update.

Are you sure your identify column is primary key of the table? How about add a hint to the bcp command as:

-h CHECK_CONSTRAINT

Or you canget the Profiler trace to see what happens why the BCP runs to your SQL server.

|||

Yes you are right. It does throw an error if I try to BCP in the same file again if the BCP in was successful the first time.

I believe what was happening in mi situation was, since the entire bcp in is treated as one transaction, if for some locking reason it fails none of the records are inserted. That is why when I re run the process, it does not complain and nicely inserts the records. The records were never in in the first place.

|||It should be as you say, something related to sql transaction in BCP. And there is a-bbatch_size option that can be used to specifies the number of rows per batch of data copied. Each batch is copied to the server as one transaction. SQL Server commits or rolls back, in the case of failure, the transaction for every batch.

No comments:

Post a Comment