I have a large database of several hundred gig and I need to perform monthly
updates of about 4 gigs of that data. However I have to test to check if the
update records exists in the current database so that I may either perform
an update or an insert. I've tried all kind of different ways for the
update, but the fastest appears to be to delete all records in the current
database that exist in the update and then do everything as an insert. This
still takes 30 hours to complete. I need to know if there are any tricks or
tips for updating large databases more quickly.
TIAThere is probably 1001 tips, but here are the two I know.
I Remove all foreign keys before performing the insert /
update, if you need to know why then replay, nb I have not
included Primary key as you will need it to check if
record exists ;)
The second thing is to perform the update on the same
server, taking out the network. What I mean is this if 4gb
on server B is to copied onto 5gb on Server A, then it
will take less time if you write it to a file on server B,
compress it, send it Server A, uncompress it, load it into
a temporary table then perform the SQL.
Ok I now have a stupid question, did you try the
INSERT into TABLE Values (A B C) WHERE PrimaryKey not in
(SELECT PrimaryKey from TEMPTABLE) ?
Hope this helps
Peter
>--Original Message--
>I have a large database of several hundred gig and I need
to perform monthly
>updates of about 4 gigs of that data. However I have to
test to check if the
>update records exists in the current database so that I
may either perform
>an update or an insert. I've tried all kind of different
ways for the
>update, but the fastest appears to be to delete all
records in the current
>database that exist in the update and then do everything
as an insert. This
>still takes 30 hours to complete. I need to know if there
are any tricks or
>tips for updating large databases more quickly.
>TIA
>
>.
>|||The Update resides in the same database as a separate Update table, I did
not remove the indexes from the Primary table before attempting the update,
but will try that, leaving the PK as the ony key on the table. I've tried
the INSERT into TABLE Values (A B C) WHERE PrimaryKey not in
(SELECT PrimaryKey from TEMPTABLE) ?, but it was slower than deleting all
the records that existing in the Primary table that don't existing in the
TempTable, then doing just an insert of all data from the TEMPTABLE. So
basically:
DELETE PrimaryTable WHERE PrimaryKey IN(SELECT PrimaryKey FROM TempTable)
INSERT INTO PrimaryTable SELECT * FROM TempTable
has been the fastest.
I've tried
DELETE PrimaryTable FROM TempTable WHERE
TempTable.PrimaryKey=PrimaryTable.PrimaryKey
but this is really slow. Also NOT EXISTS, etc.
I think the issue is that it is recomputing the indexes as the query runs.
It will probably be worth dropping the indexes and reapplying them after the
update. I do this on BULK INSERT routines.
"Peter" <anonymous@.discussions.microsoft.com> wrote in message
news:0ac001c46e75$c58c1ee0$a301280a@.phx.gbl...
> There is probably 1001 tips, but here are the two I know.
> I Remove all foreign keys before performing the insert /
> update, if you need to know why then replay, nb I have not
> included Primary key as you will need it to check if
> record exists ;)
> The second thing is to perform the update on the same
> server, taking out the network. What I mean is this if 4gb
> on server B is to copied onto 5gb on Server A, then it
> will take less time if you write it to a file on server B,
> compress it, send it Server A, uncompress it, load it into
> a temporary table then perform the SQL.
> Ok I now have a stupid question, did you try the
> INSERT into TABLE Values (A B C) WHERE PrimaryKey not in
> (SELECT PrimaryKey from TEMPTABLE) ?
> Hope this helps
> Peter
>
> >--Original Message--
> >I have a large database of several hundred gig and I need
> to perform monthly
> >updates of about 4 gigs of that data. However I have to
> test to check if the
> >update records exists in the current database so that I
> may either perform
> >an update or an insert. I've tried all kind of different
> ways for the
> >update, but the fastest appears to be to delete all
> records in the current
> >database that exist in the update and then do everything
> as an insert. This
> >still takes 30 hours to complete. I need to know if there
> are any tricks or
> >tips for updating large databases more quickly.
> >
> >TIA
> >
> >
> >.
> >|||In cases like this I usually separate out the Inserts from the Updates up
front by either placing them in separate staging tables or by a flag in the
existing single staging table. I usually determine this with an EXISTS type
statement. But when it comes down to any updating or Inserting you need to
do them in smaller batches. Trying to update 4GB at a time will take
forever as you are painfully aware. If you do them in smaller batches of
say 10 or 20K at a time you will usually find a much faster overall time.
Doing the updates in order of the clustered indexes usually helps. By that I
mean if you are updating a lot of rows and they are lumped together by the
CI expression the database can do partial scans instead of millions of
seeks.
--
Andrew J. Kelly SQL MVP
"DWinter" <dwinter@.attbi.com> wrote in message
news:eT2ytjnbEHA.404@.TK2MSFTNGP10.phx.gbl...
> The Update resides in the same database as a separate Update table, I did
> not remove the indexes from the Primary table before attempting the
update,
> but will try that, leaving the PK as the ony key on the table. I've tried
> the INSERT into TABLE Values (A B C) WHERE PrimaryKey not in
> (SELECT PrimaryKey from TEMPTABLE) ?, but it was slower than deleting
all
> the records that existing in the Primary table that don't existing in the
> TempTable, then doing just an insert of all data from the TEMPTABLE. So
> basically:
> DELETE PrimaryTable WHERE PrimaryKey IN(SELECT PrimaryKey FROM TempTable)
> INSERT INTO PrimaryTable SELECT * FROM TempTable
> has been the fastest.
> I've tried
> DELETE PrimaryTable FROM TempTable WHERE
> TempTable.PrimaryKey=PrimaryTable.PrimaryKey
> but this is really slow. Also NOT EXISTS, etc.
> I think the issue is that it is recomputing the indexes as the query runs.
> It will probably be worth dropping the indexes and reapplying them after
the
> update. I do this on BULK INSERT routines.
> "Peter" <anonymous@.discussions.microsoft.com> wrote in message
> news:0ac001c46e75$c58c1ee0$a301280a@.phx.gbl...
> > There is probably 1001 tips, but here are the two I know.
> >
> > I Remove all foreign keys before performing the insert /
> > update, if you need to know why then replay, nb I have not
> > included Primary key as you will need it to check if
> > record exists ;)
> >
> > The second thing is to perform the update on the same
> > server, taking out the network. What I mean is this if 4gb
> > on server B is to copied onto 5gb on Server A, then it
> > will take less time if you write it to a file on server B,
> > compress it, send it Server A, uncompress it, load it into
> > a temporary table then perform the SQL.
> >
> > Ok I now have a stupid question, did you try the
> >
> > INSERT into TABLE Values (A B C) WHERE PrimaryKey not in
> > (SELECT PrimaryKey from TEMPTABLE) ?
> >
> > Hope this helps
> > Peter
> >
> >
> > >--Original Message--
> > >I have a large database of several hundred gig and I need
> > to perform monthly
> > >updates of about 4 gigs of that data. However I have to
> > test to check if the
> > >update records exists in the current database so that I
> > may either perform
> > >an update or an insert. I've tried all kind of different
> > ways for the
> > >update, but the fastest appears to be to delete all
> > records in the current
> > >database that exist in the update and then do everything
> > as an insert. This
> > >still takes 30 hours to complete. I need to know if there
> > are any tricks or
> > >tips for updating large databases more quickly.
> > >
> > >TIA
> > >
> > >
> > >.
> > >
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment