Showing posts with label million. Show all posts
Showing posts with label million. Show all posts

Monday, March 12, 2012

fastest way to deduplicate a list

Im trying to dedupe a table with only one field on it. The table has
40 million records in it. What is the fastest way?

1) create a table with a unque constraint on it insert into that
table?

2) create a table without a unique constraint on it and use insert
into table select distinct un from table2?

3) another way?

MichaelMichael Evanchik (mre224@.yahoo.com) writes:

Quote:

Originally Posted by

Im trying to dedupe a table with only one field on it. The table has
40 million records in it. What is the fastest way?
>
1) create a table with a unque constraint on it insert into that
table?


I assume that you would use the IGNORE_DUP_KEY option? Else the scheme
wouldn't work. That could very well be the fastest method.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Friday, March 9, 2012

Fastest method for Inserting 1 million records into SQL Database

I am reading a text file and modifing the data to match fields in a SQL 2000 Database then inserting the record in. I am using vb.net and have tried various methods but all are to slow. I would appreciate any help anybody could offer.
Have you tried BCP or BULK INSERT? It sounds like you're inserting the data
row-by-row using VB.NET; perhaps you can get away with a BULK INSERT using a
format file to tweak the data? It would probably also be faster if you BULK
INSERT the data in its "raw" form into a temporary table and then move it to
its final destination using INSERT ... SELECT, and do any data modifications
necessary in the SELECT.
"BradC" <BradC@.discussions.microsoft.com> wrote in message
news:F2AA271E-A8B7-448F-83BC-900EFFB679E8@.microsoft.com...
> I am reading a text file and modifing the data to match fields in a SQL
2000 Database then inserting the record in. I am using vb.net and have tried
various methods but all are to slow. I would appreciate any help anybody
could offer.

Fastest method for Inserting 1 million records into SQL Database

Adam is totally correct, however you get best results by
putting the file on the Server that your SQL Server is
on, that way there isn't going to be a network overhead.
J

>--Original Message--
>I am reading a text file and modifing the data to match
fields in a SQL 2000 Database then inserting the record
in. I am using vb.net and have tried various methods but
all are to slow. I would appreciate any help anybody
could offer.
>.
>
"Julie" <anonymous@.discussions.microsoft.com> wrote in message
news:2d85901c469da$da1d8050$a301280a@.phx.gbl...
> ..and I hope people don't take that as arrogantly as it
> sounds...
The part about network overhead? Yeah, I thought you went way over the
line on that one
Actually, I have no clue what you're talking about!

Fastest method for Inserting 1 million records into SQL Database

Adam is totally correct, however you get best results by
putting the file on the Server that your SQL Server is
on, that way there isn't going to be a network overhead.
J

>--Original Message--
>I am reading a text file and modifing the data to match
fields in a SQL 2000 Database then inserting the record
in. I am using vb.net and have tried various methods but
all are to slow. I would appreciate any help anybody
could offer.
>.
>"Julie" <anonymous@.discussions.microsoft.com> wrote in message
news:2d85901c469da$da1d8050$a301280a@.phx
.gbl...
> ..and I hope people don't take that as arrogantly as it
> sounds...
The part about network overhead? Yeah, I thought you went way over the
line on that one
Actually, I have no clue what you're talking about!

Fastest method for Inserting 1 million records into SQL Database

I am reading a text file and modifing the data to match fields in a SQL 2000
Database then inserting the record in. I am using vb.net and have tried var
ious methods but all are to slow. I would appreciate any help anybody could
offer.Have you tried BCP or BULK INSERT? It sounds like you're inserting the data
row-by-row using VB.NET; perhaps you can get away with a BULK INSERT using a
format file to tweak the data? It would probably also be faster if you BULK
INSERT the data in its "raw" form into a temporary table and then move it to
its final destination using INSERT ... SELECT, and do any data modifications
necessary in the SELECT.
"BradC" <BradC@.discussions.microsoft.com> wrote in message
news:F2AA271E-A8B7-448F-83BC-900EFFB679E8@.microsoft.com...
> I am reading a text file and modifing the data to match fields in a SQL
2000 Database then inserting the record in. I am using vb.net and have tried
various methods but all are to slow. I would appreciate any help anybody
could offer.

Faster Deletion from a production database

I need to delete several million transactions from a production database. I
have tried the SP below but it takes 4 1/2 hours to delete 5,000
transactions. Can someone please look over this and let me know if there is
someway to speed up this process? I can't truncate the table because I stil
l
need the most recent (6 months) worth of transactions. The remaining
transactions date back to 10/21/2003 and have been backed up and now need to
be deleted.
Thanks in advance,
Matt
CREATE PROC dbo.ArchiveProduction
(
@.CutOffDate DATETIME = NULL
)
AS
BEGIN
SET NOCOUNT ON
IF @.CutOffDate IS NULL
BEGIN
SET @.CutOffDate = DATEADD(mm, -6, CURRENT_TIMESTAMP)
END
ELSE
BEGIN
IF @.CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
BEGIN
RAISERROR ('Cannot delete transactions from last three months', 16, 1)
RETURN -1
END
END
BEGIN TRAN
-- Fill a temp table with the transaction id's to delete
DECLARE @.tmp table(TranID int, BatchID int, OutboundFileID int,
FileContentsID int, FileReceiveID int)
INSERT INTO @.tmp
SELECT t.[ID] as TranID, b.[ID] as BatchID, t.OutboundFileID, f.[ID] AS
FileContentsID,
r.[ID] AS FileReceiveID
FROM FileReceive r LEFT OUTER JOIN FileContents f ON r.[ID] =
f.FileReceiveID
LEFT OUTER JOIN Batches b ON f.[ID] = b.FileContentsID
LEFT OUTER JOIN Transactions t ON b.[ID] = t.BatchID
WHERE r.FileFindDate < @.CutOffDate
-- Acknowledgments
DELETE FROM dbo.Acknowledgments
WHERE TransactionID IN
(
Select TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.Acknowledgments',
16, 1)
RETURN -1
END
-- OutboundTranSet
DELETE FROM dbo.OutboundTranSet
FROM dbo.OutboundTranSet o INNER JOIN dbo.OutboundBatch b
ON o.OutboundBatchID = b.[ID] INNER JOIN OutboundFile f
ON f.[ID] = b.FileID INNER JOIN dbo.Transactions t
ON f.[ID] = t.OutboundFileID
WHERE t.OutboundFileID IN
(
SELECT OutboundFileID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.OutboundTranSet',
16, 1)
RETURN -1
END
-- OutboundBatch
DELETE FROM dbo.OutboundBatch
FROM dbo.OutboundBatch o INNER JOIN dbo.OutboundFile f
ON o.FileID = f.[ID] INNER JOIN dbo.Transactions t
ON f.[ID] = t.OutboundFileID
WHERE t.OutboundFileID IN
(
SELECT OutboundFileID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.OutboundBatch',
16, 1)
RETURN -1
END
-- OutboundFile
DELETE FROM dbo.OutboundFile
WHERE [ID] IN
(
SELECT OutboundFileID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.OutboundFile', 16,
1)
RETURN -1
END
-- PayorResponse
DELETE FROM dbo.PayorResponse
WHERE TransactionID IN
(
SELECT TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.PayorResponse',
16, 1)
RETURN -1
END
-- ClaimInfo
DELETE FROM dbo.ClaimInfo
WHERE TransactionID IN
(
SELECT TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.ClaimInfo', 16, 1)
RETURN -1
END
-- Transactions
-- AttachmentImages
DELETE FROM dbo.AttachmentImages
FROM dbo.AttachmentImages a INNER JOIN dbo.AttachmentsToClaims c
ON a.A2CID = c.[ID]
WHERE c.TranID IN
(
SELECT TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.AttachmentImages',
16, 1)
RETURN -1
END
-- AttachmentsToClaims
DELETE FROM dbo.AttachmentsToClaims
WHERE TranID IN
(
SELECT TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from
dbo.AttachmentsToClaims', 16, 1)
RETURN -1
END
-- ClaimsToAttachments
DELETE FROM dbo.ClaimsToAttachments
WHERE TranID IN
(
SELECT TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from
dbo.ClaimsToAttachments', 16, 1)
RETURN -1
END
-- Transactions
DELETE FROM dbo.Transactions
WHERE [ID] IN
(
SELECT TranID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.Transactions', 16,
1)
RETURN -1
END
-- Batches
-- Batches
DELETE FROM dbo.Batches
WHERE [ID] IN
(
SELECT BatchID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.Batches', 16, 1)
RETURN -1
END
-- FileContents
DELETE FROM dbo.FileContents
WHERE [ID] IN
(
SELECT FileContentsID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.FileContents', 16,
1)
RETURN -1
END
-- FileReceive
DELETE FROM dbo.FileReceive
WHERE [ID] IN
(
SELECT FileReceiveID FROM @.tmp
)
IF @.@.ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR ('Error occured while deleting data from dbo.FileReceive', 16, 1)
RETURN -1
END
IF @.@.TRANCOUNT > 0
BEGIN
COMMIT TRAN
RETURN 0
END
END
GO1) Don't use table variables for large amounts of data (more than say 500
rows), use temporary tables.
2) Use joins or EXISTS, not IN.
3) Create indexes on all columns that you join on, including those in the
temporary tables.
4) By the looks of it you can cut out quite a few joins, specially between
the OutboundFile and Transactions tables.
Jacco Schalkwijk
SQL Server MVP
"Matt" <Matt@.discussions.microsoft.com> wrote in message
news:E40D3532-FD70-43BE-A437-0B2168A71C75@.microsoft.com...
>I need to delete several million transactions from a production database.
>I
> have tried the SP below but it takes 4 1/2 hours to delete 5,000
> transactions. Can someone please look over this and let me know if there
> is
> someway to speed up this process? I can't truncate the table because I
> still
> need the most recent (6 months) worth of transactions. The remaining
> transactions date back to 10/21/2003 and have been backed up and now need
> to
> be deleted.
> Thanks in advance,
> Matt
>
> CREATE PROC dbo.ArchiveProduction
> (
> @.CutOffDate DATETIME = NULL
> )
> AS
> BEGIN
> SET NOCOUNT ON
> IF @.CutOffDate IS NULL
> BEGIN
> SET @.CutOffDate = DATEADD(mm, -6, CURRENT_TIMESTAMP)
> END
> ELSE
> BEGIN
> IF @.CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
> BEGIN
> RAISERROR ('Cannot delete transactions from last three months', 16, 1)
> RETURN -1
> END
> END
> BEGIN TRAN
> -- Fill a temp table with the transaction id's to delete
> DECLARE @.tmp table(TranID int, BatchID int, OutboundFileID int,
> FileContentsID int, FileReceiveID int)
> INSERT INTO @.tmp
> SELECT t.[ID] as TranID, b.[ID] as BatchID, t.OutboundFileID, f.[ID] AS
> FileContentsID,
> r.[ID] AS FileReceiveID
> FROM FileReceive r LEFT OUTER JOIN FileContents f ON r.[ID] =
> f.FileReceiveID
> LEFT OUTER JOIN Batches b ON f.[ID] = b.FileContentsID
> LEFT OUTER JOIN Transactions t ON b.[ID] = t.BatchID
> WHERE r.FileFindDate < @.CutOffDate
> -- Acknowledgments
> DELETE FROM dbo.Acknowledgments
> WHERE TransactionID IN
> (
> Select TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Acknowledgments',
> 16, 1)
> RETURN -1
> END
> -- OutboundTranSet
> DELETE FROM dbo.OutboundTranSet
> FROM dbo.OutboundTranSet o INNER JOIN dbo.OutboundBatch b
> ON o.OutboundBatchID = b.[ID] INNER JOIN OutboundFile f
> ON f.[ID] = b.FileID INNER JOIN dbo.Transactions t
> ON f.[ID] = t.OutboundFileID
> WHERE t.OutboundFileID IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundTranSet',
> 16, 1)
> RETURN -1
> END
> -- OutboundBatch
> DELETE FROM dbo.OutboundBatch
> FROM dbo.OutboundBatch o INNER JOIN dbo.OutboundFile f
> ON o.FileID = f.[ID] INNER JOIN dbo.Transactions t
> ON f.[ID] = t.OutboundFileID
> WHERE t.OutboundFileID IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundBatch',
> 16, 1)
> RETURN -1
> END
> -- OutboundFile
> DELETE FROM dbo.OutboundFile
> WHERE [ID] IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundFile', 16,
> 1)
> RETURN -1
> END
> -- PayorResponse
> DELETE FROM dbo.PayorResponse
> WHERE TransactionID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.PayorResponse',
> 16, 1)
> RETURN -1
> END
> -- ClaimInfo
> DELETE FROM dbo.ClaimInfo
> WHERE TransactionID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.ClaimInfo', 16, 1)
> RETURN -1
> END
> -- Transactions
> -- AttachmentImages
> DELETE FROM dbo.AttachmentImages
> FROM dbo.AttachmentImages a INNER JOIN dbo.AttachmentsToClaims c
> ON a.A2CID = c.[ID]
> WHERE c.TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.AttachmentImages',
> 16, 1)
> RETURN -1
> END
> -- AttachmentsToClaims
> DELETE FROM dbo.AttachmentsToClaims
> WHERE TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from
> dbo.AttachmentsToClaims', 16, 1)
> RETURN -1
> END
> -- ClaimsToAttachments
> DELETE FROM dbo.ClaimsToAttachments
> WHERE TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from
> dbo.ClaimsToAttachments', 16, 1)
> RETURN -1
> END
> -- Transactions
> DELETE FROM dbo.Transactions
> WHERE [ID] IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Transactions', 16,
> 1)
> RETURN -1
> END
> -- Batches
> -- Batches
> DELETE FROM dbo.Batches
> WHERE [ID] IN
> (
> SELECT BatchID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Batches', 16, 1)
> RETURN -1
> END
> -- FileContents
> DELETE FROM dbo.FileContents
> WHERE [ID] IN
> (
> SELECT FileContentsID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.FileContents', 16,
> 1)
> RETURN -1
> END
>
> -- FileReceive
> DELETE FROM dbo.FileReceive
> WHERE [ID] IN
> (
> SELECT FileReceiveID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.FileReceive', 16,
> 1)
> RETURN -1
> END
> IF @.@.TRANCOUNT > 0
> BEGIN
> COMMIT TRAN
> RETURN 0
> END
> END
> GO
>|||Thanks, Jacco! I'll give it a shot.
"Jacco Schalkwijk" wrote:

> 1) Don't use table variables for large amounts of data (more than say 500
> rows), use temporary tables.
> 2) Use joins or EXISTS, not IN.
> 3) Create indexes on all columns that you join on, including those in the
> temporary tables.
> 4) By the looks of it you can cut out quite a few joins, specially between
> the OutboundFile and Transactions tables.
> --
> Jacco Schalkwijk
> SQL Server MVP
>
> "Matt" <Matt@.discussions.microsoft.com> wrote in message
> news:E40D3532-FD70-43BE-A437-0B2168A71C75@.microsoft.com...
>
>|||Depending on how many records are in the tmp Table, Your problem *might* be
because there are no indices on that tmp table. WHich means that each delet
e
must do a complete table scan on that table. SO, If there are a chunk of
records in there, I might suggest using four table variables instead, as
follows:
Declare @.TranIDs Table(TId Integer Primary Key Not Null)
Declare @.BatchesIDs Table(BId Integer Primary Key Not Null)
Declare @.FileIDs Table(FId Integer Primary Key Not Null)
Declare @.ContentIDs Table(CId Integer Primary Key Not Null)
Declare @.ReceiveIDs Table(RId Integer Primary Key Not Null)
And then use these tables in all the delete queries instead... This will
allow each delete query to use Indexes on Table variables and may improve
performace substantially..
Also I suggest you put the errhandling at the end, and just "Goto" a named
label... It'll make the SP much cleaner and easier to maintain, as so:
CREATE PROC dbo.ArchiveProduction
@.CutOffDate DATETIME = NULL
AS
BEGIN
SET NOCOUNT ON
Declare @.Msg VarChar(300)
Declare @.Err Integer
Set @.CutOffDate = IsNUll(@.CutOffDate, DATEADD(mm, -6, CURRENT_TIMESTAMP))
Set @.Msg = 'Cannot delete transactions from last three months'
IF @.CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP) Goto ErrHandler
-- ----
Declare @.TranIDs Table(TId Integer Primary Key Not Null)
Declare @.BatchIDs Table(BId Integer Primary Key Not Null)
Declare @.FileIDs Table(FId Integer Primary Key Not Null)
Declare @.ContentIDs Table(CId Integer Primary Key Not Null)
Declare @.ReceiveIDs Table(RId Integer Primary Key Not Null)
BEGIN TRAN
-- Fill a temp table with the transaction id's to delete
INSERT @.ReceiveIDs(RId)
Select Distinct ID FROM FileReceive r
WHERE FileFindDate < @.CutOffDate
-- --
Insert @.ContentIDs(CId)
Select Distinct C.ID From FileContents C
Join @.ReceiveIDs R On R.RID = C.FileReceiveID
-- --
Insert @.BatchIDs(BId)
Select Distinct B.ID From Batches B
Join @.ContentIDs C On C.CId = B.FileContentsID
Insert @.TranIDs (TId)
Select Distinct ID From Transactions
Join @.BatchIDs B On B.Bid = T.ID
/*SELECT t.[ID] as TranID,
b.[ID] BatchID,
t.OutboundFileID,
f.[ID] FileContentsID,
r.[ID] FileReceiveID
FROM FileReceive r
LEFT JOIN FileContents f ON f.FileReceiveID = r.[ID]
LEFT JOIN Batches b ON b.FileContentsID = f.[ID]
LEFT JOIN Transactions t ON t.BatchID = b.[ID]
WHERE r.FileFindDate < @.CutOffDate */
Begin Transaction
-- Acknowledgments
'Error occured while deleting data from dbo.Acknowledgments'
DELETE dbo.Acknowledgments
WHERE TransactionID IN (Select TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- OutboundTranSet
Set @.Msg = 'Error occured while deleting data from dbo.OutboundTranSet'
DELETE dbo.OutboundTranSet
FROM dbo.OutboundTranSet o
JOIN dbo.OutboundBatch b ON o.OutboundBatchID = b.[ID]
JOIN OutboundFile f ON f.[ID] = b.FileID
JOIN dbo.Transactions t ON f.[ID] = t.OutboundFileID
WHERE t.ID IN (Select TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- OutboundBatch
Set @.Msg = 'Error occured while deleting data from dbo.OutboundBatch'
DELETE dbo.OutboundBatch
FROM dbo.OutboundBatch o
JOIN dbo.OutboundFile f ON o.FileID = f.[ID]
JOIN dbo.Transactions t ON f.[ID] = t.OutboundFileID
WHERE t.ID IN (Select TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- OutboundFile
Set @.Msg = 'Error occured while deleting data from dbo.OutboundFile'
DELETE dbo.OutboundFile
From OutboundFile F Join Transactions T
On T.OutboundFileID = F.ID
WHERE T.ID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- PayorResponse
Set @.Msg = 'Error occured while deleting data from dbo.PayorResponse'
DELETE dbo.PayorResponse
WHERE TransactionID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- ClaimInfo
Set @.Msg = 'Error occured while deleting data from dbo.ClaimInfo'
DELETE dbo.ClaimInfo
WHERE TransactionID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- Transactions
-- AttachmentImages
Set @.Msg = 'Error occured while deleting data from dbo.AttachmentImages'
DELETE dbo.AttachmentImages
FROM dbo.AttachmentImages a
JOIN dbo.AttachmentsToClaims c
ON a.A2CID = c.[ID]
WHERE c.TranID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- AttachmentsToClaims
Set @.Msg = 'Error occured while deleting data from
dbo.AttachmentsToClaims'
DELETE FROM dbo.AttachmentsToClaims
WHERE TranID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- ClaimsToAttachments
Set @.Msg = 'Error occured while deleting data from
dbo.ClaimsToAttachments'
DELETE FROM dbo.ClaimsToAttachments
WHERE TranID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- Transactions
Set @.Msg = 'Error occured while deleting data from dbo.Transactions'
DELETE dbo.Transactions
WHERE [ID] IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- Batches
-- Batches
Set @.Msg = 'Error occured while deleting data from dbo.Batches'
DELETE dbo.Batches
WHERE [ID] IN (SELECT BatchID FROM @.BatchIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- FileContents
Set @.Msg = 'Error occured while deleting data from dbo.FileContents'
DELETE dbo.FileContents
WHERE [ID] IN (SELECT FID FROM @.ContentIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- FileReceive
Set @.Msg = 'Error occured while deleting data from dbo.FileReceive'
DELETE dbo.FileReceive
WHERE [ID] IN (SELECT RID FROM @.ReceiveIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
COMMIT TRAN
RETURN (0) -- This is the end of normal processing
-- ***********************************
ErrHandler:
If @.@.TranCount > 0 RollBack Transaction
Raiserror(@.Msg, 16, 1)
If @.Err = 0 Set @.Err = -1
Return(@.Err)|||oh oh, some extra stuff in there I neglected to delete.. Here's correctted
version...
CREATE PROC dbo.ArchiveProduction
@.CutOffDate DATETIME = NULL
AS
BEGIN
SET NOCOUNT ON
Declare @.Msg VarChar(300)
Declare @.Err Integer
Set @.CutOffDate = IsNUll(@.CutOffDate, DATEADD(mm, -6, CURRENT_TIMESTAMP))
Set @.Msg = 'Cannot delete transactions from last three months'
IF @.CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP) Goto ErrHandler
-- ----
Declare @.TranIDs Table(TId Integer Primary Key Not Null)
Declare @.BatchIDs Table(BId Integer Primary Key Not Null)
Declare @.FileIDs Table(FId Integer Primary Key Not Null)
Declare @.ContentIDs Table(CId Integer Primary Key Not Null)
Declare @.ReceiveIDs Table(RId Integer Primary Key Not Null)
INSERT @.ReceiveIDs(RId)
Select Distinct ID FROM FileReceive r
WHERE FileFindDate < @.CutOffDate
-- --
Insert @.ContentIDs(CId)
Select Distinct C.ID From FileContents C
Join @.ReceiveIDs R On R.RID = C.FileReceiveID
-- --
Insert @.BatchIDs(BId)
Select Distinct B.ID From Batches B
Join @.ContentIDs C On C.CId = B.FileContentsID
Insert @.TranIDs (TId)
Select Distinct ID From Transactions
Join @.BatchIDs B On B.Bid = T.ID
Begin Transaction
-- Acknowledgments
'Error occured while deleting data from dbo.Acknowledgments'
DELETE dbo.Acknowledgments
WHERE TransactionID IN (Select TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- OutboundTranSet
Set @.Msg = 'Error occured while deleting data from dbo.OutboundTranSet'
DELETE dbo.OutboundTranSet
FROM dbo.OutboundTranSet o
JOIN dbo.OutboundBatch b ON o.OutboundBatchID = b.[ID]
JOIN OutboundFile f ON f.[ID] = b.FileID
JOIN dbo.Transactions t ON f.[ID] = t.OutboundFileID
WHERE t.ID IN (Select TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- OutboundBatch
Set @.Msg = 'Error occured while deleting data from dbo.OutboundBatch'
DELETE dbo.OutboundBatch
FROM dbo.OutboundBatch o
JOIN dbo.OutboundFile f ON o.FileID = f.[ID]
JOIN dbo.Transactions t ON f.[ID] = t.OutboundFileID
WHERE t.ID IN (Select TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- OutboundFile
Set @.Msg = 'Error occured while deleting data from dbo.OutboundFile'
DELETE dbo.OutboundFile
From OutboundFile F Join Transactions T
On T.OutboundFileID = F.ID
WHERE T.ID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- PayorResponse
Set @.Msg = 'Error occured while deleting data from dbo.PayorResponse'
DELETE dbo.PayorResponse
WHERE TransactionID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- ClaimInfo
Set @.Msg = 'Error occured while deleting data from dbo.ClaimInfo'
DELETE dbo.ClaimInfo
WHERE TransactionID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- Transactions
-- AttachmentImages
Set @.Msg = 'Error occured while deleting data from dbo.AttachmentImages'
DELETE dbo.AttachmentImages
FROM dbo.AttachmentImages a
JOIN dbo.AttachmentsToClaims c
ON a.A2CID = c.[ID]
WHERE c.TranID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- AttachmentsToClaims
Set @.Msg = 'Error occured while deleting data from
dbo.AttachmentsToClaims'
DELETE FROM dbo.AttachmentsToClaims
WHERE TranID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- ClaimsToAttachments
Set @.Msg = 'Error occured while deleting data from
dbo.ClaimsToAttachments'
DELETE FROM dbo.ClaimsToAttachments
WHERE TranID IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- Transactions
Set @.Msg = 'Error occured while deleting data from dbo.Transactions'
DELETE dbo.Transactions
WHERE [ID] IN (SELECT TID FROM @.TranIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- Batches
-- Batches
Set @.Msg = 'Error occured while deleting data from dbo.Batches'
DELETE dbo.Batches
WHERE [ID] IN (SELECT BatchID FROM @.BatchIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- FileContents
Set @.Msg = 'Error occured while deleting data from dbo.FileContents'
DELETE dbo.FileContents
WHERE [ID] IN (SELECT FID FROM @.ContentIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
-- FileReceive
Set @.Msg = 'Error occured while deleting data from dbo.FileReceive'
DELETE dbo.FileReceive
WHERE [ID] IN (SELECT RID FROM @.ReceiveIDs)
Set @.Err = @.@.Error If @.Err <> 0 Goto ErrHandler
-- ---
COMMIT TRAN
RETURN (0)
-- ***********************************
ErrHandler:
If @.@.TranCount > 0 RollBack Transaction
Raiserror(@.Msg, 16, 1)
If @.Err = 0 Set @.Err = -1
Return(@.Err)|||What is the recovery model on your database? Try setting to simple recovery.
Also, create a loop where you delete only 100,000 records in each iteration
followed by a checkpoint. You will also want to perform a backup just prior
to this.
"Matt" <Matt@.discussions.microsoft.com> wrote in message
news:E40D3532-FD70-43BE-A437-0B2168A71C75@.microsoft.com...
> I need to delete several million transactions from a production database.
I
> have tried the SP below but it takes 4 1/2 hours to delete 5,000
> transactions. Can someone please look over this and let me know if there
is
> someway to speed up this process? I can't truncate the table because I
still
> need the most recent (6 months) worth of transactions. The remaining
> transactions date back to 10/21/2003 and have been backed up and now need
to
> be deleted.
> Thanks in advance,
> Matt
>
> CREATE PROC dbo.ArchiveProduction
> (
> @.CutOffDate DATETIME = NULL
> )
> AS
> BEGIN
> SET NOCOUNT ON
> IF @.CutOffDate IS NULL
> BEGIN
> SET @.CutOffDate = DATEADD(mm, -6, CURRENT_TIMESTAMP)
> END
> ELSE
> BEGIN
> IF @.CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
> BEGIN
> RAISERROR ('Cannot delete transactions from last three months', 16, 1)
> RETURN -1
> END
> END
> BEGIN TRAN
> -- Fill a temp table with the transaction id's to delete
> DECLARE @.tmp table(TranID int, BatchID int, OutboundFileID int,
> FileContentsID int, FileReceiveID int)
> INSERT INTO @.tmp
> SELECT t.[ID] as TranID, b.[ID] as BatchID, t.OutboundFileID, f.[ID] AS
> FileContentsID,
> r.[ID] AS FileReceiveID
> FROM FileReceive r LEFT OUTER JOIN FileContents f ON r.[ID] =
> f.FileReceiveID
> LEFT OUTER JOIN Batches b ON f.[ID] = b.FileContentsID
> LEFT OUTER JOIN Transactions t ON b.[ID] = t.BatchID
> WHERE r.FileFindDate < @.CutOffDate
> -- Acknowledgments
> DELETE FROM dbo.Acknowledgments
> WHERE TransactionID IN
> (
> Select TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Acknowledgments',
> 16, 1)
> RETURN -1
> END
> -- OutboundTranSet
> DELETE FROM dbo.OutboundTranSet
> FROM dbo.OutboundTranSet o INNER JOIN dbo.OutboundBatch b
> ON o.OutboundBatchID = b.[ID] INNER JOIN OutboundFile f
> ON f.[ID] = b.FileID INNER JOIN dbo.Transactions t
> ON f.[ID] = t.OutboundFileID
> WHERE t.OutboundFileID IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundTranSet',
> 16, 1)
> RETURN -1
> END
> -- OutboundBatch
> DELETE FROM dbo.OutboundBatch
> FROM dbo.OutboundBatch o INNER JOIN dbo.OutboundFile f
> ON o.FileID = f.[ID] INNER JOIN dbo.Transactions t
> ON f.[ID] = t.OutboundFileID
> WHERE t.OutboundFileID IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundBatch',
> 16, 1)
> RETURN -1
> END
> -- OutboundFile
> DELETE FROM dbo.OutboundFile
> WHERE [ID] IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundFile', 16,
> 1)
> RETURN -1
> END
> -- PayorResponse
> DELETE FROM dbo.PayorResponse
> WHERE TransactionID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.PayorResponse',
> 16, 1)
> RETURN -1
> END
> -- ClaimInfo
> DELETE FROM dbo.ClaimInfo
> WHERE TransactionID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.ClaimInfo', 16, 1)
> RETURN -1
> END
> -- Transactions
> -- AttachmentImages
> DELETE FROM dbo.AttachmentImages
> FROM dbo.AttachmentImages a INNER JOIN dbo.AttachmentsToClaims c
> ON a.A2CID = c.[ID]
> WHERE c.TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.AttachmentImages',
> 16, 1)
> RETURN -1
> END
> -- AttachmentsToClaims
> DELETE FROM dbo.AttachmentsToClaims
> WHERE TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from
> dbo.AttachmentsToClaims', 16, 1)
> RETURN -1
> END
> -- ClaimsToAttachments
> DELETE FROM dbo.ClaimsToAttachments
> WHERE TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from
> dbo.ClaimsToAttachments', 16, 1)
> RETURN -1
> END
> -- Transactions
> DELETE FROM dbo.Transactions
> WHERE [ID] IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Transactions', 16,
> 1)
> RETURN -1
> END
> -- Batches
> -- Batches
> DELETE FROM dbo.Batches
> WHERE [ID] IN
> (
> SELECT BatchID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Batches', 16, 1)
> RETURN -1
> END
> -- FileContents
> DELETE FROM dbo.FileContents
> WHERE [ID] IN
> (
> SELECT FileContentsID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.FileContents', 16,
> 1)
> RETURN -1
> END
>
> -- FileReceive
> DELETE FROM dbo.FileReceive
> WHERE [ID] IN
> (
> SELECT FileReceiveID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.FileReceive', 16,
1)
> RETURN -1
> END
> IF @.@.TRANCOUNT > 0
> BEGIN
> COMMIT TRAN
> RETURN 0
> END
> END
> GO
>|||Thanks to all of you for your help. With your input the delete now only
takes 2 1/2 minutes to delete 5,000 transactions and all associated items.
"Matt" wrote:

> I need to delete several million transactions from a production database.
I
> have tried the SP below but it takes 4 1/2 hours to delete 5,000
> transactions. Can someone please look over this and let me know if there
is
> someway to speed up this process? I can't truncate the table because I st
ill
> need the most recent (6 months) worth of transactions. The remaining
> transactions date back to 10/21/2003 and have been backed up and now need
to
> be deleted.
> Thanks in advance,
> Matt
>
> CREATE PROC dbo.ArchiveProduction
> (
> @.CutOffDate DATETIME = NULL
> )
> AS
> BEGIN
> SET NOCOUNT ON
> IF @.CutOffDate IS NULL
> BEGIN
> SET @.CutOffDate = DATEADD(mm, -6, CURRENT_TIMESTAMP)
> END
> ELSE
> BEGIN
> IF @.CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
> BEGIN
> RAISERROR ('Cannot delete transactions from last three months', 16, 1)
> RETURN -1
> END
> END
> BEGIN TRAN
> -- Fill a temp table with the transaction id's to delete
> DECLARE @.tmp table(TranID int, BatchID int, OutboundFileID int,
> FileContentsID int, FileReceiveID int)
> INSERT INTO @.tmp
> SELECT t.[ID] as TranID, b.[ID] as BatchID, t.OutboundFileID, f.[ID] AS
> FileContentsID,
> r.[ID] AS FileReceiveID
> FROM FileReceive r LEFT OUTER JOIN FileContents f ON r.[ID] =
> f.FileReceiveID
> LEFT OUTER JOIN Batches b ON f.[ID] = b.FileContentsID
> LEFT OUTER JOIN Transactions t ON b.[ID] = t.BatchID
> WHERE r.FileFindDate < @.CutOffDate
> -- Acknowledgments
> DELETE FROM dbo.Acknowledgments
> WHERE TransactionID IN
> (
> Select TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Acknowledgments'
,
> 16, 1)
> RETURN -1
> END
> -- OutboundTranSet
> DELETE FROM dbo.OutboundTranSet
> FROM dbo.OutboundTranSet o INNER JOIN dbo.OutboundBatch b
> ON o.OutboundBatchID = b.[ID] INNER JOIN OutboundFile f
> ON f.[ID] = b.FileID INNER JOIN dbo.Transactions t
> ON f.[ID] = t.OutboundFileID
> WHERE t.OutboundFileID IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundTranSet'
,
> 16, 1)
> RETURN -1
> END
> -- OutboundBatch
> DELETE FROM dbo.OutboundBatch
> FROM dbo.OutboundBatch o INNER JOIN dbo.OutboundFile f
> ON o.FileID = f.[ID] INNER JOIN dbo.Transactions t
> ON f.[ID] = t.OutboundFileID
> WHERE t.OutboundFileID IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundBatch',
> 16, 1)
> RETURN -1
> END
> -- OutboundFile
> DELETE FROM dbo.OutboundFile
> WHERE [ID] IN
> (
> SELECT OutboundFileID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.OutboundFile', 1
6,
> 1)
> RETURN -1
> END
> -- PayorResponse
> DELETE FROM dbo.PayorResponse
> WHERE TransactionID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.PayorResponse',
> 16, 1)
> RETURN -1
> END
> -- ClaimInfo
> DELETE FROM dbo.ClaimInfo
> WHERE TransactionID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.ClaimInfo', 16,
1)
> RETURN -1
> END
> -- Transactions
> -- AttachmentImages
> DELETE FROM dbo.AttachmentImages
> FROM dbo.AttachmentImages a INNER JOIN dbo.AttachmentsToClaims c
> ON a.A2CID = c.[ID]
> WHERE c.TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.AttachmentImages
',
> 16, 1)
> RETURN -1
> END
> -- AttachmentsToClaims
> DELETE FROM dbo.AttachmentsToClaims
> WHERE TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from
> dbo.AttachmentsToClaims', 16, 1)
> RETURN -1
> END
> -- ClaimsToAttachments
> DELETE FROM dbo.ClaimsToAttachments
> WHERE TranID IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from
> dbo.ClaimsToAttachments', 16, 1)
> RETURN -1
> END
> -- Transactions
> DELETE FROM dbo.Transactions
> WHERE [ID] IN
> (
> SELECT TranID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Transactions', 1
6,
> 1)
> RETURN -1
> END
> -- Batches
> -- Batches
> DELETE FROM dbo.Batches
> WHERE [ID] IN
> (
> SELECT BatchID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.Batches', 16, 1)
> RETURN -1
> END
> -- FileContents
> DELETE FROM dbo.FileContents
> WHERE [ID] IN
> (
> SELECT FileContentsID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.FileContents', 1
6,
> 1)
> RETURN -1
> END
>
> -- FileReceive
> DELETE FROM dbo.FileReceive
> WHERE [ID] IN
> (
> SELECT FileReceiveID FROM @.tmp
> )
> IF @.@.ERROR <> 0
> BEGIN
> ROLLBACK TRAN
> RAISERROR ('Error occured while deleting data from dbo.FileReceive', 16
, 1)
> RETURN -1
> END
> IF @.@.TRANCOUNT > 0
> BEGIN
> COMMIT TRAN
> RETURN 0
> END
> END
> GO
>

Wednesday, March 7, 2012

faster count(*) or alternative

hi,

I have a table with several million rows.
Each row is simply the date and time a certain page was viewed.

eg
page1 1-1-00
page2 2-1-00
page1 16-1-00
page1 17-1-00
page2 19-1-00

I now need to find the most popular pages so I do this...

SELECT TOP 10 COUNT(*) AS mycount FROM tblPageViews
GROUP BY place ORDER BY COUNT(place) DESC

...which gives me the top 10 most viewed pages

big problem - this is getting slower and slower as the table grows and
grows.

what should I do?

is there an alternative?

I think I need all the rows (rather than a simple incrementing
counter) because I might want to say "what was the most popular page
on a certain day or within a certain period"

tia

Timnot entirely sure why my browser posted my original message again!
sorry.

thanks guys for your help so far.
yes I have indexes on the columns

erland - I don't quite get what you are doing with creating and
dropping a table and a view?

Tim|||C10B (tswalton@.gmail.com) writes:

Quote:

Originally Posted by

erland - I don't quite get what you are doing with creating and
dropping a table and a view?


The DROP at the end is just cleanup. The script was meant to show how
you create an indexed view and how to use it.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||I'm still mildly puzzled as to how an indexed view would be quicker to
function than going straight to the table. Surely there's exactly the
same amount of work to be done?
I'll check it out though and let you know if it's any quicker.

Thanks

Tim|||"C10B" <tswalton@.gmail.comwrote in message
news:1172494748.828877.126110@.m58g2000cwm.googlegr oups.com...

Quote:

Originally Posted by

I'm still mildly puzzled as to how an indexed view would be quicker to
function than going straight to the table. Surely there's exactly the
same amount of work to be done?
I'll check it out though and let you know if it's any quicker.
>
Thanks
>
Tim
>


If you want more help, I suggest you give us a complete DDL of your
database, some real example data and perhaps we can do better.

--
Greg Moore
SQL Server DBA Consulting
sql (at) greenms.com http://www.greenms.com|||

Quote:

Originally Posted by

If you want more help, I suggest you give us a complete DDL of your
database, some real example data and perhaps we can do better.


The requirement is simply this...

I have a lot of pages and I want to record when each is visited.
I then need to be able to answer questions like this...

"Which was the most popular page in January 07?"
"Which are the top 10 most viewed pages between 2 and 3 in the
afternoon"

with unlimited segmenting of the data really.

So I thought I would record an entry in a table everytime a page is
viewed, along with the date and time.
Using some simple queries using "count(*)", a where clause and a group
by clause I was answering all the questions.
Trouble is the speed. Several million records (and growing rapidly)
and count(*) gets slow.

Just while I was writing this I had an idea - I could move a finished
month's data to another table, so each month has a table of data. It
might still be a million records though, but at least it wont get
worse and worse over time. This would make some results harder to get
like "most viewed page of all time" but my monthy, hourly, weekly
figures would be quicker.

Any other tips?

Thanks|||C10B (tswalton@.gmail.com) writes:

Quote:

Originally Posted by

I'm still mildly puzzled as to how an indexed view would be quicker to
function than going straight to the table. Surely there's exactly the
same amount of work to be done?


No. An indexed view is materialised, and this particular view has one
row per webpage with the count of visitors. An alternative would be have
a separate table with the counts, and update that table through a trigger.
But why roll our own when SQL Server can do the job for us?

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||"C10B" <tswalton@.gmail.comwrote in message
news:1172502298.291068.60420@.s48g2000cws.googlegro ups.com...

Quote:

Originally Posted by

>

Quote:

Originally Posted by

>If you want more help, I suggest you give us a complete DDL of your
>database, some real example data and perhaps we can do better.


>
>
The requirement is simply this...
>
I have a lot of pages and I want to record when each is visited.
I then need to be able to answer questions like this...
>


Yes, we understand the requirements. That doesn't eliminate the value of a
DDL and sample data.

Quote:

Originally Posted by

"Which was the most popular page in January 07?"
"Which are the top 10 most viewed pages between 2 and 3 in the
afternoon"
>
with unlimited segmenting of the data really.
>
So I thought I would record an entry in a table everytime a page is
viewed, along with the date and time.
Using some simple queries using "count(*)", a where clause and a group
by clause I was answering all the questions.
Trouble is the speed. Several million records (and growing rapidly)
and count(*) gets slow.
>


Define slow?

(my former employer did queries similar to this in subsecond times.)

Quote:

Originally Posted by

Just while I was writing this I had an idea - I could move a finished
month's data to another table, so each month has a table of data. It
might still be a million records though, but at least it wont get
worse and worse over time. This would make some results harder to get
like "most viewed page of all time" but my monthy, hourly, weekly
figures would be quicker.
>
Any other tips?


Yes, post a full DDL.

Quote:

Originally Posted by

>
Thanks
>


--
Greg Moore
SQL Server DBA Consulting
sql (at) greenms.com http://www.greenms.com|||C10B (tswalton@.gmail.com) writes:

Quote:

Originally Posted by

I have a lot of pages and I want to record when each is visited.
I then need to be able to answer questions like this...
>
"Which was the most popular page in January 07?"
"Which are the top 10 most viewed pages between 2 and 3 in the
afternoon"
>
with unlimited segmenting of the data really.
>
So I thought I would record an entry in a table everytime a page is
viewed, along with the date and time.
Using some simple queries using "count(*)", a where clause and a group
by clause I was answering all the questions.
Trouble is the speed. Several million records (and growing rapidly)
and count(*) gets slow.


You did not say anything about wanting to get the counts for a certain
day, so I interpreted your question that you wanted total counts of
the entire table. Whence my suggestion of an indexed view.

If you are not able to relate clearly what you want to do, then the
answers you get will be equally imprecise.

Quote:

Originally Posted by

Just while I was writing this I had an idea - I could move a finished
month's data to another table, so each month has a table of data. It
might still be a million records though, but at least it wont get
worse and worse over time. This would make some results harder to get
like "most viewed page of all time" but my monthy, hourly, weekly
figures would be quicker.


There is no need for that. If you always want to run these count
queries for a certain period, make sure that there is a clustered
index on the datetime column in your table.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

faster count(*) or alternative

hi,

I have a table with several million rows.
Each row is simply the date and time a certain page was viewed.

eg
page1 1-1-00
page2 2-1-00
page1 16-1-00
page1 17-1-00
page2 19-1-00

I now need to find the most popular pages so I do this...

SELECT TOP 10 COUNT(*) AS mycount FROM tblPageViews
GROUP BY place ORDER BY COUNT(place) DESC

...which gives me the top 10 most viewed pages

big problem - this is getting slower and slower as the table grows and
grows.

what should I do?

is there an alternative?

I think I need all the rows (rather than a simple incrementing
counter) because I might want to say "what was the most popular page
on a certain day or within a certain period"

tia

Tim"C10B" <tswalton@.gmail.comwrote in message
news:1172439270.641426.83160@.a75g2000cwd.googlegro ups.com...

Quote:

Originally Posted by

hi,
>
I have a table with several million rows.
Each row is simply the date and time a certain page was viewed.


First question, do you have an index on the table.

That should help.

Quote:

Originally Posted by

>
eg
page1 1-1-00
page2 2-1-00
page1 16-1-00
page1 17-1-00
page2 19-1-00
>
I now need to find the most popular pages so I do this...
>
SELECT TOP 10 COUNT(*) AS mycount FROM tblPageViews
GROUP BY place ORDER BY COUNT(place) DESC
>
...which gives me the top 10 most viewed pages
>
big problem - this is getting slower and slower as the table grows and
grows.
>
what should I do?
>
is there an alternative?
>
I think I need all the rows (rather than a simple incrementing
counter) because I might want to say "what was the most popular page
on a certain day or within a certain period"
>
tia
>
Tim
>


--
Greg Moore
SQL Server DBA Consulting
sql (at) greenms.com http://www.greenms.com|||C10B (tswalton@.gmail.com) writes:

Quote:

Originally Posted by

I have a table with several million rows.
Each row is simply the date and time a certain page was viewed.
>
eg
page1 1-1-00
page2 2-1-00
page1 16-1-00
page1 17-1-00
page2 19-1-00
>
I now need to find the most popular pages so I do this...
>
SELECT TOP 10 COUNT(*) AS mycount FROM tblPageViews
GROUP BY place ORDER BY COUNT(place) DESC
>
...which gives me the top 10 most viewed pages
>
big problem - this is getting slower and slower as the table grows and
grows.
>
what should I do?


A non-clustered index on place would help, as SQL Server then only
would have to scan that index, and not the entire difference. But it
seems that the table has two columns. In such case, the index would not
reduce execution time that much.

A better alternative may be to define an indexed view that maintains
the count:

CREATE TABLE pagehits (pageid varchar(20) NOT NULL,
viewtime datetime NOT NULL,
PRIMARY KEY (pageid, viewtime)
)
go
CREATE VIEW pagecount WITH SCHEMABINDING AS
SELECT pageid, cnt = COUNT_BIG(*)
FROM dbo.pagehits
GROUP BY pageid
go
CREATE UNIQUE CLUSTERED INDEX pagecount_ix ON pagecount(pageid)
go
SELECT TOP 10 pageid, cnt
FROM pagecount WITH (NOEXPAND)
ORDER BY cnt
go
DROP VIEW pagecount
DROP TABLE pagehits

I added the NOEXPAND hint to the query, since it's only on Enterprise
Edition, the optimizer considers indexed views.

Note that this could have effect on performance when writing to the table.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Fastest returning query would be on a separate table that maintains the
count to the granularity you would require in your output (week, day, or
hour). A query that requires more than hourly granularity can still be done
on the source table.

For each new week|day|hour add a row for each page to be tracked. This can
be done dynamically during monitoring but it is better to do it ahead of
time.

Then put an insert trigger on your million row table that will increment the
count on the proper row of the table.

An alternative would be a temp table that contains a subset copy of the
source data for the time range most likely to queried. For example a table
that contains the last 31 days of data, the 32nd day of data
deleted/archived every night.

"C10B" <tswalton@.gmail.comwrote in message
news:1172439270.641426.83160@.a75g2000cwd.googlegro ups.com...

Quote:

Originally Posted by

hi,
>
I have a table with several million rows.
Each row is simply the date and time a certain page was viewed.
>
eg
page1 1-1-00
page2 2-1-00
page1 16-1-00
page1 17-1-00
page2 19-1-00
>
I now need to find the most popular pages so I do this...
>
SELECT TOP 10 COUNT(*) AS mycount FROM tblPageViews
GROUP BY place ORDER BY COUNT(place) DESC
>
...which gives me the top 10 most viewed pages
>
big problem - this is getting slower and slower as the table grows and
grows.
>
what should I do?
>
is there an alternative?
>
I think I need all the rows (rather than a simple incrementing
counter) because I might want to say "what was the most popular page
on a certain day or within a certain period"
>
tia
>
Tim
>

|||erland - you are quite right, I could have worded my first post a
little better

greg - I don't know what a DDL is

russ - thanks|||C10B (tswalton@.gmail.com) writes:

Quote:

Originally Posted by

greg - I don't know what a DDL is


DDL = Data Definition Language. That is, CREATE TABLE and CREATE INDEX and
the like.

If I had had a penny for every time someone says DDL in answer, without the
person asking having no idea what it means, I would be a rich man now.

Quote:

Originally Posted by

russ - thanks


What Russ proposed is the same idea that I proposed, but rather than relying
SQL Server updating an indexed view, he suggested a separate table that you
update through a trigger.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||"Erland Sommarskog" <esquel@.sommarskog.sewrote in message
news:Xns98E480B95D736Yazorman@.127.0.0.1...

Quote:

Originally Posted by

C10B (tswalton@.gmail.com) writes:

Quote:

Originally Posted by

>greg - I don't know what a DDL is


>
DDL = Data Definition Language. That is, CREATE TABLE and CREATE INDEX and
the like.
>
If I had had a penny for every time someone says DDL in answer, without
the
person asking having no idea what it means, I would be a rich man now.
>

Quote:

Originally Posted by

>russ - thanks


>
What Russ proposed is the same idea that I proposed, but rather than
relying
SQL Server updating an indexed view, he suggested a separate table that
you
update through a trigger.
>


Indexed view is probably best, I just like having more control.

Quote:

Originally Posted by

>
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
>
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Fast Way To "Insert Into" a million records?

See the SQL below, on our SQL server this takes about
10min for 50,000 records, and about 3 hours for a million
records. Is there ANYTHING I can do to speed this up?
-Can I allocate DB space ahead of time?
-Can I put a table in to some type of lock mode'
-Is there something better than insert into?
INSERT INTO MaintHist (DebtorID, AssignCollector,
ChangeCollector, DateChanged, TableName, FieldName,
OldValue, NewValue)
SELECT TempUpdate.RecordUniqueValue, CAST
(TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
WHERE TempUpdate.BatchID = 505
Thanks!
Jasonyou can try to Bulk copy it in as this is non-logged. However that may NOT
be the right solution for you.
it's also fairly common to drop indexes before you do huge inserts and then
rebuild them when inserts are complete.
just food for thought
Greg Jackson
PDX, Oregon|||Comments Inline
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"Jason Roozee" <jason@.camcoinc.net> wrote in message
news:16e3c01c448b6$d7c55ca0$a601280a@.phx
.gbl...
> See the SQL below, on our SQL server this takes about
> 10min for 50,000 records, and about 3 hours for a million
> records. Is there ANYTHING I can do to speed this up?
>
Note that your example indicates an insert speed of 5000 records/minute.
That means 200 minutes for 1M rows. 200 minutes = 3.33 hours, so you have
better than linear scaling. This is good.

> -Can I allocate DB space ahead of time?
Yes. Expand the database before adding the records. This will improve
performance some.
> -Can I put a table in to some type of lock mode'
SQL handles this automagically. I doubt you could improve performance with
a locking hint.
> -Is there something better than insert into?
You could try a DTS package to do the transfer, but I am not sure if that
will help.
There is always faster hardware.
>
> INSERT INTO MaintHist (DebtorID, AssignCollector,
> ChangeCollector, DateChanged, TableName, FieldName,
> OldValue, NewValue)
> SELECT TempUpdate.RecordUniqueValue, CAST
> (TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
> GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
> TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
> WHERE TempUpdate.BatchID = 505
> Thanks!
> Jason|||How would a BULK COPY be done?
Jason Roozee

>--Original Message--
>you can try to Bulk copy it in as this is non-logged.
However that may NOT
>be the right solution for you.
>it's also fairly common to drop indexes before you do
huge inserts and then
>rebuild them when inserts are complete.
>
>just food for thought
>
>Greg Jackson
>PDX, Oregon
>
>.
>|||To add some to Geoff's comments. You might try inserting them in smaller
batches if at all possible. When inserting into an existing table it is
usually faster to do insert them in batches of say 10,000 rows vs all 3
million in one transaction. If you have a clustered index on the table
being inserted into you should try to insert them in that order as well.
Andrew J. Kelly
SQL Server MVP
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:O7ZfMlLSEHA.3056@.TK2MSFTNGP11.phx.gbl...
> Comments Inline
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
> I support the Professional Association for SQL Server
> www.sqlpass.org
> "Jason Roozee" <jason@.camcoinc.net> wrote in message
> news:16e3c01c448b6$d7c55ca0$a601280a@.phx
.gbl...
> Note that your example indicates an insert speed of 5000 records/minute.
> That means 200 minutes for 1M rows. 200 minutes = 3.33 hours, so you have
> better than linear scaling. This is good.
>
> Yes. Expand the database before adding the records. This will improve
> performance some.
> SQL handles this automagically. I doubt you could improve performance
with
> a locking hint.
> You could try a DTS package to do the transfer, but I am not sure if that
> will help.
> There is always faster hardware.
>|||look up BCP in Books On Line for all the details.
Commonly used utility for blasting lots of data into sql server.
GAJ|||Im already using Bulk Insert to get the new data in to the
server, but now I need to update data from one table to
another table.
Jsason
>--Original Message--
>look up BCP in Books On Line for all the details.
>Commonly used utility for blasting lots of data into sql
server.
>
>GAJ
>
>.
>|||to do massive updates, you'll probably want to batch the updates into groups
as others have suggested.
Cheers,
GAJ

Fast Way To "Insert Into" a million records?

See the SQL below, on our SQL server this takes about
10min for 50,000 records, and about 3 hours for a million
records. Is there ANYTHING I can do to speed this up?
-Can I allocate DB space ahead of time?
-Can I put a table in to some type of lock mode?
-Is there something better than insert into?
INSERT INTO MaintHist (DebtorID, AssignCollector,
ChangeCollector, DateChanged, TableName, FieldName,
OldValue, NewValue)
SELECT TempUpdate.RecordUniqueValue, CAST
(TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
WHERE TempUpdate.BatchID = 505
Thanks!
Jason
you can try to Bulk copy it in as this is non-logged. However that may NOT
be the right solution for you.
it's also fairly common to drop indexes before you do huge inserts and then
rebuild them when inserts are complete.
just food for thought
Greg Jackson
PDX, Oregon
|||Comments Inline
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"Jason Roozee" <jason@.camcoinc.net> wrote in message
news:16e3c01c448b6$d7c55ca0$a601280a@.phx.gbl...
> See the SQL below, on our SQL server this takes about
> 10min for 50,000 records, and about 3 hours for a million
> records. Is there ANYTHING I can do to speed this up?
>
Note that your example indicates an insert speed of 5000 records/minute.
That means 200 minutes for 1M rows. 200 minutes = 3.33 hours, so you have
better than linear scaling. This is good.

> -Can I allocate DB space ahead of time?
Yes. Expand the database before adding the records. This will improve
performance some.
> -Can I put a table in to some type of lock mode?
SQL handles this automagically. I doubt you could improve performance with
a locking hint.
> -Is there something better than insert into?
You could try a DTS package to do the transfer, but I am not sure if that
will help.
There is always faster hardware.
>
> INSERT INTO MaintHist (DebtorID, AssignCollector,
> ChangeCollector, DateChanged, TableName, FieldName,
> OldValue, NewValue)
> SELECT TempUpdate.RecordUniqueValue, CAST
> (TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
> GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
> TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
> WHERE TempUpdate.BatchID = 505
> Thanks!
> Jason
|||How would a BULK COPY be done?
Jason Roozee

>--Original Message--
>you can try to Bulk copy it in as this is non-logged.
However that may NOT
>be the right solution for you.
>it's also fairly common to drop indexes before you do
huge inserts and then
>rebuild them when inserts are complete.
>
>just food for thought
>
>Greg Jackson
>PDX, Oregon
>
>.
>
|||To add some to Geoff's comments. You might try inserting them in smaller
batches if at all possible. When inserting into an existing table it is
usually faster to do insert them in batches of say 10,000 rows vs all 3
million in one transaction. If you have a clustered index on the table
being inserted into you should try to insert them in that order as well.
Andrew J. Kelly
SQL Server MVP
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:O7ZfMlLSEHA.3056@.TK2MSFTNGP11.phx.gbl...
> Comments Inline
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
> I support the Professional Association for SQL Server
> www.sqlpass.org
> "Jason Roozee" <jason@.camcoinc.net> wrote in message
> news:16e3c01c448b6$d7c55ca0$a601280a@.phx.gbl...
> Note that your example indicates an insert speed of 5000 records/minute.
> That means 200 minutes for 1M rows. 200 minutes = 3.33 hours, so you have
> better than linear scaling. This is good.
> Yes. Expand the database before adding the records. This will improve
> performance some.
> SQL handles this automagically. I doubt you could improve performance
with
> a locking hint.
> You could try a DTS package to do the transfer, but I am not sure if that
> will help.
> There is always faster hardware.
>
|||look up BCP in Books On Line for all the details.
Commonly used utility for blasting lots of data into sql server.
GAJ
|||Im already using Bulk Insert to get the new data in to the
server, but now I need to update data from one table to
another table.
Jsason
>--Original Message--
>look up BCP in Books On Line for all the details.
>Commonly used utility for blasting lots of data into sql
server.
>
>GAJ
>
>.
>
|||to do massive updates, you'll probably want to batch the updates into groups
as others have suggested.
Cheers,
GAJ

Fast Way To "Insert Into" a million records?

See the SQL below, on our SQL server this takes about
10min for 50,000 records, and about 3 hours for a million
records. Is there ANYTHING I can do to speed this up?
-Can I allocate DB space ahead of time?
-Can I put a table in to some type of lock mode'
-Is there something better than insert into?
INSERT INTO MaintHist (DebtorID, AssignCollector,
ChangeCollector, DateChanged, TableName, FieldName,
OldValue, NewValue)
SELECT TempUpdate.RecordUniqueValue, CAST
(TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
WHERE TempUpdate.BatchID = 505
Thanks!
Jasonyou can try to Bulk copy it in as this is non-logged. However that may NOT
be the right solution for you.
it's also fairly common to drop indexes before you do huge inserts and then
rebuild them when inserts are complete.
just food for thought
Greg Jackson
PDX, Oregon|||Comments Inline
--
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"Jason Roozee" <jason@.camcoinc.net> wrote in message
news:16e3c01c448b6$d7c55ca0$a601280a@.phx.gbl...
> See the SQL below, on our SQL server this takes about
> 10min for 50,000 records, and about 3 hours for a million
> records. Is there ANYTHING I can do to speed this up?
>
Note that your example indicates an insert speed of 5000 records/minute.
That means 200 minutes for 1M rows. 200 minutes = 3.33 hours, so you have
better than linear scaling. This is good.
> -Can I allocate DB space ahead of time?
Yes. Expand the database before adding the records. This will improve
performance some.
> -Can I put a table in to some type of lock mode'
SQL handles this automagically. I doubt you could improve performance with
a locking hint.
> -Is there something better than insert into?
You could try a DTS package to do the transfer, but I am not sure if that
will help.
There is always faster hardware. :)
>
> INSERT INTO MaintHist (DebtorID, AssignCollector,
> ChangeCollector, DateChanged, TableName, FieldName,
> OldValue, NewValue)
> SELECT TempUpdate.RecordUniqueValue, CAST
> (TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
> GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
> TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
> WHERE TempUpdate.BatchID = 505
> Thanks!
> Jason|||How would a BULK COPY be done?
Jason Roozee
>--Original Message--
>you can try to Bulk copy it in as this is non-logged.
However that may NOT
>be the right solution for you.
>it's also fairly common to drop indexes before you do
huge inserts and then
>rebuild them when inserts are complete.
>
>just food for thought
>
>Greg Jackson
>PDX, Oregon
>
>.
>|||To add some to Geoff's comments. You might try inserting them in smaller
batches if at all possible. When inserting into an existing table it is
usually faster to do insert them in batches of say 10,000 rows vs all 3
million in one transaction. If you have a clustered index on the table
being inserted into you should try to insert them in that order as well.
--
Andrew J. Kelly
SQL Server MVP
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:O7ZfMlLSEHA.3056@.TK2MSFTNGP11.phx.gbl...
> Comments Inline
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
> I support the Professional Association for SQL Server
> www.sqlpass.org
> "Jason Roozee" <jason@.camcoinc.net> wrote in message
> news:16e3c01c448b6$d7c55ca0$a601280a@.phx.gbl...
> > See the SQL below, on our SQL server this takes about
> > 10min for 50,000 records, and about 3 hours for a million
> > records. Is there ANYTHING I can do to speed this up?
> >
> Note that your example indicates an insert speed of 5000 records/minute.
> That means 200 minutes for 1M rows. 200 minutes = 3.33 hours, so you have
> better than linear scaling. This is good.
> > -Can I allocate DB space ahead of time?
> Yes. Expand the database before adding the records. This will improve
> performance some.
> > -Can I put a table in to some type of lock mode'
> SQL handles this automagically. I doubt you could improve performance
with
> a locking hint.
> > -Is there something better than insert into?
> You could try a DTS package to do the transfer, but I am not sure if that
> will help.
> There is always faster hardware. :)
> >
> >
> >
> > INSERT INTO MaintHist (DebtorID, AssignCollector,
> > ChangeCollector, DateChanged, TableName, FieldName,
> > OldValue, NewValue)
> >
> > SELECT TempUpdate.RecordUniqueValue, CAST
> > (TempUpdate.CurrentCollectorID AS VarChar(4)), 'BATC',
> > GetDate(), TempUpdate.TableName, TempUpdate.FieldName,
> > TempUpdate.OldValue, TempUpdate.NewValue FROM TempUpdate
> > WHERE TempUpdate.BatchID = 505
> >
> > Thanks!
> > Jason
>|||look up BCP in Books On Line for all the details.
Commonly used utility for blasting lots of data into sql server.
GAJ|||Im already using Bulk Insert to get the new data in to the
server, but now I need to update data from one table to
another table.
Jsason
>--Original Message--
>look up BCP in Books On Line for all the details.
>Commonly used utility for blasting lots of data into sql
server.
>
>GAJ
>
>.
>|||to do massive updates, you'll probably want to batch the updates into groups
as others have suggested.
Cheers,
GAJ

Fast insert and select at the same time?

I have two tables: Account and AccountTransction. Each table contains more
than 20 million records. It one-to-many relationship between account and
AccountTransaction. The new records are constantly loading into each table
through text file by using DTS.
Question:
My team member insists that using cursor to insert record one by one to the
table to avoid affect (lock) the selection on these tables. . There are tons
of indexes on both tables for fast searching. The insertion process is
extremely slow. I recommended batch mode insertion, instead of one by one
using cursor. It is much more faster and efficient in terms of insertion,
but the selection while insertion going on is a little bit slower. What is
your suggestion? How can I achieve the fast insertion and fast selection at
the same time'
Is cursor alway a bad idea in terms of speed and performance?
Thanks a lot,
FlxI would never use a cursor to insert one row of data one by one...
Your colleague is smart to have worries about contention. However... it's
quite easy to do this in a safe manner. My standard technique for manaing a
situation like this is to:
* insert N number of rows per batch through an insert into stmt
* N is tested to ensure
- the insert happens fast enough to have a negligible impact on blocks
for selects
- durtion between batch inserts is long enough to ensure we're not
having a constant impact and quueses aren't growing
- but N is large enough to ensure I can insert enough records fast
enough such that the insert process isn't horrible slow.
I've been able to achieve VERY high insert and select throughput using
techniques like that...
--
Brian Moran
Principal Mentor
Solid Quality Learning
SQL Server MVP
http://www.solidqualitylearning.com
"FLX" <nospam@.hotmail.com> wrote in message
news:e5rEoTiAEHA.3004@.TK2MSFTNGP10.phx.gbl...
> I have two tables: Account and AccountTransction. Each table contains more
> than 20 million records. It one-to-many relationship between account and
> AccountTransaction. The new records are constantly loading into each table
> through text file by using DTS.
> Question:
> My team member insists that using cursor to insert record one by one to
the
> table to avoid affect (lock) the selection on these tables. . There are
tons
> of indexes on both tables for fast searching. The insertion process is
> extremely slow. I recommended batch mode insertion, instead of one by one
> using cursor. It is much more faster and efficient in terms of insertion,
> but the selection while insertion going on is a little bit slower. What is
> your suggestion? How can I achieve the fast insertion and fast selection
at
> the same time'
> Is cursor alway a bad idea in terms of speed and performance?
> Thanks a lot,
> Flx
>
>
>
>
>
>
>
>
>
>
>|||Here's an out-of-the-box idea.
Create new tables, same schema, so you have pairs to tables.
These new tables are for 'todays' data.
Create views to cover the pairs of tables. These are what your application/users look at / use.
The DTS populates the 'today' tables.
Once a day, at some light / quite period, stop the DTS. Copy / Move 'todays' data into the oringal, large table
Don't index the 'todays' tables, as they will (hopefully) be small enough to not need them. Or add only essential indexes.
Or only do this for the AccountTransaction table, and use the current method for the Account table.
If possible, you might want to drop the main table indexes just before you load the data from the 'todays' tables.
It depends, of course, on how quite your quite period will be.

Fast insert and select at the same time?

I have two tables: Account and AccountTransction. Each table contains more
than 20 million records. It one-to-many relationship between account and
AccountTransaction. The new records are constantly loading into each table
through text file by using DTS.
Question:
My team member insists that using cursor to insert record one by one to the
table to avoid affect (lock) the selection on these tables. . There are tons
of indexes on both tables for fast searching. The insertion process is
extremely slow. I recommended batch mode insertion, instead of one by one
using cursor. It is much more faster and efficient in terms of insertion,
but the selection while insertion going on is a little bit slower. What is
your suggestion? How can I achieve the fast insertion and fast selection at
the same time'
Is cursor alway a bad idea in terms of speed and performance?
Thanks a lot,
FlxI would never use a cursor to insert one row of data one by one...
Your colleague is smart to have worries about contention. However... it's
quite easy to do this in a safe manner. My standard technique for manaing a
situation like this is to:
* insert N number of rows per batch through an insert into stmt
* N is tested to ensure
- the insert happens fast enough to have a negligible impact on blocks
for selects
- durtion between batch inserts is long enough to ensure we're not
having a constant impact and quueses aren't growing
- but N is large enough to ensure I can insert enough records fast
enough such that the insert process isn't horrible slow.
I've been able to achieve VERY high insert and select throughput using
techniques like that...
Brian Moran
Principal Mentor
Solid Quality Learning
SQL Server MVP
http://www.solidqualitylearning.com
"FLX" <nospam@.hotmail.com> wrote in message
news:e5rEoTiAEHA.3004@.TK2MSFTNGP10.phx.gbl...
> I have two tables: Account and AccountTransction. Each table contains more
> than 20 million records. It one-to-many relationship between account and
> AccountTransaction. The new records are constantly loading into each table
> through text file by using DTS.
> Question:
> My team member insists that using cursor to insert record one by one to
the
> table to avoid affect (lock) the selection on these tables. . There are
tons
> of indexes on both tables for fast searching. The insertion process is
> extremely slow. I recommended batch mode insertion, instead of one by one
> using cursor. It is much more faster and efficient in terms of insertion,
> but the selection while insertion going on is a little bit slower. What is
> your suggestion? How can I achieve the fast insertion and fast selection
at
> the same time'
> Is cursor alway a bad idea in terms of speed and performance?
> Thanks a lot,
> Flx
>
>
>
>
>
>
>
>
>
>
>|||Here's an out-of-the-box idea.
Create new tables, same schema, so you have pairs to tables.
These new tables are for 'todays' data.
Create views to cover the pairs of tables. These are what your application/u
sers look at / use.
The DTS populates the 'today' tables.
Once a day, at some light / quite period, stop the DTS. Copy / Move 'todays'
data into the oringal, large table
Don't index the 'todays' tables, as they will (hopefully) be small enough to
not need them. Or add only essential indexes.
Or only do this for the AccountTransaction table, and use the current method
for the Account table.
If possible, you might want to drop the main table indexes just before you l
oad the data from the 'todays' tables.
It depends, of course, on how quite your quite period will be.

Sunday, February 26, 2012

Fast copy big table content

Hi all,

need advice on the following task:
copy the content of a big table from DB_A to DB_B in the same server

the size of table:
~ 7 million rows, ~ 9G in size, 1 clustered pk index, 13 nonclustered
index

current practice:
use DTS to copy the data, takes over 20 hours as
-- first had to delete existing data of the table in DB_B
-- then copy
-- all these happen while all indexes are in place.

I am trying to check what is the best or most efficient way to copy
this kind of data and what would
be the expected time for such load.

my machine: SQL 2000 Enterprise, 8-way P4, 12G RAM on a EMC Clarrion
600 SAN.Most efficient way is:

drop table on DB_B (remember that this invalidates inceremenal backups.

Allow bulk inserts, set recovery to simple or bulk insert on DB_B

use DB_B

then do a

select * FROM DB_A.table_source WITH(NOLOCK) INTO table_dest

Then

CREATE INDEXS
ALTER TABLE ADD PRIMARY KEY

There is no faster way;)

DM Unseen|||"New MSSQL DBA" <boscong88@.gmail.com> wrote in message
news:1119515833.641071.147380@.f14g2000cwb.googlegr oups.com...
> Hi all,
> need advice on the following task:
> copy the content of a big table from DB_A to DB_B in the same server
> the size of table:
> ~ 7 million rows, ~ 9G in size, 1 clustered pk index, 13 nonclustered
> index
> current practice:
> use DTS to copy the data, takes over 20 hours as
> -- first had to delete existing data of the table in DB_B
> -- then copy
> -- all these happen while all indexes are in place.

Dropthe indices and use bulk insert or BCP and then rebuild your indices.

I routinely load a table even bigger than that in under 6 hours this way.
(and I really haven't spent much time optimizing the process since we moved
to the new hardware, so I might be able to improve that further.)

> I am trying to check what is the best or most efficient way to copy
> this kind of data and what would
> be the expected time for such load.
> my machine: SQL 2000 Enterprise, 8-way P4, 12G RAM on a EMC Clarrion
> 600 SAN.