Tracked this one down, now I'm sharing:
select name, count(*)
from dbo.TraceTable20091214 tr join sys.trace_events
on trace_event_id = eventclass
group by name
order by 2 desc
sys.trace_events picks up the name of the event_id, which is what gets stored in the profiler trace log.
enjoy!
Tuesday, December 15, 2009
Sunday, October 4, 2009
SQL Saturday #21 in Orlando, 10/17/2009
Hi,
You've seen this post before...
I'm giving my index talks in Orlando again in October... hope to see you there! Send me a note if you want an outline, or better yet, go to
www.sqlsaturday.com
for an outline and to register. Hope to see you there.
Jeff
You've seen this post before...
I'm giving my index talks in Orlando again in October... hope to see you there! Send me a note if you want an outline, or better yet, go to
www.sqlsaturday.com
for an outline and to register. Hope to see you there.
Jeff
Monday, August 3, 2009
Load a 1Tb database in 30 minutes!
Some links deserver their own blog post. This one was passed on to me by a client, it's teriffic, you should read it immediately.
http://technet.microsoft.com/en-us/library/dd537533.aspx
http://technet.microsoft.com/en-us/library/dd537533.aspx
Labels:
data load,
data warehouse,
sql server 2008,
tb,
terabyte
SQL Saturday in Miami
Miami anybody?
I'm speaking at SQL Saturday in Miami on 8/8, go to sqlsaturday.com to register & for details if you haven't already.
I'm speaking at SQL Saturday in Miami on 8/8, go to sqlsaturday.com to register & for details if you haven't already.
Tuesday, March 3, 2009
Concurrency issues during massive updates / archiving
Mass updates / mass deltes often cause contention issue with locking / blocking. One technique I suggest to my students is to update / delete parts of the table at a time, where feasible, so as to avoid embarassing issues like table locks.
The following code could be improved, but it has the right general idea: For each involved table, delete 1000 rows at a time until we're done.
CREATE PROCEDURE [dbo].[DeleteSnapshotData]
@I_lNumDays int
AS
SET NOCOUNT ON
-- get cutoff date string
declare @tablename varchar(50)
declare @dtCutoff smalldatetime
declare @strCutoff varchar(50)
declare @command nvarchar(2000)
set @dtCutoff = getdate() - @I_lNumDays
set @strCutoff = cast(year(@dtCutoff) as varchar) + '-' + right('00' + cast(month(@dtCutoff) as varchar), 2) + '-' + right('00' + cast(day(@dtCutoff) as varchar), 2)
-- build temp table
set rowcount 0
select sysobjects.name into #snapshots from sysobjects join syscolumns on sysobjects.id = syscolumns.id where sysobjects.name like '%snapshot%' and (OBJECTPROPERTY(sysobjects.id, N'IsTable') = 1) and (syscolumns.name = 'tdate')
set rowcount 1
select @tablename = name from #snapshots
while @@rowcount > 0
begin
delete from #snapshots where name = @tablename
if ((@tablename != 'Closing_Charges_Snapshot') and (@tablename != 'Closing_Credits_Snapshot'))
BEGIN
print 'deleting expired data from ' + @tablename + '...'
SET ROWCOUNT 1000
WHILE (1=1)
BEGIN
BEGIN TRANSACTION
SET NOCOUNT OFF
select @command = N'delete from ' + @tablename + ' where tdate < ''' + @strCutoff + ''''
exec (@command)
IF @@ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END
END
SET NOCOUNT ON
set rowcount 1
select @tablename = name from #snapshots
end
drop table #snapshots
The following code could be improved, but it has the right general idea: For each involved table, delete 1000 rows at a time until we're done.
CREATE PROCEDURE [dbo].[DeleteSnapshotData]
@I_lNumDays int
AS
SET NOCOUNT ON
-- get cutoff date string
declare @tablename varchar(50)
declare @dtCutoff smalldatetime
declare @strCutoff varchar(50)
declare @command nvarchar(2000)
set @dtCutoff = getdate() - @I_lNumDays
set @strCutoff = cast(year(@dtCutoff) as varchar) + '-' + right('00' + cast(month(@dtCutoff) as varchar), 2) + '-' + right('00' + cast(day(@dtCutoff) as varchar), 2)
-- build temp table
set rowcount 0
select sysobjects.name into #snapshots from sysobjects join syscolumns on sysobjects.id = syscolumns.id where sysobjects.name like '%snapshot%' and (OBJECTPROPERTY(sysobjects.id, N'IsTable') = 1) and (syscolumns.name = 'tdate')
set rowcount 1
select @tablename = name from #snapshots
while @@rowcount > 0
begin
delete from #snapshots where name = @tablename
if ((@tablename != 'Closing_Charges_Snapshot') and (@tablename != 'Closing_Credits_Snapshot'))
BEGIN
print 'deleting expired data from ' + @tablename + '...'
SET ROWCOUNT 1000
WHILE (1=1)
BEGIN
BEGIN TRANSACTION
SET NOCOUNT OFF
select @command = N'delete from ' + @tablename + ' where tdate < ''' + @strCutoff + ''''
exec (@command)
IF @@ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END
END
SET NOCOUNT ON
set rowcount 1
select @tablename = name from #snapshots
end
drop table #snapshots
Labels:
concurrency,
mass updates,
SQL Server 2005,
sql server 2008
Monday, March 2, 2009
Undocumented system procedure
Critical path for most folks is preventive maintenance, whether they know it or not.
If you're ever looking for a quick way of doing something on all (user) tables in a database, there's a system stored procedure for it.
EXEC sp_MSforeachtable @command1="print '?' DBCC DBREINDEX ('?', ' ', 80)"
This command will run the "dbcc DBREINDEX" command on each table in the database.
Your table-level preventive maintenance regiman, CRUCIAL to performance success, include:
If you're ever looking for a quick way of doing something on all (user) tables in a database, there's a system stored procedure for it.
EXEC sp_MSforeachtable @command1="print '?' DBCC DBREINDEX ('?', ' ', 80)"
This command will run the "dbcc DBREINDEX" command on each table in the database.
Your table-level preventive maintenance regiman, CRUCIAL to performance success, include:
- keeping all table statistics up to date
- Keeping index structures balanced; this may be an indexdefrag (which, while intrusive, can be performed online) or preferably by rebuilding indexes (dbcc DBREINDEX )
- Making sure you have the indexes you need, and remove indexes you don't use (see new managment views for that!)
Thursday, February 26, 2009
SQL Yesterday vs. SQL Today
Quick Tip: Use the ANSI Standard Join syntax in all of your queries.
I've been writing SQL for abotu 20 years now, and have been forcing my hands to type the new code. It's not just a good idea; it's now critical to performance. I've talked to hosts of people who claim that their performance has improved by changing format from the old syntax (where joincol = joincol) to the new syntax (ON joincol = joincol) ... I'm sure the "Why" has been documented / blogged about, so I'll just leave the top & a sample correction here...
Reproduced here with the client's permission:
SELECT
Facilities.code, Facilities.name, Trips.veh, Trips.status, Trips.priority, Trips.tdate,
Trips.dstatus, Facilities.initials, Trips.dispdate, Trips.disptime, Trips.enrdate,
Trips.enrtime, Trips.atsdate, Trips.atstime, Trips.atddate, Trips.atdtime, Trips.tradate,
Trips.tratime, Trips.cmpy, Trips.daddr2, Trips.daddr, Trips.avldate, Trips.avltime,
Trips.IncidentID, Trips.RunNumber, Notes.NoteDate, Notes.Description,
Trips.transpriority
FROM
dbo.Facilities Facilities, dbo.Trips Trips, dbo.TripNotes Tripnotes, dbo.Notes Notes
WHERE
Trips.dfac = Facilities.code AND
Tripnotes.job = Trips.job AND
Notes.code = Tripnotes.note AND
Trips.tdate = Tripnotes.tdate AND
(
Trips.dstatus IN (4,5,6) AND
Trips.tdate >= convert(char(10),getdate()-1,126) AND
Facilities.code <> 1
)
And recommend it be rewritten like this:
SELECT
Facilities.code, Facilities.name, Trips.veh, Trips.status, Trips.priority, Trips.tdate,
Trips.dstatus, Facilities.initials, Trips.dispdate, Trips.disptime, Trips.enrdate,
Trips.enrtime, Trips.atsdate, Trips.atstime, Trips.atddate, Trips.atdtime, Trips.tradate,
Trips.tratime, Trips.cmpy, Trips.daddr2, Trips.daddr, Trips.avldate, Trips.avltime,
Trips.IncidentID, Trips.RunNumber, Notes.NoteDate, Notes.Description,
Trips.transpriority
FROM
dbo.Facilities Facilities join dbo.Trips Trips on
Trips.dfac = Facilities.code JOIN
dbo.TripNotes Tripnotes on
Trips.job = Tripnotes.job AND
Trips.tdate = Tripnotes.tdate JOIN
dbo.Notes Notes ON
Notes.code = Tripnotes.note
WHERE
Trips.dstatus IN (4,5,6) AND
Trips.tdate >= convert(char(10),getdate()-1,126) AND
Facilities.code <> 1
I've been writing SQL for abotu 20 years now, and have been forcing my hands to type the new code. It's not just a good idea; it's now critical to performance. I've talked to hosts of people who claim that their performance has improved by changing format from the old syntax (where joincol = joincol) to the new syntax (ON joincol = joincol) ... I'm sure the "Why" has been documented / blogged about, so I'll just leave the top & a sample correction here...
Reproduced here with the client's permission:
SELECT
Facilities.code, Facilities.name, Trips.veh, Trips.status, Trips.priority, Trips.tdate,
Trips.dstatus, Facilities.initials, Trips.dispdate, Trips.disptime, Trips.enrdate,
Trips.enrtime, Trips.atsdate, Trips.atstime, Trips.atddate, Trips.atdtime, Trips.tradate,
Trips.tratime, Trips.cmpy, Trips.daddr2, Trips.daddr, Trips.avldate, Trips.avltime,
Trips.IncidentID, Trips.RunNumber, Notes.NoteDate, Notes.Description,
Trips.transpriority
FROM
dbo.Facilities Facilities, dbo.Trips Trips, dbo.TripNotes Tripnotes, dbo.Notes Notes
WHERE
Trips.dfac = Facilities.code AND
Tripnotes.job = Trips.job AND
Notes.code = Tripnotes.note AND
Trips.tdate = Tripnotes.tdate AND
(
Trips.dstatus IN (4,5,6) AND
Trips.tdate >= convert(char(10),getdate()-1,126) AND
Facilities.code <> 1
)
And recommend it be rewritten like this:
SELECT
Facilities.code, Facilities.name, Trips.veh, Trips.status, Trips.priority, Trips.tdate,
Trips.dstatus, Facilities.initials, Trips.dispdate, Trips.disptime, Trips.enrdate,
Trips.enrtime, Trips.atsdate, Trips.atstime, Trips.atddate, Trips.atdtime, Trips.tradate,
Trips.tratime, Trips.cmpy, Trips.daddr2, Trips.daddr, Trips.avldate, Trips.avltime,
Trips.IncidentID, Trips.RunNumber, Notes.NoteDate, Notes.Description,
Trips.transpriority
FROM
dbo.Facilities Facilities join dbo.Trips Trips on
Trips.dfac = Facilities.code JOIN
dbo.TripNotes Tripnotes on
Trips.job = Tripnotes.job AND
Trips.tdate = Tripnotes.tdate JOIN
dbo.Notes Notes ON
Notes.code = Tripnotes.note
WHERE
Trips.dstatus IN (4,5,6) AND
Trips.tdate >= convert(char(10),getdate()-1,126) AND
Facilities.code <> 1
Thursday, February 19, 2009
Common Table Expressions (CTE) in SQL Server 2005 & 2008
From 2008 Books Online:
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
The really cool part of this is the self-referencing part.
I'm not going to try to teach you CTE here. Honestly, it's been done too well already. Here's a link to a favorite:
http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-common-table-expressions/
It does reference a few other documents (like I did with books online, above), again so as not to reinvent the wheel.
I wanted to share a couple of limitations in CTEs that I encountered this past week.
Requirement: Unlimited self-referencing / recursion in object definitions.
With the self-referencing objects, the problem that you have is the circular reference. With a CTE, you can limit this by specifying a recursion limit (i.e. "circle no more than X times").
Is this a good idea? Does it make sense to circle multiple passes through a set of objects once you've already collected all of the objects?
We ended up (in SQL 2005) using a temp table implementation (well, really, memory table not temp table). The reason: we needed to be able to say, "grab the next set of child relationships, but ONLY if we haven't already grabbed those relationships." The CTE limitation in sql 2005 doesn't allow mulitple recursive references.
Imagine my delight when I read the above: "...can be referenced multiple times in the same query,,," ... this in and of itself is enough to talk this particular client into going to SQL Server 2008.
Happy CTEs...
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
The really cool part of this is the self-referencing part.
I'm not going to try to teach you CTE here. Honestly, it's been done too well already. Here's a link to a favorite:
http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-common-table-expressions/
It does reference a few other documents (like I did with books online, above), again so as not to reinvent the wheel.
I wanted to share a couple of limitations in CTEs that I encountered this past week.
Requirement: Unlimited self-referencing / recursion in object definitions.
With the self-referencing objects, the problem that you have is the circular reference. With a CTE, you can limit this by specifying a recursion limit (i.e. "circle no more than X times").
Is this a good idea? Does it make sense to circle multiple passes through a set of objects once you've already collected all of the objects?
We ended up (in SQL 2005) using a temp table implementation (well, really, memory table not temp table). The reason: we needed to be able to say, "grab the next set of child relationships, but ONLY if we haven't already grabbed those relationships." The CTE limitation in sql 2005 doesn't allow mulitple recursive references.
Imagine my delight when I read the above: "...can be referenced multiple times in the same query,,," ... this in and of itself is enough to talk this particular client into going to SQL Server 2008.
Happy CTEs...
Tuesday, February 3, 2009
SQL 2008 new features for developers
I was recently asked to outline a "New Features for SQL 2008 for Developers" course... this is not strictly perormance related, and you can certainly get this list from other sources... and if you want detail, go to microsoft.com... and in fact, if you want detail, you'll have to go there... but if you're interested in a concise list, here it is:
SQL Server 2008 New features for developers
3 days
Topics:
- Auditing
- Change tracking
- Resource governor
- Compressed data and indexes
- Filestream storage
- Partition switching
- Sparse columns
- Spatial data storage
- Wide tables
- New data types
- Full text search enhancements
- Compound operators
- Grouping sets
- Merge statement
- SQL Dependency reporting
- Table-values parameters
- T-SQL Row Constructors
- Filtered indexes
- Data encryption enhancements
- LINQ
(For more detail, check out our training web site later, www.soaringeagle.com/training)
SQL Server 2008 New features for developers
3 days
Topics:
- Auditing
- Change tracking
- Resource governor
- Compressed data and indexes
- Filestream storage
- Partition switching
- Sparse columns
- Spatial data storage
- Wide tables
- New data types
- Full text search enhancements
- Compound operators
- Grouping sets
- Merge statement
- SQL Dependency reporting
- Table-values parameters
- T-SQL Row Constructors
- Filtered indexes
- Data encryption enhancements
- LINQ
(For more detail, check out our training web site later, www.soaringeagle.com/training)
Friday, January 30, 2009
Free Webinar Series
Hi,
Please register to join our webinar series, great for anybody who ever does any T-SQL programming, free to anybody who finds my blog, email webinar@soaringeagle.biz to register:
About the Accelerated SQL Performance Series
Noted author, lecturer and consultant Jeffrey Garbus is hosting a 6 week webinar series to help you improve SQL performance. Jeff has spent 20 years helping organizations of all sizes and industries maximize database performance. If you saw him speak at the conference, you already know that your time investment for these sessions will reap rich dividends.
These 30 minute sessions will directly help you address performance issues and give you the tools to get even greater performance out of your design.
Beginning on Tuesday February 10, 2009, at 12:15 PM, the series will run weekly for 6 consecutive weeks at lunchtime.
The series will include:
1) Clustered vs. non-clustered indexes
2) Choosing indexes in a single-table query environment
3) Choosing indexes for join optimization
4) Stored procedures & performance
5) Writing characteristic & determinate functions
6) Table design & performance
Space is limited so register today.
Please register to join our webinar series, great for anybody who ever does any T-SQL programming, free to anybody who finds my blog, email webinar@soaringeagle.biz to register:
About the Accelerated SQL Performance Series
Noted author, lecturer and consultant Jeffrey Garbus is hosting a 6 week webinar series to help you improve SQL performance. Jeff has spent 20 years helping organizations of all sizes and industries maximize database performance. If you saw him speak at the conference, you already know that your time investment for these sessions will reap rich dividends.
These 30 minute sessions will directly help you address performance issues and give you the tools to get even greater performance out of your design.
Beginning on Tuesday February 10, 2009, at 12:15 PM, the series will run weekly for 6 consecutive weeks at lunchtime.
The series will include:
1) Clustered vs. non-clustered indexes
2) Choosing indexes in a single-table query environment
3) Choosing indexes for join optimization
4) Stored procedures & performance
5) Writing characteristic & determinate functions
6) Table design & performance
Space is limited so register today.
Monday, January 26, 2009
Context
One of my favorite quotations from a fictional character comes from “Atlas Shrugged” by Ayn Rand: There is no substitute for competence.
One of the most important things that we need to do as IT professionals is to use our own experience and competence even when taking advice from peers, or those you deem as experts. Even my own advice is sometimes not correct, when taken out of context (You will note if you follow these blog entries that I will attempt a lot of “tongue-in-cheek” humor, please take it that way).
As a public speaker (Users Groups, conventions, SQL Saturdays, etc.) I continually try to challenge those attendees at my talks to rethink some of the decisions they’ve made, based upon facts I give them based upon my own experiences. Frequently, they challenge me back (for me, this is part of the fun).
At a recent talk I gave on indexes, I said something along these lines:
“You may be able to come with some theoretical exceptions to this rule, but from a practical perspective, you should have a clustered index on every table. Without a clustered index, you are creating a hot spot at the tail end of your table’s page chain, (meaning contention when you have multiple insertions, and/or updates which cause rows to remove). In addition, updates which cause rows to move will cause data to be unbalanced towards the tail end of the table’s page chain (as well).”
A point that has come up (on at least two recent occasions) from an audience member was along the following lines:
“ I was reading a paper by {no idea who wrote the paper, if anybody reading this does, please let me know so that I can credit the author), and in it he/she stated that benchmarking has shown that insertions are fastest with NO clustered index on the table because of some benefits of page allocations …”
The first time I heard that, I responded that I’d have to read the article to see what it had to say before responding, but that my recommendations from own extensive practical experience stood. The second time I heard that, something clicked, and I responded with, “Was that a single process performing a data load of some sort, or was that a thousand concurrent processes trying to gain access to a table?”
Context.
If you are trying to load data into a table, your issue isn’t necessarily contention, it is the speed of your disk & corresponding page allocations. If you are trying to give many users access to the table, your issue is going to be both page and index contention, at update time, page split time, etc.
Over time, I’m going to be offering a variety of recommendations, every one of which will be based upon my personal experiences in 20 years of tuning SQL Servers.
Yet, I caution you: everything needs to be taken in context. Test my assertions; make sure that the environments I’m describing actually match your own.
One of the most important things that we need to do as IT professionals is to use our own experience and competence even when taking advice from peers, or those you deem as experts. Even my own advice is sometimes not correct, when taken out of context (You will note if you follow these blog entries that I will attempt a lot of “tongue-in-cheek” humor, please take it that way).
As a public speaker (Users Groups, conventions, SQL Saturdays, etc.) I continually try to challenge those attendees at my talks to rethink some of the decisions they’ve made, based upon facts I give them based upon my own experiences. Frequently, they challenge me back (for me, this is part of the fun).
At a recent talk I gave on indexes, I said something along these lines:
“You may be able to come with some theoretical exceptions to this rule, but from a practical perspective, you should have a clustered index on every table. Without a clustered index, you are creating a hot spot at the tail end of your table’s page chain, (meaning contention when you have multiple insertions, and/or updates which cause rows to remove). In addition, updates which cause rows to move will cause data to be unbalanced towards the tail end of the table’s page chain (as well).”
A point that has come up (on at least two recent occasions) from an audience member was along the following lines:
“ I was reading a paper by {no idea who wrote the paper, if anybody reading this does, please let me know so that I can credit the author), and in it he/she stated that benchmarking has shown that insertions are fastest with NO clustered index on the table because of some benefits of page allocations …”
The first time I heard that, I responded that I’d have to read the article to see what it had to say before responding, but that my recommendations from own extensive practical experience stood. The second time I heard that, something clicked, and I responded with, “Was that a single process performing a data load of some sort, or was that a thousand concurrent processes trying to gain access to a table?”
Context.
If you are trying to load data into a table, your issue isn’t necessarily contention, it is the speed of your disk & corresponding page allocations. If you are trying to give many users access to the table, your issue is going to be both page and index contention, at update time, page split time, etc.
Over time, I’m going to be offering a variety of recommendations, every one of which will be based upon my personal experiences in 20 years of tuning SQL Servers.
Yet, I caution you: everything needs to be taken in context. Test my assertions; make sure that the environments I’m describing actually match your own.
Subscribe to:
Posts (Atom)