We have a pretty large code base of stored procedures (700),
functions (300), and views (200) at our medium-sized company. Why
someone would implement the majority of their business logic in
transact-sql, I don't know, but we are way beyond the point of being
able to prevent that from happening, and we have a few years before we
are out of it.
So, the major thing I find myself griping over as we continue to roll
out new sets of code into the database and maintain the large legacy
codebase, is the lack of a module / package / namespace system for
organizing our stored procedures, functions, and views.
(My other major gripes are: lack of strong array / list support outside
of table variables and temp tables, especially the ability to pass
around table variables as parameters; inability to define custom
aggregation functions (addressed in 2005); inability to create
"on-the-fly" views or table expressions(addressed in 2005); the
arbitrary limit of insert exec's; the inability to use scalar udf's
because of their ugly performance)
So, I want to know about anybody out there who has attempted to fake
modules in some way or another by conventions or anything. Any
particular methods that were useful for meaningfully organizing hundreds
/ thousands of stored procedures?
Right now, i've been using the convention of putting a module name and
an underscore before each function, stored proc, and view. I haven't
written the queries against the information_schema views or made any
handy utilities for browsing the database by parsing these names, but
that's probably next on my list.
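Something along these lines is the kind of untested sketch I have in mind
for that browsing query; it assumes every object follows the
ModuleName_ObjectName convention (views would come the same way from
INFORMATION_SCHEMA.TABLES where TABLE_TYPE = 'VIEW'):

    -- List procedures and functions grouped by their module-name prefix
    SELECT LEFT(ROUTINE_NAME, CHARINDEX('_', ROUTINE_NAME) - 1) AS ModuleName,
           ROUTINE_TYPE,
           ROUTINE_NAME
    FROM INFORMATION_SCHEMA.ROUTINES
    WHERE CHARINDEX('_', ROUTINE_NAME) > 0
    ORDER BY ModuleName, ROUTINE_TYPE, ROUTINE_NAME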
Also, why does this not seem to even be addressed in 2005?
nposter (code4breakfast@.yahoo.com) writes:
> We have a pretty large code base of stored procedures ( 700 ),
> functions ( 300 ) , and views ( 200 ) at our medium-sized company. Why
> someone would implement the majority of their business logic in
> transact-sql , I don't know,
I know. Because T-SQL is where the data is. In our shop, we are strong
believers in having business logic in stored procedures. Actually, one
guy in our shop thought we should move logic up to the middle layer,
and conducted a test where he rewrote a complex stored procedure. The
test was a total fiasco: performance was awful and it did not scale.
And before you ask: we have around 3700 stored procedures.
> So, the major thing I find myself griping over as we continue to roll
> out new sets of code into the database and maintain the large legacy
> codebase, is the lack of a module / package / namespace system for
> organizing our stored procedures, functions, and views.
There is in SQL 2005: schemas. Well, there are schemas in SQL 2000 as well,
but as they are tied to users in SQL 2000, they are useless.
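To illustrate what that looks like in SQL 2005 (the schema, table, and
procedure names here are only examples):

    CREATE SCHEMA Billing AUTHORIZATION dbo
    GO
    CREATE PROCEDURE Billing.GetInvoice @InvoiceID int
    AS
        SELECT InvoiceID, Amount
        FROM Billing.Invoice
        WHERE InvoiceID = @InvoiceID
    GO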
> (My other major gripes are: lack of strong array / list support outside
> of table variables and temp tables, especially the ability to pass
> around table variables as parameters;
Have a look at http://www.sommarskog.se/share_data.html for some tips.
> inability to define custom
> aggregation functions (addressed in 2005); inability to create
> "on-the-fly" views or table expressions(addressed in 2005);
But beware that CTEs are basically macros, and are recalculated each time
they occur in a query.
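For example, in a query like this one (against a hypothetical dbo.Orders
table), the CTE is expanded wherever it is referenced, so the aggregation
may well be evaluated twice:

    WITH OrderTotals AS (
        SELECT CustomerID, SUM(Amount) AS Total
        FROM dbo.Orders
        GROUP BY CustomerID
    )
    SELECT t.CustomerID,
           t.Total,
           t.Total * 1.0 / (SELECT SUM(Total) FROM OrderTotals) AS Share
    FROM OrderTotals AS t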
> the arbitrary limit of insert exec's;
A limit of 1 does not look very arbitrary to me.
> the inability to use scalar udf's because of their ugly performance)
This has improved in SQL 2005, where you also can write UDFs in the
CLR.
To be honest, it seems to me that your main problem is that you are
trying to use T-SQL as if it is a traditional programming language.
It isn't. Yes, T-SQL is a bit poor on general constructs for
modularisation, but just get used to it. Don't try to write in T-SQL
as if it was C++ or Visual Basic.
> So, I want to know about anybody out there who has attempted to fake
> modules in some way or another by conventions or anything. Any
> particular methods that were useful for meaningfully organizing hundreds
> / thousands of stored procedures?
Actually, we have a subsystem concept that is implemented through our
loadtool. The main reason we have done this is that we don't ship
all parts of the system to all customers. So this is a concept that
has been added for configuration management rather than programming.
Our largest subsystem - which is large for legacy reasons - has over 1000
stored procedures.
> Also, why does this not seem to even be addressed in 2005?
As I said, the separation of user and schema is an important step in
this direction.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||>> ... Why someone would implement the majority of their business logic in transact-sql, I don't know ... <<
Because the database is where the data is and most of the business
logic probably has to do with rules about relationships in the data. I
also hope you have a lot of DRI, defaults and constraints, too.
Think about what a disaster it would be if you did this in the front
end. What if 699 of your procedures believe there is a rule that says
(age > 18) and one that believes the rule is (age >= 18)?
You do business rules in the schema so you have them in ONE place, ONE
way, ONE time.
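For example, with a hypothetical Personnel table, the rule is declared once
and every procedure that touches the table gets it for free:

    ALTER TABLE Personnel
        ADD CONSTRAINT Adults_Only CHECK (age >= 18)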
T-SQL was never meant to be a development language. In fact, the old
rule of thumb was to never write a proc over one page long. But you can
find some code control tools out there which will work with T-SQL.
>> ... inability to create "on-the-fly" views or table expressions (addressed in 2005); the
arbitrary limit of insert exec's; the inability to use scalar udf's
because of their ugly performance) <<
In short, you are not a SQL programmer at all! Let's go down the list
of things you missed about SQL:
1) We don't use arrays and lists -- ever hear of 1NF?
2) Table variables as parameters would mix data and meta-data
3) I have been able to write custom aggregation functions with the
math available to me in Standard SQL. Statistics and analytics
should be done in a stat package that can handle the floating point
problems.
4) We design schemas carefully and never use "on-the-fly" views or
tables. It means that you are probably converting 3GL programs into
T-SQL and need 1950's-style scratch files.
5) What does "the arbitrary limit of insert exec's" mean?
6) Why do you need scalar udf's? You are not doing computations in the
Database, are you? How would you put such things into an optimizer -- even
in theory?
So much for the ISO-11179 Standards!
Your problem is that you "don't get it" -- SQL is not for development;
it is for data. The joke we had in ANSI X3H2 was that SQL stood for
"Scarcely Qualifies as a Language" because all it does is maintain,
validate and persist data.|||Erland Sommarskog wrote:
> nposter (code4breakfast@.yahoo.com) writes:
> I know. Because T-SQL is where the data is. In our shop, we are strong
> believers in having business logic in stored procedures. Actually, one
> guy in our shop thought we should move logic up to the middle layer,
> and conducted a test where he rewrote a complex stored procedure. The
> test was a total fiasco: performance was awful and it did not scale.
> And before you ask: we have around 3700 stored procedures.
>
This is obviously a long discussion, and I don't have enough background
writing large, scalable data layers to really argue. The angle I was
coming from is that most people would agree that you can write more
cleanly organized and modularized code outside of transact-sql for
things beyond simple queries. On writing pure reports or processing
large data sets, sure I would vote for transact-sql, but for small
hits, single updates or inserts with complex validation checks, and small
data set queries on a not-very-heavy transactional db, I would vote for
taking the logic out.
> There is in SQL 2005: schemas. Well, there are schemas in SQL 2000 as well,
> but as they are tied to users in SQL 2000, they are useless.
okay! i might not invest as much energy into rolling my own module
system then. i've only begun to skim the 2005 docs.
>
> Have a look at http://www.sommarskog.se/share_data.html for some tips.
>
i haven't read this site before, but upon reading it now, i have used
each one of these techniques. they are still limiting and a pain to
implement for something as simple as just passing around a table /
list. it's like i have to compile my code into this lower-level language
of passing around temp tables.
> But beware that CTEs are basically macros, and recalculated each time
> they occur in a query.
>
> A limit of 1 does not look very arbitrary to me.
>
Why limit to 1? Because "we said so". This could have been used as a
general mechanism for calling stored procs from other stored procs if
it wasn't for the limitation... yeah, i know, make a table-valued udf,
but what if i want to pass tables into and out of the table-valued udf,
which leads you back to temp tables and stored procs.
> To be honest, it seems to me that your main problem is that you are
> trying to use T-SQL as if it is a traditional programming language.
> It isn't. Yes, T-SQL is a bit poor on general constructs for
> modularisation, but just get used to it. Don't try to write in T-SQL
> as if it was C++ or Visual Basic.
>
i have embraced t-sql. i hate the fact that i have, but i have. i use
views, table-valued udf's, and stored procs in the style i think is
considered the "t-sql way". i gave up the expectation of being able to
write reusable, efficient scalar functions. but i really don't think
it's too much to simply ask for a way to group my code logically.
these weren't top secret ideas when t-sql was created.
> Actually, we have a subsystem concept that is implemented through our
> loadtool. The main reason we have done this, is that we don't ship
> all parts of the system to all customers. So this is a concept that
> has been added for configuration management rather than programming.
> Our largest subsystem - which is large by legacy - has over 1000 stored
> procedures.
>
custom written loadtool? i was thinking of using make/ant scripts with
isqlw commands to load sets of code in. then check that into source
control. can you give any more details about the loadtool and how you
use it? how do you browse the subsystems?|||> ... On writing pure reports or processing
> large data sets, sure I would vote for transact-sql, but for small
> hits, single updates or inserts with complex validation checks, and small
> data set queries on a not-very-heavy transactional db, I would vote for
> taking the logic out.
I would vote for keeping the logic in, for a number of reasons: (1)
assert. (2) deadlocks. (3) flexibility. (4) performance. (5)
scalability. (6) security. The list continues.
"nposter" <code4breakfast@.yahoo.com> wrote in message
news:1124669275.064403.271190@.g44g2000cwa.googlegroups.com...
> Erland Sommarskog wrote:
> This is obviously a long discussion, and I don't have enough background
> writing large, scalable data layers to really argue. The angle I was
> coming from is that most people would agree that you can write more
> cleanly organized and modularized code outside of transact-sql for
> things beyond simple queries. On writing pure reports or processing
> large data sets, sure I would vote for transact-sql, but for small
> hits, single update or inserts with complex validation checks, small
> data set queries on not a very heavy transactional db, I would vote for
> the taking the logic out.
>
well,
> okay ! i might not invest as much energy into rolling my own module
> system then. i've only begun to skim the 2005 docs.
>
outside
> i haven't read this site before, but upon reading it now, i have used
> each one of these techniques. they are still limiting and a pain to
> implement for something as simple as just passing aroudn a table /
> list. its like i have to compile my code into this lower level language
> of passing around temp tables.
>
> Why limit to 1? Because "we said so". This coudl have been used as a
> general mechanism for calling stored procs from other stored procs if
> it wasn't for the limitation... yeah , i know, make a table-valued udf,
> but what if i want to pass tables into and out of the table-valued udf.
> which leads you back to temp tables and stored procs.
>
> i have embraced t-sql. i hate the fact that i have, but i have. i use
> views, table-valued udf's, stored procs in the style i think is
> considered the "t-sql way". i gave up the expectation of being able to
> write reusable , efficient scalar functions. but i really don't think
> its too much to ask to simply ask for a way to group my code logically.
> these weren't top secret ideas when t-sql was created.
>
hundreds
> custom written loadtool? i was thinking of using make/ant scripts with
> isqlw commands to load sets of code in. then check that into source
> control. can you give any more details about the loadtool and how you
> use it? how do you browse the subsystems?
|||nposter (code4breakfast@.yahoo.com) writes:
> This is obviously a long discussion, and I don't have enough background
> writing large, scalable data layers to really argue. The angle I was
> coming from is that most people would agree that you can write more
> cleanly organized and modularized code outside of transact-sql for
> things beyond simple queries.
Admittedly, code in C#, Ada or whatever can be prettier than in T-SQL.
(Then again, you can write it in C++, and don't tell me that is
prettier! :-)
However, pretty code is not always the most efficient code. The cost of
moving the logic out of the database can be enormous, with all the data
that has to go back and forth.
You can do code reuse in T-SQL as well, but it is certainly a little
clumsier. But one should also keep in mind that code reuse is not really
the same virtue in T-SQL as it is in client languages. For instance, accessing
a view because it pre-computes some result you need may not be a good idea
after all, because the view accesses four tables irrelevant to your
task.
> i haven't read this site before, but upon reading it now, i have used
> each one of these techniques. they are still limiting and a pain to
> implement for something as simple as just passing around a table /
> list. it's like i have to compile my code into this lower-level language
> of passing around temp tables.
I certainly can't disagree that this is a point where T-SQL could benefit
from improvement. Being able to pass a table as a parameter would be a
great addition.
In SQL 2005 there is, by the way, a new workaround: since you now can
receive the result of FOR XML queries into a variable of the new xml
data type, you can pass a table in an xml parameter. (And you don't need
OPENXML to shred it anymore.) Yes, this is really, really ugly. A lot
better novelty in SQL 2005 is statement-level recompilation. This lowers
the cost of sharing temp tables.
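A rough sketch of that xml workaround, with invented procedure, table, and
element names:

    -- Callee: shred the xml parameter back into rows (no OPENXML needed)
    CREATE PROCEDURE dbo.ProcessOrders @orders xml AS
        SELECT o.n.value('OrderID[1]', 'int') AS OrderID,
               o.n.value('Qty[1]', 'int') AS Qty
        FROM @orders.nodes('/row') AS o(n)
    GO
    -- Caller: capture a result set as xml and pass it as a parameter
    DECLARE @orders xml
    SET @orders = (SELECT OrderID, Qty FROM dbo.Orders FOR XML PATH('row'), TYPE)
    EXEC dbo.ProcessOrders @orders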
> Why limit to 1? Because "we said so".
More probably because permitting more levels would have made the
implementation more complicated. I have no idea, but I can imagine that
doing it in one layer was something that almost could be done for free
by using something already there, like placing the result set in the
output buffer, and then intercepting that. But that would not be possible
to do in several layers without considerable re-architecture.
And in any case, INSERT-EXEC has several other issues as well, as I
discuss in my article. From the point of view of modular programming,
I dislike INSERT-EXEC because you can change the callee and wreck a
caller you don't know about. It's simply not robust enough.
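For readers who have not run into it, the construct under discussion looks
like this (procedure name invented); it works one level deep, but another
INSERT ... EXEC inside the called procedure raises an error:

    CREATE TABLE #totals (CustomerID int, Total money)
    INSERT INTO #totals (CustomerID, Total)
    EXEC dbo.GetCustomerTotals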
> custom written loadtool? i was thinking of using make/ant scripts with
> isqlw commands to load sets of code in. then check that into source
> control. can you give any more details about the loadtool and how you
> use it? how do you browse the subsystems?
Sure, the full documentation is on http://www.abaris.se/abaperls
One more feature that we have that may be of particular interest to
you, is our pre-processor.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Erland Sommarskog wrote:
> nposter (code4breakfast@.yahoo.com) writes:
> I know. Because T-SQL is where the data is. In our shop, we are strong
> believers in having business logic in stored procedures. Actually, one
> guy in our shop thought we should move logic up to the middle layer,
> and conducted a test where he rewrote a complex stored procedure. The
> test was a total fiasco: performance was awful and it did not scale.
> And before you ask: we have around 3700 stored procedures.
>
> There is in SQL 2005: schemas. Well, there are schemas in SQL 2000 as well,
> but as they are tied to users in SQL 2000, they are useless.
>
Well, they're tied to users or roles. In fact, we've had a reasonable
amount of success here with a database split into (at last count) 9
different schemas (as new subsystems are added to the rest of the
project, 1 or 2 new schemas are created), all of which are owned by
roles.
I do, of course, look forward to being able to do away with these roles
and have them as pure schemas, but it works reasonably well now. We've
even managed to get Merge Replication working with it (after a few
false starts, we went live with this a couple of weeks ago).
Damien|||Can you explain this a little further? Do you have the tables in one
centralized schema, and then each new schema encapsulates sets of stored
procedures, functions, and views? This is still in 2000, I assume. So
you have something like:
server = PRODSERVER
database/schema1 = PROD_TABLES
database/schema2 = PROD_PROCEDURE_SCHEMA1
database/schema3 = PROD_PROCEDURE_SCHEMA2
or am i missing the definition of schemas? are schemas lower level than
the database level?|||Very interesting. this is exactly what i was envisioning somebody out
there had already created. I don't know if i will use it directly, but
I will probably write up my own simplified version inspired by your
system|||nposter wrote:
> Can you explain this a little further? Do you have the tables in one
> centralized schema, and then each new schema encapsulates sets of stored
> procedures, functions, and views? This is still in 2000, I assume. So
> you have something like:
> server = PRODSERVER
> database/schema1 = PROD_TABLES
> database/schema2 = PROD_PROCEDURE_SCHEMA1
> database/schema3 = PROD_PROCEDURE_SCHEMA2
> or am i missing the definition of schemas? are schemas lower level than
> the database level?
How I'm using it is pretty well how it'll work under 2005. I'm not
saying it's the right way, or the best way, but it's working here, and
avoiding some collisions.
So, we create a database, and a couple of schemas:
create database MyBigDatabase
go
use MyBigDatabase
go
exec sp_addrole 'ClientServices' -- This will change to CREATE SCHEMA in SQL Server 2005
go
exec sp_addrole 'CreditorServices'
go
-- And so on, for each schema needed, then:
create table ClientServices.Users (
    UserID uniqueidentifier not null
    -- Columns specific to the concept of a User within Client Services
)
go
create table CreditorServices.Users (
    UserID uniqueidentifier not null
    -- Columns specific to the concept of a User within Creditor Services
)
go
-- Create Proc, Create View, etc., all within their own schemas
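As a sketch of that last comment (procedure name invented), a procedure
created in one of those schemas then looks like:

    create procedure ClientServices.GetUser
        @UserID uniqueidentifier
    as
        select UserID
        from ClientServices.Users
        where UserID = @UserID
    go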
Some of the tables we use are in fact still owned by dbo - these are
tables which are globally required/accessible within the individual
services. Before anyone asks about my use of Uniqueidentifier columns
for what are obviously (with more DDL) going to be Primary Keys, I'd
like to point out that we always knew we wanted to support replication,
so it just saved time. And yes, there are unique constraints on the
natural keys.
Damien