Friday, March 23, 2012

federated query architecture advice

I'm involved in a project building a federated query app. There is an ASP.NET web app talking to a central SQL Server 2005 db. Users input query parameters to the web ap, the request is submitted as a stored procedure to a central db, and from there it is distributed to the selected remote SQL Server 2005 db servers with similar but not necessarily identical schemas where the actual data resides.

We must wait for all servers to execute the query and return the results, then aggregate those query results and present them to the web app user.

The question is, what is the best way to design this with Server 2005? We need to be able to initiate multiple concurrent queries, then wait (up to some max timeout value) for all responses to return. What way(s) do I have with SQL Server 2005 to (1) initiate multiple queries, and (2) wait on the events (queries) to complete, up to some max timeout value?

I've looked at Service Broker some but that (maybe) seems like more than we need because all we're doing are queries. The tricky part seems to be initiating concurrent queries and waiting for all responses. Any advice or comments are appreciated.

Using SQL Server 2005 and ADO.NET 2.0 will enable you to do async queries / commands to the database. THis might be the best solution for you.

HTH, Jens K. Suessmeyer.

http://www.sqlserver2005.de
|||I was hoping for a single stored procedure (sp) with a parameter naming the data sources to be queried. The sp would start multiple parallel queries, then consolidate the results into a single result set to return to the caller. Any ideas along these lines?|||

Just to get some understanding why do need to use federated searches. Are you running into scale problems?

The one solution is to use partitioned views, this enables the Query Optimiser to figure out which servers to query.

|||

We have multiple SQL Server databases with essentially identical schemas. These databases are located remotely from our central server (assuming we'll connect via vpn or...?). The idea is for a stored procedure call to be made to the central server that will in turn query the remote servers, with the consolidate / federated result set from all servers involved returned to the caller as a single result set.

Looks like I need to look into partitioned views; I presume this requires linked servers? If so, I've heard about problems with links staying "up" over extended periods when the remote servers are not connected to a LAN - any feedback on this question?

|||

Federation is for searching across multiple servers in a single location (in my opinion) not bringing remote data together.

Based on what you have said I believe you should be looking at a replication model. Each of your sites replicates data to the central point. The queries at the central point are then on local copies of the data.

Doing distributed queries when you have a poor connection is just not practical. Replication can handle this level of connectivity.

If you have Site A, Site B and SIte C and you need to have users in all sites seeing data from the other sites, you probably want to look at peer to perr transactional replication.

|||

Normal replication is not an option. Here's the scenario: we have multiple heterogeneous "source" systems (Oracle, Informix, SQL Server, etc.), each unique and different. Co-located with each source system is a SQL Server db. On a daily basis, the relevant tables from the source systems (which can be several GB's in total) will be copied into equivalent SQL Server tables using SSIS. (This is why replication to a central db is not an option; take this as a given, trust me). From the source schema, the data will be transformed (again via SSIS) into a "provider" schema on the co-located server. These provider schemas will be similar but not identical. We want to query and aggregate results from the provider schemas.

For example, if we have 4 sites (A, B, C, and D), and the user wants to query sites A and C, we want to fire-off parallel queries at site A nd site C. The rowset structure from the two sites will be identical, but the queries that produce them will not necessarily be identical. The final step is to combine the result sets from sites A and C and return the combined rowset to the stored procedure caller.

Once again, the co-located SQL Server boxes will be accessed over the Internet (via vpn?).

|||When you say trust me are you referring to office politics or are you referring to a technical problem? It appears (to me) that the co-located Sql Server databases (sources) could replicate to a single SQL Server database (target) from which querying could be performed. The target database could define a field to show the source of the data co-located Sql Server databases.

No comments:

Post a Comment