
I have worked for different companies, and I have noticed that some of them prefer to have views that join a table with all of its "relatives". But then, in the application, we sometimes need only one column.

So would it be faster to just run simple SELECTs and then "join" the results in the application code?

The system could be in PHP, Java, ASP, or any language that connects to the database.

So the question is: what is faster, going from the server side (PHP, Java, ASP, Ruby, Python, ...) to the database and running one query that gets everything we need, or going from the server side to the database and running a query that gets the columns from only one table at a time?


3 Answers

What addresses your question is the subject of JOIN DECOMPOSITION.

According to page 209 of the book High Performance MySQL:

You can decompose a join by running multiple single-table queries instead of a multitable join, and then performing the join in the application. For example, instead of this single query:

SELECT * FROM tag
JOIN tag_post ON tag_post.tag_id = tag.id
JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = 'mysql';

You might run these queries:

SELECT * FROM tag WHERE tag = 'mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id IN (123,456,567,9098,8904);

Why on earth would you do this? It looks wasteful at first glance, because you've increased the number of queries without getting anything in return. However, such restructuring can actually give significant performance advantages:

  • Caching can be more efficient. Many applications cache "objects" that map directly to tables. In this example, if the object with the tag mysql is already cached, the application will skip the first query. If you find posts with an ID of 123, 567, or 9098 in the cache, you can remove them from the IN() list. The query cache might also benefit from this strategy. If only one of the tables changes frequently, decomposing a join can reduce the number of cache invalidations. (See the sketch after this quote.)
  • Executing the queries individually can sometimes reduce lock contention.
  • Doing joins in the application makes it easier to scale the database by placing tables on different servers.
  • The queries themselves can be more efficient. In this example, using an IN() list instead of a join lets MySQL sort row IDs and retrieve rows more optimally than might be possible with a join.
  • You can reduce redundant row accesses. Doing a join in the application means retrieving each row only once, whereas a join in the query is essentially a denormalization that might repeatedly access the same data. For the same reason, such restructuring might also reduce the total network traffic and memory usage.
  • To some extent, you can view this technique as manually implementing a hash join instead of the nested loops algorithm MySQL uses to execute a join. A hash join might be more efficient.

As a result, doing joins in the application can be more efficient when you cache and reuse a lot of data from earlier queries, you distribute data across multiple servers, you replace joins with IN() lists, or a join refers to the same table multiple times.
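
To make the book's technique concrete, here is a minimal sketch of join decomposition in application code. It is only illustrative: it assumes a Python DB-API connection (e.g. PyMySQL), a plain dict standing in for a real cache, and a hypothetical title column on post; otherwise it follows the tag/tag_post/post schema from the quote.

post_cache = {}  # hypothetical application-level cache: post_id -> row

def posts_for_tag(conn, tag_name):
    cur = conn.cursor()

    # Query 1: resolve the tag (skippable when the tag object is cached).
    cur.execute("SELECT id FROM tag WHERE tag = %s", (tag_name,))
    row = cur.fetchone()
    if row is None:
        return []
    tag_id = row[0]

    # Query 2: fetch the post ids for that tag.
    cur.execute("SELECT post_id FROM tag_post WHERE tag_id = %s", (tag_id,))
    post_ids = [r[0] for r in cur.fetchall()]

    # Serve what we can from the cache and shrink the IN() list,
    # exactly as the first bullet point above describes.
    cached = [post_cache[pid] for pid in post_ids if pid in post_cache]
    missing = [pid for pid in post_ids if pid not in post_cache]

    rows = []
    if missing:
        placeholders = ",".join(["%s"] * len(missing))
        cur.execute(
            f"SELECT id, title FROM post WHERE id IN ({placeholders})",
            missing,
        )
        rows = list(cur.fetchall())
        for r in rows:
            post_cache[r[0]] = r  # warm the cache for next time

    # The "join" happens here, in the application.
    return cached + rows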

OBSERVATION

I like the first bullet point because InnoDB is a little heavy-handed when it cross-checks the query cache.

As for the last bullet point, I wrote a post back on Mar 11, 2013 (Is there an execution difference between a JOIN condition and a WHERE condition?) that describes the nested loop algorithm. After reading it, you will see how beneficial join decomposition can be.
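
To illustrate that last bullet point, here is a toy, purely in-memory comparison of the two join algorithms in Python; the rows are made up:

tag_post = [(1234, 123), (1234, 456), (1234, 567)]        # (tag_id, post_id)
posts = [(123, "intro"), (456, "tuning"), (999, "misc")]  # (id, title)

# Nested-loop join: for every tag_post row, scan all posts -> O(n*m).
nested = [(tp, p) for tp in tag_post for p in posts if tp[1] == p[0]]

# Hash join: build a lookup table on the join key once, then probe it
# per row -> O(n+m). Join decomposition manually implements this
# build-and-probe pattern in the application.
by_id = {p[0]: p for p in posts}                                    # build phase
hashed = [(tp, by_id[tp[1]]) for tp in tag_post if tp[1] in by_id]  # probe phase

assert nested == hashed  # same result, different amount of work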

As for all the other points from the book, developers really look at performance as the bottom line. Some rely on external means (outside of the application) for performance enhancements, such as using a fast disk, getting more CPUs/cores, tuning the storage engine, and tuning the configuration file. Others will buckle down and write better code. Some may resort to coding all the business intelligence in stored procedures but still not apply join decomposition (see What are the arguments against or for putting application logic in the database layer? along with the other posts). It all comes down to the culture and tolerance of each developer shop.

Some may be satisfied with performance and not touch the code anymore. Others simply don't realize there are great benefits to reap if they try join decomposition.

For those developers who are willing ...

GIVE IT A TRY!!!

RolandoMySQLDBA

In Postgres (and probably any RDBMS to a similar extent, MySQL to a lesser extent), fewer queries are almost always much faster.

The overhead of parsing and planning multiple queries is already more than any possible gain in most cases.

Not to speak of the additional work to be done in the client to combine the results, which is typically much slower at it. An RDBMS specializes in that kind of task, and its operations work on the original data types: no casting to text and back for intermediate results, no transforming to the native types of the client, which can even lead to less correct (or incorrect!) results. Think of floating point numbers ...

You also transfer more data between the DB server and the client. This may be negligible for a handful of values, or it may make a huge difference.

If multiple queries mean multiple round trips to the database server, you also collect the network latency and transaction overhead multiple times, possibly even connection overhead. Big, big loss.

Depending on your setup, network latency alone may take longer than all the rest by orders of magnitude.
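
A quick way to see the round-trip cost is to time it. A minimal sketch in Python, assuming a psycopg2 connection; the DSN, the ids, and the post table (borrowed from the first answer) are only for illustration:

import time
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
cur = conn.cursor()
ids = [123, 456, 567, 9098, 8904]

start = time.perf_counter()
for pid in ids:  # one network round trip per id
    cur.execute("SELECT * FROM post WHERE id = %s", (pid,))
    cur.fetchall()
many_trips = time.perf_counter() - start

start = time.perf_counter()
cur.execute("SELECT * FROM post WHERE id = ANY(%s)", (ids,))  # one round trip
cur.fetchall()
one_trip = time.perf_counter() - start

print(f"{len(ids)} round trips: {many_trips:.4f}s, one query: {one_trip:.4f}s")

Against a remote server, the per-id loop typically loses by roughly the number of round trips times the network latency.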


There may be a turning point for very big, long-running queries, because transactions collect locks on DB rows along the way. Very big queries may hold many locks for an extended period of time, which can cause friction with concurrent queries.

Erwin Brandstetter

I don't know if this is possible in (most) SQL dialects; I only really know Transact-SQL (Microsoft).

The performance loss mostly comes from first joining everything together and then filtering at the end with a WHERE clause.

How about integrating the WHERE into the JOIN's ON clause, and, if you have multiple joins, intelligently ordering them so that the most filtering is done in the first join, reducing the rows to be used in the subsequent joins?

Like:

SELECT * FROM tag
JOIN tag_post ON tag.tag = 'mysql' AND tag_post.tag_id = tag.id 
JOIN post ON tag_post.post_id = post.id;

In T-SQL this can greatly enhance the speed of the query, as most rows from tag_post are filtered out in the first join step (given that you have many different tags).

Can anyone comment on whether this is possible in other SQL dialects, and whether it is as performant as running multiple queries, or whether multiple queries still hold a performance advantage?

FrankKrumnow