1

Through trial and error I found that if I do a left join such as:

SELECT 
    firsttable.id,
    secondtable.id,
    secondtable.varcharColumn
FROM 
    firsttable
LEFT JOIN 
    secondtable
ON 
    firsttable.id=secondtable.id

The performance is terrible for larger tables. If I EITHER remove secondtable.varcharColumn as a column in the resultset OR change the type then the performance is about an order of magnitude better. The column is a varchar 255, so it should not make that much of a difference. If I change the type of column to integerColumn or dateColumn than I also see an order of magnitude difference in the performance. Even a varchar of a few characters seems to result in the same performance degradation.

Please note that the WHERE clause is NOT the issue here. I've omitted it because the issue is the same whether or not a WHERE clause is included. The issue has to do with the type of data in the column and not the quantity of data returned. I could also use LIMIT 1000 and have the very same issue.

Indexing the column will not help as the column is not used for joining. It is also not part of the where clause if there was one.

Any suggestions on how to improve the performance would be greatly appreciated!!

3 Answers3

1

The lack of a WHERE clause or an ORDER BY clause probably means that your query is not using an index (and is most certainly selecting all rows in the table).

So, I'd suggest:

  1. Add a "WHERE" clause
  2. Add an "ORDER BY" clause
  3. Make sure that you have appropriate indexes on the two tables for the ids and the columns in the WHERE and ORDER BY clauses.
  4. Look at the query execution plan to make sure that your indexes are used.
1

Check out your query plan and see whether columns included is using indexing.

When you use LEFT JOIN it does matter (results-wise) which table to start with; So, check out that also.

Create index on secondtable.varcharColumn

Here is the link to read more on performance : http://db.apache.org/derby/docs/10.8/tuning/index.html

-1

A bit late to the party, but perhaps this idea will be helpful for anyone.

My suggestion is that query without varcharColumn doesn't need to go into the joined table: using the index (being a primary key, I think) is enough to get requested data.

I believe if you had a composite index { secondtable.id, secondtable.varcharColumn }, you would not have any difference in processing time.

Most probably this time would be greater compared to the "fast version" due to request for an extra field and increased physical size of the index - and thus extra IO operations).

Konstantin
  • 101
  • 1