Switch to: V13V12V11V10V9V8V7V6V5

JOIN

Multi-table queries allow to get result of joining data from several tables.

Table Equality Join

The process of record forming by the comparison of corresponding field’s contents is called the table join.

Join on the basis of exact equality between two fields is called equality join. Joins can be based on other kinds of fields’ comparison.

Joins are the basis of multi-table queries. All possible relations between tables can be formed by the comparison of corresponding field’s contents. Thus, joins are the powerful tool for the eliciting the relations existed between data.

SELECT command for a multi-table query should contain the selection condition which defines the relation between fields.

Show the list of all orders including number and price of the order, client name and his credit limit.

--
-- SQL Standard style, specify JOIN in the FROM clause
--
SELECT order_num, amount, company, credit_limit
FROM orders JOIN customers ON cust = cust_num
--
-- Old Style (1980-2000), when join conditions are specified in the WHERE clause.
--
SELECT order_num, amount, company, credit_limit
FROM orders, customers
WHERE cust = cust_num

In the given example two fields from two different tables are compared. These fields are called linked. And CUST = CUST_NUM condition is called relationship condition.

Any two fields can be linked if they have a comparative data type.

Queries with Relationship Ancestor/Descendant Usage

Inside the multi-table queries, the most widespread are queries for two tables linked by the ancestor/descendant relationship.

In the relational database primary and foreign keys form the ancestor/descendant 1:M relationship. The table containing foreign key is a descendant (aka Many-Table, aka Child-Table), a table containing primary key is an ancestor (aka One-Table, aka Parent-Table).

In order to use in the query the ancestor/descendant relationship, you should use the selection condition in which primary key is compared with foreign one.

--
-- Show the list of all employees including the regions they work at.
--
SELECT name, city, region
FROM salesreps JOIN offices ON rep_office = office

You should pay attention that two tables are joining in an explicit way with the JOIN operation and the selection condition which describes the join is in ON statement inside FROM statement, e.g. all WHERE statement contents just go to ON statement.

Valentina database has more than just foreign key based relations. It could be ObjectPtr link or Binary Link. In case of ObjectPtr link you still can write something like:

SELECT name, city, region
FROM salesreps JOIN offices ON salesreps.office_ptr = offices.RecID

where salesreps.office_ptr is ObjectPtr, which points to offices table.

But in case of binary link such syntax is impossible because this kind of link is not based on fields at all. Instead, you can use link name in a join condition for any type of link:

SELECT name, city, region
FROM salesreps JOIN offices ON Lnk_salesreps_office

where Lnk_salesreps_office is a link between salesreps and office tables. It is no matter what kind the link is.

JOIN USING

One more variant of a query for the simple inner join of the tables is possible. If linked fields of two tables have the same names and are compared equally, you can use an alternative form of ON statement. There is just a USING keyword followed by a comma-separated list of linked fields’ names:

SELECT name, city, region
FROM salesreps JOIN offices USING(office)

Such field names should be identical in both tables. This statement is fully equivalent to ON statement in which pair of linked fields is pointed in explicit way but much more compact.

SELECT name, city, region
FROM salesreps JOIN offices ON office = office

Record Selection Condition

You can combine the selection condition with indicated linked fields (join condition) with the other search condition in order to narrow the query results.

--
-- Show the list of offices in which sales plan exceeds $30000.0
-- ATTN! This is old style. Prefer to specify JOIN conditions in the FROM clause.
--
SELECT city, name, title
FROM offices, salesreps
WHERE mgr = empl_num
  	  AND target > 30000.0

In the given example owing to the usage of additional selection condition the number of records in query results’ table decreased.

Retrieval Queries for Three and More Tables

You can join data from three and more tables using the same approach as one which is used for the data join to form two tables.

--
-- Show the order list with the price over $25000, including the name of employee got the order and the name of client do it.
--
SELECT order_num, amount, company, name
FROM orders, customers, salesreps
WHERE cust = cust_num
 	AND rep = empl_num
	AND amount > $25000

Multitable Queries Features

  • sometimes in multi-table queries it is necessary to use qualified fields’ names in order to delete ambiguous references on the fields;
  • in multi-table queries the selection of all fields has the special sense SELECT*);
  • self-joins are created for the making of multi-table queries linking the table in itself;
  • the table aliases are used very often in FROM statement in order to simplify fields’ names qualifying and to differ field references in self-join case.

Qualified Field Names

In the case when two tables contain the fields with the same names you have to use field names qualified with table name instead of simple ones in SELECT command. The table name should correspond to one of the table indicated in FROM statement.

--
-- Show name, office and sales volume of every employee.
--
SELECT name, salesreps.sales, city
FROM salesreps, offices
WHERE rep_office = office

All Fields Retrieval

SELECT * command is used for the retrieval of all fields of the table indicated in FROM statement. In multi-table query star (double star) means the retrieval of all fields out of all tables indicated in FROM statement.

--
-- Report all information about employees and offices they work at.
--
SELECT * 
FROM salesreps, offices
WHERE rep_office = office

Self Joins

Some multi-table queries use the relations existing inside one of the tables. It can be represented as “imagine table copy”. Instead of making table copy DBMS just give you the possibility to refer to it using another name called table alias.

In FROM statement for each table “copy” alias is assigned, e.g. you put alias just after the real table name. If FROM statement contains table alias it means that in full field reference the alias - not table real name should be used.

Of course, it is necessary to apply an alias only for one of two table “copies”.

--
-- Show the list of all employees and their managers.
--
SELECT salesreps.name, mgrs.name
FROM salesreps, salesreps mgrs
WHERE salesreps.manager = mgrs.empl_num

In the given example alias mgrs is assigned only for one table “copy”, for another one we use the original table name.

Table Aliases

Table aliases are necessary for queries contained self-joins. However, aliases can be used in any query (for example, if the query concerns the table of the other user or if the table name is very long and it is hard to use it in qualified field names).

FROM statement for multi-table queries with table aliasing has two important features:

  • All the tables which data used in the query are listed in FROM statement. Any field indicated in SELECT statement should belong to one of the tables mentioned in FROM statement. (There is an exception for outer references contained in sub-query).
  • FROM statement contains table name or alias which is used for table identification in field name qualifying in the query. If table alias is specified in FROM statement it should be used wherever in the query instead of the real table name.

Note: Table aliases in FROM statement should be unique.

AS keyword might be placed between the table name and the alias. It would make the query more readable.

For the table, alias designation correlation name is used. Correlation name assignment and its purpose fully coincide with the given description of alias assignment.

Multi-table Joins

One of the considerable preferences of extended statement FROM is that it ensures the clear join specification of three and more tables. While building so complex joins any join expression can be included in round brackets. You can use the resulting expression for the creating of the other join expression as if it was an ordinal table. Also, there is a possibility to create complex expressions for joins.

SELECT *
FROM T1 OUTER JOIN T2 ON T1.ID *=T2.T1Ptr 
        OUTER JOIN T3 ON T2.ID *= T3.T2Ptr

In this case you clearly define, without any join-order ambiguity, what DBMS should do; and there is no any vagueness towards the join type definitions.

VALENTINA also supports obsolete kind of such query:

SELECT *
FROM T1, T2, T3
WHERE T1.ID *= T2.T1Ptr AND T2.ID *= T3.T2Ptr

In this case, you should execute the operations indicated in FROM statement first; including all joins or join inquiries. When this execution is finishing the selection conditions mentioned in WHERE statement are applying to the table results. Thus, the selection conditions applying to separate joins are mastering in ON statement; the selection condition applying to the resulting table as a whole is mastering in WHERE statement.

The Rules for Multi-table Queries

These rules describe the steps to get the result of table join.

  1. If the query is the union-type (UNION, INTERSECT, DIFFERENCE), you should execute the steps 2 -5 for each part of this complex query and get the separate intermediate tables.
  2. To form the product of the tables listed in FROM statement. If only one table is indicated in FROM statement, it is the product itself.
  3. If there is WHERE statement, you should apply the selection condition to each source record and leave in it the only records for which this condition is true; you should ignore the records for which the selection condition has FALSE or NULL value.
  4. For each record, you should get the value of each element mentioned in select-list and create a single record to the query results’ table. With any field references, you should take the field value for the current record.
  5. If DISTINCT predicate is mentioned you should remove all duplicated records from the query results’ table.
  6. If the query is the union-type query (UNION, INTERSECT, DIFFERENCE), you should perform appropriate set-operation for getting single result table. You should suppress the duplicated records out of this table if ALL keyword is absent.
  7. If there is ORDER BY statement you should sort the query results.

Outer Table Join

The join operation joins the data from two tables forming pairs of linked records from these two tables. Joined table is created from the pair of records of different tables which contain the same values in the linked fields. If the record of one of the tables has no pair the joining can bring to unexpected results.

Another join operation named outer join is possible. Exists LEFT, RIGHT and FULL OUTER JOIN operations.

Example of LEFT OUTER JOIN

--
-- Show the list of employees and cities they work in.
--
SELECT name, city
FROM salesreps LEFT OUTER JOIN offices ON rep_office *= office
--
-- Show the list of employees and cities they work in.
-- OLD STYLE!
--
SELECT name, city
FROM salesreps, offices
WHERE rep_office *= office

Example of FULL OUTER JOIN

--
-- Show the list of employees and cities they work in.
--
SELECT name, city
FROM salesreps FULL OUTER JOIN offices ON rep_office = office
--
-- Show the list of employees and cities they work in.
-- OLD STYLE!
--
SELECT name, city
FROM salesreps, offices 
WHEREN rep_office *=* office

Creating OUTER join

  1. Create inner join of two tables in the usual way.
  2. Each record of the first table which has no link with any record of the second table adds to query results putting NULL value to all second table fields.
  3. Each record of the second table which has no link with any record of the first table adds to query results putting NULL value to all first table fields.
  4. The resulting table is the outer join of two tables.

The result of such join will be the same even in case the tables are swapped.

There are two more kinds of *outer joins* the results of which depend on the position of the tables.

Left and Right Outer Joins

You get left outer join of two tables if you execute steps 1 and 2 from the rules listed above and miss step 3. Thus, left outer join includes all unlinked records of the first (left) table supplementing them with NULL values, but doesn’t include all unlinked records of the second (right) table.

SELECT name, city
FROM salesreps LEFT OUTER JOIN offices ON rep_office = office

You get right outer join of two tables if you execute steps 1 and 3 from the rules listed above and miss point 2. Thus, right outer join includes all unlinked records of the second (right) table supplementing them with NULL values, but doesn’t include all unlinked records of the first (left) table.

SELECT name, city
FROM salesreps RIGHT OUTER JOIN offices ON rep_office = office

"Filtered" JOIN

It can be different join result depends on the moment of applying some condition - before or after join operation. Obviously, there is no difference for inner joins. But it is - for outer ones.

Generally, the order of applying conditions placed in “where” clause is undefined but such conditions must be applied to join result. So we consider “WHERE” conditions as “after-join conditions”. Actually, Valentina always tries to make some optimizations before join, transforming initial query to the form when part of (or whole) “where” clause may be applied to each joined table before joining. And only then, join two sets of records instead of whole tables.

But sometimes we need to solve something like this: Assume we have two tables - Managers and Orders. As usual, it can be many orders linked to one manager. The question is: show the list of ALL managers with their orders. But ignore orders less than $100 (such orders are out of consideration in the manner as there are no such orders at all).

Consider some cases:

SELECT *
FROM t1 JOIN t2 ON id=ptr AND f1=10
SELECT *
FROM t1 JOIN t2 ON id=ptr WHERE f1=10
SELECT *
FROM t1, t2 WHERE id=ptr AND f1=10

Putting some condition in “FROM” clause - you require this condition to be applied BEFORE join.

In the first case, we want condition (f1=10) to be applied BEFORE joining operation rather than second and third cases - where condition will be applied AFTER join.

Get back to the initial question. Obviously, we should use some left outer join to select all managers with their orders (or nulls if no orders exist for the particular manager). But it is not enough. We have to show the only some orders and ignore the rest. In other words - if some “order” is filtered out (not passed throws the filter) then such record is considered as unlinked to any manager.

CREATE TABLE Managers ( id long );
CREATE TABLE Orders ( managerID long, price long );
 
INSERT INTO Managers VALUES( 1 );
INSERT INTO Orders VALUES( 1, 200 );
INSERT INTO Orders VALUES( 1, 50 );

Before-Join Condition

SELECT m.RecID, o.recID 
FROM managers m LEFT OUTER JOIN orders o ON id = managerID AND price > 100
m.RecID o.recID
1 1
1 NULL

(It should be 2 but NULL - because of filtering-condition.)

No condition

SELECT m.RecID, o.recID 
FROM managers m LEFT OUTER JOIN orders o ON id = managerID
m.RecID o.recID
1 1
1 2

After-Join Condition

SELECT m.RecID, o.recID 
FROM managers m LEFT OUTER JOIN orders o ON id = managerID 
WHERE price > 100
m.RecID o.recID
1 1