Table of Contents
JOIN
Multi-table queries allow to get result of joining data from several tables.
Table Equality Join
The process of record forming by the comparison of corresponding field’s contents is called the table join.
Join on the basis of exact equality between two fields is called equality join. Joins can be based on other kinds of fields’ comparison.
Joins are the basis of multi-table queries. All possible relations between tables can be formed by the comparison of corresponding field’s contents. Thus, joins are the powerful tool for the eliciting the relations existed between data.
SELECT command for a multi-table query should contain the selection condition which defines the relation between fields.
Show the list of all orders including number and price of the order, client name and his credit limit.
-- -- SQL Standard style, specify JOIN in the FROM clause -- SELECT order_num, amount, company, credit_limit FROM orders JOIN customers ON cust = cust_num
-- -- Old Style (1980-2000), when join conditions are specified in the WHERE clause. -- SELECT order_num, amount, company, credit_limit FROM orders, customers WHERE cust = cust_num
In the given example two fields from two different tables are compared. These fields are called linked. And CUST = CUST_NUM condition is called relationship condition.
Any two fields can be linked if they have a comparative data type.
Queries with Relationship Ancestor/Descendant Usage
Inside the multi-table queries, the most widespread are queries for two tables linked by the ancestor/descendant relationship.
In the relational database primary and foreign keys form the ancestor/descendant 1:M relationship. The table containing foreign key is a descendant (aka Many-Table, aka Child-Table), a table containing primary key is an ancestor (aka One-Table, aka Parent-Table).
In order to use in the query the ancestor/descendant relationship, you should use the selection condition in which primary key is compared with foreign one.
-- -- Show the list of all employees including the regions they work at. -- SELECT name, city, region FROM salesreps JOIN offices ON rep_office = office
You should pay attention that two tables are joining in an explicit way with the JOIN operation and the selection condition which describes the join is in ON statement inside FROM statement, e.g. all WHERE statement contents just go to ON statement.
JOIN ON link_name
Valentina database has more than just foreign key based relations. It could be ObjectPtr link or Binary Link. In case of ObjectPtr link you still can write something like:
SELECT name, city, region FROM salesreps JOIN offices ON salesreps.office_ptr = offices.RecID
where salesreps.office_ptr
is ObjectPtr, which points to offices table.
But in case of binary link such syntax is impossible because this kind of link is not based on fields at all. Instead, you can use link name in a join condition for any type of link:
SELECT name, city, region FROM salesreps JOIN offices ON Lnk_salesreps_office
where Lnk_salesreps_office is a link between salesreps and office tables. It is no matter what kind the link is.
JOIN USING
One more variant of a query for the simple inner join of the tables is possible. If linked fields of two tables have the same names and are compared equally, you can use an alternative form of ON statement. There is just a USING keyword followed by a comma-separated list of linked fields’ names:
SELECT name, city, region FROM salesreps JOIN offices USING(office)
Such field names should be identical in both tables. This statement is fully equivalent to ON statement in which pair of linked fields is pointed in explicit way but much more compact.
SELECT name, city, region FROM salesreps JOIN offices ON office = office
Record Selection Condition
You can combine the selection condition with indicated linked fields (join condition) with the other search condition in order to narrow the query results.
-- -- Show the list of offices in which sales plan exceeds $30000.0 -- ATTN! This is old style. Prefer to specify JOIN conditions in the FROM clause. -- SELECT city, name, title FROM offices, salesreps WHERE mgr = empl_num AND target > 30000.0
In the given example owing to the usage of additional selection condition the number of records in query results’ table decreased.
Retrieval Queries for Three and More Tables
You can join data from three and more tables using the same approach as one which is used for the data join to form two tables.
-- -- Show the order list with the price over $25000, including the name of employee got the order and the name of client do it. -- SELECT order_num, amount, company, name FROM orders, customers, salesreps WHERE cust = cust_num AND rep = empl_num AND amount > $25000
Multitable Queries Features
- sometimes in multi-table queries it is necessary to use qualified fields’ names in order to delete ambiguous references on the fields;
- in multi-table queries the selection of all fields has the special sense SELECT
*); - self-joins are created for the making of multi-table queries linking the table in itself;
- the table aliases are used very often in FROM statement in order to simplify fields’ names qualifying and to differ field references in self-join case.
Qualified Field Names
In the case when two tables contain the fields with the same names you have to use field names qualified with table name instead of simple ones in SELECT command. The table name should correspond to one of the table indicated in FROM statement.
-- -- Show name, office and sales volume of every employee. -- SELECT name, salesreps.sales, city FROM salesreps, offices WHERE rep_office = office
All Fields Retrieval
SELECT * command is used for the retrieval of all fields of the table indicated in FROM statement. In multi-table query star (double star) means the retrieval of all fields out of all tables indicated in FROM statement.
-- -- Report all information about employees and offices they work at. -- SELECT * FROM salesreps, offices WHERE rep_office = office
Self Joins
Some multi-table queries use the relations existing inside one of the tables. It can be represented as “imagine table copy”. Instead of making table copy DBMS just give you the possibility to refer to it using another name called table alias.
In FROM statement for each table “copy” alias is assigned, e.g. you put alias just after the real table name. If FROM statement contains table alias it means that in full field reference the alias - not table real name should be used.
Of course, it is necessary to apply an alias only for one of two table “copies”.
-- -- Show the list of all employees and their managers. -- SELECT salesreps.name, mgrs.name FROM salesreps, salesreps mgrs WHERE salesreps.manager = mgrs.empl_num
In the given example alias mgrs is assigned only for one table “copy”, for another one we use the original table name.
Table Aliases
Table aliases are necessary for queries contained self-joins. However, aliases can be used in any query (for example, if the query concerns the table of the other user or if the table name is very long and it is hard to use it in qualified field names).
FROM statement for multi-table queries with table aliasing has two important features:
- All the tables which data used in the query are listed in FROM statement. Any field indicated in SELECT statement should belong to one of the tables mentioned in FROM statement. (There is an exception for outer references contained in sub-query).
- FROM statement contains table name or alias which is used for table identification in field name qualifying in the query. If table alias is specified in FROM statement it should be used wherever in the query instead of the real table name.
Note: Table aliases in FROM statement should be unique.
AS keyword might be placed between the table name and the alias. It would make the query more readable.
For the table, alias designation correlation name is used. Correlation name assignment and its purpose fully coincide with the given description of alias assignment.
Multi-table Joins
One of the considerable preferences of extended statement FROM is that it ensures the clear join specification of three and more tables. While building so complex joins any join expression can be included in round brackets. You can use the resulting expression for the creating of the other join expression as if it was an ordinal table. Also, there is a possibility to create complex expressions for joins.
SELECT * FROM T1 OUTER JOIN T2 ON T1.ID *=T2.T1Ptr OUTER JOIN T3 ON T2.ID *= T3.T2Ptr
In this case you clearly define, without any join-order ambiguity, what DBMS should do; and there is no any vagueness towards the join type definitions.
VALENTINA also supports obsolete kind of such query:
SELECT * FROM T1, T2, T3 WHERE T1.ID *= T2.T1Ptr AND T2.ID *= T3.T2Ptr
In this case, you should execute the operations indicated in FROM statement first; including all joins or join inquiries. When this execution is finishing the selection conditions mentioned in WHERE statement are applying to the table results. Thus, the selection conditions applying to separate joins are mastering in ON statement; the selection condition applying to the resulting table as a whole is mastering in WHERE statement.
The Rules for Multi-table Queries
These rules describe the steps to get the result of table join.
- If the query is the union-type (UNION, INTERSECT, DIFFERENCE), you should execute the steps 2 -5 for each part of this complex query and get the separate intermediate tables.
- To form the product of the tables listed in FROM statement. If only one table is indicated in FROM statement, it is the product itself.
- If there is WHERE statement, you should apply the selection condition to each source record and leave in it the only records for which this condition is true; you should ignore the records for which the selection condition has FALSE or NULL value.
- For each record, you should get the value of each element mentioned in select-list and create a single record to the query results’ table. With any field references, you should take the field value for the current record.
- If DISTINCT predicate is mentioned you should remove all duplicated records from the query results’ table.
- If the query is the union-type query (UNION, INTERSECT, DIFFERENCE), you should perform appropriate set-operation for getting single result table. You should suppress the duplicated records out of this table if ALL keyword is absent.
- If there is ORDER BY statement you should sort the query results.
Outer Table Join
The join operation joins the data from two tables forming pairs of linked records from these two tables. Joined table is created from the pair of records of different tables which contain the same values in the linked fields. If the record of one of the tables has no pair the joining can bring to unexpected results.
Another join operation named outer join is possible. Exists LEFT, RIGHT and FULL OUTER JOIN operations.
Example of LEFT OUTER JOIN
-- -- Show the list of employees and cities they work in. -- SELECT name, city FROM salesreps LEFT OUTER JOIN offices ON rep_office *= office
-- -- Show the list of employees and cities they work in. -- OLD STYLE! -- SELECT name, city FROM salesreps, offices WHERE rep_office *= office
Example of FULL OUTER JOIN
-- -- Show the list of employees and cities they work in. -- SELECT name, city FROM salesreps FULL OUTER JOIN offices ON rep_office = office
-- -- Show the list of employees and cities they work in. -- OLD STYLE! -- SELECT name, city FROM salesreps, offices WHEREN rep_office *=* office
Creating OUTER join
- Create inner join of two tables in the usual way.
- Each record of the first table which has no link with any record of the second table adds to query results putting NULL value to all second table fields.
- Each record of the second table which has no link with any record of the first table adds to query results putting NULL value to all first table fields.
- The resulting table is the outer join of two tables.
The result of such join will be the same even in case the tables are swapped.
There are two more kinds of *outer joins* the results of which depend on the position of the tables.
Left and Right Outer Joins
You get left outer join of two tables if you execute steps 1 and 2 from the rules listed above and miss step 3. Thus, left outer join includes all unlinked records of the first (left) table supplementing them with NULL values, but doesn’t include all unlinked records of the second (right) table.
SELECT name, city FROM salesreps LEFT OUTER JOIN offices ON rep_office = office
You get right outer join of two tables if you execute steps 1 and 3 from the rules listed above and miss point 2. Thus, right outer join includes all unlinked records of the second (right) table supplementing them with NULL values, but doesn’t include all unlinked records of the first (left) table.
SELECT name, city FROM salesreps RIGHT OUTER JOIN offices ON rep_office = office
"Filtered" JOIN
It can be different join result depends on the moment of applying some condition - before or after join operation. Obviously, there is no difference for inner joins. But it is - for outer ones.
Generally, the order of applying conditions placed in “where” clause is undefined but such conditions must be applied to join result. So we consider “WHERE” conditions as “after-join conditions”. Actually, Valentina always tries to make some optimizations before join, transforming initial query to the form when part of (or whole) “where” clause may be applied to each joined table before joining. And only then, join two sets of records instead of whole tables.
But sometimes we need to solve something like this: Assume we have two tables - Managers and Orders. As usual, it can be many orders linked to one manager. The question is: show the list of ALL managers with their orders. But ignore orders less than $100 (such orders are out of consideration in the manner as there are no such orders at all).
Consider some cases:
SELECT * FROM t1 JOIN t2 ON id=ptr AND f1=10
SELECT * FROM t1 JOIN t2 ON id=ptr WHERE f1=10
SELECT * FROM t1, t2 WHERE id=ptr AND f1=10
Putting some condition in “FROM” clause - you require this condition to be applied BEFORE join.
In the first case, we want condition (f1=10) to be applied BEFORE joining operation rather than second and third cases - where condition will be applied AFTER join.
Get back to the initial question. Obviously, we should use some left outer join to select all managers with their orders (or nulls if no orders exist for the particular manager). But it is not enough. We have to show the only some orders and ignore the rest. In other words - if some “order” is filtered out (not passed throws the filter) then such record is considered as unlinked to any manager.
CREATE TABLE Managers ( id long ); CREATE TABLE Orders ( managerID long, price long ); INSERT INTO Managers VALUES( 1 ); INSERT INTO Orders VALUES( 1, 200 ); INSERT INTO Orders VALUES( 1, 50 );
Before-Join Condition
SELECT m.RecID, o.recID FROM managers m LEFT OUTER JOIN orders o ON id = managerID AND price > 100
m.RecID | o.recID |
---|---|
1 | 1 |
1 | NULL |
(It should be 2 but NULL - because of filtering-condition.)
No condition
SELECT m.RecID, o.recID FROM managers m LEFT OUTER JOIN orders o ON id = managerID
m.RecID | o.recID |
---|---|
1 | 1 |
1 | 2 |
After-Join Condition
SELECT m.RecID, o.recID FROM managers m LEFT OUTER JOIN orders o ON id = managerID WHERE price > 100
m.RecID | o.recID |
---|---|
1 | 1 |