Table of Contents
JOINS
Multi-table queries allow to get result of joining data from several tables.
Table Equality Join
Process of record forming by the comparison of corresponding field’s contents is called the table join.
Join on the basis of exact equality between two fields is called equality join. Joins can be based on other kinds of fields’ comparison.
Joins are the basis of multi-table queries. All possible relations between tables can be formed by the comparison of corresponding field’s contents. Thus, joins are the powerful tool for the eliciting the relations existed between data.
SELECT command for multi-table query should contain the selection condition which defines relation between fields.
Show the list of all orders including number and price of the order, client name and his credit limit.
SELECT order_num, amount, company, credit_limit FROM orders, customers WHERE cust = cust_num
In the given example two fields from two different tables are compared. These fields are called linked. And CUST = CUST_NUM condition is called relationship condition.
Any two fields can be linked if they have comparative data type.
Queries with Relationship Ancestor/Descendant Usage
Inside the multi-table queries the most widespread are queries for two tables linked by the ancestor/descendant relationship.
In the relational database primary and foreign keys form the ancestor/descendant relationship. The table containing foreign key is a descendant, a table containing primary key is an ancestor. In order to use in query the ancestor/descendant relationship, you should use the selection condition in which primary key is compared with foreign one.
Show the list of all employees including the regions they work at.
SELECT name, city, region FROM salesreps JOIN offices ON rep_office = office
You should pay attention that two tables are joining in explicit way with the JOIN operation and the selection condition which describes the join is in ON statement inside FROM statement, e.g. all WHERE statement contents just goes to ON statement.
JOIN USING LINK NAME
Valentina database has more than just foreign key based relations. It could be ObjectPtr link or Binary Link. In case of ObjectPtr link you still can write something like:
SELECT name, city, region FROM salesreps JOIN offices ON salesreps.office_ptr = offices.RecID
where salesreps.office_ptr is ObjectPtr which points to offices table. But in case of binary link such syntax is impossible because this kind of link is not based on fields at all.
So you can use link name instead of join condition for any type of link:
SELECT name, city, region FROM salesreps JOIN offices ON Lnk_salesreps_office
where Lnk_salesreps_office is link between salesreps and office tables. It is no matter what kind the link is.
JOIN USING
One more variant of query for simple inner join of the tables is possible. If linked fields of two tables have the same names and are compared in equality, you can use alternative form of ON statement. There is just a USING keyword followed by comma-separated list of linked fields’ names:
SELECT name, city, region FROM salesreps JOIN offices USING(office)
Such field names should be identical in both tables. This statement is fully equivalent to ON statement in which pair of linked fields is pointed in explicit way but much more compact.
SELECT name, city, region FROM salesreps JOIN offices ON office = office
NATURAL JOIN
The join of two tables in which linked fields have identical names is called natural join as usually it is indeed the most “natural” way of two tables join.
SELECT name, city, region FROM salesreps NATURAL JOIN offices
If NATURAL keyword is indicated, ON and USING statements can be omitted in join query as in natural join the selection condition for two tables join is exactly defined - we compare all fields with identical names in both tables.
VALENTINA also supports more obsolete but still widely used syntax of inner join:
SELECT name, city, region FROM salesreps, offices WHERE rep_office = office
Usually it is not necessary to include linked fields into multi-table query results. The matter is that primary and foreign keys are identifiers - difficult to remember for the user while it is much easier to remember corresponding names. That’s why in WHERE statement for two tables’ join we use identifiers and in SELECT statement for the creation of query results’ fields - we use names more convenient for perception.
Record Selection Condition
You can combine the selection condition with indicated linked fields (join condition) with the other selection condition in order to narrow the query results.
Show the list of offices in which sales plan exceeds $30000.0
SELECT city, name, title FROM offices, salesreps WHERE mgr = empl_num AND target > 30000.0
In the given example owing to the usage of additional selection condition the number of records in query results’ table decreased.
Retrieval Queries for Three and More Tables
You can join data from three and more tables using the same approach as one which is used for the data join form two tables.
Show the order list with the price over $25000, including the name of employee got the order and the name of client done it.
SELECT order_num, amount, company, name FROM orders, customers, salesreps WHERE cust = cust_num AND rep = empl_num AND amount > $25000
Multitable Queries Features
- sometimes in multi-table queries it is necessary to use qualified fields’ names in order to delete ambiguous references on the fields;
- in multi-table queries the selection of all fields has the special sense SELECT
*); - self joins are created for the making of multi-table queries linking the table in itself;
- the table aliases are used very often in FROM statement in order to simplify fields’ names qualifying and to differ field references in self join case.
Qualified Field Names
In the case when two tables contain the fields with the same names you have to use field names qualified with table name instead of simple ones in SELECT command. The table name should correspond to one of the table indicated in FROM statement.
Show name, office and sales volume of every employee.
SELECT name, salesreps.sales, city FROM salesreps, offices WHERE rep_office = office
All Fields Retrieval
SELECT * command is used for the retrieval of all fields of the table indicated in FROM statement. In multi-table query star (double star) means the retrieval of all fields out of all tables indicated in FROM statement.
Report all information about employees and offices they work at.
SELECT * FROM salesreps, offices WHERE rep_office = office
Self Joins
Some multi-table queries use the relations existing inside one of the tables. It can be represented as “imagine table copy”. Instead of making table copy DBMS just give you the possibility to refer on it using another name called table alias.
In FROM statement for each table “copy” alias is assigned, e.g. you put alias just after the real table name. If FROM statement contains table alias it means that in full field reference the alias - not table real name should be used.
Of course, it is necessary to apply an alias only for one of two table “copies”.
Show the list of all employees and their managers.
SELECT salesreps.name, mgrs.name FROM salesreps, salesreps mgrs WHERE salesreps.manager = mgrs.empl_num
In the given example alias MGRS is assigned only for one table “copy”, for another one we use the proper (real) table name.
Table Aliases
Table aliases are necessary in queries contained self joins. However aliases can be used in any query (for example, if the query concerns the table of the other user or if the table name is very long and it is hard to use it in qualified field names).
FROM statement for multi-table queries with table aliasing has two important features:
- All the tables which data used in the query are listed in FROM statement. Any field indicated in SELECT statement should belong to one of the tables mentioned in FROM statement. (There is an exception for outer references contained in sub-query).
- FROM statement contains table name or alias which is used for table identification in field name qualifying in the query. If table alias is specified in FROM statement it should be used wherever in the query instead of real table name.
Note: Table aliases in FROM statement should be unique.
AS keyword might be placed between the table name and the alias. It would make query more readable.
For the table alias designation correlation name is used. Correlation name assignment and its purpose are fully coincided with the given description of alias assignment.
Multi-table Joins
One of the considerable preferences of extended statement FROM is that it ensures the clear join specification of three and more tables. While building so complex joins any join expression can be included in round brackets. You can use resulting expression for the creating of the other join expression as if it was ordinal table. Also there is a possibility to create complex expressions for joins.
SELECT * FROM T1 OUTER JOIN T2 ON T1.ID *=T2.T1Ptr OUTER JOIN T3 ON T2.ID *= T3.T2Ptr
In this case you clearly define, without any join-order ambiguity, what DBMS should do; and there is no any vagueness towards the join type definitions.
VALENTINA also supports obsolete kind of such query:
SELECT * FROM T1, T2, T3 WHERE T1.ID *= T2.T1Ptr AND T2.ID *= T3.T2Ptr
In this case you should execute the operations indicated in FROM statement first; including all joins or join enquiries. When this execution is finishing the selection conditions mentioned in WHERE statement are applying to the table results. Thus, the selection conditions applying to separate joins are mastering in ON statement; the selection condition applying to the resulting table as a whole is mastering in WHERE statement.
The Rules for Multi-table Queries
These rules describe the steps to get result of table join.
- If the query is the union-type (UNION, INTERSECT, DIFFERENCE), you should execute the steps 2 -5 for each part of this complex query and get the separate intermediate tables.
- To form the product of the tables listed in FROM statement. If only one table is indicated in FROM statement, it is the product itself.
- If there is WHERE statement, you should apply the selection condition to each source record and leave in it the only records for which this condition is true; you should ignore the records for which the selection condition has FALSE or NULL value.
- For each record you should get the value of each element mentioned in select-list and create single record to the query results’ table. With any field references you should take the field value for the current record.
- If DISTINCT predicate is mentioned you should remove all duplicated records from the query results’ table.
- If the query is the union-type query (UNION, INTERSECT, DIFFERENCE), you should perform appropriate set-operation for getting single result table. You should suppress the duplicated records out of this table if ALL keyword is absent.
- If there is ORDER BY statement you should sort the query results.
Outer Table Join
The join operation joins the data from two tables forming pairs of linked records from these two tables. Joined table is created from the pair of records of different tables which contain the same values in the linked fields. If the record of one of the tables has no pair the joining can bring to unexpected results.
Another join operation named outer table join is possible.
Full outer join is created in such a way:
Show the list of employees and cities they work in.
SELECT name, city FROM salesreps FULL OUTER JOIN offices ON rep_office = office
OUTER key word as INNER key word can be omitted.
With FULL word DBMS defines itself that outer join is enquired.
VALENTINA also supports more obsolete but still widely used syntax of outer join:
Show the list of employees and cities they work in.
SELECT name, city FROM salesreps, offices WHERE rep_office *= office
Creating OUTER join
- Create inner join of two tables in usual way.
- Each record of the first table which has no link with any record of the second table add to query results putting NULL value to all second table fields.
- Each record of the second table which has no link with any record of the first table add to query results putting NULL value to all first table fields.
- Resulting table is the outer join of two tables.
Such join is symmetrical relatively to the both tables.
There are two more kinds of outer joins which are not symmetrical relatively to the source tables.
Left and Right Outer Joins
You get left outer join of two tables if you execute steps 1 and 2 from the rules listed above and miss step 3. Thus, left outer join includes all unlinked records of the first (left) table supplementing them with NULL values, but doesn’t include all unlinked records of the second (right) table.
SELECT name, city FROM salesreps LEFT OUTER JOIN offices ON rep_office = office
You get right outer join of two tables if you execute steps 1 and 3 from the rules listed above and miss point 2. Thus, right outer join includes all unlinked records of the second (right) table supplementing them with NULL values, but doesn’t include all unlinked records of the first (left) table.
SELECT name, city FROM salesreps RIGHT OUTER JOIN offices ON rep_office = office
The source table which is produced all records to join is called the main table and opposite table failing elements of which fill with NULL values is called supplementary one. In practice left and right outer joins are more useful than full outer join, especially if tables are linked through foreign and primary keys.
Show the list of employees and cities they work in.
SELECT name, city FROM salesreps, offices WHERE rep_office *= office
In such join table-descendant (SALESREPS, contained foreign key) is the main one, and table-ancestor (OFFICES) - is supplementary one. The query aim is to preserve in the query results the records of child table which contains NULL values in the foreign key field. No matter what kind of join is used - left or right one.
Sometimes the queries in which parent table is supplementary one is applicable.
Product Operation - CROSS JOIN
Cartesian product (cross product) contains all possible record pairs from two tables. It is the “multiplication result” of two tables.
No one “linked fields” or “join conditions” accompany to cross joins, thus ON and USING statements are not allowed in them.
Query which produces cross-joined result from SALESREPS and OFFICES:
SELECT * FROM salesreps CROSS JOIN offices
Note: Explicit “CROSS JOIN” is an only available form, because implicit “inner join without join condition” is a short form for “single-way linked tables”.
-- Specific to Valentina! -- There are INNER JOINs examples - not CROSS joins. -- Such "short" forms expect existence of a single link (RDB/ObjectPtr/BinaryLink) between joined tables. -- SELECT * FROM salesreps INNER JOIN offices; SELECT * FROM salesreps, offices
"Filtered" JOIN
It can be different join result depends on moment of applying some condition - before or after join operation. Obviously, there is no difference for inner joins. But it is - for outer ones.
Generally, the order of applying conditions placed in “where” clause is undefined but such conditions must be applied to join result. So we consider “WHERE” conditions as “after-join conditions”. Actually, Valentina always tries to make some optimizations before join, transforming initial query to the form when part of (or whole) “where” clause may be applied to each joined table before join. And only then, join two sets of records instead of whole tables.
But sometime we need to solve something like this: Assume we have two tables - Managers and Orders. As usual, it can be many orders linked to one manager. The question is: show the list of ALL managers with their orders. But ignore orders less than $100 (such orders are out of consideration in the manner as there are no such orders at all).
Consider some cases:
SELECT * FROM t1 JOIN t2 ON id=ptr AND f1=10
SELECT * FROM t1 JOIN t2 ON id=ptr WHERE f1=10
SELECT * FROM t1, t2 WHERE id=ptr AND f1=10
Putting some condition in “FROM” clause - you require this condition to be applied BEFORE join.
In the first case we want condition (f1=10) to be applied BEFORE join operation rather then second and third cases - where condition will be applied AFTER join.
Get back to initial question. Obviously, we should use some left outer join to select all managers with their orders (or nulls if no orders exist for particular manager). But it is not enough. We have to show the only some orders and ignore the rest. In other words - if some “order” is filtered out (not passed throws the filter) then such record is considered as unlinked to any manager.
CREATE TABLE Managers ( id long ); CREATE TABLE Orders ( managerID long, price long ); INSERT INTO Managers VALUES( 1 ); INSERT INTO Orders VALUES( 1, 200 ); INSERT INTO Orders VALUES( 1, 50 );
Before-Join Condition
SELECT m.RecID, o.recID FROM managers m LEFT OUTER JOIN orders o ON id = managerID AND price > 100
m.RecID | o.recID |
---|---|
1 | 1 |
1 | NULL |
(It should be 2 but NULL - because of filtering-condition.)
No condition
SELECT m.RecID, o.recID FROM managers m LEFT OUTER JOIN orders o ON id = managerID
m.RecID | o.recID |
---|---|
1 | 1 |
1 | 2 |
After-Join Condition
SELECT m.RecID, o.recID FROM managers m LEFT OUTER JOIN orders o ON id = managerID WHERE price > 100
m.RecID | o.recID |
---|---|
1 | 1 |