Switch to: V12V11V10V9V8V7V6V5

DISTINCTROW in Valentina SQL

Learn about how to select distinct rows from a join result with Valentina SQL.

MS Access offers own additional keyword - DISTINCTROW to get rid of row-duplicates in the result (see difference between row-duplicates and value-duplicates below).

Valentina offers another approach based on “GROUP BY” and aggregative functions like FIRST()/LAST().

Preface

Let's consider some example - there are two tables - one for persons and second for their phones.

CREATE TABLE Person ( id LONG, fName STRING(20), lName STRING(20) );
CREATE TABLE Phone ( id LONG, person_id LONG, number LONG );

INSERT INTO Person VALUES ( 1, 'John', 'Doe' );
INSERT INTO Person VALUES ( 2, 'Ivan', 'Ivanov' );
INSERT INTO Person VALUES ( 3, 'John', 'Doe' );

INSERT INTO Phone VALUES ( 1, 1, 555111 );
INSERT INTO Phone VALUES ( 2, 1, 555222 );
INSERT INTO Phone VALUES ( 3, 2, 555333 );
INSERT INTO Phone VALUES ( 4, 3, 555444 );

Now we want to select all persons who have phone(s). If we do it as simple as:

SELECT fName, lName 
FROM Person JOIN Phone ON Person.id=Phone.person_id;

we will get a correct result but there are some duplicated records because of 2 phones belonged to the first “John Doe”.

fName lName
John Doe
John Doe
Ivan Ivanov
John Doe

We can use DISTINCT to remove identical values from the result of any query:

SELECT DISTINCT fName, lName 
FROM Person JOIN Phone ON Person.id=Phone.person_id;
fName lName
Ivan Ivanov
John Doe

One of the problems is performance - such a DISTINCT cause sorting all the fields in SELECT list and removing all (lName+fName) duplicates. Note, that it could be number of fields including lengthy ones, like a TEXT or String(2000) - so sorting out such a data may take a lot of time and requires more memory.

Another problem with such a query is possibility to remove unintended rows - one of the “John Doe” person was wrongfully removed (pay attention that we have two different persons named “John Doe”). In other words “DISTINCT” is about to remove value-duplicates.

Sure, this problem can be solved in this way:

SELECT DISTINCT Person.id, fName, lName 
FROM Person JOIN Phone ON Person.id=Phone.person_id;
id fName lName
2 Ivan Ivanov
1 John Doe
3 John Doe

But having additional field in the select list might be inconvenient.

Solutions

MS Access offers own additional keyword - DISTINCTROW, which can be used in place of DISTINCT. This keeps resulting rows which have identical values, but come from different records from collapsing into a single row. With DISTINCTROW they don't have to include the key in the output.

Valentina offers another way based on grouping by “key”:

SELECT FIRST(fName), FIRST(lName) 
FROM Person JOIN Phone ON Person.id=Phone.person_id 
GROUP BY Person.RecID;
fName lName
John Doe
Ivan Ivanov
John Doe

Sure, you can use your own primary key like id field:

SELECT FIRST(fName), FIRST(lName) 
FROM Person JOIN Phone ON Person.id=Phone.person_id 
GROUP BY Person.id;

but grouping by RecID much more effective and you should prefer this way.