The OPTIONAL MATCH clause is a way of fetching related data off a node, without affecting the main query. It allows us to perform some joins similar to the ones we know from SQL, see image below. We will discuss in detail how MATCH and OPTIONAL MATCH clauses can lend themselves towards constructing the joins.

Note: We will not be covering right joins as they are costly to perform in graph database, it is much more efficient to swap the sides and perform a flavour of left join. We also are not going to talk about Self Join.

This document expects you to have a firm understanding of how MATCH clauses perform in Neo4J 2. Please refer to the article on Neo4j 2 – Basics of MATCH clause for a refresher.

Outer Join

MATCH (agency:Agency { Key: ‘training’ })
MATCH (person:Person { UniqueId: 1 })
RETURN agency, person

This query follows from the example used in the previous chapter. It will only return the result-set, if both the MATCH clauses have a result. If any one of the MATCH clause returns a null, the whole result-set will be null. It also does not verify that there is a link between the agency and the person. It simply fetches two nodes matching the filter and returns them, just like an Outer Join.

Rule of thumb: When there are multiple match clauses following each other, all need to return a result for the query to work.

Left Inner Join

The main query in the image below fetches all persons over the age of 18. In addition to that it also fetches their home phone numbers and email addresses, if they were to have any. The result-set should contain the coloured nodes.

The main query as well as its result is underlined in red, it fetches three people out of four that match the age criteria.

The OPTIONAL MATCH clause that fetches phone numbers has a WHERE clause to filter its result-set by type, which is highlighted in yellow. Although Kate didn’t have a home phone number, she was not excluded from the main result-set; this is because the WHERE clause only performs filtering against the OPTIONAL MATCH it follows. If the WHERE clause for the phone numbers were to be placed after the OPTIONAL MATCH for fetching email addresses, then the WHERE clause will be ignored.

The OPTIONAL MATCH clause that fetches email addresses was only able to navigate off Tom and Kate, because Mary was not included in the main result-set. Since there was no filtering on email addresses, it fetched any and all email addresses those people had, in this case, only Tom had an email address.

Note: You can have as many OPTIONAL MATCH clauses as you want following a MATCH clause, there is no legal limit.

Rule of Thumb: To filter an OPTIONAL MATCH, always follow it immediately with a WHERE clause.

Writing using Match filter

The above query can be written using the MATCH filter as shown below:

MATCH (person:Person) WHERE person.Age > 18
OPTIONAL MATCH person-[:HAS_PHONE]-(phone { Type:’Home’ })
OPTIONAL MATCH person-[:HAS_EMAIL]-email
RETURN person, phone, email

 

INNER Join

The query in the image below fetches all persons over the age of 15 and only those with an email address. In addition to that, it also fetches any work phone numbers they might have. The ‘and only’ in the previous sentence is the key for Inner Join. The additional fetching of phone numbers is the LEFT INNER JOIN that we discussed earlier. The result-set should contain the coloured nodes.

With inner join, what we actually mean is that we want everything from both the collections that match the criteria. The red underlined MATCH clauses and the nodes in the resulting graph represent this behaviour.

Rule of thumb: Filter as early as possible.

Even when you have multiple MATCH clauses for which the WHERE clause will apply to, splitting the WHERE clause to perform filter early as possible helps to reduce the subset that is required to be passed on to the next MATCH clause.

Writing using Optional Match

The following query achieves the same result-set as the one above, but using optional match clause. We will soon discover that this is an expensive job. The goal is to help you recognise and rectify bad patterns in your queries.

The query above fetches 5 nodes more than it needs when we compare it with the inner join query we discussed earlier. This is due to the optional match clause fetching Sim’s email addresses only to exclude it in the WHERE clause following it. I hope you can understand the kind of effect that fetching of throw-away nodes can have when run against a bigger dataset.

Note: The WITH/WHERE combo acts as the HAVING in SQL, for grouping result-sets.

Rule of thumb: Whenever an optional match identifier is used within a WITH/WHERE combo as an IS NOT NULL filter, replace it with a MATCH.

Left Outer Join

The query in the image below fetches all persons who do not have an email address. The result-set should contain the coloured nodes. To perform an efficient Left Outer Join, we will be making use of path filtering in WHERE clause.

The Where clause in the above query searches for the given path for each person returned from the match clause. If a path is found for a given person, it is excluded from the subset it returns due to the NOT operator in the Where clause.

Writing using Optional Match

When we discussed the Inner Join, we saw how the same query can be written using optional match and why we shouldn’t do so. In this section, we will again experiment using the optional match clause to achieve the same result-set as above and explain why we shouldn’t be doing that. The goal is to help you recognise and rectify bad patterns in your queries.

The optional match fetches all email addresses for all the people in the graph. Immediately after that it filters them out in the WITH/WHERE combo. In a bigger dataset, fetching nodes to exclude them in the next statement is quite expensive.

Rule of thumb: Whenever an optional match identifier is used within a WITH/WHERE combo to exclude a certain relationship, replace the WITH/WHERE combo with a WHERE clause that performs path filtering.

Rule of thumb: Whenever you see a WITH/WHERE combo filtering an optional match identifier, it’s a smell, so fix it. It is performing an Inner Join or Left Outer Join, either way it is expensive.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s