More fun with Patterns

Priya Jacob
7 min readSep 11, 2023

--

Working with Quantified Path Patterns in Neo4j 5.x is twice the fun as working with Graph Pattern Matching. Quantified Path Patterns (QPP) are part of the Graph Pattern Matching (GPM) feature. GPM is one of the main components of the upcoming GQL standard for querying Graph Databases. GPM includes more components that will come in later releases i.e. GQL > GPM > QPP.

Prior to the introduction of QPP and quantified relationships in Neo4j 5.9, the only way to match paths of variable length in Cypher was with a variable-length relationship, with a few limitations;
- no support for predicates in the relationship pattern (cannot pre-filter on path expansion)
- no support for referencing node variables along the path (cannot control expansion based on nodes or properties)

One often then took recourse to the apoc library with the use of the path expand procedure apoc.path.expandConfig(), but this again
- cannot filter on properties
- is a separate syntax to learn that is inconsistent with Cypher ASCII art style

We then have the Neo4j Traversal API that
- cannot be leveraged on Neo4j Aura
- puts the onus of security and memory considerations on you
- does not adapt to the changing data statistics

Read all about the concepts and syntax & semantics on QPP from the extensive Neo4j 5.x Cypher documentation on the same, while I attempt to illustrate its usage with a few examples on a small dataset viz. the Mumbai local train service. The dataset is simple with lines, stations on each line, and distance between stations with time to travel. It comes with no time table, so not a lot of journey planning with this one, though you could augment it with the required information and extend its use.

Here is the script to load the dataset. I am using an AuraDB Free database here.

CREATE CONSTRAINT con_uq_station_code FOR (n:Station) REQUIRE n.stationCode IS UNIQUE;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/priya-jacob/QPP/main/MumbaiLocalTrainDataset.csv' AS row
WITH row
MERGE (to:Station {stationCode: row.StationCode})
SET to.station = row.Station,
to.line = row.Line,
to.yearOfOpening = toInteger(row.YearOfOpening),
to.tracks = toInteger(row.Tracks),
to.platforms = toInteger(row.Platforms)

WITH to, row
CALL apoc.create.addLabels(to, [row.Line]) YIELD node

MERGE (from:Station {stationCode: row.PreviousStation})
MERGE (from)-[link:NEXT]->(to)
SET link.distanceKms = toInteger(split(row.DistanceFromPreviousStation,' ')[0]),
link.timeMins = toInteger(split(row.TimeTakenFromPreviousStation,' ')[0]);

MATCH (a:Station)-[r:NEXT]->(a)
DELETE r;

Let us start with some really simple queries, just querying the stations along a line.

MATCH path=(:Station&Western)(()-[:NEXT]->()){1,}(:Station&Western)
WHERE all(x IN nodes(path) WHERE x:Station&Western)
RETURN path
MATCH path=(:Station&`Central 1`)(()-[:NEXT]->()){1,}(:Station&`Central 1`)
WHERE all(x IN nodes(path) WHERE x:Station&`Central 1`)
RETURN path
MATCH path=(:Station&`Central 2`)(()-[:NEXT]->()){1,}(:Station&`Central 2`)
WHERE all(x IN nodes(path) WHERE x:Station&`Central 2`)
RETURN path
MATCH path=(:Station&Harbour)(()-[:NEXT]->())+(:Station&Harbour)
WHERE all(x IN nodes(path) WHERE x:Station&Harbour)
RETURN path

Back in my day, it was a thing to express polar, antithesis or contradictory behavior or states by drawing parallels to Bandra and Kurla stations, they being in totally opposite directions. So how do we get from Bandra to Kurla? And no, we’re not taking the road here. I provide a random maximum bound just to check what paths come through.

MATCH path=(:Station {station:'Bandra'})(()-[:NEXT]-()){1,25}(:Station {station: 'Kurla'})
RETURN path

If I leave it unbounded with ambition, it runs rogue, and I obviously give up waiting.

Let us query the bounded path expansion with actual stops, the total distance and time to travel using quantified relationships.

MATCH path=(:Station {station:'Bandra'})-[links:NEXT]-{1,25}(:Station {station: 'Kurla'})
RETURN reduce(d = 0, link IN links | d + link.distanceKms) AS distanceKms,
reduce(t = 0, link IN links | t + link.timeMins) AS timeMins,
tail([n IN nodes(path) | n.station]) AS stops
ORDER BY distanceKms LIMIT 5;

You could do the same query using QPP and leveraging group variables instead.

MATCH path=(:Station {station:'Bandra'})(()-[links:NEXT]-(s)){1,25}(:Station {station: 'Kurla'})
RETURN reduce(d = 0, link IN links | d + link.distanceKms) AS distanceKms,
reduce(t = 0, link IN links | t + link.timeMins) AS timeMins,
[n IN s | n.station] AS stops
ORDER BY distanceKms LIMIT 5;

Instead of letting my query run rogue with an unbounded variable length expansion, I could run an all shortest paths function, which returns just one path in this case, because there is only one path with the same length as the shortest.

MATCH (d:Station {station:'Bandra'}),(k:Station {station: 'Kurla'})
MATCH p=allShortestPaths((d)-[links:NEXT*]-(k))
RETURN reduce(d = 0, link IN links | d + link.distanceKms) AS distanceKms,
reduce(t = 0, link IN links | t + link.timeMins) AS timeMins,
tail([n IN nodes(p) | n.station]) AS stops
ORDER BY distanceKms LIMIT 5;

Maybe, we try that with another route, say, how do I get to CSMT from Panvel.

MATCH (d:Station {station:'Panvel'}),(k:Station {station: 'CSMT'})
MATCH p=allShortestPaths((d)-[links:NEXT*]-(k))
RETURN reduce(d = 0, link IN links | d + link.distanceKms) AS distanceKms,
reduce(t = 0, link IN links | t + link.timeMins) AS timeMins,
tail([n IN nodes(p) | n.station]) AS stops,
size([r IN relationships(p) WHERE r:NEXT | r]) AS numStops,
size(links) AS numStopsAlt
ORDER BY distanceKms LIMIT 5;

I could query just the shortest path by filtering on the distance making sure the function does not return me a random path of the possible two.

MATCH (d:Station {station:'Panvel'}),(k:Station {station: 'CSMT'})
MATCH p=shortestPath((d)-[links:NEXT*]-(k))
WHERE reduce(d = 0, link IN links | d + link.distanceKms) <= 60
RETURN reduce(d = 0, link IN links | d + link.distanceKms) AS distanceKms,
reduce(t = 0, link IN links | t + link.timeMins) AS timeMins,
[n IN nodes(p) | n.station] AS stops,
size([r IN relationships(p) WHERE r:NEXT | r]) AS numStops,
size(links) AS numStopsAlt

Right, now for some journey planning!

Photo by Kina on Unsplash

Let us say I want to travel from Panvel to Churchgate via Dadar station.

MATCH path=(s:Station {station:'Panvel'})(()<-[:NEXT]-())+(c:Station WHERE c.station = 'Dadar')(()<-[:NEXT]-())+(d:Station {station:'Churchgate'})
RETURN path

Quite a lot of options here, I could query the stops I would need to change stations at from the available routes.

MATCH path=(s:Station {station:'Panvel'})(()<-[:NEXT]-())+(c:Station WHERE c.station = 'Dadar')(()<-[:NEXT]-())+(d:Station {station:'Churchgate'})
WITH s.station AS depart, d.station AS destination, apoc.coll.pairsMin(nodes(path)) AS list
RETURN DISTINCT depart, destination,
reduce(changeAt = '', l IN list | CASE WHEN size(labels(l[1]))>size(labels(l[0])) AND size(apoc.coll.subtract(labels(l[0]),labels(l[1])))=0 THEN changeAt + ' >> ' + l[1].station ELSE changeAt END) AS changeAt

How about wanting to know the last stop of each of the possible routes that depart from CSMT station?

MATCH (:Station {station: 'CSMT'})-[:NEXT]->+
(d:Station WHERE NOT EXISTS { (d)-[:NEXT]->(:Station) })
RETURN DISTINCT d.station AS finalDestination, apoc.coll.subtract(labels(d),['Station']) AS line

Now for some points of interest along the routes! Say, I want to query for stops that were built in the 19th century while sticking to the lines.

Photo by Davide Baraldi on Unsplash
MATCH (s:Station {station: 'CSMT'})((a)-[:NEXT]->())+
(d:Station WHERE NOT EXISTS { (d)-[:NEXT]->(:Station) })
WHERE
all(x IN a WHERE x:(Station&`Central 1`)) OR all(x IN a WHERE x:(Station&`Central 2`)) OR all(x IN a WHERE x:(Station&Harbour))
AND any(x IN a WHERE 1800 <= x.yearOfOpening <= 1870)
RETURN
d.station AS finalDestination,
apoc.coll.subtract(labels(d),['Station'])[0] AS finalDestinationLine,
[y IN a WHERE 1800 <= y.yearOfOpening <= 1870 | y.station] AS pointsOfInterest,
[z IN a WHERE z.station <> 'CSMT' | z.station] AS stops,
size(a)-1 AS numStops

In my final query, I demonstrate use of QPP predicates for filtering. Here, I look for stops between Churchgate and Borivali where the distance is no more than 2Kms.

MATCH (:Station {station: 'Churchgate'})
((s)-[l1:NEXT]->() WHERE l1.distanceKms <= 2)+
(b:Station WHERE exists { (b)-[l:NEXT]->() WHERE l.distanceKms > 2 })(()-[l2:NEXT]->())*
(:Station {station: 'Borivali'})
RETURN [x IN s | x.station] AS lessThanTwoKms, b.station AS breakingPoint, [x IN l2 | x.distanceKms][0] AS distance

QPP is a powerful feature in Cypher and lets you exploit variable length expansions with a lot more flexibility than what was possible earlier. I’m sure there is a lot more coming in GPM, but until then, Happy Graphing!

Photo by José Martín Ramírez Carrasco on Unsplash

--

--