Fun with Patterns
If you haven’t heard about the next big release after Neo4j 4.4, do catch the announcement on Neo4j 5.0 here. There’s also a product feature brief, from which I’ll be exploring some of the improvements & optimizations to Cypher. You can also refer to the Neo4j 5.0 release notes here.
Cypher was purpose-built for graph databases & Neo4j. It is a declarative, expressive query language whose patterns read like ASCII art. Cypher is a major contributor to GQL, the emerging standard for graph query languages, which is being developed by the same ISO committee that standardized SQL for relational databases (RDBMS). With Neo4j 5.0, Cypher is being enhanced to align with GQL on label expressions, relationship type expressions and graph element filtering capabilities.
I’m going to take you through these improvements using the sample Movies graph in my AuraDB Free 5.1.0 instance, which I’ve enhanced with data from IMDB. If you want to know how to do so using the low-code/no-code Neo4j Data Importer, watch my recent Intro-to-Neo4j workshop, where I demonstrate exactly that. I’ve also done a few additional transforms, such as labelling “Person” nodes as “Actor” & “Director” based on their outgoing :ACTED_IN & :DIRECTED relationships, and labelling “Movie” nodes with their genres (as enriched from the IMDB dataset). My graph has therefore grown from the sample’s 171 nodes & 253 relationships to 1239 nodes & 4318 relationships. And with that, we’re good to go s̶o̶u̶l̶ ̶s̶e̶a̶r̶c̶h̶i̶n̶g̶ 🅟🅐🅣🅣🅔🅡🅝-🄼🄰🅃🄲🄷🄸🄽🄶 !
Cypher (with Neo4j 5.0) now has syntax for label and relationship type expressions, allowing users to combine individual labels and relationship types with Disjunction (OR), Negation (NOT) and Conjunction (AND) operators, plus a wildcard.
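Here is a summary of the operators:
| disjunction (OR), e.g. :Actor|Director matches either label
& conjunction (AND), e.g. :Actor&Director requires both labels
! negation (NOT), e.g. :!Person excludes the label
% wildcard, e.g. :% matches any label (i.e. a node with at least one label) or any relationship type
( ) grouping, e.g. :!(ACTED_IN|DIRECTED)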
Let’s start with node pattern expressions.
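To make the operators concrete, here is a toy model in Python. It is a sketch, not how Neo4j actually evaluates label expressions: each node is reduced to a set of labels, and each operator becomes a predicate combinator. The toy nodes and the helper names (`label`, `any_label`, `not_`, `and_`, `or_`, `count`) are hypothetical.

```python
from typing import Callable, FrozenSet

# Toy model: a label expression is a predicate over a node's set of labels.
LabelExpr = Callable[[FrozenSet[str]], bool]


def label(name: str) -> LabelExpr:
    """:Name matches nodes that carry the label."""
    return lambda labels: name in labels


def any_label() -> LabelExpr:
    """:% matches nodes that carry at least one label."""
    return lambda labels: len(labels) > 0


def not_(expr: LabelExpr) -> LabelExpr:
    """:!expr matches nodes that expr does not match."""
    return lambda labels: not expr(labels)


def and_(*exprs: LabelExpr) -> LabelExpr:
    """:a&b matches nodes matched by every operand."""
    return lambda labels: all(e(labels) for e in exprs)


def or_(*exprs: LabelExpr) -> LabelExpr:
    """:a|b matches nodes matched by at least one operand."""
    return lambda labels: any(e(labels) for e in exprs)


# Hypothetical toy nodes (not the article's dataset).
nodes = [
    frozenset({"Person", "Actor"}),
    frozenset({"Person", "Actor", "Director"}),
    frozenset({"Person", "Director"}),
    frozenset({"Movie", "Comedy"}),
    frozenset(),  # a node without any labels
]


def count(expr: LabelExpr) -> int:
    """Rough equivalent of MATCH (n:expr) RETURN count(n) over the toy nodes."""
    return sum(1 for n in nodes if expr(n))


person = label("Person")
print(count(person))                                   # 3
print(count(not_(person)))                             # 2
print(count(or_(person, not_(person))))                # 5
print(count(any_label()))                              # 4
print(count(not_(any_label())))                        # 1
print(count(and_(label("Actor"), label("Director"))))  # 1
print(count(or_(label("Actor"), label("Director"))))   # 3
```

In the toy data, Person|!Person also counts the unlabelled node while % does not. In the article’s graph both return 1239, because (n:!%) finds no unlabelled nodes there.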
//just counting
//all nodes
MATCH ()
RETURN count(*)
- 1239

//all persons
MATCH (n:Person)
RETURN count(n)
- 1108

//only movies, but also nodes without labels if present
MATCH (n:!Person)
RETURN count(n)
- 131

//persons again
MATCH (n:!!Person)
RETURN count(n)
- 1108

//you can't be something and nothing at the same time
MATCH (n:Person&!Person)
RETURN count(n)
- 0

//to be or not to be - matches everything (including nodes without labels)
MATCH (n:Person|!Person)
RETURN count(n)
- 1239

//nodes with at least one label
MATCH (n:%)
RETURN count(n)
- 1239

//nodes without labels
MATCH (n:!%)
RETURN count(n)
- 0

//nodes without labels or not
MATCH (n:%|!%)
RETURN count(n)
- 1239

//both without labels and not - returns nothing
MATCH (n:%&!%)
RETURN count(n)
- 0

//persons and a little more (% is implied by Person, so still just persons)
MATCH (n:Person&%)
RETURN count(n)
- 1108

//persons or whatever (no nodes without labels here)
MATCH (n:Person|%)
RETURN count(n)
- 1239

//no person, no cry
(reduces to all nodes with at least one label, none of which is Person)
MATCH (n:!(Person&%)&%)
RETURN count(n)
- 131

//dual role play
MATCH (n:Actor&Director)
RETURN count(n)
- 11

//be something (returns a distinct set: a node with both labels is counted once)
MATCH (n:Actor|Director)
RETURN count(n)
- 540

//mix 'n match
MATCH (m:(Adventure&Children)&!(War&Crime))
RETURN
count(m) AS moviesForChildren,
collect(m.title) AS theMovies, apoc.coll.subtract(apoc.coll.flatten(collect(labels(m))),['Movie']) AS theGenres

//the same, plus who acted in and directed those movies
MATCH (m:(Adventure&Children)&!(War&Crime))
RETURN
count(m) AS moviesForChildren,
collect(m.title) AS theMovies, apoc.coll.subtract(apoc.coll.flatten(collect(labels(m))),['Movie']) AS theGenres,
apoc.coll.toSet(apoc.coll.flatten(collect([(a:Person)-[:ACTED_IN]->(m)|a.name]))) AS popularWithTheChildren,
apoc.coll.toSet(apoc.coll.flatten(collect([(a:Person)-[:DIRECTED]->(m)|a.name]))) AS makesMoviesForChildren
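Several of the counts above coincide because the expressions are logically equivalent. A quick way to convince yourself is a truth-table check. The sketch below is a hypothetical Python model (a node is a set of labels), not Neo4j code. Each expression here depends only on whether Person is present and whether the node has any label at all, and the four label sets it enumerates cover every combination of those two facts.

```python
from itertools import combinations

# Hypothetical vocabulary: {}, {Person}, {Movie}, {Person, Movie} cover every
# case the expressions below can tell apart.
VOCABULARY = ("Person", "Movie")
LABEL_SETS = [
    frozenset(combo)
    for size in range(len(VOCABULARY) + 1)
    for combo in combinations(VOCABULARY, size)
]


def person(labels):
    """:Person"""
    return "Person" in labels


def any_label(labels):
    """:% (at least one label)"""
    return len(labels) > 0


# Each entry pairs a label expression from the queries with a simpler equivalent.
EQUIVALENCES = {
    "!!Person == Person": (lambda s: not (not person(s)), person),
    "Person&% == Person": (lambda s: person(s) and any_label(s), person),
    "Person|% == %": (lambda s: person(s) or any_label(s), any_label),
    "!(Person&%)&% == %&!Person": (
        lambda s: not (person(s) and any_label(s)) and any_label(s),
        lambda s: any_label(s) and not person(s),
    ),
    "Person|!Person == every node": (
        lambda s: person(s) or not person(s),
        lambda s: True,
    ),
}

# Expressions that can never match any node.
CONTRADICTIONS = {
    "Person&!Person": lambda s: person(s) and not person(s),
    "%&!%": lambda s: any_label(s) and not any_label(s),
}


def equivalent(lhs, rhs):
    return all(lhs(s) == rhs(s) for s in LABEL_SETS)


def never_matches(expr):
    return not any(expr(s) for s in LABEL_SETS)


for name, (lhs, rhs) in EQUIVALENCES.items():
    print(f"{name}: {equivalent(lhs, rhs)}")
for name, expr in CONTRADICTIONS.items():
    print(f"{name} never matches: {never_matches(expr)}")
```

Every line should print True, which is why pairs such as (n:!!Person) and (n:Person), or (n:Person|%) and (n:%), return the same counts above.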
Now, for relationship pattern expressions.
//all relationships
MATCH ()-[r]->()
RETURN count(r)
- 4318

//ACTED_IN relationships
MATCH ()-[r:ACTED_IN]->()
RETURN count(r)
- 544

//what you asked!
MATCH ()-[r:!!ACTED_IN]->()
RETURN count(r)
- 544

//everything except ACTED_IN relationships
MATCH ()-[r:!ACTED_IN]->()
RETURN count(r)
- 3774

//you can't be something and nothing at the same time
MATCH ()-[r:ACTED_IN&!ACTED_IN]->()
RETURN count(r)
- 0

//just about everything
MATCH ()-[r:ACTED_IN|!ACTED_IN]->()
RETURN count(r)
- 4318

//a relationship must always have a type!
(% will always match a relationship, as all relationships have a type)
MATCH ()-[r:%]->()
RETURN count(r)
- 4318

//and so...
MATCH ()-[r:!%]->()
RETURN count(r)
- 0

//everything
MATCH ()-[r:%|!%]->()
RETURN count(r)
- 4318

//can't be full and empty at the same time
MATCH ()-[r:%&!%]->()
RETURN count(r)
- 0

//just ACTED_IN, again
(% matches every relationship type, so ACTED_IN&% reduces to ACTED_IN)
MATCH ()-[r:ACTED_IN&%]->()
RETURN count(r)
- 544

//just about everything
MATCH ()-[r:ACTED_IN|%]->()
RETURN count(r)
- 4318

//everything but ACTED_IN
MATCH ()-[r:!(ACTED_IN&%)&%]->()
RETURN count(r)
- 3774

//acted in or directed (syntax already supported before 5.0)
MATCH ()-[r:ACTED_IN|DIRECTED]->()
RETURN count(r)
- 687

//not what you thought this would do!
(expressions such as [r:R1&R2] will never match, as no relationships have more than one type)
MATCH ()-[r:ACTED_IN&DIRECTED]->()
RETURN count(r)
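- 0

//along the same lines... (no relationship is both ACTED_IN and DIRECTED, so this matches every Person->Movie relationship)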
MATCH (n:Person)-[r WHERE r:!ACTED_IN|!DIRECTED]->(m:Movie)
RETURN n.name, type(r), m.title
- 4315

//everything except ACTED_IN and DIRECTED, the pre-5.0 way
MATCH (n:Person)-[r]->(m:Movie)
WHERE NOT type(r) IN ['ACTED_IN','DIRECTED']
RETURN n.name, type(r), m.title
- 3628

//the same, with a relationship type expression
MATCH (n:Person)-[r WHERE r:!ACTED_IN&!DIRECTED]->(m:Movie)
RETURN n.name, type(r), m.title
- 3628

//and again, by De Morgan's law
MATCH (n:Person)-[r WHERE r:!(ACTED_IN|DIRECTED)]->(m:Movie)
RETURN n.name, type(r), m.title
- 3628

//both at once never matches
MATCH (n:Person)-[r WHERE r:ACTED_IN&DIRECTED]->(m:Movie)
RETURN n.name, type(r), m.title
- 0
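All of these relationship results follow from one rule: a relationship has exactly one type. The sketch below is a hypothetical Python model, not Neo4j code. Each relationship is reduced to its single type string, so a type expression is just a predicate on that string. The list of types is made up for illustration.

```python
# Hypothetical toy data: each relationship is reduced to its single type.
REL_TYPES = ["ACTED_IN"] * 5 + ["DIRECTED"] * 2 + ["WROTE"] + ["PRODUCED"] * 2


def count(pred):
    """Rough equivalent of MATCH ()-[r:expr]->() RETURN count(r)."""
    return sum(1 for rel_type in REL_TYPES if pred(rel_type))


def acted_in(t):
    return t == "ACTED_IN"


def directed(t):
    return t == "DIRECTED"


def wildcard(t):
    # % matches any relationship, because every relationship has a type.
    return True


results = {
    "ACTED_IN&DIRECTED": count(lambda t: acted_in(t) and directed(t)),
    "!ACTED_IN|!DIRECTED": count(lambda t: not acted_in(t) or not directed(t)),
    "!ACTED_IN&!DIRECTED": count(lambda t: not acted_in(t) and not directed(t)),
    "!(ACTED_IN|DIRECTED)": count(lambda t: not (acted_in(t) or directed(t))),
    "ACTED_IN|!DIRECTED": count(lambda t: acted_in(t) or not directed(t)),
    "%": count(wildcard),
    "!%": count(lambda t: not wildcard(t)),
}
for expr, n in results.items():
    print(f"{expr}: {n}")
```

Whatever the data, the first line is always 0 and the second always equals the total. The third and fourth agree by De Morgan's law. This mirrors the 0 / 4315 / 3628 / 3628 pattern above.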
If you didn’t quite get that, let me just have Tom Hanks explain that to you.
This is what all one-hop paths from Tom Hanks look like in my graph.
//how many patterns? (as seen in the visualization above)
MATCH (:Person {name:'Tom Hanks'})-[]->(:Movie)
RETURN count(*)
- 14

//what is the distribution? (as seen in the visualization above)
MATCH (:Person {name:'Tom Hanks'})-[r]->(:Movie)
RETURN type(r), count(*)
╒══════════╤══════════╕
│"type(r)" │"count(*)"│
╞══════════╪══════════╡
│"ACTED_IN"│13        │
├──────────┼──────────┤
│"DIRECTED"│1         │
└──────────┴──────────┘

//filtering out :DIRECTED
MATCH (:Person {name:'Tom Hanks'})-[:ACTED_IN|!DIRECTED]->(:Movie)
RETURN count(*)
- 13

//filtering out :WROTE & :PRODUCED
MATCH (:Person {name:'Tom Hanks'})-[:ACTED_IN|!(WROTE|PRODUCED)]->(:Movie)
RETURN count(*)
- 14

//just :ACTED_IN (but this probably reads like :ACTED_IN and NOT :DIRECTED the same movie, which could be misleading)
MATCH (:Person {name:'Tom Hanks'})-[:ACTED_IN&!DIRECTED]->(:Movie)
RETURN count(*)
- 13

//:ACTED_IN and NOT :DIRECTED the same movie
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
WHERE NOT (p)-[:DIRECTED]->(m)
RETURN count(*)
- 12

//:ACTED_IN and NOT :DIRECTED any movie
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie)
WHERE NOT (p)-[:DIRECTED]->()
RETURN count(*)
- 0

//a relationship has only one type (but this probably reads like :ACTED_IN and :DIRECTED the same movie, which could be misleading)
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN&DIRECTED]->(m:Movie)
RETURN count(*)
- 0

//:ACTED_IN and :DIRECTED the same movie
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
WHERE (p)-[:DIRECTED]->(m)
RETURN count(*)
- 1

//:ACTED_IN and :DIRECTED the same movie - this also works
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(p)
RETURN count(*)
- 1

//:ACTED_IN and anything else in the same movie
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:!ACTED_IN]-(p)
RETURN count(*)
- 1

//this amounts to :ACTED_IN on both ends (Tom Hanks only has :ACTED_IN and :DIRECTED relationships to movies). Remember the seven bridges of Königsberg? Cypher won't traverse the same relationship twice in the same pattern (relationship isomorphism)
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:!DIRECTED]-(p)
RETURN count(*)
- 0

//same thing ^^
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(p)
RETURN count(*)
- 0

//again, this only matches where the relationships on the two ends are not the same relationship
MATCH (p:Person {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[]-(p)
RETURN count(*)
- 1
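The Tom Hanks results can be reproduced with a small Python model. It is a sketch, not how Neo4j executes patterns. The data is hypothetical but shaped like the distribution above: 13 :ACTED_IN relationships, plus 1 :DIRECTED relationship to a movie he also acted in. The movie ids are placeholders. The hypothetical `two_hop_back` helper imitates relationship isomorphism by refusing to reuse the first relationship as the second.

```python
# Hypothetical stand-in for Tom Hanks's outgoing relationships:
# (relationship id, type, movie id). Movie ids are placeholders.
RELS = [(i, "ACTED_IN", f"movie_{i}") for i in range(1, 14)]
RELS.append((14, "DIRECTED", "movie_5"))  # he also acted in movie_5


def single_hop(pred):
    """(p)-[:EXPR]->(:Movie): the expression filters one relationship."""
    return sum(1 for _rid, rel_type, _movie in RELS if pred(rel_type))


def two_hop_back(pred):
    """(p)-[r1:ACTED_IN]->(m)<-[r2:EXPR]-(p), with r1 and r2 distinct."""
    total = 0
    for rid1, type1, movie1 in RELS:
        if type1 != "ACTED_IN":
            continue
        for rid2, type2, movie2 in RELS:
            # Relationship isomorphism: r2 may not be the same relationship as r1.
            if movie2 == movie1 and rid2 != rid1 and pred(type2):
                total += 1
    return total


directed_movies = {movie for _rid, rel_type, movie in RELS if rel_type == "DIRECTED"}

results = {
    "[:ACTED_IN&!DIRECTED]": single_hop(lambda t: t == "ACTED_IN" and t != "DIRECTED"),
    "ACTED_IN, NOT DIRECTED same movie": sum(
        1 for _rid, t, m in RELS if t == "ACTED_IN" and m not in directed_movies
    ),
    "[:ACTED_IN&DIRECTED]": single_hop(lambda t: t == "ACTED_IN" and t == "DIRECTED"),
    "<-[:!DIRECTED]-(p)": two_hop_back(lambda t: t != "DIRECTED"),
    "<-[:!ACTED_IN]-(p)": two_hop_back(lambda t: t != "ACTED_IN"),
    "<-[]-(p)": two_hop_back(lambda t: True),
}
for pattern, n in results.items():
    print(f"{pattern}: {n}")
```

The six counts (13, 12, 0, 0, 1, 1) line up with the results above. A single-hop type expression filters one relationship at a time. The two-hop patterns only find something when a *different* relationship links the same two nodes.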
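To conclude: negation in a relationship type expression says nothing about whether a relationship of that type exists between the same nodes. A relationship has exactly one type, so the expression simply filters the one relationship being matched. For node labels, on the other hand, it is about existence: a node can carry several labels, and :!Director only matches nodes that do not have the Director label at all.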
More fun with patterns:
//not quite what I want (I want directors who were actors in the same movie)
MATCH (n WHERE n:Actor&Director)-[r:ACTED_IN|DIRECTED]->(m)
RETURN
n.name AS theGoGetter,
type(r) AS theRole,
collect(m.title) AS theMovies

//the disgrace
MATCH (n WHERE n:Actor&Director)-[r:ACTED_IN|DIRECTED]->(m)
WITH n.name AS theExtraMiler, collect(CASE WHEN type(r) = 'ACTED_IN' THEN m.title END) AS thoseActedIn,
collect(CASE WHEN type(r) = 'DIRECTED' THEN m.title END) AS thoseDirected
WITH theExtraMiler, apoc.coll.intersection(thoseActedIn, thoseDirected) AS theMovies
WHERE size(theMovies) > 0
RETURN theExtraMiler, theMovies

//simple pattern matching does it!
MATCH (n WHERE n:Actor&Director)-[:ACTED_IN]->(m)<-[:DIRECTED]-(n)
RETURN
n.name AS theExtraMiler,
collect(m.title) AS theMovies

//the pattern alone is enough, even without the Actor&Director labels
MATCH (n:Person)-[:ACTED_IN]->(m)<-[:DIRECTED]-(n)
RETURN
n.name AS theExtraMiler,
collect(m.title) AS theMovies

//be something more
MATCH (n WHERE n:Director)-[r:!DIRECTED]->(m)
RETURN
n.name AS theMightyDirector,
type(r) AS theRole,
collect(m.title) AS theMovies

//and actors who did more than act
MATCH (n WHERE n:Actor)-[r:!ACTED_IN]->(m)
RETURN
n.name AS theGoGetter,
type(r) AS theRole,
collect(m.title) AS theMovies
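The gap between “the disgrace” and plain pattern matching can be sketched in Python with hypothetical (person, type, movie) triples. Collecting both lists per person and intersecting them gives the same answer as matching both relationships to the same movie directly, but the pattern version needs no bookkeeping. Names and data are made up, and sets stand in for Cypher’s collected lists.

```python
from collections import defaultdict

# Hypothetical (person, relationship type, movie) triples.
TRIPLES = [
    ("Alice", "ACTED_IN", "Film A"),
    ("Alice", "DIRECTED", "Film A"),
    ("Alice", "DIRECTED", "Film B"),
    ("Bob", "ACTED_IN", "Film C"),
    ("Bob", "DIRECTED", "Film D"),
    ("Carol", "ACTED_IN", "Film E"),
]


def via_collect_and_intersect(triples):
    """The long way: collect both lists per person, then intersect them."""
    acted, directed = defaultdict(set), defaultdict(set)
    for person, rel_type, movie in triples:
        if rel_type == "ACTED_IN":
            acted[person].add(movie)
        elif rel_type == "DIRECTED":
            directed[person].add(movie)
    result = {}
    for person in acted.keys() & directed.keys():
        common = acted[person] & directed[person]
        if common:  # WHERE size(theMovies) > 0
            result[person] = common
    return result


def via_pattern(triples):
    """The pattern way: (n)-[:ACTED_IN]->(m)<-[:DIRECTED]-(n)."""
    edges = set(triples)
    result = defaultdict(set)
    for person, rel_type, movie in edges:
        if rel_type == "ACTED_IN" and (person, "DIRECTED", movie) in edges:
            result[person].add(movie)
    return dict(result)


print(via_collect_and_intersect(TRIPLES))
print(via_pattern(TRIPLES))
```

Both calls should find only Alice and Film A. Bob acted and directed, but never in the same film.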
Right, we now move on to pattern predicates. Let’s try them with paths first:
MATCH path=((n)-[r:REVIEWED]->(m:Movie) WHERE n:Person AND r.rating > 3 AND m:Comedy:Romance AND m.released > 2000)
RETURN path
…and that fails with an error:
Neo.DatabaseError.General.UnknownError
No support for WHERE in ParenthesizedPath yet.
This, however, works:
MATCH (n)-[r:REVIEWED]->(m:Movie)
WHERE n:Person AND 5 >= r.rating >= (m.imdbRating/2) AND m:Animation|Fantasy AND m.released > 1990
RETURN
m.title AS theMovie,
count(r) AS reviewCount,
max(r.rating) AS topRating,
CASE WHEN m:Children THEN false ELSE true END AS parentalGuidance, apoc.coll.subtract(labels(m),['Movie']) AS theGenres,
m.imdbRating/2 AS IMDBRating
ORDER BY reviewCount DESC
LIMIT 5
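The rating filter in the query above, `5 >= r.rating >= (m.imdbRating/2)`, is a chained comparison. Cypher evaluates it as `5 >= r.rating AND r.rating >= m.imdbRating/2`, which is exactly how Python chains comparisons. Below is a minimal sketch of the predicate with hypothetical ratings. One caveat: in Cypher, dividing an integer by an integer truncates, so this sketch assumes imdbRating is stored as a float.

```python
def keep_review(rating: float, imdb_rating: float) -> bool:
    """Mirror of: 5 >= r.rating >= (m.imdbRating/2).

    Python chains comparisons like Cypher does, so this is equivalent to
    5 >= rating and rating >= imdb_rating / 2.
    """
    return 5 >= rating >= imdb_rating / 2


# Hypothetical (review rating, IMDB rating) pairs.
samples = [(4, 7.0), (3, 7.0), (5, 9.8), (6, 8.0)]
for rating, imdb in samples:
    explicit = 5 >= rating and rating >= imdb / 2
    assert keep_review(rating, imdb) == explicit
    print(rating, imdb, keep_review(rating, imdb))
```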
In Neo4j 4.4, pattern predicates (inline WHERE clauses) are only possible in node patterns. In 5.0, they have been extended to relationship patterns as well:
MATCH (n WHERE n:Person)-[r:REVIEWED WHERE 5 >= r.rating >= (m.imdbRating/2)]->(m:Movie WHERE m:Animation|Fantasy AND m.released > 1990)
RETURN
m.title AS theMovie,
count(r) AS reviewCount,
max(r.rating) AS topRating,
CASE WHEN m:Children THEN false ELSE true END AS parentalGuidance, apoc.coll.subtract(labels(m),['Movie']) AS theGenres,
m.imdbRating/2 AS IMDBRating
ORDER BY reviewCount DESC
LIMIT 5
In 4.4, the query above throws the following error:
Neo.ClientError.Statement.SyntaxError
Invalid input ‘WHERE’: expected
“*”
“]”
“{“
“|”
a parameter (line 1, column 38 (offset: 37))
“MATCH (n WHERE n:Person)-[r:REVIEWED WHERE 5 >= r.rating >= (m.imdbRating/2)]->(m:Movie WHERE m:Animation|Fantasy AND m.released > 1990)”
The new relationship type expressions cannot currently (as of 5.1) be used inside variable-length relationships. This limitation is expected to be lifted in a future Neo4j 5.x release, likely with the introduction of a new quantification syntax.
MATCH path=shortestPath((:Actor {name:'Tom Hanks'})-[!REVIEWED*]-(:Actor {name:'Clint Eastwood'}))
RETURN path
So this fails with
Neo.ClientError.Statement.SyntaxError
Invalid input ‘!’: expected
“*”
“:”
“]”
“{“
a parameter
an identifier (line 1, column 54 (offset: 53))
“MATCH path=shortestPath((:Actor {name:’Tom Hanks’})-[!REVIEWED*]-(:Actor {name:’Clint Eastwood’}))”
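(The query above also leaves out the colon before !REVIEWED, which is why the parser stops at ‘!’ and lists “:” among the expected tokens. Adding the colon doesn’t help, though, because of the limitation described above.) So for now, we’ll just have to go old-school: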
MATCH path=shortestPath((:Person {name:'Tom Hanks'})-[:ACTED_IN|DIRECTED|PRODUCED|WROTE*]-(:Person {name:'Clint Eastwood'}))
RETURN path
And now, a note on mixers.
Mixing old syntax rules with new syntax rules within the same expression is not permitted. So the query below
MATCH (:Movie & ((:Romance:Comedy)|(:Animation:Fantasy)))
RETURN
m.title AS theMovie,
apoc.coll.subtract(labels(m),['Movie']) AS theGenres
fails with an error:
Neo.ClientError.Statement.SyntaxError
Invalid input ‘:’: expected “%”, “(“ or an identifier (line 1, column 19 (offset: 18))
“MATCH (:Movie & ((:Romance:Comedy) | (:Animation:Fantasy)))”
However, either of these works:
MATCH (m:Movie)
WHERE m:Romance:Comedy OR m:Animation:Fantasy
RETURN
m.title AS theMovie,
apoc.coll.subtract(labels(m),['Movie']) AS theGenres

//or, using only the new syntax
MATCH (m:Movie & ((Romance&Comedy)|(Animation&Fantasy)))
RETURN
m.title AS theMovie,
apoc.coll.subtract(labels(m),['Movie']) AS theGenres
Mixing old syntax rules with new syntax rules within the same statement is also not permitted. So this query
MATCH (n:Person)-[r:!REVIEWED&!ACTED_IN]->(m:`Sci-Fi`:Thriller)
RETURN
n.name AS theCrew,
m.title AS movie,
collect(type(r)) AS theRoles,
apoc.coll.subtract(labels(m),['Movie']) AS theGenres
fails with an error:
Neo.ClientError.Statement.SyntaxError
Mixing label expression symbols (‘|’, ‘&’, ‘!’, and ‘%’) with colon (‘:’) is not allowed. Please only use one set of symbols. This expression could be expressed as :`Sci-Fi`&Thriller. (line 1, column 54 (offset: 53))
“MATCH (n:Person)-[r:!REVIEWED&!ACTED_IN]->(m:`Sci-Fi`:Thriller)”
But these work:
MATCH (n:Person)-[r:!REVIEWED&!ACTED_IN]->(m:`Sci-Fi`&Thriller)
RETURN
n.name AS theCrew,
m.title AS movie,
collect(type(r)) AS theRoles,
apoc.coll.subtract(labels(m),['Movie']) AS theGenres

//or combine a few genre pairs
MATCH (n:Person)-[r:!REVIEWED&!ACTED_IN]->(m:(`Sci-Fi`&Thriller)|(Action&Crime))
RETURN
n.name AS theCrew,
m.title AS movie,
collect(type(r)) AS theRoles,
apoc.coll.subtract(labels(m),['Movie']) AS theGenres
And to wrap up, a note on recommendations!
We at Neo4j always teach one recommendation query in our Cypher Intro class: recommending actors that Tom Hanks could work with in the sample Movies graph.
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(m2:Movie)<-[:ACTED_IN]-(cocoActor:Person)
WHERE NOT (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(cocoActor)
AND tom <> cocoActor
RETURN
cocoActor.name AS recommendedActor,
collect(DISTINCT coActor.name) AS commonCoActors,
count(DISTINCT coActor.name) AS strength
ORDER BY strength DESC
LIMIT 5
Here, I’m taking the same query, throwing movie genres into the mix and considering all relationships (:ACTED_IN, :DIRECTED, :WROTE, :PRODUCED) except reviews (:REVIEWED). So we’re recommending new connections for Al Pacino through mutual connections on his movies, where those movies also belong to specific sets of genres.
MATCH (p:Person WHERE p.name = 'Al Pacino')-[:!REVIEWED]->(m1:Movie & (Drama|Thriller|Action))<-[:!REVIEWED]-(co:Person)-[:!REVIEWED]->(m2:Movie & (Drama|Adventure))<-[:!REVIEWED]-(coCo)
WHERE p <> coCo AND NOT exists { (p)-[:!REVIEWED]->(:Movie)<-[:!REVIEWED]-(coCo) }
RETURN
coCo.name AS personToReckonWith,
collect(DISTINCT co.name) AS mutualConnections,
count(DISTINCT co.name) AS strength, apoc.coll.subtract(apoc.coll.toSet(apoc.coll.flatten(collect(labels(m2)))),['Movie']) AS trySomethingNew
ORDER BY strength DESC
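Stripped of genres and relationship types, both recommendation queries follow the same friends-of-friends idea. Here is a hypothetical Python sketch of it, with toy names and movies rather than the Movies graph, and with the genre filters left out. A candidate must share a movie with one of the person’s direct connections, must not be the person, and must not already share a movie with them. Strength is the number of distinct mutual connections.

```python
from collections import defaultdict

# Hypothetical person -> movies they are connected to (non-:REVIEWED relationships).
WORKS = {
    "P": {"M1"},
    "Co1": {"M1", "M2"},
    "Co2": {"M1", "M3"},
    "Z": {"M1", "M3"},  # shares M1 with P: a direct connection, never a recommendation
    "X": {"M2", "M3"},
    "Y": {"M2"},
}


def recommend(person, works):
    """Rank people two hops away by their number of distinct mutual connections."""
    mine = works[person]
    direct = {q for q, movies in works.items() if q != person and movies & mine}
    mutual = defaultdict(set)
    for co in direct:
        for candidate, movies in works.items():
            # WHERE p <> coCo AND NOT exists { shared movie with p }
            if candidate == person or movies & mine:
                continue
            if movies & works[co]:
                mutual[candidate].add(co)
    # ORDER BY strength DESC (ties broken by name for a stable result)
    return sorted(
        ((cand, len(cos), sorted(cos)) for cand, cos in mutual.items()),
        key=lambda row: (-row[1], row[0]),
    )


for name, strength, via in recommend("P", WORKS):
    print(name, strength, via)
```

Here X comes out on top with three mutual connections and Y follows with one. Z is never recommended because it already shares a movie with P.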
And with that, I’ve taken you through some compelling and convincing changes to pattern expressions in Cypher with Neo4j 5.0, that I’m certain will make it even easier to write complex pattern-matching queries! Do give it a go with your own data and questions you’re looking to answer with Cypher!