Network analysis using Graph Query Language - Complex patterns matching in Graph Traversal

In using Graph Query Language (GQL) to query Property Graphs, pattern matching is done by specifying one or more path patterns in the MATCH clause. A single path pattern matches a linear path of vertices and edges, while more complex patterns can be matched by combining multiple path patterns, separated by comma. This is very useful when analyzing Social Network graphs. In Solving the Transitive Closure problem using a Property Graph and Graph Query Language and Associative Rule Mining using Property Graphs posts we used a single path in the MATCH clause. In this post we will be specifying multiple paths in the MATCH clause to get to the desired outcome.

One of the most common query pattern in analyzing Social Networks is the degree of separation. We may be, for example, only interested in interaction with a direct connection i.e. 1st degree and the 2nd degree i.e. friend of friend. This type of analysis fairly easy with Property Graphs. For this example, let’s say we want to pull all the comments on posts that are by either the 1st level connection or the 2nd level connection of the Original Poster. We are not interested in comments from 3rd level, 4th level connections etc. For this we will use the following Property Graph

Note that we have three edges— post_owned_by, commented_on_post and is_connected_to.

And here is our Social Network:

Let’s start with generating some sample data:

create table users (user_id number, user_name varchar(80));
create table posts (post_id number, post_text varchar(1000), user_id number);
create table connections (user1_id number, user2_id number);
create table comments(comment_id number, user_id number, post_id number, comment_text varchar(1000));

insert into users values (1, 'Saqib'), (2, 'Angela'), (3, 'Scott'), (4, 'Uroosa');
insert into posts values (1, 'First Post by Saqib', 1);
insert into posts values (2, 'First Post by Angela', 2);

insert into connections values (1, 2), (2, 3), (3, 4);
insert into comments values (1, 2, 1, 'Comment from Angela'), (1, 3, 1, 'Comment from Scott'), (1, 4, 1, 'Comment from Uroosa');
--insert into comments values (1, 3, 2, 'Comment from Scott'), (1, 4, 2, 'Comment from Uroosa');

Now let’s build out the Property Graph:

CREATE OR REPLACE PROPERTY GRAPH post_interaction_graph
  VERTEX TABLES (
    users KEY ( user_id ) PROPERTIES ARE ALL COLUMNS
    , posts KEY ( post_id ) PROPERTIES ARE ALL COLUMNS
)
  EDGE TABLES (
   connections AS is_connected_to
     key (user1_id, user2_id)
     SOURCE KEY ( user1_id ) REFERENCES users( user_id )
     DESTINATION KEY ( user2_id ) REFERENCES users( user_id )
     PROPERTIES ARE ALL COLUMNS

   , comments as commented_on_post
     KEY (comment_id) 
     SOURCE KEY (user_id ) REFERENCES users(user_id)
     DESTINATION KEY (post_id) REFERENCES posts(post_id)
     PROPERTIES ARE ALL COLUMNS

   , posts as post_owned_by
     KEY (post_id, user_id) 
     SOURCE KEY (post_id) REFERENCES posts(post_id)
     DESTINATION KEY (user_id ) REFERENCES users(user_id)
     PROPERTIES ARE ALL COLUMNS
);

Let’s write the GQL to query the above Property Graph to get all the Comments from Users who are either 1st or 2nd Level connections of the Original Poster

SELECT  *
FROM GRAPH_TABLE(post_interaction_graph
MATCH (post1 is posts) <- [e1 is commented_on_post] - (commenting_user is users)
  , (post1 is posts) - [e2 is post_owned_by] -> (post_owner is users)
  , (post_owner is users) - [e3 is is_connected_to] ->{,2} (commenting_user is users)
COLUMNS ( commenting_user.user_name as commenting_user_name, e1.comment_text, post_owner.user_name, post1.post_text)
); 

Here is the output of the query.