Affinity analysis in Social Networks using Property Graphs

In the context of social networks, affinity analysis examines the relationships and connections between individuals, groups, or organizations, revealing patterns of shared interests, behaviors, or connections. It helps understand how social structures influence different aspects of individuals, groups, or organizations.

Graph Databases are excellent for Affinity Analysis. We will look at a simple example of identifying Sharing Affinity for posts (messages) in a Social Network i.e. finding all the messages that were “liked” by users where the user know each other. Here is an example of Sharing Affinity.

  1. Zainab knows Fatima and they both “Liked” the message 2
  2. Uroosa knows Fatima and they both “Liked” the message 1.

We will use Property Graphs to identify such Sharing Affinity in a Social Network.

Consider the four relational tables in the database:

Let’s start by creating some sample data:

create table person (person_name varchar(80), person_city varchar(80), person_state varchar(80));
create table message(message_id int, message_content varchar(80));
create table knows(person_name_1 varchar(80), person_name_2 varchar(80), know_date date);
create table likes(person_name varchar(80), message_id int, like_date date);


insert into person values 
  ('Uroosa', 'Tempe', 'AZ')
  , ('Fatima', 'Tempe', 'AZ')
  , ('Zainab', 'Tempe', 'AZ');

insert into message values(1, '...'), (2, '...');

insert into knows values 
  ('Uroosa', 'Fatima', to_date('2023-01-15', 'YYYY-MM-DD'))
  , ('Fatima', 'Zainab', to_date('2023-02-18', 'YYYY-MM-DD'))
;

insert into likes VALUES
  ('Uroosa', 1, to_date('2024-03-31', 'YYYY-MM-DD'))
  , ('Fatima', 1, to_date('2024-03-28', 'YYYY-MM-DD'))
  , ('Fatima', 2, to_date('2024-03-20', 'YYYY-MM-DD'))
  , ('Zainab', 2, to_date('2024-03-21', 'YYYY-MM-DD'));

Using SQL/PGQ, a property graph social_network is articulated as a GRAPH_TABLE, established on the basis of the tables. In this mapping, rows from Person and Message are interpreted as vertices with labels “Person” and “Message” respectively, while rows from Like represent edges with the label “Likes”, and rows from Knows represent edges with the label “Likes”.

CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    person KEY (person_name) PROPERTIES ARE ALL COLUMNS
    , message KEY(message_id) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    likes 
      key (person_name, message_id) 
      SOURCE KEY (person_name) REFERENCES person(person_name)
      DESTINATION KEY(message_id) REFERENCES message(message_id)
      PROPERTIES ARE ALL COLUMNS

    , knows
      key (person_name_1, person_name_2)
      SOURCE KEY (person_name_1) REFERENCES person(person_name)
      DESTINATION KEY (person_name_2) REFERENCES person(person_name)
      PROPERTIES ARE ALL COLUMNS
);

An SQL/PGQ query to discover friends that share an affinity for the same message, can be formulated as following:

SELECT person_1_name, person_2_name, message_id
FROM GRAPH_TABLE (social_network
MATCH
  (person_1 is person) - [e1 is likes] -> (m is message)
  , (person_2 is person) - [e2 is likes] -> (m)
  , (person_1) - [e3 is knows] -> (person_2)
COLUMNS (person_1.person_name AS person_1_name, person_2.person_name as person_2_name, m.message_id as message_id)
);

In graph social_network, a Graph Pattern Matching is employed to decode the intricate relationships between persons and messages. Upon executing the pattern matching, a COLUMNS clause projects the results into a tabular format, enumerating essential attributes. Note that a MATCH clause may have more than one path pattern, in a comma-separated list. Any graph pattern variables in common between two path patterns denotes an overlap between the path patterns. In the preceding example, the vertex variable m is shared. Note that the variable m must bind to the same graph element table in each element pattern of the graph pattern, and thus there is an implicit natural inner join on such repeated graph pattern variables.