How do I calculate for each person in the network how many people agree with their opinion?

1

I have a signed bipartite graph in which the nodes are (1) students and (2) topics. An edge is drawn between a student and topic node if the student mentions their opinion about the topic in a short answer (i.e., some students have an opinion about one topic but not another). The valence of the edge indicates whether the opinion is positive or negative.

My question is: how do I find out how many other students agree with a particular student, in terms of not just which topics they have an opinion on, but also what that opinion is (positive/negative)?

EDIT: Based on a comment below

1) What exactly do you mean by agree? Should all existing opinions align or only those on a specific topic? What if one student has additional opinions about a topic?

All existing opinions (including topics and the valence) should align. If a student gives the same opinions about the same topics as another, but happens to talk about another topic as well, that would not count as full agreement. Perhaps there is a way to calculate partial agreement as well?

2) What exactly is your problem? Defining the characteristics is straightforward: Just count. Do you perhaps struggle with an algorithmic implementation?

Algorithmic implementation is perhaps what I am going for. Since I have 100 students, it would be difficult to hand-count the number of peers that agree with them. So if there is a way to calculate a value for each student, that would be helpful.

iamnarra

Posted 2017-09-14T07:52:56.543

Reputation: 131

Thank you for your edit. With respect to the remaining question, all I could answer right now is: Write a computer program. – Wrzlprmft – 2017-09-14T08:38:15.730

@Wrzlprmft Thanks for the very helpful feedback. What kind of computer program would that be? – iamnarra – 2017-09-14T08:41:58.273

Well, what programming experience do you have? – Wrzlprmft – 2017-09-14T08:51:43.053

@Wrzlprmft All I am asking for here is if there is an algorithm or formula that could calculate what I am looking for. Obviously I know that a script has to be written but I haven't the slightest clue what that script is. – iamnarra – 2017-09-14T08:53:57.847

Answers

2

As a fast answer, you can represent each student as a vector with $K$ elements (where $K$ is the number of topics) and values $\{+1, 0, -1\}$, denoting positive/non-existent/negative opinion about this topic.
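If the graph is stored as an edge list, the student-vectors can be built with a pivot. This is only a minimal sketch: the column names `student`, `topic` and `sign` and the toy rows are assumptions made for illustration, not something taken from the question.

```python
import pandas as pd

# Hypothetical edge list: one row per (student, topic) edge of the signed graph.
# sign is +1 for a positive opinion and -1 for a negative one.
edges = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s3"],
    "topic":   ["A",  "B",  "A",  "C",  "B"],
    "sign":    [1,    -1,   1,    1,    -1],
})

# Rows become students, columns become topics; a missing edge becomes 0.
vectors = (
    edges.pivot(index="student", columns="topic", values="sign")
    .fillna(0)
    .astype(int)
)
print(vectors)
```

Each row of `vectors` is then one student-vector with $K$ entries in $\{+1, 0, -1\}$.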

Then, a simple measure of agreement between two students is the dot product of their student-vectors, i.e. the sum of the element-wise products: $similarity = \sum_{i=1}^{K} st_1[i] \cdot st_2[i]$, where $st_1, st_2$ are the student-vectors. Only the topics on which both students hold the same opinion increase the total [e.g. $1 \cdot 1 = 1$ and $(-1) \cdot (-1) = 1$], while opposing opinions decrease it. If either of the two students hasn't expressed an opinion about a topic, that topic contributes nothing to the sum.
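As a small worked example of this measure, here is a sketch with three made-up students and $K = 4$ topics (the vectors and the helper name `similarity` are illustrative assumptions):

```python
import numpy as np

# Toy opinions over K = 4 topics: +1 positive, -1 negative, 0 no opinion.
st_1 = np.array([1, -1, 0, 1])
st_2 = np.array([1, 1, 0, 0])
st_3 = np.array([1, -1, 0, 1])

def similarity(a, b):
    """Sum of the element-wise products of two student-vectors."""
    return int(np.dot(a, b))

print(similarity(st_1, st_2))  # 1*1 + (-1)*1 + 0*0 + 1*0 = 0
print(similarity(st_1, st_3))  # 1 + 1 + 0 + 1 = 3
```

The first pair agrees on one topic and disagrees on another, so the two contributions cancel; the second pair agrees on all three topics they mention.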

In that sense, you can find the most like-minded students to a specific student, as the ones with the highest $similarity$. If what you really need is a number of agreeing students for each unique student, then a threshold on the $similarity$ score can be set. The value of the threshold can be decided empirically from your data.
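For the strict definition in the question (every topic and every valence must match), no threshold is needed: two students fully agree exactly when their vectors are identical. The sketch below counts both kinds of agreement per student. The function names `count_full_agreement` and `count_above_threshold` and the toy data are my own illustration, not part of any library.

```python
import numpy as np

def count_full_agreement(vectors):
    """For each student, count the other students with an identical vector."""
    v = np.asarray(vectors)
    # identical[i, j] is True when students i and j match on every topic.
    identical = (v[:, None, :] == v[None, :, :]).all(axis=2)
    # Subtract 1 so a student is not counted as agreeing with themselves.
    return identical.sum(axis=1) - 1

def count_above_threshold(vectors, threshold):
    """For each student, count the others whose similarity is >= threshold."""
    v = np.asarray(vectors)
    sim = v @ v.T
    above = sim >= threshold
    # Ignore each student's similarity with themselves.
    np.fill_diagonal(above, False)
    return above.sum(axis=1)

# Toy data: students 0 and 2 hold identical opinions.
vectors = np.array([
    [1, -1, 0, 1],
    [1, 1, 0, 0],
    [1, -1, 0, 1],
    [-1, 0, 0, 0],
])
print(count_full_agreement(vectors))      # [1 0 1 0]
print(count_above_threshold(vectors, 1))  # [1 0 1 0]
```

Note that the thresholded count is a partial-agreement measure: a student who shares all of another student's opinions but also comments on an extra topic can still pass the threshold, whereas the exact comparison excludes them.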

This is easy to implement, and if you are comfortable with coding, I could post a sample script in Python. One thing to consider, though, is the format in which the bipartite graph is stored (a .csv, a graph file of some kind, etc.).

EDIT: MINOR EXAMPLE. Fetch example .csv file used from here.

import pandas as pd
import numpy as np

# Change location of file according to your needs
with open('students_example.csv', 'r') as f:
    df = pd.read_csv(f)
# Print for visualization
print(df.head())
print("~"*25)

# Delete column containing the student_id
del df['Student_ID']
# Convert the pandas DataFrame to a NumPy array
# (DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() needs pandas >= 0.24)
student_vectors = df.to_numpy()
# The number of students at hand, let it be N.
N_students = student_vectors.shape[0]
# Similarity between every pair of students as an NxN matrix:
# entry (i, j) is the dot product of the vectors of students i and j
similarity_scores = np.dot(student_vectors, student_vectors.T)
# Fill the diagonal (that is the similarity of each student with him/herself)
# with low similarity scores so as not to confuse them with other possibly
# agreeing students
np.fill_diagonal(similarity_scores, -1000)

# Random wanted student for example purposes
wanted_id = 3
# Print Students Opinion
print("Wanted Students Opinion:")
print(df.loc[wanted_id].to_string())
print("~"*25)
# Rank all students from most to least similar to the wanted one
ranking = np.argsort(similarity_scores[wanted_id, :])[::-1]
print("Most similar:(Student ID = %d)" % ranking[0])
print(df.loc[ranking[0]].to_string())
print("~"*25)
print("Second most similar:(Student ID = %d)" % ranking[1])
print(df.loc[ranking[1]].to_string())
print("~"*25)

If you follow the example, the wanted student (with $student_{ID}=3$) has the opinions {Trump -1, Net Neutrality -1, Vaccination 1, Obamacare -1}, and the output gives you two other students with the same opinions, together with their IDs.

You can modify the script to fit your needs accordingly.

P.S.: Sorry for the messy code, it was written rather hastily. Also, I tried it with Python 2.7.

Bogas

Posted 2017-09-14T07:52:56.543

Reputation: 556

Thanks a lot for this. I am still learning Python so it would be great if you could post a sample script. The data is in .csv. – iamnarra – 2017-09-14T09:42:29.790

@iamnarra Ok. Edited the answer and added an example script, alongside a toy dataset in .csv. Hope this is a good starting point and helps you achieve what you want. – Bogas – 2017-09-14T12:48:37.903