A comprehensive review of security threats and solutions for the online social networks industry

PeerJ Computer Science

Introduction

Rationale of the study

Background

Social media on human communication

Social media and public communication

Goals

Methodology

Review plan

  • Research objectives

  • Specifying research questions (RQs)

  • Organizing searches of databases

  • Studies selection

  • Screening relevant studies

  • Data extraction

  • Results synthesizing

  • Finalizing the review report

Defining question

Search strategy

Selection based on inclusion/exclusion criteria

Selection based on snowballing

Overview of intermediate selection process outcome

Overview of selected studies

What Security Threats are There on OSN Which Affect the OSN’s Users?

Classical threats

Malware

Phishing attacks

Spam attack

Cross-site scripting (XSS)

Modern threats

Clickjacking

De-anonymization attacks

Sybil attack or fake profile

Identity clone attacks

Inference attacks

Information leakage

Location leakage

Surveillance

Insider threat

Multimedia threats

Threats targeting children

Harmful or aggressive content

Scams

Fake friends entering the chat room

What Techniques and Solutions are Used to Secure Online Social Networks, and What are Their Limitations?

Operator solutions

Authentication mechanism

  1. Hash-based authentication protocol: A hash-based authentication protocol is suited to resource-limited devices and incurs a lower computational cost.

  2. Proxy-based protocol: This protocol is used for exchanging information between users and is based on asymmetric encryption.

  3. Certificate-based protocol: A certificate-based protocol assures non-repudiation of transactions through signatures (Facebook Immune System, FIS).

Security and privacy setting

Report users

Internal protection mechanism

Security solutions

Watermarking.
Co-ownership.
Steganalysis.
Digital oblivion.
Storage encryption.
Metadata removal and security.
Intrusion detection.

Commercial solutions

Internet security solution

AVG privacy fix

FB phishing protector

Norton safe web

McAfee social protection

MyPermission

NoScript security suite

Privacy scanner for facebook

Defensio

ZoneAlarm privacy scan

Net Nanny

Minor Monitor

Academic solutions

Improving privacy setting interfaces

Phishing detection

Spammer detection

Cloned profile detection

Sybil detection

Detection of information and location leakage

Malicious account detection solutions

Crowdsourcing.
Graph-based.
Trust propagation.
Graph clustering.
Graph Metric and properties.
Machine learning.
Supervised learning.
  1. Bayes theorem: Bayes’ theorem describes the probability of a hypothesis given observed evidence, and shows how strongly that evidence influences the probability that the hypothesis is correct. The theorem has applications in various domains, from topic modeling to spam filtering in social networks. Bayesian networks and naive Bayes classifiers are built on top of Bayes’ theorem and perform well at detecting malicious accounts and URLs in social networks. Given a hypothesis H and evidence E, Bayes’ theorem states that the probability of the hypothesis before obtaining the evidence, P(H), and the probability of the hypothesis after obtaining the evidence, P(H|E), are related by $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$.

  2. Meta based: In supervised learning, a meta-based classifier is used to enhance the generalization ability of learned models. Based on the nature of the dataset, a meta-based classifier predicts which classifier is good for a given task. It relies on other classification algorithms rather than its own to perform the task. Hence, this classifier helps users choose an algorithm suitable for their problem.

  3. Support vector machine: A support vector machine (SVM) is used to detect malicious accounts while maintaining high accuracy and reducing errors. It analyzes the data and detects patterns using labeled samples. The support vector machine was developed at AT&T Bell Laboratories and can be used for classification and regression problems. An SVM separates the different classes in the dataset by defining a separating plane, called a hyperplane, as the boundary between them. For nonlinearly separable problems, SVM uses kernel functions to obtain the optimal separating hyperplane.

  4. Neural networks: Neural networks have been used in application domains such as speech processing, image processing, pattern recognition, and disease diagnosis. Because neural networks have high computational requirements, they have found little application in detecting malicious accounts on social networks. Neural networks include the multilayer perceptron (MLP), a class of feed-forward artificial neural networks that contains activation units, referred to as artificial neurons, and weights. By involving multiple layers, such as input, hidden, and output layers, the multilayer perceptron improves on the standard linear perceptron and can be used to solve linear and nonlinear classification problems. The algorithm is used to map the input data to the proper output.

  5. Tree-based: Tree-based algorithms use a decision tree, in which the classifier is trained with the structure of a tree. Each node on the tree represents a test of an attribute value, and each branch represents a test result. Random forest and J48 (C4.5) are decision tree algorithms used to identify phishing attacks and spam on online social networks. C4.5, developed by Ross Quinlan, is an extension of Quinlan’s earlier Iterative Dichotomiser 3 (ID3) algorithm, and the decision trees it generates can be used for classification. J48 is based on C4.5. ID3 builds a decision tree from a dataset and is typically used in machine learning and natural language processing domains. To select the most suitable attribute at each node of the tree, C4.5 uses the information gain; the selected attribute is the best candidate on which to split the tree. Random Forest produces an ensemble of classifiers by creating diverse decision trees, applying random feature selection and bagging at training time. A decision tree has two types of nodes: leaf nodes, each labeled with a class, and interior nodes, each associated with a feature.
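
To make the Bayes’ theorem item in the list above concrete, the following is a minimal sketch of the posterior calculation for a single piece of evidence. The function name `posterior` and all numeric values are hypothetical and chosen only for illustration; they are not taken from the reviewed studies. The sketch expands P(E) with the law of total probability over H and not-H.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) using Bayes' theorem.

    P(E) is expanded as P(E|H)P(H) + P(E|not H)P(not H).
    """
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    if evidence == 0.0:
        raise ValueError("P(E) is zero; the posterior is undefined")
    return p_e_given_h * prior_h / evidence


# Hypothetical example: H = "account is malicious", E = "post contains a
# shortened URL". All three probabilities below are made-up assumptions.
p_malicious = posterior(prior_h=0.2, p_e_given_h=0.9, p_e_given_not_h=0.1)
print(f"P(H|E) = {p_malicious:.3f}")
```

With these assumed inputs, observing the evidence raises the probability of the hypothesis well above its prior of 0.2, which is the effect the theorem is used for in spam and malicious-account filtering.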

Unsupervised learning.
  1. Hierarchical: Hierarchical clustering (HC) uses a tree structure to group the data over a variety of scales. The tree is a multilevel hierarchy in which clusters at one level are merged or split to obtain the clusters at the next level. Hierarchical clustering is either bottom-up, called agglomerative, or top-down, called divisive. Agglomerative clustering builds the hierarchy bottom-up, presuming that each instance originally forms its own cluster. The algorithm then repeatedly merges pairs of clusters as it moves up the tree. The top-down divisive approach works the opposite way and presumes that all instances begin in one cluster. The algorithm then repeatedly splits clusters as it traverses down the tree.

  2. Partitional: Partitional clustering splits the set of instances into clusters in such a way that no clusters overlap. The K-means algorithm is one example of a partitional clustering method with various application areas. K-means, a heuristic clustering algorithm, clusters the dataset into a user-defined number K of clusters by minimizing the sum of squared distances within each cluster. K-means requires computing the distance between each point and its centroid.

  3. PCA-based: Principal component analysis (PCA) is a statistical tool for spotting patterns in high-dimensional data and for detecting the variations in a dataset. It is a principled candidate for detecting destructive behavior in online social networks.

  4. Stream-based: The stream-based approach is motivated by the emergence of stream clustering algorithms, which are used to separate malicious accounts from legitimate accounts. Stream and StreamKM++ are two stream-based clustering algorithms used to identify malicious accounts on Twitter. Stream, a clustering algorithm, extends the traditional batch-learning DBSCAN algorithm by defining core-micro-objects in place of the core objects used in DBSCAN. The StreamKM++ algorithm was developed because the K-means algorithm needs a predefined number of clusters and a random initial centroid selection.

  5. Pairwise similarity: Pairwise similarity detects malicious activities by comparing two accounts, and is used to detect anomalies in online social networks. To determine whether a user’s account is compromised, the legitimate user’s behavior history over a particular period is studied. Clickstream activities are used to capture the user’s extroversive and introversive social behavioral patterns and form an effective behavioral model. Euclidean distance is applied to measure the difference between two profiles. Assume that the two profiles are P and Q, each consisting of extroversive and introversive feature vectors, and suppose A = (a1, a2, a3, …, an) and B = (b1, b2, b3, …, bn) are corresponding feature vectors of P and Q. The Euclidean distance between A and B is evaluated as shown in Eq. (a), and Eq. (b) computes the Euclidean norm over the distances for all feature vectors of profiles P and Q. The higher the distance, the more the two profiles differ. ‘m’ in Eq. (b) indicates the number of feature vectors. Eight extroversive and introversive behaviors are considered.

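$$E(A,B) = \sqrt{\sum_{k=1}^{n} (a_k - b_k)^2} \qquad \text{(a)}$$

$$\mathrm{Dist}(P,Q) = \sqrt{\sum_{f=1}^{m} (E_f)^2} \qquad \text{(b)}$$
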
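
The pairwise-similarity computation described in item 5 can be sketched as follows. This is an illustrative reading of Eq. (a) and Eq. (b), assuming each profile is represented as a list of m feature vectors of matching lengths; the function names `feature_distance` and `profile_distance` and the sample values are hypothetical and do not come from the reviewed studies.

```python
import math
from typing import Sequence


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (a): Euclidean distance between two feature vectors A and B."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def profile_distance(p: Sequence[Sequence[float]],
                     q: Sequence[Sequence[float]]) -> float:
    """Eq. (b): Euclidean norm of the per-feature distances of P and Q."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of feature vectors")
    return math.sqrt(sum(feature_distance(a, b) ** 2 for a, b in zip(p, q)))


# Hypothetical profiles with m = 2 feature vectors each (made-up values).
profile_p = [[0.0, 0.0], [1.0, 1.0]]
profile_q = [[3.0, 4.0], [1.0, 1.0]]
print(profile_distance(profile_p, profile_q))
```

A larger returned value indicates that the two behavioral profiles differ more. Choosing a threshold above which an account is flagged is a separate design decision that this sketch does not address.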
Semi-supervised learning.

Hybrid real-time social networks protector (HRSP) system model

HRSP URL classification model.
HRSP content classification model.

Future Directions

Conclusion

Additional Information and Declarations

Competing Interests

Adnan Abid is an Academic Editor for PeerJ Computer Science.

Author Contributions

Naeem A. Nawaz analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Kashif Ishaq conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Uzma Farooq performed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Amna Khalil conceived and designed the experiments, performed the experiments, performed the computation work, prepared figures and/or tables, and approved the final draft.

Saim Rasheed performed the experiments, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Adnan Abid conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Fadhilah Rosdi analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

This is a literature review.

Funding

The authors received no funding for this work.
