Social Hacking

Rapid Attack Detection, Isolation and Characterization (RADICS)

J. Reeves, S. Bratus, S.W. Smith, P. Anantharaman, M. Millian (students) with SRI, NYU, NARF, EPRI

Objectives

  • Recover from attacks against critical infrastructure
  • Identify the methods and behaviors of malware present in a compromised substation
  • Reconfigure and harden the substation against future attacks

Key Science Methods & Advances

  • Defined secure subsets of popular ICS protocols (DNP3, Modbus, IEC 61850, etc.) using our LangSec principles
  • Implemented specialized input parsers based on these subsets to protect devices from malformed and/or malicious packets
  • Identified and cataloged data/configuration changes made by compromised devices
  • Investigated ways to modify packets to signify when devices are clean and detect if they are re-compromised.
  • Incorporated our design into TIGR, a custom-built appliance that can be plugged into a compromised substation to gather info and begin recovery efforts.
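The parser-centric approach above can be sketched in miniature. The frame layout, constants, and limits below are invented for illustration (not the project's actual DNP3/Modbus grammars); the point is the LangSec discipline of fully recognizing input against a strict definition before acting on it.

```python
# Sketch of a LangSec-style "parse, then act" filter (hypothetical frame
# layout, not an actual ICS protocol grammar): the parser accepts only a
# narrowly defined subset and rejects everything else.
import struct

MAX_PAYLOAD = 16               # assumed bound for the secure subset
ALLOWED_FUNCS = {0x01, 0x02}   # e.g. READ and WRITE only

def parse_frame(data: bytes):
    """Return (func, payload) for a well-formed frame, else None."""
    if len(data) < 4:
        return None
    magic, func, length = struct.unpack_from(">HBB", data, 0)
    if magic != 0x0564:                      # fixed start bytes
        return None
    if func not in ALLOWED_FUNCS:            # whitelist, not blacklist
        return None
    if length > MAX_PAYLOAD or len(data) != 4 + length:
        return None                          # no trailing garbage allowed
    return func, data[4:4 + length]

# A conforming frame passes; a frame with trailing bytes is dropped.
good = struct.pack(">HBB", 0x0564, 0x01, 2) + b"\x00\x01"
assert parse_frame(good) == (0x01, b"\x00\x01")
assert parse_frame(good + b"\xff") is None
```

Rejecting anything outside the recognized subset is what protects devices from malformed and malicious packets: there is no "best effort" interpretation of a bad frame.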

Results & Impacts

  • In sponsor exercises, TIGR was able to identify the devices, protocols, and malware found inside an example substation.
  • Standalone TIGR prototypes are currently under construction.

Forecasting Android Bad Behavior

V.S. Subrahmanian (Dartmouth) with S. Li (University of Maryland), S. Kumar (Stanford), T. Dumitras (University of Maryland)

Objectives

  • “Bad” apps are ones that exhibit bad behaviour, but are not necessarily malware.
  • Can we characterize how the behavior of malicious Android apps evolves over time?
  • Is it possible to use consumer-generated reviews of Android apps to predict which apps should be short-listed for being considered malicious and which ones should not?

Key Science Methods & Advances

  • Crawled 100K apps from the Google Play Store during 2 separate one-month windows.
  • Developed 3 novel types of features (review based, developer based, sibling based) associated with Android apps in the Google Play Store.
  • Developed keywords alleging malicious behavior and defined a “flagging” review to be one that both gives low star rating and alleges malicious behavior.
  • Developed novel features based on Weighted Moving Averages (WMA) and the derivative(s) of WMA.
  • Developed FAABB (Forecasting Android Adversarial Bad Behavior) machine learning model to predict malicious apps.
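The WMA-based trend features can be sketched in a few lines. The window size, weights, and example numbers below are our own illustrative choices, not the exact definitions used in FAABB.

```python
# Sketch of WMA-style features over the per-window flagging-review rate:
# a weighted moving average plus its discrete derivative as a "trend"
# signal (window and weights are assumptions, not the paper's formulas).

def wma(series, window=3):
    """Weighted moving average with linearly increasing weights."""
    out = []
    for i in range(window - 1, len(series)):
        chunk = series[i - window + 1:i + 1]
        weights = range(1, window + 1)          # newest point weighs most
        out.append(sum(w * x for w, x in zip(weights, chunk)) / sum(weights))
    return out

def derivative(series):
    """First difference of a series (discrete derivative)."""
    return [b - a for a, b in zip(series, series[1:])]

# Flagging-review rate per crawl window: flat, then a sudden jump,
# which the derivative of the WMA surfaces as a positive spike.
rate = [0.01, 0.01, 0.01, 0.01, 0.30, 0.32]
smooth = wma(rate)
trend = derivative(smooth)
```

A sharp positive derivative of the smoothed flagging rate is exactly the kind of "app turning bad" signal such features are meant to surface.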

Results & Impacts

  • FAABB predicts bad apps accurately: AUC of 0.86, false positive rate of 0.10 and true positive rate of 0.67.
  • Bad apps often start “good,” gain followers, and then turn bad. 77% of bad apps turn bad only once, but some do so multiple times. The probability of turning bad again, given that an app has turned bad once (twice) before, is 23% (resp. 36%).
  • Games, tools (e.g. power management, disk use), and entertainment categories are most heavily linked to bad apps. 68% of bad apps are “Potentially Unwanted Programs”.
  • Reported 123 active apps as “bad” to Google: 35 were not on the Play Store after our report.

Predictive Modeling of Android Spyware

V.S. Subrahmanian (Dartmouth) with Fabio Pierazzi (UC-London), G. Mezzour (University of Rabat), M. Colajanni (University of Modena)

Objectives

  • Can we accurately predict whether an Android app is spyware or not?
  • Identify factors that:
    • Distinguish Android spyware from goodware and from other types of malware.
    • Characterize the behaviour of specific families of Android spyware (e.g. AceCard, HeHe, Pincer, UAPush, USBCleaver).

Key Science Methods & Advances

  • Developed a data set with 15K Android APKs in all (5K each of spyware, goodware, and other malware). Samples are recent, drawn from 2016-2017.
  • Built novel static and dynamic features for each APK.
  • Developed machine learning methods to separate spyware from goodware and other malware.
  • Developed a detailed study of the behaviors of 5 major recent Android spyware families: AceCard, HeHe, Pincer, UAPush, USBCleaver.
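The static side of the feature extraction can be sketched against a toy Android manifest. The manifest and the two boolean features below are illustrative stand-ins; the study's actual static and dynamic feature set is far richer.

```python
# Sketch of static feature extraction from an AndroidManifest.xml
# (hypothetical minimal manifest): count declared components and check
# for fine-grained permissions, the two factor families found
# discriminative for spyware.
import xml.etree.ElementTree as ET

MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.RECORD_AUDIO"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <activity android:name=".Main"/>
    <receiver android:name=".BootListener"/>
  </application>
</manifest>"""

ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"

def static_features(xml_text):
    root = ET.fromstring(xml_text)
    perms = [e.get(ANDROID_NAME) for e in root.iter("uses-permission")]
    components = sum(len(list(root.iter(tag)))
                     for tag in ("activity", "service", "receiver", "provider"))
    return {
        "n_components": components,
        "writes_sms": any(p and p.endswith("SEND_SMS") for p in perms),
        "records_audio": any(p and p.endswith("RECORD_AUDIO") for p in perms),
    }

feats = static_features(MANIFEST)
```

This toy app matches the spyware profile from the results: few components, but audio-recording and SMS-writing permissions.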

Results & Impacts

  • Separating spyware from goodware achieves 97% AUC and F1-Score, with a 0.95% false positive rate.
  • Separating spyware from other malware achieves 96% AUC and F1-Score, and a 2.95% false positive rate.
  • Key factors distinguishing spyware:
    • 50% of spyware wish to write SMSs, while almost no goodware do so
    • Tend to have fewer components (providers, activities, intents, receivers)
    • Tend to access finer-grained permissions (e.g. record audio, request fine location)
  • Other malware tend to start admin services; spyware usually does not.

Predictive Modeling of Android Banking Trojans

V.S. Subrahmanian (Dartmouth) with C. Bai, Q. Han (Dartmouth), Fabio Pierazzi (UC-London), G. Mezzour (University of Rabat)

Objectives

  • Can we accurately predict whether an Android app is a banking trojan or not? How does this predictive ability change as the adversary knows more and more about our training set?
  • Identify factors that:
    • Distinguish Android banking trojans from goodware and from other types of malware.
    • Characterize the behaviour of specific families of Android Banking Trojans (ABTs).

Key Science Methods & Advances

  • Developed a data set with 15K Android APKs in all (5K each of banking trojans, goodware, and other malware). Samples are recent, drawn from 2016-2017.
  • Developed a novel set of features based on the new notions of suspicion graphs and suspicion rank.
  • Developed machine learning methods to separate Android Banking Trojans from goodware and other malware.
  • Developed a detailed study of the behaviors of 5 major recent Android Banking Trojan families: BankBot, AsaCub, Hqwar, Marcher, Zbot.
  • Developed Android Banking Trojan detector.
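The suspicion-rank idea can be sketched as a PageRank-style propagation. The propagation rule, damping value, and toy graph below are our own illustrative reading, not the published definition of suspicion rank.

```python
# Sketch of a suspicion-rank computation (assumed PageRank-like rule,
# not the paper's formula): nodes are apps, edges link apps sharing
# suspicious traits, and known-bad seeds inject suspicion that diffuses
# along edges.

def suspicion_rank(edges, seeds, n, damping=0.85, iters=50):
    """Propagate suspicion from seed nodes over an undirected graph."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    base = [1.0 if i in seeds else 0.0 for i in range(n)]
    rank = base[:]
    for _ in range(iters):
        new = []
        for i in range(n):
            spread = sum(rank[j] / len(adj[j]) for j in adj[i] if adj[j])
            new.append((1 - damping) * base[i] + damping * spread)
        rank = new
    return rank

# Node 0 is a known trojan; node 1 shares traits with it, node 2 only
# with node 1, node 3 is isolated and stays at zero suspicion.
rank = suspicion_rank(edges=[(0, 1), (1, 2)], seeds={0}, n=4)
```

Graph-propagated features like this let a detector flag apps that share traits with known bad apps even before their own behavior is observed.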

Results & Impacts

  • Separating ABTs from goodware achieves 99.9% AUC and F1-Score, with a 0.95% false positive rate.
  • Discovered and reported to Google a new Android Banking Trojan before any of the 63 anti-virus vendors on VirusTotal. Also detected an ABT after just one AV vendor on VirusTotal had flagged it.
  • ABT detector is highly resilient to:
    • Adversary knowing large parts of the training set and
    • Diverse types of adversarial obfuscation

EC2: Ensemble Clustering & Classification for Predicting Android Malware Families

V.S. Subrahmanian (Dartmouth), Tanmoy Chakraborty (IIIT-Delhi), Fabio Pierazzi (Royal Holloway University of London)

Objectives

  • As hackers adapt malware to evade detection, anti-virus companies must adapt signatures to detect numerous variants of the same malware.
  • At the same time, A-V companies must identify new malware strains as they emerge.
  • Technical problem statement: Given a malware sample S and malware families F_1, …, F_k, identify the family F_i to which S belongs, or report that S belongs to a new malware family.
  • Challenge: Highly skewed distribution of sizes of families.

Key Science Methods & Advances

  • Using a set of static and dynamic features, we showed that classification methods work well on large families, but not on small ones.
  • But clustering works well on small families.
  • The EC2 algorithm is a novel combination of clustering and classification that exploits these strengths to accurately classify both small and large families.
  • Our best algorithms exhibit AUCs exceeding 90%.
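A toy version of the clustering-plus-classification split might look like this. It is our own simplification: EC2's actual routing, classifiers, and clustering stages are more sophisticated, and the centroids and threshold below are invented.

```python
# Minimal sketch of the EC2 idea: a classifier handles samples close to
# a large family's centroid, and everything it is unsure about is handed
# to clustering, which can surface small or brand-new families.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ec2_assign(sample, centroids, threshold=1.0):
    """Return a known family name, or 'NEW' to route to clustering."""
    family, d = min(((name, dist(sample, c)) for name, c in centroids.items()),
                    key=lambda t: t[1])
    return family if d <= threshold else "NEW"

# Centroids of two large, well-populated families (toy 2-D features).
centroids = {"F1": (0.0, 0.0), "F2": (5.0, 5.0)}

assert ec2_assign((0.2, 0.1), centroids) == "F1"   # clear large-family hit
assert ec2_assign((9.0, 9.0), centroids) == "NEW"  # routed to clustering
```

The division of labor reflects the finding above: classification is reliable only where a family has enough samples to define a stable region, while clustering handles the sparse remainder.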

Results & Impacts

  • Highly accurate in predicting the family to which a malware sample belongs.
  • Problem and/or results discussed with Symantec and Google.

The Network Structure of Echo Chambers

Adam M. Kleinbaum, Carolyn Parkinson (UCLA), Thalia Wheatley, Balazs Kovacs (Yale)

Objectives

  • In this work, we show that for some time-varying node attributes, networks evolve through two fundamental processes:
    • Assortative selection into friendship
    • Attribute convergence among friends
  • The consequence of these dual effects is that networks become increasingly fragmented over time, creating “echo chambers” in which people are surrounded by similar others, with implications for people, firms, and society

Key Science Methods & Advances

  • We are working to show evidence of both selection and convergence effects in:
    • How we experience the world (i.e., neural response to a common stimulus, measured via fMRI)
    • How we talk about the world (i.e., linguistic similarity, measured via longitudinal computational linguistic analysis of textual corpora for 2 populations)
  • We also run a computational simulation, modeling the consequences of these effects for overall network topology
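The simulation can be sketched in a few lines. Every parameter below (population size, tie-formation rule, 5% convergence rate) is an illustrative assumption, not the study's calibration.

```python
# Toy simulation of the two processes named above: agents preferentially
# befriend similar others (assortative selection), then drift toward
# their friends' attribute values (convergence).
import random

random.seed(0)
N, STEPS = 30, 200
attr = [random.random() for _ in range(N)]   # each agent's attribute
friends = {i: set() for i in range(N)}

for _ in range(STEPS):
    a, b = random.sample(range(N), 2)
    # Selection: the more similar the pair, the likelier the tie forms.
    if random.random() < 1 - abs(attr[a] - attr[b]):
        friends[a].add(b)
        friends[b].add(a)
    # Convergence: each agent moves 5% of the way toward its
    # friends' mean attribute.
    for i in range(N):
        if friends[i]:
            mean = sum(attr[j] for j in friends[i]) / len(friends[i])
            attr[i] += 0.05 * (mean - attr[i])

# Compare attribute gaps: the echo-chamber signature is that friends
# typically end up closer together than strangers.
friend_pairs = [(i, j) for i in range(N) for j in friends[i] if i < j]
stranger_pairs = [(i, j) for i in range(N) for j in range(i + 1, N)
                  if j not in friends[i]]
friend_gap = sum(abs(attr[i] - attr[j]) for i, j in friend_pairs) / len(friend_pairs)
stranger_gap = sum(abs(attr[i] - attr[j]) for i, j in stranger_pairs) / len(stranger_pairs)
```

Running both mechanisms together is what produces fragmentation: ties form between similar agents, and those ties then make the agents even more similar.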

Results & Impacts

  • We hope to elucidate the fundamental social network mechanisms by which social echo chambers emerge
  • In doing so, we hope to provide actionable guidance on how leaders can actively diversify their networks to avoid information isolation

Sockpuppet Detection in Online Platforms

V.S. Subrahmanian (Dartmouth) with S. Kumar, J. Cheng, and J. Leskovec (Stanford)

Objectives

  • Sockpuppet accounts refer to multiple social accounts operated by the same person/group. Such accounts are frequently used for illicit influence operations, trolling, and other malicious behaviour.

Goals

  • Understand behavioural differences between sock and non-sock accounts.
  • Predict if a pair of accounts are socks.

Key Science Methods & Advances

  • Developed a data set from Disqus (which powers online news discussion forums) with 2.9M users, 2.1M articles, 62M posts covering CNN, Breitbart News etc.
  • Used IP address info to identify sock ground truth.
  • Showed that socks agree with each other more than non-socks do, write shorter sentences, and use first-person singular pronouns more.
  • Showed that socks that support each other are more common than socks that tend to contradict each other.
  • Showed that approx. 2/3 of sock pairs have deceptive IDs, while 1/3 have similar IDs, suggesting most socks are meant for deceptive/illicit purposes. 
  • Applied machine learning methods for sock prediction and to check if a pair of accounts are socks.
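Two of the behavioral signals above can be turned into pairwise features, sketched below. The feature names and toy accounts are hypothetical; the study's real feature set is larger.

```python
# Sketch of pairwise sock features: activity overlap (do the two
# accounts comment on the same articles?) and a writing-style gap.

def jaccard(a, b):
    """Overlap of the article sets two accounts comment on."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def avg_sentence_len(posts):
    """Socks were found to write short sentences."""
    words = sum(len(p.split()) for p in posts)
    return words / len(posts) if posts else 0.0

def pair_features(acct1, acct2):
    return {
        "article_overlap": jaccard(acct1["articles"], acct2["articles"]),
        "len_gap": abs(avg_sentence_len(acct1["posts"]) -
                       avg_sentence_len(acct2["posts"])),
    }

sock_a = {"articles": {"a1", "a2", "a3"}, "posts": ["I agree", "me too"]}
sock_b = {"articles": {"a1", "a2"}, "posts": ["so true", "yes I agree"]}
feats = pair_features(sock_a, sock_b)
```

High article overlap with a small style gap is the pattern a pair classifier would learn to associate with sock pairs.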

Results & Impacts

  • We are able to predict if a pair of accounts constitute a sock pair with high accuracy: AUC of 0.91.
  • Features linked to account activities are the most important.
  • Predicting if an account is a sock is much harder: 0.68 AUC.
  • Work briefed to Twitter, Reddit, Wikipedia. We are told that Reddit and Wikipedia are in initial stages of incorporating our sockpuppet results into sock detectors.

Predicting Twitter Bots in Elections

V.S. Subrahmanian (Dartmouth) with J. Dickerson and V. Kagan (Sentimetrix)

Objectives

  • We realized as early as 2013 that bots will play a role in manipulating elections.
  • Can we identify features of Twitter accounts that distinguish bots from humans?
  • How well can we automatically predict if a Twitter handle represents a human or a bot?

Key Science Methods & Advances

  • Tracked 2014 Indian election over a 10-month period (July 2013-May 2014): 17M users, 25M follower-followee edges, 45M tweets (after trimming).
  • Created ground truth for a subset of the above accounts using Amazon Mechanical Turk.
  • Defined novel sentiment based features – e.g. sentiment flip flop score, pos/negative sentiment strength, agreement ranks, dissonance ranks.
  • Showed that bots tend to flip flop less, tend to be more strongly positive/negative when they are positive/negative, tend to disagree less with the rest of Twitter accounts.
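The flip-flop feature can be sketched as follows. This is one plausible reading of the name; the paper's exact normalization may differ.

```python
# Sketch of a sentiment flip-flop score: count sign changes in an
# account's tweet-sentiment sequence, normalized by the number of
# opportunities to flip.

def flip_flop_score(sentiments):
    """Fraction of consecutive tweet pairs whose sentiment flips sign."""
    signs = [s for s in sentiments if s != 0]    # ignore neutral tweets
    if len(signs) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)
    return flips / (len(signs) - 1)

human = [0.4, -0.2, 0.6, -0.5, 0.3]   # opinions swing back and forth
bot = [0.7, 0.8, 0.6, 0.9, 0.7]       # consistently positive

assert flip_flop_score(human) == 1.0  # flips at every step
assert flip_flop_score(bot) == 0.0
```

A low flip-flop score paired with strong sentiment is exactly the bot signature described above.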

Results & Impacts

  • Showed that using sentiment related features significantly improved predictive accuracy (AUC = 0.73) compared to not using them (AUC = 0.65).
  • Showed that bots try harder to “fit in” with the rest of the surrounding network than humans, suggesting that this might be a strategy to stay covert.
  • Used the results to win DARPA’s 2015 Twitter Bot Detection Challenge under the SMISC program.

The DARPA Twitter Bot Challenge

V.S. Subrahmanian (Dartmouth) with S. Durst, V. Kagan, A. Stevens (Sentimetrix) and others

Objectives

  • Can we predict which accounts on Twitter are covert bots seeking to influence opinion on a specific topic t?
  • How can influence bots on a new topic be predicted with no training data/ground truth? How quickly can these bots be discovered?
  • How can bots generated by multiple independent teams be discovered?

Key Science Methods & Advances

  • DARPA’s challenge was a live 4-week contest to identify pro-vaccination influence bots created by 2 separate teams (the existence of 2 teams was revealed only after the challenge).
  • Developed a generic method to identify influence bots on almost any given topic t.
  • Used a human-in-the-loop process with an ensemble:
    • (i) clustering to identify clusters of accounts
    • (ii) anomaly detection to identify anomalous accounts; steps (i) and (ii) enabled us to build a ground-truthed data set
    • (iii) supervised classification to guess all remaining bots
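The anomaly-detection step can be illustrated with a simple z-score rule. The threshold k=2 and the toy tweets-per-day feature are our own choices; the challenge pipeline used richer detectors and features.

```python
# Sketch of an anomaly-detection step: flag accounts whose feature
# value sits far from the population mean, giving human analysts a
# small candidate list to label by hand.
import statistics

def anomalous(values, k=2.0):
    """Indices of values more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sd]

# Tweets-per-day for 8 accounts: one posts at an inhuman rate.
rates = [12, 9, 11, 10, 8, 13, 10, 400]
```

The flagged accounts seed the ground-truthed data set that the supervised classifier is then trained on.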

Results & Impacts

  • Led the team that won the DARPA SMISC Twitter Bot Challenge, discovering all bots 6+ days ahead of the 2nd-best team.
  • Predicted all 39 influence bots in 40 guesses – 97.5% precision and 100% recall.
  • Identified the key features that distinguish influence bots from non-influence bots.
  • Later used for monitoring the Brexit campaign and elections in Kenya and Guatemala.

MALTP: Parallel Prediction of Malicious Tweets

V.S. Subrahmanian (Dartmouth), E. Lancaster (University of Maryland), and T. Chakraborty (IIIT-Delhi)

Objectives

  • Can we predict which accounts on Twitter are covert bots seeking to influence opinion on a specific topic?
  • How can influence bots on a new topic be predicted with no training data/ground truth? How quickly can these bots be discovered?
  • How can bots generated by multiple independent teams be discovered?

Key Science Methods & Advances

  • Developed the novel concept of a tweet graph.
  • Showed how the notion of a meta-path (previously defined by other researchers) can be used to define a suite of meta-path based features.
  • Developed a novel set of sentiment, multimodal, and Alexa rank (popularity) based features.
  • MALTP adapts an existing collective classification algorithm (due to others) to predict whether a tweet is malicious or not.
  • Ran extensive experiments showing that MALTP outperforms past work on two substantive datasets.
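A minimal version of one meta-path feature, counting known-malicious URLs on url - user - url paths of length 3, might look like this. The data layout and names are invented for illustration; MALTP's full feature suite is broader.

```python
# Sketch of a meta-path feature: for a tweet's URL, count how many
# known-malicious URLs are reachable through users who also posted it.

def metapath_malicious_count(url, posted_by, posts, malicious):
    """# of malicious URLs on url - user - url paths of length 3."""
    count = 0
    for user in posted_by.get(url, []):        # url -> user
        for other in posts.get(user, []):      # user -> url
            if other != url and other in malicious:
                count += 1
    return count

posted_by = {"u1": ["alice", "bob"]}                  # who shared each URL
posts = {"alice": ["u1", "u2"], "bob": ["u1", "u3"]}  # URLs each user shared
malicious = {"u2", "u3"}

assert metapath_malicious_count("u1", posted_by, posts, malicious) == 2
```

Intuitively, a URL repeatedly co-posted alongside known-bad URLs inherits their suspicion, which is what makes this count predictive.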

Results & Impacts

  • Meta-path based features, used with the MALTP algorithm and existing classifiers, generate outstanding results, exceeding AUCs of 0.9 on both datasets.
  • We find that the number of malicious nodes in meta-paths of length 3 (url - user - url), sentiment scores, and Alexa rank are jointly excellent predictors of malicious tweets.
  • MALTP, being parallel, also provides good speedups.

Diffusion of Rumors Online

Soroush Vosoughi with Deb Roy (MIT) and Sinan Aral (MIT)

Objectives

  • Advancements in AI and the rapid rise in the use of social media have greatly amplified the creation and spread of fake content. The spread of fake content can have real, tangible impact on our democratic institutions.
  • Being able to map and analyse this impact is of great importance.
  • Problem statement: Collect, quantify, measure and explain the difference between the diffusion of true and false rumor cascades online.
  • Challenge: Identifying and collecting rumor cascades online.

Key Science Methods & Advances

  • Using NLP and other data-mining methods, we were able to collect hundreds of thousands of contested news stories spanning a decade.
  • We showed that false news travels farther, faster, deeper and more broadly than the truth online in all categories.
  • We also showed that humans, not robots, are mostly responsible for this.
  • The data support a “novelty hypothesis.” False news was more novel than the truth.
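Two of the diffusion measures (depth and maximum breadth of a retweet cascade) can be computed with a short breadth-first traversal; the toy cascade below is invented.

```python
# Sketch of two standard cascade measures: depth is the longest retweet
# chain from the original tweet, breadth is the widest level of the
# cascade tree.
from collections import defaultdict, deque

def cascade_stats(retweets, root):
    """retweets maps each retweeter to the account it retweeted."""
    children = defaultdict(list)
    for child, parent in retweets.items():
        children[parent].append(child)
    depth, level_sizes = 0, defaultdict(int)
    queue = deque([(root, 0)])
    while queue:
        node, d = queue.popleft()
        depth = max(depth, d)
        level_sizes[d] += 1
        for c in children[node]:
            queue.append((c, d + 1))
    return depth, max(level_sizes.values())

# A cascade with a long chain and a wide first hop.
retweets = {"b": "a", "c": "a", "d": "a", "e": "b", "f": "e"}
depth, breadth = cascade_stats(retweets, root="a")
```

"False news travels deeper and more broadly" means false cascades score higher on exactly these measures than true ones.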

Results & Impacts

  • Largest and most comprehensive study of the diffusion of rumors online to date.
  • Published in the March 9th issue of Science as the cover story.
  • Invited to address NIH, WHO and other organizations about the impact of rumors.

Automatic Verification of Rumors Online

Soroush Vosoughi with Deb Roy (MIT)

Objectives

  • The spread of malicious or accidental misinformation in social media, especially in time-sensitive situations such as real-world emergencies, can have harmful effects on individuals and society.
  • The ability to track rumors and predict their veracity can help minimize the impact of false information.
  • Problem statement: Given a story that is spreading online, predict its veracity before trusted sources.
  • Challenge: Have to process and analyse large volumes of noisy data in real time.

Key Science Methods & Advances

  • We identified a set of linguistic, user-based, and propagation features that are predictive of the veracity of rumors.
  • We showed that temporal models (i.e., HMMs and DTW) are better at veracity prediction than more commonly used “static” models.
  • Our best model is able to correctly verify 75% of the rumors (in a balanced dataset) before any trusted source.
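The DTW part of the temporal comparison is standard dynamic time warping, sketched below; the toy series (one rumor burst at two speeds) is our own.

```python
# Sketch of dynamic time warping: DTW lets two rumor feature curves be
# compared even when one unfolds faster than the other, which is why
# temporal models beat "static" snapshots here.

def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# The same burst shape at two speeds: DTW sees them as identical,
# while a pointwise (static) comparison would not.
fast = [0, 5, 0]
slow = [0, 0, 5, 5, 0, 0]
assert dtw(fast, slow) == 0.0
```

Matching rumors by warped shape rather than fixed timestamps is what makes near real-time veracity prediction feasible across rumors that spread at different rates.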

Results & Impacts

  • One of the first near real-time veracity prediction algorithms.
  • Widespread attention from various sectors: emergency services, financial institutions, journalists, health departments, and the military.

Pareto-Optimal Game Models of Cyber-Defense

V.S. Subrahmanian (Dartmouth) with S. Jajodia (George Mason), A. Pugliese and A. Rullo (U. Calabria), E. Serra (Boise State)

Objectives

  • Cyber-defense involves anticipating attacker goals.
  • As the attacker moves through a victim’s network, he discovers vulnerabilities he can exploit.
  • How should the defender trade off the two actions he can take:
    • What software to patch
    • What software to deactivate

Key Science Methods & Advances

  • Proposed the novel concept of a system vulnerability dependency graph (SVDG). The adversary learns the defender’s SVDG over time; an attacker strategy is a set of vulnerabilities to exploit.
  • Defender has two major goals:
    • Minimize the expected damage caused by the attacker
    • Minimize the anger of system users when software is uninstalled
  • We formalize these competing goals as a Pareto optimization problem.
  • Because the problem is huge, we develop a mix of integer linear programming and greedy algorithms.
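The Pareto structure of the problem can be illustrated on toy (damage, anger) pairs. The plans below are invented; the paper's actual search runs over a far larger strategy space using ILP and greedy methods.

```python
# Sketch of the Pareto trade-off: each candidate defense plan has an
# expected-damage score and a user-anger score, and we keep only the
# plans that no other plan beats on both objectives at once.

def pareto_front(points):
    """Points (damage, anger) not dominated by any other point."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# (expected damage, user anger) for four candidate patch/deactivate plans
plans = [(10, 1), (6, 3), (4, 8), (7, 7)]
assert pareto_front(plans) == [(10, 1), (6, 3), (4, 8)]  # (7, 7) is dominated
```

The front contains every defensible trade-off; which point to deploy is then a policy choice between damage reduction and user disruption.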

Results & Impacts

  • Deciding whether there is a preferred attacker strategy is NP-complete
  • Deciding whether a point is a Pareto point is in D^p_2 and is Σ^p_2-hard
  • Developed a fast algorithm to quickly identify which patches to apply and which vulnerable software to deactivate.

Cyber Security Education

S.W. Smith, W. Nisen, S. Bratus with M. Locasto (SRI), A. Goldberg (Champlain)

Objectives

  • Bring security education to diverse audiences
  • Inside and outside Computer Science, Dartmouth, and formal academic programs

Key Science Methods & Advances

  • Developed standard courses on "Security and Privacy" and "Applied Cryptography"
  • Specialized courses on IoT Security, Usable Authentication, Cognitive Bias in Security, Reverse Engineering, and others
  • Students, faculty, and IT staff working together on campus security issues
  • Themed speaker series
  • NSA Center of Academic Excellence
  • Long-running summer programs:
    • SISMAT: for students at small colleges lacking a hands-on research arm
    • Securing the E-Campus: for academic IT staff
    • GenCyber: for high school students