Center Project Reports

Privacy, a Machine Learning Perspective, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #20-07, May 2020.

Abstract
Privacy policies have become the de facto way of communicating how a company or organization, and particularly its website, collects, shares, and uses personally identifiable information (PII). These privacy policies outline how the organization handles, shares, discloses, and uses the PII of its consumers or clients. PII is defined as “any information relating to an identified or identifiable natural person,” such as name, email address, and credit card number.

Access Publication: Download PDF of Report

Identity Theft and Fraud, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #20-06, May 2020.

Abstract

Access Publication: Download PDF of Report

Comparing Privacy Policies of Government Agencies and Companies--a Study Using Privacy Policy Analysis Tools R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #20-04, March, 2020.

Abstract

Companies and government agencies are subject to distinct regulations that govern their collection and use of personally identifiable information. Yet, do the privacy policies of companies and government agencies reflect this distinction? In this paper, we take advantage of two of the most recent automatic privacy policy analysis tools, Polisis and PrivacyCheck, and five corpora of over 800 privacy policies to answer this question. We discover that government agencies are considerably better at protecting (or, for that matter, not collecting) sensitive financial information, Social Security numbers, and user location. On the other hand, many of them fail to directly address children’s privacy or to describe the security measures taken to protect user data. Furthermore, we observe the positive effect of European regulation, such as the GDPR, on European government agencies. EU government agencies perform well with respect to notifying users of policy changes, giving users the right to edit or delete their data, and limiting data retention, all of which are GDPR tenets. Our work sheds light on the actual effect of regulating privacy policies, paves the way for lawmakers to improve such regulation, and helps the research community enhance the usability of privacy policies by studying their trends.

Access Publication: Download PDF of Report

The Effect of the GDPR on Privacy Policies: Recent Progress and Future Promise, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #20-03, March, 2020.

Abstract

The General Data Protection Regulation (GDPR) is considered by some to be the most important change in data privacy regulation in 20 years. Effective May 2018, the European Union’s GDPR applies to any organization that collects and processes the personal information of EU citizens within or outside the EU. In this work, we seek to quantify the progress the GDPR has made in improving privacy policies around the globe. We leverage our data mining tool, PrivacyCheck, to automatically compare three corpora of privacy policies (550 policies in total), pre- and post-GDPR. In addition, to evaluate the current level of compliance with the GDPR around the globe, we manually study the policies within two corpora (450 policies). We find that the GDPR has made progress in protecting user data, but more progress is necessary, particularly in giving users the right to edit and delete their information, to entirely fulfill the GDPR’s promise. We also observe that the GDPR encourages sharing user data with law enforcement and, as a result, many policies have facilitated such sharing since the GDPR took effect. Finally, we see that non-compliance with the GDPR often takes the form of failing to explicitly indicate compliance, which shows an organization’s lack of transparency and disclosure regarding its processing and protection of personal information. If Personally Identifiable Information (PII) is the “currency of the Internet,” these findings are a continued cause for alarm about individuals’ agency to protect and secure their PII assets.

Access Publication: Download PDF of Report

Is Your Phone You? How Privacy Policies of Mobile Apps Allow the Use of Your Personally Identifiable Information, K. Chang, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #20-02, March, 2020.

Abstract

People continue to store their sensitive information in their smartphone applications, knowingly or, more often, unknowingly. Users seldom read an app’s privacy policy to see how their information is being collected, used, and shared. In this paper, using a reference list of over 600 Personally Identifiable Information (PII) attributes, we investigate the privacy policies of 100 popular health and fitness mobile applications in both the Android and iOS app markets to find the set of personal information these apps collect, use, and share. The reference list of PII was built independently from a longitudinal study at The University of Texas that investigated thousands of identity theft and fraud cases, in which PII attributes and their associated value and risks were empirically quantified. This research leverages the reference PII list to identify and analyze the value of the personal information collected by the mobile apps and the risk of disclosing this information. We found that the set of PII collected by these mobile apps covers 35% of the entire reference set of PII and that, due to dependencies between PII attributes, these mobile apps could indirectly impact 70% of the reference PII if breached. For one specific app, we found that the monetary loss could reach $1M if the set of sensitive data it collects were breached. Finally, we use Bayesian inference to measure the risk of a set of PII gathered by apps: the probability that fraudsters can discover, impersonate, and harm the user by misusing only the PII the mobile apps collected.
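The gap between the PII the apps collect directly and the larger share they could affect indirectly comes from dependencies between attributes. A minimal sketch of that idea, assuming a hypothetical ten-attribute reference list and hypothetical dependency edges (not the report's data), computes direct coverage and then the set of attributes reachable through dependencies with a breadth-first search:

```python
from collections import deque

# Hypothetical reference list of PII attributes (the report's list has 600+).
REFERENCE = [
    "name", "email", "phone", "dob", "address",
    "ssn", "bank_account", "credit_card", "password", "passport",
]

# Hypothetical dependency edges: u -> v means exposing u can lead to exposing v.
DEPENDS = {
    "name": ["address"],
    "email": ["password"],
    "dob": ["ssn"],
    "ssn": ["bank_account", "credit_card"],
}


def impacted(collected):
    """Return every attribute reachable from the collected set (BFS)."""
    seen = set(collected)
    queue = deque(collected)
    while queue:
        attr = queue.popleft()
        for nxt in DEPENDS.get(attr, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def coverage(attrs):
    """Fraction of the reference list covered by a set of attributes."""
    return len(set(attrs) & set(REFERENCE)) / len(REFERENCE)


collected = {"name", "email", "dob"}   # hypothetical app data collection
print(coverage(collected))             # 0.3 (direct coverage)
print(coverage(impacted(collected)))   # 0.8 (including dependencies)
```

Plain reachability is enough to show why indirect impact exceeds direct collection; a probabilistic model would instead attach a likelihood to each edge.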

Access Publication: Download PDF of Report

Identifying Real-World Credible Experts in the Financial Domain to Avoid Fake News, T. Huang, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #20-01, March, 2020.

Abstract

Establishing a solid mechanism for finding credible and trustworthy people in online social networks is an important first step toward avoiding useless, misleading, or even malicious information. Social network users can hide their intentions or fabricate virtual personas to gain the trust of others. A body of existing work studies the trustworthiness of social media users and finds credible sources in specific target domains. However, most of this work does not connect credibility in the real world with credibility on the Internet, which leaves our understanding of how social media credibility and trustworthiness form incomplete. In this paper, working in the financial domain, we identify attributes that distinguish credible Internet users who are indeed trustworthy experts in the real world. To ensure objectivity, we gather the list of credible financial experts from real-world financial authorities. By using a random forest classifier to analyze the distribution of social media users’ attributes, we find which attributes are related to real-world expertise and which are more likely to be forged by malicious users.
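The abstract does not list the report's features or classifier settings. As a minimal, dependency-free sketch of how a random forest can rank attributes, the code below uses hypothetical binary account attributes and synthetic labels, grows depth-1 trees (stumps) on bootstrap samples with a random feature subset per tree, and reports normalized mean-decrease-in-impurity (Gini) importances. Prediction by majority vote is omitted for brevity, and a real study would more likely use a library implementation such as scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute.

```python
import random

# Hypothetical binary attributes of a social media account.
FEATURES = ["verified_badge", "lists_credentials", "high_follower_count", "posts_daily"]


def make_users(n, rng):
    """Synthetic users whose 'credible expert' label mostly follows
    lists_credentials, with 10% label noise (purely illustrative)."""
    users = []
    for _ in range(n):
        x = [rng.random() < 0.5 for _ in FEATURES]
        credible = x[1] if rng.random() < 0.9 else not x[1]
        users.append((x, credible))
    return users


def gini(rows):
    """Gini impurity of a binary-labelled sample."""
    if not rows:
        return 0.0
    p = sum(label for _, label in rows) / len(rows)
    return 2 * p * (1 - p)


def best_split(rows, feature_ids):
    """Depth-1 tree: the candidate feature with the largest impurity decrease."""
    parent = gini(rows)
    best_feature, best_gain = None, 0.0
    for f in feature_ids:
        left = [r for r in rows if r[0][f]]
        right = [r for r in rows if not r[0][f]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if parent - child > best_gain:
            best_feature, best_gain = f, parent - child
    return best_feature, best_gain


def forest_importances(rows, n_trees=50, max_features=2, seed=0):
    """Grow trees on bootstrap samples, each restricted to a random feature
    subset, and return each feature's share of the total impurity decrease."""
    rng = random.Random(seed)
    totals = [0.0] * len(FEATURES)
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]                  # bootstrap
        candidates = rng.sample(range(len(FEATURES)), max_features)
        feature, gain = best_split(sample, candidates)
        if feature is not None:
            totals[feature] += gain
    norm = sum(totals) or 1.0
    return {name: t / norm for name, t in zip(FEATURES, totals)}


users = make_users(400, random.Random(1))
importances = forest_importances(users)
print(max(importances, key=importances.get))  # lists_credentials
```

Because the synthetic label depends on `lists_credentials`, that feature should collect nearly all of the importance, while the noise features stay near zero.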

Access Publication: Download PDF of Report

The Identity Ecosystem, R. Nokhbeh Zaeem, D. Liau, S. Budalakoti, K. Suzanne Barber, UT CID Report #19-08, July, 2019.

Abstract

As identity theft, fraud, and abuse continue to grow in both scope and impact, individuals and organizations alike demand a deeper understanding of their vulnerabilities, risks, and the resulting consequences. To address this demand, we present the Identity Ecosystem, a novel Bayesian model of Personal, Organizational, and Device Identifiable Information (PII/OII/DII) attributes and their relationships. We populate the Identity Ecosystem model with real-world data from approximately 6,000 reported identity theft and fraud cases. We leverage this populated model to provide unique, research-based insights into the variety of PII/OII/DII, their properties, and how they interact. Informed by the real-world data, we investigate the ecosystem of identifiable information in which criminals compromise and misuse PII/OII/DII. We built the Identity Ecosystem into an online tool that answers sophisticated queries. For example, given a set of PII, it predicts the future risk and losses if that set is compromised, as well as the liability associated with its fraudulent use. In the Bayesian model, each PII (e.g., Social Security Number), OII (e.g., Employer Identification Number), or DII (e.g., IP Address) attribute is modeled as a graph node, and probabilistic relationships between these attributes are modeled as graph edges. To answer a query, we use this Bayesian belief network to approximate the model’s posterior probabilities, assuming the given set of PII attributes is compromised. Hence, the Identity Ecosystem uncovers the identity attributes most vulnerable to theft, assesses their importance, and determines not only the PII but also the OII and DII most frequently targeted by thieves and fraudsters. The insights the Identity Ecosystem provides are significant, valuable, and sometimes very nonintuitive.
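As a toy illustration of this kind of query, the sketch below defines a three-node Bayesian network over hypothetical attributes (`ssn`, `dob`, `bank_account`) with made-up conditional probabilities, then computes the posterior probability that one attribute is compromised given evidence about others. It uses exact inference by enumeration, which only scales to tiny graphs; the report describes approximating posteriors in a far larger model, so this is not its method.

```python
from itertools import product

# Hypothetical network: node -> (parents, CPT), where the CPT maps a tuple of
# parent states to P(node compromised | parents). All numbers are invented.
NETWORK = {
    "ssn": ((), {(): 0.10}),
    "dob": ((), {(): 0.30}),
    "bank_account": (("ssn", "dob"), {
        (False, False): 0.01,
        (False, True): 0.05,
        (True, False): 0.20,
        (True, True): 0.60,
    }),
}


def joint(assignment):
    """Probability of one full True/False assignment to every node."""
    prob = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def posterior(query, evidence):
    """P(query compromised | evidence), summing over all consistent worlds."""
    nodes = list(NETWORK)
    num = den = 0.0
    for values in product((False, True), repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[n] != v for n, v in evidence.items()):
            continue
        p = joint(assignment)
        den += p
        if assignment[query]:
            num += p
    return num / den


print(round(posterior("bank_account", {}), 4))             # 0.0518
print(round(posterior("bank_account", {"ssn": True}), 3))  # 0.32
print(round(posterior("ssn", {"bank_account": True}), 3))  # 0.618
```

The last query runs the network "backwards": observing a compromised bank account raises the belief that the SSN was exposed from 0.10 to about 0.62, the kind of dependency a model like this is meant to surface.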

Access Publication: Download PDF of Report

2019 ITAP Report, J. Zaiss, R. Anderson, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #19-07, July, 2019.

Abstract

The Identity Threat Assessment and Prediction (ITAP) model and analytics provide unique, research-based insights into the habits and methods associated with identity threats, and into the various factors that contribute to higher levels of risk for the compromise and abuse of personally identifiable information (PII).  ITAP uncovers the identity attributes most vulnerable to compromise, assesses their importance, and identifies the types of PII most frequently targeted by thieves and fraudsters.

The analytical repository of ITAP offers valuable insight into the actors, organizations, and devices involved in identity threats across multiple domains, including financial services, consumer services, healthcare, education, law enforcement, communications, and government. ITAP characterizes the current identity threat landscape and aims to predict future identity threats. Using a wealth of data and analytics, ITAP delivers concrete guidance for consumers, businesses, and government agencies on how to avoid or lessen the impact of identity theft, fraud, and abuse. In sum, ITAP delivers actionable knowledge grounded in analyses of past threats and countermeasures, current threats and solutions, and evidence-driven forecasts.

During 2018 and into 2019, the ITAP team focused primarily on adding international (i.e., non-US) incidents to the model. ITAP now captures about 900 international incidents, making up 16% of all incidents in the model. Of the international cases, 95% were confined to a single country, while the remaining 5% were multinational (or even worldwide) in scope. This recent focus has expanded the breadth of the project and enabled us to implement new analytics based on international incidents, including some that compare the effects of PII compromise across different countries. Unlike previous annual ITAP reports, all of the charts in this 2019 ITAP Report are based purely on the international cases.

Access Publication: Download PDF of Report

An Assessment of Blockchain Identity Solutions: Minimizing Risk and Liability of Authentication, R. Rana, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #19-06, July, 2019.

Abstract

Personally Identifiable Information (PII) is often used to perform authentication and acts as a gateway to personal and organizational information. A single weak link in the architecture of identity management services is enough to expose PII and put identities at risk. Recently, we have witnessed a shift in identity management solutions with the growth of blockchain. Blockchain, a decentralized ledger system, offers a distinctive approach to security and privacy through its built-in immutability. In a blockchain-based identity solution, the user is given control of his/her identity: personal information is stored on his/her own device, and the user chooses which identity verification document is later used to create blockchain attestations. Yet blockchain technology alone is not enough to produce a better identity solution. The user cannot make informed decisions about which identity verification document to choose without tangible guidelines. In the absence of scientifically created, practical guidelines, these solutions and the choices they offer may become overwhelming and even defeat the purpose of providing a more secure identity solution.

Access Publication: Download PDF of Report

Evaluation Framework for Future Privacy Protection System: A Dynamic Identity Ecosystem Approach, D. Liau, R. Nokhbeh Zaeem, K. Suzanne Barber, UT CID Report #19-05, July, 2019.

Abstract

Today, more than ever, everyday authentication processes involve combinations of Personally Identifiable Information (PII) to verify a person’s identity. Meanwhile, the number of identity thefts has increased dramatically compared with past decades. In response, numerous privacy protection regulations, management frameworks, and companies have emerged in the industry. In this paper, we leverage previous work on the Identity Ecosystem, a Bayesian network representation of a person’s identity, to create a framework for evaluating identity protection systems. After reviewing the Identity Ecosystem, we populate a dynamic version of it and propose a protection game for a person’s PII in which the owner and the attacker each gain some level of control over the status of other PII within the dynamic Identity Ecosystem. We first present the game concept as a single-round game with complete information. Then we formulate a stochastic shortest path game between the owner and the attacker on the dynamic Identity Ecosystem. The attacker tries to expose the target PII as soon as possible, while the owner tries to protect the target PII from being exposed. We present a policy iteration algorithm to find the optimal policy for the game and discuss its convergence. Finally, we evaluate and compare identity protection strategies by playing an optimal policy against different protection policies. This study aims to understand the evolutionary process of identity theft and to provide a framework for evaluating different identity protection strategies.
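The full game has both players choosing actions. A much smaller piece of it can be sketched: once the owner's protection policy is fixed, the attacker's task of exposing the target PII in as few steps as possible is a single-agent stochastic shortest path problem, which policy iteration solves. The attributes, prerequisite edges, probabilities, and protection levels below are hypothetical, and this is not the report's two-player algorithm. Each attack step costs 1, a failed attack leaves the state unchanged, and the set of exposed attributes only grows, so every policy eventually exposes the target and policy evaluation reduces to a backward pass.

```python
from itertools import combinations

# Hypothetical attributes; the attacker wants to expose TARGET.
ATTRS = ("email", "dob", "ssn")
TARGET = "ssn"
# Hypothetical per-step success probability with no prerequisites exposed.
BASE = {"email": 0.5, "dob": 0.3, "ssn": 0.05}
# Hypothetical prerequisites; each exposed one adds BOOST to the success chance.
PREREQ = {"email": (), "dob": ("email",), "ssn": ("email", "dob")}
BOOST = 0.3


def success_prob(state, attr, protection):
    """One-step chance that attacking `attr` succeeds, after the owner's
    fixed protection level for that attribute is applied."""
    p = BASE[attr] + BOOST * sum(pre in state for pre in PREREQ[attr])
    return min(p, 1.0) * (1.0 - protection.get(attr, 0.0))


def states():
    """Non-terminal states: subsets of exposed non-target attributes,
    listed largest first so successors come before predecessors."""
    others = [a for a in ATTRS if a != TARGET]
    return [frozenset(c) for r in range(len(others), -1, -1)
            for c in combinations(others, r)]


def successor_value(state, attr, values):
    # Exposing the target ends the game (cost-to-go 0).
    return 0.0 if attr == TARGET else values[state | {attr}]


def evaluate(policy, protection):
    """Exact policy evaluation. The only cycle is the self-loop on failure,
    so V(s) = 1/p + V(s'): a geometric number of steps, then move on."""
    values = {}
    for s in states():
        a = policy[s]
        values[s] = 1.0 / success_prob(s, a, protection) + successor_value(s, a, values)
    return values


def policy_iteration(protection):
    """Attacker's best response (fewest expected steps) to a fixed owner policy."""
    policy = {s: TARGET for s in states()}  # start: attack the target directly
    while True:
        values = evaluate(policy, protection)

        def q(s, a):
            # Bellman backup: one step, then success or retry from s.
            p = success_prob(s, a, protection)
            return 1.0 + p * successor_value(s, a, values) + (1.0 - p) * values[s]

        stable = True
        for s in states():
            best = min((a for a in ATTRS if a not in s), key=lambda a: q(s, a))
            if q(s, best) < q(s, policy[s]) - 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


for label, protection in [("guard ssn", {"ssn": 0.5}), ("guard email", {"email": 0.5})]:
    policy, values = policy_iteration(protection)
    print(label, policy[frozenset()], round(values[frozenset()], 3))
```

With these made-up numbers, guarding the SSN forces the attacker to start with the email and delays exposure to about 6.744 expected steps, versus about 6.19 when the email is guarded instead. Comparing the attacker's optimal expected time to exposure in this way is one simple means of ranking protection strategies.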

Access Publication: Download PDF of Report
