liu.se: Search for publications in DiVA
1 - 8 of 8
  • 1.
    Mohammadinodooshan, Alireza
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Data-driven Contributions to Understanding User Engagement Dynamics on Social Media. 2024. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Social media platforms have fundamentally transformed the way information is produced, distributed, and consumed. News digestion and dissemination are no exception. A recent study by the Pew Research Center highlights that 53% of Twitter (renamed X) users, alongside notable percentages on Facebook (43%), Reddit (38%), and Instagram (34%), rely on these platforms for their daily news. Unfortunately, not all news is reliable and unbiased, which poses a significant societal challenge. Beyond news, content posted by influencers can also play an important role in shaping opinions and behaviors.

    Indeed, how users engage with different classes of content (including unreliable content) on social media can amplify their visibility and shape public perceptions and debates. Recognizing this, prior research has studied different aspects of user engagement dynamics with varying classes of content. However, several unexplored dimensions remain. To better understand these dynamics, this thesis addresses part of this research gap through eight comprehensive studies across four key dimensions, where we place particular focus on news content.

    The first dimension of this thesis presents a large-scale analysis of users' interactions with news publishers on Twitter. This analysis provides a fine-grained understanding of engagement patterns with various classes of publishers, with key findings indicating elevated engagement rates among unreliable news publishers. The second dimension examines the dynamics of interaction patterns between public and private (less public) sharing of news articles on Facebook. This dimension highlights deeper user engagement in private contexts compared to the public sphere, with both spheres showing the highest interaction levels with highly unreliable content. The third dimension investigates the drivers of popularity among news tweets to understand what makes some tweets more/less successful in gaining user engagement. For instance, this analysis reveals the negative impact of analytic language on user engagement, with the biggest engagement declines observed among unreliable publishers. Finally, the thesis emphasizes the importance of temporal dynamics in user engagement. For example, exploring the temporal user engagement with different news classes over time, we observe a positive correlation between the reliability of a post and the early interactions it receives on Facebook. While the thesis quantitatively assesses the effects of reliability across all dimensions, it also places additional focus on the role of bias in the observed patterns.

    These and other findings presented in the thesis offer actionable insights that can benefit multiple stakeholders, providing policymakers and content moderators with a comprehensive perspective for addressing the spread of problematic content. Moreover, platform designers can leverage the insights to build features that promote healthy online communities, while news outlets can use them to tailor content strategies based on target audiences, and individual users can use them to make informed decisions. Although the thesis has inherent limitations, it deepens our current understanding of engagement dynamics to foster a more secure and trustworthy social media experience that remains engaging.

    List of papers
    1. A Clone-based Analysis of the Content-Agnostic Factors Driving News Article Popularity on Twitter
    2023 (English). In: PROCEEDINGS OF THE 2023 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2023, ASSOC COMPUTING MACHINERY, 2023, p. 17-24. Conference paper, Published paper (Refereed)
    Abstract [en]

    The significant impact of Twitter in news dissemination underscores the need to understand what drives tweet popularity. While the content of an article plays a role, several "content-agnostic" factors also influence tweet popularity. Previous studies have faced challenges in differentiating the effects of content-agnostic factors from content variations. To address this, the paper presents a comprehensive analysis of tweet popularity using a "clone-based" approach. The methodology involves identifying tweets linking the same or similar articles (clones) and studying the factors that make some tweets within clone sets more successful in attracting retweets. The analysis reveals insights into clone set characteristics, winners' success patterns, retweet dynamics over time, domain-based competition, and predictors of success. The findings shed light on the complex nature of popularity and success in social media, providing a deeper understanding of the content-agnostic factors that influence tweet popularity.

    Place, publisher, year, edition, pages
    ASSOC COMPUTING MACHINERY, 2023
    Series
    Proceedings of the IEEE-ACM International Conference on Advances in Social Networks Analysis and Mining, ISSN 2473-9928
    National Category
    Media Studies
    Identifiers
    urn:nbn:se:liu:diva-202956 (URN); 10.1145/3625007.3627520 (DOI); 001191293500003 (); 9798400704093 (ISBN)
    Conference
    15th IEEE/ACM Annual International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Kusadasi, Turkey, Nov 06-09, 2023
    Available from: 2024-04-23 Created: 2024-04-23 Last updated: 2024-05-03
    2. Effects of Political Bias and Reliability on Temporal User Engagement with News Articles Shared on Facebook
    2023 (English). In: PASSIVE AND ACTIVE MEASUREMENT, PAM 2023, SPRINGER INTERNATIONAL PUBLISHING AG, 2023, Vol. 13882, p. 160-187. Conference paper, Published paper (Refereed)
    Abstract [en]

    The reliability and political bias differ substantially between news articles published on the Internet. Recent research has examined how these two variables impact user engagement on Facebook, reflected by measures like the volume of shares, likes, and other interactions. However, most of this research is based on the ratings of publishers (not news articles), considers only bias or reliability (not combined), focuses on a limited set of user interactions, and ignores users' engagement dynamics over time. To address these shortcomings, this paper presents a temporal study of user interactions with a large set of labeled news articles capturing the temporal user engagement dynamics, bias, and reliability ratings of each news article. For the analysis, we use the public Facebook posts sharing these articles and all user interactions observed over time for those posts. Using a broad range of bias/reliability categories, we then study how the bias and reliability of news articles impact users' engagement and how it changes as posts become older. Our findings show that the temporal interaction level is best captured when bias, reliability, time, and interaction type are evaluated jointly. We highlight many statistically significant disparities in the temporal engagement patterns (as seen across several interaction types) for different bias-reliability categories. The shared insights into engagement dynamics can benefit both publishers (to augment their temporal interaction prediction models) and moderators (to adjust efforts to post category and lifecycle stage).

    Place, publisher, year, edition, pages
    SPRINGER INTERNATIONAL PUBLISHING AG, 2023
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743
    Keywords
    User interactions; Bias; Reliability; Temporal dynamics
    National Category
    Media and Communication Technology
    Identifiers
    urn:nbn:se:liu:diva-196957 (URN); 10.1007/978-3-031-28486-1_8 (DOI); 001004071500008 (); 978-3-031-28485-4 (ISBN); 978-3-031-28486-1 (ISBN)
    Conference
    24th Annual International Conference on Passive and Active Measurement (PAM), ELECTR NETWORK, Mar 21-23, 2023
    Available from: 2023-08-29 Created: 2023-08-29 Last updated: 2024-05-03
    3. Temporal Dynamics of User Engagement on Instagram: A Comparative Analysis of Album, Photo, and Video Interactions
    2024 (English). In: Proceedings of the 16th ACM Web Science Conference / [ed] Luca Maria Aiello, Yelena Mejova, Oshani Seneviratne, Jun Sun, Sierra Kaiser, Steffen Staab, New York: Association for Computing Machinery (ACM), 2024, p. 224-234. Conference paper, Published paper (Refereed)
    Abstract [en]

    Despite Instagram being an integral part of many people's lives, it is relatively less studied than many other platforms (e.g., Twitter and Facebook). Furthermore, despite offering diverse content formats for user expression and interaction, prior works have not studied the temporal dynamics of user engagement across albums, photos, and videos. To address this gap, we present a pioneering temporal comparative analysis that unveils nuanced patterns in user interactions across content types. Our analysis sheds light on interaction longevity and disparities among album, photo, and video engagement. Additionally, it offers empirical comparisons through statistical tests, examines contributing factors such as post and uploader characteristics, and analyzes content composition’s impact on user engagement. The findings reveal distinct temporal engagement patterns. Despite initial spikes in interactions post-upload, albums exhibit somewhat more sustained interest, while photos and videos have shorter engagement lifespans. Moreover, a consistent trend between shallow (likes) and deep (comments) interactions persists across content types. Notably, concise content, characterized by shorter descriptions and minimal hashtags/mentions, consistently drives higher engagement, emphasizing its relevance across all content formats. These insights deepen comprehension of temporal nuances in user engagement on Instagram, offering valuable guidance for content creators and marketers to tailor strategies that evoke immediate and sustained user interest.

    Place, publisher, year, edition, pages
    New York: Association for Computing Machinery (ACM), 2024
    Keywords
    Album, Instagram, Interactions, Photo, Temporal dynamics, User engagement, Video
    National Category
    Media and Communication Technology
    Identifiers
    urn:nbn:se:liu:diva-203207 (URN); 10.1145/3614419.3644029 (DOI); 9798400703348 (ISBN)
    Conference
    Websci '24: 16th ACM Web Science Conference, Stuttgart, May 21 - 24, 2024
    Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2024-05-07. Bibliographically approved
    4. Temporal Dynamics of User Engagement with U.S. News Sources on Facebook
    2022 (English). In: Proc. International Conference on Social Networks Analysis, Management and Security (SNAMS), Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper, Published paper (Refereed)
    Abstract [en]

    Recently, researchers have modeled how reliability and political bias of news may affect Facebook users' engagement, as measured using interaction metrics such as the number of shares, likes, etc. However, the temporal dynamics of Facebook users' engagement with news of varying degrees of bias and reliability are less studied. In light of the COVID-19 pandemic, it is also important to quantify how the pandemic changed user engagement with various news. This paper presents the first temporal study of Facebook users' interaction dynamics, accounting for both the bias and reliability of the publishers. We consider a dataset of 992 U.S. publishers, and the study spans the period from Jan. 2018 to July 2022. This allows us to accurately assess the effect of the COVID-19 outbreak on the temporal dynamics of Facebook users' interactions with different classes of news. Our study examines these two parameters' effect on Facebook user engagement using both per-publisher and aggregated statistics. Several findings are revealed by our analysis, including that publishers in different bias and reliability classes experienced significantly different levels of engagement dynamics during and following the COVID-19 outbreak. For example, we show that the least reliable news exhibited the most considerable growth of followers during the COVID-19 period and the most reliable news sources exhibited the greatest growth rate of followers during the post-COVID-19 period. We also show that the interaction rate (number of interactions normalized over the number of followers) with Facebook news posts during the post-COVID-19 period is smaller than it was even before the outbreak. Furthermore, we demonstrate how the COVID-19 outbreak caused statistically significant structural breaks in the temporal dynamics of engagement with several types of news, and quantify this effect. With social media becoming a popular news source during crises, the observed temporal dynamics provide important insights into how information was consumed over recent years, benefiting both researchers and public sectors.

    Place, publisher, year, edition, pages
    Institute of Electrical and Electronics Engineers (IEEE), 2022
    Series
    2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS), ISSN 2831-7351, E-ISSN 2831-7343
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:liu:diva-199089 (URN); 10.1109/SNAMS58071.2022.10062675 (DOI); 2-s2.0-85158985063 (Scopus ID); 9798350320480 (ISBN); 9798350320497 (ISBN)
    Conference
    Proc. International Conference on Social Networks Analysis, Management and Security (SNAMS), Milan, Italy, 29 November 2022 - 01 December, 2022
    Available from: 2023-11-11 Created: 2023-11-11 Last updated: 2024-05-03. Bibliographically approved
  • 2.
    Thorgren, Elin
    et al.
    Linköping University.
    Mohammadinodooshan, Alireza
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Carlsson, Niklas
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Temporal Dynamics of User Engagement on Instagram: A Comparative Analysis of Album, Photo, and Video Interactions. 2024. In: Proceedings of the 16th ACM Web Science Conference / [ed] Luca Maria Aiello, Yelena Mejova, Oshani Seneviratne, Jun Sun, Sierra Kaiser, Steffen Staab, New York: Association for Computing Machinery (ACM), 2024, p. 224-234. Conference paper (Refereed)
    Abstract [en]

    Despite Instagram being an integral part of many people's lives, it is relatively less studied than many other platforms (e.g., Twitter and Facebook). Furthermore, despite offering diverse content formats for user expression and interaction, prior works have not studied the temporal dynamics of user engagement across albums, photos, and videos. To address this gap, we present a pioneering temporal comparative analysis that unveils nuanced patterns in user interactions across content types. Our analysis sheds light on interaction longevity and disparities among album, photo, and video engagement. Additionally, it offers empirical comparisons through statistical tests, examines contributing factors such as post and uploader characteristics, and analyzes content composition’s impact on user engagement. The findings reveal distinct temporal engagement patterns. Despite initial spikes in interactions post-upload, albums exhibit somewhat more sustained interest, while photos and videos have shorter engagement lifespans. Moreover, a consistent trend between shallow (likes) and deep (comments) interactions persists across content types. Notably, concise content, characterized by shorter descriptions and minimal hashtags/mentions, consistently drives higher engagement, emphasizing its relevance across all content formats. These insights deepen comprehension of temporal nuances in user engagement on Instagram, offering valuable guidance for content creators and marketers to tailor strategies that evoke immediate and sustained user interest.

  • 3.
    Mohammadinodooshan, Alireza
    et al.
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Holmgren, William
    Linköping University.
    Christensson, Martin
    Linköping University.
    Carlsson, Niklas
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    A Clone-based Analysis of the Content-Agnostic Factors Driving News Article Popularity on Twitter. 2023. In: PROCEEDINGS OF THE 2023 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2023, ASSOC COMPUTING MACHINERY, 2023, p. 17-24. Conference paper (Refereed)
    Abstract [en]

    The significant impact of Twitter in news dissemination underscores the need to understand what drives tweet popularity. While the content of an article plays a role, several "content-agnostic" factors also influence tweet popularity. Previous studies have faced challenges in differentiating the effects of content-agnostic factors from content variations. To address this, the paper presents a comprehensive analysis of tweet popularity using a "clone-based" approach. The methodology involves identifying tweets linking the same or similar articles (clones) and studying the factors that make some tweets within clone sets more successful in attracting retweets. The analysis reveals insights into clone set characteristics, winners' success patterns, retweet dynamics over time, domain-based competition, and predictors of success. The findings shed light on the complex nature of popularity and success in social media, providing a deeper understanding of the content-agnostic factors that influence tweet popularity.
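
    The clone-based idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each tweet is a dict with hypothetical `url` and `retweets` fields, treats only tweets linking the exact same URL as clones (the paper also considers similar articles), and calls the most-retweeted tweet in each clone set the winner.

```python
from collections import defaultdict


def clone_sets(tweets):
    """Group tweets by linked article URL; keep only groups with at least two tweets."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet["url"]].append(tweet)
    return {url: group for url, group in groups.items() if len(group) >= 2}


def winners(sets):
    """Pick the most-retweeted tweet of each clone set (assumed success criterion)."""
    return {url: max(group, key=lambda t: t["retweets"]) for url, group in sets.items()}


# Hypothetical toy data, not taken from the paper.
tweets = [
    {"id": 1, "url": "https://example.org/a", "retweets": 3},
    {"id": 2, "url": "https://example.org/a", "retweets": 10},
    {"id": 3, "url": "https://example.org/b", "retweets": 7},
]
sets = clone_sets(tweets)
top = winners(sets)
```

    Because all tweets in a clone set link the same article, differences in retweets within a set cannot be attributed to article content, which is what makes the comparison content-agnostic.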

  • 4.
    Mohammadinodooshan, Alireza
    et al.
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Carlsson, Niklas
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Effects of Political Bias and Reliability on Temporal User Engagement with News Articles Shared on Facebook. 2023. In: PASSIVE AND ACTIVE MEASUREMENT, PAM 2023, SPRINGER INTERNATIONAL PUBLISHING AG, 2023, Vol. 13882, p. 160-187. Conference paper (Refereed)
    Abstract [en]

    The reliability and political bias differ substantially between news articles published on the Internet. Recent research has examined how these two variables impact user engagement on Facebook, reflected by measures like the volume of shares, likes, and other interactions. However, most of this research is based on the ratings of publishers (not news articles), considers only bias or reliability (not combined), focuses on a limited set of user interactions, and ignores users' engagement dynamics over time. To address these shortcomings, this paper presents a temporal study of user interactions with a large set of labeled news articles capturing the temporal user engagement dynamics, bias, and reliability ratings of each news article. For the analysis, we use the public Facebook posts sharing these articles and all user interactions observed over time for those posts. Using a broad range of bias/reliability categories, we then study how the bias and reliability of news articles impact users' engagement and how it changes as posts become older. Our findings show that the temporal interaction level is best captured when bias, reliability, time, and interaction type are evaluated jointly. We highlight many statistically significant disparities in the temporal engagement patterns (as seen across several interaction types) for different bias-reliability categories. The shared insights into engagement dynamics can benefit both publishers (to augment their temporal interaction prediction models) and moderators (to adjust efforts to post category and lifecycle stage).

  • 5.
    Terve, Carl
    et al.
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Erlingsson, Mattias
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Mohammadinodooshan, Alireza
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Carlsson, Niklas
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Social Media Dynamics of Shorted Companies. 2022. In: Proc. International Conference on Social Networks Analysis, Management and Security (SNAMS), Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper (Refereed)
    Abstract [en]

    The discussions on social-media forums can impact the sentiment toward a company, and consequently also its stock price. As we show here, some of the most shorted companies have provided some of the clearest examples of this relationship. In light of these observations, this paper presents a longitudinal study of the cross-forum dynamics of ten highly shorted stocks that saw significant discussions on the popular forums Reddit, Twitter, and Seeking Alpha. Using the posts from these forums, their sentiments, and the daily snapshots of the stock price of each company, we use a combination of qualitative case studies and quantitative hypothesis testing to derive new insights. Through a combination of time-series analysis, clustering, and domain-optimized sentiment analysis, we study the relationship between the times that discussions peak on the different forums, the changes in sentiment, and the stock price movements. We find that all three forums are likely to experience peaks in their activity close to each other, that Reddit is most likely to peak first, and that the sentiment of Twitter discussions was more sensitive to the current derivative of the stock price than the sentiment observed on the other forums.

  • 6.
    Mohammadinodooshan, Alireza
    et al.
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Carlsson, Niklas
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Temporal Dynamics of User Engagement with U.S. News Sources on Facebook. 2022. In: Proc. International Conference on Social Networks Analysis, Management and Security (SNAMS), Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper (Refereed)
    Abstract [en]

    Recently, researchers have modeled how reliability and political bias of news may affect Facebook users' engagement, as measured using interaction metrics such as the number of shares, likes, etc. However, the temporal dynamics of Facebook users' engagement with news of varying degrees of bias and reliability are less studied. In light of the COVID-19 pandemic, it is also important to quantify how the pandemic changed user engagement with various news. This paper presents the first temporal study of Facebook users' interaction dynamics, accounting for both the bias and reliability of the publishers. We consider a dataset of 992 U.S. publishers, and the study spans the period from Jan. 2018 to July 2022. This allows us to accurately assess the effect of the COVID-19 outbreak on the temporal dynamics of Facebook users' interactions with different classes of news. Our study examines these two parameters' effect on Facebook user engagement using both per-publisher and aggregated statistics. Several findings are revealed by our analysis, including that publishers in different bias and reliability classes experienced significantly different levels of engagement dynamics during and following the COVID-19 outbreak. For example, we show that the least reliable news exhibited the most considerable growth of followers during the COVID-19 period and the most reliable news sources exhibited the greatest growth rate of followers during the post-COVID-19 period. We also show that the interaction rate (number of interactions normalized over the number of followers) with Facebook news posts during the post-COVID-19 period is smaller than it was even before the outbreak. Furthermore, we demonstrate how the COVID-19 outbreak caused statistically significant structural breaks in the temporal dynamics of engagement with several types of news, and quantify this effect. With social media becoming a popular news source during crises, the observed temporal dynamics provide important insights into how information was consumed over recent years, benefiting both researchers and public sectors.
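
    The interaction rate mentioned in the abstract above (number of interactions normalized over the number of followers) can be written as a small Python sketch. The function name, the error handling, and the example numbers are illustrative assumptions, not values or code from the paper.

```python
def interaction_rate(interactions, followers):
    """Interactions divided by follower count, as defined in the abstract."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return interactions / followers


# Hypothetical numbers: 500 interactions on posts of a page with 10,000 followers.
rate = interaction_rate(500, 10_000)
```

    Normalizing by followers makes publishers of very different sizes comparable, which matters when follower counts grow at different rates across periods.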

  • 7.
    Mohammadinodooshan, Alireza
    et al.
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Kargén, Ulf
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Shahmehri, Nahid
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Comment on "AndrODet: An adaptive Android obfuscation detector". 2020. Other (Other academic)
    Abstract [en]

    We have identified a methodological problem in the empirical evaluation of the string encryption detection capabilities of the AndrODet system described by Mirzaei et al. in the recent paper "AndrODet: An adaptive Android obfuscation detector". The accuracy of string encryption detection is evaluated using samples from the AMD and PraGuard malware datasets. However, the authors did not account for the fact that many of the AMD samples are highly similar because they come from the same malware family. This introduces a risk that a machine learning system trained on these samples could fail to learn a generalizable model for string encryption detection, and might instead learn to classify samples based on characteristics of each malware family. Our own evaluation strongly indicates that the reported high accuracy of AndrODet's string encryption detection is indeed due to this phenomenon. When we ensured that samples from the same family never appeared in both training and testing data, the accuracy of AndrODet dropped to around 50%. Moreover, the PraGuard dataset is not suitable for evaluating a static string encryption detector such as AndrODet, since the particular obfuscation tool used to produce the dataset effectively makes it impossible to extract meaningful features of static strings in Android apps.
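
    The evaluation constraint described above, that samples from the same malware family never appear in both training and testing data, can be sketched in Python as a family-aware split. This is an illustrative sketch only, not the authors' evaluation code: the `family` field, the function name, and the split fraction are assumptions.

```python
import random


def family_aware_split(samples, test_fraction=0.3, seed=0):
    """Split samples so that every family lands entirely in train or entirely in test."""
    families = sorted({s["family"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(families)
    # Hold out whole families, at least one, rather than individual samples.
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])
    train = [s for s in samples if s["family"] not in test_families]
    test = [s for s in samples if s["family"] in test_families]
    return train, test


# Hypothetical samples: four families with three samples each.
samples = [{"id": i, "family": f"fam{i % 4}"} for i in range(12)]
train, test = family_aware_split(samples)
```

    A plain random split over samples would let near-duplicate members of one family leak across the boundary, which is the effect the comment identifies as inflating the reported accuracy.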

  • 8.
    Mohammadinodooshan, Alireza
    et al.
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Kargén, Ulf
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Shahmehri, Nahid
    Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, Faculty of Science & Engineering.
    Robust Detection of Obfuscated Strings in Android Apps. 2019. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, New York, NY, USA: Association for Computing Machinery (ACM), 2019, p. 25-35, article id 42. Conference paper (Refereed)
    Abstract [en]

    While string obfuscation is a common technique used by mobile developers to prevent reverse engineering of their apps, malware authors also often employ it to, for example, avoid detection by signature-based antivirus products. For this reason, robust techniques for detecting obfuscated strings in apps are an important step towards more effective means of combating obfuscated malware. In this paper, we discuss and empirically characterize four significant limitations of existing machine-learning approaches to string obfuscation detection, and propose a novel method to address these limitations. The key insight of our method is that discriminative classification methods, which try to fit a decision boundary based on a set of positive and negative samples, are inherently bound to generalize poorly when used for string obfuscation detection. Since many different string obfuscation techniques exist, both in the form of commercial tools and as custom implementations, it is close to impossible to construct a training set that is representative of all possible obfuscations. We instead propose a generative approach based on the Naive Bayes method. We first model the distribution of natural-language strings, using a large corpus of strings from 235 languages, and then base our classification on a measure of the confidence with which a language can be assigned to a string. Crucially, this allows us to completely eliminate the need for obfuscated training samples. In our experiments, this new method significantly outperformed both an n-gram based random forest classifier and an entropy-based classifier, in terms of accuracy and generalizability.
