Coordination is one of the central features of information operations and disinformation campaigns, which can be defined as concerted efforts to target people with false or misleading information, often with some strategic objective (political, social, financial), while using the affordances provided by social media platforms.
Since 2017, platforms like Facebook have developed new framings, policies, and entire teams to counter bad actors using its platform to interfere in elections, drive political polarization, or manipulate public opinion. After several iterations, the platform has settled on describing malicious activity like this as “coordinated inauthentic behavior,” referring to a group of actors “working in concert to engage in inauthentic behavior…where the use of fake accounts is central to the operation.”
More broadly, Twitter uses the phrase “platform manipulation,” which includes a range of actions, but also focuses on “coordinated activity…that attempts to artificially influence conversations through the use of multiple accounts, fake accounts, automation and/or scripting.”
Unfortunately, open source researchers have limited data at their disposal to assess some of the characteristics of disinformation campaigns. Figuring out the authenticity of an account requires far more information than is publicly available (data like browser usage, IP logging, device IDs, account e-mails, etc.). And while we might be able to figure out when a group of accounts is using stock photos, we can’t say anything about the origin of the people using these accounts.
Even the terms “coordinated” and “inauthentic” are not straightforward or neutral judgments, a problem highlighted by scholars like Kate Starbird and Evelyn Douek. Starbird has used multiple case studies to show that there are not clear distinctions between “orchestrated” and “organic” activity:
In particular, our work reveals entanglements between orchestrated action and organic activity, including the proliferation of authentic accounts (real people, sincerely participating) within activities that are guided by and/or integrated into disinformation campaigns.
In this post, I focus specifically on the digital artifacts of “coordination” to show how this behavior can be used to identify accounts that may be part of a disinformation campaign. But I also show how coming to this conclusion always requires additional analysis, especially to distinguish between “orchestrated action and organic activity.”
We’ll look at original data I collected prior to a Facebook takedown of pro-Trump groups associated with “The Beauty of Life” (TheBL) media site, reportedly tied to the Epoch Media Group. The removal of these accounts came with great fanfare in late 2019, in part because some of the accounts used AI-generated photos to populate their profiles. But another behavioral trait of the accounts — and one more visible in digital traces — was their coordinated amplification of URLs to thebl.com and other assets within the network of accounts. We’ll also look at data from QAnon accounts on Instagram to get a better sense of what coordination can look like within a highly active online community.
Note: I’ll be using the programming language R in my analysis and things get a bit technical from here.
Pairwise correlations, clustering, and {tidygraph}
One method I’ve used consistently to help identify coordination comes from David Robinson’s {widyr} package, which has functions that allow you to compute pairwise correlations, distance, cosine similarity, and counts, as well as k-means and hierarchical clustering (in development).
Using some of the functions from {widyr}, we can answer a question like: which accounts have a tendency to link to the same domains? This kind of question can be applied to other features as well, like sequences of hashtags (an example we’ll explore next), mentions, shared posts, URLs, text, and even content.
Here, I’ve extracted the domains from URLs shared by Facebook groups in TheBL takedown, filtered out domains that occurred fewer than 20 times, removed social media links, and put it all into a dataframe called domains_shared.
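As a rough illustration of this step, here’s a minimal sketch of how domains_shared might be built and fed to {widyr}. It assumes a dataframe called posts with group_name and url columns (both names are my stand-ins, not from the original data), and the social media domain list is illustrative:

```r
library(dplyr)
library(widyr)
library(urltools)

# `posts` is a hypothetical dataframe of group posts with columns
# `group_name` and `url`.
domains_shared <- posts %>%
  mutate(domain = domain(url)) %>%                  # extract the hostname
  filter(!domain %in% c("facebook.com", "twitter.com", "youtube.com")) %>%
  add_count(domain) %>%
  filter(n >= 20) %>%                               # drop rare domains
  distinct(group_name, domain)

# Which groups tend to link to the same domains? pairwise_cor() returns
# the correlation between every pair of groups, based on whether each
# group shared each domain.
group_cors <- domains_shared %>%
  pairwise_cor(group_name, domain, sort = TRUE)
```

On binary share/no-share vectors like these, the Pearson correlation that pairwise_cor() computes is the phi coefficient, which is why it suits this kind of co-occurrence question.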
Using {tidygraph} and {ggraph}, we can then visualize the relationships between these groups. In the graph below, I’ve filtered the data to show relationships with a correlation above .5 and highlighted those groups (nodes) which I know were removed for inauthentic behavior. This graph shows that the greatest correlation (phi coefficient) is between the groups engaged in coordinated behavior because they had a much higher tendency to share URLs to content on thebl.com. An analysis focusing on the page-sharing behavior of the groups would yield a similar graph, showing the groups that were more likely to share posts by TheBLcom page and other assets.
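The plot itself can be sketched along these lines, assuming group_cors from above and a hypothetical character vector removed_groups listing the groups known to have been taken down:

```r
library(tidygraph)
library(ggraph)

graph <- group_cors %>%
  filter(correlation > .5) %>%            # keep only strongly correlated pairs
  as_tbl_graph() %>%
  mutate(removed = name %in% removed_groups)

ggraph(graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation)) +
  geom_node_point(aes(color = removed)) +
  geom_node_text(aes(label = ifelse(removed, name, NA)), repel = TRUE) +
  theme_graph()
```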
This method can quickly help an investigator cluster accounts based on behaviors — domain-sharing in this case — but it still requires further examination of the accounts and the content they shared. This may sound familiar to disinformation researchers because it’s essentially the ABCs of Disinformation, by Camille François, which focuses on manipulative actors, behaviors, and content.
In the graph, you’ll see many other groups (unlabeled in gray) that are linked based on their domain-sharing behavior, but with lower correlations. For each group an analyst would need to examine other details like: administrators and their associated information (profile photos, friends, timelines, tagged photos, etc.); other groups managed by the administrators; creation date of the groups; which specific domains linked the groups together; shared posting patterns within the groups, etc., all in order to determine if they too could have been inauthentic.
In the case of TheBL takedown, group administrators showed many signals of inauthenticity, but almost none of those signals were available in digital trace data. (This is well illustrated in independent analysis by Graphika and the Atlantic Council’s Digital Forensic Research Lab (DFRLab), who were able to review a list of accounts prior to Facebook’s takedown).
Post timestamps, however, are in the data, and using those we can visualize the groups’ temporal “signatures” for posts linking to thebl.com. The graph below shows that some groups had distinct hourly signatures, with some variation in their frequency of posting:
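A sketch of that plot, assuming the hypothetical posts dataframe from earlier also carries a POSIXct created_time column (another stand-in name):

```r
library(dplyr)
library(ggplot2)
library(lubridate)
library(urltools)

posts %>%
  mutate(domain = domain(url)) %>%
  filter(domain == "thebl.com") %>%
  count(group_name, hour = hour(created_time)) %>%  # posts per hour of day
  ggplot(aes(hour, n)) +
  geom_line() +
  facet_wrap(~ group_name)                          # one "signature" per group
```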
We can also use k-means clustering to try to better distinguish groups by their posting patterns. The {widyr} package has a function, widely_kmeans, that makes this straightforward to accomplish. First, we start with a dataframe where I’ve aggregated the number of hourly posts to thebl.com for all groups, shown here as trump_frequencies.
We then scale n and use widely_kmeans(), where group_name is our item, hour is our feature, and scaled_n our value. We can inspect one of the clusters and see how well it grouped inauthentic accounts together.
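A sketch of that step, assuming trump_frequencies has columns group_name, hour, and n, and picking an arbitrary k (the number of clusters is a judgment call, not something from the original analysis):

```r
library(dplyr)
library(widyr)

clustered <- trump_frequencies %>%
  group_by(group_name) %>%
  mutate(scaled_n = as.numeric(scale(n))) %>%  # scale counts within each group
  ungroup() %>%
  widely_kmeans(group_name, hour, scaled_n, k = 6)  # k = 6 is an assumption

# Inspect one cluster to see which groups it pulled together
clustered %>% filter(cluster == 1)
```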
We can visualize the temporal signatures again, this time faceting by clusters. This graph shows the signatures of inauthentic groups in red and authentic groups in gray. We can clearly see the similarity of temporal signatures in each cluster — the inauthentic groups have very distinct signatures that set them apart from the rest. Even so, some inauthentic groups are clustered with many other authentic groups (Cluster 5), illustrating the need for manual verification.
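The faceted plot can then be sketched by joining the cluster assignments back onto the hourly counts (removed_groups is the same hypothetical vector as before):

```r
library(dplyr)
library(ggplot2)

trump_frequencies %>%
  inner_join(clustered, by = "group_name") %>%
  mutate(inauthentic = group_name %in% removed_groups) %>%
  ggplot(aes(hour, n, group = group_name, color = inauthentic)) +
  geom_line() +
  scale_color_manual(values = c("gray70", "red")) +  # FALSE = gray, TRUE = red
  facet_wrap(~ cluster)
```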
Coordinated link sharing behavior
Another method I’ve used comes from {CooRnet}, an R library created by researchers at the University of Urbino Carlo Bo and IT University of Copenhagen. This method focuses entirely on “coordinated link sharing behavior,” which “refers to a specific coordinated activity performed by a network of Facebook pages, groups and verified public profiles (Facebook public entities) that repeatedly shared the same news articles in a very short time from each other.” The package uses an algorithm to determine the time period in which coordinated link sharing is occurring (or you can specify it yourself) and groups accounts together based on this behavior. There are some false positives, especially for link-sharing in groups, but it’s a useful tool and it can be used to look at other kinds of coordinated behavior.
Recently, I adapted code to use {CooRnet}’s get_coord_shares function — the primary way to detect “networks of entities” engaged in coordination — on data from Instagram. A working paper out of the Center for Complex Networks and Systems Research lays out a network-based framework for uncovering accounts that are engaged in coordination, relying on data from content, temporal activity, handle-sharing, and other digital traces.
In one case study the authors propose identifying coordinated accounts using highly similar sequences of hashtags across messages (see Figure 5, from their paper); they theorize that while assets may try to obfuscate their coordination by paraphrasing similar text in messages, “even paraphrased text is likely to include the same hashtags based on the targets of a coordinated campaign.”
Given the content of Instagram messages, I thought this would be a good opportunity to test this method out with {CooRnet}. Using CrowdTangle, I retrieved 166,808 Instagram messages mentioning “QAnon” or “wg1wga” since January 1, 2020. I then extracted the sequence of hashtags used in each message of the dataset and later removed sequences that had not been used more than 20 times. This resulted in hashtag sequences that look like this (the QAnon community isn’t exactly known for its brevity):
QAnon WWG1WGA UnitedNotDivided TheGreatAwakening Spexit España EspañaViva VivaEspaña ArribaEspaña NWO Bilderberg Rothschild MakeSpainGreatAgain AnteTodoEspaña Comunismo Marxismo Feminismo Socialismo PSOE UnidasPodemos FaseLibertad Masones Satanismo ObamaGate Pizzagate Pedogate QAnonEspaña DV1VTq qanon wwg1wga darktolight panicindc sheepnomore patriotshavenoskincolor secretspaceprogram thegreatawakeningasleepnomore savethechildren itsallaboutthekids protectthechildren momlife familyiseverything wqke wakeupamerica wakeupsheeple wwg1wga qanon🇺🇸 qanonarmy redpill stormiscoming stormishere trusttheplan darktolight freedomisnotfree q whitehats conspiracyrealist
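A sketch of the extraction step that produces sequences like the one above, assuming a CrowdTangle export ig_posts with account, created_time, and message columns (the column names are my assumptions):

```r
library(dplyr)
library(purrr)
library(stringr)

hashtag_seqs <- ig_posts %>%
  mutate(hashtag_seq = map_chr(
    str_extract_all(message, "#\\S+"),             # pull out the hashtags
    ~ paste(str_remove(.x, "#"), collapse = " ")   # join them in posting order
  )) %>%
  filter(hashtag_seq != "") %>%                    # drop posts with no hashtags
  add_count(hashtag_seq) %>%
  filter(n > 20)                                   # keep frequent sequences
```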
Once I had my hashtag sequences, I could write code to feed data into the get_coord_shares function, setting a coordination interval of 300 seconds, or 5 minutes. This means that the algorithm would look for accounts that deployed identical hashtag sequences within a 5-minute window of one another. The algorithm detected 157 highly coordinated accounts, whose relationships I’ve visualized in the network graph below.
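A sketch of that adaptation follows. To be clear, this is not {CooRnet}’s documented workflow (the package expects a CrowdTangle link-shares dataframe), so the hashtag sequences stand in for the URL field that get_coord_shares normally matches on, and the column names below are my assumptions about that schema rather than verified field names; check the output of get_ctshares() for the exact columns the package wants:

```r
library(dplyr)
library(CooRnet)

# Hashtag sequences stand in for shared URLs; `expanded` and `account.url`
# are assumed field names for CooRnet's expected CrowdTangle schema.
ct_shares.df <- hashtag_seqs %>%
  transmute(
    expanded    = hashtag_seq,
    date        = created_time,
    account.url = account
  )

output <- get_coord_shares(
  ct_shares.df,
  coordination_interval = "300 secs",  # the 5-minute window (check the docs
                                       # for the exact format expected)
  clean_urls = FALSE                   # don't normalize our stand-in "URLs"
)
```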
First, we can see small isolates that are clustered together and — even at first glance — are clearly related as backup accounts of one another: qanonhispania_ and qanonhispania2; redpillpusha and redpillpushaa; and other isolates like deplorabledefenders and theconservativesofamerica. We also find a larger isolate located in the right-middle of the graph, consisting of accounts like creepy.joee, the.new.federation, republican.s, and others.
After inspecting each account in this isolate, I found that the accounts are tied to an organization called, “Red Elephant Marketing LLC.” The accounts are clearly coordinated, but it’s also obvious why this is the case. (After more open source research, I tied the accounts to a network of conservative media and LLCs, including Facebook pages that had no transparency about ownership, all of them likely linked to a single individual).
The dense cluster in the center of the graph looks like a giant soup of coordination — if we accept the algorithm’s output as gospel — but after inspecting the hashtag sequences here, I found that it is actually due to single sequences like “QAnon” and “wg1wga.” These were my initial search queries and they’re used frequently by many QAnon accounts. When I filtered out these single hashtag sequences, the number of coordinated accounts dropped dramatically. We can’t rule out that this cluster is highly coordinated, but it’s less likely; to be certain, we’d need additional analysis (e.g., adjusting the coordination interval, re-checking the results, and examining other data like content).
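That filtering step is essentially one line, applied before re-running the detection:

```r
library(dplyr)
library(stringr)

# Keep only multi-hashtag sequences; single tags like "qanon" or "wg1wga"
# match too broadly to be evidence of coordination on their own.
hashtag_seqs_multi <- hashtag_seqs %>%
  filter(str_detect(hashtag_seq, " "))
```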
What is “coordination”, anyway?
Part of what makes disinformation campaigns so difficult to detect is that so much of our online activity appears, and indeed is, coordinated. Hashtags can summon entire movements into being in just a few hours, and diverse groups of people can appear in your replies in minutes. The crowdsourced mobilizations of the #MeToo movement and #BlackLivesMatter are good examples of such events.
But these networked publics are also playgrounds for a growing number of adversaries — scammers, hyper-partisan media, fringe conspirators, political consultants, state-backed baddies, and more. All of these actors usually engage in some level of coordination and inauthenticity, some more obliquely than others, and they are trying to hide among sincere activists all the time. Evelyn Douek makes this point quite well in “What Does ‘Coordinated Inauthentic Behavior’ Actually Mean?”:
Coordination and authenticity are not binary states but matters of degree, and this ambiguity will be exploited by actors of all stripes.
A recent working paper by researchers at the University of Florida also provides some insight into the difficulties of this distinction. The researchers analyzed 51 million tweets from Twitter’s information operations archive in an attempt to distinguish the coordination used in state-backed operations from that found in online communities.
Using six previously established “coordination patterns” from other disinformation studies, the researchers built a network-based classifier that looked at 10 state-backed campaigns and 4 online communities ranging from politics to academics and security researchers. The researchers found that coordination is indeed not uncommon on Twitter and highly-political communities are more likely to show patterns similar to those used by strategic information operations.
These findings suggest that identifying coordination patterns alone is not enough to detect disinformation campaigns. Analysts need to consider coordination alongside other pieces of evidence that suggest that a group of accounts might be connected. Do the accounts have a similar or identical temporal signature? Do they have the same number of page administrators or similar locations? Is there additional information on domain registrants? Do the accounts deploy similar narratives, images, or other content? How else might they be connected, and are there reasonable explanations for those connections? As researchers in a field burgeoning with disinformation, it’s critical we come to conclusions with multiple pieces of evidence, alternative hypotheses, and with reliable confidence that we have tested and can defend those theories.
Translated from: https://medium.com/swlh/detecting-coordination-in-disinformation-campaigns-7e9fa4ca44f3