by Flavio H. Freitas

How to predict likes and shares based on your article’s title using Machine Learning

Choosing a good title for an article is an important step in the writing process. The more interesting the title seems, the higher the chance a reader will interact with the whole article. Showing users content they prefer to interact with also increases their satisfaction.

This is how my final project for the Machine Learning Engineer Nanodegree specialization started. I just finished it, and I feel so proud and happy that I wanted to share some insights I’ve had about the whole flow. Also, I promised Quincy Larson this article when I finished the project.

If you want to see the final technical document click here. If you want the implementation of the code, check it out here or fork my project on GitHub. If you just want an overview using layperson’s terms, this is the right place — continue reading this article.

Some of the most used platforms to spread ideas nowadays are Twitter and Medium (you are here!). On Twitter, an article is normally posted as a tweet containing its title and an external URL, so users can open the article and show their satisfaction with a like or a retweet of the original post.

Medium shows the full text with tags (to classify the article) and claps (similar to Twitter’s likes) to show how much the users appreciate the content. A correlation between these two platforms can bring us valuable information.

The project

The problem that I defined was a classification task using supervised learning: Predict the number of likes and retweets an article receives based on the title.

Correlating the number of likes and retweets from Twitter with a Medium article is an attempt to isolate the effect of the number of readers reached and the number of Medium claps, because the more an article is shared on different platforms, the more readers it will reach and the more Medium claps it will (likely) receive.

Using only the Twitter statistics, we’d expect the articles to initially reach almost the same number of readers (those readers being the followers of the freeCodeCamp account on Twitter). Their performance and interactions, therefore, would be limited to the characteristics of the tweet, for example the title of the article. And that is exactly what we want to measure.

I chose the freeCodeCamp account for this project because the idea was to limit the scope of the subject of the articles and better predict the response in a specific field. The same title can perform well in one category (e.g. Technology), but not necessarily in a different one (e.g. Culinary). Also, this account posts the title of the original article and its Medium URL as the tweet content.

How does the data look?

The first step of this project was to get the information from Twitter and Medium and then correlate it. The dataset can be found here and it has 711 data points. This is what the dataset looks like:

Analyzing and learning with the data

After analyzing the dataset and plotting some graphs, I found interesting information about it. For these analyses, the outliers were removed, and I considered only the top 25% of performers for each feature (retweets, likes, and claps).
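As a rough illustration of that filtering step, here is a minimal sketch. The article does not say how outliers were identified, so the 1.5 × IQR rule used below is an assumption, and the function name `top_quartile` is hypothetical rather than taken from the project.

```python
import statistics


def top_quartile(values):
    """Drop outliers, then keep the top 25% of what remains.

    Assumption: outliers are values outside 1.5 * IQR of the quartiles.
    The original project may have used a different rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in values if low <= v <= high]

    # The third quartile of the cleaned data is the cutoff for "top 25%".
    cutoff = statistics.quantiles(kept, n=4)[2]
    return [v for v in kept if v >= cutoff]


# Made-up like counts; 1000 is far outside the rest and gets dropped first.
likes = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
print(top_quartile(likes))
```

The same function could be applied separately to the retweet, like, and clap columns.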

So let’s take a look at what the numbers say for freeCodeCamp articles written on Medium and shared on Twitter.

What is a good title length?

Titles longer than 50 and shorter than 110 characters help increase the chances of a successful article.

What is a good number of words in the title?

The most effective number of words in the title is 9 to 17. To optimize the number of retweets and likes, try 9 to 18 words; for claps, try 7 to 17.
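The length and word-count findings above can be turned into a quick check for a draft title. This is only a sketch: the function name and the returned keys are hypothetical, words are counted by splitting on whitespace (an assumption), and the thresholds are the ones reported above for freeCodeCamp articles.

```python
def check_title(title):
    """Compare a title with the ranges reported in this article."""
    n_chars = len(title)
    n_words = len(title.split())  # assumption: words are whitespace-separated
    return {
        "chars": n_chars,
        "words": n_words,
        "length_ok": 50 < n_chars < 110,  # more than 50, fewer than 110 characters
        "words_ok": 9 <= n_words <= 17,   # 9 to 17 words
    }


print(check_title(
    "How to predict likes and shares based on your article's title "
    "using Machine Learning"
))
```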

Which are the best categories to tag?

Programming, Tech, Technology, JavaScript and Web Development are categories you should consider when tagging your next article. They appear as good indicators for all three features.

Which are the best words to use?

In this lexical analysis, you’ll notice that some words get much more attention in the freeCodeCamp community than others. If the intention is to make an article reach more readers, writing about JavaScript, React or CSS will increase how much it’s appreciated. Using the words “learn” or “guide” in the title also raises the probability of success.

Using Machine Learning

OK! After taking a look at the data and extracting some information from it, the goal was to create a Machine Learning model that makes predictions of the number of retweets, likes, and claps based on the title of the article.

Predicting the number of retweets, likes, and claps of an article can be treated as a classification problem, which is a common machine learning (ML) task. For this, we need to treat the output as discrete classes (ranges of numbers). The input will be the title of the article with each word as a token (t1, t2, t3, … tn), the title length, and the number of words in the title.
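To make the input concrete, here is a minimal sketch of turning a title into those three inputs. The lowercasing and whitespace tokenization are assumptions, and `title_features` is a hypothetical name; the project's actual preprocessing is described in its technical document and may differ.

```python
from collections import Counter


def title_features(title):
    """Build the model inputs named above from a single title."""
    tokens = title.lower().split()  # assumption: lowercase, split on whitespace
    return {
        "tokens": Counter(tokens),    # t1, t2, ... tn with their counts
        "title_length": len(title),   # number of characters
        "word_count": len(tokens),    # number of words
    }


features = title_features("Learn React and learn CSS")
print(features["tokens"].most_common(1))
print(features["title_length"], features["word_count"])
```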

The ranges for our features are:

  • Retweets: 0–10, 10–30, 30+
  • Likes: 0–25, 25–60, 60+
  • Claps: 0–50, 50–400, 400+
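Mapping a raw count to one of these classes could look like the sketch below. The ranges as listed share their boundary values (10 appears in both 0–10 and 10–30), so this sketch assumes a boundary value belongs to the higher range. That choice, along with the names `BUCKETS` and `bucket_label`, is an assumption and not something the article specifies.

```python
import bisect

# Boundaries between the three classes for each feature, from the list above.
BUCKETS = {
    "retweets": (10, 30),
    "likes": (25, 60),
    "claps": (50, 400),
}


def bucket_label(metric, count):
    """Return the class label for a raw count.

    Assumption: a count equal to a boundary falls into the higher range.
    """
    low, high = BUCKETS[metric]
    labels = (f"0-{low}", f"{low}-{high}", f"{high}+")
    return labels[bisect.bisect_right((low, high), count)]


print(bucket_label("retweets", 12))
print(bucket_label("claps", 1200))
```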

And finally, after preprocessing our dataset and evaluating some models (everything fully described here), we reached the conclusion that the MultinomialNB model performed best for retweets, reaching an accuracy of 60.6%. Logistic regression reached 55.3% for likes and 49% for claps.
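For readers curious about what a multinomial naive Bayes classifier does with title words, here is a from-scratch sketch with Laplace smoothing. It is not the project's code: the project used a ready-made MultinomialNB model, while the functions `train_nb` and `predict_nb` and the four training titles below are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(titles, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for title, label in zip(titles, labels):
        tokens = title.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, title):
    """Pick the class with the highest log prior + log likelihood."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in title.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: titles paired with a retweet class.
titles = [
    "learn javascript in one guide",
    "a guide to react hooks",
    "my weekend notes",
    "random thoughts on life",
]
labels = ["30+", "30+", "0-10", "0-10"]
model = train_nb(titles, labels)
print(predict_nb(model, "learn react"))
```

With only four invented titles the prediction is meaningless as a forecast; the point is only to show how word counts per class drive the decision.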

As an experiment, I ran the model on the title of this article, and it predicted that:

It will have 10–30 retweets and 25–60 favorites on Twitter and 400+ claps on Medium.

How is this prediction?

Follow me if you want to read more of my articles. And if you enjoyed this article, be sure to like it and give me a lot of claps. It means the world to the writer.

Flávio H. de Freitas is an Entrepreneur, Engineer, Tech lover, Dreamer and Traveler. He has worked as CTO in Brazil, Silicon Valley and Europe.

Translated from: https://www.freecodecamp.org/news/how-to-predict-likes-and-shares-based-on-your-articles-title-using-machine-learning-47f98f0612ea/
