Twitter Sentiment Analysis Using Naive Bayes and N-Gram

In this article, we’ll show you how to classify a tweet into either positive or negative, using two famous machine learning algorithms: Naive Bayes and N-Gram.

First, what is sentiment analysis?

Sentiment analysis is the automated process of analyzing text data and sorting it into positive, negative, or neutral sentiment. Using sentiment analysis tools to analyze opinions in Twitter data can help companies understand how people are talking about their brand.

Now that you know what sentiment analysis is, let’s start coding.

We have divided the whole program into three parts:

  • Importing the datasets
  • Preprocessing of datasets
  • Applying machine learning algorithms

Note: We have used Jupyter Notebook but you can use the editor of your choice.

Step 1: Importing the Datasets
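
The article never shows how the raw tweet dataset is read in. Here is a minimal sketch of the likely loading step (the file path is a guess; the Sentiment and SentimentText columns are the ones every later snippet relies on), together with the shared imports the later snippets assume:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical loading step: the path below is an assumption, not from the article.
data = pd.read_csv('data/tweets.csv')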

Displaying the top ten rows of the dataset:

data.head(10)

From the dataset above we can clearly see the use of the following (none of which is of any use in determining the sentiment of a tweet):

  • Acronyms
  • Sequences of repeated characters
  • Emoticons
  • Spelling mistakes
  • Nouns

Let’s check whether our dataset is balanced across the Sentiment label:

plt.close()
fig, ax = plt.subplots()
counts, bins, patches = ax.hist(data.Sentiment.as_matrix(), edgecolor='gray')
ax.set_title("Histogram of Sentiments")
ax.set_xlabel("Sentiment")
ax.set_ylabel("Frequency")
patches[0].set_facecolor("#5d4037")
patches[0].set_label("negative")
patches[-1].set_facecolor("#ff9100")
patches[-1].set_label("positive")
plt.legend()

The dataset seems to be very balanced between negative and positive sentiment.

Now, we need to import other datasets which will help us with the preprocessing, such as:

  • An emoticon dictionary regrouping 132 of the most commonly used Western emoticons with their sentiment (negative or positive):

emoticons = pd.read_csv('data/smileys.csv')
positive_emoticons = emoticons[emoticons.Sentiment == 1]
negative_emoticons = emoticons[emoticons.Sentiment == 0]
emoticons.head(5)
  • An acronym dictionary of 5465 acronyms with their translations:

acronyms = pd.read_csv('data/acronyms.csv')
acronyms.tail(5)
  • A stop-word dictionary, corresponding to words that are filtered out before or after processing of the natural language data because they are not useful in our case:

stops = pd.read_csv('data/stopwords.csv')
stops.columns = ['Word']
stops.head(5)
  • A positive and negative word dictionary:

positive_words = pd.read_csv('data/positive-words.csv', sep='\t')
positive_words.columns = ['Word', 'Sentiment']
negative_words = pd.read_csv('data/negative-words.csv', sep='\t')
negative_words.columns = ['Word', 'Sentiment']
positive_words.head(5)
negative_words.head(5)

Step 2: Preprocessing of Datasets

What is data preprocessing?

Data preprocessing is the technique of converting raw data into a clean data set. In other words, data gathered from different sources arrives in a raw format that is not feasible for analysis.

Now, let's begin with the preprocessing part.

To do this we are going to pass our data through various steps:

  • Replace all emoticons with their sentiment polarity tag (||pos|| / ||neg||) using the emoticon dictionary:

import re

def make_emoticon_pattern(emoticons):
    pattern = "|".join(map(re.escape, emoticons.Smiley))
    pattern = "(?<=\s)(" + pattern + ")(?=\s)"
    return pattern

def find_with_pattern(pattern, replace=False, tag=None):
    if replace and tag == None:
        raise Exception("Parameter error", "If replace=True you should add the tag by which the pattern will be replaced")
    regex = re.compile(pattern)
    if replace:
        return data.SentimentText.apply(lambda tweet: re.sub(pattern, tag, " " + tweet + " "))
    return data.SentimentText.apply(lambda tweet: re.findall(pattern, " " + tweet + " "))

pos_emoticons_found = find_with_pattern(make_emoticon_pattern(positive_emoticons))
neg_emoticons_found = find_with_pattern(make_emoticon_pattern(negative_emoticons))
nb_pos_emoticons = len(pos_emoticons_found[pos_emoticons_found.map(lambda emoticons: len(emoticons) > 0)])
nb_neg_emoticons = len(neg_emoticons_found[neg_emoticons_found.map(lambda emoticons: len(emoticons) > 0)])
print "Number of positive emoticons: " + str(nb_pos_emoticons) + " Number of negative emoticons: " + str(nb_neg_emoticons)
--------------------------------------------------------------------
data.SentimentText = find_with_pattern(make_emoticon_pattern(positive_emoticons), True, '||pos||')
data.SentimentText = find_with_pattern(make_emoticon_pattern(negative_emoticons), True, '||neg||')
data.head(10)
  • Replace all URLs with a tag ||url||:

pattern_url = re.compile(ur'(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?\xab\xbb\u201c\u201d\u2018\u2019]))')
url_found = find_with_pattern(pattern_url)
data.SentimentText = find_with_pattern(pattern_url, True, '||url||')
data[50:60]
  • Remove unicode characters:

def remove_unicode(string):
    try:
        string = string.decode('unicode_escape').encode('ascii', 'ignore')
    except UnicodeDecodeError:
        pass
    return string

data.SentimentText = data.SentimentText.apply(lambda tweet: remove_unicode(tweet))
data[1578592:1578602]
  • Decode HTML entities:

data.SentimentText[599982]
import HTMLParser

html_parser = HTMLParser.HTMLParser()
data.SentimentText = data.SentimentText.apply(lambda tweet: html_parser.unescape(tweet))
data.SentimentText[599982]
  • Reduce all letters to lowercase:

data.SentimentText = data.SentimentText.str.lower()
data.head(10)
  • Replace all usernames/targets @ with ||target||:

pattern_usernames = "@\w{1,}"
usernames_found = find_with_pattern(pattern_usernames)
data.SentimentText = find_with_pattern(pattern_usernames, True, '||target||')
data[45:55]
  • Replace all acronyms with their translation:

https://gist.github.com/BetterProgramming/fdcccacf21fa02a8a4d697da24a8cd54.js
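
The embedded gist above doesn’t render here, so the exact code is missing. As a rough sketch of what this step presumably does (not the author’s exact code): tokenize each tweet, swap in the acronym translations, and count which acronyms were matched. The column names on acronyms and the split-based tokenization are assumptions; the acronym_dictionary and top20acronyms names are taken from the snippet that follows.

from collections import Counter

acronym_dictionary = dict(zip(acronyms.Acronym, acronyms.Translation))  # column names assumed
acronym_counter = Counter()

def replace_acronyms(tweet):
    # Tokenize the tweet and expand any token found in the acronym dictionary.
    replaced = []
    for token in tweet.split():
        if token in acronym_dictionary:
            acronym_counter[token] += 1
            replaced.extend(acronym_dictionary[token].split())
        else:
            replaced.append(token)
    return replaced  # tweets stay tokenized (lists of words) from this point on

data.SentimentText = data.SentimentText.apply(replace_acronyms)
top20acronyms = acronym_counter.most_common(20)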

for i, (acronym, value) in enumerate(top20acronyms):
    print str(i + 1) + ") " + acronym + " => " + acronym_dictionary[acronym] + " : " + str(value)
plt.close()
top20acronym_keys = [x[0] for x in top20acronyms]
top20acronym_values = [x[1] for x in top20acronyms]
indexes = np.arange(len(top20acronym_keys))
width = 0.7
plt.bar(indexes, top20acronym_values, width)
plt.xticks(indexes + width * 0.5, top20acronym_keys, rotation="vertical")
  • Replace all negations (e.g., not, no, never) with the tag ||not||.

negation_dictionary = dict(zip(negation_words.Negation, negation_words.Tag))

def replace_negation(tweet):
    return [negation_dictionary[word] if negation_dictionary.has_key(word) else word for word in tweet]

data.SentimentText = data.SentimentText.apply(lambda tweet: replace_negation(tweet))
print data.SentimentText[29]
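
One gap in the published snippets: the negation_words table used above is never loaded. It presumably comes from another small CSV, something like the following (the file name is hypothetical; the column names match how the dictionary is built above):

negation_words = pd.read_csv('data/negations.csv')   # hypothetical file name
negation_words.columns = ['Negation', 'Tag']         # e.g. "not" mapped to "||not||"
negation_words.head(5)
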
  • Replace a sequence of repeated characters with two characters (e.g., “helloooo” becomes “helloo”) to keep the emphasized usage of the word:

data[1578604:]
pattern = re.compile(r'(.)\1*')

def reduce_sequence_word(word):
    return ''.join([match.group()[:2] if len(match.group()) > 2 else match.group() for match in pattern.finditer(word)])

def reduce_sequence_tweet(tweet):
    return [reduce_sequence_word(word) for word in tweet]

data.SentimentText = data.SentimentText.apply(lambda tweet: reduce_sequence_tweet(tweet))
data[1578604:]

We’ve finished the most important and tricky part of our Twitter sentiment analysis project; we can now apply our machine learning algorithms to the processed datasets.

Step 3: Applying Machine Learning Algorithms

What is machine learning?

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

There are three major methods used to classify a sentence into a given category (in our case, positive (1) or negative (0)): SVM, Naive Bayes, and N-Gram.

We used only Naive Bayes and N-Gram, which are the most commonly used for determining the sentiment of tweets.

Let us start with Naive Bayes.

Naive Bayes


There are different types of Naive Bayes classifiers but we’ll be using the Multinomial Naive Bayes.

Baseline

We use Multinomial Naive Bayes with Laplace smoothing as the learning algorithm, representing the classic way of doing text classification. Since we need to extract features from our data set of tweets, we use the bag-of-words model to represent it.

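For reference, the rule the multinomial model applies is the standard textbook one (not spelled out in the article’s text): pick the class c that maximizes the product of the class prior and the per-word likelihoods, with Laplace (add-one) smoothing on the word counts:

$$\hat{c} = \arg\max_{c}\; P(c)\prod_{i=1}^{n} P(w_i \mid c), \qquad P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w'} \mathrm{count}(w', c) + |V|}$$

where |V| is the vocabulary size, so a word unseen in a class never forces the whole product to zero.
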
The bag-of-words model is a simplifying representation of a document in which the document is represented as the bag of its words, without taking grammar or word order into account. In text classification, the frequency of each word is used as a feature for training a classifier.

For simplicity, we use the scikit-learn library.

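As a quick illustration (a toy example, not part of the original pipeline) of what CountVectorizer’s bag-of-words features look like:

from sklearn.feature_extraction.text import CountVectorizer

# Two toy "tweets": the vectorizer builds a vocabulary and counts word occurrences.
toy_tweets = ["good movie great acting", "bad movie bad plot"]
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(toy_tweets)

print(sorted(vectorizer.vocabulary_))   # ['acting', 'bad', 'good', 'great', 'movie', 'plot']
print(features.toarray())               # one row per tweet, one column per vocabulary word
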
Let’s start by dividing our data set into training and test sets:

def make_training_test_sets(data):
    data_shuffled = data.iloc[np.random.permutation(len(data))]
    data_shuffled = data_shuffled.reset_index(drop=True)
    data_shuffled.SentimentText = data_shuffled.SentimentText.apply(lambda tweet: " ".join(tweet))
    positive_tweets = data_shuffled[data_shuffled.Sentiment == 1]
    negative_tweets = data_shuffled[data_shuffled.Sentiment == 0]
    positive_tweets_cutoff = int(len(positive_tweets) * (3./4.))
    negative_tweets_cutoff = int(len(negative_tweets) * (3./4.))
    training_tweets = pd.concat([positive_tweets[:positive_tweets_cutoff], negative_tweets[:negative_tweets_cutoff]])
    test_tweets = pd.concat([positive_tweets[positive_tweets_cutoff:], negative_tweets[negative_tweets_cutoff:]])
    training_tweets = training_tweets.iloc[np.random.permutation(len(training_tweets))].reset_index(drop=True)
    test_tweets = test_tweets.iloc[np.random.permutation(len(test_tweets))].reset_index(drop=True)
    return training_tweets, test_tweets

training_tweets, test_tweets = make_training_test_sets(data)

print "size of training set: " + str(len(training_tweets))
print "size of test set: " + str(len(test_tweets))
  • Size of training set: 1183958
  • Size of test set: 394654

Once the training set and the test set are created, we need a third set of data called the validation set. It will be used to validate our model against unseen data and to tune the parameters of the learning algorithm, for example to avoid underfitting and overfitting.

We need this validation set because our test set should be used only to verify how well the model generalizes. If we used the test set rather than the validation set, our model could be overly optimistic and skew our results.

To make the validation set, there are two main options:

  • Split the training set into two parts (for example, 80% for training and 20% for validation), where each part contains an equal distribution of example types. We train the classifier on the larger part and make predictions on the smaller one to validate the model. This technique works well but has the disadvantage that our classifier does not get trained and validated on all examples in the data set (not counting the test set).
  • K-fold cross-validation: we split the data set into k parts, hold out one, combine the others and train on them, then validate against the held-out portion. We repeat that process k times (once per fold), holding out a different portion each time. Then we average the scores measured over the folds to get a more accurate estimate of our model’s performance.

We split the training data into ten folds and cross-validate them using scikit-learn:

from sklearn.cross_validation import KFold
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def classify(training_tweets, test_tweets, ngram=(1, 1)):
    scores = []
    k_fold = KFold(n=len(training_tweets), n_folds=10)
    count_vectorizer = CountVectorizer(ngram_range=ngram)
    confusion = np.array([[0, 0], [0, 0]])
    for training_indices, validation_indices in k_fold:
        training_features = count_vectorizer.fit_transform(training_tweets.iloc[training_indices]['SentimentText'].values)
        training_labels = training_tweets.iloc[training_indices]['Sentiment'].values
        validation_features = count_vectorizer.transform(training_tweets.iloc[validation_indices]['SentimentText'].values)
        validation_labels = training_tweets.iloc[validation_indices]['Sentiment'].values
        classifier = MultinomialNB()
        classifier.fit(training_features, training_labels)
        validation_predictions = classifier.predict(validation_features)
        confusion += confusion_matrix(validation_labels, validation_predictions)
        score = f1_score(validation_labels, validation_predictions)
        scores.append(score)
    return (sum(scores) / len(scores)), confusion

score, confusion = classify(training_tweets, test_tweets)

print 'Total tweets classified: ' + str(len(training_tweets))
print 'Score: ' + str(score)
print 'Confusion matrix:'
print(confusion)

Total tweets classified: 1183958

Score: 0.77653600187

Confusion matrix:
[[465021 126305]
 [136321 456311]]

We get about 0.77 using our baseline.

N-Gram (Language Models)


Note: n-gram classifiers are in fact a generalization of Naive Bayes; a unigram classifier with Laplace smoothing corresponds exactly to the traditional Naive Bayes classifier.

Since we use the bag-of-words model, the sentence “I don’t like chocolate” is translated into “I”, “don’t”, “like”, “chocolate”, so we could try a bigram model to capture the negation “don’t like” in this example. We still use Laplace smoothing, but we pass the ngram_range parameter to CountVectorizer to add the bigram features.

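To see what that parameter changes, here is a tiny toy example (not from the article) of the features extracted when ngram_range=(2, 2):

from sklearn.feature_extraction.text import CountVectorizer

# With ngram_range=(2, 2), only pairs of adjacent tokens become features,
# so a negation word stays attached to the word it modifies.
bigram_vectorizer = CountVectorizer(ngram_range=(2, 2))
bigram_vectorizer.fit(["do not like chocolate"])
print(sorted(bigram_vectorizer.vocabulary_))   # ['do not', 'like chocolate', 'not like']
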
score, confusion = classify(training_tweets, test_tweets, (2, 2))

print 'Total tweets classified: ' + str(len(training_tweets))
print 'Score: ' + str(score)
print 'Confusion matrix:'
print(confusion)

Using only bigram features, we improved our score slightly, by about 0.01. Based on that, we might expect that combining unigram and bigram features would increase the score further.

score, confusion = classify(training_tweets, test_tweets, (1, 2))

print 'Total tweets classified: ' + str(len(training_tweets))
print 'Score: ' + str(score)
print 'Confusion matrix:'
print(confusion)

Indeed, the score improves by about 0.02 compared to the baseline.

Conclusion

In this project, we tried to show a basic way of classifying tweets into positive or negative categories using Naive Bayes as a baseline. We also tried to show how language models are related to Naive Bayes and can produce better results.

This was our group’s final year project. We faced a lot of challenges digging into the details and selecting the right algorithm for the task. I hope you guys don’t have to go through the same process!

Since you have come this far, I am sharing the code link with you (do give the repository a star if you find it helpful). This is an open initiative to help those in need.

Thanks for reading this article. I hope it’s helpful to you all!

Translated from: https://medium.com/better-programming/twitter-sentiment-analysis-using-naive-bayes-and-n-gram-5df42ae4bfc6
