Exploratory Analysis of UFRN (Universidade Federal do Rio Grande do Norte) Warehouses

This is my first publication here, and I will start simple.

I want to do an exploratory data analysis of UFRN's warehouse data and answer some questions about it using Python and Power BI.

I downloaded the dataset from "dados.gov.br", which is an open website for public data from Brazil. You can get the dataset I used here 👈

QUESTIONS TO BE ANSWERED:

  1. Which items have the greatest quantity in stock?

  2. Which items have the greatest value in stock?

  3. Which warehouse has the greatest quantity of items in stock, and which has the lowest?

  4. Which warehouse has the greatest value in stock?

  5. Which are the most expensive items?

So, let's start by importing pandas in the Jupyter Notebook and opening the CSV file:

Image for post

Here ☝ we have our first problem:

The file looks strange. It seems that all the information is in the same column…

This makes it impossible to analyse the dataframe by column, as we can see below:

Image for post

The problem in this case is the separator. Pandas uses the comma as the default delimiter (or separator), and in this file the delimiter is a semicolon.

So, we need to specify the delimiter right after the file path:

Image for post
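
A minimal sketch of this step, assuming a small made-up semicolon-separated sample in place of the real file (the sample rows and the names `raw`, `df_wrong` and `df` are illustrative assumptions, not the author's data or code). It shows the default comma separator collapsing everything into one column, and `sep=";"` restoring the five columns:

```python
import io
import pandas as pd

# Made-up sample that mimics the file layout: semicolon-separated, 5 columns.
raw = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;100.0;0.50;50.00\n"
    "CAFE 250G;ALMOXARIFADO B;20.0;8.00;160.00\n"
)

# With the default comma separator, every row lands in a single column.
df_wrong = pd.read_csv(io.StringIO(raw))

# Passing sep=";" splits each row into the expected columns.
df = pd.read_csv(io.StringIO(raw), sep=";")

print(df_wrong.shape)  # one column
print(df.shape)        # five columns
```

With the real file, the `io.StringIO(raw)` argument would simply be replaced by the path to the downloaded CSV.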

Now, here we go! ✌

Let's check some information about the dataset, like columns, data types, length, and missing values.

Note that we have 5 columns:

Image for post
Image for post

material (item): object; almoxarifado (warehouse): object; saldo (balance): float; preco (price): object; valor_total (total value): object.
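
These checks can be sketched with a few standard pandas calls. The sample rows below are made up, and the comma decimals in `preco` and `valor_total` are an assumption about why those columns are read as text rather than numbers:

```python
import io
import pandas as pd

# Made-up sample; comma decimals are an assumption about the real file.
raw = (
    "material;almoxarifado;saldo;preco;valor_total\n"
    "SERINGA 10ML;ALMOXARIFADO A;100.0;0,50;50,00\n"
    "CAFE 250G;ALMOXARIFADO B;20.0;8,00;160,00\n"
)
df = pd.read_csv(io.StringIO(raw), sep=";")

df.info()                  # column names, non-null counts and dtypes
n_rows = len(df)           # number of rows
dtypes = df.dtypes         # dtype of each column
missing = df.isna().sum()  # missing values per column

print(n_rows)
print(dtypes)
print(missing)
```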

If we had to do arithmetic between columns, we would probably need to convert some object columns to integer or float values, but in this case we won't need to.
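
For reference, such a conversion could look like the sketch below. It assumes the prices are stored as text in Brazilian number format ("." for thousands, "," for decimals); the article does not show the raw values, so both the format and the sample values are assumptions:

```python
import pandas as pd

# Made-up values in Brazilian number format ("." thousands, "," decimals).
preco = pd.Series(["0,50", "1.234,56", "30.400,00"])

preco_float = (
    preco.str.replace(".", "", regex=False)   # drop thousands separators
         .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
         .astype(float)
)

print(preco_float.tolist())
```

If the file really uses this format, passing `decimal=","` and `thousands="."` to `pd.read_csv` would be another way to get numeric columns directly at load time.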

Now we can analyse the data by column:

Image for post

☝ Above we have the second issue:

There are 2 different names for the same warehouse: “MEJC — ALMOXARIFADO NUTRIÇÃO” and “MEJC — ALMOXARIFADO DE NUTRIÇÃO”.

It's a typo, so let's merge them into one.

Image for post
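
A minimal sketch of spotting and merging the two spellings, on a made-up three-row frame. Which spelling is kept as the canonical one, and the plain hyphen used in the names, are assumptions:

```python
import pandas as pd

# Made-up rows containing both spellings of the same warehouse.
df = pd.DataFrame({
    "almoxarifado": [
        "MEJC - ALMOXARIFADO NUTRIÇÃO",
        "MEJC - ALMOXARIFADO DE NUTRIÇÃO",
        "MEJC - ALMOXARIFADO DE NUTRIÇÃO",
    ]
})

# Listing the distinct names with their counts makes the duplicate visible.
print(df["almoxarifado"].value_counts())

# Map the variant spelling onto the one we keep (assumed canonical).
df["almoxarifado"] = df["almoxarifado"].replace(
    "MEJC - ALMOXARIFADO NUTRIÇÃO", "MEJC - ALMOXARIFADO DE NUTRIÇÃO"
)

n_names = df["almoxarifado"].nunique()
print(n_names)
```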

Done!

A very quick and easy way to get a lot of information in one line of code is to use Pandas Profiling.

Looking at the earlier info and the Pandas Profiling report below, we have 9087 rows and 4536 distinct items (49.9% of the rows), and 0 missing cells.

This means the dataframe doesn't need much cleaning or preprocessing.

Image for post
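
The headline numbers of such a report can also be cross-checked with plain pandas. The sketch below is not the profiling call itself; it uses a made-up four-row frame and computes the same kind of figures (row count, distinct items, share of distinct items, missing cells):

```python
import pandas as pd

# Made-up frame standing in for the warehouse data.
df = pd.DataFrame({
    "material": ["SERINGA", "CAFE", "SERINGA", "COPO"],
    "saldo": [10.0, 5.0, 3.0, 7.0],
})

n_rows = len(df)                                     # total rows
n_distinct = df["material"].nunique()                # distinct item names
pct_distinct = round(100 * n_distinct / n_rows, 1)   # share of distinct items
n_missing = int(df.isna().sum().sum())               # missing cells overall

print(n_rows, n_distinct, pct_distinct, n_missing)
```

Applied to the figures quoted above, 4536 distinct items out of 9087 rows gives the reported 49.9%.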

This dataset is small and has just a few columns. Like I said: first publication, simple things (sorry 😬)

Except for "saldo", "preco" and "valor_total", the columns are categorical. So now we can go to Power BI to build some visualizations and answer the questions.

But first, we need to export the dataset we edited in the Jupyter Notebook:

Image for post
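
A sketch of the export, assuming a hypothetical output file name and a made-up one-row frame. Writing with `encoding="utf-8"` and `index=False` are choices made here for illustration; the article does not show which options were actually used:

```python
import os
import tempfile
import pandas as pd

# Made-up row standing in for the cleaned dataframe.
df = pd.DataFrame({
    "material": ["CAFÉ 250G"],
    "almoxarifado": ["MEJC - ALMOXARIFADO DE NUTRIÇÃO"],
    "saldo": [20.0],
})

# Hypothetical file name, written to a temporary folder for this example.
out_path = os.path.join(tempfile.mkdtemp(), "almoxarifado_clean.csv")

# index=False keeps the pandas row index out of the exported file.
df.to_csv(out_path, index=False, encoding="utf-8")

# Reading the file back confirms the accented characters survive the round trip.
df_back = pd.read_csv(out_path, encoding="utf-8")
print(df_back)
```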

With the file saved in the same folder as the project, we just need to open it in Power BI.

By default, Power BI treats the file as a "Western European" source. Because of this, some Brazilian Portuguese words show up with garbled characters.

Image for post

We solve this issue by choosing "Unicode (UTF-8)" as the file origin.

Image for post
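
The effect can be reproduced in Python. This sketch assumes the "Western European" option corresponds to the Windows-1252 code page (`cp1252`), and uses one of the warehouse names as the example text:

```python
original = "ALMOXARIFADO DE NUTRIÇÃO"

# The file on disk holds UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Windows-1252 garbles the accented letters.
garbled = utf8_bytes.decode("cp1252")

# Decoding them as UTF-8 gives the original text back.
restored = utf8_bytes.decode("utf-8")

print(garbled)
print(restored)
```

Each accented letter takes two bytes in UTF-8, so a single-byte code page shows it as two unrelated characters.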

In the Power Query editor, we can see that the columns already have the right types (Text, Text, Role Number, Currency, Currency). So no transformation is needed; just close and apply.

Image for post

LET’S GO TO THE QUESTIONS!

  1. What are the items with the greatest quantity in stock:

Image for post

Almost 2 million syringes, plus double-distilled water.
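
The chart was built in Power BI, but the same ranking can be sketched in pandas. The rows and warehouse names below are made up, and summing `saldo` per item across warehouses is an assumption about how the chart aggregates:

```python
import pandas as pd

# Made-up stock rows; the same item can appear in several warehouses.
df = pd.DataFrame({
    "material": ["SERINGA", "AGUA BIDESTILADA", "SERINGA", "CAFE"],
    "almoxarifado": ["A", "A", "B", "B"],
    "saldo": [100.0, 80.0, 50.0, 20.0],
})

# Total quantity per item, then the items with the largest totals.
top_items = df.groupby("material")["saldo"].sum().nlargest(3)

print(top_items)
```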

2. What are the items with the greatest value in stock:

Image for post

WOW! Wait... close to 1 million $ in coffee??? It seems that teachers run on coffee! 😂

Note that the value of these 2 items is very different from the others (could they be outliers?)

3. Which warehouse has the greatest quantity of items in stock and which has the lowest:

Image for post

Although these values are rounded, the medicine warehouse has the largest number of items, followed by the infrastructure warehouse.

Image for post

And the museum warehouse has the lowest number of items.
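
A pandas equivalent of this comparison, as a sketch only: the warehouse names and quantities are made up, and the total is taken as the sum of `saldo` per warehouse, which is an assumption about the measure used in the Power BI chart:

```python
import pandas as pd

# Made-up stock rows for three hypothetical warehouses.
df = pd.DataFrame({
    "almoxarifado": ["MEDICAMENTOS", "MEDICAMENTOS", "INFRAESTRUTURA", "MUSEU"],
    "saldo": [500.0, 300.0, 700.0, 5.0],
})

# Total quantity in stock per warehouse.
by_warehouse = df.groupby("almoxarifado")["saldo"].sum()

largest = by_warehouse.idxmax()   # warehouse with the most items
smallest = by_warehouse.idxmin()  # warehouse with the fewest items

print(by_warehouse.sort_values(ascending=False))
print(largest, smallest)
```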

4. Which warehouse has the greatest value in stock:

Image for post

We saw earlier that the medicine warehouse has the largest number of items, slightly more than the infrastructure warehouse. But the stock value of the infrastructure warehouse is twice that of medicine.

5. What are the most expensive items:

Image for post

OK, so a 250 g package of coffee costs 30.4 thousand?? Sounds weird 🤔

Image for post
Image for post

Well, it’s true! 😐 Also 11 thousand $ in plastic cups.

Another strange thing I noticed is that 107 items have a price of 1 cent:

Image for post
Image for post

Do all these items really cost just 1 cent? Someone will need to answer that.

SO WHAT??

Analysing this dataset, we have seen that A LOT of money is spent on coffee and plastic cups. ALMOST 1 MILLION!!!??? 💸💸💸

Maybe something is wrong… With this information, I would probably check the veracity of the facts. Perhaps the item descriptions are not as meaningful as they could be (how big is this package?)

Some inconclusive findings can turn into conclusions. In this case, some data curation may be needed.

If you have any tips, comments or suggestions, I would be very happy to learn more with you guys. I'm just a beginner.

Thank you for your time! 🙏

Translated from: https://medium.com/análises-exploratórias-de-dados/exploratory-analysis-of-ufrn-universidade-federal-do-rio-grande-do-norte-warehouses-e6f2ff334b0f
