[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:

[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:

mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\
.option("dbtable","accounts").option("user","training").option("password","training").load()

 

In [10]: mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\
....: .option("dbtable","accounts").option("user","training").option("password","training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.

In [11]:


In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame

In [12]: mydf001.count()
17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 32 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761

In [13]:

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/393882.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

ffmpeg mac 批量脚本_使用批处理脚本(BAT)调用FFMPEG批量编码视频

使用批处理脚本(BAT)编码视频非常方便,尤其当视频序列非常多的时候,更是省了不少简单重复性劳动。只要学会批处理里面几个基本的命令就行了,感觉和c/c差不多。set:设置变量(注意:变量一般情况下是字符串,而…

单实例oracle ha,Oracle单实例启动多个实例

Oracle单实例启动多个实例多实例运行,单个实例就是一个数据库!一个数据库对应多个实例是RAC。Linux建立oracle的实例步骤:1、在linux服务器的图形界面下,打开一个终端,输入如下的命令; xhost ###远程调用…

leetcode357. 计算各个位数不同的数字个数(回溯)

给定一个非负整数 n&#xff0c;计算各位数字都不同的数字 x 的个数&#xff0c;其中 0 ≤ x < 10n 。示例:输入: 2 输出: 91 解释: 答案应为除去 11,22,33,44,55,66,77,88,99 外&#xff0c;在 [0,100) 区间内的所有数字。代码 class Solution {int numbers0;public int …

Shell编程 之 for 循环

1. 语法结构 2. 案例 2.1 批量解压缩 #!/bin/bashcd /root/test/ ls *.tar.gz > ls.log ls *.tgz >> ls.logfor i in $( cat ls.log )dotar -zxf $i &> /dev/nulldone rm -rf ls.log ~ …

react实战课程_在使用React一年后,我学到的最重要的课程

react实战课程by Tomas Eglinskas由Tomas Eglinskas 在使用React一年后&#xff0c;我学到的最重要的课程 (The most important lessons I’ve learned after a year of working with React) Starting out with a new technology can be quite troublesome. You usually find …

化工原理物性参数_化工原理知识点总结整理

1一、流体力学及其输送1.单元操作&#xff1a;物理化学变化的单个操作过程&#xff0c;如过滤、蒸馏、萃取。2.四个基本概念&#xff1a;物料衡算、能量衡算、平衡关系、过程速率。3.牛顿粘性定律&#xff1a;FτAμAdu/dy&#xff0c;(F&#xff1a;剪应力&#xff1b;A&#…

leetcode1415. 长度为 n 的开心字符串中字典序第 k 小的字符串(回溯)

一个 「开心字符串」定义为&#xff1a;仅包含小写字母 [a, b, c]. 对所有在 1 到 s.length - 1 之间的 i &#xff0c;满足 s[i] ! s[i 1] &#xff08;字符串的下标从 1 开始&#xff09;。 比方说&#xff0c;字符串 "abc"&#xff0c;"ac"&#xff0c…

8、linux上安装hbase

1.基本信息 版本1.2.4安装机器三台机器账号hadoop源路径/opt/software/hbase-1.2.4-bin.tar.gz目标路径/opt/hbase -> /opt/hbase-1.2.4依赖关系无2.安装过程 1).使用hadoop账号解压到/opt/hadoop目录下并设置软连接&#xff1a; [rootbgs-5p173-wangwenting opt]# su hadoo…

c oracle 记录,ORACLE 19c 操作相关记录

#数据源导出导入#导出exp oracle/oraclelocalhost:1521/orcl file/home/oracle/dmp/oracle20191120.dmp owneroracle log/home/oracle/dmp/log.log#导入imp oracletest/oracletestlocalhost:1521/orcl file/home/oracle/dmp/oracle20191120.dmp fully ignorey log/home/oracle…

TensorFlow.js快速入门

by Pau Pavn通过保罗帕文(PauPavn) TensorFlow.js快速入门 (A quick introduction to TensorFlow.js) TensorFlow has been around for a while now. Until last month, though, it was only available for Python and a few other programming languages, like C and Java. A…

Mountain Number FZU-2109数位dp

Mountain NumberFZU-2109 题目大意&#xff1a;一个大于0的数字x&#xff0c;分写成xa[0]a[1]a[2][3]..a[n]的形式&#xff0c;&#xff08;比如x1234,a[0]1,a[1]2,a[3]3,a[3]4&#xff09;,Mountain Number要满足对于a[2*i1]要大于等于a[2*i]和a[2*i2]&#xff0c;给定范围l,r…

[10.5模拟] dis

题意&#xff1a;给你一个主串&#xff0c;两个分串&#xff0c;要求两个分串的距离最大&#xff0c;两个分串的距离定义为第一个分串的最右边的字符和第二个分串的最左边的字符之间的字符数 题解&#xff1a; 直接kmp匹配两个分串即可 注&#xff1a;kmp匹配时&#xff0c;当分…

什么是非集计模型_集计与非集计模型的关系

集计与非集计模型的关系Wardrop第一.第二平衡原理集计模型在传统的交通规划或交通需求预测中&#xff0c;通常首先将对象地区或群体划分为若干个小区或群体等特定的集合体&#xff0c;然后以这些小区或群体为基本单位&#xff0c;展开问题的讨论。因此&#xff0c;在建立模型或…

微软dns能做cname吗_为什么域的根不能是CNAME以及有关DNS的其他花絮

微软dns能做cname吗This post will use the above question to explore DNS, dig, A records, CNAME records, and ALIAS/ANAME records from a beginner’s perspective. So let’s get started.这篇文章将使用上述问题从初学者的角度探讨DNS &#xff0c; dig &#xff0c; A…

Java Timestamp Memo

timestamp的构造函数&#xff0c;把微妙作为纳秒存储&#xff0c;所以 Java.util.date.comepareTo(Timestamp) 结果肯定是1另外&#xff0c;​Timestamp.equal(object) 如果参数不是Timestamp&#xff0c;肯定返回false。Timestamps nanos value is NOT the number of nanoseco…

oracle虚拟机字符集,更改虚拟机上的oracle字符集

修改oracle上边的字符集,需要用到DBA数据库管理员的权限,再修改字符集时要注意到修改后的字符集只能范围变大(例如:当前的字符集是GBK,那你修改后可以是UTF-8就是说后者只能比前者大,不能小.因为字符集都是向下兼容的)步骤:第一步:使用DBA身份登录先以绕过日志的方式登录在以然…

mybaits自连接查询

看不太懂&#xff0c;先记录再查&#xff0c;有没有大大解释下 resultmap里的collection设置select字段&#xff0c;看着像递归&#xff0c;没见过这种用法&#xff0c;#{pid}从何而来&#xff1f; 转载于:https://www.cnblogs.com/haon/p/10808739.html

token要加编码decode吗_彻底弄明白Base64 编码

Base64 encoding/decoding常见于各种authentication和防盗链的实现当中。彻底搞懂它绝对提升团队troubleshooting的底气。我们从纯手工方式编码解码开始&#xff0c;然后看看学到的技能怎么样应用在实际的troubleshooting 中。准备工作&#xff1a;我们应知道一个byte有8个bits…

oracle的oradata,Oracle使用oradata恢复数据库

SQL> host del D:\oracle\ora92\database\PWDoracle.ORASQL> host orapwd fileD:\oracle\ora92\DATABASE\PWDoracle.ORA passwordsystem entries10SQL> alter database open;数据库已更改。SQL> conn system/system as sysdba已连接。SQL> shutdown immediate数…

Jenkins连接TFS出现错误:“jenkins com.microsoft.tfs.core.exceptions.TECoreException”的问题收集...

没成功解决过&#xff0c;下面提供一些收集的链接地址&#xff0c;因为这个问题真的很少。 https://social.msdn.microsoft.com/Forums/vstudio/en-US/1a75a0b2-4591-4edd-999a-9696149c8144/integration-with-jenkins?forumtfsintegration http://www.itgo.me/a/900879197026…