Spark: Generating Random Columns

1. Example using rand

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.functions.rand

val df = sc.parallelize(Seq(
  (1L, "foo"), (2L, "bar"), (3L, "baz"))).toDF("x", "y")

// Exiting paste mode, now interpreting.

scala> df.show()
+---+---+
|  x|  y|
+---+---+
|  1|foo|
|  2|bar|
|  3|baz|
+---+---+

scala> df.select((rand * Long.MaxValue).cast("long").alias("rnd")).show
+-------------------+
|                rnd|
+-------------------+
|3002341044871192576|
|2234781058048201728|
|2413479811152188416|
+-------------------+
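
Because rand produces a uniform double in [0.0, 1.0) per row, multiplying by Long.MaxValue and casting to long spreads the values over the non-negative long range, which is why the numbers above are so large. The same trick bounds the values to any half-open range. A minimal sketch, assuming the df defined above; the bounds lo and hi are illustrative values, not from the original example:

import org.apache.spark.sql.functions.rand

// rand is uniform in [0.0, 1.0), so scaling and shifting
// yields a long in the half-open range [lo, hi)
val lo = 100L // hypothetical inclusive lower bound
val hi = 200L // hypothetical exclusive upper bound
df.select((rand() * (hi - lo) + lo).cast("long").alias("rnd")).show()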

scala> df.select((rand * 10).cast("long").alias("rnd")).show
+---+
|rnd|
+---+
|  3|
|  8|
|  6|
+---+
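
Without a seed, rand produces different values on every run. Passing an explicit seed makes the column reproducible, given the same input data and partitioning. A minimal sketch, assuming the df defined above; the seed 42 and the column name rnd are arbitrary choices:

import org.apache.spark.sql.functions.rand

// rand(seed) is deterministic for a fixed seed, input, and partitioning;
// withColumn appends the random column next to the existing ones
df.withColumn("rnd", (rand(42) * 10).cast("long")).show()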

2. Example using monotonicallyIncreasingId

scala> :paste
// Entering paste mode (ctrl-D to finish)

val df = sc.parallelize(Seq(
  (1L, "foo"), (2L, "bar"), (3L, "baz"))).toDF("x", "y")

// Exiting paste mode, now interpreting.

scala> import org.apache.spark.sql.functions.monotonicallyIncreasingId
import org.apache.spark.sql.functions.monotonicallyIncreasingId

scala> df.select(monotonicallyIncreasingId).show()
warning: there was one deprecation warning; re-run with -deprecation for details
+-----------------------------+
|monotonically_increasing_id()|
+-----------------------------+
|                  17179869184|
|                  42949672960|
|                  60129542144|
+-----------------------------+
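
The warning above appears because monotonicallyIncreasingId is deprecated; the current function is monotonically_increasing_id. The generated ID is guaranteed to be unique and monotonically increasing, but not consecutive: the partition ID goes into the upper 31 bits and a per-partition record counter into the lower 33 bits, which is why the values above jump in steps of billions instead of counting 0, 1, 2. A minimal sketch with the non-deprecated function, assuming the df defined above; the column name id is arbitrary:

import org.apache.spark.sql.functions.monotonically_increasing_id

// unique, monotonically increasing (but not consecutive) 64-bit IDs:
// upper 31 bits hold the partition ID, lower 33 bits a per-partition counter
df.withColumn("id", monotonically_increasing_id()).show()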
