1. Example using rand
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.rand
val df = sc.parallelize(Seq(
(1L, "foo"), (2L, "bar"), (3L, "baz"))).toDF("x", "y")
// Exiting paste mode, now interpreting.
scala> df.show()
+---+---+
| x| y|
+---+---+
| 1|foo|
| 2|bar|
| 3|baz|
+---+---+
scala> df.select((rand * Long.MaxValue).cast("long").alias("rnd")).show
+-------------------+
| rnd|
+-------------------+
|3002341044871192576|
|2234781058048201728|
|2413479811152188416|
+-------------------+
scala> df.select((rand * 10).cast("long").alias("rnd")).show
+---+
|rnd|
+---+
| 3|
| 8|
| 6|
+---+
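In the transcript above, rand produces a uniform double in [0, 1); multiplying it by an upper bound and casting to long yields a pseudo-random integer, and the values change on every run. Below is a minimal sketch (assuming a SparkSession named spark, as spark-shell provides) of how one might attach such a column with withColumn and make it reproducible with a seeded rand(seed):

import org.apache.spark.sql.functions.rand

// Assumes `spark` (SparkSession) is in scope, as in spark-shell.
import spark.implicits._

val df = Seq((1L, "foo"), (2L, "bar"), (3L, "baz")).toDF("x", "y")

// rand(seed) keeps the column reproducible for the same plan;
// multiply by an upper bound and cast to long to get integer "IDs".
val withRnd = df.withColumn("rnd", (rand(42) * Long.MaxValue).cast("long"))

withRnd.show(false)

Note that rand only samples uniformly; it does not guarantee unique values, so collisions are possible if the column is used as an ID.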
2. Example using monotonicallyIncreasingId
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.rand
val df = sc.parallelize(Seq(
(1L, "foo"), (2L, "bar"), (3L, "baz"))).toDF("x", "y")
// Exiting paste mode, now interpreting.
scala> import org.apache.spark.sql.functions.monotonicallyIncreasingId
import org.apache.spark.sql.functions.monotonicallyIncreasingId
scala> df.select(monotonicallyIncreasingId).show()
warning: there was one deprecation warning; re-run with -deprecation for details
+-----------------------------+
|monotonically_increasing_id()|
+-----------------------------+
| 17179869184|
| 42949672960|
| 60129542144|
+-----------------------------+
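The deprecation warning appears because the camelCase monotonicallyIncreasingId has been deprecated since Spark 2.0 in favor of monotonically_increasing_id(). The generated IDs are guaranteed to be monotonically increasing and unique, but not consecutive: the partition ID is encoded in the upper bits and a per-partition record counter in the lower bits, which is why the values above jump in large steps. A minimal sketch with the current function name (again assuming the spark session that spark-shell provides):

import org.apache.spark.sql.functions.monotonically_increasing_id

// Assumes `spark` (SparkSession) is in scope, as in spark-shell.
import spark.implicits._

val df = Seq((1L, "foo"), (2L, "bar"), (3L, "baz")).toDF("x", "y")

// IDs are unique and increasing within the DataFrame, but not consecutive:
// the upper bits encode the partition ID, the lower bits a row counter.
val withId = df.withColumn("id", monotonically_increasing_id())

withId.show(false)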