ValueError: Expected 2D array, got 1D array instead:

This post records a problem encountered while training a support vector machine (SVM) with Apache Spark, along with its solution. The core issue is a 1D-array error raised at prediction time; the correct way to reshape the feature vector is given below.
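For context, the failure can be reproduced outside Spark in a few lines of scikit-learn (a minimal sketch with made-up data; `SVC` is used here only as a stand-in for the trained model):

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical stand-in data: 10 MNIST-sized samples, binary labels
    X = np.random.rand(10, 784)
    y = np.array([0, 1] * 5)
    clf = SVC().fit(X, y)

    try:
        clf.predict(X[0])                    # 1D input: raises the ValueError
    except ValueError as e:
        print(e)

    print(clf.predict(X[0].reshape(1, -1)))  # 2D input: predicts normally

The full Spark run that hit the same error follows.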
18/08/19 22:01:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Parsing data
Fitting model
train
Currently 8 partitions left
Size of data: 60000
Currently 4 partitions left
Size of data: 28024
Currently 2 partitions left
Warning: 456.0 relevant support vectors thrown away!
('model.support:', array([   0,    1,    3, ..., 7018, 7020, 7021], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 484.0 relevant support vectors thrown away!
('model.support:', array([   0,    1,    3, ..., 7065, 7066, 7067], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 348.0 relevant support vectors thrown away!
('model.support:', array([   0,    1,    2, ..., 6891, 6892, 6893], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 453.0 relevant support vectors thrown away!
('model.support:', array([   1,    2,    4, ..., 7036, 7038, 7039], dtype=int32), ' self.max_sv:', 5000.0)
Size of data: 20000
Warning: 2935.0 relevant support vectors thrown away!
('model.support:', array([   9,   19,   23, ..., 9975, 9976, 9984], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 2923.0 relevant support vectors thrown away!
('model.support:', array([  13,   35,   42, ..., 9968, 9970, 9996], dtype=int32), ' self.max_sv:', 5000.0)
Time: 845.46
Predicting outcomes training set
18/08/19 22:16:16 ERROR Executor: Exception in task 1.0 in stage 9.0 (TID 19)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1056, in <lambda>
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1056, in <genexpr>
  File "/home/hduser/pythonwork/CVM/bin/mnist.py", line 57, in <lambda>
    labelsAndPredsTrain = trainRDD.map(lambda p: (p.label, model.predict(p.features)))
  File "/home/hduser/pythonwork/CVM/cvm/svm.py", line 20, in predict
    return self.model.predict(features)
  File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 548, in predict
    y = super(BaseSVC, self).predict(X)
  File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 308, in predict
    X = self._validate_for_predict(X)
  File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 439, in _validate_for_predict
    X = check_array(X, accept_sparse='csr', dtype=np.float64, order="C")
  File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[0.         0.         0.         ... 0.         0.        ].
(full 784-value MNIST feature vector omitted)
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Solution:

    labelsAndPredsTrain = trainRDD.map(lambda p: (p.label, model.predict(p.features.reshape(1,-1))))

That is, append a `reshape(1, -1)` call to the feature vector so that `predict` receives a single sample as a 2D array.
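Alternatively, the reshape can be centralized inside the wrapper's `predict` so that call sites stay unchanged (a minimal sketch; the class below is a hypothetical stand-in, not the actual code in `cvm/svm.py`):

    import numpy as np

    class SVMWrapper(object):
        """Hypothetical stand-in for the predict wrapper in cvm/svm.py."""

        def __init__(self, sk_model):
            self.model = sk_model

        def predict(self, features):
            # scikit-learn expects shape (n_samples, n_features); a single
            # sample arrives from the RDD map as a 1D vector, so promote
            # it to 2D and unwrap the single prediction afterwards.
            features = np.asarray(features).reshape(1, -1)
            return self.model.predict(features)[0]

With the reshape done there, `trainRDD.map(lambda p: (p.label, model.predict(p.features)))` works as originally written.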
Explanation:

>>> a = [1,2,3,4]
>>> import numpy as np
>>> b = np.array(a)
>>> c = b.reshape(1,-1)
>>> print c
[[1 2 3 4]]
>>> print b
[1 2 3 4]
>>> print a
[1, 2, 3, 4]
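Checking the `shape` attribute makes the difference explicit (continuing the same Python 2 session):

>>> print b.shape
(4,)
>>> print c.shape
(1, 4)
>>> print b.reshape(-1,1).shape
(4, 1)

`b` is a plain 1D vector; `c` holds the same data as one sample with four features, which is the layout `predict` expects.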
When Python raises `ValueError: Expected 2D array, got 1D array instead`, it is usually because a machine learning library (such as scikit-learn) requires a 2D array as input but was handed a 1D one. A more systematic walkthrough:

### 1. Understand the cause

Most machine learning models expect their input in 2D form:

- A 2D array has shape `(n_samples, n_features)`: `n_samples` samples, each with `n_features` features.
- Passing a 1D array (shape `(n,)`) triggers the dimension-mismatch error.

### 2. Convert the dimensions with `reshape`

Adjust the array shape with NumPy's `reshape`:

- **Case 1: the data is a single feature.** If the data is one feature over many samples (e.g. `[1, 2, 3, 4]`), convert it to `(n_samples, 1)`, one sample per row:

  ```python
  import numpy as np
  array_1d = np.array([1, 2, 3, 4])
  array_2d = array_1d.reshape(-1, 1)  # shape becomes (4, 1)
  ```

- **Case 2: the data is a single sample.** If the data is one sample with several features (e.g. `[0.1, 0.2, 0.3]`), convert it to `(1, n_features)`:

  ```python
  array_2d = array_1d.reshape(1, -1)  # shape becomes (1, 3)
  ```

### 3. Apply it in the preprocessing pipeline

With `StandardScaler`, for example, the data must be made 2D before standardizing:

```python
from sklearn.preprocessing import StandardScaler

# Raw 1D data
raw_data = np.array([1, 2, 3, 4, 5])

# Convert to 2D, then standardize
scaled_data = StandardScaler().fit_transform(raw_data.reshape(-1, 1))

# Flatten back to 1D if needed
scaled_data_1d = scaled_data.reshape(-1)
```

### 4. Verify the shape

After converting, check the `shape` attribute:

```python
print(array_2d.shape)  # should be (n_samples, n_features)
```

### Common scenarios

| Original shape | Target shape | Conversion       | Typical use                                |
|----------------|--------------|------------------|--------------------------------------------|
| `(n,)`         | `(n, 1)`     | `reshape(-1, 1)` | single feature, many samples (e.g. regression) |
| `(n,)`         | `(1, n)`     | `reshape(1, -1)` | many features, single sample (e.g. prediction) |

### 5. Other approaches

- **Expand the dimensions with `np.newaxis`:**

  ```python
  array_2d = array_1d[:, np.newaxis]  # equivalent to reshape(-1, 1)
  ```

- **Convert a Pandas Series to a DataFrame:**

  ```python
  import pandas as pd
  df = pd.DataFrame({'feature': array_1d})
  ```

Either way, the key is to choose `reshape(-1, 1)` (single feature) or `reshape(1, -1)` (single sample) according to what the data represents.