Every row gets the same second value after split

刘超 13 days ago ⋅ 230 views

I. Description

   With pandas 0.24.2, combining assign with str.split and taking the second value gives the same result for every row, as shown below:

>>> import pandas as pd
>>> data = {'four':[0,1,2], 'creative_id':["crid:a3703069995904:0801680679857923089","crid:a3703069995904:0801680680818139138","crid:a3703069995924:0801680683271370802"]}
>>> readDF=pd.DataFrame(data)
>>> aaaa = readDF.assign(ad_id=lambda x: x['creative_id'].str.split(':')[0][1])
>>> aaaa
                               creative_id  four           ad_id
0  crid:a3703069995904:0801680679857923089     0  a3703069995904
1  crid:a3703069995904:0801680680818139138     1  a3703069995904
2  crid:a3703069995924:0801680683271370802     2  a3703069995904
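
One way to see why every row ends up with the same value is to take the expression apart step by step. The sketch below assumes a recent pandas on Python 3; the intermediate names (lists, first_row, scalar) are illustrative and not part of the original session.

```python
import pandas as pd

data = {
    'four': [0, 1, 2],
    'creative_id': [
        "crid:a3703069995904:0801680679857923089",
        "crid:a3703069995904:0801680680818139138",
        "crid:a3703069995924:0801680683271370802",
    ],
}
readDF = pd.DataFrame(data)

# str.split without expand returns a Series whose elements are Python lists.
lists = readDF['creative_id'].str.split(':')

# [0] selects the row with index label 0, i.e. one list, not a column.
first_row = lists[0]

# [1] then picks the second element of that single list: a plain string.
scalar = first_row[1]

# assign broadcasts a scalar to every row, hence the repeated value.
aaaa = readDF.assign(ad_id=scalar)
print(aaaa['ad_id'].tolist())
```

So [0][1] reads as "second part of the first row", not "second part of each row".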

II. Analysis

  1. Indexing with values[row, column] (where : selects all rows, as in the call below) raises an error:

>>> aaaa = readDF.assign(ad_id=lambda x: x['creative_id'].str.split(':').values[:,0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 3556, in assign
    results[k] = com.apply_if_callable(v, data)
  File "/Library/Python/2.7/site-packages/pandas/core/common.py", line 329, in apply_if_callable
    return maybe_callable(obj, **kwargs)
  File "<stdin>", line 1, in <lambda>
IndexError: too many indices for array
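
The error most likely comes from the shape of the array: .values on the split result is a one-dimensional object array whose elements are lists, so there is no second axis for [:, 0] to index. A minimal sketch, using made-up sample strings and assuming a recent pandas on Python 3:

```python
import pandas as pd

s = pd.Series(["crid:a1:001", "crid:a2:002", "crid:a3:003"])  # made-up values

arr = s.str.split(':').values
print(arr.ndim, arr.shape, arr.dtype)  # one axis, holding list objects

raised = False
try:
    arr[:, 0]  # asks for a second axis that does not exist
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```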

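  2. Appending the index [0] is not what I want either: it takes the split parts of the first row and fills the column with them, as shown below: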

>>> aaaa = readDF.assign(ad_id=lambda x: x['creative_id'].str.split(':').values[:][0])
>>> aaaa
                               creative_id  four                ad_id
0  crid:a3703069995904:0801680679857923089     0                 crid
1  crid:a3703069995904:0801680680818139138     1       a3703069995904
2  crid:a3703069995924:0801680683271370802     2  0801680679857923089
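
Note that this only ran without error because the first row happens to split into three parts and the frame has three rows. The sketch below, with made-up two-row data, suggests what happens when the counts differ; it assumes a recent pandas on Python 3.

```python
import pandas as pd

# Made-up sample: two rows, each splitting into three parts.
df = pd.DataFrame({'creative_id': ["crid:a1:001", "crid:a2:002"]})

# values[:] is the whole array, so [0] is just the list for row 0.
first_row_parts = df['creative_id'].str.split(':').values[:][0]
print(first_row_parts)

length_error = False
try:
    # A 3-element list cannot become a column of a 2-row frame.
    df.assign(ad_id=first_row_parts)
except ValueError as exc:
    length_error = True
    print("ValueError:", exc)
```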

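  3. An approach found online is shown below. After studying it for a while, I realized that it still fills the column with the parts of a single row. What I want is to split a column, take the element at index 1 from each row, and build a new column from those elements; this approach cannot do that.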

>>> import pandas as pd
>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})
>>> df
      AB
0  A1-B1
1  A2-B2
>>> df['AB_split'] = df['AB'].str.split('-')
>>> df
      AB  AB_split
0  A1-B1  [A1, B1]
1  A2-B2  [A2, B2]
>>> df['AB_split'] = df['AB'].str.split('-')[1]
>>> df
      AB AB_split
0  A1-B1       A2
1  A2-B2       B2
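
For comparison, pandas also offers a per-row lookup through the .str accessor, which is not what the snippet found online uses. The sketch below assumes a recent pandas on Python 3; the column names B and B_get are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})

# .str[1] indexes into each row's list, instead of selecting row 1.
df['B'] = df['AB'].str.split('-').str[1]

# .str.get(1) is the method form of the same per-row lookup.
df['B_get'] = df['AB'].str.split('-').str.get(1)
print(df)
```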

  4. Using iloc[:, [0]] also raises an error:

>>> print readDF['creative_id'].str.split(':').iloc[:,[0]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "/Library/Python/2.7/site-packages/pandas/core/indexing.py", line 2143, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/Library/Python/2.7/site-packages/pandas/core/indexing.py", line 221, in _has_valid_tuple
    raise IndexingError('Too many indexers')
pandas.core.indexing.IndexingError: Too many indexers
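
"Too many indexers" fits the same picture: the split result is a Series, which has a single axis, so iloc accepts only one indexer. A two-indexer iloc becomes valid once the result is a DataFrame. A minimal sketch with made-up sample strings, assuming a recent pandas on Python 3:

```python
import pandas as pd

s = pd.Series(["crid:a1:001", "crid:a2:002"])  # made-up values

lists = s.str.split(':')               # Series: one axis only
parts = s.str.split(':', expand=True)  # DataFrame: rows and columns

print(lists.ndim, parts.ndim)

# Row indexer and column indexer are both allowed on a DataFrame.
second = parts.iloc[:, [1]]
print(second)
```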

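  5. According to the Series.str.split documentation, setting expand=True turns the split result into a DataFrame, after which [1] selects the wanted column, as shown below: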

>>> import pandas as pd
>>> data = {'four':[0,1,2], 'creative_id':["crid:a3703069995904:0801680679857923089","crid:a3703069995904:0801680680818139138","crid:a3703069995924:0801680683271370802"]}
>>> readDF=pd.DataFrame(data)
>>> aaaa = readDF.assign(ad_id=lambda x: x['creative_id'].str.split(pat=":", expand=True)[1])
>>> aaaa
                               creative_id  four           ad_id
0  crid:a3703069995904:0801680679857923089     0  a3703069995904
1  crid:a3703069995904:0801680680818139138     1  a3703069995904
2  crid:a3703069995924:0801680683271370802     2  a3703069995924

III. Solution

  Use x['creative_id'].str.split(pat=":", expand=True)[1] to get the second part of each row.
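
If more than one part is needed, the expanded DataFrame can be given column names and joined back. This is only a sketch, assuming a recent pandas on Python 3; the names prefix and suffix are hypothetical labels, since the original only names ad_id.

```python
import pandas as pd

readDF = pd.DataFrame({
    'four': [0, 1, 2],
    'creative_id': [
        "crid:a3703069995904:0801680679857923089",
        "crid:a3703069995904:0801680680818139138",
        "crid:a3703069995924:0801680683271370802",
    ],
})

# expand=True yields one column per part, labelled 0, 1, 2.
parts = readDF['creative_id'].str.split(pat=":", expand=True)

# Hypothetical names for the three parts; only ad_id is from the post.
parts.columns = ['prefix', 'ad_id', 'suffix']

# Join the wanted parts back onto the original frame by index.
result = readDF.join(parts[['ad_id', 'suffix']])
print(result)
```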


