D08 Pandas DataFrame: A Collection of Insert, Join, Modify, and Delete Methods

title: D08|Pandas DataFrame Insert, Join, Modify, Delete

author: Adolph Lee

categories: Data Mining Fundamentals

tags:

Python

Data Mining Fundamentals

Pandas

DataFrame

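Insert

insert: inserting a column

insert(self, loc, column, value, allow_duplicates=False)

loc: the integer position at which the new column is inserted

column: the name of the column to insert

value: the values to insert; a scalar (such as an integer), a Series, or an array-like whose length matches the DataFrame

allow_duplicates: whether duplicate column names are allowed; defaults to False, in which case inserting a name that already exists raises an exception

Returns None; the original df is modified in place.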
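
To make the in-place behaviour concrete, here is a minimal sketch of `insert`. The DataFrame contents and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# Insert a new column "age" at position 1, between "name" and "score".
result = df.insert(1, "age", [30, 25])

print(result)            # insert() returns None
print(list(df.columns))  # ['name', 'age', 'score']

# Re-using an existing name raises ValueError unless allow_duplicates=True.
rejected = False
try:
    df.insert(0, "age", 0)
except ValueError as exc:
    rejected = True
    print("duplicate column rejected:", exc)
```

Because the method returns None, writing `df = df.insert(...)` would replace the DataFrame with None, which is a common mistake.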

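## append: inserting rows

append(self, other, ignore_index=False, verify_integrity=False, sort=None)

other: the data to append; a DataFrame, a Series, or a similarly structured array or list

ignore_index: whether to discard the original row labels and generate a new RangeIndex; defaults to False (the labels are kept)

verify_integrity: checks whether the row index contains duplicates and raises an exception if it does

sort: whether to sort the columns of the new df when the two sets of columns are not aligned; set it explicitly to False when calling

Returns a new DataFrame.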
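
The sketch below shows the same row-appending behaviour. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so this example uses `pd.concat` instead, which accepts the same `ignore_index`, `verify_integrity`, and `sort` options. The data is made up for illustration.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
new_rows = pd.DataFrame({"name": ["Cid"], "score": [70]})

# Default: the original row labels are kept, so label 0 appears twice.
kept = pd.concat([df, new_rows])

# ignore_index=True discards the old labels and builds a fresh RangeIndex.
renumbered = pd.concat([df, new_rows], ignore_index=True)

# verify_integrity=True raises ValueError when row labels would be duplicated.
rejected = False
try:
    pd.concat([df, new_rows], verify_integrity=True)
except ValueError as exc:
    rejected = True
    print("duplicate index rejected:", exc)

print(list(kept.index))        # [0, 1, 0]
print(list(renumbered.index))  # [0, 1, 2]
print(len(df))                 # 2, the original DataFrame is unchanged
```

On pandas versions older than 2.0, `df.append(new_rows, ignore_index=True)` should give the same result as the `renumbered` line.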

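Join

join

join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Joins the columns of another DataFrame or Series to this one, matching on the row index or on a specified column, and returns a new DataFrame. It is similar to a SQL join: a column or the row index of one DataFrame is matched against the row index of another DataFrame.

other: the data to join; a DataFrame or a Series

on: the column of the calling DataFrame to join on; by default the row index is used

how: the join type, one of 'left', 'right', 'outer', or 'inner'; defaults to 'left'

lsuffix: suffix added to overlapping column names from the left object, to avoid name clashes

rsuffix: suffix added to overlapping column names from the right object, to avoid name clashes

sort: whether to sort the result by the join key; defaults to False

Returns a new DataFrame.

This section focuses on the left join; try the other join types yourself.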
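
As a minimal sketch of a left join, the example below matches a column of one DataFrame against the row index of another, and then shows why `lsuffix` and `rsuffix` are needed when column names overlap. All table and column names here are illustrative.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
orders = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10, 20, 30]})
users = pd.DataFrame({"name": ["Ann", "Bob"]}, index=[1, 2])

# Left join: orders["user_id"] is matched against the row index of users.
joined = orders.join(users, on="user_id", how="left")
print(joined)
# user_id 3 has no match in users, so its "name" is NaN.

# Both frames have a column "v", so suffixes are required to tell them apart.
left = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"v": [3, 4]}, index=["a", "c"])
both = left.join(right, lsuffix="_l", rsuffix="_r")
print(list(both.columns))  # ['v_l', 'v_r']
print(list(both.index))    # ['a', 'b'], only left-side rows are kept
```

Calling `left.join(right)` without suffixes raises a ValueError here because the column names overlap.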

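merge

merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

This method takes many parameters, but don't let that put you off. It works much like join; its advantage is that it can join on arbitrary columns without first converting them into an index, and it also covers everything join can do.

right: the DataFrame to merge with

how: the join type, one of 'left', 'right', 'outer', or 'inner'; defaults to 'inner'

on: the name of the key column; when the key column has the same name in both DataFrames, you can pass on and omit left_on and right_on

left_on: the key column in the left DataFrame

right_on: the key column in the right DataFrame

left_index: use the row index of the left DataFrame as the join key; defaults to False

right_index: use the row index of the right DataFrame as the join key; defaults to False

sort: whether to sort the result by the join keys; defaults to False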

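suffixes: suffixes appended to overlapping column names from the two DataFrames, to avoid name clashes; defaults to ('_x', '_y')

copy: whether to copy the data; defaults to True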

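indicator: defaults to False; if True, a column named _merge is added that records whether each row came from the left side only, the right side only, or both

validate: checks whether the merge keys satisfy the given relationship; one of {one_to_one, one_to_many, many_to_one, many_to_many}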
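
The following sketch exercises the parameters that most distinguish `merge` from `join`: differently named key columns, `indicator`, and `validate`. The data and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
orders = pd.DataFrame({"uid": [1, 2, 3], "amount": [10, 20, 30]})
users = pd.DataFrame({"id": [1, 2, 4], "name": ["Ann", "Bob", "Dee"]})

# Inner join (the default) on key columns that have different names.
inner = orders.merge(users, left_on="uid", right_on="id")

# Outer join; indicator=True adds a "_merge" column describing each row's origin.
outer = orders.merge(users, left_on="uid", right_on="id",
                     how="outer", indicator=True)

# validate checks the key relationship; this passes because the keys are
# unique on both sides. A violation would raise pandas.errors.MergeError.
checked = orders.merge(users, left_on="uid", right_on="id",
                       validate="one_to_one")

print(len(inner))  # 2
print(len(outer))  # 4
print(sorted(outer["_merge"].astype(str).unique()))
# ['both', 'left_only', 'right_only']
```

The `_merge` column is handy for auditing a join, for example to find rows that exist on only one side.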

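Modify

update

update(self, other, join='left', overwrite=True, filter_func=None, errors='ignore')

Replaces existing values in place using another object: rows are aligned on the row index and columns on the column labels, and matching cells are replaced with the non-NaN values from other.

other: the replacement data; a DataFrame or a Series

join: the join type; only 'left' is supported

overwrite: defaults to True, which replaces every value that can be matched; if False, only NaN values in the original are replaced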

errors: 'raise' or 'ignore'; with 'raise', an exception is raised when a matched position holds a non-NaN value in both the original and other
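
The difference between the two `overwrite` modes is easiest to see side by side. This is a minimal sketch with made-up data; `update` modifies the DataFrame in place, so each variant works on a copy.

```python
import numpy as np
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, 30.0]})
other = pd.DataFrame({"a": [100.0, 200.0, np.nan]})

# overwrite=True (default): every non-NaN value in other replaces the original.
full = df.copy()
full.update(other)
print(full["a"].tolist())  # [100.0, 200.0, 3.0]

# overwrite=False: only positions that are NaN in the original are filled.
fill_only = df.copy()
fill_only.update(other, overwrite=False)
print(fill_only["a"].tolist())  # [1.0, 200.0, 3.0]

# errors="raise": row 0 of column "a" is non-NaN on both sides, so this raises.
overlap_detected = False
try:
    df.copy().update(other, errors="raise")
except ValueError as exc:
    overlap_detected = True
    print("overlap rejected:", exc)
```

Column "b" is untouched in both cases because other has no column with that label.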

at, iat, loc, iloc
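
The source lists these four accessors without further detail. As a rough sketch of how each one can be used to modify values (with made-up data): `at` and `iat` address a single cell, by label and by integer position respectively, while `loc` and `iloc` select by label or boolean mask and by position.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "score": [90, 85, 70]},
                  index=["a", "b", "c"])

df.at["a", "score"] = 95                 # single cell, by row and column label
df.iat[1, 1] = 80                        # single cell, by integer position
df.loc[df["score"] < 75, "score"] = 75   # rows chosen by a boolean mask
df.iloc[0, 0] = "Anna"                   # first row, first column, by position

print(df["score"].tolist())  # [95, 80, 75]
print(df["name"].tolist())   # ['Anna', 'Bob', 'Cid']
```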

Delete

pop

Removes a column and returns the removed column as a Series.
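
A minimal sketch of `pop`, using made-up data:

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# pop removes the column from df and hands it back as a Series.
scores = df.pop("score")

print(type(scores).__name__)  # Series
print(scores.tolist())        # [90, 85]
print(list(df.columns))       # ['name']
```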

drop

drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Removes rows or columns by specifying their row or column labels.

labels: the row or column labels to drop

axis: axis=0 drops rows, axis=1 drops columns

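index: row labels to drop

columns: column labels to drop

level: the index level to use when the DataFrame has a MultiIndex

inplace: if True, operates on the original DataFrame and returns None; if False, returns a new DataFrame

errors: 'raise' or 'ignore'; whether to raise an exception when a given label is not present in the DataFrame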
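
The sketch below walks through the main ways of calling `drop`, with made-up data. It also illustrates the axis convention: `axis=0` refers to row labels and `axis=1` to column labels.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
                  index=["x", "y", "z"])

rows_dropped = df.drop("x")                 # axis=0 (default): drop a row
cols_dropped = df.drop("b", axis=1)         # axis=1: drop a column
both = df.drop(index=["x"], columns=["b"])  # keyword form, no axis needed

# errors="ignore" silently skips labels that do not exist.
unchanged = df.drop("missing", axis=1, errors="ignore")

# inplace=True modifies the object itself and returns None.
trimmed = df.copy()
result = trimmed.drop(columns="c", inplace=True)

print(list(rows_dropped.index))    # ['y', 'z']
print(list(cols_dropped.columns))  # ['a', 'c']
print(both.shape)                  # (2, 2)
print(result)                      # None
print(list(trimmed.columns))       # ['a', 'b']
```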

dropna, drop_duplicates

dropna: removes missing values

drop_duplicates: removes duplicate rows
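
Both methods return a new DataFrame by default. Here is a minimal sketch with made-up data; the `subset` and `keep` arguments shown at the end are standard `drop_duplicates` options that the text above does not cover.

```python
import numpy as np
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch.
df = pd.DataFrame({"a": [1.0, 1.0, np.nan, 2.0],
                   "b": ["x", "x", "y", "z"]})

# dropna removes every row that contains at least one missing value.
no_na = df.dropna()

# drop_duplicates removes rows that repeat an earlier row exactly.
unique = df.drop_duplicates()

# Compare only column "b" and keep the last occurrence of each value.
by_col = df.drop_duplicates(subset="b", keep="last")

print(list(no_na.index))   # [0, 1, 3]
print(list(unique.index))  # [0, 2, 3]
print(list(by_col.index))  # [1, 2, 3]
```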

Please credit the source when reposting.