MEDIAN

11/13/16

The MEDIAN function returns the median of a range of values.

In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('sample_data.csv')
In [3]:
df
Out[3]:
Name Date Item Product Category Time (hrs) Amt Picked (lbs) Sale Value ($/lb)
0 Lucas 1/5/16 Green Apples Apples 2 5 2.5
1 Valter 1/5/16 European Pear Pears 4 10 2.0
2 Erik 1/8/16 Red Apples Apples 2 3 2.1
3 Georg 2/1/16 Asian Pear Pears 8 15 3.8
4 Lucas 2/10/16 Red Apples Apples 6 6 3.6

Median of column values

Median of all columns


Will automatically exclude values for string-typed columns

In [4]:
df.median()
Out[4]:
Time (hrs)           4.0
Amt Picked (lbs)     6.0
Sale Value ($/lb)    2.5
dtype: float64

Median of multiple columns (opt-in method)

In [5]:
columns = ['Time (hrs)', 'Amt Picked (lbs)']
df[columns].median()
Out[5]:
Time (hrs)          4
Amt Picked (lbs)    6
dtype: float64

Median of multiple columns (opt-out method)

In [6]:
columns = list(df)
columns
Out[6]:
['Name',
 'Date',
 'Item',
 'Product Category',
 'Time (hrs)',
 'Amt Picked (lbs)',
 'Sale Value ($/lb)']
In [8]:
removelist = ['Name', 'Date', 'Item', 'Product Category', 'Sale Value ($/lb)']
newlist = [value for value in columns if value not in removelist]
df[newlist].median()
Out[8]:
Time (hrs)          4
Amt Picked (lbs)    6
dtype: float64

Median of one specific column

All rows

In [9]:
df['Amt Picked (lbs)'].median()
Out[9]:
6.0

Range of rows

In [10]:
df.ix[0:2]['Amt Picked (lbs)'].median()
Out[10]:
5.0

Median for each individual person


NB: reset_index( ) is optional and used here to preserve the spreadsheet formatting

Multiple columns

In [31]:
columns = ['Time (hrs)','Amt Picked (lbs)']
df.groupby('Name')[columns].median().reset_index()
Out[31]:
Name Time (hrs) Amt Picked (lbs)
0 Erik 2 3.0
1 Georg 8 15.0
2 Lucas 4 5.5
3 Valter 4 10.0

Single column

In [30]:
df.groupby('Name')['Time (hrs)'].median().reset_index()
Out[30]:
Name Time (hrs)
0 Erik 2
1 Georg 8
2 Lucas 4
3 Valter 4

Median of row values


Useful if the dataset has been transposed

In [11]:
df.T
Out[11]:
0 1 2 3 4
Name Lucas Valter Erik Georg Lucas
Date 1/5/16 1/5/16 1/8/16 2/1/16 2/10/16
Item Green Apples European Pear Red Apples Asian Pear Red Apples
Product Category Apples Pears Apples Pears Apples
Time (hrs) 2 4 2 8 6
Amt Picked (lbs) 5 10 3 15 6
Sale Value ($/lb) 2.5 2 2.1 3.8 3.6
In [12]:
df_alt = df.T

Median of specific rows (opt-in method)

In [14]:
rows = ['Time (hrs)', 'Amt Picked (lbs)']
In [15]:
df_alt.ix[rows].median(axis=1)
Out[15]:
Time (hrs)          4
Amt Picked (lbs)    6
dtype: float64

Median of specific rows (opt-out method)

In [24]:
rows = list(df_alt.index)
In [25]:
rows
Out[25]:
['Name',
 'Date',
 'Item',
 'Product Category',
 'Time (hrs)',
 'Amt Picked (lbs)',
 'Sale Value ($/lb)']
In [26]:
removelist = ['Name', 'Date', 'Item', 'Product Category', 'Sale Value ($/lb)']
newlist = [value for value in rows if value not in removelist]
df[newlist].median()
Out[26]:
Time (hrs)          4
Amt Picked (lbs)    6
dtype: float64