An analysis of the college-majors dataset published by FiveThirtyEight.

Loading the data

import pandas as pd

# Load the all-ages data and the recent-graduates data
all_ages = pd.read_csv('all-ages.csv')
recent_grads = pd.read_csv('recent-grads.csv')
recent_grads.head(5)
Rank Major_code Major Major_category Total Sample_size Men Women ShareWomen Employed ... Part_time Full_time_year_round Unemployed Unemployment_rate Median P25th P75th College_jobs Non_college_jobs Low_wage_jobs
0 1 2419 PETROLEUM ENGINEERING Engineering 2339 36 2057 282 0.120564 1976 ... 270 1207 37 0.018381 110000 95000 125000 1534 364 193
1 2 2416 MINING AND MINERAL ENGINEERING Engineering 756 7 679 77 0.101852 640 ... 170 388 85 0.117241 75000 55000 90000 350 257 50
2 3 2415 METALLURGICAL ENGINEERING Engineering 856 3 725 131 0.153037 648 ... 133 340 16 0.024096 73000 50000 105000 456 176 0
3 4 2417 NAVAL ARCHITECTURE AND MARINE ENGINEERING Engineering 1258 16 1123 135 0.107313 758 ... 150 692 40 0.050125 70000 43000 80000 529 102 0
4 5 2405 CHEMICAL ENGINEERING Engineering 32260 289 21239 11021 0.341631 25694 ... 5180 16697 1672 0.061098 65000 50000 75000 18314 4440 972

5 rows × 21 columns

Number of graduates per major category: all ages vs. recent graduates

# Build the list of major categories (roughly equivalent to departments)
major_categories = all_ages['Major_category'].value_counts().index

# Total headcount per major category
all_ages_major_categories = {}
recent_grads_major_categories = {}

# Compute the total headcount per major category for the given DataFrame
def calc_total_for_major_category(df):
    major_categories = df['Major_category'].value_counts().index
    totals = {}
    for major_category in major_categories:
        totals[major_category] = df[df['Major_category'] == major_category]['Total'].sum()
    return totals
 
# Compute the per-category totals for the all-ages data and for the recent-graduates data
all_ages_major_categories = calc_total_for_major_category(all_ages)
recent_grads_major_categories = calc_total_for_major_category(recent_grads)
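
The same per-category totals can probably be obtained with a groupby, which also makes it easy to put the two datasets side by side. The following is only a sketch on made-up toy frames, not the real CSV files; `totals_by_category`, `toy_all_ages` and `toy_recent_grads` are hypothetical names introduced for illustration.

```python
import pandas as pd


def totals_by_category(df: pd.DataFrame) -> pd.Series:
    """Sum the Total column within each Major_category."""
    return df.groupby('Major_category')['Total'].sum()


# Made-up stand-ins for all-ages.csv and recent-grads.csv (assumption)
toy_all_ages = pd.DataFrame({
    'Major_category': ['Engineering', 'Engineering', 'Arts'],
    'Total': [100, 50, 30],
})
toy_recent_grads = pd.DataFrame({
    'Major_category': ['Engineering', 'Arts', 'Arts'],
    'Total': [20, 5, 10],
})

# One row per category, one column per dataset (aligned on category name)
comparison = pd.DataFrame({
    'all_ages': totals_by_category(toy_all_ages),
    'recent_grads': totals_by_category(toy_recent_grads),
})
print(comparison)
```

With the real data, passing `all_ages` and `recent_grads` to the helper should give the same numbers as the dictionaries above, as a Series indexed by category.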

What share of degree holders are stuck in low-wage jobs?

low_wage_percent = recent_grads['Low_wage_jobs'].sum() / recent_grads['Total'].sum()
print(low_wage_percent)
0.0985254607612
=> About 10%
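
The figure above is a pooled ratio: all low-wage jobs divided by all graduates. As a rough sketch, the same ratio can be computed within each major category. The toy numbers below are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up rows with the same column names as recent-grads.csv (assumption)
toy = pd.DataFrame({
    'Major_category': ['Engineering', 'Engineering', 'Arts'],
    'Total': [100, 100, 50],
    'Low_wage_jobs': [5, 15, 10],
})

# Pooled share: every low-wage job over every graduate
overall_share = toy['Low_wage_jobs'].sum() / toy['Total'].sum()

# The same ratio, computed separately inside each category
sums = toy.groupby('Major_category')[['Low_wage_jobs', 'Total']].sum()
share_by_category = sums['Low_wage_jobs'] / sums['Total']

print(f'{overall_share:.1%}')
print(share_by_category)
```

Summing before dividing weights each major by its size; averaging per-major ratios instead would give small majors the same weight as large ones.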

Has the unemployment rate per major increased?

majors = recent_grads['Major'].value_counts().index

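# Sort both tables by major name and reset the index so the rows line up one to one;
# recent pandas versions refuse to compare Series whose index labels differ
all_ages_ordered = all_ages.sort_values('Major').reset_index(drop=True)
recent_grads_ordered = recent_grads.sort_values('Major').reset_index(drop=True)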

all_ages_are_better = all_ages_ordered['Unemployment_rate'] < recent_grads_ordered['Unemployment_rate']
recent_grads_are_better = all_ages_ordered['Unemployment_rate'] > recent_grads_ordered['Unemployment_rate']

# Count the True entries in each boolean Series
all_ages_lower_unemp_count = all_ages_are_better.sum()
recent_grads_lower_unemp_count = recent_grads_are_better.sum()

print(all_ages_lower_unemp_count)
print(recent_grads_lower_unemp_count)
128
43
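=> Recent graduates have the lower unemployment rate in only 43 majors, while the all-ages rate is lower in 128. Unemployment has gotten worse.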
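
The comparison above relies on both tables listing the same majors in the same order after sorting. A more explicit alternative is to join the two tables on a key first. The sketch below uses made-up rows and assumes that all-ages.csv also carries a `Major_code` column, which this notebook does not show; the `toy_*` names are hypothetical.

```python
import pandas as pd

# Made-up rows; the real files are assumed to share Major_code values
toy_all_ages = pd.DataFrame({
    'Major_code': [1100, 1101, 1102],
    'Major': ['A', 'B', 'C'],
    'Unemployment_rate': [0.03, 0.05, 0.04],
})
toy_recent_grads = pd.DataFrame({
    'Major_code': [1102, 1100, 1101],
    'Major': ['C', 'A', 'B'],
    'Unemployment_rate': [0.06, 0.02, 0.07],
})

# Inner join on Major_code; the suffixes tell the two rate columns apart
merged = toy_all_ages.merge(
    toy_recent_grads[['Major_code', 'Unemployment_rate']],
    on='Major_code',
    suffixes=('_all_ages', '_recent'),
)

# Count the majors where each group has the strictly lower rate
all_ages_lower = int(
    (merged['Unemployment_rate_all_ages'] < merged['Unemployment_rate_recent']).sum()
)
recent_lower = int(
    (merged['Unemployment_rate_all_ages'] > merged['Unemployment_rate_recent']).sum()
)
print(all_ages_lower, recent_lower)
```

Because the join matches rows by key, the result does not depend on row order, and majors missing from either table are dropped rather than silently misaligned.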