“Chuck” number of flashes#
This notebook visualizes the number of flashes from the TV show “Chuck” based on Reddit comments and TV transcripts.
Source code: https://codeberg.org/penguinsfly/tv-mania/src/branch/main/chuck
Motivation#
Spoiler ahead by the way!
For those who have not watched or heard of the show (spoilers), a flash is when Chuck (the main character) or other characters have a quick flash in their minds of the data streams from the Intersect, which is a database of information or skills that were integrated into their minds.
In the last season, they also use the word “zoom” to describe it. If you’re a DC fan, you might be able to see why already lol.
While watching the show, back when Reddit was in the middle of its API BS, I browsed Reddit for the rewatch episode discussion posts, and I noticed that someone would occasionally count the number of flashes in the comment section, sometimes with other information as well.
So I wondered whether I could visualize the information that these individuals had already kindly summarized. However, because of the whole API policy change, I could not use PushShiftAPI anymore; hence I resorted to Libreddit to scrape the relevant posts and comments. One limitation was that, for the last season, only the first two of the 13 episodes had these types of comments. Not sure what happened.
Anyway, I then also used the transcripts on Springfield! Springfield! and tried to see if there’s any correlation between the Reddit flash counts and the word counts from the scripts. Spoiler alert: there wasn’t much.
Obtain data#
To get all the data and the processed data for this notebook, please run:
make scrape # scrape both tv scripts and libreddit
make process # process the data to prepare for visualization
Note on visualization tool#
This is the first time I’m playing with apexcharts to make visualizations. Since I’m a novice, I’m not using it very efficiently, so the visualizations may take some time to load. I am also using it “inline” inside this notebook, rendered with jupyter-book, which might affect performance as well. Not sure; I just want to make it clear up front that this notebook is not very optimized.
Import package & Load data#
import os
import re
import numpy as np
import pandas as pd
from IPython.display import display, HTML
df_flashes = pd.read_csv('data/proc_libreddit.csv')
df_script = pd.read_csv('data/proc_scripts.csv')
Visualize number of flashes throughout the entire show#
Process the data and prepare for apexcharts#
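# Episode label such as 'S1-E01'; the zero-padded episode keeps string sorting in airing order
df_flashes['season_episode'] = df_flashes.apply(lambda x: 'S%d-E%02d' %(x['season'], x['episode']), axis=1)
# Append the flash type to the flasher's name when a type is given
df_flashes['flasher'] = df_flashes.apply(
    lambda x: x['flasher'] if pd.isna(x['type'])
        else f"{x['flasher']} [{x['type']}]",
    axis=1
)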
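The zero-padding in the episode label matters because the labels are later sorted as plain strings. Here is a minimal sketch of that effect, using toy episode numbers and a hypothetical helper `episode_label` that is not part of the notebook:

```python
def episode_label(season: int, episode: int) -> str:
    """Build a label like 'S2-E07' (hypothetical helper mirroring the format string above)."""
    return "S%d-E%02d" % (season, episode)


# Toy (season, episode) pairs, for illustration only.
pairs = [(1, 10), (1, 2), (2, 1)]

padded = sorted(episode_label(s, e) for s, e in pairs)
unpadded = sorted("S%d-E%d" % (s, e) for s, e in pairs)

print(padded)    # zero-padded labels sort in airing order
print(unpadded)  # without padding, 'S1-E10' sorts before 'S1-E2'
```

This only holds while season numbers stay single-digit, which is the case here.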
# One row per episode label, sorted; these are the x-axis categories shared by every series
coord_df = df_flashes.filter(['season_episode'])\
    .rename(columns={'season_episode': 'x'})\
    .sort_values(by='x').drop_duplicates('x')\
    .reset_index(drop=True)
cat_js = str(coord_df['x'].to_list())
def create_data(x):
    # Align one flasher's counts with every episode; episodes without a count get 0
    x = coord_df.merge(x.filter(['x', 'y']), on='x', how='outer')\
        .sort_values(by='x')\
        .fillna({'y': 0}).astype({'y': 'int'})
    assert sum(x.duplicated()) == 0, x[x.duplicated()]
    return x.to_dict('records')

# One series per flasher, as a list of {name, data} records
data_js = str(df_flashes.rename(columns={'season_episode': 'x', 'count': 'y'})\
    .fillna({'flasher': 'chuck'})\
    .groupby('flasher')\
    .apply(create_data).to_frame('data').reset_index()\
    .rename(columns={'flasher': 'name'})\
    .to_dict('records'))
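The key step in the cell above is the zero-fill: every flasher gets one record per episode, presumably so that all stacked series line up on the same categories. A rough pure-Python sketch of the same idea, with toy labels and a hypothetical helper `fill_missing` (neither is from the notebook):

```python
def fill_missing(all_labels, counts):
    """Return one {'x', 'y'} record per label, using 0 where no count exists."""
    return [
        {"x": label, "y": int(counts.get(label, 0))}
        for label in sorted(all_labels)
    ]


# Toy data: this flasher has counts for only two of the three episodes.
all_labels = ["S1-E01", "S1-E02", "S1-E03"]
counts = {"S1-E01": 4, "S1-E03": 2}

records = fill_missing(all_labels, counts)
print(records)
```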
# Number of episodes (columns) per season, used for the grouped season labels on the x-axis
groups_js = df_flashes.drop_duplicates('season_episode').value_counts('season', sort=False)\
    .to_frame('cols').reset_index()\
    .rename(columns={'season': 'title'})
groups_js['title'] = groups_js['title'].apply(lambda x: f'Season {x}')
groups_js = str(groups_js.to_dict('records'))
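The grouped x-axis needs one `{title, cols}` record per season, where `cols` is how many consecutive episode labels belong to that season. As a small sketch with toy labels (not the real data), the same records can be derived from the sorted labels alone:

```python
from itertools import groupby

# Toy episode labels in the 'S<season>-E<episode>' format used above.
labels = ["S1-E01", "S1-E02", "S2-E01"]

groups = [
    {"title": f"Season {season[1:]}", "cols": len(list(items))}
    for season, items in groupby(sorted(labels), key=lambda s: s.split("-")[0])
]
print(groups)
```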
Plot#
display(HTML('''
<script src="https://cdn.jsdelivr.net/npm/apexcharts"></script>
<div id="chart-episode" style="min-height: 365px;"></div>
<script>
var options = {
series: %s,
chart: {
type: 'bar',
height: 350,
stacked: true,
toolbar: {
show: true
},
zoom: {
enabled: false
}
},
colors: [
'#d9d9d9',
'#969696',
'#252525',
'#1b9e77',
'#d95f02',
'#7570b3',
'#e7298a',
'#66a61e',
],
plotOptions: {
bar: {
horizontal: false,
dataLabels: {
total: {
enabled: true,
style: {
fontSize: '10px',
fontWeight: 100
},
formatter: function(val) {
if (val > 0) {return val} else {return ""}
},
}
}
},
},
title: {
text: '# of Flashes/Zooms from Chuck using Reddit re-watch comments',
align: 'center',
margin: 10,
style: {
fontSize: '18px',
},
},
annotations: {
xaxis: [
{
x: 'S5-E03',
x2: 'S5-E13',
fillColor: '#ccccff',
label: {
text: 'no more count comments',
orientation: 'horizontal',
},
}
]
},
xaxis: {
type: 'category',
categories: %s,
labels: {
formatter: function(val) {
return val.split("-")[1].replace("E0", "").replace("E", "")
},
rotate: -90
},
group: {
style: {
fontSize: '15px',
fontWeight: 700
},
groups: %s
}
},
legend: {
position: 'top',
horizontalAlign: 'right',
},
fill: {
opacity: 1
}
};
var chart_ep = new ApexCharts(document.querySelector("#chart-episode"), options);
chart_ep.render();
</script>
''' %(
data_js,
cat_js,
groups_js
)))
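The cell above interpolates Python `str()` output directly into JavaScript. That works when the records hold only plain strings and integers, but values such as `None` or `True` would not parse as JavaScript. A possible alternative, which is not what the notebook does, is to serialize with `json.dumps`. The sketch below uses toy data and a shortened, made-up template:

```python
import json

# Toy series and categories, for illustration only.
series = [{"name": "chuck", "data": [{"x": "S1-E01", "y": 4}]}]
categories = ["S1-E01"]

# Shortened stand-in for the much longer chart template above.
template = "var options = {series: %s, xaxis: {categories: %s}};"
snippet = template % (json.dumps(series), json.dumps(categories))
print(snippet)

# str() keeps Python spellings, while json.dumps emits JavaScript-compatible ones.
print(str([None, True]))
print(json.dumps([None, True]))
```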
Compare the number obtained from Reddit and from counting words in TV scripts#
Process data and prepare for apexcharts#
jitter = 0.3  # half-width of the random offset added to each point so identical counts do not overlap

# Total flashes per episode from Reddit, joined with the per-episode script counts
df = df_script.merge(
    df_flashes.dropna(subset='count', axis=0)\
        .groupby(['season', 'episode'])\
        ['count'].agg('sum').reset_index(),
    how='outer'
)
# Sum all script_* columns into a single script-based count
df['script'] = df.filter(regex='script_.*').sum(axis=1)
# Keep only episodes that have both a Reddit count and script counts
df = df.sort_values(['season', 'episode']).dropna(how='any').reset_index(drop=True)
df['season'] = 'Season ' + df['season'].astype(str)
df['reddit'] = df['count'].astype(int)

# One scatter series per season; each point is [reddit, script] with jitter applied
data_js = str(
    df.filter(['season', 'script', 'reddit'])\
        .groupby('season')\
        .apply(
            lambda x: x.apply(lambda y: [
                round(y['reddit'] + np.random.uniform(-jitter,jitter),3),
                round(y['script'] + np.random.uniform(-jitter,jitter),3),
            ], axis=1).to_list()
        ).to_frame('data')\
        .reset_index()\
        .rename(columns={'season': 'name'})\
        .to_dict('records')
)
groups_js = str(list(df['season'].unique()))
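Since both counts are small integers, many episodes would land on exactly the same point, which is why each coordinate gets a small random offset. The cell above draws that offset from an unseeded generator, so the picture changes slightly on every run. A minimal sketch of the same jitter with a seeded generator, using toy points and a hypothetical helper `jitter_point` (an assumption, not the notebook's code):

```python
import random


def jitter_point(x, y, amount=0.3, rng=random):
    """Offset both coordinates by a uniform random value in [-amount, amount]."""
    return [
        round(x + rng.uniform(-amount, amount), 3),
        round(y + rng.uniform(-amount, amount), 3),
    ]


rng = random.Random(0)  # fixed seed so the jitter is reproducible
points = [(2, 2), (2, 2), (5, 3)]  # toy (reddit, script) pairs; the first two overlap
jittered = [jitter_point(x, y, rng=rng) for x, y in points]
print(jittered)
```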
Plot#
display(HTML('''
<div id="chart-compare" style="min-height: 365px;"></div>
<script>
var options = {
series: %s,
chart: {
type: 'scatter',
height: 500,
width: 500,
toolbar: {
show: true
},
zoom: {
enabled: true,
type: 'xy'
}
},
title: {
text: 'Counts between Reddit & from TV scripts',
align: 'center',
margin: 10,
style: {
fontSize: '18px',
},
},
legend: {
position: 'top',
horizontalAlign: 'right',
},
xaxis: {
title: {
text: 'From Reddit comments',
style: {
fontSize: '15px',
fontWeight: 100,
},
},
min: -1,
max: 15,
tickAmount: 4,
},
yaxis: {
title: {
text: 'From TV scripts',
style: {
fontSize: '15px',
fontWeight: 100,
},
},
min: -1,
max: 15,
tickAmount: 4,
},
fill: {
opacity: 0.5
}
};
var chart_cmp = new ApexCharts(document.querySelector("#chart-compare"), options);
chart_cmp.render();
</script>
''' %(
data_js
)))
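The scatter plot only lets you eyeball the relationship. One way to put a number on it would be a Pearson correlation coefficient between the two counts. The sketch below is an assumption about how that could be done, not something the notebook computes, and the values are toy numbers rather than the real counts:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only; these are NOT the notebook's data.
reddit_counts = [1, 2, 3, 4]
script_counts = [2, 4, 6, 8]

r = pearson_r(reddit_counts, script_counts)
print(round(r, 3))
```

With the real data, `df['reddit']` and `df['script']` from the processing cell would take the place of the toy lists.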