“Chuck” number of flashes#

This notebook visualizes the number of flashes in the TV show Chuck, based on Reddit comments and TV transcripts.

Source code: https://codeberg.org/penguinsfly/tv-mania/src/branch/main/chuck

Motivation#

Spoilers ahead, by the way!

While watching the show, back when Reddit was in the middle of its API BS, I browsed Reddit for the rewatch episode discussion posts and noticed that someone would occasionally count the number of flashes in the comment section, sometimes along with other information.

So I wondered whether I could visualize the information that these individuals had already kindly summarized. However, this was during the whole Reddit API policy change BS, and I could no longer use the Pushshift API; hence I resorted to Libreddit to scrape the relevant posts and comments. One limitation is that only the first two of the 13 episodes in the last season had these types of comments. Not sure what happened.

Anyway, I then also used the transcripts on Springfield! Springfield! to see whether the Reddit counts correlate with counts of words said in the scripts. Spoiler alert: there wasn’t much correlation.

Obtain data#

To get all the data and the processed data for this notebook, please run:

make scrape # scrape both tv scripts and libreddit
make process # process the data to prepare for visualization

Note on visualization tool#

This is my first time using apexcharts to make visualizations. Since I’m a novice, I’m not using it very efficiently, so the visualizations may take some time to load.

I am also using it “inline” inside this notebook, rendered with jupyter-book, which might affect performance as well. I’m not sure, but I want to make it clear up front that this notebook is not very optimized.

Import packages & load data#

import os
import re

import numpy as np
import pandas as pd

from IPython.display import display, HTML
df_flashes = pd.read_csv('data/proc_libreddit.csv')
df_script = pd.read_csv('data/proc_scripts.csv')

Visualize number of flashes throughout the entire show#

Process the data and prepare for apexcharts#

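# Label each episode as e.g. 'S1-E01' so that the labels sort in airing order
df_flashes['season_episode'] = df_flashes.apply(lambda x: 'S%d-E%02d' %(x['season'], x['episode']), axis=1)
# Append the flash type to the flasher name whenever a type is given
df_flashes['flasher'] = df_flashes.apply(
    lambda x: x['flasher'] if pd.isna(x['type'])
        else f"{x['flasher']} [{x['type']}]",
    axis=1
)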
coord_df = df_flashes.filter(['season_episode'])\
    .rename(columns={'season_episode': 'x'})\
    .sort_values(by='x').drop_duplicates('x')\
    .reset_index(drop=True)
cat_js = str(coord_df['x'].to_list())
def create_data(x):
    # Give every series one entry per episode, filling missing counts with 0
    x = coord_df.merge(x.filter(['x', 'y']), on='x', how='outer')\
            .sort_values(by='x')\
            .fillna({'y':0}).astype({'y': 'int'})
    # Sanity check: the merged frame must not contain fully duplicated rows
    assert not x.duplicated().any(), x[x.duplicated()]
    return x.to_dict('records')

# Build one {name, data} record per flasher for the apexcharts series
data_js = str(df_flashes.rename(columns={'season_episode': 'x', 'count': 'y'})\
    .fillna({'flasher': 'chuck'})\
    .groupby('flasher')\
    .apply(create_data).to_frame('data').reset_index()\
    .rename(columns={'flasher': 'name'})\
    .to_dict('records'))
groups_js = df_flashes.drop_duplicates('season_episode').value_counts('season', sort=False)\
    .to_frame('cols').reset_index()\
    .rename(columns={'season': 'title'})
groups_js['title'] = groups_js['title'].apply(lambda x: f'Season {x}') 
groups_js = str(groups_js.to_dict('records'))
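To make the zero-filling step in create_data a bit more concrete, here is a minimal sketch on made-up data. The episode labels, the counts, and the helper name fill_missing_episodes are all hypothetical and not taken from the scraped files. It only illustrates how an outer merge against the full episode list gives a series one entry per episode, with 0 where nobody reported a count.

```python
import pandas as pd

# Hypothetical episode axis and one flasher's sparse counts (toy data).
toy_coord = pd.DataFrame({'x': ['S1-E01', 'S1-E02', 'S1-E03']})
toy_counts = pd.DataFrame({'x': ['S1-E02'], 'y': [4]})

def fill_missing_episodes(counts, coord):
    """Return one {'x', 'y'} record per episode, with 0 where no count exists."""
    filled = (
        coord.merge(counts[['x', 'y']], on='x', how='outer')
        .sort_values(by='x')
        .fillna({'y': 0})
        .astype({'y': 'int'})
    )
    return filled.to_dict('records')

toy_records = fill_missing_episodes(toy_counts, toy_coord)
print(toy_records)
```

With every series padded to the same episode list, the stacked bars should line up on the shared category axis.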

Plot#

display(HTML('''
<script src="https://cdn.jsdelivr.net/npm/apexcharts"></script>
<div id="chart-episode" style="min-height: 365px;"></div>
<script>
var options = {
  series: %s,
  chart: {
  type: 'bar',
  height: 350,
  stacked: true,
  toolbar: {
    show: true
  },
  zoom: {
    enabled: false
  }
},
colors: [
'#d9d9d9',
'#969696',
'#252525',
'#1b9e77',
'#d95f02',
'#7570b3',
'#e7298a',
'#66a61e',
],
plotOptions: {
  bar: {
    horizontal: false,
    dataLabels: {
      total: {
        enabled: true,
        style: {
          fontSize: '10px',
          fontWeight: 100
        },
        formatter: function(val) {
            if (val > 0) {return val} else {return ""}
        },
      }
    }
  },
},
title: {
  text: '# of Flashes/Zooms from Chuck using Reddit re-watch comments',
  align: 'center',
  margin: 10,
  style: {
    fontSize: '18px',
  },
},
annotations: {
  xaxis: [
    {
      x: 'S5-E03',
      x2: 'S5-E13',
      fillColor: '#ccccff',
      label: {
        text: 'no more count comments',
        orientation: 'horizontal',        
      },
    }
  ]
},
xaxis: {
  type: 'category',
  categories: %s,
  labels: {
    formatter: function(val) {
        return val.split("-")[1].replace("E0", "").replace("E", "")
    },
    rotate: -90
  },
  group: {
    style: {
      fontSize: '15px',
      fontWeight: 700
    },
    groups: %s
  }
},
legend: {
  position: 'top',
  horizontalAlign: 'right',
},
fill: {
  opacity: 1
}
};

var chart_ep = new ApexCharts(document.querySelector("#chart-episode"), options);
chart_ep.render();
</script>
''' %(
    data_js, 
    cat_js,
    groups_js
)))
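The plot cell above fills its %s placeholders with Python’s str() of lists and dicts. That works as long as the values are plain strings and integers, but Python-only literals such as None, True or nan are not valid JavaScript. One possible alternative, sketched here on made-up data (toy_series is hypothetical), is json.dumps from the standard library, whose output is also valid JavaScript.

```python
import json

# Hypothetical series in the same shape as above: one dict per flasher (toy data).
toy_series = [
    {'name': 'chuck', 'data': [{'x': 'S1-E01', 'y': 3}, {'x': 'S1-E02', 'y': 0}]},
    {'name': 'someone else', 'data': [{'x': 'S1-E01', 'y': None}]},
]

as_repr = str(toy_series)         # Python literal syntax, contains None
as_json = json.dumps(toy_series)  # JSON syntax, contains null

print(as_repr)
print(as_json)
```

This is only a sketch of the serialization step; the rest of the cell would stay the same.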

Compare the numbers obtained from Reddit and from counting words in TV scripts#

Process data and prepare for apexcharts#

jitter = 0.3

df = df_script.merge(
    df_flashes.dropna(subset='count', axis=0)\
        .groupby(['season', 'episode'])\
        ['count'].agg('sum').reset_index(),
    how='outer'
)
df['script'] = df.filter(regex='script_.*').sum(axis=1)
df = df.sort_values(['season', 'episode']).dropna(how='any').reset_index(drop=True)
df['season'] = 'Season ' + df['season'].astype(str)
df['reddit'] = df['count'].astype(int)

data_js = str(
    df.filter(['season', 'script', 'reddit'])\
    .groupby('season')\
    .apply(
        lambda x: x.apply(lambda y: [
            round(y['reddit'] + np.random.uniform(-jitter,jitter),3),
            round(y['script'] + np.random.uniform(-jitter,jitter),3),
        ], axis=1).to_list()
    ).to_frame('data')\
    .reset_index()\
    .rename(columns={'season': 'name'})\
    .to_dict('records')
)

groups_js = str(list(df['season'].unique()))
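The jitter above draws from NumPy’s global random state, so the points land in slightly different places on every run. If reproducible plots matter, a seeded generator is one option. The following is a minimal sketch with made-up counts; the seed value and the toy arrays are arbitrary assumptions.

```python
import numpy as np

jitter = 0.3
rng = np.random.default_rng(seed=0)  # the seed value is an arbitrary choice

# Hypothetical integer counts for a few episodes (toy data).
reddit = np.array([3, 3, 5, 0])
script = np.array([2, 2, 6, 1])

# Add independent uniform noise in [-jitter, jitter) to each coordinate,
# so that episodes with identical counts do not overlap exactly.
points = np.column_stack([
    reddit + rng.uniform(-jitter, jitter, size=reddit.size),
    script + rng.uniform(-jitter, jitter, size=script.size),
]).round(3).tolist()

print(points)
```

Re-creating the generator with the same seed gives the same offsets, so the scatter plot would look identical between runs.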

Plot#

display(HTML('''
<div id="chart-compare" style="min-height: 365px;"></div>
<script>
var options = {
  series: %s,
chart: {
  type: 'scatter',
  height: 500,
  width: 500,
  toolbar: {
    show: true
  },
  zoom: {
    enabled: true,
    type: 'xy'
  }
},
title: {
  text: 'Counts from Reddit vs. from TV scripts',
  align: 'center',
  margin: 10,
  style: {
    fontSize:  '18px',
  },
},
legend: {
  position: 'top',
  horizontalAlign: 'right',
},
xaxis: {
  title: {
    text: 'From Reddit comments',
    style: {
        fontSize: '15px',
        fontWeight: 100,
    },
  },
  min: -1,
  max: 15,
  tickAmount: 4,
},
yaxis: {
  title: {
    text: 'From TV scripts',
    style: {
        fontSize: '15px',
        fontWeight: 100,
    },
  },
  min: -1,
  max: 15,
  tickAmount: 4,
},
fill: {
  opacity: 0.5
}
};

var chart_cmp = new ApexCharts(document.querySelector("#chart-compare"), options);
chart_cmp.render();
</script>
''' %(
    data_js
)))
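To put a number on the visual impression from the scatter plot, one option is a correlation coefficient. The sketch below uses made-up numbers, not the real results; only the column names reddit and script mirror the frame built above. It computes the Pearson correlation and a Spearman-style rank correlation (Pearson on ranks), which may suit small integer counts better.

```python
import pandas as pd

# Hypothetical per-episode counts (toy data, not the real results).
toy = pd.DataFrame({
    'reddit': [1, 2, 3, 4, 5],
    'script': [2, 1, 4, 3, 5],
})

# Pearson correlation measures linear association between the two counts.
pearson = toy['reddit'].corr(toy['script'])

# Spearman correlation is the Pearson correlation computed on ranks.
spearman = toy['reddit'].rank().corr(toy['script'].rank())

print(round(pearson, 3), round(spearman, 3))
```

On the real data, the same two lines could be applied to df['reddit'] and df['script'] before the season label and jitter are added.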