“Chuck” number of flashes

This notebook visualizes the number of flashes in the TV show “Chuck”, based on Reddit comments and TV transcripts.

Source code: https://codeberg.org/penguinsfly/tv-mania/src/branch/main/chuck

Motivation¶

Spoilers ahead, by the way!

Click to show spoilers about what a flash is

For those who have not watched or heard of the show (spoilers), a flash is when Chuck (the main character) or another character briefly sees, in their mind, a stream of data from the Intersect, a database of information or skills that was integrated into their minds.

In the last season, they also use the word “zoom” to describe it. If you’re a DC fan, you might be able to see why already lol.

While watching the show, back when Reddit was in the middle of its API BS, I browsed Reddit for the rewatch episode discussion posts, and I noticed that occasionally someone would count the number of flashes in the comment section, sometimes with other information as well.

So I wondered whether I could visualize the information these individuals had already kindly summarized. However, this was during the whole Reddit API policy change BS, and I could no longer use the Pushshift API; hence I resorted to Libreddit to scrape the relevant posts and comments. One limitation was that, for the last season, only the first two of the 13 episodes had these types of comments. Not sure what happened.

Anyway, I then also used the transcripts on Springfield! Springfield! and tried to see whether there’s any correlation between the word counts from the scripts and the counts from Reddit. Spoiler alert: there wasn’t much.

Obtain data¶

To get all the data and the processed data for this notebook, please run:

make scrape # scrape both tv scripts and libreddit
make process # process the data to prepare for visualization

Note on visualization tool¶

This is my first time playing with apexcharts to make visualizations. Since I’m a novice, I’m not using it very efficiently, so the visualizations may take some time to load.

I am also using it “inline” inside this notebook, to be rendered with jupyter-book, which might affect performance as well. I’m not sure; I just want to make it clear up front that this notebook is not very optimized.

Import package & Load data¶

import os
import re

import numpy as np
import pandas as pd

from IPython.display import display, HTML

import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=UserWarning)

Visualize number of flashes throughout the entire show¶

Process the data and prepare for `apexcharts`¶

df_flashes['season_episode'] = df_flashes.apply(lambda x: 'S%d-E%02d' %(x['season'], x['episode']), axis=1)
df_flashes['flasher'] = df_flashes.apply(
    lambda x: x['flasher'] if pd.isna(x['type'])
        else f"{x['flasher']} [{x['type']}]",
    axis=1
)

coord_df = df_flashes.filter(['season_episode'])\
    .rename(columns={'season_episode': 'x'})\
    .sort_values(by='x').drop_duplicates('x')\
    .reset_index(drop=True)
cat_js = str(coord_df['x'].to_list())

def create_data(x):
    x = coord_df.merge(x.filter(['x', 'y']), on='x', how='outer')\
            .sort_values(by='x')\
            .fillna({'y':0}).astype({'y': 'int'})
    assert sum(x.duplicated()) == 0, x[x.duplicated()]           
    return x.to_dict('records')
    
data_js = str(df_flashes.rename(columns={'season_episode': 'x', 'count': 'y'})\
    .fillna({'flasher': 'chuck'})\
    .groupby('flasher')\
    .apply(create_data).to_frame('data').reset_index()\
    .rename(columns={'flasher': 'name'})\
    .to_dict('records'))

groups_js = df_flashes.drop_duplicates('season_episode').value_counts('season', sort=False)\
    .to_frame('cols').reset_index()\
    .rename(columns={'season': 'title'})
groups_js['title'] = groups_js['title'].apply(lambda x: f'Season {x}') 
groups_js = str(groups_js.to_dict('records'))
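
To make the zero-filling step in `create_data` easier to follow, here is a minimal sketch on made-up data. The names `episodes`, `counts` and `fill_missing` are hypothetical and only used for illustration; the idea is the same outer merge against the full episode axis, so that every series has one entry per episode and missing counts become 0.

```python
import pandas as pd

# Toy data (made up): three episodes on the axis, counts for only two of them.
episodes = pd.DataFrame({"x": ["S1-E01", "S1-E02", "S1-E03"]})
counts = pd.DataFrame({"x": ["S1-E01", "S1-E03"], "y": [3, 1]})


def fill_missing(counts, episodes):
    """Align one series to the full episode axis, using 0 where no count exists."""
    merged = episodes.merge(counts[["x", "y"]], on="x", how="outer")
    # Episodes without a count come out of the merge as NaN; turn them into integer 0.
    merged = merged.sort_values(by="x").fillna({"y": 0}).astype({"y": "int"})
    return merged.to_dict("records")


records = fill_missing(counts, episodes)
print(records)
```

Without this alignment, a stacked bar chart with a category axis could place a series' bars under the wrong episode whenever that series skips an episode.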

Plot¶

display(HTML('''
<script src="https://cdn.jsdelivr.net/npm/apexcharts"></script>
<div id="chart-episode" style="min-height: 365px;"></div>
<script>
var options = {
  series: %s,
  chart: {
  type: 'bar',
  height: 350,
  stacked: true,
  toolbar: {
    show: true
  },
  zoom: {
    enabled: false
  }
},
colors: [
'#d9d9d9',
'#969696',
'#252525',
'#1b9e77',
'#d95f02',
'#7570b3',
'#e7298a',
'#66a61e',
],
plotOptions: {
  bar: {
    horizontal: false,
    dataLabels: {
      total: {
        enabled: true,
        style: {
          fontSize: '10px',
          fontWeight: 100
        },
        formatter: function(val) {
            if (val > 0) {return val} else {return ""}
        },
      }
    }
  },
},
title: {
  text: '# of Flashes/Zooms from Chuck using Reddit re-watch comments',
  align: 'center',
  margin: 10,
  style: {
    fontSize: '18px',
  },
},
annotations: {
  xaxis: [
    {
      x: 'S5-E03',
      x2: 'S5-E13',
      fillColor: '#ccccff',
      label: {
        text: 'no more count comments',
        orientation: 'horizontal',        
      },
    }
  ]
},
xaxis: {
  type: 'category',
  categories: %s,
  labels: {
    formatter: function(val) {
        return val.split("-")[1].replace("E0", "").replace("E", "")
    },
    rotate: -90
  },
  group: {
    style: {
      fontSize: '15px',
      fontWeight: 700
    },
    groups: %s
  }
},
legend: {
  position: 'top',
  horizontalAlign: 'right',
},
fill: {
  opacity: 1
}
};

var chart_ep = new ApexCharts(document.querySelector("#chart-episode"), options);
chart_ep.render();
</script>
''' %(
    data_js, 
    cat_js,
    groups_js
)))
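
A side note on how the data gets into the `<script>` block: the cell above interpolates `str(...)` of Python lists and dicts, which appears to work here because the values are only strings and integers, and their Python repr is also a valid JavaScript literal. A possibly more robust alternative, sketched below on made-up data, is `json.dumps`, which also handles values such as `None` that have no JavaScript equivalent in repr form. The `series` variable is hypothetical and not taken from the notebook's data.

```python
import json

# Made-up series in the shape used for the stacked bar chart.
series = [
    {"name": "chuck", "data": [{"x": "S1-E01", "y": 3}, {"x": "S1-E02", "y": 0}]},
]

as_repr = str(series)         # Python repr, single-quoted strings
as_json = json.dumps(series)  # strict JSON, double-quoted strings

# The two differ for values like None: repr keeps the Python name,
# while JSON emits null, which JavaScript understands.
print(str({"y": None}))
print(json.dumps({"y": None}))
```

Either string can be dropped into the `%s` placeholders; the JSON form just avoids surprises if the data ever contains missing values or booleans.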

Compare the number obtained from Reddit and from counting words in TV scripts¶

Process data and prepare for `apexcharts`¶

jitter = 0.3

df = df_script.merge(
    df_flashes.dropna(subset='count', axis=0)\
        .groupby(['season', 'episode'])\
        ['count'].agg('sum').reset_index(),
    how='outer'
)
df['script'] = df.filter(regex='script_.*').sum(axis=1)
df = df.sort_values(['season', 'episode']).dropna(how='any').reset_index(drop=True)
df['season'] = 'Season ' + df['season'].astype(str)
df['reddit'] = df['count'].astype(int)

data_js = str(
    df.filter(['season', 'script', 'reddit'])\
    .groupby('season')\
    .apply(
        lambda x: x.apply(lambda y: [
            round(y['reddit'] + np.random.uniform(-jitter,jitter),3),
            round(y['script'] + np.random.uniform(-jitter,jitter),3),
        ], axis=1).to_list()
    ).to_frame('data')\
    .reset_index()\
    .rename(columns={'season': 'name'})\
    .to_dict('records')
)

groups_js = str(list(df['season'].unique()))
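
The scatter plot below only shows the relationship visually. If a number is wanted as well, a correlation coefficient could be computed on the same two columns. The following is a rough sketch on made-up counts, not the notebook's actual result; the `toy` frame is hypothetical and stands in for the `reddit` and `script` columns of `df`.

```python
import numpy as np
import pandas as pd

# Made-up per-episode counts, only to show the calculation.
toy = pd.DataFrame({
    "reddit": [3, 5, 2, 8, 4],
    "script": [1, 2, 2, 3, 0],
})

# Pearson correlation: strength of the linear relationship.
pearson = toy["reddit"].corr(toy["script"])

# Spearman correlation, computed here as the Pearson correlation of the ranks,
# which is less sensitive to a few episodes with very large counts.
spearman = toy["reddit"].rank().corr(toy["script"].rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

The correlation should be computed on the raw counts, before the jitter is added, since the jitter is only there to keep overlapping points visible in the plot.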

Plot¶

display(HTML('''
<div id="chart-compare" style="min-height: 365px;"></div>
<script>
var options = {
  series: %s,
chart: {
  type: 'scatter',
  height: 500,
  width: 500,
  toolbar: {
    show: true
  },
  zoom: {
    enabled: true,
    type: 'xy'
  }
},
title: {
  text: 'Flash counts: Reddit comments vs. TV scripts',
  align: 'center',
  margin: 10,
  style: {
    fontSize:  '18px',
  },
},
legend: {
  position: 'top',
  horizontalAlign: 'right',
},
xaxis: {
  title: {
    text: 'From Reddit comments',
    style: {
        fontSize: '15px',
        fontWeight: 100,
    },
  },
  min: -1,
  max: 15,
  tickAmount: 4,
},
yaxis: {
  title: {
    text: 'From TV scripts',
    style: {
        fontSize: '15px',
        fontWeight: 100,
    },
  },
  min: -1,
  max: 15,
  tickAmount: 4,
},
fill: {
  opacity: 0.5
}
};

var chart_cmp = new ApexCharts(document.querySelector("#chart-compare"), options);
chart_cmp.render();
</script>
''' %(
    data_js
)))
