“Chuck” number of flashes

This notebook visualizes the number of flashes in the TV show Chuck, based on Reddit comments and TV transcripts.

Source code: https://codeberg.org/penguinsfly/tv-mania/src/branch/main/chuck

Motivation

Spoiler ahead by the way!

While watching the show, back when Reddit was in the middle of its API BS, I browsed the Rewatch episode discussion posts on Reddit and noticed that someone would occasionally count the number of flashes in the comment section, sometimes with other information as well.

So I wondered whether I could visualize the information these commenters had already kindly summarized. However, this was during the whole Reddit API policy change BS, and I could no longer use the Pushshift API, so I resorted to scraping the relevant posts and comments through Libreddit. One limitation is that, for the last season, only the first two of the 13 episodes had these count comments. Not sure what happened.

Anyway, I then also used the transcripts on Springfield! Springfield! to see if there’s any correlation between the Reddit counts and the number of times the relevant words are said in the transcripts. Spoiler alert, there wasn’t much.

Obtain data

To scrape the raw data and process it for this notebook, run:

make scrape # scrape both tv scripts and libreddit
make process # process the data to prepare for visualization

Note on visualization tool

This is my first time playing with apexcharts to make visualizations. Since I’m a novice, I’m not using it very efficiently, so the visualizations may take some time to load.

I am also using it “inline” inside this notebook, rendered with jupyter-book, which might affect performance as well. I’m not sure, but I want to make it clear up front that this notebook is not very optimized.

Import packages & load data

import os
import re

import numpy as np
import pandas as pd

from IPython.display import display, HTML

import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=UserWarning)
df_flashes = pd.read_csv('data/proc_libreddit.csv')
df_script = pd.read_csv('data/proc_scripts.csv')
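
The rest of the notebook relies on a handful of columns in these two files: `season`, `episode`, `flasher`, `type` and `count` in the Reddit table, and `season`, `episode` plus one or more `script_*` columns in the transcript table. A quick sanity check along these lines can catch a stale or partial `make process` run early. This is only a sketch: the helper name `missing_columns` and the toy frame are made up for illustration.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return sorted(set(required) - set(df.columns))

# Toy stand-in for data/proc_libreddit.csv; the values are invented.
toy_flashes = pd.DataFrame({
    'season': [1], 'episode': [1], 'flasher': ['chuck'], 'count': [3],
})

# Columns the notebook reads from the Reddit table further down.
required = ['season', 'episode', 'flasher', 'type', 'count']
print(missing_columns(toy_flashes, required))
# ['type']
```

On the real data, an empty list means every expected column is present.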

Visualize number of flashes throughout the entire show

Process the data and prepare for apexcharts

df_flashes['season_episode'] = df_flashes.apply(lambda x: 'S%d-E%02d' %(x['season'], x['episode']), axis=1)
df_flashes['flasher'] = df_flashes.apply(
    lambda x: x['flasher'] if pd.isna(x['type'])
        else f"{x['flasher']} [{x['type']}]",
    axis=1
)
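
The row-wise `apply` calls above are fine at this scale. For reference, the same episode labels can also be built with vectorized string operations. The sketch below uses an invented toy frame and assumes `season` and `episode` are stored as integers.

```python
import pandas as pd

toy = pd.DataFrame({'season': [1, 1, 5], 'episode': [1, 12, 3]})

# 'S<season>-E<zero-padded episode>', matching the 'S%d-E%02d' format used above.
toy['season_episode'] = (
    'S' + toy['season'].astype(str)
    + '-E' + toy['episode'].astype(str).str.zfill(2)
)
print(toy['season_episode'].to_list())
# ['S1-E01', 'S1-E12', 'S5-E03']
```

The zero padding matters because the labels are later sorted as strings, and without it `E10` would sort before `E2`.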
coord_df = df_flashes.filter(['season_episode'])\
    .rename(columns={'season_episode': 'x'})\
    .sort_values(by='x').drop_duplicates('x')\
    .reset_index(drop=True)
cat_js = str(coord_df['x'].to_list())
def create_data(x):
    # Give this flasher one record per episode, with 0 where no count was reported
    x = coord_df.merge(x.filter(['x', 'y']), on='x', how='outer')\
            .sort_values(by='x')\
            .fillna({'y':0}).astype({'y': 'int'})
    assert not x.duplicated().any(), x[x.duplicated()]
    return x.to_dict('records')

data_js = str(df_flashes.rename(columns={'season_episode': 'x', 'count': 'y'})\
    .fillna({'flasher': 'chuck'})\
    .groupby('flasher')\
    .apply(create_data).to_frame('data').reset_index()\
    .rename(columns={'flasher': 'name'})\
    .to_dict('records'))
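
`create_data` gives every flasher an entry for every episode and fills the episodes without a reported count with 0, presumably so that the stacked series all share the same x categories. A reduced sketch of that merge-and-fill step, with made-up values:

```python
import pandas as pd

# All episode labels that should appear on the x axis (invented example).
coords = pd.DataFrame({'x': ['S1-E01', 'S1-E02', 'S1-E03']})

# Counts for one flasher, with S1-E02 missing.
counts = pd.DataFrame({'x': ['S1-E01', 'S1-E03'], 'y': [2, 1]})

filled = (
    coords.merge(counts, on='x', how='outer')  # missing episodes get NaN
    .sort_values(by='x')
    .fillna({'y': 0})                          # NaN -> 0
    .astype({'y': 'int'})                      # back to integers after the NaN detour
)
records = filled.to_dict('records')
print(records)
# [{'x': 'S1-E01', 'y': 2}, {'x': 'S1-E02', 'y': 0}, {'x': 'S1-E03', 'y': 1}]
```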
groups_js = df_flashes.drop_duplicates('season_episode').value_counts('season', sort=False)\
    .to_frame('cols').reset_index()\
    .rename(columns={'season': 'title'})
groups_js['title'] = groups_js['title'].apply(lambda x: f'Season {x}') 
groups_js = str(groups_js.to_dict('records'))
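
Passing Python objects through `str()` works here because the lists only hold plain strings and integers, whose Python repr happens to be valid JavaScript as well. Values such as `None`, `True` or `NaN` would not survive that trip. A more defensive option, which is not what this notebook does, is `json.dumps`. The series below is invented for illustration.

```python
import json

# Invented series in the same shape as data_js: [{name, data: [{x, y}]}].
series = [
    {'name': 'chuck', 'data': [{'x': 'S1-E01', 'y': 2}, {'x': 'S1-E02', 'y': None}]},
]

as_repr = str(series)         # contains the bare name None, which JavaScript does not know
as_json = json.dumps(series)  # contains null, which it does
print(as_repr)
print(as_json)
```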

Plot

display(HTML('''
<script src="https://cdn.jsdelivr.net/npm/apexcharts"></script>
<div id="chart-episode" style="min-height: 365px;"></div>
<script>
var options = {
  series: %s,
  chart: {
  type: 'bar',
  height: 350,
  stacked: true,
  toolbar: {
    show: true
  },
  zoom: {
    enabled: false
  }
},
colors: [
'#d9d9d9',
'#969696',
'#252525',
'#1b9e77',
'#d95f02',
'#7570b3',
'#e7298a',
'#66a61e',
],
plotOptions: {
  bar: {
    horizontal: false,
    dataLabels: {
      total: {
        enabled: true,
        style: {
          fontSize: '10px',
          fontWeight: 100
        },
        formatter: function(val) {
            if (val > 0) {return val} else {return ""}
        },
      }
    }
  },
},
title: {
  text: '# of Flashes/Zooms from Chuck using Reddit re-watch comments',
  align: 'center',
  margin: 10,
  style: {
    fontSize: '18px',
  },
},
annotations: {
  xaxis: [
    {
      x: 'S5-E03',
      x2: 'S5-E13',
      fillColor: '#ccccff',
      label: {
        text: 'no more count comments',
        orientation: 'horizontal',        
      },
    }
  ]
},
xaxis: {
  type: 'category',
  categories: %s,
  labels: {
    formatter: function(val) {
        return val.split("-")[1].replace("E0", "").replace("E", "")
    },
    rotate: -90
  },
  group: {
    style: {
      fontSize: '15px',
      fontWeight: 700
    },
    groups: %s
  }
},
legend: {
  position: 'top',
  horizontalAlign: 'right',
},
fill: {
  opacity: 1
}
};

var chart_ep = new ApexCharts(document.querySelector("#chart-episode"), options);
chart_ep.render();
</script>
''' %(
    data_js, 
    cat_js,
    groups_js
)))

Compare the number obtained from Reddit and from counting words in TV scripts

Process data and prepare for apexcharts

jitter = 0.3

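# Total Reddit count per episode, joined to the per-episode script counts
df = df_script.merge(
    df_flashes.dropna(subset=['count'], axis=0)\
        .groupby(['season', 'episode'])\
        ['count'].agg('sum').reset_index(),
    on=['season', 'episode'],
    how='outer'
)
# Sum all script_* columns into a single per-episode script count
df['script'] = df.filter(regex='script_.*').sum(axis=1)
# Drop episodes with any missing value (for example, no Reddit count)
df = df.sort_values(['season', 'episode']).dropna(how='any').reset_index(drop=True)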
df['season'] = 'Season ' + df['season'].astype(str)
df['reddit'] = df['count'].astype(int)

data_js = str(
    df.filter(['season', 'script', 'reddit'])\
    .groupby('season')\
    .apply(
        lambda x: x.apply(lambda y: [
            round(y['reddit'] + np.random.uniform(-jitter,jitter),3),
            round(y['script'] + np.random.uniform(-jitter,jitter),3),
        ], axis=1).to_list()
    ).to_frame('data')\
    .reset_index()\
    .rename(columns={'season': 'name'})\
    .to_dict('records')
)

groups_js = str(list(df['season'].unique()))
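
The jitter is presumably there so that episodes with identical integer counts do not sit exactly on top of each other in the scatter plot. Because `np.random.uniform` is unseeded, the points shift slightly on every run. If reproducible output matters, a seeded generator is one option. The seed value and the toy counts below are arbitrary.

```python
import numpy as np

jitter = 0.3
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Invented (reddit, script) count pairs; the first two episodes coincide.
pairs = np.array([[3, 2], [3, 2], [5, 4]])

# Shift every coordinate by a uniform offset in [-jitter, jitter).
jittered = np.round(pairs + rng.uniform(-jitter, jitter, size=pairs.shape), 3)
print(jittered.tolist())
```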

Plot

display(HTML('''
<div id="chart-compare" style="min-height: 365px;"></div>
<script>
var options = {
  series: %s,
chart: {
  type: 'scatter',
  height: 500,
  width: 500,
  toolbar: {
    show: true
  },
  zoom: {
    enabled: true,
    type: 'xy'
  }
},
title: {
  text: 'Counts from Reddit vs. from TV scripts',
  align: 'center',
  margin: 10,
  style: {
    fontSize:  '18px',
  },
},
legend: {
  position: 'top',
  horizontalAlign: 'right',
},
xaxis: {
  title: {
    text: 'From Reddit comments',
    style: {
        fontSize: '15px',
        fontWeight: 100,
    },
  },
  min: -1,
  max: 15,
  tickAmount: 4,
},
yaxis: {
  title: {
    text: 'From TV scripts',
    style: {
        fontSize: '15px',
        fontWeight: 100,
    },
  },
  min: -1,
  max: 15,
  tickAmount: 4,
},
fill: {
  opacity: 0.5
}
};

var chart_cmp = new ApexCharts(document.querySelector("#chart-compare"), options);
chart_cmp.render();
</script>
''' %(
    data_js
)))
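
To put a number on the "not much correlation" impression, the two count columns can be compared directly with `Series.corr`. The sketch below uses invented counts rather than the real `df`. With the notebook's data, the equivalent call would be `df['reddit'].corr(df['script'])`.

```python
import pandas as pd

# Invented counts, standing in for the 'reddit' and 'script' columns of df.
toy = pd.DataFrame({
    'reddit': [3, 5, 2, 7, 4],
    'script': [1, 0, 2, 1, 3],
})

# Pearson: linear association between the raw counts.
pearson = toy['reddit'].corr(toy['script'])

# Spearman: Pearson on the ranks, less sensitive to a few large counts.
spearman = toy['reddit'].rank().corr(toy['script'].rank())

print(f'Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}')
```

Both values lie between -1 and 1, and values near 0 would match what the scatter plot suggests.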