“Outer Banks” Season 3 - you know

This notebook counts and visualizes how often the phrase “you know” is said in Season 3 of the TV show “Outer Banks”.

Source code: https://codeberg.org/penguinsfly/tv-mania/src/branch/main/obx

Motivation#

Why? 'Cuz when you know, you know. You know?

This is because the character Carlos Singh (newly introduced in season 3) says it way too much, almost as if it’s a gag from the writers. Sometimes he even starts and ends a sentence with it. So I just had to count this.

Unfortunately, the only data source I could find for transcripts is Springfield! Springfield! (see below), which sadly does not give the name of the character speaking each line.

But Mr. Singh usually says it with a pause, which shows up in the transcript as commas around the phrase. That makes things a bit easier but not perfect, so the counts and visualizations below are only for fun.
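
To make the comma heuristic concrete, here is a minimal sketch of how a single transcript line could be flagged. The function name `classify_you_know` and the three regular expressions are assumptions for illustration, not the notebook's actual `you_know_script` helper (which is not shown on this page). Only the flag names `has_start`, `has_middle` and `has_end` are taken from the dataframe below.

```python
import re

# Hypothetical patterns: "you know" only counts when a comma sets it off
# from the rest of the line (the "pause" heuristic described above).
START = re.compile(r"^\s*you know\s*,", re.IGNORECASE)
MIDDLE = re.compile(r",\s*you know\s*,", re.IGNORECASE)
END = re.compile(r",\s*you know\s*[.?!]*\s*$", re.IGNORECASE)


def classify_you_know(line: str) -> dict:
    """Flag where a comma-delimited "you know" appears in one line."""
    return {
        "has_start": bool(START.search(line)),    # "You know, ..."
        "has_middle": bool(MIDDLE.search(line)),  # "..., you know, ..."
        "has_end": bool(END.search(line)),        # "..., you know."
    }


if __name__ == "__main__":
    for text in [
        "You know, just some castaways.",
        "and he did it here, you know,",
        "You know, I never doubted you, you know.",
    ]:
        print(text, classify_you_know(text))
```

A line ending in ", you know," (trailing comma, sentence continues on the next line) is treated as a middle hit here, which is consistent with the sample rows shown further down, but the real rules may differ.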

Obtain data#

The data file outer-banks-2020_scripts.json was obtained with sf2, which downloaded the scripts from Springfield! Springfield!.

sf2 --show "outer-banks-2020" --season 3 --format json

Import modules#

import pandas as pd

Load and count#

df = pd.read_json('outer-banks-2020_scripts.json')
df = pd.concat(df.apply(you_know_script, axis=1).to_list(), ignore_index=True)\
        .sort_values(['episode','line_index']).reset_index(drop=True)
df

	has_start	has_middle	has_end	line	line_index	episode
0	True	False	False	You know, just some castaways.	325	1
1	True	False	False	You know, the usual.	379	1
2	False	False	True	would cause sparks, you know.	1005	1
3	False	False	True	we share certain interests, you know.	1026	1
4	False	False	True	people tried to find that gold, you know.	1047	1
...	...	...	...	...	...	...
79	False	True	False	And you too can live, you know, John,	728	10
80	False	True	False	and he did it here, you know,	743	10
81	False	False	True	and I'm gonna find you, you know.	1145	10
82	False	False	True	for much longer, you know.	1241	10
83	False	False	True	to keep you alive, you know.	1268	10

84 rows × 6 columns

df.filter(regex='has.*').sum(axis=0)

has_start     31
has_middle    16
has_end       38
dtype: int64

df.filter(regex='has.*').sum(axis=1).value_counts()

1    83
2     1
dtype: int64

Clean counts#

Some sentences are split across consecutive transcript lines and should really be counted as one, so the following attempts to clean and merge some of them.

df = df.groupby('episode').apply(merge_line)
mdf = df.groupby(['episode', 'merged_line']).agg(list).reset_index()
for k in ['has_start','has_middle','has_end','merged']:
    mdf[k] = mdf[k].apply(any)  
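
The `merge_line` helper used above is not shown on this page. As a rough sketch of the idea, the plain-Python function below groups line indices that sit next to each other and labels each group with the mean of its indices. The name `merge_adjacent`, the `max_gap` parameter, and the "consecutive indices belong together" rule are all assumptions; the only thing taken from the notebook output is that a merged pair such as lines 183 and 184 ends up with `merged_line` 183.5.

```python
from statistics import mean


def merge_adjacent(line_indices, max_gap=1):
    """Group line indices whose neighbours are at most `max_gap` apart."""
    groups = []
    for idx in sorted(line_indices):
        if groups and idx - groups[-1][-1] <= max_gap:
            groups[-1].append(idx)  # continues the previous sentence
        else:
            groups.append([idx])    # starts a new sentence
    return [
        {
            "merged_line": float(mean(g)),  # e.g. [183, 184] -> 183.5
            "line_index": g,
            "merged": len(g) > 1,           # True only for real merges
        }
        for g in groups
    ]


if __name__ == "__main__":
    for row in merge_adjacent([183, 184, 741]):
        print(row)
```

Using the mean as the group key keeps unmerged lines at their original index (741 stays 741.0) while giving merged lines a key that cannot collide with an integer index.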

mdf.filter(regex='has.*').sum(axis=0)

has_start     31
has_middle    16
has_end       38
dtype: int64

mdf.filter(regex='has.*').sum(axis=1).value_counts()

1    77
2     4
dtype: int64

mdf['has_min_twice'] = mdf.filter(regex='has.*').sum(axis=1) > 1
mdf.query('has_min_twice == True')

	episode	merged_line	has_start	has_middle	has_end	line	line_index	merged	has_min_twice
12	2	183.5	True	True	False	[You know, I built, this fortune myself, you k...	[183, 184]	True	True
19	2	741.0	True	False	True	[You know, I never doubted you, you know.]	[741]	False	True
26	3	517.5	True	False	True	[You know, I can always tell, when people are ...	[517, 518]	True	True
27	3	578.5	True	False	True	[You know, 'cause I know where they live., And...	[578, 579]	True	True

mdf.loc[mdf['has_min_twice'], ['has_start','has_middle','has_end']] = False # avoid double counting

mdf.filter(regex='has.*').sum(axis=0)

has_start        27
has_middle       15
has_end          35
has_min_twice     4
dtype: int64

mdf.filter(regex='has.*').sum(axis=1).value_counts()

1    81
dtype: int64
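
The effect of the double-counting fix above is that every sentence lands in exactly one category. A small plain-Python sketch of that rule follows. The function `exclusive_category` is hypothetical, and the labels are borrowed from the column names of the final dataframe below.

```python
from collections import Counter


def exclusive_category(has_start, has_middle, has_end):
    """Map the three flags of one sentence to a single category label."""
    hits = int(has_start) + int(has_middle) + int(has_end)
    if hits == 0:
        return None               # no comma-delimited "you know" found
    if hits > 1:
        return "at least twice"   # overrides the positional categories
    if has_start:
        return "only start"
    if has_middle:
        return "only middle"
    return "only end"


if __name__ == "__main__":
    flags = [
        (True, True, False),   # start and middle -> counted once
        (True, False, True),   # start and end -> counted once
        (True, False, False),
        (False, False, True),
    ]
    print(Counter(exclusive_category(*f) for f in flags))
```

Because each sentence maps to one label, the category totals add up to the number of sentences, which is what the all-ones `value_counts()` output above confirms.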

Final dataframe for visualization#

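The counts in this data frame are cumulative from left to right within each episode (each column includes the ones before it), which makes it easy to draw stacked bars. The last column is therefore the episode total.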

	episode	only start	only middle	only end	at least twice
0	1	2	2	9	9
1	2	3	4	9	11
2	3	5	6	9	11
3	4	0	2	3	3
4	5	4	4	4	4
5	6	3	5	9	9
6	7	2	3	5	5
7	8	2	7	11	11
8	9	5	6	10	10
9	10	1	3	8	8
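
For readers who want the per-category counts back, the cumulative layout can be built and undone with a running sum. The sketch below is plain Python rather than the notebook's pandas code, and the helper names `to_cumulative` and `from_cumulative` are made up for illustration. The example row is episode 2 from the table above.

```python
from itertools import accumulate

# Column order as it appears in the table above.
ORDER = ["only start", "only middle", "only end", "at least twice"]


def to_cumulative(counts):
    """Running sum over the categories, in column order."""
    return dict(zip(ORDER, accumulate(counts[k] for k in ORDER)))


def from_cumulative(cum):
    """Recover per-category counts by differencing neighbouring columns."""
    values = [cum[k] for k in ORDER]
    diffs = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    return dict(zip(ORDER, diffs))


if __name__ == "__main__":
    episode_2 = {"only start": 3, "only middle": 4,
                 "only end": 9, "at least twice": 11}
    print(from_cumulative(episode_2))
```

One common way to plot such a table is to draw the columns from last to first as overlapping bars, so each shorter bar covers the bottom of the taller one behind it. Whether the figure below was drawn that way is an assumption.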

total_cnt = {
    rename_dict[k.replace('has_', '')]: v
    for k, v in total_cnt.items()
}
total = len(mdf)

Visualization#

[Figure: stacked bars of “you know” counts per episode of Season 3]
