Counting The Words A Character Said In A Movie Script

October 31, 2022 Post a Comment

I already managed to uncover the spoken words with some help. Now I'm looking for to get the text spoken by a chosen person. So I can type in MIA and get every single words she is

Solution 1:

I would ask the user for all the names in the script first. Then ask which name they want the words for. I would search the text word by word till I found the name wanted and copy the following words into a variable until I hit a name that matches someone else in the script. Now people could say the name of another character, but if you assume titles for people speaking are either all caps, or on a single line, the text should be rather easy to filter.

for word in script:
    if word == speaker and word.isupper(): # you may want to check that this is on its own line as well.
        recording = True
    elif word in character_names and word.isupper():  # you may want to check that this is on its own line as well.
        recording = False

    if recording:
        spoken_text += word + " "

Solution 2:

I will outline how you could generate a dict which can give you the number of words spoken for all speakers and one which approximates your existing implementation.

General Use

If we define a word to be any chunk of characters in a string split along ' ' (space)...

import re

speaker = '' # current speaker
words = 0 # number of words on line
word_count = {} # dict of speakers and the number of words they speak

for line in script.split('\n'):
    if re.match('^[ ]{19}[^ ]{1,}.*', line): # name of speaker
            speaker = line.split(' (')[0][19:]
    if re.match('^[ ]{6}[^ ]{1,}.*', line): # dialogue line
            words = len(line.split())
            if speaker in word_count:
                 word_count[speaker] += words
            else:
                 word_count[speaker] = words

Generates a dict with the format {'JOHN DOE':55} if John Doe says 55 words.

Example output:

>>> word_count['MIA']

13

Your Implementation

Here is a version of the above procedure that approximates your implementation.

import re

def wordsspoken(script,name):
    word_count = 0
    for line in script.split('\n'):
        if re.match('^[ ]{19}[^ ]{1,}.*', line): # name of speaker
            speaker = line.split(' (')[0][19:]
        if re.match('^[ ]{6}[^ ]{1,}.*', line): # dialogue line
            if speaker == name:
                word_count += len(line.split())
    print(word_count)

def main():
    name = input("Enter name:")
    wordsspoken(script, name)
    name1 = input("Enter another name:")
    wordsspoken(script, name1)

Solution 3:

If you want to compute your tally with only one pass over the script (which I imagine could be pretty long), you could just track which character is speaking; set things up like a little state machine:

import re
from collections import Counter, defaultdict

words_spoken = defaultdict(Counter)
currently_speaking = 'Narrator'

for line in SCRIPT.split('\n'):
    name = line.replace('(CONT\'D)', '').strip()
    if re.match('^[A-Z]+$', name):
        currently_speaking = name
    else:
        words_spoken[currently_speaking].update(line.split())

You could use a more sophisticated regex to detect when the speaker changes, but this should do the trick.

demo

Solution 4:

There are some good ideas above. The following should work just fine in Python 2.x and 3.x:

import codecs
from collections import defaultdict

speaker_words = defaultdict(str)

with codecs.open('script.txt', 'r', 'utf8') as f:
  speaker = ''
  for line in f.read().split('\n'):
    # skip empty lines
    if not line.split():
      continue

    # speakers have their names in all uppercase
    first_word = line.split()[0]
    if (len(first_word) > 1) and all([char.isupper() for char in first_word]):
      # remove the (CONT'D) from a speaker string
      speaker = line.split('(')[0].strip()

    # check if this is a dialogue line
    elif len(line) - len(line.lstrip()) == 6:
      speaker_words[speaker] += line.strip() + ' '

# get a Python-version-agnostic input
try:
  prompt = raw_input
except:
  prompt = input

speaker = prompt('Enter name: ').strip().upper()
print(speaker_words[speaker])

Example Output:

Enter name: sebastian
I know what you mean. I get breakfast five miles out of the way just to sit outside a jazz club. It was called Van Beek. The swing bands played there. Count Basie. Chick Webb. It's a samba-tapas place now. Samba-tapas. It's... Exactly. The joke's on history.

Python College