!python -m spacy download en_core_web_sm
Collecting en-core-web-sm==3.2.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl (13.9 MB)
|████████████████████████████████| 13.9 MB 17.1 MB/s
Collecting typing-extensions<4.0.0.0,>=3.7.4; python_version < "3.8"
Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Installing collected packages: en-core-web-sm, typing-extensions
Attempting uninstall: typing-extensions
Found existing installation: typing-extensions 4.1.1
Not uninstalling typing-extensions at /shared-libs/python3.7/py-core/lib/python3.7/site-packages, outside environment /root/venv
Can't uninstall 'typing-extensions'. No files were found to uninstall.
Successfully installed en-core-web-sm-3.2.0 typing-extensions-3.10.0.2
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
!python -m textblob.download_corpora
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data] Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data] Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data] Unzipping corpora/movie_reviews.zip.
Finished.
import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from spacy import displacy
import csv
from matplotlib import pyplot as plt
import pandas as pd
from collections import Counter
response = requests.get("https://gutenberg.org/files/345/345-0.txt")
# Stop on a failed download instead of leaving `contents` undefined.
response.raise_for_status()
response.encoding = 'utf-8'
contents = response.text
print(contents[:500])
The Project Gutenberg eBook of Dracula, by Bram Stoker
This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United States, you
will have to check the laws of the country where you are located before
using this
text = contents[contents.find('\r\n\r\n\r\n\r\n\r\nDRACULA\r\n\r\n\r\n\r\n\r\n')
: contents.find('THE END')]
chapters = text.split('CHAPTER')
print(chapters[0])
DRACULA
chapters.pop(0)
print(chapters[0][:50])
print(chapters[1][:50])
print(chapters[-1][:50])
I
JONATHAN HARKER'S JOURNAL
(_Kept in short
II
JONATHAN HARKER'S JOURNAL--_continued_
XXVII
MINA HARKER'S JOURNAL
_1 November._
chapters = [chap[chap.find('\r\n'):]
for chap in chapters]
print(chapters[0][:50])
print(chapters[1][:50])
print(chapters[-1][:50])
JONATHAN HARKER'S JOURNAL
(_Kept in shortha
JONATHAN HARKER'S JOURNAL--_continued_
_5
MINA HARKER'S JOURNAL
_1 November._--All
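Splitting on the bare string 'CHAPTER' also cuts wherever that word appears in running text, and the numeral left at the start of each piece then has to be trimmed in a second step. A minimal sketch of an alternative, assuming every heading sits on its own line as "CHAPTER" followed by a Roman numeral. The sample string and the names CHAPTER_HEADING and split_chapters are illustrative and not part of the notebook.

```python
import re

# Assumed heading shape: a line holding only "CHAPTER" and a Roman numeral.
CHAPTER_HEADING = re.compile(r'^CHAPTER [IVXLC]+\.?[ \t]*\r?$', flags=re.MULTILINE)

def split_chapters(text):
    """Split on heading lines and drop anything before the first heading."""
    parts = CHAPTER_HEADING.split(text)
    return [part.strip() for part in parts[1:]]

# Tiny stand-in for the Gutenberg text, using the same \r\n line endings.
sample = (
    "DRACULA\r\n\r\n"
    "CHAPTER I\r\n\r\nJONATHAN HARKER'S JOURNAL\r\n"
    "This CHAPTER of my life began in Munich.\r\n\r\n"
    "CHAPTER II\r\n\r\nJONATHAN HARKER'S JOURNAL--_continued_\r\n"
)
chapters_demo = split_chapters(sample)
print(len(chapters_demo))
```

Because the pattern is anchored to whole lines, the word "CHAPTER" inside a sentence stays in the chapter body, and the heading numeral is consumed by the split itself.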
text = ''.join([chap.replace('_', ' ')
.replace('\r\n', ' ')
for chap in chapters])
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
print(nlp.pipe_names)
['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'spacytextblob']
doc_book = nlp(text)
sentence_sentiments = [sentence._.polarity
for sentence
in doc_book.sents]
sentence_sentiments[:10]
sent_dic = list(zip(sentence_sentiments, list(doc_book.sents)))
sentence_sentiments = [sentiment
for sentiment
in sentence_sentiments
if sentiment != 0]
sentence_sentiments[:10]
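The next cell calls plot_sentiments_average, but its definition does not appear in this notebook. A hedged sketch of what such a helper might look like, assuming the second argument is a moving-average window measured in sentences and the third is a label used in the plot title. rolling_average is a hypothetical helper name, and the plot styling is a guess.

```python
def rolling_average(values, window):
    """Mean of each run of `window` consecutive values (no padding)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # slide the window: add the new value, drop the oldest one
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

def plot_sentiments_average(sentiments, window, label):
    """Assumed behaviour: plot the smoothed polarity curve for one text."""
    from matplotlib import pyplot as plt  # imported lazily, only needed for plotting
    smoothed = rolling_average(sentiments, window)
    plt.figure(figsize=(12, 4))
    plt.plot(smoothed)
    plt.axhline(0, color='grey', linewidth=0.5)
    plt.title(f'Average sentiment over {window} sentences in {label}')
    plt.xlabel('sentence window')
    plt.ylabel('polarity')
    plt.show()

print(rolling_average([1.0, 0.0, -1.0, 1.0], 2))
```

Keeping the averaging separate from the plotting makes the smoothing step easy to check on a short list before running it on the whole novel.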
plot_sentiments_average(sentence_sentiments, 30, 'the original Dracula novel')
book_plot_wiki = """Jonathan Harker, a newly qualified English solicitor, visits Count Dracula at his castle in the Carpathian Mountains to help the Count purchase a house near London. Ignoring the Count's warning, Harker wanders the castle and encounters three vampire women; Dracula rescues Harker, and gives the women a small child bound inside a bag. Harker awakens in bed; soon after, Dracula leaves the castle, abandoning him to the women; Harker escapes with his life and ends up delirious in a Budapest hospital. Dracula takes a ship for England with boxes of earth from his castle. The captain's log narrates the crew's disappearance until he alone remains, bound to the helm to maintain course. An animal resembling a large dog is seen leaping ashore when the ship runs aground at Whitby.
Lucy Westenra's letter to her best friend, Harker's fiancée Mina Murray, describes her marriage proposals from Dr. John Seward, Quincey Morris, and Arthur Holmwood. Lucy accepts Holmwood's, but all remain friends. Mina joins her friend Lucy on holiday in Whitby. Lucy begins sleepwalking. After his ship lands there, Dracula stalks Lucy. Mina receives a letter about her missing fiancé's illness, and goes to Budapest to nurse him. Lucy becomes very ill. Seward's old teacher, Professor Abraham Van Helsing, determines the nature of Lucy's condition, but refuses to disclose it. He diagnoses her with acute blood-loss. Van Helsing places garlic flowers around her room and makes her a necklace of them. Lucy's mother removes the garlic flowers, not knowing they repel vampires. While Seward and Van Helsing are absent, Lucy and her mother are terrified by a wolf and Mrs. Westenra dies of a heart attack; Lucy dies shortly thereafter. After her burial, newspapers report children being stalked in the night by a "bloofer lady" (beautiful lady), and Van Helsing deduces it is Lucy. The four go to her tomb and see that she is a vampire. They stake her heart, behead her, and fill her mouth with garlic. Jonathan Harker and his now-wife Mina have returned, and they join the campaign against Dracula.
Everyone stays at Dr. Seward's asylum as the men begin to hunt Dracula. Van Helsing finally reveals that vampires can only rest on earth from their homeland. Dracula communicates with Seward's patient, Renfield, an insane man who eats vermin to absorb their life force. After Dracula learns of the group's plot against him, he uses Renfield to enter the asylum. He secretly attacks Mina three times, drinking her blood each time and forcing Mina to drink his blood on the final visit. She is cursed to become a vampire after her death unless Dracula is killed. As the men find Dracula's properties, they discover many earth boxes within. The vampire hunters open each of the boxes and seal wafers of sacramental bread inside them, rendering them useless to Dracula. They attempt to trap the Count in his Piccadilly house, but he escapes. They learn that Dracula is fleeing to his castle in Transylvania with his last box. Mina has a faint psychic connection to Dracula, which Van Helsing exploits via hypnosis to track Dracula's movements. Guided by Mina, they pursue him.
In Galatz, Romania, the hunters split up. Van Helsing and Mina go to Dracula's castle, where the professor destroys the vampire women. Jonathan Harker and Arthur Holmwood follow Dracula's boat on the river, while Quincey Morris and John Seward parallel them on land. After Dracula's box is finally loaded onto a wagon by Szgany men, the hunters converge and attack it. After routing the Szgany, Harker slashes Dracula's neck and Quincey stabs him in the heart. Dracula crumbles to dust, freeing Mina from her vampiric curse. Quincey is mortally wounded in the fight against the Szgany. He dies from his wounds, at peace with the knowledge that Mina is saved. A note by Jonathan Harker seven years later states that the Harkers have a son, named Quincey."""
doc_book_plot = nlp(book_plot_wiki)
def calc_sentence_sentiments(doc):
    # retrieve the polarity score of every sentence
    temp = [sentence._.polarity for sentence in doc.sents]
    # drop neutral sentences (polarity exactly 0)
    temp = [sentiment for sentiment in temp if sentiment != 0]
    return temp
wiki_sentiments = calc_sentence_sentiments(doc_book_plot)
doc_book._.polarity
doc_book_plot._.polarity
displacy.render(doc_book[:200], style='ent', jupyter=True)
characters_wiki = [ent.text for ent in doc_book_plot.ents if ent.label_ == 'PERSON']
characters_book = [ent.text for ent in doc_book.ents if ent.label_ == 'PERSON']
#novel's text
Counter(characters_book).most_common(15)
#plot on Wiki
Counter(characters_wiki).most_common(15)
print(set(characters_wiki) - set(characters_book))
set()
print(set(characters_wiki + ['another character']) - set(characters_book))
{'another character'}
set([ent.text for ent
in doc_book.ents
if ent.label_ == 'WORK_OF_ART'])
print([ent.text for ent
in doc_book.ents
if (ent.label_ == 'TIME')])
['early next morning', 'an hour', 'the night', 'all night', 'forcemeat', 'a little before eight', '7:30', 'more than an hour', 'midnight', 'afternoon', 'evening', 'earlier in the evening', 'An hour', 'early to-night', 'night', 'a few minutes of midnight', 'a few minutes', 'the morning', 'the afternoon', 'early morning', 'a good night', 'last night', 'last night', 'one night', 'an hour', 'the previous night', 'the last evening', 'hour after hour', 'a few hours', 'Good-morning', 'a few hours', 'morning', 'Last evening', 'this evening', 'Last night', 'the next morning', 'a minute', 'A minute later', 'an hour or', '--This morning', 'before morning', 'a couple of hours', 'morning', 'the night', 'this morning', 'Last night', 'a few hours', 'morning', 'morning', 'a minute or', 'morning', "ten o'clock", 'Good-night', 'an hour ago', 'night', 'a few moments', 'Last night', "a few minutes'", 'evening', 'the afternoon', 'midnight', 'A little after midnight', 'a very few minutes', 'last night', 'this morning', 'Early this morning', 'noon', 'last night', 'last night', "a few hours'", 'night', 'midnight', "from few minutes'", 'a few seconds', 'last night', 'last night', 'this morning', 'this to-night', "11 o'clock", 'every night', 'a minute or so', 'night', 'noon', 'this morning', 'night', 'the previous morning', 'This afternoon', 'night', 'hours', 'night', 'Last night', '9:30 to-night', '4:30 to-morrow afternoon', 'Last night', 'all that night', 'evening', 'last night', "About eight o'clock", 'half an hour', 'night', 'ten minutes', 'a few minutes', 'hours', 'this afternoon', 'an hour', 'a little after the hour', 'night', 'one night', 'just before dawn', 'Three nights', 'a few hours', 'an hour', 'this night', 'Last night', 'This morning', "two o'clock", 'a few seconds', 'noon', 'noon', 'about five minutes', "five o'clock", 'a few minutes', 'at high noon', 'an hour', 'this morning', 'all night', 'dusk', 'the early morning', 'early in the morning', 'last night', 'a couple of 
hours', 'Good-night', '--This afternoon', 'To-night', 'two nights', 'Good-night', "eight o'clock", 'a lovely morning', 'morning', 'this morning', 'this night', 'another hour', 'nights', 'morning', 'last night', 'about two hours', 'night', "Just before twelve o'clock", 'twenty-two hours', 'this night', 'this night', 'such an hour', "ten o'clock", 'seconds', 'hours', 'a few seconds later', 'early in the morning', 'hour', 'late in the afternoon', '--All last night', "nearly six o'clock", 'the hours', "six o'clock", 'This afternoon', 'half an hour', 'a few minutes', 'the long hours', "six o'clock", 'five minutes', 'another half hour', 'evening', 'the to-night', 'noon', 'five minutes', "five o'clock", 'the night before', 'Every hour', 'the night', "a few minutes'", 'about twenty minutes', 'late in the evening', 'the following morning', 'night', 'last night', 'late in the morning', 'a bad night', 'last night', 'hours', 'this very hour', 'night', 'this morning', 'night', "two o'clock", 'a few minutes', 'hour', "6 o'clock", '6:25 to-night', "eight o'clock", 'last night', 'a few minutes', 'each hour', 'last night', 'the previous night', 'this afternoon', 'the night', 'another night', "About ten o'clock", 'the night', "a few hours'", 'noon', "two o'clock", 'noon', 'more than an hour', 'last night', 'last night', 'last night', 'one hour', 'this night', 'night', 'morning', 'Last night', "a little before ten o'clock", 'this night', 'this night', 'The night', 'several minutes', 'Last night', 'up to an hour ago', 'noon', 'night', "little before twelve o'clock", 'last night', 'a few minutes', 'about fifteen minutes', 'an hour', 'night', 'a few minutes', "nine o'clock", 'to-night', 'the last hour', 'nights', 'nights', "five o'clock", 'a minute', 'Good-evening', 'five minutes', 'this morning', 'this very hour', "nine o'clock", 'two hours', "six o'clock", 'the hours of the day', 'noon', 'A minute later', 'between the hours of noon', 'this night', 'a few minutes', 'the morning', 'this 
very hour', 'Good-night', 'a few minutes', 'A few minutes later', 'about a minute three', 'our night', 'a few seconds', 'later in the day', 'noon', "last night's", 'this morning', 'This morning', 'one morning', 'last night', 'the morning', 'night', 'a good night', 'Last night', 'afternoon', 'Good-night', 'evening', 'evening', 'the previous night', "five o'clock that morning", "twelve o'clock", 'dusk', 'a few seconds', 'night', 'before morning', 'this morning', 'This morning', 'last night', 'a little after midnight', 'a few minutes', 'a few minutes', 'a few minutes', 'hours', 'this afternoon', 'a few seconds', 'a few seconds', 'to-night', 'this hour', 'a few seconds', 'a couple of minutes', "six o'clock", 'half an hour', 'this morning', 'the minutes and seconds', 'all hour', "ten o'clock", 'last night', 'that night', 'a few minutes', 'this evening', "12:30 o'clock", 'Last night', 'this morning', 'one hour', '12:45', 'twenty minutes past one', 'hourly', 'a minute', "five o'clock", 'the night', 'last night', 'all night', 'morning', 'two or three minutes later', 'this hour', 'a few minutes', 'several minutes', "three o'clock", 'last night', 'last afternoon', "about five o'clock", 'the river that hour', 'morning', 'this morning', 'half an hour', 'every hour', 'morning', 'afternoon', 'this morning', 'midnight', 'morning', 'a minute', 'Evening', 'some half hour', 'A very few minutes', 'the morning of the 12th', 'the same night', "about five o'clock", 'this morning', "only about 24 hours'", "one o'clock", 'this morning', 'an hour ago', 'About noon', 'some hours', 'last evening', 'last night', 'this morning', 'a few seconds', '6:30', 'night', 'hour', 'so many hours', 'a great hour', 'Last night', 'the night', 'daytime', 'Early this morning', 'three hours', 'this morning', 'a few seconds', 'this morning', 'an hour', 'evening', 'half an hour', 'every minute', 'night', 'night', 'a minute', 'another hour', 'three hours', 'every hour', 'night', 'evening', 'Early this morning', 
'morning', 'all night', 'this morning', 'about noon', 'only a few hours', 'noon', 'this morning', 'an hour', 'morning', 'all night', 'night', 'morning', 'morning', 'morning', 'that night', 'before morning', 'morning', 'night', 'the cold hour', 'the cold hour', 'evening', 'afternoon', 'late in the afternoon', 'last night', 'less than an hour']
for ent in doc_book_plot.ents:
    if ent.label_ != 'PERSON' and ent.label_ != 'CARDINAL':
        print(ent.text, ent.label_)
English NORP
the Carpathian Mountains EVENT
Count ORG
London GPE
Count ORG
Budapest GPE
England GPE
Whitby ORG
Budapest GPE
Van Helsing ORG
Seward ORG
Mina ORG
Transylvania GPE
Van Helsing ORG
Galatz GPE
Romania GPE
Szgany LOC
Szgany LOC
seven years later DATE
Harkers ORG
SELECT *, LEN(Plot)
FROM '/work/dracula_nosferatu_plots.csv'
index  year  title
0      1931  Dracula
1      1936  Dracula's Daughter
2      1943  Son of Dracula
3      1945  House of Dracula
4      1958  The Return of Dracula
5      1966  Billy the Kid vs. Dracula
6      1969  Blood of Dracula's Castle
7      1971  Dracula vs. Frankenstein
8      1979  Dracula
9      1992  Bram Stoker's Dracula
[first 10 of 27 rows; years range from 1922 to 2014]
with open('/work/dracula_nosferatu_plots.csv', newline='') as f:
    reader = csv.reader(f)
    movie_plots = list(reader)
class movie:
    instances = []

    def __init__(self, item, varname):
        self.name = varname
        self.release_year = item[0]
        self.title = item[1]
        self.plot = item[-1]
        self.__class__.instances.append(self)

    def spacy_magic(self):
        self.nlp = nlp(self.plot)
for film in movie_plots[1:]:
    # make name
    varname = film[0] + film[1][:3]
    # make movie object
    globals()[varname] = movie(film, varname)
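Writing into globals() creates one module-level variable per film, and two films sharing a year and the first three letters of their title would overwrite each other under that name. A sketch of one alternative that keeps the same key scheme but stores the films in a dictionary. Film and build_registry are hypothetical names, and the row layout (year first, title second, plot last) is assumed from the indices used in the movie class.

```python
from dataclasses import dataclass

@dataclass
class Film:
    key: str
    release_year: str
    title: str
    plot: str

def build_registry(rows):
    """Map a short key (year + first three title letters) to a Film."""
    registry = {}
    for row in rows:
        key = row[0] + row[1][:3]
        if key in registry:
            # fail loudly rather than silently replacing an earlier film
            raise ValueError(f'duplicate key {key!r}')
        registry[key] = Film(key, row[0], row[1], row[-1])
    return registry

# Illustrative rows in the assumed column order: year, title, plot.
rows = [
    ['1931', 'Dracula', 'plot text A'],
    ['1936', "Dracula's Daughter", 'plot text B'],
]
films = build_registry(rows)
print(films['1931Dra'].title)
```

A lookup such as films['1931Dra'] then replaces the bare variable name, and iterating over films.values() plays the role of movie.instances.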
for film in movie.instances:
    film.spacy_magic()
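The cell that printed the "new characters" lists that follow is not included in this export. A plausible reconstruction, assuming it repeats the set difference used earlier for the Wikipedia plot: names tagged PERSON in a film plot minus names tagged PERSON in the novel. new_characters is a hypothetical helper, and the short name lists stand in for the real entity lists.

```python
def new_characters(film_people, book_people):
    """Names tagged PERSON in a film plot that never occur in the novel."""
    return set(film_people) - set(book_people)

# Stand-ins for [ent.text for ent in doc.ents if ent.label_ == 'PERSON'].
book_people = ['Jonathan Harker', 'Mina', 'Van Helsing', 'Lucy', 'Renfield']
film_people = ['Mina', 'John Harker', 'Lucy Weston', 'Van Helsing', 'Mina']

print(sorted(new_characters(film_people, book_people)))
```

In the notebook the two arguments would come from the PERSON entities of each film.nlp and from characters_book. The comparison is by exact string, so renamed or misspelled characters such as 'John Harker' count as new.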
Dracula 1931 new characters:
{'Helen Chandler', 'Dwight Frye', 'John Harker', 'Lucy Weston', 'Dracula with Mina', 'Carfax Abbey', 'Joan Standing', 'Edward Van Sloan', 'Briggs', 'Herbert Bunston', 'Bela Lugosi', 'Nurse Briggs', "Count Dracula's", 'David Manners'}
Dracula's Daughter 1936 new characters:
{'Irving Pichel', 'Otto Kruger', 'Countess Zaleska', 'Countess', 'Gloria Holden', 'Von Helsing', 'Countess Marya', 'Sandor', 'Janet', 'Edward Van Sloan', 'Garth', 'Lili (Nan Grey', 'Jeffrey Garth'}
Son of Dracula 1943 new characters:
{'Katherine Caldwell', 'Brewster', 'Katherine', 'George Irving', 'Claire', 'Frank', 'Alucard', 'Louise Allbritton', 'Frank Stanley', 'Caldwell'}
House of Dracula 1945 new characters:
{'Niemann', 'Onslow Stevens', "Martha O'Driscoll", 'Latos', 'Lawrence Talbot', 'Frankenstein', 'Steinmuhl', 'Skelton Knaggs', 'Franz Edlemann', 'Lionel Atwill', 'Holtz', 'Jane Adams', 'Talbot', 'Edlemann'}
The Return of Dracula 1958 new characters:
{'Jennie', 'Whitfield', 'Cora', 'Tim', 'John Meierman', 'Rachel', 'Bryant', 'Bellac Gordal', 'Mickey', 'Bellac', 'Meierman', 'Mack Bryant'}
Billy the Kid vs. Dracula 1966 new characters:
{"Billy the Kid's", 'Underhill', 'Billy', 'Betty Bentley'}
Blood of Dracula's Castle 1969 new characters:
{'John Carradine', 'Paula Raymond', 'George (', 'Johnny'}
Dracula vs. Frankenstein 1971 new characters:
{'Judith Fontaine', 'Mike Howard', 'Rico', 'Frankenstein', 'Mike', 'Forrest J. Ackerman', 'Greydon Clark', 'Judith', 'Anthony Eisley', 'Grazbo', 'Monster', 'Durea', 'Martin', 'Angelo Rossitto', 'Joanie', 'Anne Morrell', 'J. Carrol Naish', 'Roger Engel', 'John Bloom', 'Beaumont', 'Jim Davis', 'Russ Tamblyn', 'Groton'}
Dracula 1979 new characters:
{'Donald Pleasence', 'Demeter', 'James Bond', 'Trevor Eve', 'Carfax Abbey', 'Frank Langella', 'Kate Nelligan', 'Lucy Seward', 'Jan Francis', 'Tony Haygarth', 'Milo Renfield', 'Maurice Binder'}
Bram Stoker's Dracula 1992 new characters:
{'Vlad Dracula', 'Carfax Abbey', 'hunt Dracula', 'Jonathan meets Dracula'}
Dracula: Dead and Loving It 1995 new characters:
{'Dwight Frye', 'Thomas Renfield', 'Carfax Abbey', 'Dracula grabs Mina', "Count Dracula's"}
Dracula 2000 2000 new characters:
{'Valerie', 'Lucy Westerman', 'Simon Sheppard', 'Carfax Abbey', 'Simon', 'Matthew Van Helsing', 'Mary Heller', 'Mary escape'}
Dracula Reborn 2012 new characters:
{'Krash Miller', 'Linda Bella', 'Vladimir Sarkany', 'Lucy Spencer', 'Ian Pfister', 'Quincy Morris', 'Corey Landis', 'Quincy', 'Lina', 'Stuart Rigby', 'Keith Reay', 'Dani Lennon', 'Joan Seward', 'James Hillier', 'Charlie Garcia', 'Victoria Summer'}
Dracula: The Dark Prince 2013 new characters:
{'Leonardo', 'Andros', 'Erzebet', 'Lightbringer', 'Demetria', 'Leonardo Van Helsing', 'Lucian', 'Lucien'}
Dracula Untold 2014 new characters:
{'Mirena', 'Broken Tooth Mountain', 'Mehmed', 'Sultan Mehmed II', 'Vlad', 'Janissaries', 'Impaler', 'Vlad Țepeș'}
Dracula 1958 new characters:
{'Holmwoods', 'Soon Dracula', 'Tania', 'Gerda', 'Lucy Holmwood', 'Karlstadt'}
The Brides of Dracula 1960 new characters:
{'Baroness', 'Stepnik', 'Meinster', 'Gina', 'Helsing', "M. R. James'", 'Marianne Danielle', 'Marianne', 'Greta'}
Dracula: Prince of Darkness 1966 new characters:
{'Diana', 'Helen', 'Sandor', 'Klove', 'Alan', 'Charles shoots', 'Charles'}
Dracula Has Risen from the Grave 1968 new characters:
{'Barry Andrews', 'Zena', 'Barbara Ewing', 'Veronica Carlson', 'Ernest Mueller', 'Ewan Hooper', 'Mueller', 'Rupert Davies', 'Anna', 'Maria under', 'Paul', 'Maria'}
Taste the Blood of Dracula 1969 new characters:
{'Jeremy', 'Courtley', 'Alice', 'Hargood', 'William Hargood', 'Samuel Paxton', 'Prayer', 'Jonathon Secker', 'Paul', 'Paxton'}
Scars of Dracula 1970 new characters:
{'Sarah Framsen', 'Simon Carlson', 'Klove', 'Tania', 'Paul Carlson', 'Sarah'}
Countess Dracula 1971 new characters:
{'Ilona', 'Fabio', 'Toth', 'Dobi', 'Julie', 'Elisabeth', 'Hungarian Countess', 'Countess Ilona', 'Imre Toth', 'Countess Dracula'}
Dracula AD 1972 1972 new characters:
{'Marsha Hunt', 'Laura Bellows', 'Stephanie Beacham', 'Jessica', 'Britons', 'Lorrimer', 'Jessica Van Helsing', 'Gaynor Keating', 'Caroline Munro', 'Michael Coles', 'Bob', 'Laura', 'Peter Cushing', 'Philip Miller', 'Christopher Neame', 'Lorrimer Van Helsing', 'Alucard accidentally', 'Johnny Alucard', 'Alucard'}
The Satanic Rites of Dracula 1974 new characters:
{'Valerie Van Ost', 'Torrence', 'Jessica', "Maurice O'Connell", 'Joanna Lumley', 'Julian Keeley', 'Keeley', 'John Porter', 'Jane', 'Richard Vernon', 'Michael Coles', 'Chin Yang', 'William Franklyn', 'Peter Cushing', 'Christopher Lee', 'Lorrimer Van Helsing', 'D. D. Denham', 'Barbara Yu Ling', 'Richard Mathews'}
Dracula: Pages from a Virgin's Diary 2002 new characters:
{'Jonathon Harker'}
Lake of Dracula 1971 new characters:
{'Takashi Saki', 'Maki', 'Takahashi', 'Saki', 'Akiko', 'Mori Kishida', 'Kusaku', 'Natsuko'}
Nosferatu: A Symphony of Horror 1922 new characters:
{'Bulwer', 'Herr Knock', 'Hutter', 'Ellen', 'Count Orlok', 'Thomas Hutter', 'Harding', "Count Orlok's"}
for film in movie.instances:
    print('\n', film.title, film.release_year, 'focus:')
    focus = [ent.text for ent in film.nlp.ents if (ent.label_ in ('GPE', 'FAC', 'ORG'))]
    print(set(focus))
Dracula 1931 focus:
{'Mina', 'Van Helsing', 'England', 'Harker', 'Vesta', 'Transylvania', 'London'}
Dracula's Daughter 1936 focus:
{'Scotland Yard', 'Countess', 'Castle Dracula', 'Zaleska', 'Transylvania', 'London'}
Son of Dracula 1943 focus:
{'Brewster', 'Count', 'Katherine', 'Dark Oaks', 'U.S.', 'New Orleans', 'Alucard', 'Sheriff'}
House of Dracula 1945 focus:
{'Visaria', 'Count', 'Wolfman', 'Nina', 'Edlemann', 'Milizia', 'Talbot'}
The Return of Dracula 1958 focus:
{'Jennie', 'Carleton', 'Germany', 'Rachel', 'the United States', 'Mickey', 'Bellac', 'California'}
Billy the Kid vs. Dracula 1966 focus:
{'Count', 'Bentley', 'Betty'}
Blood of Dracula's Castle 1969 focus:
{'Count and Countess Townsend', 'California', "Shea's Castle", 'Arizona', 'Bloody Marys', 'Lancaster'}
Dracula vs. Frankenstein 1971 focus:
{'Judith', 'Jr.', 'Emporium', 'Lon Chaney', 'Las Vegas', 'Durea', 'Venice', 'Samantha', 'Martin', 'the Creature Emporium', 'California'}
Dracula 1979 focus:
{'Count', 'Romania', 'Whitby', 'England', 'Harker', 'Transylvania'}
Bram Stoker's Dracula 1992 focus:
{'Count', 'Varna', 'Romania', 'London', 'Van Helsing', 'Bowie', 'England', 'Elisabeta', 'Jonathan and Mina', 'Seward', 'Transylvania', 'the Order of the Dragon', 'Holmwood'}
Dracula: Dead and Loving It 1995 focus:
{'Van Hesling', 'England', 'opera house', 'Transylvania', 'Carfax', 'London', 'Moldavian'}
Dracula 2000 2000 focus:
{'Solina', 'Louisiana', 'Van Helsing', 'Marcus', 'England', 'New Orleans', 'London', 'America', 'Sun'}
Dracula Reborn 2012 focus:
{'Varna', 'Los Angeles', 'Van Helsing', 'Lina', 'Holmwood', 'California'}
Dracula: The Dark Prince 2013 focus:
{'Esme', 'Erzebet', 'Alina', 'Abel', "Lucien's"}
Dracula Untold 2014 focus:
{'Renaissance', 'Mirena', 'the Ottoman Empire', 'London', 'Mehmed', 'Cozia Monastery', 'Castle Dracula', 'the Prince of Wallachia', 'Transylvania', 'Easter', 'Janissaries', 'Vlad', 'Îngeraș', 'Ingeras'}
Dracula 1958 focus:
{'Arthur guard', "Van Helsing's", 'Ingolstadt', 'Klausenberg', 'Arthur', 'Van Helsing', 'Tania', 'Gerda', 'Harker', 'Dracula', 'the Virgin Mary', 'Klausenburg'}
The Brides of Dracula 1960 focus:
{"Van Helsing's", 'Baroness', 'Severin', 'Van Helsing', 'Transylvania', 'Baron', 'Marianne'}
Dracula: Prince of Darkness 1966 focus:
{'Diana', 'Count', 'Kents', 'Van Helsing', 'Helen', 'Ludwig', 'Alan', 'Karlsbad'}
Dracula Has Risen from the Grave 1968 focus:
{'Castle Dracula', 'Keinenberg', 'Mueller'}
Taste the Blood of Dracula 1969 focus:
{'Jeremy', 'Secker', 'Alice', 'Hargood', 'Paul and Alice', 'Weller', 'the Cafe Royal', 'Courtley', 'Paul', 'Paxton'}
Scars of Dracula 1970 focus:
{'Count', 'Castle Dracula', 'Klove', 'Tania', 'Kleinenberg'}
Countess Dracula 1971 focus:
{'Hungary', 'Dobi'}
Dracula AD 1972 1972 focus:
{'Lorrimer Van Helsing', "St. Bartolph's", 'Van Helsing', 'Jessica', 'Murray'}
The Satanic Rites of Dracula 1974 focus:
{"Van Helsing's", 'Colonel Mathews', 'Torrence', 'Count', 'Keeley', 'Van Helsing', "Scotland Yard's", 'Secret Service', 'Freddie Jones', 'Murray', 'Undead'}
Dracula: Pages from a Virgin's Diary 2002 focus:
{"Van Helsing's", 'the Brides of Dracula', 'Van Helsing', 'Castle Dracula', 'Brides of Dracula', 'Harker', 'London'}
Lake of Dracula 1971 focus:
{'Japan', 'Natsuko', 'Natsuku'}
Nosferatu: A Symphony of Horror 1922 focus:
{'Orlok', 'Transylvania', 'Wisborg'}
for film in movie.instances:
    print('\n', film.title, film.release_year, 'average plot mood: ', round(film.nlp._.polarity, 2))
Dracula 1931 average plot mood: 0.02
Dracula's Daughter 1936 average plot mood: -0.04
Son of Dracula 1943 average plot mood: -0.01
House of Dracula 1945 average plot mood: -0.1
The Return of Dracula 1958 average plot mood: -0.1
Billy the Kid vs. Dracula 1966 average plot mood: 0.34
Blood of Dracula's Castle 1969 average plot mood: -0.05
Dracula vs. Frankenstein 1971 average plot mood: 0.1
Dracula 1979 average plot mood: 0.0
Bram Stoker's Dracula 1992 average plot mood: -0.09
Dracula: Dead and Loving It 1995 average plot mood: -0.0
Dracula 2000 2000 average plot mood: -0.03
Dracula Reborn 2012 average plot mood: 0.04
Dracula: The Dark Prince 2013 average plot mood: 0.03
Dracula Untold 2014 average plot mood: 0.07
Dracula 1958 average plot mood: -0.02
The Brides of Dracula 1960 average plot mood: -0.01
Dracula: Prince of Darkness 1966 average plot mood: 0.03
Dracula Has Risen from the Grave 1968 average plot mood: 0.06
Taste the Blood of Dracula 1969 average plot mood: -0.08
Scars of Dracula 1970 average plot mood: 0.05
Countess Dracula 1971 average plot mood: 0.1
Dracula AD 1972 1972 average plot mood: -0.1
The Satanic Rites of Dracula 1974 average plot mood: -0.02
Dracula: Pages from a Virgin's Diary 2002 average plot mood: 0.06
Lake of Dracula 1971 average plot mood: -0.02
Nosferatu: A Symphony of Horror 1922 average plot mood: 0.11
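The scores are printed in file order, which makes the darkest and brightest plots hard to spot. A small sketch of ranking them, using a few (title, year, polarity) values copied from the output above. In the notebook the same key function would presumably read film.nlp._.polarity from each object in movie.instances.

```python
# (title, release year, polarity) triples copied from the printed output.
moods = [
    ('Dracula', '1931', 0.02),
    ('Billy the Kid vs. Dracula', '1966', 0.34),
    ("Bram Stoker's Dracula", '1992', -0.09),
    ('Nosferatu: A Symphony of Horror', '1922', 0.11),
]

# Sort from most negative to most positive plot summary.
ranked = sorted(moods, key=lambda item: item[2])
for title, year, polarity in ranked:
    print(f'{polarity:+.2f}  {title} ({year})')
```

Sorting on the unrounded polarity rather than the rounded figure avoids arbitrary ordering among films that print the same two-decimal value.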
index  name     title
3      1945Hou  House of Dracula
22     1972Dra  Dracula AD 1972
4      1958The  The Return of Dracula
9      1992Bra  Bram Stoker's Dracula
19     1969Tas  Taste the Blood of Dracula
6      1969Blo  Blood of Dracula's Castle
1      1936Dra  Dracula's Daughter
11     2000Dra  Dracula 2000
23     1974The  The Satanic Rites of Dracula
25     1971Lak  Lake of Dracula
[first 10 of 27 rows]
def display_only_entities(film, types):
    # `film` is a movie object; film.nlp holds the spaCy Doc built in spacy_magic()
    seen = set()  # entity texts already rendered, to skip duplicates
    for ent in film.nlp.ents:
        if ent.label_ in types and ent.text not in seen:
            displacy.render(ent, style='ent', jupyter=True)
            seen.add(ent.text)
family = []
for film in movie.instances:
    family = family + [ent.text for ent in film.nlp.ents
                       if ent.label_ == 'PERSON' and 'helsing' in ent.text.lower()
                       and len(ent.text.split(' ')) >= 3]
print(*set(family), sep='\n')
Lorrimer Van Helsing
Jessica Van Helsing
Abraham Van Helsing
Matthew Van Helsing
Leonardo Van Helsing