[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
   keyword  location
0  ablaze   nan
1  ablaze   nan
2  ablaze   New York City
3  ablaze   Morgantown, WV
4  ablaze   nan
5  ablaze   OC
6  ablaze   London, England
7  ablaze   Bharat
8  ablaze   Accra, Ghana
9  ablaze   Searching
   keyword  location
0  ablaze   'nan'
1  ablaze   'nan'
2  ablaze   'New York City'
3  ablaze   'Morgantown, WV'
4  ablaze   'nan'
5  ablaze   'OC'
6  ablaze   'London, England'
7  ablaze   'Bharat'
8  ablaze   'Accra, Ghana'
9  ablaze   'Searching'
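The quoted 'nan' values suggest that missing locations were converted to the literal string 'nan'. That is what `astype(str)` does to a float NaN. The cell's code is not shown, so this conversion step is an assumption. Here is a minimal sketch on a toy Series, using values copied from the preview, that compares `astype(str)` with `fillna("")`:

```python
import numpy as np
import pandas as pd

# Toy copy of the first location values shown in the preview
location = pd.Series([np.nan, np.nan, "New York City", "Morgantown, WV"])

# astype(str) stringifies NaN, so missing values become the text 'nan'
as_text = location.astype(str)

# fillna("") replaces missing values with empty strings instead
filled = location.fillna("")

print(as_text.tolist())  # ['nan', 'nan', 'New York City', 'Morgantown, WV']
print(filled.tolist())   # ['', '', 'New York City', 'Morgantown, WV']
```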
/shared-libs/python3.7/py-core/lib/python3.7/site-packages/ipykernel_launcher.py:11: FutureWarning: The default value of regex will change from True to False in a future version.
# This is added back by InteractiveShellApp.init_path()
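This FutureWarning is raised when `Series.str.replace` is called without an explicit `regex=` argument. Assume, since the source is not shown, that the cell replaced the text 'nan'. Passing `regex=False` silences the warning. Note, though, that `.str.replace` edits substrings, whereas `Series.replace` with a plain string only matches whole cell values. In this sketch, 'Fernando' is an invented value chosen to show a substring collision:

```python
import pandas as pd

# Hypothetical values; 'Fernando' is invented to show a substring collision
location = pd.Series(["nan", "Fernando", "OC"])

# Substring replacement: regex=False avoids the warning but also edits 'Fernando'
substring = location.str.replace("nan", "", regex=False)

# Whole-value replacement: only cells that are exactly 'nan' change
whole_value = location.replace("nan", "")

print(substring.tolist())    # ['', 'Ferdo', 'OC']
print(whole_value.tolist())  # ['', 'Fernando', 'OC']
```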
   keyword  location
0  ablaze
1  ablaze
2  ablaze   New York City
3  ablaze   Morgantown, WV
4  ablaze
5  ablaze   OC
6  ablaze   London, England
7  ablaze   Bharat
8  ablaze   Accra, Ghana
9  ablaze   Searching
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11370 entries, 0 to 11369
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   keyword   11370 non-null  object
 1   location  11370 non-null  object
 2   text      11370 non-null  object
 3   target    11370 non-null  int64
dtypes: int64(1), object(3)
memory usage: 444.1+ KB
Data Type
________________________
We have 0 missing locations
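A count of 0 is likely an artifact of the earlier clean-up. Once NaN became a string (and then an empty string), `isna()` no longer sees it, even though rows 0, 1 and 4 in the preview have no location. Here is a minimal sketch, assuming unknown locations are stored as empty strings, that counts both kinds of missing value:

```python
import pandas as pd

# Rows 0, 1 and 4 of the preview have blank locations after the 'nan' clean-up
location = pd.Series(["", "", "New York City", "Morgantown, WV", ""])

# isna() only sees real NaN values, so empty strings are not counted
n_nan = int(location.isna().sum())

# Treat empty or whitespace-only strings as missing as well
n_blank = int(location.str.strip().eq("").sum())

print(n_nan, n_blank)  # 0 3
```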
Data Type Converted
________________________
We have 397 duplicated texts
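     keyword       location
80   aftershock    Philippines ,From America
303  annihilation  Adelaide, South Australia
336  apocalypse    Lat Krabang, Bangkok
337  apocalypse
348  apocalypse    Brisbane, Australia
378  armageddon    無断転載及び加工自作発言SNS使用等保存一言ください
604  attacked      Germany
620  attacked      Arkansas
623  attacked      BTS World
627  attacked      Anchorage, AK
(The location in row 378 is Japanese free text, roughly: "No unauthorized reposting, editing, passing off as your own, or use on social media; please leave a note before saving.")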
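A count like "397 duplicated texts" is typically produced with `DataFrame.duplicated` on the text column. With the default `keep="first"`, the first copy of each text is not counted. The supports in the classification reports further down add up to 11,370, the full row count. This suggests the duplicates were not dropped before modelling, but that is an inference rather than something the output states. The sketch below uses invented toy tweets; only the column names come from `df.info()`:

```python
import pandas as pd

# Invented toy tweets; only the column names come from df.info() above
df = pd.DataFrame({
    "keyword": ["ablaze", "ablaze", "attacked", "attacked"],
    "text": ["fire downtown", "fire downtown", "under attack", "all calm"],
})

# keep="first" flags every repeat after the first occurrence
repeats = df.duplicated(subset="text", keep="first")
print(int(repeats.sum()))  # 1

# keep=False flags every copy, which is handy for inspecting duplicate groups
print(df[df.duplicated(subset="text", keep=False)].index.tolist())  # [0, 1]

# Dropping repeats before a train/test split keeps identical tweets out of both sides
deduped = df.drop_duplicates(subset="text").reset_index(drop=True)
print(len(deduped))  # 3
```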
/root/venv/lib/python3.7/site-packages/pyproj/crs/crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
in_crs_string = _prepare_from_proj_string(in_crs_string)
0  USA        39.7837304
1  UK         54.7023545
2  India      22.3511148
3  Australia  -24.7761086
4  Worldwide  52.4808348
     keyword            location
2    ablaze             USA
58   accident           USA
71   aftershock         USA
78   aftershock         USA
85   aftershock         USA
110  aftershock         USA
116  airplane accident  USA
140  airplane accident  USA
144  airplane accident  USA
151  airplane accident  USA
[Figure: word cloud]
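The per-country previews keep their original index labels, which is consistent with boolean-mask filtering on the location column. Here is a minimal sketch on invented rows; the notebook's exact filtering code is not shown:

```python
import pandas as pd

# Invented rows; column names follow df.info() above
df = pd.DataFrame({
    "keyword": ["ablaze", "ablaze", "accident", "ablaze"],
    "location": ["USA", "UK", "USA", "India"],
})

# The boolean mask selects one country's rows and preserves their index labels
usa = df[df["location"] == "USA"].head(10)
print(usa.index.tolist())  # [0, 2]
```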
     keyword            location
2    ablaze             USA
78   aftershock         USA
110  aftershock         USA
116  airplane accident  USA
140  airplane accident  USA
300  annihilation       USA
333  apocalypse         USA
458  arson              USA
538  attack             USA
589  attack             USA
[Figure: word cloud]
     keyword            location
58   accident           USA
71   aftershock         USA
85   aftershock         USA
144  airplane accident  USA
151  airplane accident  USA
231  annihilated        USA
263  annihilated        USA
272  annihilated        USA
370  armageddon         USA
373  armageddon         USA
[Figure: word cloud]
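The second and third USA previews are disjoint, and together they cover exactly the ten rows of the first one. That fits a split of the USA rows by `target`, although the output does not say which preview belongs to which class. Here is a minimal sketch of such a split using `groupby`, on invented rows:

```python
import pandas as pd

# Invented rows; target is an int64 label as in df.info() above
df = pd.DataFrame({
    "location": ["USA", "USA", "USA", "UK"],
    "target": [1, 0, 1, 0],
    "text": ["a", "b", "c", "d"],
})

usa = df[df["location"] == "USA"]

# Iterating a groupby yields (label, sub-frame) pairs, one per class
by_class = {label: part for label, part in usa.groupby("target")}
print(by_class[1].index.tolist())  # [0, 2]
print(by_class[0].index.tolist())  # [1]
```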
     keyword            location
6    ablaze             UK
153  airplane accident  UK
207  ambulance          UK
213  ambulance          UK
219  ambulance          UK
223  ambulance          UK
244  annihilated        UK
245  annihilated        UK
256  annihilated        UK
258  annihilated        UK
[Figure: word cloud]
      keyword      location
6     ablaze       UK
244   annihilated  UK
342   apocalypse   UK
599   attack       UK
611   attacked     UK
696   avalanche    UK
1298  body bag     UK
1307  body bag     UK
1318  body bag     UK
1350  body bag     UK
[Figure: word cloud]
     keyword            location
153  airplane accident  UK
207  ambulance          UK
213  ambulance          UK
219  ambulance          UK
223  ambulance          UK
245  annihilated        UK
256  annihilated        UK
258  annihilated        UK
431  army               UK
585  attack             UK
[Figure: word cloud]
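A word cloud is drawn from word frequencies. The notebook downloads NLTK's stopword list, presumably to filter those frequencies. Here is a minimal, library-free sketch of that counting step. The small STOPWORDS set is a stand-in for `nltk.corpus.stopwords.words("english")`, and the tweets are invented. A dict like `freqs` could then be passed to `WordCloud.generate_from_frequencies`.

```python
import re
from collections import Counter

# Small stand-in for nltk.corpus.stopwords.words("english")
STOPWORDS = {"the", "a", "in", "on", "of", "is", "and", "to"}

# Invented example tweets
tweets = [
    "Fire in the city centre",
    "The fire is spreading to the city",
]

def word_frequencies(texts):
    """Lower-case, keep letter runs as tokens, drop stopwords, and count."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

freqs = word_frequencies(tweets)
print(freqs.most_common(2))  # [('fire', 2), ('city', 2)]
```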
     keyword       location
32   ablaze        India
317  annihilation  India
318  annihilation  India
391  army          India
395  army          India
544  attack        India
559  attack        India
583  attack        India
636  attacked      India
660  attacked      India
[Figure: word cloud]
      keyword            location
32    ablaze             India
395   army               India
544   attack             India
675   avalanche          India
677   avalanche          India
1111  blizzard           India
1758  buildings on fire  India
2029  casualties         India
2649  crash              India
3231  deaths             India
[Figure: word cloud]
      keyword       location
317   annihilation  India
318   annihilation  India
391   army          India
559   attack        India
583   attack        India
636   attacked      India
660   attacked      India
1000  bleeding      India
1176  blood         India
1190  blood         India
[Figure: word cloud]
   keyword  location
0  ablaze
1  ablaze
2  ablaze   USA
3  ablaze   Morgantown, WV
4  ablaze
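In this last preview, row 2 has changed from "New York City" to "USA" while "Morgantown, WV" is unchanged. That pattern fits an exact-value lookup rather than a geographic parser. The mapping itself is not shown, so the COUNTRY_ALIASES dict below is hypothetical:

```python
import pandas as pd

# Hypothetical lookup table; the notebook's real mapping is not shown
COUNTRY_ALIASES = {
    "New York City": "USA",
    "London, England": "UK",
}

location = pd.Series(["", "New York City", "Morgantown, WV", "London, England"])

# Series.replace with a dict swaps exact cell values and leaves everything else alone
normalized = location.replace(COUNTRY_ALIASES)
print(normalized.tolist())  # ['', 'USA', 'Morgantown, WV', 'UK']
```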
Accuracy score train: 0.9477792436235708
Accuracy score test: 0.8698328935795955
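Both accuracy scores can be recovered from the confusion matrices printed at the end of this output. Accuracy is the diagonal (correct predictions) divided by the total. The sketch assumes sklearn's layout, with rows as true labels and columns as predictions:

```python
import numpy as np

# Confusion matrices printed at the end of this output (rows: true, cols: predicted)
cm_train = np.array([[6979, 416], [59, 1642]])
cm_test = np.array([[1660, 201], [95, 318]])

def accuracy(cm):
    # ravel() on a 2x2 sklearn-style matrix yields tn, fp, fn, tp
    tn, fp, fn, tp = cm.ravel()
    return (tn + tp) / cm.sum()

print(accuracy(cm_train))  # about 0.94778
print(accuracy(cm_test))   # about 0.86983
```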
__________________________________________________
Classification report train:
              precision    recall  f1-score   support

           0       0.99      0.94      0.97      7395
           1       0.80      0.97      0.87      1701

    accuracy                           0.95      9096
   macro avg       0.89      0.95      0.92      9096
weighted avg       0.96      0.95      0.95      9096

Classification report test:
              precision    recall  f1-score   support

           0       0.95      0.89      0.92      1861
           1       0.61      0.77      0.68       413

    accuracy                           0.87      2274
   macro avg       0.78      0.83      0.80      2274
weighted avg       0.89      0.87      0.88      2274
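The test-set row for class 1 (the minority class) follows directly from the test confusion matrix printed at the end of this output. F1 is the harmonic mean of precision and recall, which simplifies to 2tp / (2tp + fp + fn). Here is a worked sketch:

```python
# Test-set confusion matrix from the end of this output: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = 1660, 201, 95, 318

precision_1 = tp / (tp + fp)  # share of predicted class-1 tweets that are truly class 1
recall_1 = tp / (tp + fn)     # share of true class-1 tweets the model found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))  # 0.61 0.77 0.68
```

The gap between train and test for this class (F1 of 0.87 vs 0.68) is much wider than for class 0 (0.97 vs 0.92).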
Area under the ROC curve, train: 0.954530147520907
Area under the ROC curve, test: 0.8309846693893908
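Both AUC values match, to every printed digit, the mean of the class-1 and class-0 recalls computed from the confusion matrices. This mean is also called balanced accuracy. It is exactly what `roc_auc_score` returns when given hard 0/1 predictions instead of scores, because the ROC curve then has a single threshold point. This suggests the scores came from `predict` rather than `predict_proba(X)[:, 1]`, but the cell is not shown. Here is a sketch of the arithmetic:

```python
def auc_from_hard_labels(tn, fp, fn, tp):
    """ROC AUC of a single-threshold classifier: the mean of TPR and TNR."""
    tpr = tp / (tp + fn)  # recall of class 1
    tnr = tn / (tn + fp)  # recall of class 0
    return (tpr + tnr) / 2

# Counts from the confusion matrices printed below
print(auc_from_hard_labels(6979, 416, 59, 1642))  # about 0.9545301
print(auc_from_hard_labels(1660, 201, 95, 318))   # about 0.8309847
```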
__________________________________________________
Confusion Matrix Train:
[[6979  416]
 [  59 1642]]
Confusion Matrix Test:
[[1660  201]
 [  95  318]]
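The supports in the classification reports also describe the split itself. Together they cover every row counted by `df.info()`, with 20% held out for testing. The share of class 1 is slightly different in the two sets. Here is a quick arithmetic check:

```python
# Supports from the classification reports above
train_counts = {0: 7395, 1: 1701}
test_counts = {0: 1861, 1: 413}

n_train = sum(train_counts.values())  # 9096
n_test = sum(test_counts.values())    # 2274
n_total = n_train + n_test            # 11370, the row count reported by df.info()

test_share = n_test / n_total
pos_rate_train = train_counts[1] / n_train
pos_rate_test = test_counts[1] / n_test

print(n_total, round(test_share, 3))                      # 11370 0.2
print(round(pos_rate_train, 3), round(pos_rate_test, 3))  # 0.187 0.182
```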