War is not merely a political act but a real political instrument, a continuation of political intercourse, carried out by other means. (Clausewitz)
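With the rapid development of information and network technology, modern warfare has entered a new dimension in which the whole population takes part and even live-streams the fighting. The current Russia-Ukraine war shows this vividly. Today is day 28 of the war, and the two sides give conflicting accounts of their losses. Each hopes to use its reported successes to lift its own morale, wear down the opponent's will, and gain ground in the information war. The Oryxspioenkop website has opened a dedicated page where military enthusiasts tally the actual losses of both sides from open-source photos and videos, which makes the figures fairly credible. Page address: https://www.oryxspioenkop.com/2022/02/attack-on-europe-documenting-equipment.html
Every item counted there is backed by a concrete video or image. A quick comparison against several of the official Russian and Ukrainian figures suggests the site's numbers are comparatively objective and accurate. There is one problem, though: the statistics are published as plain text rather than as tables, which makes comparison and analysis awkward. Below is a Python script that can be run every day to scrape the data automatically and generate reports. Without further ado, here is the code: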
import datetime as dt
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup


def download_page():
    # Fetch the Oryx loss-tracking page and parse it into a BeautifulSoup tree.
    url = "https://www.oryxspioenkop.com/2022/02/attack-on-europe-documenting-equipment.html"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


def format_dataframe(df):
    # Map flag-image URLs to the country of origin of each piece of equipment.
    # The " *** " segment is redacted in the source post; restore the real path
    # segment before running, or equipment_origin will not match anything.
    flagmap = {
        "https://upload.wikimedia.org/ *** /commons/thumb/a/a9/Flag_of_the_Soviet_Union.svg/23px-Flag_of_the_Soviet_Union.svg.png": "ussr",
        "https://upload.wikimedia.org/ *** /en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png": "russian federation",
        "https://upload.wikimedia.org/ *** /commons/thumb/4/49/Flag_of_Ukraine.svg/23px-Flag_of_Ukraine.svg.png": "ukraine",
        "https://upload.wikimedia.org/ *** /commons/thumb/8/85/Flag_of_Belarus.svg/23px-Flag_of_Belarus.svg.png": "belarus",
        "https://upload.wikimedia.org/ *** /en/thumb/a/ae/Flag_of_the_United_Kingdom.svg/23px-Flag_of_the_United_Kingdom.svg.png": "united kingdom",
        "https://upload.wikimedia.org/ *** /en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png": "united states",
    }
    df["category"] = df.subheading.apply(lambda _: _.split("(")[0])
    df["country"] = df.heading.apply(lambda _: _.split("-")[0].strip())
    df["equipment_origin"] = df.flag.map(flagmap)
    return df


def parse_article(html):
    # Walk the headings and lists in page order, producing one row per evidence link.
    dataset = []
    blocks = html.select("article h3, article ul")
    heading = ""
    subheading = ""
    for block in blocks:
        if block.name == "h3":
            # Red headings are the per-country totals.
            if "color: red" in str(block):
                heading = block.get_text()
            subheading = block.get_text()
        elif block.name == "ul":
            for li in block.select("li"):
                flag = li.select("img.thumbborder")[0]["src"]
                equipment = li.get_text().split(":")[0].strip()
                for a in li.select("a"):
                    dataset.append(dict(
                        heading=heading,
                        subheading=subheading,
                        flag=flag,
                        equipment=equipment,
                        source=a["href"],
                        text=a.get_text(),
                    ))
    df = pd.DataFrame(dataset)
    return format_dataframe(df)


def parse_category(text, country):
    # Turn one subheading into one record per loss state.
    data = []
    name = text.split("(")[0]
    for section in text.split(","):
        parts = section.split(":")
        if len(parts) == 2:
            state = parts[0].split()[-1]
            # "+" rather than "*" so the first match can never be an empty string.
            value = int(re.findall(r"[0-9]+", parts[1].split()[0])[0])
            data.append(dict(
                country=country,
                category=name,
                state=state,
                value=value,
            ))
    return data


def parse_categories(df):
    categories = []
    # .items() replaces .iteritems(), which newer pandas releases no longer provide.
    for text, country in df.groupby("subheading").country.first().items():
        categories.extend(parse_category(text, country))
    return pd.DataFrame(categories)


def get_data(html):
    df = parse_article(html)
    summary = parse_categories(df)
    # .copy() so the assignments below do not write into a slice of df.
    reports = df[["country", "category", "equipment", "equipment_origin", "text", "source"]].copy()
    # Drop the leading count from the equipment label.
    reports["equipment"] = reports.equipment.apply(lambda _: " ".join(_.split()[1:]))
    # Strip the parentheses around each evidence link label.
    reports["text"] = reports.text.apply(lambda _: re.sub(r"[()]", "", _).strip())
    return summary, reports


def get_timestamp():
    return dt.datetime.utcnow().replace(microsecond=0)


def get_difference(summary, prev_summary):
    # Align the previous and current summaries and keep only the rows that changed.
    index_cols = ["country", "category", "state"]
    df_diff = pd.concat([
        prev_summary.set_index(index_cols),
        summary.set_index(index_cols)], axis=1).fillna(0)
    df_diff.columns = [0, 1]
    df_diff["value"] = df_diff[1] - df_diff[0]
    df_diff = df_diff[df_diff.value != 0]
    return df_diff.reset_index()[["country", "category", "state", "value"]]


def normalize_log(log):
    # Unify category names that the page has spelled differently over time.
    category_changes = [{
        "to_replace": "communications station",
        "replace_with": "communications stations",
    }]
    log["category"] = log.category.apply(lambda _: _.lower().strip())
    for change in category_changes:
        log.category = log.category.replace(change["to_replace"], change["replace_with"])
    log = log.groupby(["timestamp", "country", "category", "state"]).value.sum().reset_index()
    return log


def update_log(summary):
    # Both CSV files must already exist from a previous run.
    prev_summary = pd.read_csv("./data/summary.csv")
    # parse_dates (not date_parser) is the option that takes a list of column names.
    log = pd.read_csv("./data/log.csv", parse_dates=["timestamp"])
    summary_diff = get_difference(summary, prev_summary)
    timestamp = get_timestamp()
    summary_diff.insert(0, "timestamp", timestamp)
    log = pd.concat([log, summary_diff])
    return normalize_log(log)


def save_all(summary, reports, log):
    summary.sort_values(["country", "category", "state"]).to_csv("./data/summary.csv", index=False)
    reports.sort_values(["country", "category", "equipment", "source"]).to_csv("./data/reports.csv", index=False)
    log.sort_values(["timestamp", "country", "category", "state"]).to_csv("./data/log.csv", float_format="%.0f", index=False)


if __name__ == "__main__":
    html = download_page()
    summary, reports = get_data(html)
    log = update_log(summary)
    save_all(summary, reports, log)
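To make the heading parsing more concrete, here is a minimal standalone sketch of what parse_category does with a single subheading. The sample string and its numbers are made up for illustration and only imitate the page's format, and parse_counts is a hypothetical stand-in name, not part of the script. Unlike the script, it strips the category name right away.

```python
import re


def parse_counts(text, country):
    """Split one subheading into a record per loss state."""
    records = []
    name = text.split("(")[0].strip()
    for section in text.split(","):
        parts = section.split(":")
        if len(parts) != 2:
            continue  # e.g. "Tanks (10" carries the total, not a state
        state = parts[0].split()[-1]
        digits = re.findall(r"[0-9]+", parts[1])
        if digits:
            records.append({"country": country, "category": name,
                            "state": state, "value": int(digits[0])})
    return records


# Hypothetical sample heading; the numbers are invented.
sample = "Tanks (10, of which destroyed: 4, damaged: 1, captured: 5)"
rows = parse_counts(sample, "Russia")
for row in rows:
    print(row)
```

Each comma-separated piece that contains exactly one colon becomes a record; the word before the colon is the state and the digits after it are the count.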
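Running the code requires a Python environment. If you do not know how to install one, a quick web search will show you; it is very simple. Once Python is installed, install the script's dependencies: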
beautifulsoup4==4.10.0
bs4==0.0.1
certifi==2021.10.8
charset-normalizer==2.0.12
idna==3.3
numpy==1.21.5
pandas==1.3.5
python-dateutil==2.8.2
pytz==2022.1
requests==2.27.1
six==1.16.0
soupsieve==2.3.1
urllib3==1.26.9
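After installing, it can help to confirm that the packages the script imports directly are present at the pinned versions. The following is only a sketch using the standard library's importlib.metadata; check_pins and PINS are hypothetical names introduced here, and the three pins are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: (wanted, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


# Direct dependencies of the script, taken from the list above.
PINS = {"requests": "2.27.1", "beautifulsoup4": "4.10.0", "pandas": "1.3.5"}

for name, (wanted, installed) in check_pins(PINS).items():
    status = "ok" if installed == wanted else "differs or missing"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```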
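Then run the script. Its output goes to the data folder under the script's current directory, which holds four files: events, log, reports, and summary. The script above writes log, reports, and summary; the code that produces events is not shown here.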
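One practical detail: update_log reads ./data/summary.csv and ./data/log.csv before anything is written, so a very first run needs those files to exist. A minimal sketch of a bootstrap step follows, assuming the column layouts implied by get_difference and save_all. ensure_data_files is a hypothetical helper, not part of the script, and header-only files should let the first run treat every count as new, though that has not been checked against every pandas version.

```python
import csv
from pathlib import Path

# Column layouts inferred from get_difference and save_all in the script.
SUMMARY_COLUMNS = ["country", "category", "state", "value"]
LOG_COLUMNS = ["timestamp"] + SUMMARY_COLUMNS


def ensure_data_files(data_dir="./data"):
    """Create the data folder and header-only CSVs if they do not exist yet."""
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for filename, columns in (("summary.csv", SUMMARY_COLUMNS),
                              ("log.csv", LOG_COLUMNS)):
        path = folder / filename
        if not path.exists():
            with path.open("w", newline="", encoding="utf-8") as handle:
                csv.writer(handle).writerow(columns)
            created.append(path.name)
    return created
```

Calling ensure_data_files() once before the script's main steps leaves existing files untouched and returns the names of any files it had to create.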
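1 events file
The events file mainly contains events about the Russia-Ukraine conflict scraped from Twitter (with geographic coordinates), sorted into the following categories:
1. Bombing, shelling or explosion
2. Civilian Casualty
3. Civilian Infrastructure Damage February 2022
4. Civilian Infrastructure Damage March 2022
5. Gunfire, fighting, battle
6. Military Infrastructure Damage
7. Munitions
8. Other
9. Russian Firing Positions
10. Russian Military Losses
11. Russian Military Movements February 2022
12. Russian Military Movements January 2022
13. Russian Military Movements March 2022
14. Ukrainian Military Losses
For example, categories 11, 12, and 13 can be used to analyze how Russian forces maneuvered, and category 9 (Russian firing positions) can be used to analyze the Russian army's *** situation, and so on. Every event carries geographic coordinates, so interested readers can try some visualization and cluster analysis.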
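As a starting point for that kind of clustering, events can be binned into a coarse latitude/longitude grid and counted per category. This is only a sketch: the post does not show the column layout of the events file, so the (category, latitude, longitude) tuples, the grid_counts name, and the 0.5 degree cell size are all assumptions, and the sample rows are made up.

```python
from collections import Counter


def grid_counts(events, cell_deg=0.5):
    """Count events per (category, grid cell); each cell is cell_deg degrees wide."""
    counts = Counter()
    for category, lat, lon in events:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[(category, cell)] += 1
    return counts


# Made-up sample rows: (category, latitude, longitude)
events = [
    ("Russian Firing Positions", 50.45, 30.52),
    ("Russian Firing Positions", 50.40, 30.60),
    ("Russian Military Losses", 49.99, 36.23),
]
counts = grid_counts(events)
for (category, cell), n in counts.most_common():
    print(category, cell, n)
```

Cells with high counts are candidate hot spots. A proper clustering method would replace the fixed grid once the real column names are known.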
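2 log file
The log file records the daily number of losses for each type of Russian and Ukrainian equipment, split into three states: destroyed, captured, and damaged, for example the losses logged for a single day in March.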
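A single day's losses can be pulled out of the log with a few lines of pandas. The sketch below assumes the log.csv layout that the script writes (timestamp, country, category, state, value); the rows in SAMPLE_LOG are invented for illustration and losses_on is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up rows in the log.csv layout (timestamp, country, category, state, value).
SAMPLE_LOG = """timestamp,country,category,state,value
2022-03-22 12:00:00,Russia,tanks,destroyed,3
2022-03-22 12:00:00,Russia,tanks,captured,2
2022-03-22 12:00:00,Ukraine,tanks,destroyed,1
2022-03-23 12:00:00,Russia,tanks,destroyed,4
"""


def losses_on(log, day):
    """Sum logged changes for one calendar day, per country and state."""
    same_day = log[log["timestamp"].dt.strftime("%Y-%m-%d") == day]
    return same_day.groupby(["country", "state"])["value"].sum()


log = pd.read_csv(io.StringIO(SAMPLE_LOG), parse_dates=["timestamp"])
daily = losses_on(log, "2022-03-22")
print(daily)
```

With the real file, replace the StringIO source with "./data/log.csv" and keep parse_dates=["timestamp"].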
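3 report file
The report file records the model of every lost piece of equipment on both sides, together with the video or photo evidence for each loss, such as the radar equipment lost by Ukrainian forces.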
4 summary file
The summary file totals both sides' losses by category, which makes lookup and analysis easier.
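Since the original complaint was that the page's figures are hard to compare side by side, one natural use of the summary is a pivot with one column per country. This is a sketch under the assumption that summary.csv has the columns country, category, state, value, as written by the script; the sample numbers are invented and comparison_table is a hypothetical name.

```python
import io

import pandas as pd

# Made-up rows in the summary.csv layout (country, category, state, value).
SAMPLE_SUMMARY = """country,category,state,value
Russia,Tanks,destroyed,10
Russia,Tanks,captured,8
Ukraine,Tanks,destroyed,4
Ukraine,Tanks,captured,1
"""


def comparison_table(summary):
    """One row per category and state, one column per country."""
    return summary.pivot_table(index=["category", "state"], columns="country",
                               values="value", aggfunc="sum", fill_value=0)


summary = pd.read_csv(io.StringIO(SAMPLE_SUMMARY))
table = comparison_table(summary)
print(table)
```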
Finally, the four files above can be downloaded from https://poem.lanzouq.com/ixc4N020z9eb for readers who would rather not set up the environment.