2025-11-19 10:16:05 +08:00

1. Download the JSON file: https://s-file-2.ykt.cbern.com.cn/zxx/ndrs/national_lesson/teachingmaterials/85c19ef2-23a4-43b6-94dd-8c7898639b5d/resources/part_100.json
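2. Parse the JSON to get the chapter IDs. In part_100.json, find the `version_id` field in the `{}` object nodes; its value is a chapter ID, e.g. `be762242-fbf3-41fe-abfc-9d6c909d1bff`. Note that there may be several different `version_id` fields: take only the second-level ones, i.e. those in the objects directly inside the top-level array, as in
   `[{version_id:"d4ab2c8d-6714-469c-bc51-de0dce8b63a9"}]`
   The JSON file contains several such `version_id` values, each corresponding to a different chapter, for example:
   `[{xxxxx,version_id:"be762242-fbf3-41fe-abfc-9d6c909d1bff",xxxxx},{xxxxx,version_id:"d4ab2c8d-6714-469c-bc51-de0dce8b63a9",xxxxx},{xxxxx,version_id:"7765ebda-acd2-06d4-df7e-24c6e4272b6d",xxxxx}]`
   Save all of these `version_id` values.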
3. Build the details URL from each `version_id` and download that JSON:
   `https://s-file-2.ykt.cbern.com.cn/zxx/ndrv2/national_lesson/resources/details/{version_id}.json`
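4. Parse the downloaded JSON.
   From `{version_id}.json`, get the download URLs for resources such as the teaching design, learning task sheet, after-class exercises, courseware, and questions with answers.
   The resources to download are PDF files and videos in m3u8 format. Their download URLs all appear in `{version_id}.json` as strings that begin with `https` and end with `.pdf` or `.m3u8`.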
5. Download the resources.
   A token is required. It is sent in a request header in the following format:
   `X-Nd-Auth:MAC id="7F938B205F876FC398BCDC5BCE419D078A9A9DC46BC1C5EB5D458752DA28A954776C4459233C9F6209FA0EC2EC21AE85202FAE132D402538",nonce="1758355290351:STU4ZCMA",mac="cmPIHUYMwn6OiCanuD/OLV75xyyhxyGZzzEwFwMaKbc="`
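Steps 1 to 3 can be sketched in Python with only the standard library. This is a minimal sketch, assuming `part_100.json` is a top-level JSON array of objects as in the step 2 example and that it can be fetched without the token (the note only mentions the token for step 5). The helper names `fetch_json`, `collect_version_ids`, and `details_url` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# URLs copied from steps 1 and 3; {version_id} is filled in per chapter.
PART_URL = (
    "https://s-file-2.ykt.cbern.com.cn/zxx/ndrs/national_lesson/teachingmaterials/"
    "85c19ef2-23a4-43b6-94dd-8c7898639b5d/resources/part_100.json"
)
DETAILS_URL = (
    "https://s-file-2.ykt.cbern.com.cn/zxx/ndrv2/national_lesson/resources/details/"
    "{version_id}.json"
)


def fetch_json(url: str, timeout: float = 30) -> Any:
    """Download a URL and parse the body as JSON (no auth header is sent)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_version_ids(part_data: Any) -> list[str]:
    """Return version_id from each object directly inside the top-level array.

    version_id fields nested deeper are ignored ("second level only").
    Duplicates are dropped; first-seen order is kept.
    """
    ids: list[str] = []
    if not isinstance(part_data, list):
        return ids
    for node in part_data:
        if isinstance(node, dict):
            vid = node.get("version_id")
            if isinstance(vid, str) and vid not in ids:
                ids.append(vid)
    return ids


def details_url(version_id: str) -> str:
    """Build the step-3 details URL for one chapter."""
    return DETAILS_URL.format(version_id=version_id)
```

Typical use: `ids = collect_version_ids(fetch_json(PART_URL))`, then `fetch_json(details_url(vid))` for each `vid` in `ids`. Only direct children of the top-level array are inspected, so any `version_id` nested deeper inside a chapter object is skipped.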
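Step 4 can be sketched as a recursive walk over the parsed `{version_id}.json`, keeping every string that starts with `https` and ends with `.pdf` or `.m3u8`. The sketch does not rely on any particular field names, since the structure of `{version_id}.json` isn't described in this note; it assumes each URL is a complete string value rather than text embedded in a longer string. The helper names are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator

# Suffixes from step 4: PDF documents and m3u8 video playlists.
RESOURCE_SUFFIXES = (".pdf", ".m3u8")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON tree."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def resource_urls(details: Any) -> list[str]:
    """Collect unique https URLs ending in .pdf or .m3u8, in first-seen order."""
    urls: list[str] = []
    for s in iter_strings(details):
        if s.startswith("https") and s.endswith(RESOURCE_SUFFIXES) and s not in urls:
            urls.append(s)
    return urls


def split_by_type(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs into 'pdf' and 'm3u8' lists."""
    return {
        "pdf": [u for u in urls if u.endswith(".pdf")],
        "m3u8": [u for u in urls if u.endswith(".m3u8")],
    }
```

Walking the whole tree instead of reading fixed keys keeps the sketch working if the resource URLs sit at different depths for different resource types.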
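Step 5 can be sketched the same way: every request carries the `X-Nd-Auth` header, with the full `MAC id="...",nonce="...",mac="..."` string supplied by the caller. The nonce in the example appears to start with a millisecond timestamp, so a captured header may expire or be tied to a particular request; this sketch does not generate or refresh it. The `downloads` directory and the helper names are hypothetical, and for `.m3u8` URLs it saves only the playlist file, not the video segments the playlist lists.

```python
from __future__ import annotations

import os
import shutil
import urllib.parse
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Attach the auth header; token is the whole value after 'X-Nd-Auth:'."""
    return urllib.request.Request(url, headers={"X-Nd-Auth": token})


def local_name(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urllib.parse.urlparse(url).path) or "download"


def download(url: str, token: str, dest_dir: str = "downloads") -> str:
    """Stream one resource to dest_dir and return the saved file path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, local_name(url))
    req = build_request(url, token)
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

Call `download(url, token)` for each URL collected in step 4. Files with the same name from different URLs would overwrite each other here, so a real crawler may want to prefix the chapter `version_id` to the file name.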