This post was last edited by susheng on 2022-11-3 22:59
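I previously wrote a batch tool for WeChat official accounts in Python and Golang: https://www.52pojie.cn/thread-1695991-1-1.html . Today I'm sharing the batch download of Toutiao (今日头条) articles. It mainly comes down to scraping this interface.
The code is as follows: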
# Excerpt from a method of the downloader class. get_data(), trim(), headers,
# self.down and self.filename are defined elsewhere in the tool.
# Module-level imports this excerpt needs:
import re
import time
from random import randint

import requests

while True:
    content = get_data()
    time.sleep(2)  # pause between requests
    if not content['data']:
        break  # empty page: nothing left to fetch
    for i in content['data']:
        if not i.get('item_id'):
            continue
        url = 'https://www.toutiao.com/article/' + i['item_id']
        if self.down and i['article_genre'] == 'article':
            res = requests.get(url, verify=False, headers=headers)
            # re.S lets '.' match newlines inside the article body
            match = re.search(r'<div class="article-content">(.*)</article></div>', res.text, re.S)
            if match:  # skip the download if the page layout did not match
                html = '<div class="article-content">' + match.group(1) + '</article></div>'
                try:
                    with open(trim(i['title']) + '.html', 'w', encoding='utf-8') as f:
                        f.write(html)
                except Exception:
                    # title not usable as a file name: fall back to a random name
                    with open(str(randint(1, 10)) + '.html', 'w', encoding='utf-8') as f:
                        f.write(html)
        image_url = ''
        if i.get('image_list'):
            image_url = i['image_list'][0]['url']
        # one CSV line per article; str() guards against numeric counts
        with open(f'{self.filename}.csv', 'a+', encoding='utf-8-sig') as f2:
            f2.write(','.join([
                trim(i['behot_time']), trim(i['title']), url,
                trim(i['abstract']), trim(i['source']), image_url,
                str(i['go_detail_count']), str(i['comments_count']),
            ]) + '\n')  # the original wrote 'n' here instead of a newline
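The code calls trim(), which is not shown in the post. Below is a minimal sketch of what such a helper might look like, assuming its job is only to turn a value into a string that is safe both as a file name and as an unquoted CSV cell. The character set and the max_len parameter are my assumptions, not the author's implementation.

```python
import re

# Hypothetical stand-in for the post's trim(); the original helper is not shown.
# Characters that are illegal in Windows file names, plus the comma and
# line breaks that would split an unquoted CSV cell.
_UNSAFE = re.compile(r'[\\/:*?"<>|,\r\n\t]+')


def trim(value, max_len=100):
    """Return value as a string safe for a file name and a plain CSV cell."""
    text = _UNSAFE.sub(' ', str(value))  # str() also accepts numeric fields
    text = ' '.join(text.split())        # collapse runs of whitespace
    return text[:max_len]                # keep file names reasonably short
```

With a helper like this, a title such as 'a/b: c,d' becomes 'a b c d', so it can be used directly in open() and in the comma-joined line.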
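Result:
There is also a spreadsheet of the article data that opens in Excel, including the Toutiao article date, title, link, abstract, read count, comment count, and so on.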
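Joining fields with ',' by hand breaks a row as soon as a title or abstract contains a comma. Here is a rough sketch of the same export using the standard csv module, which quotes such cells automatically. The item field names are taken from the code in this post; the header names in COLUMNS and the functions item_to_row and append_rows are hypothetical names I made up for illustration.

```python
import csv
import os

# Assumed header names; the post only lists the columns informally.
COLUMNS = ['date', 'title', 'link', 'abstract', 'source',
           'image', 'reads', 'comments']


def item_to_row(item):
    """Map one feed item (field names as in the post's code) to a CSV row."""
    images = item.get('image_list') or []
    return [
        item.get('behot_time', ''),
        item.get('title', ''),
        'https://www.toutiao.com/article/' + str(item['item_id']),
        item.get('abstract', ''),
        item.get('source', ''),
        images[0]['url'] if images else '',
        item.get('go_detail_count', ''),
        item.get('comments_count', ''),
    ]


def append_rows(path, items):
    """Append items to path; write the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # utf-8-sig writes a BOM at the start so Excel detects UTF-8;
    # newline='' lets the csv module control line endings.
    with open(path, 'a', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerows(item_to_row(i) for i in items)
```

Inside the download loop this would be called once per page, for example append_rows(f'{self.filename}.csv', content['data']), after filtering out items without an item_id.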