【Python】How to Deal with a 403 Forbidden Error When Web Scraping

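A 403 Forbidden error means the server understood the request but refuses to fulfill it. It usually happens because of an access-permission problem or because the server is blocking certain kinds of requests. Let's try a few ways to get around it.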

The returned HTML (the contents of soup):

<html> 
<head>
  <title>403 Forbidden</title>
</head> 
<body> 
  <center><h1>403 Forbidden</h1></center> 
  <hr/><center>Microsoft-Azure-Application-Gateway/v2</center> 
</body> 
</html>
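Because this block page is ordinary HTML, passing it to BeautifulSoup raises no error. `find_all` simply returns an empty list and the script prints `[]`. The following is a minimal sketch for detecting the block before parsing. `is_blocked_page` is a hypothetical helper, and the `<title>` check is an assumption that only suits block pages shaped like the one above.

```python
import re

def is_blocked_page(status_code: int, html: str) -> bool:
    """Return True if the response looks like a 403 block page (heuristic)."""
    if status_code == 403:
        return True
    # Fallback heuristic (assumption): look for "403 Forbidden" in the <title>.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return bool(match and "403 Forbidden" in match.group(1))

block_page = "<html><head><title>403 Forbidden</title></head><body></body></html>"
print(is_blocked_page(200, block_page))  # True
```

With requests, you would call it as `is_blocked_page(response.status_code, response.text)` right after the request and stop before parsing. Checking `response.status_code` alone is normally enough; the title check only matters if a server returned such a page with a non-403 status.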

1. Setting the User-Agent

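Setting the User-Agent: Some websites reject access from bots. By default, requests identifies itself with a User-Agent of the form python-requests/<version>, which is easy for a server to recognize and block. Try adding a User-Agent header that mimics a real browser to your request.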

import requests
from bs4 import BeautifulSoup

search_key = "your_search_keyword"  # Replace with the keyword you want to search for
url = f"https://www.torecolo.jp/shop/goods/search.aspx?ct2=2074&search=x&keyword={search_key}&search=search"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

data = []
cardlist = soup.find_all(class_='block-thumbnail-t--goods')

for card in cardlist:
    description = card.find(class_='block-thumbnail-t--goods-description')
    if description:
        data.append(description.text.strip())

print(data)

In my case, adding this header (setting the User-Agent) was enough to solve the problem.
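To see why the header matters, you can inspect the User-Agent that requests would send without actually sending anything. The sketch below uses `Session.prepare_request`; `outgoing_user_agent` is a hypothetical helper name, and no request reaches the site.

```python
from typing import Optional

import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)

def outgoing_user_agent(url: str, headers: Optional[dict] = None) -> str:
    """Return the User-Agent requests would send for url (nothing is sent)."""
    with requests.Session() as session:
        # prepare_request merges the session's default headers with ours.
        prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
    return prepared.headers["User-Agent"]

url = "https://www.torecolo.jp/"
print(outgoing_user_agent(url))                              # python-requests/ plus the installed version
print(outgoing_user_agent(url, {"User-Agent": BROWSER_UA}))  # the browser string above
```

The first value is what a bare `requests.get(url)` sends; passing `headers` replaces it with the browser string.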

2. Using a Session

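Using a session: requests.Session stores cookies and reuses them on later requests, so the site sees one continuous session. Headers set on session.headers are also sent with every request. Some sites only allow access when this is the case.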

import requests
from bs4 import BeautifulSoup

search_key = "your_search_keyword"  # Replace with the keyword you want to search for
url = f"https://www.torecolo.jp/shop/goods/search.aspx?ct2=2074&search=x&keyword={search_key}&search=search"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

session = requests.Session()
session.headers.update(headers)

response = session.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

data = []
cardlist = soup.find_all(class_='block-thumbnail-t--goods')

for card in cardlist:
    description = card.find(class_='block-thumbnail-t--goods-description')
    if description:
        data.append(description.text.strip())

print(data)
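The sketch below shows, without any network access, what a session carries over between requests: headers set on `session.headers` and cookies stored in `session.cookies` are attached to every request the session prepares. The cookie name and value are hypothetical stand-ins for whatever the server actually sets on an earlier response.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)

def prepared_headers(session: requests.Session, url: str):
    """Build (but do not send) the request session.get(url) would make."""
    return session.prepare_request(requests.Request("GET", url)).headers

session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})
# Hypothetical cookie, standing in for one the server set on a previous response.
session.cookies.set("example_session_id", "abc123", domain="www.torecolo.jp")

for url in ["https://www.torecolo.jp/", "https://www.torecolo.jp/shop/goods/search.aspx"]:
    headers = prepared_headers(session, url)
    print(headers["User-Agent"] == BROWSER_UA, headers.get("Cookie"))  # True example_session_id=abc123
```

A plain `requests.get` call starts from an empty cookie jar each time, which is why a session can behave differently on sites that expect cookies from an earlier visit.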

3. Setting the Referer

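Setting the Referer: If the site checks where a request came from, adding a Referer header may help. (The HTTP header name really is spelled "Referer".) requests does not send this header unless you set it yourself.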

import requests
from bs4 import BeautifulSoup

search_key = "your_search_keyword"  # Replace with the keyword you want to search for
url = f"https://www.torecolo.jp/shop/goods/search.aspx?ct2=2074&search=x&keyword={search_key}&search=search"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Referer': 'https://www.torecolo.jp/'
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

data = []
cardlist = soup.find_all(class_='block-thumbnail-t--goods')

for card in cardlist:
    description = card.find(class_='block-thumbnail-t--goods-description')
    if description:
        data.append(description.text.strip())

print(data)
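Hard-coding the Referer works for one site. As a small generalization, the sketch below derives the Referer from the target URL's origin (its top page). `origin_referer` is a hypothetical helper, and it assumes the site accepts its own top page as the referrer, which is not guaranteed.

```python
from urllib.parse import urlsplit

def origin_referer(url: str) -> str:
    """Return 'scheme://host[:port]/' for url, i.e. the site's top page."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

url = "https://www.torecolo.jp/shop/goods/search.aspx?ct2=2074&search=x&keyword=test&search=search"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Referer": origin_referer(url),
}
print(headers["Referer"])  # https://www.torecolo.jp/
```

These headers can be passed to `requests.get(url, headers=headers)` exactly as in the example above.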