chatbot/scraper_functions.py

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote

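def query_external_website(base_url, query):
    """Fetch base_url + query and return the page title, first paragraph, and link as text."""
    try:
        # Percent-encode the query so multiword searches produce a valid URL.
        url = base_url + quote(query)
        # A timeout keeps the chatbot from hanging on an unresponsive site.
        page = requests.get(url, timeout=10)
        soup = BeautifulSoup(page.content, "html.parser")
        title = soup.find("span", class_="mw-page-title-main").text
        # Take the first <p> without a class attribute (skips e.g. class="mw-empty-elt").
        content = next((paragraph for paragraph in soup.find(id="mw-content-text").select("p") if not paragraph.has_attr("class")), None)
        if content is None:
            raise ValueError("Can't parse")
        return "\nTITLE:\n" + title + "\n\nCONTENT:\n" + content.text + "\n\nFULL LINK:\n" + url
    except Exception:
        # Network errors and missing page elements all fall back to the same message.
        return "Can't parse search result :("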
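
The URL-building step above relies on `urllib.parse.quote` to make multiword queries safe. Below is a minimal sketch of that step in isolation; `build_query_url` is a hypothetical helper, not part of this file, and the Wikipedia base URL is only an assumed example of what a caller might pass as `base_url`.

```python
from urllib.parse import quote


def build_query_url(base_url, query):
    """Hypothetical helper: join a base URL and a percent-encoded query."""
    # quote() encodes spaces as %20 and leaves "/" untouched by default.
    return base_url + quote(query)


# Assumed example base URL; the real value is supplied by the caller.
example_url = build_query_url("https://en.wikipedia.org/wiki/", "New York City")
print(example_url)
```

Without `quote`, the spaces in a query such as "New York City" would be placed into the URL unencoded, which appears to be what the "Correct url link from multiword wiki query" commit addressed.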
Added scraper function 2024-02-06 01:21:53 +00:00
Added try catch for scraper functions 2024-02-06 01:37:19 +00:00
Correct url link from multiword wiki query 2024-02-06 01:58:45 +00:00
Get first available paragraph from query 2024-02-06 02:34:43 +00:00
Raise exception on None content 2024-02-06 02:37:26 +00:00