web scraping - How to Find Link Associated with Keyword using Python, Requests, and Beautiful Soup
I am very new to Python, Requests, and Beautiful Soup, so my code is probably really bad.
What I have now:
    f = open('sites.txt','r')
    sitelist = []
    for line in f:
        sitelist.append(line.strip())
    getsites = ['']
    print(sitelist)
    for i in range(len(sitelist)):
        getsites.append(sitelist[i])
    for i in range(len(sitelist)):
        temp = requests.get(sitelist[i])
        data = temp.text
        soup = BeautifulSoup(data, "html.parser")
        for url in soup.find_all("Yeezy"):
            print(element.find_previous_sibling('loc'))
            print(url.text)
Example of the XML file I am parsing:
    <url>
      <loc> https://www.a-ma-maniere.com/products/beanie-502805f16-black-white </loc>
      <lastmod>2016-12-24T22:25:05Z</lastmod>
      <changefreq>daily</changefreq>
      <image:image>
        <image:loc> https://cdn.shopify.com/s/files/1/0626/9065/products/502805F16-1.jpg?v=1472499019 </image:loc>
        <image:title>Alexander Wang: Beanie (Black/White)</image:title>
      </image:image>
    </url>
What I want to do is find a keyword in the <image:title> tag, then print the link associated with it, which is stored in <loc>.
find_all needs a tag name to search for, not a keyword. If you only want tags of that type that contain the word "Yeezy", check inside your for loop whether the tag's text contains that string. If it does, you have the element you want and can print its URL.
For most pages, where links are <a> tags, this is simply:
    for url in soup.find_all('a'):
        if "Yeezy" in url.get_text():
            print(url['href'])
For your XML, it is more like:
    for url in soup.find_all('url'):
        # skip entries that have no image title or no <loc>
        if url.find('image:title') and url.loc:
            if "Yeezy" in url.find('image:title').get_text():
                print(url.find('image:loc').get_text())
For more information, see the get_text() documentation (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text)
Because the URL you get at this point is an image, you might want to look at this answer (https://stackoverflow.com/questions/32853980/temporarily-retrieve-an-image-using-the-requests-library) as well. You'll need a library that can read and store images rather than trying to handle the image as a built-in Python object.