Did you notice our Python tips lately? They looks more sexy, don't they? That's thanks to Carbon which lets you create beautiful images of your source code. As much as I love its interface though, what if we can automate this process generating the image for us?
That's what we did and posting new tips to Twitter is now a breeze. In this article I will show you how using a bit of BeautifulSoup
and selenium
. Enjoy!
Getting Ready
If you want to follow along, the final code is here.
First make a virtual environment and pip install
the requirements. Also make sure you have the chromedriver in your PATH
.
[bobbelderbos@imac code]$ mkdir carbon && cd $_
[bobbelderbos@imac carbon]$ alias pvenv
alias pvenv='/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m venv venv && source venv/bin/activate'
[bobbelderbos@imac carbon]$ pvenv
(venv) [bobbelderbos@imac carbon]$ python -V
Python 3.7.0
(venv) [bobbelderbos@imac carbon]$ pip install -r requirements.txt
...
Successfully installed beautifulsoup4-4.7.1 bs4-0.0.1 certifi-2018.11.29 chardet-3.0.4 idna-2.8 requests-2.21.0 selenium-3.141.0 soupsieve-1.8 urllib3-1.24.1
(venv) [bobbelderbos@imac carbon]$ ls ~/bin/chromedriver
/Users/bobbelderbos/bin/chromedriver
Introducing our Platform Tips
At the time of this writing we have 92 tips on our platform:
Let's inspect the tip html we need to parse with BeautifulSoup:
Each tip is wrapped in a tr
where the tip text is in a blockquote
and the code in pre
tags.
We also want to know if a tip has already been shared out by inspecting the Twitter link. In this example it has pybites/status which means we did.
By the way, I didn't realize it at the time of coding this, but we did make an API GET endpoint some time ago. If there is an API it's preferred to use that to retrieve data. However knowing BeautifulSoup
might come in handy too :)
I will do a follow-up post how to convert this into a Tips API ...
Bootstrapping the script
Before scraping the tips page, let's define the overall structure of the script:
from collections import namedtuple
from random import choice
import sys
from time import sleep
import urllib.parse
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
TIPS_PAGE = 'https://codechalleng.es/tips'
PYBITES_HAS_TWEETED = 'pybites/status'
CARBON = 'https://carbon.now.sh/?l=python&code={code}'
TWEET_BTN_CLASS = 'jsx-2739697134'
TWEET = '''{tip} {src}
🐍 Check out more @pybites tips at https://codechalleng.es/tips 💡
(image built with @carbon_app)
{img}
'''
Tip = namedtuple('Tip', 'tip code src')
def retrieve_tips():
"""Grab and parse all tips from https://codechalleng.es/tips
returning a dict of keys: tip IDs and values: Tip namedtuples
"""
pass
def get_carbon_image(tip):
"""Visit carbon.now.sh with the code, click the Tweet button
and grab and return the Twitter picture url
"""
pass
if __name__ == '__main__':
tips = retrieve_tips()
if len(sys.argv) == 2:
tip_id = int(sys.argv[1])
else:
tip_id = choice(list(tips.keys()))
tip = tips.get(tip_id)
if tip is None:
print(f'Could not retrieve tip ID {tip_id}')
sys.exit(1)
src = tip.src and f' - see {tip.src}' or ''
img = get_carbon_image(tip)
tweet = TWEET.format(tip=tip.tip, src=src, img=img)
print(tweet)
OK step by step:
-
We are going to use some nice stdlib modules like
collections
andrandom
. Secondly we import the external modules we just installed. -
We define some constants.
TWEET
is the template of the tweet we want to build. I will explain why we needTWEET_BTN_CLASS
when we get to the carbon section ... -
I define a
namedtuple
(basically aclass
without behavior) to hold a tip. -
We are going to do the work in two functions:
retrieve_tips
andget_carbon_image
(I probably should add some type hinting on the next iteration ...) -
Under
if __name__ == '__main__'
(aka "I, the script, am called, not imported") I define two ways to call the script:-
with exactly one argument (
len(sys.argv) == 2
,sys.argv[0]
is the script name), retrieving the tip by numeric ID (see the first column of the tips table). -
with any other number of arguments, taking a random tip from the retrieved
dict
usingrandom.choice
.
-
-
It retrieves the tip from the
tips dict
and creates the tweet using theTWEET
template and prints it to stdout. For now I am happy to manually post it (see demo at the end).
Parsing the tips
At this point the code at best will throw an AttributeError
, because our tips dict
is empty. So let's write retrieve_tips
to populate it:
def retrieve_tips():
"""Grab and parse all tips from https://codechalleng.es/tips
returning a dict of keys: tip IDs and values: Tip namedtuples
"""
Firt we need to retrieve the page with requests
:
html = requests.get(TIPS_PAGE)
We then instantiate a BeautifulSoup
object passing it in the response text and parser:
soup = BeautifulSoup(html.text, 'html.parser')
As we saw all the tips are in a table, each one in a table row or tr
, so let's get all of them:
trs = soup.findAll("tr")
Next let's use a data structure to store the tips. At first I used a list
but later I wanted to index by tip ID, so a dict
turned out to be more appropriate:
tips = {}
Next let's loop through the rows, creating Tip namedtuple
s and adding them to our tips dict
:
for tr in trs:
tds = tr.find_all("td")
id_ = int(tds[0].text.strip().rstrip('.'))
tip_html = tds[1]
links = tip_html.findAll("a", class_="left")
share_link = links[0].attrs.get('href')
pre = tip_html.find("pre")
code = pre and pre.text or ''
# skip if tweeted or not code in tip
if PYBITES_HAS_TWEETED in share_link or not code:
continue
tip = tip_html.find("blockquote").text
src = len(links) > 1 and links[1].attrs.get('href') or ''
tips[id_] = Tip(tip, code, src)
Step by step:
-
First I get all table cells or
td
elements in the table row. -
I parse the tip ID from the first cell, stripping off the dot and storing it in a variable called
id_
. -
The tip html is in the second table cell.
-
I check if there are any links in the tips html. As you scroll down on the tips page there is always a share link (), and an optional second resource link ().
-
I put the tip code in a variable called
code
. -
Then I check if the tip was already shared on Twitter by checking if
PYBITES_HAS_TWEETED
(pybites/status) is in the share link, or if the tip does not contain any code (some are just quotes). In these instances we want to exclude the tip. -
Next we put the tip text in a variable called
tip
. -
We add the resource link in a variable called
src
. As it's optional we first check the length of thelinks
list. -
Finally we make the
Tip namedtuple
and assign it as value to theid_
key in thetips dict
.
Lastly we return the tips dict
:
return tips
Let's see if this works using pprint
:
Running this it outputs:
Great. Let's make a carbon image next.
Beautiful images of your source code
Meet carbon:
It allows you to add code, choose a language and configure other settings, then generate the image and/or tweet it out. It's really nice!
While playing with the interface I found that clicking the Tweet button it would generate a shareable picture hosted on Twitter. For example clicking the Tweet button I get this popup:
And that Twitter link shows the generated code snippet image:
We can automate this using Selenium to click the Tweet button, capturing the generated image link. This is why I defined the TWEET_BTN_CLASS
constant which is the class set on this button.
Use Selenium to create tip code image
Let's write the second function get_carbon_image
:
def get_carbon_image(tip):
"""Visit carbon.now.sh with the code, click the Tweet button
and grab and return the Twitter picture url
"""
First we need to encode (replace special characters) the tip code snippet. quote_plus
(from urllib.parse
) also replaces spaces by plus signs, as required for quoting HTML form values when building up a query string to go into a URL (see docs).
code = urllib.parse.quote_plus(tip.code)
With that done we define the full url:
url = CARBON.format(code=code)
We then start the Chromedriver. Unlike last time I am not going to use headless mode here, because I'd actually like to see what Selenium is doing:
driver = webdriver.Chrome()
driver.get(url)
Here we locate mentioned TWEET_BTN_CLASS
(jsx-2739697134) button and click on it:
driver.find_element_by_class_name(TWEET_BTN_CLASS).click()
Trial and error taught me that this might take a bit so I use sleep
:
sleep(5)
Retrieve the image from popup
And here is the tricky part. The Tweet button opened a popup but the driver is still on the main browser page window (see seconds 15 and 41 of the demo below).
You can toggle windows though using driver.switch_to.window
:
window_handles = driver.window_handles
driver.switch_to.window(window_handles[1])
Now I am on the Twitter popup window and I can target the status ID field and grab the image URL from it:
status = driver.find_element_by_id('status')
img = status.text.split(' ')[-1]
Finally I quit the driver
(this closes the browser) and return the image string:
driver.quit()
return img
See it in action
Here you can see this automation script in action, generating an image from a random tip as well as when specifying a specific ID:
And voilà: two new tips I could post to our Twitter (here and here).
Note that after manually tweeting it out as @pybites, we set the obtained tweet URL on the tip (in the DB) so it's not selected upon next run (the if PYBITES_HAS_TWEETED in share_link
check above).
Room for improvement
Here are some things we can do to take it to the next level:
- Auto-post the tweet to Twitter (we already have the code for this).
- Make a Tips API (pending article):
- allow
GET
to retrieve a tip andPOST
to receive new ones, - do a
PUT
request on the tip with the tweet link after running this script and/or auto-posting to Twitter (1.)
- allow
Feel free to PR any of this here.
I hope you enjoyed this and it inspired you to build your own automation scripts. To learn how to run Selenium in headless mode on Heroku, check out our article from last week.
Feel free to share more Python tips on our platform.
Question: what would you like us to write about more? You can drop us an email or brainstorm with us and our amazing community on our Slack. We do accept guest posts!
Keep Calm and Code in Python!
-- Bob
from Planet Python
via read more
No comments:
Post a Comment