Wednesday, March 13, 2019

Automate Web Page To PDF



I thought it might be nice to offer all my readers (yes, both of you!) the option to download a PDF of each of my articles, on the off chance that someone finds it a useful service.

It could also serve as an extra backup of my posts, I guess.

 

No Sign Up, No Login, Cool!

I found two good free sites that would do this without making me sign up or drowning me in adverts (I use the uBlock Origin ad blocker, btw). They are:

http://www.html2pdf.it/ and http://www.web2pdfconvert.com/

So I started copying and pasting the URLs of each of my 52 Python posts, one by one, and after doing 26 of them I got bored and frustrated.

As I am now starting to think a bit like a programmer, I remembered a book I once read called “Automate the Boring PDF Stuff”, or something similar 🙂, and thought maybe I could do something with Python to help ease the process.

So I ran a few tests.

No Login = No Hassle Man

I copied the URL of the next webpage that I wanted to convert to the clipboard, and went to the first converter site.

Both actions can be scripted with no problem: pyperclip.copy(url) from the pyperclip module puts the page's URL on the clipboard, and webbrowser.open(url) from the webbrowser module opens the converter site in the default browser.


Once the page at http://www.html2pdf.it is loaded, you will see an entry box for the URL to convert. There is no login, which saves us a load of hassle.

A Ctrl-V (paste) from the keyboard put my URL directly where it should go, into the entry box, luckily without my having to click anywhere first. Pyperclip only reads and sets the clipboard, so in a script the keystroke itself comes from PyAutoGUI, using pyautogui.hotkey('ctrl', 'v').


All I needed to do now was to click the “Get PDF now!” button on the page.

Clicking a button can get a little tricky, although it is doable in PyAutoGUI, so I tried the Enter key just in case, and it worked. Awesome, this is easy to do in a script using the command pyautogui.press('enter').

 

I Freaking Love This Module

I freaking love the PyAutoGUI module. It’s written for humans, not genius scientists or engineers; even a doofus like me can use it fairly confidently.

To recap: so far we have worked out how to load the webpage, paste a URL into the entry box and press Enter, all automagically with a few lines of Python.

After that, the website goes off and does the conversion, then opens the resulting PDF in a new tab in my browser.

Well, this is what happens in my Firefox browser. It may be different in your browser.


Saving the PDF is a simple case of pressing Ctrl-S on the keyboard, like this:

pyautogui.hotkey('ctrl', 's')

And we have a working program.

However, it only does one file at a time, and we would need to feed it a new URL each time, which hardly saves any time or hassle.

To be useful, and really automate the task, it would need to take either a list of URLs, or just do a whole site in one click.
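
A rough sketch of how the list-of-URLs idea might look. Nothing here comes from the script below: convert_one is a hypothetical function that would wrap the single-page steps (copy, open, paste, Enter, Ctrl-S, Enter), and the one-URL-per-line text file is just an assumed input format.

```python
import time


def read_urls(path):
    """Read one URL per line from a text file, skipping blanks and # comments."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith("#")]


def convert_many(urls, convert_one, pause=5.0):
    """Call convert_one(url) for each URL in the list, pausing between pages.

    convert_one is a hypothetical function wrapping the single-page steps.
    Returns the list of URLs that raised an error, so they can be retried.
    """
    failed = []
    for index, url in enumerate(urls):
        try:
            convert_one(url)
        except Exception as error:  # keep going if one page fails
            print(f"Failed on {url}: {error}")
            failed.append(url)
        if index < len(urls) - 1:
            time.sleep(pause)  # give the free converter site a breather
    return failed
```

With something like convert_many(read_urls("urls.txt"), convert_one), where urls.txt is a made-up file name, the whole batch would run from one command. The pause between pages is a guess; hammering a free service quickly may be one way to get blocked.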

Blocked Irony

Here’s the irony: by the time I got to this stage I had converted all 52 of my webpages 🙂, and I kind of lost enthusiasm for writing the more difficult part of the code.

What is worse, I then found I was, and still am, blocked by the website. Or maybe the server is down? I don’t know yet.

But this test code could still be useful to fellow beginners as an example of how to go about automating a website service, maybe?
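
One way to tell "server down or refusing me" apart from "my script is broken" might be to ask the site for its front page before launching the browser. This is only a sketch using the standard library: site_is_up and pick_converter are made-up names, and the User-Agent header is an assumption about what a picky server might want to see.

```python
import urllib.error
import urllib.request


def site_is_up(url, timeout=10):
    """Return True if url answers an HTTP request without an error status."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTPError (for example a 403) is a subclass of URLError;
        # timeouts and refused connections surface as OSError.
        return False


def pick_converter(sites, check=site_is_up):
    """Return the first site for which check(site) is true, else None."""
    for site in sites:
        if check(site):
            return site
    return None
```

Passing both converter sites to pick_converter would give a simple backup option. A site that answers this check could still block the real browser session, so a True result only rules out the server being down.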

The Other Site

By the way, the second site I mentioned, http://www.web2pdfconvert.com/, works in almost the same way. Pasting the URL and pressing Enter to convert works fine, but instead of immediately displaying the PDF, the site makes you click a “Download” button.

There does not appear to be a keyboard option here, so in PyAutoGUI we would have to use the image search feature and supply a screenshot of the button for PyAutoGUI to find on screen and click, I think.

Maybe there is an easier way using mouse coordinates, but I’m not sure that it would work on different setups, screen sizes etc.
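
The button search could be sketched roughly like this. It is untested against the real site: click_button is a made-up helper, download_button.png stands for a screenshot of the button that you would have to capture yourself, and PyAutoGUI's image search also needs the Pillow package installed. Depending on the PyAutoGUI version, a failed search either returns None or raises an exception, so the sketch treats both as "not found yet".

```python
import time


def click_button(image_path, gui=None, timeout=10.0, poll=0.5):
    """Look for image_path on screen and click its centre.

    Retries until timeout seconds have passed.
    Returns True if the button was found and clicked, otherwise False.
    """
    if gui is None:
        import pyautogui as gui  # imported lazily; needs a real desktop session
    deadline = time.monotonic() + timeout
    while True:
        try:
            point = gui.locateCenterOnScreen(image_path)
        except Exception:
            # Some PyAutoGUI versions raise instead of returning None.
            point = None
        if point is not None:
            gui.click(point[0], point[1])
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A call such as click_button("download_button.png") would then replace the Enter key press for this site. Because it matches pixels rather than fixed coordinates, it should cope with the button moving around, but a different browser theme or zoom level would probably need a fresh screenshot.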


So, here is the code so far. Feel free to make it better and call it your own. I don’t mind, just give this page a shout in return.

 


 

'''
Web page to PDF V0.1 (just a test to see if it would work)
By Steve Shambles March 2019

Works in latest Firefox browser on Windows 7.
It may or may not work on other browsers and OS's?

You may need to:
"pip install pyautogui"
"pip install pyperclip"

More outrageously incompetent code at:
https://stevepython.wordpress.com/
'''

# Features I might try to add:
# Right click and paste for URL dialog
# Rename saved PDFs, as the default is crazy long filenames.
# Option to grab all pages on a site and convert each.
# A simple GUI
# A back up site in case first blocks or not working.
# User agent?

import time
import webbrowser
from tkinter import Tk, simpledialog
import pyautogui
import pyperclip

ROOT = Tk()

# Stop naff GUI window from showing, for now.
ROOT.withdraw()

# Website that does the conversion.
pdf_url = "http://www.html2pdf.it/"

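# Ask user for the URL of page to convert via GUI input.
# You can use ctrl-v to paste a URL into the box.
users_url = simpledialog.askstring(title='Convert To PDF',
                                   prompt='URL to convert:')

# Stop if the user cancelled the dialog or entered nothing,
# because pyperclip.copy() cannot copy None.
if not users_url:
    raise SystemExit("No URL entered.")

# Copy user's URL to the system clipboard.
pyperclip.copy(users_url)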

# Open PDF converter page in default web browser.
webbrowser.open(pdf_url)
time.sleep(3)

# Paste user's URL into PDF converter site's entry box.
pyautogui.hotkey('ctrl', 'v')
time.sleep(2)
# Send enter key to start conversion.
pyautogui.press('enter')

# Not sure how long to wait while PDF converts, experiment.
time.sleep(8)

# Now pdf should appear in browser.
# To save it we need to do a ctrl-s.
pyautogui.hotkey('ctrl', 's')

# File save dialog should come up now.
# We just need to send the enter key to save it
# to the browser's default save location.

# Pause here a little, so user can take control
# and edit the save name of the pdf if wants to.
time.sleep(3)
pyautogui.press('enter')

print("PDF saved")
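
The feature list at the top of the script mentions renaming the saved PDFs because of the crazy long default filenames. A minimal sketch of that idea, assuming the browser saves into a known folder and that the newest PDF in it is the one just saved (rename_newest_pdf and both of its arguments are made up for illustration):

```python
from pathlib import Path


def rename_newest_pdf(folder, new_name):
    """Rename the most recently modified .pdf in folder to new_name.

    Returns the new Path, or None if the folder holds no PDFs.
    """
    pdfs = sorted(Path(folder).glob("*.pdf"), key=lambda p: p.stat().st_mtime)
    if not pdfs:
        return None
    newest = pdfs[-1]
    target = newest.with_name(new_name)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    newest.rename(target)
    return target
```

It would need calling a few seconds after the final Enter key press, so the browser has time to finish writing the file.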

Using Python V3.6.5 32bit, on Windows 7 64bit
