Automating Web Page To PDF
I thought it might be nice to offer all my readers (yes, both of you!) the option to download a PDF of each of my articles, on the off chance that someone finds it useful.
It could also serve as an extra backup of my posts, I guess.
No Sign Up, No Login, Cool!
I found two good free sites that would do this without making me sign up or drowning me in adverts (I use the uBlock Origin ad blocker, btw). They are:
http://www.html2pdf.it/ and http://www.web2pdfconvert.com/
So I started copying and pasting the URLs of each of my 52 Python posts, one by one. After doing 26 of these I got bored and frustrated.
As I am now starting to think a bit like a programmer, I remembered a book I once read called “Automate the Boring PDF Stuff”, or something similar, and thought maybe I could do something with Python to help ease the process.
So I ran a few tests.
No Login = No Hassle Man
I copied the URL of the next webpage that I wanted to convert to the clipboard, and went to the first converter site.
Both actions can be scripted with no problem: pyperclip.copy() from the pyperclip module puts the page's URL on the clipboard, and webbrowser.open() from the webbrowser module opens the converter site in the default browser.
Once the page at http://www.html2pdf.it is loaded, you will see an entry box for the URL to convert. There is no login, which saves us a load of hassle.
Pressing Ctrl-V (paste) on the keyboard put my URL directly where it should go, into the entry box, luckily without my having to click anywhere first. Pyperclip has already put the URL on the clipboard, so the paste itself is a single PyAutoGUI call: pyautogui.hotkey('ctrl', 'v').
All I needed to do now was to click the “Get PDF now!” button on the page.
Clicking a button can get a little tricky, although it is doable in PyAutoGUI. I tried the Enter key first, just in case, and it worked. Awesome! That is easy to do in a script using the command pyautogui.press('enter').
I Freaking Love This Module
I freaking love the PyAutoGUI module. It’s written for humans, not genius scientists or engineers; even a doofus like me can use it fairly confidently.
To recap: so far we have worked out how to load the webpage, paste a URL into the entry box and press Enter, all automagically, with a few lines of Python.
After that, the website goes off and does the conversion, then opens the resulting PDF in a new tab in my browser.
Well, this is what happens in my Firefox browser. It may be different in your browser.
Saving the PDF is a simple case of pressing Ctrl-S on the keyboard, then Enter to accept the save dialog that pops up. Like this:
pyautogui.hotkey('ctrl', 's')
pyautogui.press('enter')
And we have a working program.
However, it only does one file at a time, and we would need to feed it a new URL each time, which hardly saves any time or hassle.
To be useful, and really automate the task, it would need to take either a list of URLs, or just do a whole site in one click.
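A minimal sketch of how the list-of-URLs idea might look, assuming the same clipboard, paste, Enter, Ctrl-S sequence described above still works on the site. The function names (parse_urls, convert_one, convert_all), the pause between pages and all the sleep timings are assumptions made up for illustration; none of this is part of the test script.

```python
import time


def parse_urls(text):
    """Return the non-blank, non-comment lines of text as a list of URLs."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def convert_one(url, converter="http://www.html2pdf.it/"):
    """Drive the browser through one conversion (all timings are guesses)."""
    # Imported here so the helpers in this file load without a display.
    import webbrowser
    import pyautogui
    import pyperclip

    pyperclip.copy(url)            # put the page URL on the clipboard
    webbrowser.open(converter)     # open the converter site
    time.sleep(3)
    pyautogui.hotkey("ctrl", "v")  # paste into the entry box
    time.sleep(2)
    pyautogui.press("enter")       # start the conversion
    time.sleep(8)
    pyautogui.hotkey("ctrl", "s")  # save the PDF shown in the new tab
    time.sleep(3)
    pyautogui.press("enter")       # accept the default file name


def convert_all(urls, convert=convert_one, pause=5):
    """Convert each URL in turn; return the URLs that raised an error."""
    failed = []
    for url in urls:
        try:
            convert(url)
        except Exception as err:   # keep going if one page goes wrong
            print("Failed:", url, err)
            failed.append(url)
        time.sleep(pause)          # go easy on the free service
    return failed
```

With a text file holding one URL per line (say a hypothetical urls.txt), the whole batch would then be convert_all(parse_urls(open("urls.txt").read())). The pause between pages is there on the guess that hammering a free service with 52 requests in a row is a good way to get blocked.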
Blocked Irony
Here’s the irony: by the time I got to this stage I had converted all 52 of my webpages, and I kind of lost enthusiasm for writing the more difficult part of the code.
What is worse, I then found I was, and still am, blocked by the website. Or maybe the server is down? I don’t know yet.
Still, this test code could be useful to fellow beginners as an example of how to go about automating a website service, maybe?
The Other Site
By the way, the second site I mentioned, http://www.web2pdfconvert.com/, works in almost the same way. Pasting the URL and pressing Enter to convert works fine, but instead of immediately displaying the PDF it makes you click on a “Download” button.
There does not appear to be a keyboard option here, so in PyAutoGUI we would have to use the locate-on-screen feature and supply a screenshot of the button for PyAutoGUI to find and click on, I think.
Maybe there is an easier way using mouse coordinates, but I’m not sure that it would work on different setups, screen sizes etc.
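As a rough sketch of the image-search approach, assuming you have cropped your own screenshot of the button and saved it as download_button.png (a hypothetical file name): PyAutoGUI's locateCenterOnScreen() looks for that picture on screen and returns the coordinates of its centre. The helper names, timeout and polling interval below are made up for illustration, and the image matching needs the Pillow package installed.

```python
import time


def wait_for(locate, timeout=30, poll=1.0):
    """Call locate() until it returns something other than None, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        found = locate()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def find_download_button(image="download_button.png"):
    """Look for the button screenshot on screen; return its centre or None."""
    import pyautogui  # imported here so wait_for() loads without a display

    try:
        return pyautogui.locateCenterOnScreen(image)
    except pyautogui.ImageNotFoundException:
        # Newer PyAutoGUI versions raise instead of returning None.
        return None


def click_download_button(timeout=30):
    """Click the button once it appears; return True if it was found."""
    import pyautogui

    spot = wait_for(find_download_button, timeout=timeout)
    if spot is None:
        return False
    x, y = spot
    pyautogui.click(x, y)
    return True
```

This does not depend on fixed mouse coordinates, so it should cope with the button moving around, but it does rely on the button looking the same as in the screenshot, so a different browser zoom level or display scaling could still trip it up.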
So, here is the code so far. Feel free to make it better and call it your own; I don’t mind. Just give this page a shout in return.
'''
Web page to PDF V0.1
(just a test to see if it would work)

By Steve Shambles March 2019

Works in latest Firefox browser on Windows 7.
It may or may not work on other browsers and OS's?

You may need to:
"pip install pyautogui"
"pip install pyperclip"

More outrageously incompetent code at:
https://stevepython.wordpress.com/
'''
# Features I might try to add:
# Right click and paste for URL dialog
# Rename saved PDF's as default is crazy long filenames.
# Option to grab all pages on a site and convert each.
# A simple GUI
# A back up site in case first blocks or not working.
# User agent?

import time
import webbrowser
from tkinter import Tk, simpledialog

import pyautogui
import pyperclip

ROOT = Tk()
# Stop naff GUI window from showing, for now.
ROOT.withdraw()

# Website that does the conversion.
pdf_url = "http://www.html2pdf.it/"

# Ask user for the URL of page to convert via GUI input.
# You can use Ctrl-V to paste a URL into the box.
users_url = simpledialog.askstring(title='Convert To PDF',
                                   prompt='URL to convert:')

# askstring returns None if the user cancels, so stop here if no URL given.
if not users_url:
    raise SystemExit("No URL entered.")

# Copy users URL to the system clipboard.
pyperclip.copy(users_url)

# Open PDF converter page in default web browser.
webbrowser.open(pdf_url)
time.sleep(3)

# Paste users URL into PDF converter sites entry box.
pyautogui.hotkey('ctrl', 'v')
time.sleep(2)

# Send enter key to start conversion.
pyautogui.press('enter')

# Not sure how long to wait while PDF converts, experiment.
time.sleep(8)

# Now pdf should appear in browser.
# To save it we need to do a ctrl-s.
pyautogui.hotkey('ctrl', 's')

# File save dialog should come up now.
# We just need to send the enter key to save it
# to the browser's default save location.
# Pause here a little, so user can take control
# and edit the save name of the pdf if wants to.
time.sleep(3)
pyautogui.press('enter')

print("PDF saved")
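For the "rename saved PDFs" item on the features list, one possible starting point is to build a short file name from the last part of the page's URL. This is only a sketch: the function name pdf_name_from_url and the 60 character limit are arbitrary choices made for illustration, and it has not been wired into the script above.

```python
import re
from urllib.parse import urlparse


def pdf_name_from_url(url, max_length=60):
    """Build a short, filesystem-friendly PDF name from the last part of a URL."""
    parts = urlparse(url)
    path = parts.path.rstrip("/")
    # Use the final path segment, or the host name if there is no path.
    slug = path.rsplit("/", 1)[-1] or parts.netloc
    # Replace anything that is not a letter or digit with a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", slug).strip("-").lower()
    return (slug[:max_length].rstrip("-") or "page") + ".pdf"
```

The resulting name could then be typed into the file save dialog before the final Enter, assuming the file name box is selected when the dialog opens.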
Using Python V3.6.5 32bit, on Windows 7 64bit