What Will I Learn?
I intend to cover the following concepts in this part of the tutorial:
- You will learn how to navigate through the pages of a website.
- You will learn different functions and methods for interacting with forms.
- You will learn to build a basic form filling bot.
Requirements
To follow this tutorial clearly, you should have:
- Basic knowledge of the Python programming language.
- A PC with Python 3+ installed (for practical understanding).
- Read my previous tutorial on MechanicalSoup, since this part continues from it.
Difficulty
- Basic
Tutorial Contents
So let's continue our journey to learn MechanicalSoup. As stated above, we will learn how to automate interactions with a webpage using MechanicalSoup, much as we would interact with it through a browser. This comes in handy if you are into creating bots and web scrapers.
1. Navigation
First of all let's look at how to navigate through pages in a website. We know that every page in a website has a unique URL associated with it. While using a browser we click on links available on the pages to navigate through pages in the website.
open() method
We used the open() method in the previous part of this tutorial, where we saw how to open a webpage in a MechanicalSoup browser instance. It involves passing the full absolute URL of the target webpage to browser.open().
For example, to open Steemit.com:
browser.open('https://steemit.com')
You are free to use the open() method every time you need to go to another page in the site or to a different website. But it gets messy if we have to spell out the absolute URL every time we want another page on the same site. Luckily, there is a shortcut method for this.
follow_link() method
The follow_link() method moves to another page by following a link that is present on the current page. The argument is treated as a regular expression and matched against the URLs of the links on the page, so we can skip the https://website.domain part and give just enough of the path to identify the link.
For example, if we want to move to the page that lists the latest posts on Steemit, we just have to do this after browser.open('https://steemit.com'):
browser.follow_link('created') # Since the new posts are listed in https://steemit.com/created
Now our browser instance points to https://steemit.com/created and contains the contents of that page.
NOTE: follow_link() can only follow a link that actually appears on the current page. If you need to go to a URL that the page does not link to, such as a different website, use the open() method instead of follow_link().
I hope navigating through the pages of a website is clear now.
2. Interacting with Forms
Let us see the different methods we need for this:
select_form() method
It is a method that selects a particular form on a page. It takes a CSS selector, which is really helpful for picking the form we need when a page contains more than one form.
Everyone who has worked with HTML and CSS is familiar with CSS selectors, which are used to apply styling properties to a single element or a group of elements. Here is a great guide on CSS selectors from w3schools
It is a method of the browser instance, and it is called like this:
form = browser.select_form('optional_css_selector')
The above code will return a mechanicalsoup.form.Form object, which holds all the input fields of the form. Its fields can be set with dictionary-style assignment, and it also has some handy methods to help us with form filling.
If the page contains only a single form, calling select_form() without any arguments will do the trick.
get_current_form() method
This method is a member function of the browser instance which will return the currently selected form object.
form = browser.get_current_form()
print_summary() method
It is a method of the Form object returned by select_form(). Calling it prints a summary of all the input elements present inside the form.
You can print the list of inputs either like this:
browser.select_form('optional_css_selector').print_summary()
Or like this, by using get_current_form() function:
browser.get_current_form().print_summary()
Assigning values to input fields.
It's very simple to assign values to the form fields in MechanicalSoup. First select the form using select_form(), then assign values to its fields.
We use the name attribute of each input field to assign its value. If you have some experience in web development, you will know that a form's POST request carries a set of key-value pairs, in which the name attributes of the fields act as the keys and the entered values act as the corresponding values.
The same mechanism is applied here. You can assign values to the corresponding input fields directly through the browser object.
For example, if we have an input named "Name", then to assign a value to the field you just have to do:
browser['Name'] = 'Ajmal Noushad'
Simple isn't it?
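To make the name and value idea concrete, the sketch below uses only Python's standard library (not MechanicalSoup) to show how field names and values are typically encoded in the body of a form POST. It assumes the form uses the default URL-encoded format; the "age" field is added just for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Field names act as keys, entered values as the corresponding values.
fields = {"Name": "Ajmal Noushad", "age": 21}

# Encode the pairs the way a URL-encoded form body is written.
body = urlencode(fields)
print(body)

# Decoding the body gives the names back as keys, each with a list of values.
decoded = parse_qs(body)
print(decoded)
```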
launch_browser() method
This will launch a real browser showing the current page held by the browser instance. Note that the browser doesn't go to the original URL; instead it opens a local file stored on your PC, because the page also contains the form values you just filled in. So by using launch_browser() you can confirm that you did everything right.
So those are all the methods we need. Let's put them to use.
Creating a Basic form filling bot
We will now see how we can fill a form in a webpage using the above functions. For that purpose I have made a dummy webpage with a form that consists of some input fields. I used Django to build this. You can find the code here : Github Repo
The form looks like this :
I hosted it on PythonAnywhere for practice; you can access it here: DummyForm
Let's proceed.
First of all, open a Python console.
If you have read the previous tutorial, we have already set up an environment with mechanicalsoup installed in it. So you just have to activate the virtualenv and type python in the terminal.
$ source env-name/bin/activate
$ python
Now in the python console, import mechanicalsoup
import mechanicalsoup
Create a new browser instance
browser = mechanicalsoup.StatefulBrowser()
Open the URL of the webpage that contains the form, in our case 'ajmal.pythonanywhere.com'
browser.open('http://ajmal.pythonanywhere.com')
Select the form in the webpage using select_form()
browser.select_form()
Remember: no CSS selector is given, since the page contains a single form.
To list the input fields we use print_summary()
browser.get_current_form().print_summary() # get_current_form() returns the form object pointing to the currently selected form
The above command will give you an output like:
<input name="csrfmiddlewaretoken" type="hidden" value="lIMnuL2olx1GEnGyTms3rDLMEB8lZKqCRWd9qo111631GkSEBNhEjv4IOAHDniym"/>
<input class="form-control" id="id_name" maxlength="20" name="name" required="" type="text"/>
<input class="form-control" id="id_age" name="age" required="" type="number"/>
<select class="form-control" id="id_gender" name="gender">
<option value="1">MALE</option>
<option value="2">FEMALE</option>
</select>
<textarea class="form-control" cols="40" id="id_about_me" name="about_me" required="" rows="10"></textarea>
Now fill the input fields with data using their name attributes.
browser['name'] = 'My Name'
browser['age'] = 21
browser['gender'] = '1' # See that we gave the value attribute of the select options for the gender input field. ie. '1' for 'MALE' and '2' for 'FEMALE'
browser['about_me'] = 'I am learning MechanicalSoup'
Note that radio inputs and select inputs should be given the corresponding value attribute of the item that needs to be selected.
Inputs for checkboxes can be given as a list of values, like browser['checkbox'] = ['val1', 'val2']
Now let's take a look at how it looks in a browser:
browser.launch_browser()
The above command will open a local webpage with the same contents as the original webpage, with the form filled in with the values we gave.
Finally, submit the form:
browser.submit_selected()
Now we will get response 200, indicating that everything went well and the form was successfully submitted.
<Response [200]>
Finally, let's check out the content of the response:
browser.get_current_page()
The above will return the page received after the form was submitted. If you did it through a browser, you would see 'Success' as the response message.
In the console we see that as:
<html><body><p>Success</p></body></html>
So that's it, you have now learned how to work with MechanicalSoup to interact with webpages. Feel free to ask any questions. Thanks for reading...
Curriculum
My previous tutorial on MechanicalSoup
Posted on Utopian.io - Rewarding Open Source Contributors
Thank you for the contribution. It has been approved.
Very interesting tutorial! I've only ever used BeautifulSoup and didn't even know MechanicalSoup was a thing, it looks really cool though. If I ever find a use for it I will definitely refer to this tutorial!
You can contact us on Discord.
[utopian-moderator]
Thanks for the quick moderation.
Glad you found it helpful.
Very good tutorial. I did not know that python can be used for this.
Python can literally be used for anything, lol. Python is Life...
Articles like this are a great contribution to the knowledge pool, @ajmaln! Congratulations on it being approved by Utopian!
I've upvoted and resteemed this article as one of my daily post promotions for the @mitneb Curation Trail Project. It will be featured in the @mitneb Curation Trail Project Daily Report for 02 FEB 2018.
Cheers!
Thank you @mitneb
You're very welcome, @ajmaln! Cheers!
Thanks for introducing us to Mechanical Soup. I also only knew about Beautiful Soup. I'm planning to scrape some financial data in the future and Mechanical Soup will be very useful.
Also, I hope you will continue showing us examples using Mechanical Soup and Steemit as you did in the first post. I know there is an API, but sometimes the API changes and sometimes it just doesn't work (I can't get it to work with the combination of venv and Jupyter notebook, for example), so it is always useful to have a backup plan, a second tool for when the first one stops working ;-)