Python Web Scraping: Test if a given page is found or not on the server
Python Web Scraping: Exercise-1 with Solution
Write a Python program to test if a given page is found or not on the server.
Sample Solution:
Python Code:
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
try:
html = urlopen("https://abcxyz.com")
except HTTPError as e:
print("HTTP error")
except URLError as e:
print("Server not found!")
else:
print(html.read())
try:
html = urlopen("https://www.example.com/")
except HTTPError as e:
print("HTTP error")
except URLError as e:
print("Server not found!")
else:
print("HTML Details")
print(html.read())
Sample Output:
Server not found! HTML Details b'<!doctype html> \n<html> \n<head> \n <title>Example Domain</title> \n\n <meta charset="utf-8" /> \n <meta http-equiv="Content-type" content="text/html; charset=utf-8" /> \n <meta name="viewport" content="width=device-width, initial-scale=1" /> \n <style type="text/css"> \n body { \n background-color: #f0f0f2; \n margin: 0; \n padding: 0; \n font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; \n \n } \n div {\n width: 600px; \n margin: 5em auto; \n padding: 50px; \n background-color: #fff; \n border-radius: 1em;\n } \n a:link, a:visited { \n color: #38488f; \n text-decoration: none; \n } \n @media (max-width: 700px) { \n body { \n background-color: #fff; \n } \n div {\n width: auto; \n margin: 0 auto;\n border-radius: 0;\n padding: 1em; \n } \n } \n </style> \n</head> \n\n<body> \n<div> \n <h1>Example Domain</h1> \n <p>This domain is established to be used for illustrative examples in documents. You may use this \n domain in examples without prior coordination or asking for permission.</p> \n <p><a href="https://www.iana.org/domains/example"> More information...</a> </p> \n</div> \n</body> \n</html>\n'
Flowchart:
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Python Web Scraping Exercises Home.
Next: Write a Python program to download and display the content of robot.txt for en.wikipedia.org.
What is the difficulty level of this exercise?
Python: Tips of the Day
Find current directory and file's directory:
To get the full path to the directory a Python file is contained in, write this in that file:
import os dir_path = os.path.dirname(os.path.realpath(__file__))
(Note that the incantation above won't work if you've already used os.chdir() to change your current working directory, since the value of the __file__ constant is relative to the current working directory and is not changed by an os.chdir() call.)
To get the current working directory use
import os cwd = os.getcwd()
Documentation references for the modules, constants and functions used above:
- The os and os.path modules.
- The __file__ constant
- os.path.realpath(path) (returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path")
- os.path.dirname(path) (returns "the directory name of pathname path")
- os.getcwd() (returns "a string representing the current working directory")
- os.chdir(path) ("change the current working directory to path")
Ref: https://bit.ly/3fy0R6m
- New Content published on w3resource:
- HTML-CSS Practical: Exercises, Practice, Solution
- Java Regular Expression: Exercises, Practice, Solution
- Scala Programming Exercises, Practice, Solution
- Python Itertools exercises
- Python Numpy exercises
- Python GeoPy Package exercises
- Python Pandas exercises
- Python nltk exercises
- Python BeautifulSoup exercises
- Form Template
- Composer - PHP Package Manager
- PHPUnit - PHP Testing
- Laravel - PHP Framework
- Angular - JavaScript Framework
- Vue - JavaScript Framework
- Jest - JavaScript Testing Framework