Python Web Scraping: Test if a given page is found or not on the server

Last update on May 28 2022 13:46:53 (UTC/GMT +8 hours)

Python Web Scraping: Exercise-1 with Solution

Write a Python program to test if a given page is found or not on the server.

Sample Solution:

Python Code:

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

try:
    html = urlopen("https://abcxyz.com")
except HTTPError as e:
    print("HTTP error")
except URLError as e:
    print("Server not found!")
else:
    print(html.read())
    
try:
    html = urlopen("https://www.example.com/")
except HTTPError as e:
    print("HTTP error")
except URLError as e:
    print("Server not found!")
else:
    print("HTML Details")    
    print(html.read())

Sample Output:

Server not found!
HTML Details
b'<!doctype html>
\n<html>
\n<head>
\n    <title>Example Domain</title>
\n\n    <meta charset="utf-8" />
\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
\n    <meta name="viewport" content="width=device-width, initial-scale=1" />
\n    <style type="text/css">
\n    body {
\n        background-color: #f0f0f2;
\n        margin: 0;
\n        padding: 0;
\n        font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
\n        \n    }
\n    div {\n        width: 600px;
\n        margin: 5em auto;
\n        padding: 50px;
\n        background-color: #fff;
\n        border-radius: 1em;\n    }
\n    a:link, a:visited {
\n        color: #38488f;
\n        text-decoration: none;
\n    }
\n    @media (max-width: 700px) {
\n        body {
\n            background-color: #fff;
\n        }
\n        div {\n            width: auto;
\n            margin: 0 auto;\n            border-radius: 0;\n            padding: 1em;
\n        }
\n    }
\n    </style>   
 \n</head>
 \n\n<body>
 \n<div>
 \n    <h1>Example Domain</h1>
 \n    <p>This domain is established to be used for illustrative examples in documents. You may use this
 \n    domain in examples without prior coordination or asking for permission.</p>
 \n    <p><a href="https://www.iana.org/domains/example">
 More information...</a>
 </p>
 \n</div>
 \n</body>
 \n</html>\n'

Flowchart:

Python Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Python Web Scraping Exercises Home.
Next: Write a Python program to download and display the content of robot.txt for en.wikipedia.org.

Python: Tips of the Day

Find current directory and file's directory:

To get the full path to the directory a Python file is contained in, write this in that file:

import os dir_path = os.path.dirname(os.path.realpath(__file__))

(Note that the incantation above won't work if you've already used os.chdir() to change your current working directory, since the value of the __file__ constant is relative to the current working directory and is not changed by an os.chdir() call.)

To get the current working directory use

import os cwd = os.getcwd()

Documentation references for the modules, constants and functions used above:

The os and os.path modules.

The __file__ constant

os.path.realpath(path) (returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path")

os.path.dirname(path) (returns "the directory name of pathname path")

os.getcwd() (returns "a string representing the current working directory")

os.chdir(path) ("change the current working directory to path")

Ref: https://bit.ly/3fy0R6m