Please note, this is a STATIC archive of website www.w3resource.com from 19 Jul 2022, cach3.com does not collect or store any user information, there is no "phishing" involved.
w3resource

Python Web Scraping: Test if a given page is found or not on the server

Python Web Scraping: Exercise-1 with Solution

Write a Python program to test if a given page is found or not on the server.

Sample Solution:

Python Code:

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

try:
    html = urlopen("https://abcxyz.com")
except HTTPError as e:
    print("HTTP error")
except URLError as e:
    print("Server not found!")
else:
    print(html.read())
    
try:
    html = urlopen("https://www.example.com/")
except HTTPError as e:
    print("HTTP error")
except URLError as e:
    print("Server not found!")
else:
    print("HTML Details")    
    print(html.read())  

Sample Output:

Server not found!
HTML Details
b'<!doctype html>
\n<html>
\n<head>
\n    <title>Example Domain</title>
\n\n    <meta charset="utf-8" />
\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
\n    <meta name="viewport" content="width=device-width, initial-scale=1" />
\n    <style type="text/css">
\n    body {
\n        background-color: #f0f0f2;
\n        margin: 0;
\n        padding: 0;
\n        font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
\n        \n    }
\n    div {\n        width: 600px;
\n        margin: 5em auto;
\n        padding: 50px;
\n        background-color: #fff;
\n        border-radius: 1em;\n    }
\n    a:link, a:visited {
\n        color: #38488f;
\n        text-decoration: none;
\n    }
\n    @media (max-width: 700px) {
\n        body {
\n            background-color: #fff;
\n        }
\n        div {\n            width: auto;
\n            margin: 0 auto;\n            border-radius: 0;\n            padding: 1em;
\n        }
\n    }
\n    </style>   
 \n</head>
 \n\n<body>
 \n<div>
 \n    <h1>Example Domain</h1>
 \n    <p>This domain is established to be used for illustrative examples in documents. You may use this
 \n    domain in examples without prior coordination or asking for permission.</p>
 \n    <p><a href="https://www.iana.org/domains/example">
 More information...</a>
 </p>
 \n</div>
 \n</body>
 \n</html>\n'
 

Flowchart:

Python Web Scraping Flowchart: Test if a given page is found or not on the server

Python Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Python Web Scraping Exercises Home.
Next: Write a Python program to download and display the content of robot.txt for en.wikipedia.org.

What is the difficulty level of this exercise?



Python: Tips of the Day

Find current directory and file's directory:

To get the full path to the directory a Python file is contained in, write this in that file:

import os 
dir_path = os.path.dirname(os.path.realpath(__file__))

(Note that the incantation above won't work if you've already used os.chdir() to change your current working directory, since the value of the __file__ constant is relative to the current working directory and is not changed by an os.chdir() call.)

To get the current working directory use

import os
cwd = os.getcwd()

Documentation references for the modules, constants and functions used above:

  • The os and os.path modules.
  • The __file__ constant
  • os.path.realpath(path) (returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path")
  • os.path.dirname(path) (returns "the directory name of pathname path")
  • os.getcwd() (returns "a string representing the current working directory")
  • os.chdir(path) ("change the current working directory to path")

Ref: https://bit.ly/3fy0R6m