NLTK Tokenize: Read a given text through each line and look for sentences

Last update on May 28 2022 13:00:06 (UTC/GMT +8 hours)

NLTK Tokenize: Exercise-8 with Solution

Write a Python NLTK program that reads a given text line by line and finds the sentences in it. Print each sentence, separating consecutive sentences with “==============”.

Sample Solution:

Python Code-1:

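import nltk.data

# Requires the NLTK Punkt tokenizer data to be installed locally.
text = '''
Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.
'''
print("\nOriginal Text:")
print(text)
# Load the pre-trained English Punkt sentence tokenizer.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# Split the text into sentences and print them separated by "==============".
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))

Sample Output:

Original Text: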

Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.

Mr. Smith waited for the train.
==============
The train was late.
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station.
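
To see why the solution uses a trained sentence detector rather than splitting on punctuation, the following sketch tries a naive regular-expression splitter on the first line of the sample text. The helper name naive_split is hypothetical and not part of NLTK; it only illustrates the failure mode, where the abbreviation "Mr." is mistaken for the end of a sentence.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace (illustration only)."""
    # The lookbehind keeps the punctuation mark attached to the preceding fragment.
    return re.split(r'(?<=[.!?])\s+', text.strip())


sample = "Mr. Smith waited for the train. The train was late."
for fragment in naive_split(sample):
    print(fragment)
# The first fragment is just "Mr.", so the real first sentence is cut in two.
```

The Punkt tokenizer loaded in the solution does not make this split, as the sample output above shows.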

By default, punctuation that follows the end of a sentence, such as a closing parenthesis, is kept as part of that sentence.

Example:

Python Code-2:

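import nltk.data

# Requires the NLTK Punkt tokenizer data to be installed locally.
text = '''
Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].
'''
print("\nOriginal Text:")
print(text)
# Load the pre-trained English Punkt sentence tokenizer.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# The closing parenthesis stays attached to the sentence it ends.
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))

Sample Output:

Original Text: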

Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].

Mr. Smith waited for the train.
==============
(The train was late.)
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station [Sector-1].
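
The default behaviour described above is controlled by the realign_boundaries argument of the Punkt tokenizer's tokenize() method, which is True unless stated otherwise. The sketch below wraps that call so both settings can be compared. The names split_sentences, keep_trailing_punctuation and demo are hypothetical helpers introduced for illustration; only tokenize() and realign_boundaries come from NLTK.

```python
def split_sentences(detector, text, keep_trailing_punctuation=True):
    """Tokenize text with a Punkt-style detector.

    detector is any object with a tokenize(text, realign_boundaries=...)
    method, such as the tokenizer loaded in the solution above.
    """
    return detector.tokenize(
        text.strip(),
        realign_boundaries=keep_trailing_punctuation,
    )


def demo():
    """Print the sentences found with and without boundary realignment."""
    # Imported here so the helper above can be used without NLTK installed.
    import nltk.data

    detector = nltk.data.load('tokenizers/punkt/english.pickle')
    text = "Mr. Smith waited for the train. (The train was late.)"
    for keep in (True, False):
        print("realign_boundaries =", keep)
        print('\n==============\n'.join(split_sentences(detector, text, keep)))
```

Calling demo() in an environment where NLTK and its Punkt data are installed prints both variants, so the effect of the flag on the closing parenthesis can be compared directly.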

Python: Tips of the Day

Find current directory and file's directory:

To get the full path to the directory a Python file is contained in, write this in that file:

import os
dir_path = os.path.dirname(os.path.realpath(__file__))

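(Note that the code above may not work if you have already called os.chdir() to change the current working directory: __file__ can be a path relative to the working directory the script was started from, and an os.chdir() call does not update it. Since Python 3.9, the __file__ of a script run as the main module is an absolute path, which avoids the problem in that case.)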

To get the current working directory use

import os
cwd = os.getcwd()
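
The two snippets above can be combined into a small, self-contained sketch. The helper names containing_dir and cwd_inside, and the file name example.py, are hypothetical and chosen only for illustration; the sketch takes the file path as an argument instead of reading __file__, so it can be tried in any environment.

```python
import os
import tempfile


def containing_dir(file_path):
    """Return the canonical directory that holds file_path."""
    return os.path.dirname(os.path.realpath(file_path))


def cwd_inside(directory):
    """Change into directory, report os.getcwd() there, then change back."""
    original = os.getcwd()
    os.chdir(directory)
    try:
        return os.getcwd()
    finally:
        # Always restore the caller's working directory.
        os.chdir(original)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "example.py")  # hypothetical file name
        with open(script, "w"):
            pass
        print("Directory of the file:", containing_dir(script))
        print("Working directory after chdir:", cwd_inside(tmp))
        print("Working directory restored:", os.getcwd())
```

Inside a real script, containing_dir(__file__) gives the same result as the one-liner shown earlier.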

Documentation references for the modules, constants and functions used above:

The os and os.path modules.

The __file__ constant

os.path.realpath(path) (returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path")

os.path.dirname(path) (returns "the directory name of pathname path")

os.getcwd() (returns "a string representing the current working directory")

os.chdir(path) ("change the current working directory to path")

Ref: https://bit.ly/3fy0R6m