Question:

Python programming question



I have built a simple web crawler that downloads files using the urllib library.

My question is: is there a way to compare a set of similar HTML documents to check for similarities among tags, text, and so on?

Also, is there a regular expression in Python that extracts only the text from a document and leaves out the tags?

Thanks for any help.


1 ANSWER


  1. Have you looked into the Beautiful Soup library by Leonard Richardson?

    http://www.crummy.com/software/Beautiful...

    I'm not sure about the first part of your question, though. What kinds of similarities are you interested in: the structure of the document?
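
To make both parts of the question concrete, here is a minimal sketch rather than a definitive approach. It swaps in the standard-library html.parser instead of Beautiful Soup so it runs without extra installs, and it uses difflib.SequenceMatcher as one possible similarity measure (an assumption, since the question does not say which kind of similarity matters). The names TagTextCollector, parse, and similarity are made up for illustration. A parser is used instead of a regular expression because regexes tend to break on nested or malformed HTML.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: records opening tag names and visible text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []   # opening tag names, in document order
        self.text = []   # stripped, non-empty text chunks
        self._skip = 0   # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore whitespace-only chunks and anything inside script/style.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def parse(html):
    """Return (tag_names, text_chunks) for one HTML string."""
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    return collector.tags, collector.text


def similarity(html_a, html_b):
    """Return 0..1 ratios for tag-sequence and word-sequence similarity."""
    tags_a, text_a = parse(html_a)
    tags_b, text_b = parse(html_b)
    words_a = " ".join(text_a).split()
    words_b = " ".join(text_b).split()
    return {
        "tags": SequenceMatcher(None, tags_a, tags_b).ratio(),
        "text": SequenceMatcher(None, words_a, words_b).ratio(),
    }


if __name__ == "__main__":
    page_a = "<html><body><h1>Title</h1><p>Hello world</p></body></html>"
    page_b = "<html><body><h1>Title</h1><p>Hello there</p></body></html>"
    print(parse(page_a))
    print(similarity(page_a, page_b))
```

For the two sample pages, the tag score is 1.0 because both share the same tag sequence, while the text score is lower because one word differs. With Beautiful Soup installed, the text-only part can also be done with its get_text() method.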
