Denodo 4.6 Tutorial - Creating a Content-Extracting Aracne crawler project using Denodo Scheduler

106 views
Uploaded on Jul 15, 2010

Next, start the Scheduler server from the "sched" area of the Denodo Platform Control Center, then configure a job that crawls and indexes the American content of the BBC News site. Access the Denodo Scheduler from the following URL:

http://localhost:9090/webadmin/denodo-scheduler-admin/

We use the Denodo Scheduler to create an Aracne and ARN-Index data source, a filter sequence, and a WebBot crawling job as part of a new project that extracts data from the BBC News website. All we need to specify is the regular URL syntax used to access the sections of content we want.

The target site for crawling is:

http://news.bbc.co.uk/2/hi/americas

and the Link Filter pattern expression for including matching content pages is:

http://news.bbc.co.uk/2/hi/world/us_and_canada/(.)*
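
To get a feel for which links this expression lets through, here is a minimal Python sketch. It assumes the Link Filter treats the expression as a regular expression that must match the whole URL; the exact matching semantics of the Denodo crawler are not stated here, so check them against the product documentation. The function name passes_link_filter and the candidate URLs (including example.stm) are made up for illustration.

```python
import re

# Pattern copied verbatim from the Link Filter setting above.
# Note: the unescaped dots match any character, not just a literal ".".
LINK_FILTER = r"http://news.bbc.co.uk/2/hi/world/us_and_canada/(.)*"


def passes_link_filter(url: str, pattern: str = LINK_FILTER) -> bool:
    """Return True if the whole URL matches the pattern.

    Full-match semantics are an assumption about how the filter behaves.
    """
    return re.fullmatch(pattern, url) is not None


# Hypothetical links a crawler might discover on the target page.
candidates = [
    "http://news.bbc.co.uk/2/hi/world/us_and_canada/example.stm",
    "http://news.bbc.co.uk/2/hi/americas",
    "http://news.bbc.co.uk/2/hi/business/default.stm",
]

# Only links under /world/us_and_canada/ survive the filter.
kept = [url for url in candidates if passes_link_filter(url)]

for url in kept:
    print(url)
```

Under these assumptions the crawl starts at the Americas page, but that page itself does not match the expression; only the pages it links to under /world/us_and_canada/ are included.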

Once the job has run to completion at least once and the Aracne index is populated, query the index with the supplied Search Engine, available from the Search Engine tab of the Aracne web interface.

This video is part of the Denodo 4.6 Tutorial. Please visit http://help.denodo.com/tutorial for more interesting exercises with the Denodo Platform!
