Next, start the Scheduler server from the "sched" area of the Denodo Platform Control Center, then configure a job that crawls and indexes the American content of the BBC News site. Access the Denodo Scheduler at the following URL:
http://localhost:9090/webadmin/denodo-scheduler-admin/
We use the Denodo Scheduler to create an Aracne and ARN-Index data source, a filter sequence, and a WebBot crawling job as part of a new project that extracts data from the BBC News website. All we need to specify is the regular URL syntax for accessing the sections of content we are interested in.
The target site for crawling is:
http://news.bbc.co.uk/2/hi/americas
and the Link Filter pattern expression for including matching content pages is:
http://news.bbc.co.uk/2/hi/world/us_and_canada/(.)*
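To see which links an expression like this would include, it can help to try it outside the Scheduler first. The following is a minimal Python sketch, assuming the Link Filter behaves like an ordinary regular expression matched against the whole URL; the function name and the sample article URL are hypothetical, and this is not Denodo's own implementation.

```python
import re

# The Link Filter pattern from the tutorial, used verbatim.
# Note: the unescaped dots match any character, so the pattern is
# slightly more permissive than a literal host-name match.
LINK_FILTER = r"http://news.bbc.co.uk/2/hi/world/us_and_canada/(.)*"


def passes_link_filter(url: str) -> bool:
    """Return True if the whole URL matches the Link Filter pattern.

    Assumption: the crawler applies the pattern to the complete URL,
    which is what re.fullmatch does here.
    """
    return re.fullmatch(LINK_FILTER, url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    candidates = [
        "http://news.bbc.co.uk/2/hi/world/us_and_canada/example_story.stm",
        "http://news.bbc.co.uk/2/hi/americas",
        "http://news.bbc.co.uk/2/hi/business/example_story.stm",
    ]
    for url in candidates:
        verdict = "include" if passes_link_filter(url) else "skip"
        print(f"{verdict}: {url}")
```

Under this assumption the target page itself does not match the expression; it only serves as the starting point from which matching links are followed.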
Once the job has run to completion at least once and the Aracne index has been populated, use the supplied Search Engine to query the index from the Search Engine tab of the Aracne web interface.
This video is part of the Denodo 4.6 Tutorial. Please visit http://help.denodo.com/tutorial for more interesting exercises with the Denodo Platform!