Data Connection: Web Crawler

A step-by-step guide to integrating a web crawler into your knowledge base.

Adding a Web Crawler

It takes three simple steps to add a web crawler:

  1. From your knowledge base, navigate to “Upload & Index” → “Data Sources”.
  2. Select “+ Data Sources”.
  3. Choose the Web Crawler option to add a website for crawling and indexing into your knowledge base.

1. Name Your Data Source

Assign a meaningful name to this data source. This name will help you easily identify the purpose or nature of the link within your knowledge base.

2. Provide the URL

Enter the URL of the website you want to crawl. Ensure the link is accurate and active.
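Before submitting, it can help to sanity-check the URL. As a minimal sketch (the function name and threshold are illustrative, not part of the product), a syntactic check confirms the link is an absolute http(s) URL with a host:

```python
from urllib.parse import urlparse

def looks_like_valid_url(url: str) -> bool:
    """Basic syntactic check: the crawler needs an absolute
    http(s) URL with a host. This does not verify the page
    is reachable, only that the link is well-formed."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(looks_like_valid_url("https://example.com/docs"))  # True
print(looks_like_valid_url("example.com/docs"))          # False (missing scheme)
```

To confirm the link is also active, you could additionally issue a request to it and check for a successful response.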

3. Set the Maximum Crawl Depth

Specify how deep the crawler should explore the website.

  • Level 0: Crawls only the specified URL.
  • Level 1: Crawls the specified URL and follows links on that page.
  • Level 2: Crawls the specified URL, follows links on that page, and also follows links on subsequent pages.
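The depth levels above amount to a breadth-first traversal that stops following links after a fixed number of hops. A minimal sketch, using a toy in-memory link graph in place of a real website (all page paths are hypothetical):

```python
from collections import deque

# Toy link graph standing in for a real website (hypothetical pages).
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": [],
}

def crawl(start, max_depth):
    """Breadth-first crawl that indexes pages up to max_depth
    hops from the start URL, mirroring the Level 0/1/2 setting."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't follow links found on this page
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

print(sorted(crawl("/", 0)))  # ['/']
print(sorted(crawl("/", 1)))  # ['/', '/about', '/blog']
```

With depth 2, the crawl would also reach `/team` and `/blog/post-1`, one hop beyond the pages found at depth 1.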

Common Issues

When using a web crawler, you may encounter certain limitations, such as:

  • Paywalls: If the URL is behind a paywall, the web crawler may be unable to access the full content.
  • JavaScript-Rendered Content: Some websites use JavaScript to dynamically load content, making it inaccessible to traditional web crawlers.

For best results, ensure the target website doesn’t have restrictions that could impede crawling.
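One way to spot the JavaScript-rendering issue ahead of time is to fetch the page's raw HTML and check how much text it contains without running any scripts. The heuristic below is only a sketch (the 50-character threshold is an arbitrary assumption for illustration): pages served as an empty shell, a bare root `<div>` plus script tags, usually need JavaScript to render.

```python
import re

def likely_js_rendered(html: str) -> bool:
    """Heuristic: if the <body> holds almost no visible text
    once script tags and markup are stripped, the page likely
    relies on JavaScript and a plain crawler will see little."""
    body = re.search(r"<body[^>]*>(.*?)</body>", html, re.S | re.I)
    if not body:
        return True
    text = re.sub(r"<script.*?</script>", "", body.group(1), flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", "", text).strip()
    return len(text) < 50  # assumption: threshold chosen for illustration

shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(likely_js_rendered(shell))  # True
```

If a page fails this kind of check, its content may not be indexed correctly by the crawler.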
