Q: Why does Octoparse only collect the first item from each page?

 

Description: 

 

I have been testing your software to try and data mine some info.

The website is https://www.yelp.com/search?find_desc=car+audio&find_loc=Brooklyn%2C+NY

The problem is it will only collect the first item from each page.

 

 

A: 

In this case, check the "Loop Item" step, which is used to extract all the items from the page, and the XPath that it uses.

Please follow these steps to check your rule.

1. Open the task

2. In the "Design Workflow" step, you will see the rule in the Workflow Designer. Click each step/box one by one from the beginning to go through the rule. Make sure the steps are in the correct order.

3. When you click the "Loop Item" box, check whether its XPath matches every item on the page.

If it matches only the first item, the XPath is probably too specific. Modify it using the Octoparse XPath tool or another tool such as FirePath.

Check out these tutorials to learn how to edit XPath.

       Modify XPath Manually in Octoparse

       Get Started With XPath 1

       Get Started With XPath 2

4. Replace the original XPath with the corrected one.
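
To illustrate the check in step 3, here is a minimal Python sketch of how an overly specific XPath can match only the first item while a more general one matches them all. The markup, class names, and links below are hypothetical and much simpler than the real Yelp page, and the paths use the limited XPath subset supported by Python's standard library, so treat this as an illustration of the idea rather than the exact XPath to paste into Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. This is NOT Yelp's real page structure.
PAGE = """
<html>
  <body>
    <ul class="results">
      <li class="result"><a href="/biz/shop-a">Shop A</a></li>
      <li class="result"><a href="/biz/shop-b">Shop B</a></li>
      <li class="result"><a href="/biz/shop-c">Shop C</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Too specific: the positional index [1] pins the path to the first <li> only.
first_only = root.findall("./body/ul/li[1]/a")

# Generalized: without the index, the path matches the link in every <li>.
all_items = root.findall("./body/ul/li/a")


def describe(links):
    """Return (link text, href) pairs for a list of <a> elements."""
    return [(a.text, a.get("href")) for a in links]


print(len(first_only), describe(first_only))
print(len(all_items), describe(all_items))
```

If the first path resembles the XPath in your "Loop Item", removing the fixed index (or replacing it with a condition that every item shares, such as a common class) is usually what makes the loop pick up all items.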

 

Only after you create a correct "Loop Item" that contains all the links to the detail pages can you move on to the next step and collect data from the website.

 

To learn how to check your scraping task, see the tutorial: Check The Extraction Rule When Errors Occur

 
