Login
Dynamic Web Scraping by CSS Selector using Javascript

Dynamic Web Scraping by CSS Selector using Javascript

Ygrab.js is jquery plugins that lets you scrape content – but the CSS part really makes it interesting for front-end devs.

Using this plugin, I was able to build a simple tool to let you display the possible values for any CSS property, which is scraped from my iBacor website.

Here I input the URL of the website I want to scrape, then I use a CSS selector to determine what part of the page to grab. So the above JavaScript would compute to the following if the user enters the display property:

<script src="//code.jquery.com/jquery-2.2.4.min.js"></script>
<script src="js/ygrab.js"></script>
<script>        
$(function() {

    var data = [
        {
            url: 'http://ibacor.com/blog', // url string rquired
            selector: 'div.post', // selector string rquired
            loop: true, // each boolean rquired
            result: [
                {
                    name: 'title', // key string rquired
                    find: 'h3 a', // selector child string rquired
                    grab: {
                        by: 'text', // attribut string rquired
                        value: '' // attribut value string optional
                    }
                },
                {
                    name: 'link',
                    find: 'h3 a',
                    grab: {
                        by: 'attr',
                        value: 'href'
                    }
                },
                // ---- new selector ---- //
            ]
        },
        // ---- new website url ---- //
    ];
    
    var result = ygrab(data);
    
    console.log(JSON.stringify(result, null, 2));
    
});
</script>

Result that allows you to turn any site into a JSON:

{
  "result": [
    {
      "title": "Mengetahui Informasi Suatu Negara Menggunakan API Worldbank",
      "image": "http://ibacor.com/bcr_asset/images/artikel/11794390317144d0693208eefbc6f908.jpg",
      "link": "http://ibacor.com/blog/mengetahui-informasi-suatu-negara-menggunakan-api-worldbank"
    },
    {
      "title": "Tutorial Login, Register, Logout & CRUD menggunakan PHP & MySQLi OOP + Bootstrap",
      "image": "http://ibacor.com/bcr_asset/images/artikel/e234a76c6b3133e6abd25c9325267ee1.jpg",
      "link": "http://ibacor.com/blog/tutorial-login-register-logout-crud-menggunakan-php-mysqli-oop-bootstrap"
    },
    ...
  ]
}

Having built the site myself, I know that every CSS property’s section has an ID that matches its property’s name. And I also know that each property lists its values in an unordered list. So grabbing those values is trivial with a useful service like this as long as you know the structure of the HTML.