Forensic Crawling of a CEF Program (Chromium Embedded Framework)

0xRet

After thinking about the title for a long time, I finally settled on this one as the most fitting.

The problem scenario is this:

There is an older CEF-based program, and I need to crawl its pages to extract data.

As we all know, the CEF framework is essentially an embedded browser, and what it displays is ultimately a web page, so why not just open that page directly and crawl it?

In practice, that is not always possible. The program I want to crawl uses CEF to register JS extensions so that the web page can communicate with other local processes. That ties the page to this particular CEF host: if I open the page independently, outside the program, it will not function properly.

In this case, we can enable the browser's remote debugging feature (the remote debugging protocol) and then attach a crawler tool such as Selenium or Puppeteer to the running browser, which lets us automate it.

Enable remote debugging

Enabling it is simple: add the following parameters to the command line (e.g. in the shortcut properties) of the browser process to be debugged:

--remote-debugging-port=9222 --user-data-dir=C:\ChromeDebug
The remote debugging port is conventionally 9222; for the user data directory, we just pick any fresh folder.
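Equivalently, the flags can be appended when launching the process programmatically. A minimal sketch in Python; the executable path `SomeCefApp\app.exe` is a hypothetical placeholder, not the real program:

```python
import subprocess

# Hypothetical path to the CEF-based program (replace with the real one).
cef_exe = r"C:\Program Files\SomeCefApp\app.exe"

# Append the remote-debugging flags to the command line.
cmd = [
    cef_exe,
    "--remote-debugging-port=9222",
    r"--user-data-dir=C:\ChromeDebug",
]

# subprocess.Popen(cmd)  # uncomment once cef_exe points to the real program
print(" ".join(cmd[1:]))
```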

Now we can open 127.0.0.1:9222/json/version; the response contains a field called webSocketDebuggerUrl, which is the URL the debugger uses to connect to the browser.
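The endpoint can also be queried from code. A minimal sketch using only the standard library; the sample payload at the end merely illustrates the response shape, the real values come from the running browser:

```python
import json
import urllib.request


def get_ws_debugger_url(host="127.0.0.1", port=9222):
    """Fetch webSocketDebuggerUrl from the DevTools /json/version endpoint."""
    with urllib.request.urlopen(f"http://{host}:{port}/json/version") as resp:
        return json.load(resp)["webSocketDebuggerUrl"]


# Example response shape (illustrative values only):
sample = json.loads(
    '{"Browser": "Chrome/58.0.3029.110", '
    '"webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc"}'
)
print(sample["webSocketDebuggerUrl"])
```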

Selenium crawler

Why use Selenium instead of Puppeteer? As mentioned earlier, the CEF program is ancient: its Chromium kernel is around v58, and Puppeteer places strict restrictions on the browser version. The official docs state that, in theory, each Puppeteer release is only guaranteed to work with one specific Chromium version. Selenium is far more forgiving about versions: old or new, there is an adapted browser driver, and we only need to download the matching version of ChromeDriver. The download address is as follows:

http://chromedriver.storage.googleapis.com/index.html

As for the Selenium usage tutorial, please refer to:

https://selenium-python.readthedocs.io/

Write code to connect to the browser
from selenium import webdriver

# Attach to the already-running browser via its remote debugging address.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('debuggerAddress', '127.0.0.1:9222')

# Selenium 3 style API; a raw string keeps the backslash in the driver path intact.
browser = webdriver.Chrome(executable_path=r"C:\chromedriver.exe", chrome_options=chrome_options)

browser.get('http://www.google.com')

After that, simulating clicks and scraping content is routine, so I won't go into detail.