Chromium Browser Evaluation

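This article compares Chromium (based on the Blink engine) and Firefox (Gecko engine) as candidate browsers for the Mozart platform project, covering community support, resource usage, and W3C standards support, and analyzes the limitations of WebKitGTK and WebKit in platform adaptability. The conclusion lists which dependencies the Mozart project supports and identifies the main challenges: the missing FFmpeg framework and the software decoding problem.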

Introduction

Browsers are the killer applications that many network device manufacturers need to embed in their products. More and more terminal devices present their content through web-based applications. Thanks to this technology, users can easily access an application from any computer connected to the internet using a standard browser. This contrasts with traditional desktop applications, which are installed on a local computer. Almost any desktop software can be developed as a web-based application.

Today, the most popular web browsers on embedded devices are Chrome, Chromium, Firefox, and Safari. We adopted an open-source browser named Midori, which is based on WebKitGTK, as a demonstration of our platform's capabilities. However, choosing a browser that meets diverse application scenarios and demands needs deeper investigation. Below are the criteria we considered for the Mozart project.

  1. Active community support. With input from community developers, issues can be solved quickly and new features can be developed continuously.
  2. Lightweight. A lightweight web browser sacrifices some features of a mainstream web browser to reduce the consumption of system resources, especially the memory footprint.
  3. Full use of hardware resources. The browser should take advantage of the GPU to accelerate CSS, WebGL, and the HTML5 built-in video tag.
  4. Full support for the W3C HTML5 standard, including WebRTC, WebSocket, WebGL, and the built-in multimedia components.
  5. Open source with a business-friendly license, so customers can use it freely without payment.

The comparison is shown in the table below:

Basically, we believe two browsers, Mozilla Firefox and Chromium, can meet our customers' requirements.

  1. WebKitGTK, a port of WebKit, the browser engine used in Apple's Safari browser and other products. WebKit is also the ancestor of Blink, the default web engine of Chromium (Chrome). Although many browsers are built on it, there is no active community fixing browser issues and continuously developing new features. Safari may be an exception, but it is proprietary software and runs only on macOS and iOS.
  2. WPE WebKit, unsuitable for the same reason as WebKitGTK: it does not fit scenarios that demand a well-developed browser.
  3. Gecko, the web engine of Mozilla Firefox, which is well known as one of the most popular browsers. It meets our requirements.
  4. Chrome, developed by Google. Google releases the majority of Chrome's source code as the open-source Chromium project. However, Chrome itself is proprietary software, so it cannot be used in commercial projects.
  5. Chromium, an open-source web browser project started by Google to provide the source code for the proprietary Google Chrome browser. The two browsers share the majority of their code and features, though there are some minor differences in features and logos, and they are licensed differently. Based on the Blink engine, Chromium uses only WebCore and includes its own JavaScript engine, V8, and its own multi-process system. We select it as our second candidate in the project, alongside Mozilla Firefox.
  6. CEF, the Chromium Embedded Framework, an open-source framework for embedding a web browser engine based on the Chromium core. It allows developers to add web browser controls and implement an HTML5-based layout GUI in a desktop application, or to provide web browser capabilities to a software application or game. It is also a candidate.

Basically, the relationship between KHTML, WebKit, Blink, and Chromium is as follows:

Chromium Architecture 

This section presents an analysis of the Google Chromium web browser. The goal was to analyze the available documentation on the Chromium project and determine a conceptual architecture for the system. From a functional perspective, Chromium can be divided into several subsystems, such as multimedia, display, JavaScript engine, plugins, multi-process, network, etc. Through this functional comparison, we give a feasibility analysis of porting Chromium to the project, including a schedule evaluation.

This section serves as a summary feasibility evaluation of Chromium running on the project, so it does not go into the concrete technical details of Google Chromium.

The reference architecture of a web browser maps comfortably to Google Chrome's subsystems, as it contains all the components in the web browser domain that are portrayed in the diagram below.

Analysis: 

  1. User Interface. The user interface is the subsystem that the user explicitly sees. It is the layer that lets the user access the browser's functionality in a way that should ideally be understandable and easy to see.
  2. Browser. In terms of managing the different subsystems, the browser is the most central component of Chromium. It is responsible for the following tasks:
     1. Spawning new tabs
     2. Communication with the network
     3. Handling user input
     4. Window management
     5. Location bar
     6. Disk cache
     7. Cookie database
     8. History database
     9. Password database
  3. Networking. The network stack handles all Uniform Resource Locator (URL) requests made by the browser, fetches the resources from the network, and caches the results for possible later use. It is platform independent and easy to port.
  4. JavaScript interpreter (V8). The JavaScript interpreter in Chromium is V8, which is currently maintained as an open-source project. It is platform independent and easy to port to the Mozart project.
  5. Rendering engine (WebKit). WebKit consists of three components: WebCore, JavaScriptCore, and WebKit (an API layer around the former two). However, only the WebCore component is used in Chromium. WebCore provides the rendering engine service to Chromium, handling web components such as CSS, DOM, HTML, and XHTML.
  6. XML parser (libXML and libXSLT). As time progresses, XML and its associated technologies are becoming an increasingly integral part of the web browsing experience. Chromium uses the open-source libXML library to handle XML. It is platform independent and easy to port.
  7. Plugins. Plugins provide a specific function that the browser does not already contain. For example, the Adobe Reader plugin allows PDF documents to be viewed inside Chromium, and the QuickTime plugin allows videos to be viewed inside a Chromium tab.
  8. Display backend. Chromium uses Skia for rendering. Skia is Google's in-house graphics renderer, which provides high-performance graphics rendering for everything but text. Skia is not involved in the low-level display implementation; both X11 and Wayland can serve as its backend. The working principle with Wayland is shown in the graph below.

  9. Flash Player plugin. Adobe Flash Player is a lightweight browser plugin and rich internet application runtime for multimedia processing, games, and so on. It was widely used in the past and will be replaced by HTML5 in the future. Chromium has a built-in Flash Player plugin called ppapi-flashplayer. In July 2017, Adobe announced that it would end support for Flash Player in 2020. HTML5 is the candidate replacement.
  10. Multimedia. Chromium uses FFmpeg as its media backend engine. FFmpeg is a free software project that produces a vast suite of libraries and programs for handling video, audio, and other multimedia files and streams. The Mozart platform does not currently support it.
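
As a small illustration of the display backend item in the list above, the sketch below builds a Chromium command line that selects the Wayland or X11 backend through the `--ozone-platform` switch. It assumes an Ozone-enabled Chromium build. The executable name `chromium` and the helper `chromium_launch_args` are hypothetical, and checking `WAYLAND_DISPLAY` is only one common heuristic for detecting a running Wayland session.

```python
import os


def chromium_launch_args(binary="chromium", env=None):
    """Build a Chromium command line that selects the Ozone display backend.

    `binary` is an assumed executable name; it differs between distributions.
    """
    env = os.environ if env is None else env
    # A running Wayland compositor normally exports WAYLAND_DISPLAY.
    platform = "wayland" if env.get("WAYLAND_DISPLAY") else "x11"
    return [binary, f"--ozone-platform={platform}"]


if __name__ == "__main__":
    # Print the command that would be run in the current session.
    print(" ".join(chromium_launch_args()))
```

The resulting list can be passed directly to `subprocess.run` on a target where the browser is installed.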

Above all, the common functionality is incomplete because we do not support the FFmpeg framework on the Mozart project. However, this can be dealt with by temporarily using the FFmpeg software codecs and enabling hardware acceleration later.
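
The staged approach described above (software decoding first, hardware acceleration later) can be sketched as a small decision helper. This is only a rough illustration: probing for the `ffmpeg` and `gst-inspect-1.0` command-line tools is an assumed proxy for whether the frameworks are present on a target image, and the function names and returned strings are hypothetical.

```python
import shutil


def detect_media_backends(which=shutil.which):
    """Report which media frameworks appear to be installed.

    Looks for the frameworks' command-line tools on PATH, which is
    only a rough proxy for library availability.
    """
    tools = {"ffmpeg": "ffmpeg", "gstreamer": "gst-inspect-1.0"}
    return {name: which(exe) is not None for name, exe in tools.items()}


def decoding_plan(backends, hw_decoder_ready=False):
    """Pick the next step of the staged plan from the text above."""
    if not backends.get("ffmpeg"):
        return "port FFmpeg first"
    if hw_decoder_ready:
        return "hardware decoding"
    return "FFmpeg software decoding (temporary)"


if __name__ == "__main__":
    found = detect_media_backends()
    print(found, "->", decoding_plan(found))
```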

Conclusion

From the analysis in the section above, we can list the support status of the platform-dependent items of the Mozart project in the table below.

Component          Mozart support                             Comment
Network            Yes
Display backend    Yes (Wayland)
Media backend      No (GStreamer only, no FFmpeg)             Mozart cannot support FFmpeg
Flash plugin       Yes                                        Support would end in 2020
Others             Platform independent or ready on Mozart
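
The support status listed above can also be kept as data so that blocking items are easy to pick out. The following is a minimal sketch: the `Component` class and `blockers` helper are hypothetical names, and the entries simply mirror the rows of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    supported: bool
    note: str = ""


# Rows copied from the support table in this section.
SUPPORT_MATRIX = [
    Component("Network", True),
    Component("Display backend", True, "Wayland"),
    Component("Media backend", False, "GStreamer only, no FFmpeg"),
    Component("Flash plugin", True, "Support would end in 2020"),
]


def blockers(matrix):
    """Return the names of components the platform does not support yet."""
    return [c.name for c in matrix if not c.supported]


if __name__ == "__main__":
    print(blockers(SUPPORT_MATRIX))  # prints ['Media backend']
```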

The main problems are:

  1. Chromium and CEF perform video playback with the FFmpeg framework. Although it works on almost all platforms, software decoding is not the right option on embedded Linux platforms. The FFmpeg video codec should be replaced by the sunxi hardware decoder accelerator, as shown below.

  2. Building Chromium with Yocto requires dealing with compilation and dependency problems.
  3. The footprint may be bigger than the current WebKitGTK-based Midori implementation. There are demo implementations in the open-source community that port Firefox and Chromium to the Yocto Project, which we can take as a reference.


The end!
