YumDB
Since version 3.2.26, yum has stored additional information about installed packages in a location outside of the rpm database. None of the information stored there is critical to yum performing its function, but it enhances the user experience and makes it possible to know more about the context in which a package was installed.
Format
The yumdb is a simple flat-file database. The filesystem provides a simple tree structure, where p is the first letter of the package name:
/var/lib/yum/yumdb/p/$checksum-packagename-$ver-$rel.$arch/keyname
Each keyname is a file and the contents of that file are the values.
Note: since 3.2.28, hard links are allowed between different keys. This saves load time and storage, but it means that if you try to change the data using a text editor, it will probably change more than you want it to.
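As a rough illustration of the layout above, the following Python sketch reads every key file in one package directory into a dictionary. It is not code from yum: the names read_yumdb_entry and demo, and the sample directory name, are made up for this example, and it runs against a temporary directory instead of /var/lib/yum/yumdb.

```python
import os
import tempfile


def read_yumdb_entry(pkg_dir):
    """Return {keyname: value} for one yumdb-style package directory."""
    values = {}
    for keyname in sorted(os.listdir(pkg_dir)):
        path = os.path.join(pkg_dir, keyname)
        if os.path.isfile(path):
            with open(path, "r") as fh:
                values[keyname] = fh.read()
    return values


def demo():
    # Build a throwaway tree that mimics the layout described above.
    # The directory name is hypothetical, not taken from a real system.
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "y", "0123abcd-yum-3.2.29-1.noarch")
    os.makedirs(pkg_dir)
    for key, value in (("from_repo", "base"), ("reason", "user")):
        with open(os.path.join(pkg_dir, key), "w") as fh:
            fh.write(value)
    return read_yumdb_entry(pkg_dir)


if __name__ == "__main__":
    print(demo())
```

Because each key is just a file, reading an entry needs nothing beyond the standard library.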
Why not a "real" database
The two main operations that yum uses the yumdb for are:
- Given an installed package XYZ-2-1.noarch, get the value of yumdb key FOO. (Eg. yumdb get from_repo yum).
- Given an installed package XYZ-2-1.noarch, set the value of yumdb key FOO to BAR. (Eg. yumdb set from_repo special yum).
...using the filesystem allows both of those operations to be fast and atomic. No other approach is likely to be significantly better for these two main uses, and the most common suggestions, "sqlite" and a key/value store (like "libdb*"), fail at least one of those tests. Using the filesystem makes it easy to:
- Keep all the yum code simple.
- Have isolation. Eg. Something goes wrong and the "reason" key for package XYZ is broken, nothing else should be affected.
- Have a knowledgeable sysadmin fix any problems.
- Have interoperability (it's trivial to do the get/set operations from any language without having to use the yum API, although we still don't recommend it).
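The get and set operations described above could be sketched in Python roughly as follows. This is an assumption about one reasonable way to do it, not yum's actual implementation: the set writes a temporary file and renames it over the key file, which is what makes the update atomic on POSIX filesystems and also avoids touching any other file that is hard-linked to the old value. The function names yumdb_get and yumdb_set are hypothetical.

```python
import os
import tempfile


def yumdb_get(pkg_dir, key):
    """Return the value stored for key, or None if the key file is absent."""
    try:
        with open(os.path.join(pkg_dir, key), "r") as fh:
            return fh.read()
    except FileNotFoundError:
        return None


def yumdb_set(pkg_dir, key, value):
    """Write value to a temporary file, then rename it over the key file."""
    fd, tmp_path = tempfile.mkstemp(dir=pkg_dir, prefix="." + key + ".")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(value)
        # os.replace is an atomic rename on POSIX; readers see old or new data.
        os.replace(tmp_path, os.path.join(pkg_dir, key))
    except BaseException:
        os.unlink(tmp_path)
        raise


def demo():
    pkg_dir = tempfile.mkdtemp()
    yumdb_set(pkg_dir, "from_repo", "updates")
    # Hard-link a second file to the first to mimic shared key storage.
    os.link(os.path.join(pkg_dir, "from_repo"),
            os.path.join(pkg_dir, "other_key"))
    yumdb_set(pkg_dir, "from_repo", "special")
    return (yumdb_get(pkg_dir, "from_repo"),
            yumdb_get(pkg_dir, "other_key"),
            yumdb_get(pkg_dir, "missing"))


if __name__ == "__main__":
    print(demo())
```

In the demo, the hard-linked file keeps its old value after the set, which is the isolation property the list above asks for.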
There are two minor downsides to using the filesystem:
- Searching is not fast (Eg. yumdb search from_repo updates-testing). The main thing to realize here is that no yum tool currently needs to perform operations like this.
- Loading key XYZ from all installed packages is not fast. The only use case here is loading the checksum data to calculate rpmdb versions on install/etc. However, we need a separate index for this anyway, because when we need this information quickly we don't want to load the packages at all.
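To see why searching is the slow case, consider a minimal Python sketch of a search over a yumdb-style tree: it has to open one key file per installed package. The function yumdb_search and the sample package directories are hypothetical, and the sketch uses an exact string match, which may differ from what the real yumdb search command does.

```python
import os
import tempfile


def yumdb_search(root, key, wanted):
    """Return package directory names whose key file holds exactly wanted."""
    matches = []
    for letter in sorted(os.listdir(root)):
        letter_dir = os.path.join(root, letter)
        if not os.path.isdir(letter_dir):
            continue
        for pkg in sorted(os.listdir(letter_dir)):
            key_path = os.path.join(letter_dir, pkg, key)
            try:
                with open(key_path, "r") as fh:
                    if fh.read() == wanted:
                        matches.append(pkg)
            except (FileNotFoundError, NotADirectoryError):
                continue  # this package has no such key
    return matches


def demo():
    # Hypothetical packages in a temporary tree, not a real yumdb.
    root = tempfile.mkdtemp()
    sample = {
        "g/aaa-geany-1.0-1.x86_64": "updates-testing",
        "j/bbb-joe-1.0-1.x86_64": "base",
        "y/ccc-yum-3.2.29-1.noarch": "updates-testing",
    }
    for rel_dir, repo in sample.items():
        pkg_dir = os.path.join(root, *rel_dir.split("/"))
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "from_repo"), "w") as fh:
            fh.write(repo)
    return yumdb_search(root, "from_repo", "updates-testing")


if __name__ == "__main__":
    print(demo())
```

The cost grows linearly with the number of installed packages, which is acceptable only because no yum tool depends on this operation.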
Stored information
One of the goals for the yumdb is that users, plugins, etc. can store almost arbitrary information in it and have that information attached to specific packages, so listing a "canonical" set of keys is never going to be possible. At some point there may be an API to get a list of "keys that should migrate on a package update", but that isn't in 3.2.29 at the moment.
So here's a list of all the items that should be set for every package (from yumdb info) from 3.2.29 onwards:
- from_repo: the name of the repo from which the pkg was installed
- from_repo_revision: Repo. revision. Or ctime for a local package.
- from_repo_timestamp: Repo. timestamp. Or mtime for a local package.
- reason: reason for installing this pkg (user, dep, etc)
- releasever: $releasever of the system at the time the pkg was installed (so you can look for pkgs which have lingered across release updates)
- installed_by (3.2.28): The loginuid of the user who first installed this package. Note that some tools which call yum don't obey loginuid; this key not being set is one of many problems that introduces. This doesn't cross Obsoletes.
- changed_by (3.2.28): The loginuid of the user who last installed this package.
These are known other keys:
- checksum_type: The type of the checksum for the installed pkg. Eg. md5, sha1, sha256.
- checksum_data: The value of the checksum for the installed pkg.
- origin_url (3.2.29): The URL that the package was downloaded from. Requires a newer urlgrabber.
- command_line: The command line used to install this pkg (only set if pkg. installed from a tool that has a command line).
- installonly: Not set by yum, but looked at to see if installonly packages should be automatically removed.
- group_member (3.2.29+?): Set by yum if a package was installed as part of a "group install" (beta patch).
Accessing this information
There is a script called 'yumdb' in yum-utils which allows you to access this information:
- get the repo from which yum-utils was installed:
yumdb get from_repo yum-utils
- set a note on the packages 'joe' and 'geany'
yumdb set note "installed by seth b/c he likes them" joe geany
- Dump out all yumdb values about yum and yum-utils:
yumdb info yum-utils yum
History
Long ago, in a galaxy far away known as 2007, we asked for the ability to write this kind of data into the rpmdb itself. We asked again in 2009. Having received no formal answer on the subject, but having been told informally "no", we decided to implement it in a db outside of the rpmdb. In order to keep it flexible we just needed key,value pairs tied to a pkgid.
Other info: the rpm.org ticket on this subject is http://rpm.org/ticket/43
How yum works
yum needs two parts working together: the yum server and the yum tool on the client. The sections below describe how each part works.
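- yum server
All rpm packages to be published are placed on the yum server so that others can download them. Packages are built and published separately for each kernel version and CPU architecture. The yum server only needs to offer plain downloads, over either FTP or HTTP (httpd). The most important job of the yum server is to collect the basic information for every rpm package: its version, conf files, binary information, and the crucial dependency information. The createrepo tool is available on the yum server to turn this summary information into a "manifest", which describes the information from each package's spec file.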
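- yum client
Each time the client runs yum install or yum search, it parses every configuration file ending in .repo under /etc/yum.repos.d; these files specify the addresses of the yum servers. yum periodically "refreshes" the rpm package manifest from the yum server and saves the downloaded manifest in its own cache, at the location configured in /etc/yum.conf (by default under /var/cache/yum). Every time yum installs a package, it looks up the manifest in this cache directory, uses the package descriptions in it to determine the package name, version, required dependencies and so on, and then downloads the rpm packages from the yum server and installs them (provided the rpm package is not already in the cache).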
Setting up a yum server
1. Install the createrepo tool
    $ yum install createrepo
2. Create the repo publishing directories
    $ mkdir -p /var/www/yum/centos/5/{i386,x86_64}
    $ mkdir -p /var/www/yum/centos/6/{i386,x86_64}
3. Build some rpm packages with rpmbuild, for example the following ones, which differ in version and in name
    rpm_test-0.0.1-3.noarch.rpm
    rpm_test-0.0.2-3.noarch.rpm
    rpm_test2-0.0.2-3.noarch.rpm
4. Copy the rpm packages into the publishing directory
    $ cp rpm_test-0.0.* /var/www/yum/centos/5/i386/
5. Generate the metadata with createrepo
    $ createrepo -o /var/www/yum/centos/5/i386/ /var/www/yum/centos/5/i386
    3/3 - rpm_test-0.0.1-3.noarch.rpm
    Saving Primary metadata
    Saving file lists metadata
    Saving other metadata
6. Point apache or nginx at the publishing directory
After createrepo has run, the following directory and files exist under /var/www/yum/centos/5/i386/:
$ tree repodata/
repodata/
|-- filelists.xml.gz
|-- other.xml.gz
|-- primary.xml.gz
`-- repomd.xml
$ gunzip filelists.xml.gz
$ gunzip primary.xml.gz
filelists.xml records every rpm package together with its version and the files it contains (configuration files and so on):
<package pkgid="19c82aa653a394ee1f7dbc7b694fbf0221bc1848" name="rpm_test" arch="noarch">
  <version epoch="0" ver="0.0.1" rel="3"/>
  <file>/usr/local/rpm_test/conf/test.conf</file>
  <file>/usr/local/rpm_test/test.py</file>
  <file type="dir">/usr/local/rpm_test/conf</file>
</package>
...
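A package entry like the one above can be read with Python's standard XML parser. The sketch below is only an illustration: the enclosing filelists root element and its namespace are assumptions about what createrepo typically writes (the excerpt shows only the package element), and parse_filelists and local_name are made-up helper names.

```python
import xml.etree.ElementTree as ET

# The <package> element is copied from the excerpt; the root element and
# its xmlns are assumed here so that the sample is a complete document.
SAMPLE = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
<package pkgid="19c82aa653a394ee1f7dbc7b694fbf0221bc1848" name="rpm_test" arch="noarch"><version epoch="0" ver="0.0.1" rel="3"/><file>/usr/local/rpm_test/conf/test.conf</file><file>/usr/local/rpm_test/test.py</file><file type="dir">/usr/local/rpm_test/conf</file></package>
</filelists>"""


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if there is one."""
    return tag.rsplit("}", 1)[-1]


def parse_filelists(xml_text):
    """Return one dict per <package> with its version, files and dirs."""
    root = ET.fromstring(xml_text)
    packages = []
    for elem in root.iter():
        if local_name(elem.tag) != "package":
            continue
        entry = {"name": elem.get("name"), "arch": elem.get("arch"),
                 "version": None, "files": [], "dirs": []}
        for child in elem:
            tag = local_name(child.tag)
            if tag == "version":
                entry["version"] = "%s-%s" % (child.get("ver"), child.get("rel"))
            elif tag == "file":
                target = "dirs" if child.get("type") == "dir" else "files"
                entry[target].append(child.text)
        packages.append(entry)
    return packages


if __name__ == "__main__":
    for pkg in parse_filelists(SAMPLE):
        print(pkg["name"], pkg["version"], pkg["files"], pkg["dirs"])
```

Stripping the namespace prefix keeps the sketch working whether or not the file declares one.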
primary.xml records descriptive information about each rpm package, such as its dependencies.
Configuring the client
$ vim /etc/yum.repos.d/firefoxbug.repo
[firefoxbug]
name=firefoxbug
baseurl=http://42.120.7.71/centos/5/i386/
enabled=1
gpgcheck=0
gpgkey=
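Since a .repo file uses INI syntax, a client-side script can read it with Python's configparser. The following is a minimal sketch, not how yum itself parses these files: the function enabled_repos is hypothetical, and unlike yum it does not expand variables such as $releasever.

```python
import configparser

# Contents of the firefoxbug.repo file shown above.
REPO_TEXT = """[firefoxbug]
name=firefoxbug
baseurl=http://42.120.7.71/centos/5/i386/
enabled=1
gpgcheck=0
gpgkey=
"""


def enabled_repos(repo_text):
    """Map each enabled repo id to its baseurl (None if no baseurl is set)."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    repos = {}
    for section in parser.sections():
        # Treating a missing 'enabled' option as enabled is an assumption.
        if parser.getboolean(section, "enabled", fallback=True):
            repos[section] = parser.get(section, "baseurl", fallback=None)
    return repos


if __name__ == "__main__":
    print(enabled_repos(REPO_TEXT))
```

To read real files, the same function could be fed the text of each *.repo file under /etc/yum.repos.d.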
Inspecting the local yum cache
By default the cache lives under /var/cache/yum, which holds the cache for each repo:
/var/cache/yum/
|-- base
|   |-- cachecookie
|   |-- mirrorlist.txt
|   |-- packages
|   |-- primary.xml.gz
|   |-- primary.xml.gz.sqlite
|   `-- repomd.xml
|-- epel
|   |-- 76c4dcbfaf075e55d5876839eb11c4f33b3a2495-primary.sqlite
|   |-- cachecookie
|   |-- mirrorlist.txt
|   |-- packages
|   `-- repomd.xml
|-- firefoxbug
|   |-- cachecookie
|   |-- packages
|   |-- primary.xml.gz
|   |-- primary.xml.gz.sqlite
|   `-- repomd.xml
|-- timedhosts.txt
|-- updates
|   |-- cachecookie
|   |-- mirrorlist.txt
|   |-- packages
|   |-- primary.sqlite
|   `-- repomd.xml
- Look at the firefoxbug repo: primary.xml.gz is the "manifest" from the yum server, but here it is also stored in sqlite form, and the sqlite db can be inspected:
$ sqlite3 primary.xml.gz.sqlite
sqlite> .table
conflicts db_info files obsoletes packages provides requires
sqlite> select * from packages;
1|896712eb4b4af2d61745dd30e0a6f6513043fd69|rpm_test|noarch|0.0.2|0|3|rpm_test|rpm_test by Wanghua||1406360629|1406360561|Commercial||tools|firefoxbug|rpm_test-0.0.2-3.src.rpm|280|2402||2734|268|816|rpm_test-0.0.2-3.noarch.rpm||sha
2|3ad546bd3ce28b0a82a1387f438f456349e20c78|rpm_test2|noarch|0.0.2|0|3|rpm_test|rpm_test by Wanghua||1406363739|1406363674|Commercial||tools|firefoxbug|rpm_test2-0.0.2-3.src.rpm|280|2406||2738|268|816|rpm_test2-0.0.2-3.noarch.rpm||sha
3|19c82aa653a394ee1f7dbc7b694fbf0221bc1848|rpm_test|noarch|0.0.1|0|3|rpm_test|rpm_test by Wanghua||1406360629|1406356964|Commercial||tools|firefoxbug|rpm_test-0.0.1-3.src.rpm|280|2402||2733|268|816|rpm_test-0.0.1-3.noarch.rpm||sha
sqlite> select * from requires;
/bin/sh|||||1|TRUE
python|GE|0|2.4.3||1|FALSE
/bin/sh|||||2|TRUE
python|GE|0|2.4.3||2|FALSE
/bin/sh|||||3|TRUE
python|GE|0|2.4.3||3|FALSE
- Every time yum installs or removes a package, it queries this sqlite DB and then acts accordingly.
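The same kind of lookup can be done from Python with the sqlite3 module. The sketch below joins packages to requires to list each package's dependencies. The column names (pkgKey, name, version, release, flags) are assumptions inferred from the column order of the output above, and the demo builds a small in-memory database with a trimmed-down schema instead of opening a real cache file.

```python
import sqlite3


def requirements_by_package(conn):
    """Map 'name-version-release' to a list of requirement strings."""
    rows = conn.execute(
        "SELECT p.name, p.version, p.release, r.name, r.flags, r.version "
        "FROM packages p JOIN requires r ON r.pkgKey = p.pkgKey "
        "ORDER BY p.pkgKey, r.name"
    )
    result = {}
    for pname, pver, prel, rname, flags, rver in rows:
        nvr = "%s-%s-%s" % (pname, pver, prel)
        # Unversioned requirements such as /bin/sh have no flags.
        req = rname if not flags else "%s %s %s" % (rname, flags, rver)
        result.setdefault(nvr, []).append(req)
    return result


def demo():
    # Simplified stand-in for primary.xml.gz.sqlite (assumed column names).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT,
            name TEXT, arch TEXT, version TEXT, epoch TEXT, release TEXT);
        CREATE TABLE requires (name TEXT, flags TEXT, epoch TEXT,
            version TEXT, release TEXT, pkgKey INTEGER, pre BOOLEAN);
        INSERT INTO packages VALUES
            (1, '896712eb4b4af2d61745dd30e0a6f6513043fd69',
             'rpm_test', 'noarch', '0.0.2', '0', '3');
        INSERT INTO requires VALUES
            ('/bin/sh', NULL, NULL, NULL, NULL, 1, 'TRUE'),
            ('python', 'GE', '0', '2.4.3', NULL, 1, 'FALSE');
    """)
    return requirements_by_package(conn)


if __name__ == "__main__":
    print(demo())
```

Against a real cache, sqlite3.connect could be pointed at a file such as /var/cache/yum/firefoxbug/primary.xml.gz.sqlite, provided the actual schema matches these assumed column names.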
- Clearing the local yum cache
Running sudo yum clean all removes all of these "manifests"; they are regenerated the next time yum install or a similar operation is run.
$ sudo yum clean all
/var/cache/yum/
|-- base
|   |-- packages
|-- epel
|   |-- packages
|-- firefoxbug
|   |-- packages
|-- updates
|   |-- packages
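- The timedhosts.txt file records the time needed to access each source address, so it can be used to find out which sources are slow.
- If the keepcache option in /etc/yum.conf is 1, every downloaded rpm package is kept under /var/cache/yum/xxx/packages.
- How does yum install package determine that a package is already installed? This is not determined from /var/cache/yum but from /var/lib/rpm/, because installing a package requires root to write to the db under that directory. For the details you would have to read the rpm source code.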