By Jonathan Corbet
July 8, 2009
Making the best use of available memory is one of the biggest challenges for any operating system. Throwing virtualization into the mix adds both new challenges (balancing memory use between guests, for example) and opportunities (sharing pages between guests). Developers have responded with technologies like hot-plug memory and KSM, but nobody seems to think that the problem is fully solved. Transcendent memory is a new memory-management technique which, it is hoped, will improve the system's use of scarce RAM, regardless of whether virtualization is being used.
In his linux-kernel introduction, Dan Magenheimer asks:
What if there was a class of memory that is of unknown and dynamically variable size, is addressable only indirectly by the kernel, can be configured either as persistent or as "ephemeral" (meaning it will be around for awhile, but might disappear without warning), and is still fast enough to be synchronously accessible?
Dan (along with a list of other kernel developers) is exploring this concept, which he calls "transcendent memory." In short, transcendent memory can be thought of as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again. At first blush, it may seem like a relatively useless sort of device, but it is hoped that transcendent memory will be able to improve performance in a few situations.
There is an API specification [PDF] available; there is also a related C API found in the patch itself. This discussion will focus on the latter, which suffers from less EXCESSIVE CAPITAL USE and is generally easier to understand.
Transcendent memory operates on the concept of page pools; once a pool is created, data can be stored to pages within the pool. The calls for creating and destroying pools look like this:
u32 pool_id = tmem_new_pool(struct tmem_pool_uuid uuid, u32 flags);
tmem_destroy_pool(u32 pool_id);
Pools are identified by the uuid value, though the identification really only matters for pools which might be shared among multiple users. A fair amount of information is stored in the flags field, including:
- An "ephemeral" bit, which controls whether data successfully written to the pool is allowed to disappear at a random future time.
- A "shared" bit indicating whether the pool is to be shared with other users.
- The size of pages to use in the pool, expressed as a kernel "order" value.
- A specification version number, used to ensure that both sides of the conversation know how to understand each other.
While users are expected to specify a page size, there is no way to specify the size of the pool as a whole. Determining the proper sizing for a pool (which almost certainly changes over time) is left to the hypervisor or whatever other software component is managing the pool.
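As a rough illustration of how the four fields listed above could share one 32-bit flags word, here is a minimal sketch. The bit positions, the EX_ macro names, and the helpers ex_tmem_make_flags() and ex_tmem_flags_order() are all assumptions invented for this example; the actual layout is whatever the specification and the patch define.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, chosen only for illustration:
 *   bit 0       ephemeral
 *   bit 1       shared
 *   bits 4-7    page size, as an "order" value
 *   bits 24-31  specification version
 */
#define EX_TMEM_EPHEMERAL      (1u << 0)
#define EX_TMEM_SHARED         (1u << 1)
#define EX_TMEM_ORDER_SHIFT    4
#define EX_TMEM_ORDER_MASK     0xfu
#define EX_TMEM_VERSION_SHIFT  24
#define EX_TMEM_VERSION_MASK   0xffu

/* Pack the four fields into a single flags word. */
uint32_t ex_tmem_make_flags(int ephemeral, int shared,
                            unsigned order, unsigned version)
{
    uint32_t flags = 0;

    assert(order <= EX_TMEM_ORDER_MASK);
    assert(version <= EX_TMEM_VERSION_MASK);

    if (ephemeral)
        flags |= EX_TMEM_EPHEMERAL;
    if (shared)
        flags |= EX_TMEM_SHARED;
    flags |= (uint32_t)order << EX_TMEM_ORDER_SHIFT;
    flags |= (uint32_t)version << EX_TMEM_VERSION_SHIFT;
    return flags;
}

/* Extract the page-size order from a flags word. */
unsigned ex_tmem_flags_order(uint32_t flags)
{
    return (flags >> EX_TMEM_ORDER_SHIFT) & EX_TMEM_ORDER_MASK;
}
```

Note that nothing in such a word describes the size of the pool; that decision stays with whatever manages the pool.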
As suggested by the above interface, transcendent memory is very much page-based. Beyond that, it also can never be referenced directly; users are required to copy data into and out of the pool explicitly. The functions used for moving data between normal and transcendent memory are:
int tmem_put_page(u32 pool_id, u64 object_id, u32 page_id, unsigned long pfn);
int tmem_get_page(u32 pool_id, u64 object_id, u32 page_id, unsigned long pfn);
For both of these calls, pool_id specifies an existing pool. The object_id and page_id values, together, form a unique identifier for the page within the pool. If the pool is being used to cache file pages, for example, the object_id would identify the file, while page_id would be the offset within the file. The pfn value (a page frame number) identifies the page which is the source of the data (for tmem_put_page()) or the destination (for tmem_get_page()).
Note that either call might fail. Since the size of the pool is not known, callers can never know in advance whether tmem_put_page() will succeed. So any transcendent memory user must have a backup plan ready in case the call fails. For pools marked as "ephemeral," tmem_get_page() is allowed to fail even if tmem_put_page() on the same ID succeeded; in other words, the implementation is allowed to drop pages from ephemeral pools if it decides that the memory can be put to better use elsewhere. It's also worth noting that, with private, ephemeral pools, tmem_get_page() will remove the indicated page from the pool.
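The semantics described above can be modeled in a few lines of user-space C. This is only a toy sketch, not the patch's implementation: the ex_ names, the two-slot capacity, the round-robin eviction, the use of pointers instead of page frame numbers, and the "1 on success, 0 on failure" return convention are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_PAGE_SIZE  4096
#define EX_POOL_SLOTS 2          /* tiny on purpose; callers cannot see this */

struct ex_slot {
    int used;
    uint64_t object_id;
    uint32_t page_id;
    unsigned char data[EX_PAGE_SIZE];
};

struct ex_pool {
    int ephemeral;               /* nonzero: pages may silently vanish */
    unsigned next_victim;        /* round-robin eviction cursor */
    struct ex_slot slot[EX_POOL_SLOTS];
};

static struct ex_slot *ex_find(struct ex_pool *p, uint64_t oid, uint32_t pid)
{
    for (unsigned i = 0; i < EX_POOL_SLOTS; i++) {
        struct ex_slot *s = &p->slot[i];

        if (s->used && s->object_id == oid && s->page_id == pid)
            return s;
    }
    return NULL;
}

/* Copy a page into the pool. Returns 1 on success, 0 on failure. */
int ex_put_page(struct ex_pool *p, uint64_t oid, uint32_t pid, const void *page)
{
    struct ex_slot *s = ex_find(p, oid, pid);

    assert(page != NULL);
    for (unsigned i = 0; !s && i < EX_POOL_SLOTS; i++)
        if (!p->slot[i].used)
            s = &p->slot[i];
    if (!s) {
        if (!p->ephemeral)
            return 0;            /* full: the caller needs its backup plan */
        /* Ephemeral pool: drop an older page to make room. */
        s = &p->slot[p->next_victim];
        p->next_victim = (p->next_victim + 1) % EX_POOL_SLOTS;
    }
    s->used = 1;
    s->object_id = oid;
    s->page_id = pid;
    memcpy(s->data, page, EX_PAGE_SIZE);
    return 1;
}

/* Copy a page out of the pool. Returns 1 on success, 0 if it is not there. */
int ex_get_page(struct ex_pool *p, uint64_t oid, uint32_t pid, void *page)
{
    struct ex_slot *s = ex_find(p, oid, pid);

    assert(page != NULL);
    if (!s)
        return 0;
    memcpy(page, s->data, EX_PAGE_SIZE);
    if (p->ephemeral)
        s->used = 0;             /* private ephemeral pool: a get removes the page */
    return 1;
}
```

With a zero-initialized pool marked ephemeral, a third put evicts the first page, so a later get for that page returns 0 even though its put succeeded. With the ephemeral field cleared, the third put itself returns 0 instead.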
As an example of how this feature might be used, consider the Linux page cache, which maintains copies of pages from disk files. When memory gets tight, the page cache will start forgetting pages which are clean, but which have not been referenced in the recent past. With transcendent memory, the page cache could, before dropping the pages, attempt to store them into an ephemeral transcendent memory pool. At some future time, when one of those pages is needed again, the page cache would first attempt to fetch it from the pool. If the tmem_get_page() call succeeds, a disk I/O operation will have been avoided and everybody benefits; otherwise the page is read from disk as usual.
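The page-cache flow just described might look something like the following user-space sketch. Everything here is hypothetical: the one-slot stand-in for an ephemeral pool, the simulated disk, and the ex_ function names are inventions for the example and do not reflect how the patch hooks into the kernel's page cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_PAGE_SIZE 4096

/* One-slot stand-in for an ephemeral pool: each put displaces the last page. */
static struct {
    int used;
    uint64_t object_id;
    uint32_t page_id;
    unsigned char data[EX_PAGE_SIZE];
} ex_pool;

int ex_disk_reads;               /* counts simulated disk I/O operations */

static int ex_put_page(uint64_t oid, uint32_t pid, const void *page)
{
    ex_pool.used = 1;
    ex_pool.object_id = oid;
    ex_pool.page_id = pid;
    memcpy(ex_pool.data, page, EX_PAGE_SIZE);
    return 1;
}

static int ex_get_page(uint64_t oid, uint32_t pid, void *page)
{
    if (!ex_pool.used || ex_pool.object_id != oid || ex_pool.page_id != pid)
        return 0;
    memcpy(page, ex_pool.data, EX_PAGE_SIZE);
    ex_pool.used = 0;            /* a get removes the page from the pool */
    return 1;
}

/* Simulated disk: page contents are derived from (file, index). */
static void ex_read_from_disk(uint64_t file, uint32_t index, void *page)
{
    memset(page, (int)((file + index) & 0xff), EX_PAGE_SIZE);
    ex_disk_reads++;
}

/* Called when the page cache is about to forget a clean page. */
void ex_evict_clean_page(uint64_t file, uint32_t index, const void *page)
{
    assert(page != NULL);
    /* A failed put is harmless: the page is clean, so the disk copy is valid. */
    (void)ex_put_page(file, index, page);
}

/* Called on a page-cache miss. Returns 1 on a pool hit, 0 if the disk was read. */
int ex_fill_page(uint64_t file, uint32_t index, void *page)
{
    assert(page != NULL);
    if (ex_get_page(file, index, page) == 1)
        return 1;
    ex_read_from_disk(file, index, page);
    return 0;
}
```

The point of the design is that the pool is purely an optimization: every path still works, only more slowly, if the pool never accepts or never returns a page.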
Persistent (non-ephemeral) pools could be used as a sort of swap device. If the swapping code succeeds in writing a page to the pool, it can avoid writing it to the real swap device. The result is saved I/O at both swap-out and swap-in times. If the pool lacks space for the swapped page, it will be written to the real swap device in the usual way.
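A swap path built on a persistent pool could be sketched as follows. This is a user-space toy under stated assumptions: the two-page pool capacity, the in-memory "swap device", the per-slot ex_in_tmem map recording where each page went, and all ex_ names are made up for the example, and the sketch assumes that a persistent pool always returns a page it accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_PAGE_SIZE  4096
#define EX_SWAP_SLOTS 4
#define EX_POOL_PAGES 2          /* hidden from callers in a real system */

/* Toy persistent pool, indexed by swap offset. */
static unsigned char ex_pool_data[EX_SWAP_SLOTS][EX_PAGE_SIZE];
static unsigned char ex_pool_has[EX_SWAP_SLOTS];
static unsigned ex_pool_used;

/* Simulated real swap device plus I/O counters. */
static unsigned char ex_swapdev[EX_SWAP_SLOTS][EX_PAGE_SIZE];
int ex_swap_writes, ex_swap_reads;

/* Records, per swap slot, whether the page lives in the pool. */
static unsigned char ex_in_tmem[EX_SWAP_SLOTS];

static int ex_put_page(uint32_t offset, const void *page)
{
    if (!ex_pool_has[offset]) {
        if (ex_pool_used == EX_POOL_PAGES)
            return 0;            /* pool is full: refuse the page */
        ex_pool_has[offset] = 1;
        ex_pool_used++;
    }
    memcpy(ex_pool_data[offset], page, EX_PAGE_SIZE);
    return 1;
}

static int ex_get_page(uint32_t offset, void *page)
{
    if (!ex_pool_has[offset])
        return 0;
    memcpy(page, ex_pool_data[offset], EX_PAGE_SIZE);
    return 1;
}

void ex_swap_out(uint32_t offset, const void *page)
{
    assert(offset < EX_SWAP_SLOTS && page != NULL);
    if (ex_put_page(offset, page) == 1) {
        ex_in_tmem[offset] = 1;  /* no write to the swap device needed */
        return;
    }
    /* Backup plan: write to the real swap device in the usual way. */
    ex_in_tmem[offset] = 0;
    memcpy(ex_swapdev[offset], page, EX_PAGE_SIZE);
    ex_swap_writes++;
}

void ex_swap_in(uint32_t offset, void *page)
{
    assert(offset < EX_SWAP_SLOTS && page != NULL);
    if (ex_in_tmem[offset]) {
        int ok = ex_get_page(offset, page);

        assert(ok == 1);         /* assumed guarantee of a persistent pool */
        (void)ok;
        return;
    }
    memcpy(page, ex_swapdev[offset], EX_PAGE_SIZE);
    ex_swap_reads++;
}
```

Swapping out three pages here sends the first two to the pool and the third to the swap device, so only that third page costs I/O on the way out and again on the way back in.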
Meanwhile, the transcendent memory implementation can try to optimize its management of the memory pools. Guests which are more active (or which have been given a higher priority) might be allowed to allocate more pages from the pools. Duplicate pages can be coalesced; KSM-like techniques could be used, but the use of object IDs could make it easier to detect duplicates in a number of situations. And so on.
The API specifies a number of other operations. There are a couple of calls to flush pages from the pool; one of them can remove all pages with a given object ID. Sub-page-size reads and writes are supported; there is also a tmem_xchg() call to atomically exchange data within a transcendent memory page. See the API specification for the full list.
A number of concerns were raised in the subsequent discussion; as a result, the above API is likely to change a bit. The biggest concern, though, appears to be security. The potential for hostile code to tap into shared pools and read out pages has developers worried; the need to guess a 128-bit UUID first has proved not to be sufficiently reassuring. Even with legitimate users only, a shared pool has the potential to contain data which should not, in reality, be shared between guests. As a result, any transcendent memory user will have to be written to take high-level security issues into account in low-level code.
Dan seemingly doesn't see the security problems as being as worrisome as others do. Even so, he eventually announced that the next transcendent memory patch would not include support for shared pools, and, indeed, version 2 lacks that feature. That feature will probably not come back until the security issues have been thought through and the concerns have been addressed.
Beyond that, transcendent memory will need some convincing evidence that it improves performance before it can make it into the mainline. The potential for improvements is clearly there; it is essentially a way for the system to take higher-level information into account when managing its virtual memory resources. If transcendent memory is able to fulfill that potential in a secure way, there may well be a place for it in the mainline kernel.
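Separately, Qualcomm's kernel code contains a design called FMEM. The idea is that while memory reserved for multimedia use is idle, TMEM is enabled and that memory is added to the cache, speeding up page reads and writes (by retrieving pages from this cache). When the multimedia memory (ioremap cache) is needed again, TMEM is disabled. A ZCACHE adaptation layer sits between cleancache and TMEM; with TMEM disabled, cleancache still calls the same interface, but the page is simply discarded, so a later attempt to retrieve it fails and the page has to be read in again. As the cleancache article also notes, retrieval is not guaranteed to succeed. When pages are needed, these cache pages are the first to be cleaned out. The one open question with FMEM is how the cost of switching between disabled and enabled TMEM compares with swap in speed and performance.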
How could the strengths and weaknesses of the two be compared experimentally?