so what is the reason why people want anon mTHP khugepaged? is that only for TLB reasons (which means they really just want one more hugepage size between 4K and 2M) or is there something more going on where this is supposed to improve internal kernel efficiency of vmscan or something like that?
-
@jann why not both?
-
@ljs I mean, it's pretty clear to me that it would improve TLB stuff, my question is: Does it only improve TLB stuff, or is there something else it also improves?
(And the follow-up question would be: Do people actually expect to want a bunch of different mTHP sizes or is the expected usecase to have a single anon mTHP size?)
-
@jann TLB is an obvious improvement, but establishing higher order folios will have fault scalability/reclaim advantages also.
I don't know whether the latter has as much impact as the former.
As for the follow-up, I think you could at least in theory benefit from multiple mTHP sizes, as if we were unable to obtain a larger mTHP size we could try for a smaller one, but you'd always want to have the largest possible.
Note that this (painfully) also aligns with the max_ptes_none logic, which determines how much memory you want to 'waste' on collapse should you find adjacent unmapped ranges.
We are working on a saner means of doing this. See https://lore.kernel.org/linux-mm/dcfc7e27-d3c8-4fd0-8b7b-ce8f5051d597@lucifer.local/T/#m2ecc263112f715710cd6745a6a7ed211aeed92fe and above for much discussion on this.
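As a rough illustration of that fallback-plus-'waste' trade-off, here is a minimal userspace sketch; the names and the budget-scaling rule are assumptions for illustration, not the kernel's actual collapse logic:

```c
/*
 * Hypothetical userspace sketch (not kernel code): pick the largest mTHP
 * order for a collapse candidate, falling back to smaller orders, while
 * bounding how many empty (pte_none-style) slots we are willing to waste,
 * in the spirit of the khugepaged max_ptes_none tunable.
 */
#include <stdbool.h>
#include <stdio.h>

#define PMD_ORDER 9       /* 512 x 4KiB PTEs = 2MiB with 4K base pages */

/*
 * Illustrative waste budget at PMD order; the real sysfs knob,
 * khugepaged/max_ptes_none, defaults to 511 and is far more permissive.
 */
#define MAX_PTES_NONE 64

/* Count empty PTE slots in an aligned window of 2^order entries. */
static int count_none(const bool *present, int start, int order)
{
	int none = 0;

	for (int i = start; i < start + (1 << order); i++)
		if (!present[i])
			none++;
	return none;
}

/*
 * Try orders from largest to smallest; accept the first one whose empty
 * entries fit within a budget scaled down from MAX_PTES_NONE (the scaling
 * rule is an assumption, chosen only to make the fallback visible).
 */
static int pick_collapse_order(const bool *present, int start)
{
	for (int order = PMD_ORDER; order >= 2; order--) {
		int budget = MAX_PTES_NONE >> (PMD_ORDER - order);

		if (count_none(present, start, order) <= budget)
			return order;
	}
	return -1;	/* even the smallest order would waste too much */
}

int main(void)
{
	bool present[512] = { false };

	/* Pretend only the first 64KiB (16 PTEs) of the 2MiB region is populated. */
	for (int i = 0; i < 16; i++)
		present[i] = true;

	/* Falls back from order 9 down to order 4 (64KiB) for this layout. */
	printf("chosen collapse order: %d\n", pick_collapse_order(present, 0));
	return 0;
}
```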
-
>establishing higher order folios will have fault scalability/reclaim advantages also
By the time khugepaged comes around, fault scalability is probably not much of a concern anymore, right? Since you're probably done faulting when khugepaged comes around to collapse stuff? Unless we're talking about swapin performance. But yeah, reclaim I could see.
>As for the follow-up, I think you could at least in theory benefit from multiple mTHP sizes, as if we were unable to obtain a larger mTHP size we could try for a smaller one, but you'd always want to have the largest possible.
I think you wouldn't get a benefit out of it in terms of TLB - like, on modern AMD machines, I believe if you can't get a 16K page, then the TLB will have to use 4K entries anyway and from the TLB perspective, there's no sense in having 8K pages?
-
>By the time khugepaged comes around, fault scalability is probably not much of a concern anymore, right? Since you're probably done faulting when khugepaged comes around to collapse stuff? Unless we're talking about swapin performance. But yeah, reclaim I could see.
ok yeah, from the khugepaged point of view, right. Sorry, getting ahead of things there - this would come in handy at _the point of page fault_ when/if we implement mTHP there.
>I think you wouldn't get a benefit out of it in terms of TLB - like, on modern AMD machines, I believe if you can't get a 16K page, then the TLB will have to use 4K entries anyway and from the TLB perspective, there's no sense in having 8K pages?
You can have a whole range of mTHP sizes (1MB, 512KB, 256KB, 128KB, 64KB, 32KB, 16KB) I believe on arm for instance.
I think for AMD the benefits are a lot less clear.
Honestly I see this largely right now as an arm64 feature.
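For reference, those sizes map onto folio orders like this with a 4KiB base page; the hugepages-<size>kB naming follows the upstream sysfs convention, but treat the exact paths printed below as something to verify on your kernel:

```c
/* Sketch: relate the mTHP sizes above to folio orders, assuming 4KiB base pages. */
#include <stdio.h>

int main(void)
{
	const unsigned int base_kb = 4;
	const unsigned int sizes_kb[] = { 16, 32, 64, 128, 256, 512, 1024, 2048 };

	for (unsigned int i = 0; i < sizeof(sizes_kb) / sizeof(sizes_kb[0]); i++) {
		unsigned int pages = sizes_kb[i] / base_kb;
		unsigned int order = 0;

		/* order = log2(pages): 16KiB -> 2, ..., 2MiB -> 9 (the PMD size) */
		while ((1u << order) < pages)
			order++;

		printf("%4u KiB -> order %u  (/sys/kernel/mm/transparent_hugepage/hugepages-%ukB/)\n",
		       sizes_kb[i], order, sizes_kb[i]);
	}
	return 0;
}
```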
-
@jann Hm I own an AMD machine, maybe I should experiment with it...
-
@ljs yeah I enabled 16K mTHP on my chonky AMD home PC
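In case anyone wants to reproduce that, here is a minimal sketch of what 'enabling 16K mTHP' boils down to - writing a policy to the per-size sysfs knob (needs root; whether 'always' rather than 'madvise' or 'inherit' is the right policy for a given workload is a separate question):

```c
/*
 * Minimal sketch: enable the 16KiB anon mTHP size by writing a policy to
 * the per-size sysfs file.  The path follows the
 * transparent_hugepage/hugepages-<size>kB/enabled convention; it needs a
 * kernel with mTHP support and root privileges.
 */
#include <stdio.h>

int main(void)
{
	const char *knob =
		"/sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled";
	FILE *f = fopen(knob, "w");

	if (!f) {
		perror(knob);
		return 1;
	}
	fputs("always", f);	/* alternatives: inherit, madvise, never */
	fclose(f);
	return 0;
}
```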
-
@ljs maybe Chrome is one of the applications that will get the biggest performance boost from that
because almost all their VMAs are smaller than 2M for guard page reasons, and they also happen to use 16K as some kinda allocation block size in their allocator
-
@ljs this is an image (though it is years out of date) of the main heap of some Chrome process: https://drive.google.com/file/d/1rI3mFhKzvE9L8C0szPJBJeuN-lq9XdYC/view
Every line is 2MiB of virtual address space, and the barely visible dark stuff on the left and right is non-present PROT_NONE stuff