Linux Beginner's Notes, No. 111: Memory
It is recommended to read the original post (the copied version's formatting may be broken)
Original by Winthcloud, link:
Contents
Memory subsystem components
Memory performance improvements
Viewing system calls
Strategies for using memory
Tuning page allocation
Tuning overcommit
Slab cache
ARP cache
Page cache
Tuning strategy (see the official Red Hat Enterprise Linux 6 Performance Tuning Guide)
Tuning for interprocess communication
Memory subsystem components
slab allocator
buddy system
kswapd
pdflush
mmu
Virtualized environments
PA --> HA --> MA
Guest address translation: PA --> HA
GuestOS, OS
Shadow PT (shadow page tables)
Memory performance improvements
TLB (hugepages reduce TLB pressure)
Hugetlbfs
Check whether hugepages are enabled
cat /proc/meminfo | grep Huge
Enable hugepages (persistent across reboots)
/etc/sysctl.conf
add vm.nr_hugepages = n
Apply immediately
sysctl -w vm.nr_hugepages=n
Configure hugetlbfs if needed by the application
Create a mount point and mount hugetlbfs
mkdir /hugepages
mount -t hugetlbfs none /hugepages
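A minimal end-to-end sketch (the page count 64 and the /hugepages path are illustrative values, not from the original):
sysctl -w vm.nr_hugepages=64        # reserve 64 hugepages
grep Huge /proc/meminfo             # HugePages_Total should now report 64
mkdir -p /hugepages
mount -t hugetlbfs none /hugepages  # files created here are hugepage-backed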
Viewing system calls
Trace every system call made by a program
strace -o /tmp/strace.out -p PID
grep mmap /tmp/strace.out
Summarize system calls
strace -c -p PID, or
strace -c COMMAND
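For example, a quick summary run against an arbitrary command (ls here is just an illustration):
strace -c ls /tmp
# prints one row per syscall with columns: % time, seconds, usecs/call, calls, errors, syscall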
Strategies for using memory
Reduce overhead for tiny memory objects
Slab cache
cat /proc/slabinfo
Reduce or defer service time for slower subsystems
Filesystem metadata: buffer cache (slab cache)
Disk IO: page cache
Interprocess communications: shared memory
Network IO: buffer cache, arp cache, connection tracking
Considerations when tuning memory
How should pages be reclaimed to avoid pressure?
Larger writes are usually more efficient due to re-sorting
Tuning page allocation
Set using
vm.min_free_kbytes
Tuning vm.min_free_kbytes should only be necessary when an application regularly
needs to allocate a large block of memory and then frees that same memory
It may well be the case that the system has too little disk bandwidth, too
little CPU power, or too little memory to handle its load.
Consequences
Reduces service time for demand paging
Memory is not available for other usage
Can cause pressure on ZONE_NORMAL
Exhausting memory can crash the system
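An illustrative adjustment (65536 KiB is an example value, not a recommendation):
cat /proc/sys/vm/min_free_kbytes    # view the current reserve, in KiB
sysctl -w vm.min_free_kbytes=65536  # reserve roughly 64 MiB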
Tuning overcommit
Set using
vm.overcommit_memory
(view the current value with cat /proc/sys/vm/overcommit_memory)
0 = heuristic overcommit
1 = always overcommit
2 = never overcommit: the commit limit is swap plus a percentage of RAM (the percentage may be > 100)
vm.overcommit_ratio
Specifies the percentage of physical memory counted toward the
commit limit when vm.overcommit_memory is set to 2
View Committed_AS in /proc/meminfo
An estimate of how much RAM is required to avoid an out of memory (OOM)
condition for the current workload on a system
OOM
Out Of Memory
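For instance, to cap commitments at swap plus 80% of RAM (80 is an illustrative value) and check the result:
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
grep -E 'CommitLimit|Committed_AS' /proc/meminfo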
Slab cache
Tiny kernel objects are stored in slab
Extra overhead of tracking is better than using 1 page/object
Example: filesystem metadata (dentry and inode caches)
Monitoring
/proc/slabinfo
slabtop
vmstat -m
Tuning a particular slab cache
echo "cache_name limit batchcount shared" > /proc/slabinfo
limit: the maximum number of objects that will be cached for each CPU
batchcount: the maximum number of global cache objects that will be
transferred to the per-CPU cache when it becomes empty
shared: the sharing behavior for Symmetric MultiProcessing (SMP) systems
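A sketch for one cache, assuming the kernel uses the SLAB allocator (this write only takes effect with SLAB, not SLUB); dentry is a real cache name, the three numbers are illustrative:
echo "dentry 256 64 8" > /proc/slabinfo   # limit=256 batchcount=64 shared=8
grep dentry /proc/slabinfo                # verify the new tunables column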
ARP cache
ARP entries map protocol (IP) addresses to hardware (MAC) addresses
cached in /proc/net/arp
By default, the cache is limited to 512 entries as a soft limit
and 1024 entries as a hard limit
Garbage collection removes stale or older entries
Insufficient ARP cache leads to
Intermittent timeouts between hosts
ARP thrashing
Too much ARP cache puts pressure on ZONE_NORMAL
List entries
ip neighbor list
Flush cache
ip neighbor flush dev ethX
Tuning ARP cache
Number of entries below which garbage collection leaves the ARP table alone
net.ipv4.neigh.default.gc_thresh1
default 128
Soft upper limit
net.ipv4.neigh.default.gc_thresh2
default 512
Becomes hard limit after 5 seconds
Hard upper limit
net.ipv4.neigh.default.gc_thresh3
default 1024
Garbage collection frequency in seconds
net.ipv4.neigh.default.gc_interval
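An illustrative enlargement for a host with many directly reachable neighbors (all three values are examples):
sysctl -w net.ipv4.neigh.default.gc_thresh1=512
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
ip neighbor list | wc -l    # rough count of current entries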
Page cache
A large percentage of paging activity is due to I/O requests
File reads: each page of file read from disk into memory
These pages form the page cache
Page cache is always checked for IO requests
Directory reads
Reading and writing regular files
Reading and writing via block device files, DISK IO
Accessing memory mapped files, mmap
Accessing swapped out pages
Pages in the page cache are associated with file data
Tuning page cache
View page cache allocation in /proc/meminfo
Tune length/size of memory
vm.lowmem_reserve_ratio
vm.vfs_cache_pressure
Tune arrival/completion rate
vm.page-cluster
vm.zone_reclaim_mode
vm.lowmem_reserve_ratio
For some specialised workloads on highmem machines it is dangerous for the
kernel to allow process memory to be allocated from the "lowmem" zone
Linux page allocator has a mechanism which prevents allocations which could
use highmem from using too much lowmem
The 'lowmem_reserve_ratio' tunable determines how aggressive the kernel is
in defending these lower zones
If you have a machine which uses highmem or ISA DMA and your applications
are using mlock(), or if you are running with no swap, then you probably
should change the lowmem_reserve_ratio setting
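To inspect the current setting (one value per zone pair; x86_64 defaults are typically 256 256 32):
cat /proc/sys/vm/lowmem_reserve_ratio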
vfs_cache_pressure
Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects
At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim
Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry
and inode caches
When vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to out-of-memory
conditions
Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to
reclaim dentries and inodes.
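For example, to favor keeping dentry and inode caches on a metadata-heavy workload (50 is an illustrative value):
sysctl -w vm.vfs_cache_pressure=50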
page-cluster
page-cluster controls the number of pages which are written to swap in a
single attempt
It is a logarithmic value: setting it to zero means "1 page", setting it to
1 means "2 pages", setting it to 2 means "4 pages", etc
The default value is three (eight pages at a time)
There may be some small benefits in tuning this to a different value if
your workload is swap-intensive
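As a worked example, a value of 4 means 2^4 = 16 pages per attempt:
sysctl -w vm.page-cluster=4    # 2^4 = 16 pages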
zone_reclaim_mode
Zone_reclaim_mode allows someone to set more or less aggressive approaches
to reclaim memory when a zone runs out of memory
If it is set to zero then no zone reclaim occurs
Allocations will be satisfied from other zones/nodes in the system
The value is a bitmask ORed together from:
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
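For example, enabling zone reclaim together with dirty-page writeback means ORing 1 and 2:
sysctl -w vm.zone_reclaim_mode=3    # 1 (reclaim on) | 2 (write dirty pages)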
Anonymous pages
Anonymous pages can be another large consumer of memory
Are not associated with a file, but instead contain:
Program data - arrays, heap allocations, etc
Anonymous memory regions
Dirty memory mapped process private pages
IPC shared memory regions pages
View summary usage
grep Anon /proc/meminfo
cat /proc/PID/statm
Anonymous pages = RSS - Shared
Anonymous pages are eligible for swap
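A quick sketch of that RSS - Shared arithmetic using /proc/self/statm, whose fields (in pages) are: size resident shared text lib data dt:
awk '{print "anonymous pages:", $2 - $3}' /proc/self/statm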
Tuning strategy
Hardware tuning: hardware selection
Software tuning: kernel tuning via /proc and /sys
Application tuning
Kernel tuning areas
1. Process management, CPU
2. Memory tuning
3. I/O tuning
4. Filesystems
5. Network subsystem
Tuning approach
1. Examine each performance metric and locate the bottleneck
2. Tune accordingly
Red Hat publishes an official document, the Red Hat Enterprise Linux 6 Performance Tuning Guide, which is easy to find online
Tuning for interprocess communication
ipcs (interprocess communication facilities)
IPC management commands
ipcs
ipcrm
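For example, to inspect and then remove a stale shared memory segment (the shmid 65536 is hypothetical):
ipcs -m           # list shared memory segments
ipcs -l           # show system-wide IPC limits
ipcrm -m 65536    # remove the segment with that shmid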
shared memory
kernel.shmmni
Specifies the maximum number of shared memory segments
system-wide, default = 4096
kernel.shmall
Specifies the total amount of shared memory, in pages, that
can be used at one time on the system, default=2097152
This should be at least kernel.shmmax/PAGE_SIZE
kernel.shmmax
Specifies the maximum size of a shared memory segment that
can be created
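An illustrative sizing: allowing a single 16 GiB segment with 4 KiB pages, where shmall must be at least shmmax/PAGE_SIZE:
sysctl -w kernel.shmmax=17179869184   # 16 GiB, in bytes
sysctl -w kernel.shmall=4194304       # 17179869184 / 4096 pages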
messages
kernel.msgmnb
Specifies the maximum number of bytes in a single message
queue, default = 16384
kernel.msgmni
Specifies the maximum number of message queue identifiers,
default=16
kernel.msgmax
Specifies the maximum size, in bytes, of a message that can be passed
between processes
This memory cannot be swapped, default=8192
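A hedged example of raising both limits to 64 KiB (illustrative values):
sysctl -w kernel.msgmnb=65536    # max bytes per queue
sysctl -w kernel.msgmax=65536    # max bytes per message
ipcs -l                          # verify under the message limits section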