With JDK 8 now widely adopted, I want to give a high-level view of HotSpot memory management and its garbage collection techniques.
Quickly, about garbage collection. A generational collector:
- allocates new objects in the young generation
- promotes aged objects to the old generation
- marks the live (reachable) objects in the old generation
- recovers space by removing unreachable objects or compacting live objects
Available collectors in the HotSpot JVM
1. Serial collector: for single-processor machines and apps with low concurrency. Use -XX:+UseSerialGC.
2. Parallel collector (or throughput collector): the default on server-class machines, suited to medium to large scale apps. Use -XX:+UseParallelGC. It is a multithreaded GC. Select this when high throughput is needed.
3. Mostly concurrent collector: performs most of its work concurrently, i.e. while the app is still running. Select this when faster response times (RTs) are needed. Since the app runs most of the time without any pauses, RTs are rarely affected by pause times. There are 2 types of mostly concurrent collectors: Concurrent Mark Sweep (CMS) and G1 (garbage-first GC, a generational algorithm preferred for heaps larger than 4 GB).
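To check which of these collectors a running JVM actually picked, the standard java.lang.management API can list them. This is a minimal sketch; the class name ActiveCollectors is made up for illustration, and the reported names depend on the JVM (on JDK 8 they are typically pairs such as "PS Scavenge" / "PS MarkSweep" for the parallel collector or "G1 Young Generation" / "G1 Old Generation" for G1).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Returns the names of the collectors reported by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Usually one entry for the young generation and one for the old generation.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running it once with -XX:+UseSerialGC and once with -XX:+UseParallelGC is a quick way to confirm that a flag took effect.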
* Throughput is the share of total time spent in the app rather than in GC. The more time is spent in the app, the more resources it has to do useful work.
* For generational GCs, i.e. GCs with a young generation (eden plus survivor spaces) and a tenured generation, both minor (young) and major collections pause the app in all GC algorithms. A minor/young GC is very fast, but a major GC happens when there is no space left for tenured objects, so it involves collecting the entire heap: identifying the reachable objects, marking and sweeping the unreachable ones, and compacting the space to avoid fragmentation. So it pauses for longer. CMS and G1 try to reduce this by doing most of that work in the background while the app is running, pausing the app only at certain points and for a short duration.
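The throughput definition above can be estimated from inside a running app. The sketch below sums the accumulated collection time that the JVM reports and compares it with the uptime. The class and method names are illustrative, and for concurrent collectors the reported time may include work done while the app was running, so treat the result as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcThroughput {
    // Fraction of elapsed time NOT spent in GC; both arguments are in milliseconds.
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 1.0; // nothing measured yet
        }
        return 1.0 - (double) gcMillis / uptimeMillis;
    }

    // Sums the accumulated collection time of all collectors.
    // getCollectionTime() returns -1 when the value is undefined, so only positive values are added.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC throughput so far: %.2f%%%n", 100 * throughput(totalGcMillis(), uptime));
    }
}
```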
Tuning first steps:
Go with the defaults:
- Serial for a single-core machine.
- Parallel when high throughput is the requirement, i.e. where the app's RTs can tolerate an occasional longer pause. Pauses are less frequent but longer, which gives high throughput but can breach RT targets with spikes. The parallel GC does its work with multiple threads running in parallel and consumes CPU while it does so.
- Mostly concurrent for large apps on multiprocessor machines, when RT is important. Pauses are shorter but may be more frequent; however, some parts of the GC work, like marking, can happen without any pause. Use CMS when the heap is smaller than 4 GB, else use G1.
- Once the default is selected, check whether the requirement is met. If not, try increasing the maximum heap size (-Xmx), then try tuning the heap areas.
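The selection rules above can be written down as a small decision function. This is only a sketch of the rule of thumb in this post, not something the JVM does for you: CollectorChooser and suggestFlag are made-up names, and the 4 GB threshold is the one quoted above.

```java
public class CollectorChooser {
    static final long FOUR_GB = 4L * 1024 * 1024 * 1024;

    // Encodes the rule of thumb: serial on one core, parallel for throughput,
    // CMS or G1 (depending on heap size) when response time matters.
    static String suggestFlag(int cpus, boolean responseTimeCritical, long maxHeapBytes) {
        if (cpus <= 1) {
            return "-XX:+UseSerialGC";
        }
        if (!responseTimeCritical) {
            return "-XX:+UseParallelGC";
        }
        return maxHeapBytes < FOUR_GB ? "-XX:+UseConcMarkSweepGC" : "-XX:+UseG1GC";
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Assumes a response-time-sensitive app; pass false for a batch-style workload.
        System.out.println(suggestFlag(cpus, true, maxHeap));
    }
}
```

Whatever it suggests is only a starting point; measuring against the actual requirement still decides.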
Parallel Collector:
- Enabled with -XX:+UseParallelGC.
- Both minor and major collections are executed in parallel.
- The number of GC threads is derived from N, the number of hardware threads: it is N itself when N is 8 or fewer, and roughly 5/8 of N for larger N. It can also be set manually with -XX:ParallelGCThreads=<N>.
- Since a higher number of GC threads can cause fragmentation when promoting objects from the young to the tenured space, try reducing the number of threads or increasing the tenured space to cope with the fragmentation.
- The parallel collector tunes itself automatically based on a desired behavior, so there is no need to specify generation sizes or other low-level tuning details. The behaviors that can be specified are maximum GC pause time, throughput, and footprint (heap size).
- Max pause time target: -XX:MaxGCPauseMillis=<N>
- Throughput target: -XX:GCTimeRatio=<N>, i.e. it tries to spend a fraction 1/(1+N) of the total time in GC. The default is 99, i.e. 1/(1+99) = 1% in GC, leaving 99% to the app.
- Footprint: basically the maximum heap size, -Xmx.
- The parallel collector tries to meet the max pause time target first, then throughput, then footprint. It adjusts the growth/shrink percentages of the generations to achieve these targets.
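The two bits of arithmetic in this section are easy to check in code. The sketch below assumes the commonly cited HotSpot default for the thread count (N for up to 8 hardware threads, then 8 plus 5/8 of the remainder); the exact value on a given JVM can differ, and ParallelGcMath is just an illustrative name.

```java
public class ParallelGcMath {
    // Assumed default GC thread count: N up to 8, then 8 + 5/8 of the threads beyond 8.
    static int defaultGcThreads(int hwThreads) {
        return hwThreads <= 8 ? hwThreads : 8 + (hwThreads - 8) * 5 / 8;
    }

    // Fraction of total time the collector aims to spend in GC for -XX:GCTimeRatio=N.
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("Hardware threads: " + n + ", assumed GC threads: " + defaultGcThreads(n));
        System.out.println("GC time goal with GCTimeRatio=99: " + 100 * gcTimeFraction(99) + "%");
    }
}
```

For example, GCTimeRatio=19 works out to 1/(1+19) = 5% of the time in GC.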
Mostly concurrent collectors:
CMS: use it when the priority is faster RT, i.e. shorter pause times, and the app can afford to share CPU with the collector.
- Enabled with -XX:+UseConcMarkSweepGC.
- Reduces pause times by dedicating a few threads to work concurrently (i.e. without pausing the application threads), marking the reachable objects and sweeping the unreachable ones. It still pauses the application briefly during a major collection, at the initial-mark and remark steps. It always tries to clean up the tenured space before it fills, to avoid longer pauses or an accumulation of objects. When it fails to keep enough tenured space free, a full collection happens with all the application threads paused (a failure case, reported as a concurrent mode failure).
- In CMS, collections of the young and the tenured generation happen independently of each other, since the tenured collection runs on its own threads concurrently with the application.
- CMS output is a bit different from the output of other collectors. Enable it with -verbose:gc and -XX:+PrintGCDetails (add -XX:+PrintGCTimeStamps to print timestamps as well). It contains the entries below:
  CMS-initial-mark: indicates the start of the concurrent collection cycle
  CMS-concurrent-mark: indicates the end of the concurrent marking phase
  CMS-concurrent-preclean: precleaning, work done concurrently in preparation for the remark
  CMS-remark: the remark pause
  CMS-concurrent-sweep: marks the end of the concurrent sweeping phase
  CMS-concurrent-reset: getting ready for the next collection
- CMS does not compact during its concurrent cycle. When no contiguous free space is available, it falls back to a full collection that compacts the whole heap, as the parallel collector does in its major collections.
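When reading a CMS log, it helps to know which of the phase names above stop the application. The sketch below is a tiny lookup, not a log parser: CmsPhases is a made-up name, and it only encodes the fact that the initial-mark and remark phases are pauses while the "concurrent" phases run alongside the app.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CmsPhases {
    // Phase names in the order they appear within one CMS cycle.
    static final List<String> CYCLE = Arrays.asList(
            "CMS-initial-mark", "CMS-concurrent-mark", "CMS-concurrent-preclean",
            "CMS-remark", "CMS-concurrent-sweep", "CMS-concurrent-reset");

    // The two phases during which application threads are paused.
    static final Set<String> PAUSING = new HashSet<>(Arrays.asList("CMS-initial-mark", "CMS-remark"));

    static boolean pausesApplication(String phase) {
        return PAUSING.contains(phase);
    }

    public static void main(String[] args) {
        for (String phase : CYCLE) {
            System.out.println(phase + (pausesApplication(phase) ? "  (pause)" : "  (concurrent)"));
        }
    }
}
```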
G1: for large heaps, where pause time targets can be met with high probability while still achieving high throughput (i.e. more time given to the app). Enabled with -XX:+UseG1GC.
- The heap is partitioned into a set of equally sized heap regions, each a contiguous range of virtual memory. The algorithm performs a concurrent global marking phase to determine the live objects in the heap. After the marking phase completes, it collects the mostly empty regions first, which yields a large amount of free space. That is why it is called Garbage-First.
- It continuously works to reduce fragmentation by compacting during collections.
- G1 is beneficial when live data occupies a large part of the heap (more than 50%), when the allocation rate varies a lot, or when collection or compaction times are long.
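The "garbage first" idea can be illustrated with a toy model. This is not how G1 is implemented inside HotSpot; it is a simplified sketch with made-up names (Region, chooseRegions, copyBudgetBytes) that sorts regions by how little live data they hold and picks them until an assumed budget of live bytes to copy is used up, standing in for the pause time target.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GarbageFirstSketch {
    // A heap region with an id and the number of live bytes found by marking.
    static final class Region {
        final int id;
        final long liveBytes;

        Region(int id, long liveBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
        }
    }

    // Picks the regions with the least live data first (the "mostly empty" ones),
    // stopping once copying the next region would exceed the budget.
    static List<Integer> chooseRegions(List<Region> regions, long copyBudgetBytes) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        List<Integer> chosen = new ArrayList<>();
        long used = 0;
        for (Region r : sorted) {
            if (used + r.liveBytes > copyBudgetBytes) {
                break;
            }
            used += r.liveBytes;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(0, 900));
        regions.add(new Region(1, 100));
        regions.add(new Region(2, 0));
        regions.add(new Region(3, 500));
        System.out.println(chooseRegions(regions, 600));
    }
}
```

Regions with little live data are cheap to evacuate and free the most space, which is why they come first.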
Default configuration on server-class machines
On server-class machines, the following are selected by default if not specified otherwise:
- Throughput garbage collector
- Initial heap size of 1/64 of physical memory, up to 1 GB
- Maximum heap size of 1/4 of physical memory, up to 1 GB
- Server runtime compiler
To view the default configuration, use java -XX:+PrintFlagsFinal -version
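The heap sizing rule above is simple arithmetic, sketched below for a given amount of physical memory. DefaultHeapSketch and its method names are illustrative, and the code only encodes the rule as stated in this post; the actual defaults depend on the JVM version, so compare against the value the running JVM reports.

```java
public class DefaultHeapSketch {
    static final long ONE_GB = 1024L * 1024 * 1024;

    // Initial heap: 1/64 of physical memory, capped at 1 GB (rule as stated above).
    static long defaultInitialHeap(long physicalBytes) {
        return Math.min(physicalBytes / 64, ONE_GB);
    }

    // Maximum heap: 1/4 of physical memory, capped at 1 GB (rule as stated above).
    static long defaultMaxHeap(long physicalBytes) {
        return Math.min(physicalBytes / 4, ONE_GB);
    }

    public static void main(String[] args) {
        long twoGb = 2 * ONE_GB; // example machine with 2 GB of RAM
        System.out.println("Rule, initial heap (MB): " + defaultInitialHeap(twoGb) / (1024 * 1024));
        System.out.println("Rule, max heap (MB): " + defaultMaxHeap(twoGb) / (1024 * 1024));
        // What the running JVM actually allows as its maximum heap.
        System.out.println("This JVM, max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```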