Tuesday, October 18, 2016

JDK 8 - HOTSPOT JVM


With JDK 8 being widely adopted, I want to give a high-level view of HotSpot memory management and its garbage collection techniques.


Quickly, about garbage collection. The collector:
Allocates objects in the young generation
Promotes aged objects to the old generation
Marks live objects in the old generation
Recovers space by removing unreachable objects and compacting live ones
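To see these generations in a running JVM, we can list the heap memory pools through the standard java.lang.management API. This is an illustrative sketch: the class name HeapPools is hypothetical, and the pool names it prints depend on the collector in use (with the parallel collector they are typically "PS Eden Space", "PS Survivor Space" and "PS Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the heap pools (eden, survivor, old/tenured) of the running JVM.
class HeapPools {

    // Returns one "name: used=..., committed=..." line per heap memory pool.
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                lines.add(pool.getName() + ": used=" + usage.getUsed()
                        + " bytes, committed=" + usage.getCommitted() + " bytes");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```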


Available collectors in the HotSpot JVM
1.       Serial collector – for single-processor machines and apps with little concurrency. Enable it with -XX:+UseSerialGC.

2.       Parallel collector (or throughput collector) – the default on server-class machines, suited to medium- to large-scale apps. Use -XX:+UseParallelGC. It is a multithreaded GC; select it when high throughput is needed.

3.       Mostly concurrent collectors – perform most of their work concurrently, i.e. while the app is still running. Select these when faster response times are needed: since the app keeps running during most of the GC work, response times (RTs) are rarely impacted by pauses. There are two mostly concurrent collectors: Concurrent Mark Sweep (CMS) and G1 (Garbage-First, a generational algorithm preferred for heaps larger than 4 GB).

* Throughput is the percentage of total time not spent in GC, i.e. time spent in the app vs time spent in GC. The more time is spent in the app, the more resources it has to deliver work.
* For generational GCs, i.e. GCs with a young generation (eden plus survivor spaces) and a tenured generation, both young and major collections involve app pauses with every GC algorithm. A minor (young) GC is very fast, but a major GC happens when there is no more space for tenured objects, so it involves collecting the entire heap: identifying the reachable objects, marking them, sweeping the unreachable ones and compacting the space to avoid fragmentation. That is why it pauses for longer. CMS and G1 try to reduce this by doing most of the work in the background while the app is running, pausing the app only at certain points and for short durations.
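Throughput as defined above can be estimated from inside the JVM: sum the collection time reported by each collector and compare it with the JVM uptime. A minimal sketch, assuming GarbageCollectorMXBean.getCollectionTime() is a good enough proxy for time spent in GC; for concurrent collectors such as CMS the reported time may include work that did not pause the app. The class GcThroughput is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates throughput as the share of JVM uptime not spent in GC.
class GcThroughput {

    // Sums the approximate accumulated collection time (ms) reported by every collector.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if this collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Throughput in percent: 100 * (uptime - gcTime) / uptime.
    static double throughputPercent(long uptimeMillis, long gcMillis) {
        if (uptimeMillis <= 0) {
            return 100.0; // nothing measured yet
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMillis = totalGcMillis();
        System.out.printf("uptime=%d ms, gc=%d ms, throughput=%.2f%%%n",
                uptime, gcMillis, throughputPercent(uptime, gcMillis));
    }
}
```

For example, 10 ms of GC over 1,000 ms of uptime gives 99% throughput.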

Tuning first steps:
Go with the defaults:
-          Serial for single-core machines.
-          Parallel for high-throughput requirements, i.e. where somewhat higher app RTs are acceptable. It accepts longer pauses in exchange for less frequent ones: overall, the app is paused less often but for longer each time, which gives high throughput but can breach RT targets through spikes. The parallel GC does its work with multiple threads in parallel and consumes CPU while doing so.
-          Mostly concurrent for large apps on multiprocessor machines when RT is important. Pauses are shorter but may be more frequent; however, part of the GC work, such as marking, happens without pausing the app. Use CMS when the heap is smaller than 4 GB, otherwise use G1.
-          Once the collector is selected, check whether the requirements are met. If not, first try increasing the maximum heap size (-Xmx), then try tuning the individual heap areas.
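A quick way to check which collector the JVM actually selected, and what maximum heap it ended up with, is to look at the collector MXBean names. The sketch assumes the HotSpot JDK 8 bean names (Copy/MarkSweepCompact for Serial, PS Scavenge/PS MarkSweep for Parallel, ParNew/ConcurrentMarkSweep for CMS, G1 Young/Old Generation for G1); the class GcCheck and its mapping are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: reports which collector the JVM picked and its effective max heap.
class GcCheck {

    // Maps HotSpot (JDK 8) collector MXBean names to the collector family.
    static String collectorFamily(List<String> beanNames) {
        if (beanNames.contains("G1 Young Generation")) return "G1";
        if (beanNames.contains("ConcurrentMarkSweep")) return "CMS";
        if (beanNames.contains("PS Scavenge")) return "Parallel";
        if (beanNames.contains("Copy")) return "Serial";
        return "Unknown";
    }

    public static void main(String[] args) {
        List<String> names = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        System.out.println("Collector beans: " + names);
        System.out.println("Collector family: " + collectorFamily(names));
        // maxMemory() roughly reflects -Xmx (or the ergonomic default when -Xmx is not set).
        System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```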


Parallel Collector:
-          -XX:+UseParallelGC
-          both minor and major collections are executed in parallel
-          the number of GC threads equals N, the number of hardware threads, when N is 8 or less; above that it is 8 + 5/8 * (N - 8), i.e. roughly 5/8 of N on large machines. It can also be set manually with -XX:ParallelGCThreads=<N>
-          a higher number of GC threads can fragment the tenured space, because each thread reserves its own part of it for objects promoted from the young generation. If that happens, try reducing the number of threads or increasing the tenured space to cope with the fragmentation.
-          the parallel collector tunes itself automatically based on the desired behavior, so there is no need to specify generation sizes or other fine-grained settings. The behavior goals are maximum GC pause time, throughput and footprint (heap size).
-          Max pause time target: -XX:MaxGCPauseMillis=<N>
-          Throughput target: -XX:GCTimeRatio=<N>, i.e. it aims to spend at most 1/(1+N) of the total time in GC. The default is 99, i.e. 1/(1+99) = 1% in GC, leaving 99% to the app.
-          Footprint: basically the maximum heap size, -Xmx
-          the parallel collector tries to meet the max pause time goal first, then throughput, then footprint. It grows or shrinks the generations to achieve these goals.
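The two numbers above can be worked out by hand. This sketch encodes the thread-count rule as it applies on most platforms (some SPARC machines use a different fraction) and the GCTimeRatio formula; the class ParallelGcMath is hypothetical and only reproduces the arithmetic, it does not query the JVM.

```java
// Hypothetical helper reproducing two numbers from the parallel collector's ergonomics.
class ParallelGcMath {

    // Default GC thread count: N when N <= 8, else 8 + 5/8 * (N - 8), with integer division.
    static int defaultGcThreads(int hardwareThreads) {
        if (hardwareThreads <= 8) {
            return hardwareThreads;
        }
        return 8 + (hardwareThreads - 8) * 5 / 8;
    }

    // Maximum fraction of total time the collector aims to spend in GC for -XX:GCTimeRatio=N.
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 16, 32}) {
            System.out.println(n + " hardware threads -> " + defaultGcThreads(n) + " GC threads");
        }
        System.out.printf("GCTimeRatio=99 -> %.1f%% of time in GC%n", 100 * gcTimeFraction(99));
        System.out.printf("GCTimeRatio=19 -> %.1f%% of time in GC%n", 100 * gcTimeFraction(19));
    }
}
```

With 16 hardware threads this gives 8 + 8 * 5/8 = 13 GC threads, and GCTimeRatio=19 corresponds to a 5% GC time goal.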

Mostly concurrent collectors:
CMS: when faster RT, i.e. shorter pause times, is the priority and the app can afford to share CPU with the collector
-          -XX:+UseConcMarkSweepGC.
-          reduces pause time by using background threads that concurrently (i.e. without pausing the application threads) mark reachable objects and sweep unreachable ones. The app is paused only briefly, at the start of each cycle (initial mark) and once more during it (remark). CMS always tries to keep the tenured space clean to avoid long pauses and accumulation of objects. When it fails to keep enough tenured space free, a full collection happens with all application threads paused (a failure case, known as a concurrent mode failure).
-          In CMS, young and tenured collections happen independently: young collections can run while a concurrent tenured cycle is in progress.
-          CMS output in the GC log differs from that of other collectors. With -verbose:gc and -XX:+PrintGCDetails, a CMS cycle shows up as the following phases:
              CMS-initial-mark: start of the concurrent collection cycle (first, short pause)
              CMS-concurrent-mark: end of the concurrent marking phase
              CMS-concurrent-preclean: concurrent work that shortens the upcoming remark pause
              CMS-remark: second short pause, which finishes marking
              CMS-concurrent-sweep: end of the concurrent sweeping phase
              CMS-concurrent-reset: getting ready for the next collection
-          CMS and Parallel compact the whole heap when there is no contiguous free space available; for CMS this only happens in a full, stop-the-world collection.
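To watch young and tenured collections happen independently (for example ParNew and ConcurrentMarkSweep events under CMS) without parsing the log, we can subscribe to GC notifications. A minimal sketch, assuming a HotSpot JDK 8 JVM where the collector MXBeans implement NotificationEmitter and com.sun.management.GarbageCollectionNotificationInfo is available; the class GcEventLogger is hypothetical. For a concurrent collector the reported duration may not equal the actual pause time.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: prints one line per completed collection as it happens.
class GcEventLogger {

    // Registers a listener on every collector MXBean; returns how many were registered.
    static int register() {
        NotificationListener listener = (notification, handback) -> {
            if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                // e.g. "ParNew | end of minor GC | cause=Allocation Failure | 3 ms"
                System.out.printf("%s | %s | cause=%s | %d ms%n",
                        info.getGcName(), info.getGcAction(), info.getGcCause(),
                        info.getGcInfo().getDuration());
            }
        };
        int count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        register();
        // Churn through short-lived 64 KB arrays to trigger some young collections.
        for (int i = 0; i < 2_000; i++) {
            byte[] chunk = new byte[64 * 1024];
        }
        System.gc();       // requests a major collection (unless explicit GC is disabled)
        Thread.sleep(500); // notifications are delivered asynchronously
    }
}
```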

G1: for large heaps, when pause-time targets should be met with high probability while also achieving high throughput (i.e. more time given to the app)
-          the heap is partitioned into a set of equally sized regions, each a contiguous range of virtual memory. G1 performs a concurrent global marking phase to determine which objects in the heap are live. After marking completes, it collects the mostly empty regions first, yielding a large amount of free space; hence the name Garbage-First.
-          it continually reduces fragmentation by compacting during collections: live objects are copied out of the collected regions into other regions.
-          G1 is beneficial when live data occupies more than 50% of the heap, when the allocation or promotion rate varies significantly, or when the app suffers from long collection or compaction pauses.
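The "garbage first" idea can be illustrated with a toy model: after marking, each region's live data is known, and the regions with the least live data (most garbage) are collected first. This is a simplified sketch, not HotSpot's implementation: real G1 also predicts the cost of evacuating each region and sizes the collection set to the pause-time target, which the hypothetical class GarbageFirstModel ignores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of G1's "garbage first" ordering, NOT HotSpot's implementation.
class GarbageFirstModel {

    static final class Region {
        final int id;
        final long liveKb; // live data found by marking, in KB

        Region(int id, long liveKb) {
            this.id = id;
            this.liveKb = liveKb;
        }
    }

    // Builds regions 0..n-1 from their live-data sizes.
    static List<Region> regionsFromLiveKb(long... liveKb) {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < liveKb.length; i++) {
            regions.add(new Region(i, liveKb[i]));
        }
        return regions;
    }

    // Returns the ids of up to maxRegions regions, emptiest first (the "collection set").
    static List<Integer> chooseCollectionSet(List<Region> regions, int maxRegions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveKb));
        List<Integer> chosen = new ArrayList<>();
        for (int i = 0; i < Math.min(maxRegions, sorted.size()); i++) {
            chosen.add(sorted.get(i).id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Five hypothetical 1 MB regions with varying amounts of live data.
        List<Region> regions = regionsFromLiveKb(900, 50, 700, 10, 400);
        System.out.println("Collect first: " + chooseCollectionSet(regions, 2));
    }
}
```

For the sample values in main, the two emptiest regions are 3 and 1, so those are chosen.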

Default configuration on server-class machines
On server-class machines, the following are selected by default if not specified otherwise.
Throughput garbage collector
Initial heap size of 1/64 of physical memory up to 1 GB
Maximum heap size of 1/4 of physical memory up to 1 GB
Server runtime compiler
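The heap-size rule above is simple arithmetic; this sketch encodes it exactly as stated (1/64 and 1/4 of physical memory, each capped at 1 GB). The class DefaultHeapSizes is hypothetical, and actual 64-bit JVMs may pick different values, so java -XX:+PrintFlagsFinal -version remains the authority for a given machine.

```java
// Hypothetical sketch of the stated server-class defaults:
// initial heap = 1/64 of physical memory, max heap = 1/4 of physical memory, each capped at 1 GB.
class DefaultHeapSizes {
    static final long GB = 1024L * 1024 * 1024;

    static long initialHeap(long physicalBytes) {
        return Math.min(physicalBytes / 64, GB);
    }

    static long maxHeap(long physicalBytes) {
        return Math.min(physicalBytes / 4, GB);
    }

    public static void main(String[] args) {
        for (long physGb : new long[] {2, 8, 128}) {
            long phys = physGb * GB;
            System.out.printf("%d GB RAM -> initial %d MB, max %d MB%n",
                    physGb, initialHeap(phys) / (1024 * 1024), maxHeap(phys) / (1024 * 1024));
        }
    }
}
```

For example, 8 GB of RAM gives a 128 MB initial heap and a 1024 MB maximum heap under this rule.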


To view the final values of all flags, including the defaults, use java -XX:+PrintFlagsFinal -version
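The same values can also be read from inside a running application through the HotSpot diagnostic MXBean. A sketch assuming a HotSpot JVM (com.sun.management.HotSpotDiagnosticMXBean); the class FlagReader and the chosen flag list are illustrative, and flags that do not exist in a given JVM version are reported as such.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Illustrative reader for -XX flag values of the current HotSpot JVM.
class FlagReader {

    static String flag(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = hotspot.getVMOption(name);
            // getOrigin() tells where the value came from, e.g. DEFAULT, ERGONOMIC or VM_CREATION.
            return option.getName() + " = " + option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return name + " is not a flag in this JVM";
        }
    }

    public static void main(String[] args) {
        String[] names = {"UseSerialGC", "UseParallelGC", "UseConcMarkSweepGC", "UseG1GC",
                "InitialHeapSize", "MaxHeapSize", "ParallelGCThreads", "GCTimeRatio"};
        for (String name : names) {
            System.out.println(flag(name));
        }
    }
}
```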


Sunday, October 2, 2016

JMeter Solr Banana


Want to give JMeter some color? We all know JMeter is a great tool that helps load test apps. How about bringing two more awesome tools to the table: Solr and Banana?

With Solr, we can store, index and search large amounts of data, and Banana is a pretty tool built with modern HTML and JavaScript for drawing cool graphs that show trends.

So, how can we use these tools?

We can do a lot, in fact, but to start with:

- Load test results, to display runtime stats: response times, transaction throughput, bytes and so on
- Monitoring system resources like memory, CPU, disk and load, JVM garbage collection activity, etc.
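As an example of the first item, the basic load-test stats can be computed from a JMeter CSV results file (.jtl) before they are pushed anywhere. A minimal sketch, assuming CSV output with a header row containing timeStamp (epoch ms), elapsed (ms) and success columns and no quoted fields containing commas; the class JtlSummary is hypothetical, and throughput is approximated as samples divided by the span between the first and last timestamps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical summary of a JMeter CSV results file (.jtl) with a header row.
final class JtlSummary {
    final long samples;
    final double avgElapsedMs;
    final double throughputPerSec;
    final long errors;

    private JtlSummary(long samples, double avgElapsedMs, double throughputPerSec, long errors) {
        this.samples = samples;
        this.avgElapsedMs = avgElapsedMs;
        this.throughputPerSec = throughputPerSec;
        this.errors = errors;
    }

    static JtlSummary of(List<String> lines) {
        if (lines.isEmpty()) throw new IllegalArgumentException("no header row");
        List<String> header = Arrays.asList(lines.get(0).split(","));
        int tsCol = header.indexOf("timeStamp");
        int elapsedCol = header.indexOf("elapsed");
        int successCol = header.indexOf("success");
        if (tsCol < 0 || elapsedCol < 0 || successCol < 0) {
            throw new IllegalArgumentException("missing timeStamp/elapsed/success column");
        }

        long samples = 0, totalElapsed = 0, errors = 0;
        long minTs = Long.MAX_VALUE, maxTs = Long.MIN_VALUE;
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) continue;
            String[] fields = line.split(",");
            long ts = Long.parseLong(fields[tsCol]);
            samples++;
            totalElapsed += Long.parseLong(fields[elapsedCol]);
            if (!Boolean.parseBoolean(fields[successCol])) errors++;
            minTs = Math.min(minTs, ts);
            maxTs = Math.max(maxTs, ts);
        }
        if (samples == 0) {
            return new JtlSummary(0, 0, 0, 0);
        }
        // Samples per second over the span between the first and last timestamps.
        double spanSeconds = (maxTs - minTs) / 1000.0;
        double throughput = spanSeconds > 0 ? samples / spanSeconds : 0;
        return new JtlSummary(samples, (double) totalElapsed / samples, throughput, errors);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java JtlSummary results.jtl");
            return;
        }
        JtlSummary s = of(Files.readAllLines(Paths.get(args[0])));
        System.out.println("samples=" + s.samples + " avgRT(ms)=" + s.avgElapsedMs
                + " throughput(/s)=" + s.throughputPerSec + " errors=" + s.errors);
    }
}
```

For three samples at 1000, 2000 and 3000 ms taking 200, 400 and 300 ms, with one failure, this gives an average RT of 300 ms, 1.5 samples/s and 1 error.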

So, does it all sound interesting? And do you think we can really build a good monitoring tool? Well, below are a couple of dashboards.

Load Test Report:



System Resources:


These are a few sample dashboards. The actual setup does a lot more: remote agents collect the data and push it to Solr, and the dashboard refreshes the stats.
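The agent side of this kind of setup can be as small as turning each sample into a JSON document and posting it to Solr's update handler. A hypothetical sketch: the collection name, the field names (which rely on dynamic-field suffixes such as _s, _l and _b being defined in the schema) and the class SolrPusher are assumptions, not the actual setup behind these dashboards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical agent-side sketch: one JMeter sample -> one Solr JSON document.
class SolrPusher {

    // Minimal JSON string escaping for quotes and backslashes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a one-document JSON array; the _s/_l/_b field suffixes are assumed dynamic fields.
    static String toSolrJson(String id, String label, long timestampMs, long elapsedMs, boolean success) {
        return "[{\"id\":" + quote(id)
                + ",\"label_s\":" + quote(label)
                + ",\"timestamp_l\":" + timestampMs
                + ",\"elapsed_l\":" + elapsedMs
                + ",\"success_b\":" + success + "}]";
    }

    // POSTs the JSON to a Solr update URL (assumed, e.g. .../solr/jmeter/update?commit=true).
    static int post(String updateUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String json = toSolrJson("sample-1", "home", System.currentTimeMillis(), 250, true);
        System.out.println(json);
        if (args.length > 0) {
            System.out.println("Solr responded with HTTP " + post(args[0], json));
        }
    }
}
```

Committing on every request is expensive; batching several documents per POST, or using Solr's commitWithin parameter instead of commit=true, is the more usual choice for a steady stream of samples.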