Understanding the G1GC log

If you are using a larger heap size, or your current GC algorithm is not efficient enough for you, you may need to think about switching to G1GC. This article is not intended to help you choose a GC algorithm; I will keep that for a different article. My intention here is to explain how to understand the G1GC log. Especially if you use a tool like Logstash or Splunk, you may need to change your queries to process the new type of log.

How G1GC works

Before going into that, let's try to understand what is unique about G1GC and how it works.

G1 is ideal for multi-processor, large-heap environments. The G1 collector logically divides the heap into multiple equally sized regions. Because of that, G1 has the freedom to adjust the collection process on the fly based on various factors such as available memory and running processes.

Each region is further divided into smaller sections called cards (currently the specification is 512 bytes per card). Each of these cards has a single byte of space in a global card table. This works as a kind of flag showing which cards have been modified by mutator threads. Each region also maintains a subset of cards called its remembered set.
These remembered sets are guarded by a write barrier. This means that whenever a mutator thread modifies a reference, it must go through an extra block of code. As an example, consider regions X, Y and Z, with references from Y and Z into X (incoming references). In this case, region X's remembered set will record the cards in Y and Z that contain those incoming references into X.
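
To make the card and write-barrier idea more concrete, here is a minimal sketch in Java of what a post-write barrier conceptually does. This is illustrative only, not HotSpot's actual implementation; the names cardTable, dirtyCardQueue and CARD_SIZE are invented for the example. The idea is that when a mutator thread stores a reference, the card covering the modified field is flagged as dirty and queued, so the relevant remembered set can be updated later (the "Update RS" phase we will meet in the log).

import java.util.ArrayDeque;
import java.util.Queue;

// Conceptual sketch of a G1-style post-write barrier. Not HotSpot code.
public class WriteBarrierSketch {
    static final int CARD_SIZE = 512;                 // bytes covered by one card
    static final byte DIRTY = 0, CLEAN = 1;

    static byte[] cardTable = new byte[1 << 20];               // one byte per card
    static Queue<Integer> dirtyCardQueue = new ArrayDeque<>(); // drained later during "Update RS"

    // Called (conceptually) whenever a mutator thread stores a cross-region reference.
    static void postWriteBarrier(long fieldAddress) {
        int card = (int) (fieldAddress / CARD_SIZE);  // which card covers the modified field
        if (cardTable[card] != DIRTY) {
            cardTable[card] = DIRTY;                  // flag the card as modified
            dirtyCardQueue.add(card);                 // remembered-set update happens later
        }
    }

    public static void main(String[] args) {
        java.util.Arrays.fill(cardTable, CLEAN);
        postWriteBarrier(123_456L);                   // pretend a reference field at this address changed
        System.out.println("dirty cards: " + dirtyCardQueue);
    }
}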

I have explained this much because we need to refer to the remembered set (RS) when explaining the GC log. I will stop here and write another article to explain exactly how G1GC works; I am not going to mix that in here, since that is not the intention of this article.

The main thing you need to understand is that G1GC is not like the parallel GC. There is no full-GC concept in G1 (which does not guarantee one will never happen). Therefore calculating GC duration is tricky. In the parallel GC, the full-GC duration always matters, because it is a stop-the-world scenario: no application work takes place during it, so your performance can drop. Also, if your application maintains queues with external interfaces, those queues can fill up and eventually lead to an out-of-memory condition. Think about cleaning a small room versus a large room: the small room takes little time, the large room takes much more. So the parallel GC is not efficient on larger heap sizes. In G1GC you should technically not have a full GC, so full-GC pause time will not be a key factor in performance.

Inside the log

Let's take a sample GC log.
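
The sample below is in the JDK 7/8 style (pre unified logging) format. A log like this is typically produced with flags along the following lines; the log file name and the application jar are placeholders:

java -XX:+UseG1GC \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -Xloggc:gc.log \
     -jar myapp.jar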

2015-08-12T10:48:00.827-0400: 2518335.564: [GC pause (young), 0.0758520 secs]
   [Parallel Time: 22.8 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 2518335564.5, Avg: 2518335564.5, Max: 2518335564.6, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 11.6, Avg: 13.4, Max: 15.3, Diff: 3.7, Sum: 53.6]
      [Update RS (ms): Min: 1.5, Avg: 3.4, Max: 4.8, Diff: 3.3, Sum: 13.5]
         [Processed Buffers: Min: 117, Avg: 163.5, Max: 192, Diff: 75, Sum: 654]
      [Scan RS (ms): Min: 0.7, Avg: 0.8, Max: 0.8, Diff: 0.2, Sum: 3.1]
      [Object Copy (ms): Min: 4.6, Avg: 4.9, Max: 5.3, Diff: 0.7, Sum: 19.6]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
      [GC Worker Total (ms): Min: 22.5, Avg: 22.5, Max: 22.6, Diff: 0.1, Sum: 90.2]
      [GC Worker End (ms): Min: 2518335587.0, Avg: 2518335587.1, Max: 2518335587.1, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 52.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 48.6 ms]
      [Ref Enq: 0.1 ms]
      [Free CSet: 2.4 ms]
   [Eden: 2644.0M(2644.0M)->0.0B(2648.0M) Survivors: 34.0M->32.0M Heap: 4341.8M(5120.0M)->1695.9M(5120.0M)]
 [Times: user=0.14 sys=0.00, real=0.07 secs] 

This is what you see when the JVM promotes objects from one region to another. Since these are live objects, this is called the "evacuation" process, and it is considered a stop-the-world scenario: all threads are halted, and so is all application processing.
Let's go through it line by line.

2015-08-12T10:48:00.827-0400: 2518335.564:

The first part, 2015-08-12T10:48:00.827-0400, is the date-time stamp at which this GC pause fired. The last part, 2518335.564, indicates that this GC fired 2518335.564 seconds after application (JVM) start. Dividing by 3600 seconds per hour and 24 hours per day, 2518335.564 / (3600 * 24) is roughly 29 days after startup.

[GC pause (young), 0.0758520 secs]

GC pause means this is a GC pause (copying live objects from one region to another). (young) means the pause happened in the young generation area. This evacuation process took 0.0758520 seconds.
Sometimes this can look like

 [GC pause (mixed), 0.3905380 secs] 

This means the evacuation affected more than one area: a mixed collection spans the whole young generation plus a portion of the old generation.
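
Since the goal of this article is to help you write your own log-processing queries, here is a small Java sketch that pulls the timestamp, JVM uptime, pause type and duration out of a header line like the ones above. The regular expression is written only against the sample lines shown in this article; real logs can carry extra decorations (a pause cause, for example), so treat it as a starting point rather than a complete parser.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PauseLineParser {
    // Matches: <datestamp>: <uptime>: [GC pause (young|mixed), <secs> secs]
    private static final Pattern PAUSE = Pattern.compile(
        "(\\S+): ([0-9.]+): \\[GC pause \\((young|mixed)\\), ([0-9.]+) secs\\]");

    public static void main(String[] args) {
        String line = "2015-08-12T10:48:00.827-0400: 2518335.564: [GC pause (young), 0.0758520 secs]";
        Matcher m = PAUSE.matcher(line.trim());
        if (m.matches()) {
            System.out.println("timestamp = " + m.group(1));        // wall-clock date stamp
            System.out.println("uptime    = " + m.group(2) + " s"); // seconds since JVM start
            System.out.println("type      = " + m.group(3));        // young or mixed
            System.out.println("pause     = " + m.group(4) + " s"); // stop-the-world duration
        }
    }
}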

[Parallel Time: 22.8 ms, GC Workers: 4]

The above line indicates that 4 GC worker threads were used, and the parallel part of this evacuation took 22.8 milliseconds of elapsed time. Refer to the next line.

[GC Worker Start (ms): Min: 2518335564.5, Avg: 2518335564.5, Max: 2518335564.6, Diff: 0.1]

This indicates when each worker thread started, in milliseconds since JVM start. The log gives the first (Min), average and last (Max) start times, with a Diff of 0.1 milliseconds between first and last. So the first of the 4 threads started at the 2518335564.5th millisecond after JVM start. Compare this with 2518335.564 on the first line, which is when the GC pause started, expressed in seconds: 2518335.564 s equals 2518335564 ms, so the first worker thread started only about 0.5 milliseconds after the GC was triggered. The 4th worker thread started at the 2518335564.6th millisecond, that is, 0.1 milliseconds after the first one. Altogether these 4 threads took the 22.8 ms reported above to complete the parallel part of the evacuation.

[Ext Root Scanning (ms): Min: 11.6, Avg: 13.4, Max: 15.3, Diff: 3.7, Sum: 53.6]

This means a total of 53.6 milliseconds was spent scanning roots that live outside the heap, such as thread stacks and registers. Again, Min, Max and Avg are the times taken by individual threads, and Sum is the total across all of them.

[Update RS (ms): Min: 1.5, Avg: 3.4, Max: 4.8, Diff: 3.3, Sum: 13.5]

This is the time the threads spent updating the remembered sets (the RS explained at the start of this article).

[Processed Buffers: Min: 117, Avg: 163.5, Max: 192, Diff: 75, Sum: 654]

The references we discussed under remembered sets (RS) are recorded in buffers called update buffers. Processed Buffers is not a time but a count: the number of such buffers each thread drained while updating the RS entries across regions, 654 in total. A high number here means the mutator threads have been producing a lot of cross-region reference updates.

[Scan RS (ms): Min: 0.7, Avg: 0.8, Max: 0.8, Diff: 0.2, Sum: 3.1]

When explaining heap regions I described the smaller spaces called cards. This is the time spent scanning the remembered sets of all the regions being collected. In practice this means scanning the cards recorded in the RS (a region is made up of many cards, and the remembered set is essentially a set of cards), looking for references that point into the regions being collected.

[Object Copy (ms): Min: 4.6, Avg: 4.9, Max: 5.3, Diff: 0.7, Sum: 19.6]

This is the actual time each thread spent copying live objects from one region to another.

[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]

Once a thread finishes its own work it heads for termination, but before terminating it checks the work queues of the other threads; if it finds pending work it tries to steal and process it, and only then does it terminate. This is the time each thread spent in that termination protocol.

[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]

Any time spent by the worker threads that is not captured by the main phases above.

[GC Worker End (ms): Min: 2518335587.0, Avg: 2518335587.1, Max: 2518335587.1, Diff: 0.1]

The timestamps (milliseconds since JVM start) at which each worker thread finished, in the same format as GC Worker Start. Here all threads finished within 0.1 milliseconds of each other.

[GC Worker Total (ms): Min: 22.5, Avg: 22.5, Max: 22.6, Diff: 0.1, Sum: 90.2]

This is the total time spent by the GC worker threads during the parallel phase. It is roughly the sum of the per-phase totals above: 53.6 + 13.5 + 3.1 + 19.6 + 0.0 + 0.3 comes to about 90.2 (the small remainder is rounding).
Note that these threads work in parallel, so this 90.2 ms is not pause time; it is the sum across all threads. Spread over 4 workers it comes to about 22.5 ms per thread, which lines up with the 22.8 ms Parallel Time reported above.

[Code Root Fixup: 0.0 ms]

Time taken to fix up code roots so they point to the correct locations after evacuation. (This is a sequential task.)

[Clear CT: 0.7 ms]

Time taken to clear the card table (refer to the RS explanation above). (This is a sequential task.)

[Other: 52.4 ms]

Time taken for the tasks below.

[Choose CSet: 0.0 ms]

Time spent choosing the collection set (CSet), the set of regions to collect. (This is a sequential task.)

[Ref Proc: 48.6 ms]

Time spent processing reference objects (soft, weak, final and phantom references).

[Ref Enq: 0.1 ms]

References with dead referents are added to the pending list; this is the time spent enqueueing them.

[Free CSet: 2.4 ms]

Time spent freeing and releasing the data structures of the collection set so its regions can be reused.

[Eden: 2644.0M(2644.0M)->0.0B(2648.0M) Survivors: 34.0M->32.0M Heap: 4341.8M(5120.0M)->1695.9M(5120.0M)]

This line gives the details of how heap occupancy changed during the GC. Eden was full, with an occupancy of 2644.0 MB out of 2644.0 MB before the GC; after the collection its occupancy dropped to 0, since everything is evacuated or promoted out of Eden during a collection, and the value in parentheses, 2648.0 MB, is the new target size of Eden.
Similarly, the Survivor spaces held 34 MB before the collection and 32 MB after it. The total heap occupancy was 4341.8 MB out of 5120 MB and became 1695.9 MB after the collection, so roughly 2645.9 MB was reclaimed.
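
If you chart heap occupancy from these logs (for example with a Logstash or Splunk style pipeline), the same kind of extraction works for this line too. Below is an illustrative Java snippet, again written only against the sample line above, that pulls the heap occupancy before and after the collection and computes how much was reclaimed:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapLineParser {
    // Matches the "Heap: <before>(<capacity>)-><after>(<capacity>)" part of the sample line.
    private static final Pattern HEAP = Pattern.compile(
        "Heap: ([0-9.]+)([BKMG])\\(([0-9.]+)[BKMG]\\)->([0-9.]+)[BKMG]\\(([0-9.]+)[BKMG]\\)");

    public static void main(String[] args) {
        String line = "[Eden: 2644.0M(2644.0M)->0.0B(2648.0M) Survivors: 34.0M->32.0M "
                    + "Heap: 4341.8M(5120.0M)->1695.9M(5120.0M)]";
        Matcher m = HEAP.matcher(line);
        if (m.find()) {
            double before = Double.parseDouble(m.group(1)); // occupancy before GC (4341.8)
            double after  = Double.parseDouble(m.group(4)); // occupancy after GC (1695.9)
            System.out.printf("Reclaimed: %.1f%s%n", before - after, m.group(2)); // ~2645.9M
        }
    }
}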

[Times: user=0.14 sys=0.00, real=0.07 secs]

Duration of the GC event, measured in the following categories:

user – total CPU time consumed by the garbage collector threads during this collection
sys – time spent in OS calls or waiting for system events such as IO; this can grow when the OS is busy with other work
real – wall-clock time for which your application was stopped. This can be treated as the total GC pause time, but we need all three values to draw conclusions.

With a single GC thread you would expect real to be roughly equal to user + sys, but in this sample it is not. The reason is that this log comes from a JVM with multiple GC threads configured on a multi-core/multi-processor server. Because several threads execute the GC in parallel, the load is shared and the clock time ('real') ends up well below the total CPU time ('user' + 'sys'). Here user is 0.14 s against a real of 0.07 s, so on average about two cores' worth of GC work ran concurrently. This 'real' value is the real stop-the-world time.

I hope you now have enough information to understand a G1GC log and to write your own log-processing queries against it.
