ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Alexandru Nedelcu-4

Hi folks,


In Scala 2.11.8, the scala.concurrent.forkjoin.ForkJoinPool implementation is basically a fork of Doug Lea's JSR-166 implementation, included in order to support older Java versions. In Scala 2.12.x, however, it is now an alias for java.util.concurrent.ForkJoinPool, given its availability in Java 8 and the fact that Scala 2.12 requires Java 8 as a target.

Unfortunately, these two implementations are NOT the same: the old Scala 2.11 implementation has better throughput in testing. The difference is quite significant; I found a scenario in which the old implementation has twice the throughput. I measured this in a personal JMH benchmark, for which I ported the old implementation to Scala 2.12, so that the ForkJoinPool implementation is the only thing that differs.
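
For readers who want to try this kind of comparison themselves, here is a minimal sketch of a crude throughput probe, assuming Scala 2.12 or later. It is not the JMH benchmark described above (that code isn't part of this thread); the name `FutureThroughput` and the chain-of-`map` workload are illustrative assumptions. Because the `ExecutionContext` is a parameter, the same workload can be run against the global pool and against a pool built from a different ForkJoinPool implementation.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureThroughput {
  /** Hypothetical probe: chains `n` asynchronous `map` steps on `ec` and
    * returns the final value together with the elapsed wall-clock nanoseconds.
    * Plain wall-clock timing ignores JIT warm-up and GC noise, which is why
    * a real comparison should use JMH. */
  def run(n: Int)(implicit ec: ExecutionContext): (Int, Long) = {
    val start = System.nanoTime()
    var f: Future[Int] = Future.successful(0)
    var i = 0
    while (i < n) {
      f = f.map(_ + 1) // each step is submitted to `ec` as its own callback
      i += 1
    }
    val result = Await.result(f, 30.seconds)
    (result, System.nanoTime() - start)
  }
}
```

For example, call `FutureThroughput.run(1000000)(ExecutionContext.global)` several times, discard the first runs as warm-up, and repeat with the other `ExecutionContext`.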

To be clear, given that Scala's own global execution context is backed by this ForkJoinPool implementation, this affects most code using Scala's Future.
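
A quick way to observe this on Scala 2.12 or later is to check which kind of thread a `Future` on the global context runs on. This is a minimal sketch; `GlobalPoolCheck` is a hypothetical name. On Scala 2.11 the check would return `false`, because the global pool's worker threads there came from `scala.concurrent.forkjoin` rather than from the JDK.

```scala
import java.util.concurrent.ForkJoinWorkerThread
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object GlobalPoolCheck {
  /** Returns true when a Future scheduled on the global ExecutionContext runs
    * on a java.util.concurrent.ForkJoinWorkerThread, i.e. on a worker owned
    * by the JDK's ForkJoinPool. */
  def runsOnJdkForkJoinPool(): Boolean = {
    val f = Future(Thread.currentThread().isInstanceOf[ForkJoinWorkerThread])
    Await.result(f, 5.seconds)
  }
}
```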

Is this a known problem?


--
Alexandru Nedelcu
alexn.org

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Viktor Klang
Hi Alexandru,

Could you also try taking the old version of jsr166 used for Scala 2.11, putting it under the java.util.concurrent package name, and adding it to the boot classpath of Scala 2.12 using -Xbootclasspath?

(Technically, there's no single Java 8 FJP, since it depends on the vendor and version, so performance will differ.)

--
Cheers,


(In reply to Alexandru Nedelcu, Dec 27, 2016 6:01 PM)

Re: ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Jason Zaugg
I suspect you are seeing the same degradation in benchmarks that was reported by the Akka team recently and is being investigated under https://issues.scala-lang.org/browse/SI-10083

The version of FJ in 2.11 used busy waiting more aggressively, which helped on benchmarks with more cores than tasks. However, this comes at the expense of other workloads on the process/machine, especially because the JVM doesn't let the busy waiting signal its intention to the CPU with a spin-loop hint. This tradeoff was found while testing FJ as it was integrated into parallel j.u.stream.Streams, and is discussed in JDK-8080623.
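
To make the tradeoff concrete, here is a generic spin-then-park wait loop. It is a minimal sketch, not the actual ForkJoinPool code; `SpinThenPark`, `maxSpins`, and the 0.1 ms park interval are illustrative assumptions. Spinning reacts fastest when the awaited event arrives soon but keeps a core busy; parking frees the core at the cost of wake-up latency. `Thread.onSpinWait()` is the spin-loop hint added in Java 9 (JEP 285), so this sketch needs Java 9 or later; Java 8 had no equivalent, which is the limitation mentioned above.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.LockSupport

object SpinThenPark {
  /** Waits until `ready` becomes true. Spins up to `maxSpins` times first
    * (cheap wake-up, but burns a core), then parks in short intervals
    * (frees the core, but adds latency). Returns the number of spins done. */
  def await(ready: AtomicBoolean, maxSpins: Int): Int = {
    var spins = 0
    while (!ready.get() && spins < maxSpins) {
      Thread.onSpinWait() // spin-loop hint to the CPU (Java 9+)
      spins += 1
    }
    while (!ready.get()) {
      LockSupport.parkNanos(100000L) // back off for about 0.1 ms
    }
    spins
  }
}
```

Raising `maxSpins` favors latency-sensitive benchmarks; lowering it favors other work sharing the machine.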

To restore the performance of the benchmark, you would need to implement your own ExecutionContext in terms of the jsr166 backport of ForkJoin. This would be a useful library to make available for others to use.
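
A minimal sketch of that approach, assuming Scala 2.12 or later: wrap a ForkJoinPool in an `ExecutionContext` with `ExecutionContext.fromExecutorService`. The thread doesn't name the backport's package, so the sketch uses the JDK's `java.util.concurrent.ForkJoinPool` as a stand-in; to get the old behavior you would construct the backported pool class at the marked line instead. `CustomForkJoinContext` and its `parallelism` parameter are hypothetical.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService}

object CustomForkJoinContext {
  /** Builds an ExecutionContext backed by a dedicated ForkJoinPool.
    * The caller owns the pool and must call `shutdown()` when done. */
  def create(parallelism: Int): ExecutionContextExecutorService = {
    // Stand-in: replace with the backported jsr166 ForkJoinPool class
    // to run on the Scala 2.11-era implementation.
    val pool = new ForkJoinPool(parallelism)
    ExecutionContext.fromExecutorService(pool)
  }
}
```

Code that currently imports `ExecutionContext.Implicits.global` would then declare its own `implicit val ec = CustomForkJoinContext.create(...)` instead, leaving the rest of the `Future` code unchanged.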

-jason

(In reply to Viktor Klang, Wed, 28 Dec 2016 at 03:23)

Re: ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Viktor Klang
Thanks Jason!

From a user's point of view, using -Xbootclasspath/p:jsr166.jar is a fix for Scala 2.12 that is non-intrusive (code-wise).
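
In an sbt build, that flag could be passed to forked runs roughly as follows. This is a sketch that assumes sbt 1.x and a `jsr166.jar` in the project's base directory. Note that `-Xbootclasspath/p` exists only on Java 8; it was removed in JDK 9.

```scala
// build.sbt (sbt 1.x): fork the JVM so that javaOptions take effect,
// then prepend the repackaged jsr166 classes to the boot classpath (Java 8 only).
fork := true
javaOptions += "-Xbootclasspath/p:jsr166.jar"
```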

It's worth mentioning that Scala used to ship with an embedded version because we collaborated with Doug to create the more scalable version of FJ, which was then introduced in Java 8. At the time, the Java 7 FJ had a single external task submission queue, which created a major bottleneck, so we couldn't use it. However, embedding the JSR166 code made it a burden to maintain, not to mention the duplication and incompatibility across ManagedBlocker, ForkJoinTask, etc. between java.util.concurrent.* and scala.concurrent.forkjoin.*.
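
For readers unfamiliar with `ManagedBlocker`: it is the hook through which blocking code tells a ForkJoinPool that it is about to block, so the pool may activate a spare thread to keep parallelism up. A minimal sketch against the JDK's `java.util.concurrent.ForkJoinPool` (Scala 2.12 or later), where `LatchBlocker` is a hypothetical name:

```scala
import java.util.concurrent.{CountDownLatch, ForkJoinPool}

/** Waits on a CountDownLatch via ForkJoinPool.managedBlock. If the calling
  * thread is a pool worker, the pool may activate a spare thread meanwhile. */
final class LatchBlocker(latch: CountDownLatch) extends ForkJoinPool.ManagedBlocker {
  // managedBlock calls this until it (or isReleasable) returns true;
  // returning true means no further blocking is needed.
  override def block(): Boolean = { latch.await(); true }
  // Lets managedBlock skip blocking entirely if the latch is already open.
  override def isReleasable(): Boolean = latch.getCount == 0
}
```

Inside a pool task this is invoked as `ForkJoinPool.managedBlock(new LatchBlocker(latch))`.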

Incidentally, the fairer behavior of the Java 8 FJ implementation makes sense for scala.concurrent.ExecutionContext.global: since it is a global pool, it should most definitely not create fairness issues.


--
Cheers,

(In reply to Jason Zaugg, Wed, Dec 28, 2016 at 12:12 AM)

Re: ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Alexandru Nedelcu-4
OK, it makes sense.

Thanks, Viktor and Jason, for these details.

--
Alexandru Nedelcu
alexn.org



(In reply to Viktor Klang, Wed, Dec 28, 2016, at 01:39)

Re: ForkJoinPool (ExecutionContext.Implicits.global) performance problem for Scala 2.12

Alexandru Nedelcu-4
FYI, I just published those files in a project, mainly for my own immediate purposes, but it's being synchronized to Maven Central, so it's reusable:


Cheers,

--
Alexandru Nedelcu
alexn.org



(In reply to Alexandru Nedelcu, Wed, Dec 28, 2016, at 10:10)