# Rethinking filter

11 messages

## Rethinking filter

We have a number of open tickets which all want that

    for (x <- xs if guard) body

should expand to

    for (x <- xs) if (guard) body

rather than the current expansion

    for (x <- xs filter (x => guard)) body

The motivations are:

(1) you might want to write a guard that depends on side effects in body.
(2) you might want to avoid constructing a secondary list or other data structure containing the filtered elements.

One problem is how to integrate this with for-yield. It would be unacceptable to have different behaviors of guards depending on whether we are in a for-loop or a for-yield. The other problem is how to do this for all kinds of generators, collections and others, in a uniform way. Unlike in Haskell, there's no unit element that we can use in a for-expression. So the only alternative seems to be to make use of filterForeach, filterMap, filterFlatMap operations in the generator classes. That is, for-expression expansion would still work as described by Jorge:

>     for (x <- c) f(x)
>
> simply translates to:
>
>     c.foreach(x => f(x))
>
> Likewise:
>
>     for (x <- c) yield f(x)
>
> simply translates to:
>
>     c.map(x => f(x))
>
> Something with a guard:
>
>     for (x <- c if p(x)) yield f(x)
>
> translates to:
>
>     c.filter(x => p(x)).map(x => f(x))
>
> And a nested for:
>
>     for (x1 <- c1; x2 <- c2) yield f(x1, x2)
>
> translates to:
>
>     c1.flatMap(x1 => c2.map(x2 => f(x1, x2)))

But there would be a post-processing step, where

    c1.filter(p).map(f)       is rewritten to   c1.filterMap(p, f)
    c1.filter(p).flatMap(f)   is rewritten to   c1.filterFlatMap(p, f)
    c1.filter(p).foreach(f)   is rewritten to   c1.filterForeach(p, f)

These rewritings would only happen for filters/maps/flatMaps generated in for-expressions, and only if the c1 type had defined the filterXXX operations. We don't want to demand that for every generator type, because that would make it too complicated to define generators. But collections would uniformly have those operations, and implement them with lazy filters. For instance, in class TraversableLike:

```scala
def filterForeach[U](p: A => Boolean, f: A => U): Unit =
  this foreach (x => if (p(x)) f(x))

def filterMap[B, That](p: A => Boolean, f: A => B)(implicit bf: BuilderFactory[B, That, Repr]): That = {
  val b = bf(repr)
  this foreach (x => if (p(x)) b += f(x))
  b.result
}
```

To deal with multiple filters, we should also extend the rewriting rules to

    c1.filter(p).filterMap(q, f)       is rewritten to   c1.filterMap(p && q, f)
    c1.filter(p).filterFlatMap(q, f)   is rewritten to   c1.filterFlatMap(p && q, f)
    c1.filter(p).filterForeach(q, f)   is rewritten to   c1.filterForeach(p && q, f)

(where `p && q` stands for the conjoined predicate `x => p(x) && q(x)`), and apply these rewritings as long as possible.

Question: should we do this?

Advantages I can see:

(1) More efficient operations through some amount of deforestation.
(2) Fewer surprises for users expecting imperative behavior.

Disadvantages:

(1) It complicates the rules, and possibly the behavior, because now filterMap etc. would decide whether filter was strict or lazy.
(2) It would break some code.

Btw, one could also consider going further with deforestation, for instance with rules like this:

    c1.flatMap(x => f(x).map(g))   rewrites to   c1.mapFlatMap(g, f)

Cheers -- Martin
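The proposed fused operations can be tried out stand-alone. The following is a minimal sketch over plain `List` values (the names `filterMap` and `filterForeach` come from the proposal; the free-standing helper object is purely illustrative, not the actual TraversableLike code):

```scala
import scala.collection.mutable.ListBuffer

object FusedOps {
  // Fused filter+map: a single traversal, no intermediate filtered list.
  def filterMap[A, B](xs: List[A])(p: A => Boolean)(f: A => B): List[B] = {
    val b = new ListBuffer[B]
    for (x <- xs) if (p(x)) b += f(x)
    b.toList
  }

  // Fused filter+foreach: f is applied only to elements satisfying p.
  def filterForeach[A, U](xs: List[A])(p: A => Boolean)(f: A => U): Unit =
    for (x <- xs) if (p(x)) f(x)
}
```

For pure functions, `FusedOps.filterMap(xs)(p)(f)` produces the same result as `xs.filter(p).map(f)`; the differences are the missing intermediate list and the interleaved evaluation order of `p` and `f`.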

## Re: Rethinking filter

+1. I've been looking for this for a long time. It would make for-expressions more powerful and useful, and encourage us to use for-expressions rather than map, filter, etc.

Martin Odersky wrote:
> [full proposal quoted in the original post above]

## Re: Rethinking filter

In reply to this post by Martin Odersky

martin odersky wrote:
> We have a number of open tickets which all want that
>
>     for (x <- xs if guard) body
>
> should expand to
>
>     for (x <- xs if (guard) body

(I assume this should read "for (x <- xs) if (guard) body")

> rather than the current expansion
>
>     for (x <- xs filter (x => guard)) body
>
> The motivations are
>
> (1) you might want to write a guard that depends on side effects in body.
> (2) you might want to avoid constructing a secondary list or other
> data structure containing the filtered elements.

I'm against this suggestion if these are the only reasons for change. It's not hard to manually move the guard inside the for-body. Regarding the yield case, the only motivation I can see is performance, but it's not worth complicating the rewrite rules for that. It's better to put more work into the optimizer in that case.

/Jesper Nordenberg

## Re: Rethinking filter

In reply to this post by Martin Odersky

Hi Martin,

On Thu, Oct 15, 2009 at 13:31, martin odersky wrote:
> [full proposal quoted in the original post above]

It is funny: I thought about (some of) these things (but not very deeply, I should say) quite a long time ago, and my resolution was that Scala would be able in the future to optimize some common cases and eliminate the intermediate structures. Not sure if this sheds any light, though...

--
Christos KK Loverdos
http://ckkloverdos.com

## Re: Re: Rethinking filter

In reply to this post by Jesper Nordenberg

On Thu, Oct 15, 2009 at 1:47 PM, Jesper Nordenberg wrote:
>> The motivations are
>> (1) you might want to write a guard that depends on side effects in body.
>> (2) you might want to avoid constructing a secondary list or other
>> data structure containing the filtered elements.
>
> I'm against this suggestion if these are the only reasons for change. It's
> not hard to manually move the guard inside the for-body. Regarding the
> yield case, the only motivation I can see is performance, but it's not
> worth complicating the rewrite rules for that. It's better to put more
> work into the optimizer in that case.

The optimizer needs to find out that 'filter' is side-effect free, that map builds the exact same collection type, that the predicate and mapping function are again side-effect free and can be reordered... Maybe when an effect type system is in place this could be done, but I wouldn't hold my breath.

If it were only for efficiency, I'd agree with you, but having guards that rely on side effects in the mapping function is more intuitive for most people. The current behavior is a source of subtle bugs. And no matter who does the rewrite, the optimizer or the 'rewrite' rules, some complexity (and arguably, a lot more if we rely on the optimizer) will appear somewhere. Speccing it allows the programmer to rely on it, and actually simplifies the implementation. The added complexity in the specification doesn't seem that great to me, so I'm in favor of this change.

cheers,
iulian

--
« Je déteste la montagne, ça cache le paysage » ("I hate the mountains, they hide the view") -- Alphonse Allais
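A small illustrative example (not from the tickets themselves) of the kind of subtle bug meant here: a guard reading a flag that the body sets. Under the current strict expansion, the guard is evaluated over the whole list before the body runs even once:

```scala
object GuardBug {
  def main(args: Array[String]): Unit = {
    var done = false
    // Reads like "stop once done is set", but the guard is a strict
    // filter evaluated up front, before any body side effect happens.
    for (x <- List(1, 2, 3) if !done) {
      if (x == 1) done = true
      println(x)
    }
    // Prints 1, 2, 3 -- not just 1, as imperative intuition suggests.
  }
}
```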

## Re: Rethinking filter

Iulian Dragos wrote:
> The optimizer needs to find out that 'filter' is side-effect free, that
> map builds the exact same collection type, that the predicate and
> mapping function are again side-effect free and can be reordered...
> Maybe when an effect type-system is in place this should be done, but I
> wouldn't hold my breath.

Yes, an effect tracking system is required to make this "stream fusion" work properly (although it doesn't have to be integrated with the type system).

> If it was only for efficiency, I'd agree with you, but having guards
> that rely on side-effects in the mapping function is more intuitive for
> most people. The current behavior is a source of subtle bugs. And no
> matter who does the rewrite, the optimizer or the 'rewrite' rules, some
> complexity (and arguably, a lot more if we rely on the optimizer) will
> appear somewhere. Speccing it allows the programmer to rely on it, and
> actually simplifies the implementation. The added complexity in the
> specification doesn't seem that great to me, so I'm in favor of this change.

Well, I think the key question is: are functions passed to filter, map and flatMap allowed to have side effects? And, if so, what restrictions apply and what properties can be relied upon? If we start by specifying this, I think the implementation details will fall out naturally.

/Jesper Nordenberg
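To make the evaluation-order question concrete, this small illustrative log shows what the current strict pipeline does (all predicate calls happen before any mapping call); a fused or lazy variant would instead interleave them:

```scala
import scala.collection.mutable.ListBuffer

object OrderDemo {
  def main(args: Array[String]): Unit = {
    val log = new ListBuffer[String]
    List(1, 2)
      .filter { x => log += s"p($x)"; true }  // strict: runs over the whole list
      .map    { x => log += s"f($x)"; x }     // only then does mapping start
    println(log.mkString(", "))  // p(1), p(2), f(1), f(2)
  }
}
```

Whether user code may observe and rely on this order is exactly the property a specification would have to pin down.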

## Re: Re: Rethinking filter

In reply to this post by Jesper Nordenberg

>>>>> "Jesper" == Jesper Nordenberg <[hidden email]> writes:

 >> The motivations are
 >> (1) you might want to write a guard that depends on side effects in
 >> body.  (2) you might want to avoid constructing a secondary list or
 >> other data structure containing the filtered elements.

 Jesper> I'm against this suggestion if these are the only reasons for
 Jesper> change.  It's not hard to manually move the guard inside the
 Jesper> for-body.

Agree. (1) doesn't seem important enough to do for 2.8, given everything else that's going on, especially given that only imperative code (and rather tricky imperative code, too) benefits from the additional complication. As for (2), I'd point out that it's not hard to add ".view" to avoid constructing intermediates.

--
Seth Tisue @ Northwestern University / http://tisue.net
lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/
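For reference, the ".view" trick mentioned here looks like this: filter and map on a view are applied lazily, element by element, so no intermediate filtered collection is built.

```scala
object ViewDemo {
  def main(args: Array[String]): Unit = {
    val result = List(1, 2, 3, 4, 5).view
      .filter(_ % 2 == 1)   // no intermediate filtered list is materialized
      .map(_ * 10)          // fused with the filter, element by element
      .toList               // forces the pipeline once, at the end
    println(result)  // List(10, 30, 50)
  }
}
```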

## Re: Re: Rethinking filter

I agree with Seth and Jesper. I rather like the simplicity of the current transformation. The "combine if it looks like this AND this method is defined" rule seems overly complicated and a can of worms. Would it apply the filterXXX methods based on the static type or the run-time type? How "deep" does the deforestation go?

That said, I haven't seen the open tickets regarding this issue. Which tickets are they? Perhaps they contain compelling arguments for change.

--j

On Thu, Oct 15, 2009 at 6:47 AM, Seth Tisue wrote:
> [Seth's reply quoted in full above]

## Re: Rethinking filter

In reply to this post by Eastsun

Eastsun wrote:
> +1
> It would make for expression more powerful and useful, and encourage us to
> use for expression rather than map, filter, ... etc.

+1. filterMap is an important performance win.

## Re: Re: Rethinking filter

I have become convinced that filter in for comprehensions needs to be lazy. Why? Consider this program fragment:

```scala
var limit = 2
for (i <- List(0, 1, 2, 3) if i <= limit; j <- 0 until limit) yield { limit = 4; (i, j) }
```

What does it return?

```scala
List((0,0), (0,1), (1,0), (1,1), (1,2), (1,3), (2,0), (2,1), (2,2), (2,3))
```

So you see that the first index is computed with the previously given limit. It is not affected by the assignment in the yield. No surprise, because we know filter is strict. However, the second index does go up to 3 after the first loop iteration, so it does take the changed limit into account. If you look at the translation, it's clear why it has to be so:

```scala
List(0, 1, 2, 3).filter(_ <= limit).flatMap(i =>
  (0 until limit).map(j => {
    limit = 4
    (i, j)
  }))
```

You see that the second map is in the closure passed to the first flatMap, so its evaluation gets interleaved with the flatMap iteration. But the filter is in the generator of that flatMap, so it gets computed before the flatMap is even started. But for someone who has not 100% internalized that translation (and who has?), it's very weird indeed that _some_ parts of the for comprehension are evaluated before the body is started, but other parts are interleaved.

So I think we need a better translation of `for`. I have found a scheme which is different from filterMap, filterFlatMap, filterForeach and which looks simpler. We postulate a new method withFilter which, like filter, takes a predicate as argument. The `for` translation is exactly as it is now, but using withFilter instead of filter. This means that one has the guarantee that the result of withFilter is further taken apart using one of map, flatMap, foreach, or another withFilter. For instance, the for expression above would translate to:

```scala
List(0, 1, 2, 3).withFilter(_ <= limit).flatMap(i =>
  (0 until limit).map(j => {
    limit = 4
    (i, j)
  }))
```

It is recommended (but not enforced) that withFilter is lazy. That is, it should return a proxy object with methods for map, flatMap, foreach, and withFilter. Each of these would combine a guard with the original operation. I append an implementation of withFilter for class TraversableLike which would be inherited by all collection classes.

To ensure a smooth transition, I propose that if the `for` translation scheme cannot find the `withFilter` method, it uses `filter` instead, but emits a deprecation warning.

I believe this proposal has a lot going for it: it's not significantly more complex than the previous translation, it leads to more intuitive behavior, and it can be a win in performance because it reduces the number of intermediate structures that are built. The only possible downside is that it might break some code. But I doubt that is very likely, as code that relies on a filter seeing a snapshot of the state before the loop body is executed seems rather weird.

I think this also settles the case for Range. Range should be strict, and we do not need an exception for its filter anymore.

Cheers -- Martin

Here's the proposed implementation of withFilter in class TraversableLike:

```scala
def withFilter(p: A => Boolean) = new WithFilter(p)

class WithFilter(p: A => Boolean) {

  def map[B, That](f: A => B)(implicit bf: BuilderFactory[B, That, Repr]): That = {
    val b = bf(repr)
    for (x <- self)
      if (p(x)) b += f(x)
    b.result
  }

  def flatMap[B, That](f: A => Traversable[B])(implicit bf: BuilderFactory[B, That, Repr]): That = {
    val b = bf(repr)
    for (x <- self)
      if (p(x)) b ++= f(x)
    b.result
  }

  def foreach[U](f: A => U): Unit =
    for (x <- self)
      if (p(x)) f(x)

  def withFilter(q: A => Boolean): WithFilter =
    new WithFilter(x => p(x) && q(x))
}
```
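The behavioral difference can be checked with a stand-alone sketch: a minimal lazy filter proxy over `List`, in the spirit of the appended WithFilter (the class below is illustrative, not the actual TraversableLike code). With the lazy version, the guard sees the updated limit, so `i == 3` is now included:

```scala
object WithFilterDemo {
  // Minimal lazy filter proxy for List, modeled on the proposal.
  final class LazyFilter[A](xs: List[A], p: A => Boolean) {
    def map[B](f: A => B): List[B] = {
      val b = List.newBuilder[B]
      xs.foreach(x => if (p(x)) b += f(x))  // guard checked per element, lazily
      b.result()
    }
    def flatMap[B](f: A => List[B]): List[B] =
      xs.flatMap(x => if (p(x)) f(x) else Nil)
    def foreach[U](f: A => U): Unit = xs.foreach(x => if (p(x)) f(x))
    def withFilter(q: A => Boolean): LazyFilter[A] =
      new LazyFilter(xs, x => p(x) && q(x))
  }

  def main(args: Array[String]): Unit = {
    var limit = 2
    // Strict filter: the guard runs over the whole list before the body.
    val strict = List(0, 1, 2, 3).filter(_ <= limit).flatMap(i =>
      (0 until limit).map(j => { limit = 4; (i, j) }).toList)
    limit = 2
    // Lazy filter: the guard is checked as iteration proceeds.
    val lzy = new LazyFilter[Int](List(0, 1, 2, 3), _ <= limit).flatMap(i =>
      (0 until limit).map(j => { limit = 4; (i, j) }).toList)
    println(strict.map(_._1).distinct)  // List(0, 1, 2): snapshot of limit = 2
    println(lzy.map(_._1).distinct)     // List(0, 1, 2, 3): sees limit = 4
  }
}
```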

## Re: Re: Rethinking filter

I like this much better than the previous proposals. It doesn't change "filter" in unexpected ways, makes guards more intuitive, is rather simple, and has performance benefits.

And I have no queasy feelings about it, which I had about the other proposals, in which I found myself grudgingly agreeing that there was a problem, and yet disliked the proposed alternative.

So, +1.

On Fri, Oct 16, 2009 at 7:24 AM, martin odersky wrote:
> [withFilter proposal quoted in full above]

--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.