http://stackoverflow.com/questions/3400734/package-objects
http://www.naildrivin5.com/scalatour/wiki_pages/PackageObjects
Normally you would put your package object in a separate file called package.scala
in the package that it corresponds to. You can also use the nested package syntax but that is quite unusual.
The main use case for package objects is when you need definitions in various places inside your package as well as outside the package when you use the API defined by the package. Here is an example:
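// in file foo/bar/package.scala (following the convention described above)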
package foo

package object bar {
  // package wide constants:
  def BarVersionString = "1.0"

  // or type aliases
  type StringMap[+T] = Map[String, T]

  // can be used to emulate a package wide import,
  // especially useful when wrapping a Java API
  type DateTime = org.joda.time.DateTime
  type JList[T] = java.util.List[T]

  // Define implicits needed to effectively use your API:
  implicit def a2b(a: A): B = // ...
}
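To make the "inside the package as well as outside" point concrete, here is a minimal usage sketch (the class Baz and the Client object are made-up names): code living in foo.bar sees the package object's members automatically, while code elsewhere pulls them in with an ordinary import.

// File foo/bar/Baz.scala -- inside the package, no import needed
package foo.bar

class Baz {
  // StringMap and BarVersionString come from the package object
  val versions: StringMap[String] = Map("bar" -> BarVersionString)
}

// File client/Client.scala -- outside the package, import foo.bar._
package client

import foo.bar._

object Client {
  val libVersion: String = BarVersionString
  val m: StringMap[Int]  = Map("a" -> 1)
}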
One additional thing to note is that package objects are objects. Among other things, this means you can build them up from traits, using mix-in inheritance. Moritz's example could be written as
package object bar extends Versioning
    with JodaAliases
    with JavaAliases {

  // package wide constants:
  override val version = "1.0"

  // or type aliases
  type StringMap[+T] = Map[String, T]

  // Define implicits needed to effectively use your API:
  implicit def a2b(a: A): B = // ...
}
Here Versioning is an abstract trait, which says that the package object must have a "version" method, while JodaAliases and JavaAliases are concrete traits containing handy type aliases. All of these traits can be reused by many different package objects.
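The traits themselves are not shown above; a plausible sketch of what they could look like (the member names beyond version are assumptions):

trait Versioning {
  // abstract: every package object mixing this in must supply a version
  def version: String
}

trait JodaAliases {
  // concrete: reusable alias for the Joda-Time date type
  type DateTime = org.joda.time.DateTime
}

trait JavaAliases {
  // concrete: reusable aliases for common Java collection types
  type JList[T]   = java.util.List[T]
  type JMap[K, V] = java.util.Map[K, V]
}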
Because of micro-batching, Spark gives you NEAR real-time processing. Spark Streaming is essentially a batch processing platform retrofitted for near real-time work, whereas Storm is built from the ground up for real-time processing only.
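To see what micro-batching means in code, here is a minimal Spark Streaming sketch (DStream API; the socket source, port and 2-second interval are arbitrary choices for illustration). Every transformation runs on a small RDD produced once per batch interval, so end-to-end latency can never drop below that interval:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MicroBatchWordCount").setMaster("local[2]")
    // Input is chopped into 2-second micro-batches; nothing is emitted faster than that.
    val ssc = new StreamingContext(conf, Seconds(2))

    val lines  = ssc.socketTextStream("localhost", 9999)
    val pairs  = lines.flatMap(_.split(" ")).map(word => (word, 1))
    val counts = pairs.reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}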
Windowing operations can easily be performed in Storm as well; aggregating over a real-time stream is one of its most common use cases.
It would be interesting to see the turnaround time, from the point a message enters the system to the point a metric is updated, for Spark Streaming versus Storm. I have not seen many benchmarks targeting that.
In addition to real-time stream processing, Spark also lets you do batch-level processing, as with Hadoop (a batch sketch follows below).
Performance-wise, Spark is clearly ahead of Storm.
So if I had the freedom to choose between Spark and Storm, I would pick Spark.
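As a rough illustration of that batch-level reuse, here is the same word count written as a plain Spark batch job (a sketch; the HDFS paths are made up):

import org.apache.spark.{SparkConf, SparkContext}

object BatchWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BatchWordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Same transformations as the streaming sketch, applied once to a static dataset
    sc.textFile("hdfs:///data/input.txt")       // hypothetical input path
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/wordcount") // hypothetical output path

    sc.stop()
  }
}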
Since Spark relies on micro-batching to simulate real-time processing, it is likely to be slower than true real-time systems like Storm. If you have benchmark data that shows otherwise, please share the links.
Finally, to be fair, Storm and Spark Streaming cannot be compared directly: aggregation operators come out of the box with Spark but must be implemented manually in Storm, so any benchmark depends heavily on your Storm implementation. To be more accurate we should compare Trident with Spark. Note that Trident usually brings a significant drop in performance because of the reliability mechanism it offers.
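For a sense of what "out of the box" means here, a sliding-window aggregation in Spark Streaming is a single call (a sketch, reusing the pairs DStream from the earlier example; window and slide lengths are arbitrary). In plain Storm you would have to maintain the window state in a bolt yourself, or reach for Trident:

// Count per key over the last 60 seconds, recomputed every 10 seconds.
val windowedCounts = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b, // combine counts within the window
  Seconds(60),               // window length
  Seconds(10)                // slide interval
)
windowedCounts.print()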
1. Throughput is not the same as turnaround time; I would love to see some benchmarks there. Since Storm relies on passing data through its system, performance is bottlenecked by the network.
2. That benchmark was done by AMPLab, so Storm may not have been tuned for its best performance.
That said, even if Storm turns out to have equal or similar performance, I believe the benefit of reusing your Hadoop stack for streaming, and of having a framework that can also be leveraged for data warehousing, machine learning and analytics, tips the scales heavily in favour of Spark.