The future of Scala, Uber logs, and the use of Virtual Threads with PostgreSQL - JVM Weekly vol. 68
After a period of news, announcements and releases, I think it's worth to take a break, sitting back and sinking in some interesting case studies and essays.
1. What does the future hold for Scala?
Do we have any Scala fans on board? I ask for good reason - as there was an interesting recent text Scala's Future, in which Alexandru Nedelcu attempts to assess how much of a future the language has, and also provides thoughtful reflections on the evolution and current state of various programming languages, with a particular focus on Scala specifically.
The author begins with a topic that not long ago seemed just a tale to scare programmers around the campfire, but today is increasingly starting to be invoked as a starting point for reflecting on the state of the industry today - with a recollection of the difficult times of the Internet bubble, especially in Romania, where he comes from. It turns out that even then there was a big temporary decline in Java's popularity during this period, this one, however, proved to be very resilient to market fluctuations.
The historical context sets the stage for discussions about the 'technology bubble' burst in 2023. Although this 'burst' can still be viewed with a degree of satisfaction, it is already marked not only by significant job cuts in the tech industry, but also interestingly by a decrease in online activity across most programming languages. The author highlights how deceptive the perception of languages' popularity can be, citing Rust and Kotlin as examples - as it appears that, contrary to widespread belief and the image of predatory tigers waiting in the wings, their actual growth in popularity has somewhat plateaued. Alexander also underscores the significance of hard facts over excessive enthusiasm, and that languages have their specific areas - for instance, the previously mentioned Rust competes with C++ and is not a direct rival to languages with automatic memory management like Java and Scali.
In the discussion about Scala, the author emphasizes its robust community and varied ecosystem, pointing out the community's high productivity despite its comparatively smaller size. The article also explores Python's evolution and its lengthy, challenging transition to Python 3. Despite numerous hurdles and worries, Python has become the most popular language globally according to various statistics, dominating fields like ML. This prominence of Python isn't coincidental - Alexander draws comparisons between it and Scala's current path, suggesting that while the transition to Scala 3.0 will be time-consuming and undoubtedly challenging for the entire community, it's the correct course of action. By transforming Scala 3.0 into a language that's objectively superior to version 2.0, following a tough transition phase, the ecosystem will be better equipped to compete for developers' affection worldwide.
And what does it look like for you? Do you believe there is a bright future ahead for Scala? Or are you just about to be rewritten for Scala 3.0?
2. How does Uber manage hundreds of terabytes of logs?
I'm going to stretch it a bit with this one, but I rarely get the chance to share a cool case from Uber. I've always enjoyed their publications because often instead of focusing on fairly abstract architectural topics (I'm looking at you, Netflix) they descend into things that are more mundane but still a piece of interesting engineering. This is no different in Reducing Logging Cost by Two Orders of Magnitude using CLP. The text is now over a year old, but it has recently resurfaced on social media and has not aged an iota since.
Why does text like this make it into the Java Newsletter? That's because the article discusses how Uber has significantly reduced logging costs by integrating the Compressed Log Processor (CLP) tool with the Log4j library in their data processing platform. As you can easily guess, as Uber's business grew, the volume of logged data increased dramatically, leading to high costs and the need to rotate logs quickly. The integration of CLP enabled a compression ratio of 169x, significantly reducing storage, memory, and network bandwidth requirements. This fix has allowed Uber to retain logs for longer periods, enabling historical data insights and more efficient log searches without the need for decompression
The text also provides a closer look at the size of Uber itself - prior to the patch, the company was generating up to 200TB of logs (!) in peaks. As such, standard existing log management tools such as Elasticsearch and ClickHouse were either too expensive or ill-suited to effectively handle the scale of Uber. We learn from the text that at this scale, it turns out that even things seemingly negligible by most of us, such as saving log files in small batches over time, caused premature SSD wear and tear.
3. Using Virtual Threads is not always straightforward? Who would have thought...
And finally, I have an interesting example of using Virtual Threads in real, non-trivial cases. The author of How we switched to Java 21 virtual threads and got a deadlock in TPC-C for PostgreSQL is Evgeniy Ivanov, who works on the YDB database. Its developers are currently working on PostgreSQL compatibility with YDB, mainly due to high demand from PostgreSQL users. During this process, they have been working on the implementation of benchmarks, which has caused them to encounter unexpected problems.
This post discusses the challenges and solutions encountered when implementing the Transaction Processing Performance Council Benchmark C (TPC-C) for PostgreSQL using Java 21 virtual threads. TPC-C is a standard performance test for database systems developed by the Transaction Processing Performance Council (TPC) to evaluate the performance of transactional processing. As is often the case with benchmarks, their implementation can be "tricky", as they often require non-standard techniques. This was also the case here, as the authors initially encountered a concurrency limitation problem due to the creation of too many physical threads in their original TPC-C implementation. To remedy this, they decided to use virtual threads from JDK 21, wanting to deal with high concurrency more efficiently. However, they soon discovered that virtual threads can lead to unpredictable blocks, which are painful to track - especially when they occur deep in the libraries used.
As an example, the author encountered a jamming situation in their TPC-C implementation for PostgreSQL using c3p0 - a library for pulsing JDBC calls in a high concurrency situation, with support for caching and reusing PreparedStatements
to pulse calls. To solve the problem, they started going lower and lower and discovered that virtual threads waiting for a session were blocking their carrier thread. The solution was to use java.util.concurrent.Semaphore
to package the call, which prevented blocking inside c3p0 and enabled the efficient use of virtual threads. This example demonstrates the complexity of implementing virtual threads and the importance of understanding how they interact with other components in the system. It turns out that problems can hit from unexpected places, and even if we ourselves use Virtual Threads correctly, we can never trust that the dependencies we use will not give us a surprise.
The article is quite an interesting resource for those considering a move to virtual threads, as it touches on the nuances that are often lost under the Hello World rash. The post (and also previous YDB publications) also contains quite a bit of other interesting information for those interested in database concurrency. If you have ever been interested in the topic - highly recommended.
Scala's heyday was probably 2010-2018 or so. Since the early days of Scala:
- People who wanted to do academic programming or data science or AI moved to Python
- People who wanted a practical get-stuff-done server-side language moved to Golang.
- People who wanted a Java++ moved to Kotlin.
- Rust became the language with the most developer enthusiasm.
- Many cluster computing use cases switched from Akka+Scala to Kubernetes.
- Java changed from a stagnant legacy language to a good, productive language worth using.
Next, when I read about the Scala 3.x improvements, the improvements sound esoteric without clear benefit. It still seems like the big features of Scala are all from the old days. Most of the big Scala projects have shrunk or collapsed. Anecdotally, most companies I see using Scala are supporting older projects; I don't see Scala chosen for new projects. I'm not rooting against it, but it seems like a fading legacy platform.