Java, Debugging, DevOps & Open Source

Software Testing as a Debugging Tool

Shai Almog — Tue, 16 Apr 2024 12:13:39 GMT

Debugging is not just about identifying errorsit's about instituting a reliable process for ensuring software health and longevity. In this post we discuss the role of software testing in debugging, including foundational concepts and how they converge to improve software quality.

https://youtu.be/yap509UZz6M

As a side note, if you like the content of this and the other posts in this series check out my Debugging book that covers this subject. If you have friends that are learning to code I'd appreciate a reference to my Java Basics book. If you want to get back to Java after a while check out my Java 8 to 21 book.

The Intersection of Debugging and Testing

Debugging and testing play distinct roles in software development. Debugging is the targeted process of identifying and fixing known bugs. Testing, on the other hand, encompasses a adjacent scope, identifying unknown issues by validating expected software behavior across a variety of scenarios.

Both are a part of the debug fix cycle which is a core concept in debugging. Before we cover the cycle we should first make sure we're aligned on the basic terminology.

Unit Tests

Unit tests are tightly linked to debugging efforts, focusing on isolated parts of the applicationtypically individual functions or methods. Their purpose is to validate that each unit operates correctly in isolation, making them a swift and efficient tool in the debugging arsenal. These tests are characterized by their speed and consistency, enabling developers to run them frequently, sometimes even automatically as code is written within the IDE.

Since software is so tightly bound it is nearly impossible to compose unit tests without extensive mocking. Mocking involves substituting a genuine component with a stand-in that returns predefined results, thus a test method can simulate scenarios without relying on the actual object. This is a powerful yet controversial tool. By using mocking we're in-effect creating a synthetic environment that might misrepresent the real world. We're reducing the scope of the test and might perpetuate some bugs.

Integration Tests

Opposite to unit tests, integration tests examine the interactions between multiple units, providing a more comprehensive picture of the system's health. While they cover broader scenarios, their setup can be more complex due to the interactions involved. However, they are crucial in catching bugs that arise from the interplay between different software components.

In general mocking can be used in integration tests but it is discouraged. They take longer to run and are sometimes harder to set up. However, many developers (myself included) would argue that they are the only benchmark for quality. Most bugs express themselves in the seams between the modules and integration tests are better at detecting that.

Since they are far more important some developers would argue that unit tests are unnecessary. This isn't true, unit test failures are much easier to read and understand. Since they are faster we can run them during development, even while typing. In that sense the balance between the two approaches is the important part.

Coverage

Coverage is a metric that helps quantify the effectiveness of testing by indicating the proportion of code exercised by tests. It helps identify potential areas of the code that have not been tested, which could harbor undetected bugs. However, striving for 100% coverage can be a case of diminishing returns; the focus should remain on the quality and relevance of the tests rather than the metric itself. In my experience, chasing high coverage numbers often results in bad test practices that persist problems.

It is my opinion that unit tests should be excluded from coverage metrics due to the importance of integration tests to overall quality. To get a sense of quality coverage should focus on integration and end to end tests.

The Debug-Fix Cycle

The debug-fix cycle is a structured approach that integrates testing into the debugging process. The stages include identifying the bug, creating a test that reproduces the bug, fixing the bug, verifying the fix with the test, and finally, running the application to ensure the fix works in the live environment. This cycle emphasizes the importance of testing in not only identifying but also in preventing the recurrence of bugs.

Notice that this is a simplified version of the cycle with a focus on the testing aspect only. The full cycle includes discussion of the issue tracking and versioning as part of the whole process. I discuss this more in-depth in other posts in the series and my book.

Composing Tests with Debuggers

A powerful feature of using debuggers in test composition is their ability to "jump to line" or "set value." Developers can effectively reset the execution to a point before the test and rerun it with different conditions, without recompiling or rerunning the entire suite. This iterative process is invaluable for achieving desired test constraints and improves the quality of unit tests by refining the input parameters and expected outcomes.

Increasing test coverage is about more than hitting a percentage; it's about ensuring that tests are meaningful and that they contribute to software quality. A debugger can significantly assist in this by identifying untested paths. When a test coverage tool highlights lines or conditions not reached by current tests, the debugger can be used to force execution down those paths. This helps in crafting additional tests that cover missed scenarios, ensuring that the coverage metric is not just a number but a true reflection of the software's tested state.

In this case you will notice that the next line in the body is a rejectValue call which will throw an exception. I dont want an exception thrown as I still want to test all the permutations of the method. I can drag the execution pointer (arrow on the left) and place it back at the start of the method.

Test-Driven Development

How does all of this fit with disciplines like Test-Driven Development (TDD)?

It doesn't fit well. Before we get into that let's revisit the basics of TDD. Weak TDD typically means just writing tests before writing the code. Strong TDD involves a red-green-refactor cycle:

Red: Write a test that fails because the feature it tests isn't implemented yet.
Green: Write the minimum amount of code necessary to make the test pass.
Refactor: Clean up the code while ensuring that tests continue to pass.

This rigorous cycle guarantees that new code is continually tested and refactored, reducing the likelihood of complex bugs. It also means that when bugs do appear, they are often easier to isolate and fix due to the modular and well-tested nature of the codebase. At least, that's the theory.

https://youtu.be/yImkjlm08Cw

TDD can be especially advantageous for scripting and loosely typed languages. In environments lacking the rigid structure of compilers and linters, TDD steps in to provide the necessary checks that would otherwise be performed during compilation in statically typed languages. It becomes a crucial substitute for compiler/linter checks, ensuring that type and logic errors are caught early.

In real-world application development, TDD's utility is nuanced. While it encourages thorough testing and upfront design, it can sometimes hinder the natural flow of development, especially in complex systems that evolve through numerous iterations. The requirement for 100% test coverage can lead to an unnecessary focus on fulfilling metrics rather than writing meaningful tests.

The biggest problem in TDD is its focus on unit testing. TDD is impractical with integration tests as the process would take too long. But as we determined in the start of this post, integration tests are the true benchmark for quality. In that test TDD is a methodology that provides great quality for arbitrary tests, but not necessarily great quality for the final product. You might have the best cog in the world, but if doesn't fit well into the machine then it isn't great.

Final Word

Debugging is a tool that not only fixes bugs but also actively aids in crafting tests that bolster software quality. By utilizing debuggers in test composition and increasing coverage, developers can create a suite of tests that not only identifies existing issues but also guards against future ones, thus ensuring the delivery of reliable, high-quality software.

Debugging lets us increase coverage and verify edge cases effectively. It's part of a standardized process for issue resolution that's critical for reliability and prevents regressions.

Wireshark & tcpdump: A Debugging Power Couple

Shai Almog — Tue, 02 Apr 2024 13:25:31 GMT

Wireshark, the free open-source packet sniffer and network protocol analyzer, has cemented itself as an indispensable tool in network troubleshooting, analysis, and security (on both sides). This blog post delves into the features, uses, and practical tips for harnessing the full potential of Wireshark, expanding on aspects that may have been glossed over in discussions or demonstrations. Whether you're a developer, security expert, or just curious about network operations, this guide will enhance your understanding of Wireshark and its applications.

https://youtu.be/QVWRomT2Ppo

Introduction to Wireshark

Wireshark was initially developed by Eric Rescorla and Gerald Combs, designed to capture and analyze network packets in real-time. Its capabilities extend across various network interfaces and protocols, making it a versatile tool for anyone involved in networking. Unlike its command-line counterpart, tcpdump, Wireshark's graphical interface simplifies the analysis process, presenting data in a user-friendly "proto view" that organizes packets in a hierarchical structure. This facilitates quick identification of protocols, ports, and data flows.

The key features of Wireshark are:

Graphical User Interface (GUI): Eases the analysis of network packets compared to command-line tools.
Proto View: Displays packet data in a tree structure, simplifying protocol and port identification.
Compatibility: Supports a wide range of network interfaces and protocols.

Browser Network Monitors

FireFox and Chrome contain a far superior network monitor tool built into them. It is superior because it is simpler to use and works with secure websites out of the box. If you can use the browser to debug the network traffic you should do that.

In the cases where your traffic requires low level protocol information or is outside of the browser, Wireshark is the next best thing.

Installation and Getting Started

To begin with Wireshark, visit their official website for the download. The installation process is straightforward, but attention should be paid to the installation of command-line tools, which may require separate steps. Upon launching Wireshark, users are greeted with a selection of network interfaces as seen below. Choosing the correct interface, such as the loopback for local server debugging, is crucial for capturing relevant data.

When debugging a Local Server (localhost) use the loopback interface. Remote servers will probably fit with the en0 network adapter. You can use the activity graph next to the network adapter to identify active interfaces for capture.

Navigating Through Noise with Filters

One of the challenges of using Wireshark is the overwhelming amount of data captured, including irrelevant "background noise" as seen in the following image.

Wireshark addresses this with powerful display filters, allowing users to hone in on specific ports, protocols, or data types. For instance, filtering TCP traffic on port 8080 can significantly reduce unrelated data, making it easier to debug specific issues.

Notice that the there is a completion widget on top of the Wireshark UI that lets you find out the values more easily.

In this case we filter by port tcp.port == 8080 which is the port used typically in Java servers (e.g. Spring Boot/tomcat).

But this isn't enough as HTTP is more concise. We can filter by protocol by adding http to the filter which narrows the view to HTTP requests and responses as shown in the following image.

Deep Dive into Data Analysis

Wireshark excels in its ability to dissect and present network data in an accessible manner. For example, HTTP responses carrying JSON data are automatically parsed and displayed in a readable tree structure as seen below. This feature is invaluable for developers and analysts, providing insights into the data exchanged between clients and servers without manual decoding.

Wireshark parses and displays JSON data within the packet analysis pane. It offers both hexadecimal and ASCII views for raw packet data.

Beyond Basic Usage

While Wireshark's basic functionalities cater to a wide range of networking tasks, its true strength lies in advanced features such as ethernet network analysis, HTTPS decryption, and debugging across devices. These tasks, however, may involve complex configuration steps and a deeper understanding of network protocols and security measures.

There are two big challenges when working with Wireshark:

HTTPS Decryption: Decrypting HTTPS traffic requires additional configuration but offers visibility into secure communications.
Device Debugging: Wireshark can be used to troubleshoot network issues on various devices, requiring specific knowledge of network configurations.

The Basics of HTTPS Encryption

HTTPS uses the Transport Layer Security (TLS) or its predecessor, Secure Sockets Layer (SSL), to encrypt data. This encryption mechanism ensures that any data transferred between the web server and the browser remains confidential and untouched. The process involves a series of steps including handshake, data encryption, and data integrity checks.

Decrypting HTTPS traffic is often necessary for developers and network administrators to troubleshoot communication errors, analyze application performance, or ensure that sensitive data is correctly encrypted before transmission. It's a powerful capability in diagnosing complex issues that cannot be resolved by simply inspecting unencrypted traffic or server logs.

Methods for Decrypting HTTPS in Wireshark

Important: Decrypting HTTPS traffic should only be done on networks and systems you own or have explicit permission to analyze. Unauthorized decryption of network traffic can violate privacy laws and ethical standards.

Pre-Master Secret Key Logging

One common method involves using the pre-master secret key to decrypt HTTPS traffic. Browsers like Firefox and Chrome can log the pre-master secret keys to a file when configured to do so. Wireshark can then use this file to decrypt the traffic:

Configure the Browser: Set an environment variable (SSLKEYLOGFILE) to specify a file where the browser will save the encryption keys.
Capture Traffic: Use Wireshark to capture the traffic as usual.
Decrypt the Traffic: Point Wireshark to the file with the pre-master secret keys (through Wireshark's preferences) to decrypt the captured HTTPS traffic.

Using a Proxy

Another approach involves routing traffic through a proxy server that decrypts HTTPS traffic and then re-encrypts it before sending it to the destination. This method might require setting up a dedicated decryption proxy that can handle the TLS encryption/decryption:

Set Up a Decryption Proxy: Tools like Mitmproxy or Burp Suite can act as an intermediary that decrypts and logs HTTPS traffic.
Configure Network to Route Through Proxy: Ensure the client's network settings route traffic through the proxy.
Inspect Traffic: Use the proxy's tools to inspect the decrypted traffic directly.

Integrating tcpdump with Wireshark for Enhanced Network Analysis

While Wireshark offers a graphical interface for analyzing network packets, there are scenarios where using it directly may not be feasible due to security policies or operational constraints. tcpdump, a powerful command-line packet analyzer, becomes invaluable in these situations, providing a flexible and less intrusive means of capturing network traffic.

The Role of tcpdump in Network Troubleshooting

tcpdump allows for the capture of network packets without a graphical user interface, making it ideal for use in environments with strict security requirements or limited resources. It operates under the principle of capturing network traffic to a file, which can then be analyzed at a later time or on a different machine using Wireshark.

https://www.youtube.com/watch?v=nLXu3_fzHhQ

Key Scenarios for tcpdump Usage:

High-security Environments: In places like banks or government institutions where running network sniffers might pose a security risk, tcpdump offers a less intrusive alternative.
Remote Servers: Debugging issues on a cloud server can be challenging with Wireshark due to the graphical interface; tcpdump captures can be transferred and analyzed locally.
Security-conscious Customers: Customers may be hesitant to allow third-party tools to run on their systems; tcpdump's command-line operation is often more palatable.

Using tcpdump Effectively

Capturing traffic with tcpdump involves specifying the network interface and an output file for the capture. This process is straightforward but powerful, allowing for detailed analysis of network interactions:

Command Syntax: The basic command structure for initiating a capture involves specifying the network interface (e.g., en0 for wireless connections) and the output file name.
Execution: Once the command is run, tcpdump silently captures network packets. The capture continues until it's manually stopped, at which point the captured data can be saved to the specified file.
Opening Captures in Wireshark: The file generated by tcpdump can be opened in Wireshark for detailed analysis, utilizing Wireshark's advanced features for dissecting and understanding network traffic.

The following shows the tcpdump command and its output:

$ sudo tcpdump -i en0 -w outputPassword:tcpdump: listening on en, link-type EN10MB (Ethernet), capture size 262144 bytes^C3845 packets captured4189 packets received by filter0 packets dropped by kernel

Challenges and Considerations

Identifying the correct network interface for capture on remote systems might require additional steps, such as using the ifconfig command to list available interfaces. This step is crucial for ensuring that relevant traffic is captured for analysis.

Final Word

Wireshark stands out as a powerful tool for network analysis, offering deep insights into network traffic and protocols. Whether it's for low-level networking work, security analysis, or application development, Wireshark's features and capabilities make it an essential tool in the tech arsenal. With practice and exploration, users can leverage Wireshark to uncover detailed information about their networks, troubleshoot complex issues, and secure their environments more effectively.

Wireshark's blend of ease of use with profound analytical depth ensures it remains a go-to solution for networking professionals across the spectrum. Its continuous development and wide-ranging applicability underscore its position as a cornerstone in the field of network analysis.

Combining tcpdump's capabilities for capturing network traffic with Wireshark's analytical prowess offers a comprehensive solution for network troubleshooting and analysis. This combination is particularly useful in environments where direct use of Wireshark is not possible or ideal. While both tools possess a steep learning curve due to their powerful and complex features, they collectively form an indispensable toolkit for network administrators, security professionals, and developers alike.

This integrated approach not only addresses the challenges of capturing and analyzing network traffic in various operational contexts but also highlights the versatility and depth of tools available for understanding and securing modern networks.

Mastering jhsdb: The Hidden Gem for Debugging JVM Issues

Shai Almog — Tue, 26 Mar 2024 11:53:09 GMT

jhsdb is a relatively underexplored yet incredibly powerful tool for debugging JVM issues. Whether you're tackling native code that crashes the JVM or delving into complex performance analysis, understanding how to use jhsdb effectively can be a game-changer in your debugging arsenal.

https://www.youtube.com/watch?v=UelhmnOR0lI

Introduction

Java 9 introduced many changes, with modules as the highlight. However, among these significant shifts, jhsdb didnt get the attention it deserved. Officially, Oracle describes jhsdb as a Serviceability Agent tool, part of the JDK aimed at snapshot debugging, performance analysis, and offering deep insights into the Hotspot JVM and Java applications running on it. Simply put, jhsdb is your go-to for delving into JVM internals, understanding core dumps, and diagnosing JVM or native library failures.

Getting Started with jhsdb

To begin we can invoke:

$ jhsdb --helpclhsdb           command line debuggerhsdb             ui debuggerdebugd --help    to get more informationjstack --help    to get more informationjmap   --help    to get more informationjinfo  --help    to get more informationjsnap  --help    to get more information

This command in reveals that jhsdb includes six distinct tools:

debugd: A remote debug server for connecting and diagnosing remotely.
jstack: Provides detailed stack and lock information.
jmap: Offers insights into heap memory.
jinfo: Displays basic JVM information.
jsnap: Assists with performance data.
Command Line Debugger: Although there's a preference for the GUI, we'll focus on GUI Debugging for a more visual approach.

Let's dive into these tools and explore how they can aid in diagnosing and resolving JVM issues.

Understanding and Using debugd

debugd might not be your first choice for production environments due to its remote debugging nature. Yet, it could be valuable for local container debugging. To use it we first need to detect the JVM process ID (PID) which we can accomplish using the jps command. Unfortunately, because of a bug in the UI you cant currently connect to a remote server via the GUI debugger. I could only use this with command-line tools such as jstack (discussed below).

With the command:

jhsdb debugd --pid 1234

We can connect to the process 1234. We can then use a tool like jstack to get additional information:

jhsdb jstack --connect localhost

Notice that the --connect argument applies globally and should work for all commands.

Leveraging jstack for Thread Dumps

jstack is instrumental in generating thread dumps, crucial for analyzing stack processes in user machines or production environments. This command can reveal detailed JVM running states, including deadlock detection, thread statuses, and compilation insights.

Typically we would use jstack locally which removes the need for debugd:

$ jhsdb jstack --pid 1234Attaching to process ID 1234, please wait...Debugger attached successfully.Server compiler detected.JVM version is 11.0.13+8-LTSDeadlock Detection:No deadlocks found."Keep-Alive-Timer" #189 daemon prio=8 tid=0x000000011d81f000 nid=0x881f waiting on condition [0x0000000172442000]   java.lang.Thread.State: TIMED_WAITING (sleeping)   JavaThread state: _thread_blocked - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame) - sun.net.www.http.KeepAliveCache.run() @bci=3, line=168 (Interpreted frame) - java.lang.Thread.run() @bci=11, line=829 (Interpreted frame) - jdk.internal.misc.InnocuousThread.run() @bci=20, line=134 (Interpreted frame)"DestroyJavaVM" #171 prio=5 tid=0x000000011f809000 nid=0x2703 waiting on condition [0x0000000000000000]   java.lang.Thread.State: RUNNABLE   JavaThread state: _thread_blocked

This snapshot can help us infer many details about how the application acts locally and in production.

Is our code compiled?

Is it waiting on a monitor?

What other threads are running and what are they doing?

Heap Memory Analysis with jmap

For a deep dive into RAM and heap memory, jmap is unmatched. It displays comprehensive heap memory details, aiding in GC tuning and performance optimization. Particularly useful is the histo flag for identifying potential memory leaks through a histogram of RAM usage.

Typical usage of jmap is very similar to jstack and other tools mentioned in this post:

$ jhsdb jmap --pid 1234 --heapAttaching to process ID 1234, please wait...Debugger attached successfully.Server compiler detected.JVM version is 11.0.13+8-LTSusing thread-local object allocation.Garbage-First (G1) GC with 9 thread(s)Heap Configuration:   MinHeapFreeRatio         = 40   MaxHeapFreeRatio         = 70   MaxHeapSize              = 17179869184 (16384.0MB)   NewSize                  = 1363144 (1.2999954223632812MB)   MaxNewSize               = 10305404928 (9828.0MB)   OldSize                  = 5452592 (5.1999969482421875MB)   NewRatio                 = 2   SurvivorRatio            = 8   MetaspaceSize            = 21807104 (20.796875MB)   CompressedClassSpaceSize = 1073741824 (1024.0MB)   MaxMetaspaceSize         = 17592186044415 MB   G1HeapRegionSize         = 4194304 (4.0MB)Heap Usage:G1 Heap:   regions  = 4096   capacity = 17179869184 (16384.0MB)   used     = 323663048 (308.6691360473633MB)   free     = 16856206136 (16075.330863952637MB)   1.8839668948203325% usedG1 Young Generation:Eden Space:   regions  = 66   capacity = 780140544 (744.0MB)   used     = 276824064 (264.0MB)   free     = 503316480 (480.0MB)   35.483870967741936% usedSurvivor Space:   regions  = 8   capacity = 33554432 (32.0MB)   used     = 33554432 (32.0MB)   free     = 0 (0.0MB)   100.0% usedG1 Old Generation:   regions  = 4   capacity = 478150656 (456.0MB)   used     = 13284552 (12.669136047363281MB)   free     = 464866104 (443.3308639526367MB)   2.7783193086322986% used

In most cases this might seem like gibberish but when we experience GC thrashing this might be a secret weapon in your arsenal. You can use this to fine tune GC settings and determine the right parameters to set. Since this can easily run in production you can base this on real world observations.

If you could reproduce a memory leak but you dont have a debugger attached, you can use this to generate a memory histogram:

$ jhsdb jmap --pid 1234 --histoAttaching to process ID 72640, please wait...Debugger attached successfully.Server compiler detected.JVM version is 11.0.13+8-LTSIterating over heap. This may take a while...Object Histogram:num       #instances    #bytes    Class description--------------------------------------------------------------------------1:        225689    204096416    int[]2:        485992    59393024    byte[]3:        17221    23558328    sun.security.ssl.CipherSuite[]4:        341376    10924032    java.util.HashMap$Node5:        117706    9549752    java.util.HashMap$Node[]6:        306720    7361280    java.lang.String7:        12718    6713944    char[]8:        113884    5466432    java.util.HashMap9:        64683    4657176    java.util.regex.Matcher10:        95612    4615720    java.lang.Object[]11:        106233    4249320    java.util.HashMap$KeyIterator12:        16166    4090488    long[]13:        126977    4063264    java.util.concurrent.ConcurrentHashMap$Node14:        150789    3618936    java.util.ArrayList15:        130167    3546016    java.lang.String[]16:        156237    3227152    java.lang.Class[]17:        33145    2916760    java.lang.reflect.Method18:        32193    2575440    nonapi.io.github.classgraph.fastzipfilereader.FastZipEntry19:        17314    2051672    java.lang.Class20:        32043    1794408    io.github.classgraph.ClasspathElementZip$121:        107918    1726688    java.util.HashSet22:        105970    1695520    java.util.HashMap$KeySet

This can help narrow down the source of the issue. There are better tools for that in the IDE and during development. But if you're running a server even locally, it can instantly give you a snapshot of RAM.

Basic JVM Insights with jinfo

Though not as detailed as other commands, jinfo is useful for a quick glance at system properties and JVM flags, especially on unfamiliar machines. It's a straightforward tool that requires just a PID to function.

jhsdb jinfo --pid 1234

Performance Metrics with jsnap

jsnap offers a wealth of internal metrics and statistics, such as thread counts and peak numbers. This data is vital for fine-tuning aspects like thread pool sizes, directly impacting production overhead.

$ jhsdb jsnap --pid 72640Attaching to process ID 72640, please wait...Debugger attached successfully.Server compiler detected.JVM version is 11.0.13+8-LTSjava.threads.started=418 event(s)java.threads.live=12java.threads.livePeak=30java.threads.daemon=8java.cls.loadedClasses=16108 event(s)java.cls.unloadedClasses=0 event(s)java.cls.sharedLoadedClasses=0 event(s)java.cls.sharedUnloadedClasses=0 event(s)java.ci.totalTime=23090159603 tick(s)java.property.java.vm.specification.version=11java.property.java.vm.specification.name=Java Virtual Machine Specificationjava.property.java.vm.specification.vendor=Oracle Corporationjava.property.java.vm.version=11.0.13+8-LTSjava.property.java.vm.name=OpenJDK 64-Bit Server VMjava.property.java.vm.vendor=Azul Systems, Inc.java.property.java.vm.info=mixed modejava.property.jdk.debug=release

GUI Debugging: A Visual Approach

We'll skip over the CLI debugger, the GUI debugger deserves a mention for its user-friendly interface, allowing connections to core files, servers, or PIDs with ease. This visual tool opens up a new dimension in debugging, especially when working with JNI native code.

The GUI debugger can be launched just like any other of the jhsdb tools:

jhsdb hsdb --pid 1234

The GUI layout is designed for ease of navigation, offering a comprehensive view of JVM internals at a glance. Here are some key features and how to use them:

File Menu: This is your starting point for connecting to debugging targets. You can load core files for post-mortem analysis, attach to running processes to diagnose live issues, or connect to remote debug servers if youre dealing with distributed systems.
Threads and Monitors: The GUI provides a real-time view of thread states, making it easier to identify deadlocks, thread contention, and monitor locks. This visual representation simplifies the process of pinpointing concurrency issues that could be affecting application performance.
Heap Summary: For memory analysis, the GUI debugger gives a graphical overview of heap usage, including generations (for GC analysis), object counts, and memory footprints. This makes identifying memory leaks and optimizing garbage collection strategies more intuitive.
Method and Stack Inspection: Delving into method executions and stack frames is seamless, allowing you to trace the execution path, inspect local variables, and evaluate the state of the application at different points in time.

Final Word

jhsdb stands out as an essential tool in the debugging toolkit, especially for those dealing with JVM and native code issues. Its range of capabilities, from deep memory analysis to performance metrics, makes it a versatile choice for developers and system administrators alike.

The biggest benefit is in debugging the interaction between Java code and native code. Such code often fails in odd ways and on end user machines. In such situations a typical debugger might not be the best tool and might not expose the whole picture. This is especially true if you get a JVM core dump which is the main use case for jhsdb.

Debugging Streams with Peek

Shai Almog — Tue, 12 Mar 2024 10:21:47 GMT

I blogged about Java Stream debugging in the past but I skipped an important method that's worthy of a post of its own: peek. This blog post delves into the practicalities of using peek() to debug Java streams, complete with code samples and common pitfalls.

Understanding Java Streams

Java Streams represent a significant shift in how Java developers work with collections and data processing, introducing a functional approach to handling sequences of elements. Streams facilitate declarative processing of collections, enabling operations such as filter, map, reduce, and more in a fluent style. This not only makes the code more readable but also more concise compared to traditional iterative approaches.

A Simple Stream Example

To illustrate, consider the task of filtering a list of names to only include those that start with the letter "J" and then transforming each name into uppercase. Using the traditional approach, this might involve a loop and some if statements. However, with streams, this can be accomplished in a few lines:

List names = Arrays.asList("John", "Jacob", "Edward", "Emily");// Convert list to streamList filteredNames = names.stream()                         // Filter names that start with "J"                  .filter(name -> name.startsWith("J"))                    // Convert each name to uppercase                  .map(String::toUpperCase)                                // Collect results into a new list                  .collect(Collectors.toList());         System.out.println(filteredNames);

Output:

[JOHN, JACOB]

This example demonstrates the power of Java streams: by chaining operations together, we can achieve complex data transformations and filtering with minimal, readable code. It showcases the declarative nature of streams, where we describe what we want to achieve rather than detailing the steps to get there.

What is the `peek()` Method?

At its core, peek() is a method provided by the Stream interface, allowing developers a glance into the elements of a stream without disrupting the flow of its operations. The signature of peek() is as follows:

Stream peek(Consumersuper T> action)

It accepts a Consumer functional interface, which means it performs an action on each element of the stream without altering them. The most common use case for peek() is logging the elements of a stream to understand the state of data at various points in the stream pipeline. To understand peek lets look at a sample similar to the previous one:

List collected = Stream.of("apple", "banana", "cherry")                               .filter(s -> s.startsWith("a"))                               .collect(Collectors.toList());System.out.println(collected);

This code filters a list of strings, keeping only the ones that start with "a". While it's straightforward, understanding what happens during the filter operation is not visible.

Debugging with `peek()`

Now, let's incorporate peek() to gain visibility into the stream:

List collected = Stream.of("apple", "banana", "cherry")                               .peek(System.out::println) // Logs all elements                               .filter(s -> s.startsWith("a"))                               .peek(System.out::println) // Logs filtered elements                               .collect(Collectors.toList());System.out.println(collected);

By adding peek() both before and after the filter operation, we can see which elements are processed and how the filter impacts the stream. This visibility is invaluable for debugging, especially when the logic within the stream operations becomes complex.

We can't step over stream operations with the debugger, but peek() provides a glance into the code that is normally obscured from us.

Uncovering Common Bugs with `peek()`

Filtering Issues

Consider a scenario where a filter condition is not working as expected:

List collected = Stream.of("apple", "banana", "cherry", "Avocado")                               .filter(s -> s.startsWith("a"))                               .collect(Collectors.toList());System.out.println(collected);

Expected output might be ["apple"], but let's say we also wanted "Avocado" due to a misunderstanding of the startsWith method's behavior. Since "Avocado" is spelled with an upper case "A" this code will return false: Avocado".startsWith("a"). Using peek(), we can observe the elements that pass the filter:

List debugged = Stream.of("apple", "banana", "cherry", "Avocado")                              .peek(System.out::println)                              .filter(s -> s.startsWith("a"))                              .peek(System.out::println)                              .collect(Collectors.toList());System.out.println(debugged);

Large Data Sets

In scenarios involving large datasets, directly printing every element in the stream to the console for debugging can quickly become impractical. It can clutter the console and make it hard to spot the relevant information. Instead, we can use peek() in a more sophisticated way to selectively collect and analyze data without causing side effects that could alter the behavior of the stream.

Consider a scenario where we're processing a large dataset of transactions, and we want to debug issues related to transactions exceeding a certain threshold:

class Transaction {    private String id;    private double amount;    // Constructor, getters, and setters omitted for brevity}List transactions = // Imagine a large list of transactions// A placeholder for debugging informationList highValueTransactions = new ArrayList<>();List processedTransactions = transactions.stream()    // Filter transactions above a threshold    .filter(t -> t.getAmount() > 5000)     .peek(t -> {        if (t.getAmount() > 10000) {            // Collect only high-value transactions for debugging            highValueTransactions.add(t);        }     })     .collect(Collectors.toList());// Now, we can analyze high-value transactions separately, without overloading the consoleSystem.out.println("High-value transactions count: " +        highValueTransactions.size());

In this approach, peek() is used to inspect elements within the stream conditionally. High-value transactions that meet a specific criterion (e.g., amount > 10,000) are collected into a separate list for further analysis. This technique allows for targeted debugging without printing every element to the console, thereby avoiding performance degradation and clutter.

Addressing Side Effects

While streams shouldn't have side effects. In fact, such side effects would break the stream debugger in IntelliJ which I discussed in the past. It's crucial to note that while collecting data for debugging within peek() avoids cluttering the console, it does introduce a side effect to the stream operation, which goes against the recommended use of streams. Streams are designed to be side-effect-free to ensure predictability and reliability, especially in parallel operations.

Therefore, while the above example demonstrates a practical use of peek() for debugging, it's important to use such techniques judiciously. Ideally, this debugging strategy should be temporary and removed once the debugging session is completed to maintain the integrity of the stream's functional paradigm.

Limitations and Pitfalls

While peek() is undeniably a useful tool for debugging Java streams, it comes with its own set of limitations and pitfalls that developers should be aware of. Understanding these can help avoid common traps and ensure that peek() is used effectively and appropriately.

Potential for Misuse in Production Code

One of the primary risks associated with peek() is its potential for misuse in production code. Because peek() is intended for debugging purposes, using it to alter state or perform operations that affect the outcome of the stream can lead to unpredictable behavior. This is especially true in parallel stream operations, where the order of element processing is not guaranteed. Misusing peek() in such contexts can introduce hard-to-find bugs and undermine the declarative nature of stream processing.

Performance Overhead

Another consideration is the performance impact of using peek(). While it might seem innocuous, peek() can introduce a significant overhead, particularly in large or complex streams. This is because every action within peek() is executed for each element in the stream, potentially slowing down the entire pipeline. When used excessively or with complex operations, peek() can degrade performance, making it crucial to use this method judiciously and remove any peek() calls from production code after debugging is complete.

Side Effects and Functional Purity

As highlighted in the enhanced debugging example, peek() can be used to collect data for debugging purposes, but this introduces side effects to what should ideally be a side-effect-free operation. The functional programming paradigm, which streams are a part of, emphasizes purity and immutability. Operations should not alter state outside their scope. By using peek() to modify external state (even for debugging), you're temporarily stepping away from these principles. While this can be acceptable for short-term debugging, it's important to ensure that such uses of peek() do not find their way into production code, as they can compromise the predictability and reliability of your application.

The Right Tool for the Job

Finally, it's essential to recognize that peek() is not always the right tool for every debugging scenario. In some cases, other techniques such as logging within the operations themselves, using breakpoints and inspecting variables in an IDE, or writing unit tests to assert the behavior of stream operations might be more appropriate and effective. Developers should consider peek() as one tool in a broader debugging toolkit, employing it when it makes sense and opting for other strategies when they offer a clearer or more efficient path to identifying and resolving issues.

Navigating the Pitfalls

To navigate these pitfalls effectively:

Reserve peek() strictly for temporary debugging purposes. If you have a linter as part of your CI tools it might make sense to add a rule that block code from invoking peek().
Always remove peek() calls from your code before committing it to your codebase, especially for production deployments.
Be mindful of performance implications and the potential introduction of side effects.
Consider alternative debugging techniques that might be more suited to your specific needs or the particular issue you're investigating.

By understanding and respecting these limitations and pitfalls, developers can leverage peek() to enhance their debugging practices without falling into common traps or inadvertently introducing problems into their codebases.

Final Thoughts

The peek() method offers a simple yet effective way to gain insights into Java stream operations, making it a valuable tool for debugging complex stream pipelines. By understanding how to use peek() effectively, developers can avoid common pitfalls and ensure their stream operations perform as intended. As with any powerful tool, the key is to use it wisely and in moderation.

The true value of peek() is in debugging massive data sets, these elements are very hard to analyze even with dedicated tools. By using peek() we can dig into said data set and understand the source of the issue programmatically.

Debugging Using JMX Revisited

Shai Almog — Tue, 27 Feb 2024 08:45:39 GMT

Debugging effectively requires a nuanced approach, similar to using tongs that tightly grip the problem from both sides. While low-level tools have their place in system-level service debugging, today's focus shifts towards a more sophisticated segment of the development stack: advanced management tools. Understanding these tools is crucial for developers, as it bridges the gap between code creation and operational deployment, enhancing both efficiency and effectiveness in managing applications across extensive infrastructures.

https://youtu.be/rQjHAMM3XfY

The Need for Advanced Management Tools in Development

Development and DevOps teams utilize an array of tools, often perceived as complex or alien by developers. These tools, designed for scalability, enable the management of thousands of servers simultaneously. Such capabilities, although not always necessary for smaller scales, offer significant advantages in application management. Advanced management tools facilitate the navigation and control over multiple machines, making them indispensable for developers seeking to optimize application performance and reliability.

Introduction to JMX (Java Management Extensions)

One of the pivotal standards in application management is Java Management Extensions (JMX), which Java introduced to simplify the interaction with and management of applications. JMX allows both applications and the Java Development Kit (JDK) itself to expose critical information and functionalities, enabling external tools to manipulate these elements dynamically. Although activating JMX falls outside this discussion, its significance cannot be overstated, with ample resources available for those interested in its implementation.

Setting up JMX

JMX isn't enabled by default, to enable it we need the following steps:

Modify the JVM Startup Parameters: To enable JMX on a Java application, you must adjust the Java Virtual Machine (JVM) startup parameters. This involves adding specific flags to your application's startup command. The essential flags for enabling JMX are:
- -Dcom.sun.management.jmxremote: This flag activates the JMX remote management and monitoring.
- -Dcom.sun.management.jmxremote.port=: Replace with a specific port number where the JMX remote connection will listen.
- -Dcom.sun.management.jmxremote.ssl=false: This flag disables SSL for JMX connections. For development environments, SSL might be disabled for simplicity, but for production environments, consider enabling SSL for security.
- -Dcom.sun.management.jmxremote.authenticate=false: This flag disables authentication. Similar to SSL, authentication may be disabled in development but should be enabled in production to ensure secure access.
Restart Your Application: With the JVM parameters set, restart your application. This will apply the new startup parameters, activating JMX.
Verify JMX Connectivity: After restarting your application, you can verify that JMX is enabled by connecting to it using a JMX client such as JConsole, VisualVM, or a custom management application. Use the port number specified in the startup parameters to establish the connection.

JMX Security Considerations

While enabling JMX provides powerful management capabilities, it's crucial to consider security implications, especially when JMX is exposed over a network. When deploying applications in production, always enable SSL and authentication to protect against unauthorized access. Additionally, consider firewall rules and network policies to restrict JMX access to trusted clients.

Understanding MBeans

Central to JMX are Management Beans (MBeans), which serve as the control points within an application. These beans enable developers to publish specific functionalities for runtime monitoring and configuration. The ability to export application metrics to dashboards through MBeans is particularly valuable, facilitating real-time decision-making based on accurate, up-to-date information. Furthermore, operations such as user management can be exposed through MBeans, enhancing administrative capabilities.

Spring and Management Beans

Spring Framework's Actuator module exemplifies the integration of management capabilities within development, offering extensive metrics and operational details. This integration propels applications to "production-ready" status, allowing developers to monitor and manage applications with unprecedented depth and efficiency.

Tooling for JMX Management

While JMX can be accessed through various web interfaces and administrative tools, command-line tooling offers a direct, efficient method for interacting with JMX-enabled applications on production servers. Tools like JMXTerm complement visual tools by providing a streamlined interface for rapid insights, especially in environments unfamiliar to the developer.

Getting Started with JMXTerm

JMXTerm is a powerful utility for managing JMX without the need for graphical visualization, ideal for quick diagnostics or high-level server insights. After enabling JMX on the JVM and setting up the necessary configurations, developers can connect to servers, explore different JMX domains, and manipulate MBeans directly from the command line.

We can accomplish all of the following via visual tools and sometimes using a web interface. Normally, that's the approach I use. However, as a learning tool I think JMXTerm is fantastic since it exposes things in a way that's consistent and verbose. If we can understand JMXTerm the GUI version would be a walk in the park...

We can launch JMXTerm using the command line, in my case I used the following command:

java -jar ~/Downloads/jmxterm-1.0.2-uber.jar --url localhost:30002

Once the connection is made we can issue commands to JMX and retrieve information about the JVM or the application e.g. I can list the domains which you can think of as similar to "packages" or "modules" a way to organize the various beans:

$>domains#following domains are availableJMImplementationcom.sun.managementjava.langjava.niojava.util.loggingjavax.cachejdk.management.jfr

I can select a specific domain and thus perform future operations within said domain:

$>domain java.util.logging#domain is set to java.util.logging

Once inside the domain I can select a specific bean and perform operations on it. For this I need to first list the beans in the domain, in this case there's only the logging bean. I can then select that bean using the bean command:

$>beans#domain = java.util.logging:java.util.logging:type=Logging$>bean java.util.logging:type=Logging#bean is set to java.util.logging:type=Logging

I can perform many operations on beans, perhaps the most useful is the info command which lets me query a bean. Notice that a bean can have attributes, think of them like object fields and operations which you can think of as methods. There are also notifications which you can think of as events:

$>info#mbean = java.util.logging:type=Logging#class name = sun.management.ManagementFactoryHelper$PlatformLoggingImpl# attributes  %0   - LoggerNames ([Ljava.lang.String;, r)  %1   - ObjectName (javax.management.ObjectName, r)# operations  %0   - java.lang.String getLoggerLevel(java.lang.String p0)  %1   - java.lang.String getParentLoggerName(java.lang.String p0)  %2   - void setLoggerLevel(java.lang.String p0,java.lang.String p1)#there's no notifications

I can run operations and pass various commands e.g. I can get the logger level, set it and then check that the logger level was indeed updated:

$>run getLoggerLevel "org.apache.tomcat.websocket.WsWebSocketContainer"#calling operation getLoggerLevel of mbean java.util.logging:type=Logging with params [org.apache.tomcat.websocket.WsWebSocketContainer]#operation returns:$>run setLoggerLevel org.apache.tomcat.websocket.WsWebSocketContainer INFO#calling operation setLoggerLevel of mbean java.util.logging:type=Logging with params [org.apache.tomcat.websocket.WsWebSocketContainer, INFO]#operation returns: null$>run getLoggerLevel "org.apache.tomcat.websocket.WsWebSocketContainer"#calling operation getLoggerLevel of mbean java.util.logging:type=Logging with params [org.apache.tomcat.websocket.WsWebSocketContainer]#operation returns: INFO

This is just the tip of the iceberg. We can get many things such as spring settings, internal VM information, etc. In this example I can query VM information directly from the console:

$>domain com.sun.management#domain is set to com.sun.management$>beans#domain = com.sun.management:com.sun.management:type=DiagnosticCommandcom.sun.management:type=HotSpotDiagnostic$>bean com.sun.management:type=HotSpotDiagnostic#bean is set to com.sun.management:type=HotSpotDiagnostic$>info#mbean = com.sun.management:type=HotSpotDiagnostic#class name = com.sun.management.internal.HotSpotDiagnostic# attributes  %0   - DiagnosticOptions ([Ljavax.management.openmbean.CompositeData;, r)  %1   - ObjectName (javax.management.ObjectName, r)# operations  %0   - void dumpHeap(java.lang.String p0,boolean p1)  %1   - javax.management.openmbean.CompositeData getVMOption(java.lang.String p0)  %2   - void setVMOption(java.lang.String p0,java.lang.String p1)#there's no notifications

Leveraging JMX in Debugging and Management

JMX stands out as a robust tool for wiring management consoles, allowing developers to expose critical settings and metrics for their projects. Beyond its conventional use, JMX can be leveraged as part of the debugging process, serving as a pseudo-interface for triggering debugging scenarios or observing debugging sessions within the management UI. This approach not only simplifies the management of server applications but also enhances the developer's ability to diagnose and resolve issues efficiently.

Exposing MBeans in Spring Boot

Up until now we discussed the process of working with beans that are a part of the JVM or Spring. But what about our own application logic?

We can expose our own applications internal state so we (and our SREs) can review these in production and staging. Instead of building a custom control panel or logging everything, we can just expose the data. If a flag is problematic we can change it in production, if you want to query a specific state it too can be exposed.

Spring Boot simplifies the management and monitoring of applications through its comprehensive support for JMX. By leveraging Spring's infrastructure, we can easily expose their application's beans as JMX Managed Beans (MBeans), making them accessible for monitoring and management via JMX clients.

Understanding Spring Boot JMX Support

Spring Boot automatically configures JMX for you and exposes any beans annotated with @ManagedResource as JMX MBeans. This feature, combined with Spring Boots Actuator, provides a rich set of management endpoints, covering various aspects of the application, from metrics to thread dumps.

Expose an MBean in Spring Boot

To expose a bean we need to take the following steps:

Define a Management Interface: Create an interface that defines the operations and attributes you wish to expose via JMX. This interface should be annotated with JMX annotations such as @ManagedOperation for methods and @ManagedAttribute for fields or getter/setter methods.
Implement the MBean: Implement the interface in a class that performs the actual logic for the operations and attributes defined. This class represents your MBean and can be a regular Spring-managed bean.
Annotate the Bean with@ManagedResource: Annotate your MBean implementation class with @ManagedResource to indicate that it should be exposed as an MBean. You can specify the object name for the MBean in this annotation, which is how it will be identified in JMX clients.
Enable JMX in Spring Boot: Ensure that JMX is enabled in your Spring Boot application. This is usually the default behavior, but you can explicitly enable it by setting spring.jmx.enabled=true in your application.properties or application.yml file.
Access the MBean via a JMX Client: Once your application is running, you can access the exposed MBean through any standard JMX client, such as JConsole, VisualVM, or a custom client. Connect to the Spring Boot application's JMX domain, and you'll find the MBean you exposed, ready for interaction.

Example: Exposing a Simple Configuration MBean

// Define a management interfacepublic interface ConfigurationMBean {    @ManagedAttribute    String getApplicationName();    @ManagedOperation    void updateApplicationName(String name);}// Implement the MBean@Component@ManagedResource(objectName = "com.example:type=Configuration")public class Configuration implements ConfigurationMBean {    private String applicationName = "MyApp";    @Override    public String getApplicationName() {        return applicationName;    }    @Override    public void updateApplicationName(String name) {        this.applicationName = name;    }}

In this example, the Configuration class is annotated with @ManagedResource, making it available as an MBean with operations and attributes accessible via JMX clients.

Exposing MBeans in Spring Boot is a powerful feature that enhances the management and monitoring capabilities of applications. By following the steps outlined above, developers can provide external tools with dynamic access to application internals, offering a window into the runtime behavior and allowing for adjustments on the fly. This not only aids in debugging and performance tuning but also aligns with best practices for building manageable, robust applications.

Final Word

Advanced management tools, particularly JMX and its integration with frameworks like Spring, offer developers powerful capabilities for application monitoring, configuration, and debugging. By understanding and utilizing these tools, developers can achieve a deeper level of control over their applications, enhancing both performance and reliability. Whether through graphical interfaces or command-line utilities like JMXTerm, the dynamic manipulation and monitoring of applications in runtime environments open new avenues for effective software development and management. As the bridge between development and operations continues to narrow, mastering these advanced tools becomes essential for any developer looking to excel in today's fast-paced technological landscape.

Unleashing the Power of Git Bisect

Shai Almog — Tue, 13 Feb 2024 12:31:14 GMT

We don't usually think of Git as a debugging tool. Surprisingly, Git shines not just as a version control system but also as a potent debugging ally when dealing with the tricky matter of regressions.

https://youtu.be/yZuPHEBbjYI

The Essence of Debugging with Git

Before we tap into the advanced aspects of git bisect, it's essential to understand its foundational premise. Git is known for tracking changes and managing code history, but the git bisect tool is a hidden gem for regression detection. Regressions are distinct from generic bugs, they signify a backward step in functionalitywhere something that once worked flawlessly now fails. Pinpointing the exact change causing a regression can be akin to finding a needle in a haystack, particularly in extensive codebases with long commit histories.

Traditionally, developers would employ a manual, binary search strategychecking out different versions, testing them, and narrowing down the search scope. This method, while effective, is painstakingly slow and error-prone. Git bisect automates this search, transforming what used to be a marathon into a swift sprint.

Setting the Stage for Debugging

Imagine you're working on a project, and recent reports indicate a newly introduced bug affecting the functionality of a feature that previously worked flawlessly. You suspect a regression but are unsure which commit introduced the issue among the hundreds made since the last stable version.

Initiating Bisect Mode

To start, you'll enter bisect mode in your terminal within the project's Git repository:

git bisect start

This command signals Git to prepare for the bisect process.

Marking the Known Good Revision

Next, you identify a commit where the feature functioned correctly, often a commit tagged with a release number or dated before the issue was reported. Mark this commit as "good":

git bisect good a1b2c3d

Here, a1b2c3d represents the hash of the known good commit.

Marking the Known Bad Revision

Similarly, you mark the current version or a specific commit where the bug is present as "bad":

git bisect bad z9y8x7w

z9y8x7w is the hash of the bad commit, typically the latest commit in the repository where the issue is observed.

Bisecting to Find the Culprit

Upon marking the good and bad commits, Git automatically jumps to a commit roughly in the middle of the two and waits for you to test this revision. After testing (manually or with a script), you inform Git of the result:

If the issue is present: git bisect bad
If the issue is not present: git bisect good

Git then continues to narrow down the range, selecting a new commit to test based on your feedback.

Expected Output

After several iterations, Git will isolate the problematic commit, displaying a message similar to:

Bisecting: 0 revisions left to test after this (roughly 3 steps)[abcdef1234567890] Commit message of the problematic commit

Reset and Analysis

Once the offending commit is identified, you conclude the bisect session to return your repository to its initial state:

git bisect reset

Notice that bisect isn't linear. Bisect doesn't scan through the revisions in a sequential manner. Based on the good and bad markers, Git automatically selects a commit approximately in the middle of the range for testing (e.g., commit #6 in the following diagram). This is where the non-linear, binary search pattern starts, as Git divides the search space in half instead of examining each commit sequentially. This means fewer revisions get scanned and the process is faster.

Advanced Usage and Tips

The magic of git bisect lies in its ability to automate the binary search algorithm within your repository, systematically halving the search space until the rogue commit is identified.

Git bisect offers a powerful avenue for debugging, especially for identifying regressions in a complex codebase. To elevate your use of this tool, consider delving into more advanced techniques and strategies. These tips not only enhance your debugging efficiency but also provide practical solutions to common challenges encountered during the bisecting process.

Script Automation for Precision and Efficiency

Automating the bisect process with a script is a game-changer, significantly reducing manual effort and minimizing the risk of human error. This script should ideally perform a quick test that directly targets the regression, returning an exit code based on the test's outcome.

Example: Imagine you're debugging a regression where a web application's login feature breaks. You could write a script that attempts to log in using a test account and checks if the login succeeds. The script might look something like this in a simplified form:

#!/bin/bash# Attempt to log in and check for successif curl -s http://yourapplication/login -d "username=test&password=test" | grep -q "Welcome"; then  exit 0 # Login succeeded, mark this commit as goodelse  exit 1 # Login failed, mark this commit as badfi

By passing this script to git bisect run, Git automatically executes it at each step of the bisect process, effectively automating the regression hunt.

Handling Flaky Tests with Strategy

Flaky tests, which sometimes pass and sometimes fail under the same conditions, can complicate the bisecting process. To mitigate this, your automation script can include logic to rerun tests a certain number of times or to apply more sophisticated checks to differentiate between a true regression and a flaky failure.

Example: Suppose you have a test that's known to be flaky. You could adjust your script to run the test multiple times, considering the commit "bad" only if the test fails consistently:

#!/bin/bash# Run the flaky test three timessuccess_count=0for i in {1..3}; do  if ./run_flaky_test.sh; then    ((success_count++))  fidone# If the test succeeds twice or more, consider it a passif [ "$success_count" -ge 2 ]; then  exit 0else  exit 1fi

This approach reduces the chances that a flaky test will lead to incorrect bisect results.

Skipping Commits with Care

Sometimes, you'll encounter commits that cannot be tested due to reasons like broken builds or incomplete features. git bisect skip is invaluable here, allowing you to bypass these commits. However, use this command judiciously to ensure it doesn't obscure the true source of the regression.

Example: If you know that commits related to database migrations temporarily break the application, you can skip testing those commits. During the bisect session, when Git lands on a commit you wish to skip, you would manually issue:

git bisect skip

This tells Git to exclude the current commit from the search and adjust its calculations accordingly. It's essential to only skip commits when absolutely necessary, as skipping too many can interfere with the accuracy of the bisect process.

These advanced strategies enhance the utility of git bisect in your debugging toolkit. By automating the regression testing process, handling flaky tests intelligently, and knowing when to skip untestable commits, you can make the most out of git bisect for efficient and accurate debugging. Remember, the goal is not just to find the commit where the regression was introduced but to do so in the most time-efficient manner possible. With these tips and examples, you're well-equipped to tackle even the most elusive regressions in your projects.

Unraveling a Regression Mystery

In the past we got to use git bisect, when working on a large-scale web application. After a routine update, users began reporting a critical feature failure: the application's payment gateway stopped processing transactions correctly, leading to a significant business impact.

We knew the feature worked in the last release but had no idea which of the hundreds of recent commits introduced the bug. Manually testing each commit was out of the question due to time constraints and the complexity of the setup required for each test.

Enter git bisect. The team started by identifying a "good" commit where the payment gateway functioned correctly and a "bad" commit where the issue was observed. We then crafted a simple test script that would simulate a transaction and check if it succeeded.

By running git bisect start, followed by marking the known good and bad commits, and executing the script with git bisect run, we set off on an automated process that identified the faulty commit. Git efficiently navigated through the commits, automatically running the test script on each step. In a matter of minutes, git bisect pinpointed the culprit: a seemingly innocuous change to the transaction logging mechanism that inadvertently broke the payment processing logic.

Armed with this knowledge we reverted the problematic change, restoring the payment gateway's functionality and averting further business disruption. This experience not only resolved the immediate issue but also transformed our approach to debugging, making git bisect a go-to tool in our arsenal.

Final Word

The story of the payment gateway regression is just one example of how git bisect can be a lifesaver in the complex world of software development. By automating the tedious process of regression hunting, git bisect not only saves precious time but also brings a high degree of precision to the debugging process.

As developers continue to navigate the challenges of maintaining and improving complex codebases, tools like git bisect underscore the importance of leveraging technology to work smarter, not harder. Whether you're dealing with a mysterious regression or simply want to refine your debugging strategies, git bisect offers a powerful, yet underappreciated, solution to swiftly and accurately identify the source of regressions. Remember, the next time you're faced with a regression, git bisect might just be the debugging partner you need to uncover the truth hidden within your commit history.

The Best Way to Diagnose a Patient is to Cut Him Open

Shai Almog — Tue, 06 Feb 2024 14:19:01 GMT

"The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." -- Brian Kernighan.

Cutting a patient open and using print for debugging used to be the best ways to diagnose problems. If you still advocate either one of those as the superior approach to troubleshooting then you're either facing a very niche problem or need to update your knowledge. This is a frequent occurrence, e.g this recent tweet:

This specific tweet got to the HN front page and people chimed in with that usual repetitive nonsense. No, its not the best way for the vast majority of developers. It should be discouraged just as surgery should be avoided when possible.

Fixating on print debugging is a form of a mental block, debugging isnt just stepping over code. It requires a completely new way of thinking about issue resolution. A way that is far superior to merely printing a few lines.

Before I continue, my bias is obvious. I wrote a book about debugging and I blog about it a lot. This is a pet peeve of mine.

I want to start with the exception to the rule though, when do we need to print something...

Logging is NOT Print Debugging!

One of the most important debugging tools in our arsenal is a logger, but it is not the same as print debugging in any way:

	Logger	Print
Permanence of output	Permanent	Ephemeral
Permanence in code	Permanent	Should be removed
Globally Toggleable	Yes	No
Intention	Added as part of design	Added ad-hoc

A log is something we add with forethought, we want to keep the log for future bugs and might even want to expose it to the users. We can control its verbosity often at the module level and can usually disable it entirely. Its permanent in code and usually writes to a permanent file we can review at our leisure.

Print debugging is code we add to locate a temporary problem. If such a problem has the potential of recurring then a log would typically make more sense in the long run. This is true for almost every type of system, we see developers adding print statements and removing them constantly instead of creating a simple log to track frequent problems.

There are special cases where print debuggings makes some sense, in mission critical embedded systems a log might be impractical in terms of device constraints. Debuggers are awful in those environments and print debugging is a simple hack. Debugging system level tools like a kernel, compiler, debugger,or JIT can be difficult with a debugger. Logging might not make sense in all of these cases e.g. I dont want my JIT to print every bytecode its processing and the metadata involved.

Those are the exceptions, not the rules. Very few of us write such tools. I do, and even then its a fraction of my work. E.g. When working at Lightrun I was working on a production debugger. Debugging the agent code thats connected to the executable was one of the hardest things to do. A mix of C++ and JVM code thats connected to a completely separate binary... Print debugging of that portion was simpler, and even then we tried to aim towards logging. But the visual aspects of the debugger within the server backend and the IDE were perfect targets for the debugger.

Why Debug?

There are three reasons to use a debugger instead of printouts or even logs:

Features - modern debuggers can provide spectacular capabilities that are unfamiliar to many developers. Sadly, there are very few debugging courses in academia since its a subject thats hard to test.
Low overhead - in the past running with the debugger meant slow execution and a lot of overhead. This is no longer true. Many of us use the debug action when launching an application instead of running and theres no noticeable overhead for most applications. When there is overhead, some debuggers provide means to improve performance by disabling some features.
Library code - a debugger can step into a library or framework and track the bug there. Doing this with print debugging will require compiling code that you might not want to deal with.

I dug into the features I mentioned in my book and series on debugging (linked above) but lets pick a few fantastic capabilities of the debugger that I wrote about in the past.

For the sake of positive dialog, here are some of my top features of modern debuggers.

Tracepoints

Whenever someone opens the print debugging discussion all I hear is I dont know about tracepoints. They arent a new feature in debuggers, yet so few are aware of them. A tracepoint is a breakpoint that doesnt stop, it just keeps running. Instead of stopping you can do other things at that point, such as print to the console. This is similar to print debugging only it doesnt suffer from many of the drawbacks: no runtime overhead, no accidental commit to the code base, no need to restart the application when changing it etc.

https://www.youtube.com/watch?v=eXRqKqSp7x0

Grouping and Naming

The previous video/post included a discussion of grouping and naming. This lets us group tracepoints together, disable them as a group etc. This might seem like a minor feature until you start thinking about the process of print debugging. We slowly go through the code adding a print and restarting. Then suddenly we need to go back, or if a call comes in and we need to debug something else...

When we package the tracepoints and breakpoints into a group we can set aside a debugging session much like we set aside a branch in version control. It makes it much easier to preserve our train of thought and jump right back to the applicable lines of code.

Object Marking

When asked about my favorite debugging feature Im always conflicted, Object Marking is one of my top two features... It seems like a simple thing, we can mark an object and it gets saved with a specific name.

However, this is a powerful and important feature. I used to write down the pointers to objects or memory areas while debugging. This is valuable as sometimes an area of memory would look the same but would have a different address or it might be hard to track objects with everything going on. Object Marking allows us to save a global reference to an object, use it in conditional breakpoints or for visual comparison.

https://www.youtube.com/watch?v=DGjVVKCNosM

Renderers

My other favorite feature is the renderer, it lets us define how elements look in the debugger watch area. Imagine you have a sophisticated object hierarchy but rarely need that information... A renderer lets you customize the way IntelliJ/IDEA presents the object to you.

https://www.youtube.com/watch?v=oaUf8KXHsd0&t

Tracking New Instances

One of the often overlooked capabilities of the debugger is memory tracking. A Java debugger can show you a searchable set of all object instances in heap, that is a fantastic capability that can expose unintuitive behavior But it can go further, it can track new allocations of an object and provide you with the stack to the applicable object allocation.

https://www.youtube.com/watch?v=dFOFOEg2W4k

Tip of the Iceberg

I wrote a lot about debugging, theres no point in repeating all of that in this post. If youre a person who feels more comfortable using print debugging then ask yourself this: why?

Dont hide behind an out of date Brian Kernighan quote. Things change. Are you working in one of the edge cases where print debugging is the only option?

Are you treating logging as print debugging or vice versa?

Or is it just that print debugging was how your team always worked and it stuck in place. If its one of those then it might be time to re-evaluate the current state of debuggers.

Regenerate Immediately and RSS

Shai Almog — Sun, 04 Feb 2024 11:48:17 GMT

Note: this post was originally published on the gdocweb blog.

gdocweb has its first few users and one of the big complaints is about the tediousness of going through that wizard every time you just want to test a change to a document. I dont want gdocweb working in the background reading my documents and publishing, that would be an invasion of privacy. But I would like it to work instantly.

With that we now have a link to regenerate the site quickly. Not with a single click, you still need to go through a Google login for security. Once thats done you will reach the final stage of the wizard directly and the website will be updated. You can do that by visiting https://gdocweb.com/regenerateSite.

RSS (Really Simple Syndication) and Sitemap

Many of us go through life without knowing the RSS exists, its a workhorse that powers a great deal of functionality on the internet. Yet, we remain oblivious to it. RSS lets a website broadcast about the changes it went through, e.g. this blog now features an RSS feed that can notify you about every new post to the site.

Typically one would read an RSS feed using a dedicated application (e.g. Feedly), but browsers also have some basic support for RSS. The biggest benefit of RSS is in syndication, it means that other sites can publish a feed from this site detailing the latest bit of news. Its great for search engine optimization and a wonderful feature for users of your website. You can see the feed.xml RSS file here.

Sitemaps are even more important. They let search engines know about the pages you have in the website and the dates in which they were last updated. This helps search engines keep track of everything and makes your site easier to find. The sitemap for this site can be found here. Normally you wouldnt care about it, but search engines care about it...

Both files require a full URL to the generated pages in order to function. Unfortunately, this isnt trivial. The gdocweb blog can be reached both on: https://shai-almog.github.io/GdocwebBlog/ and on https://blog.gdocweb.com/. The latter is the correct link and the former correctly redirects to it, however if we have links to the former this will reduce the search engine ranking. We need to link to the correct blog URL but gdocweb cant guess it from the project name.

Thats why all of this good stuff will only work if you set the value of the Base URL entry in the GitHub repository selection stage of the wizard. Once that is set as shown in the following image, this will all work as expected.

As a bonus, setting this value will also set the canonical URL for each page. This is an important attribute of an HTML file that helps search engines find your website.

Target Directory and Automatic Merge

In the previous image we could see two additional features that are also quite important but mostly geared towards the technical crowd. The first is the target directory. By default, gdocweb generates everything into the GitHub Projects root directory. This is great for a site, however if youre building documentation for a pre-existing project then generating the project to a docs directory might be a better approach. The Target Directory option is a great tool for developers building documentation and websites for their projects.

gdocweb generates a pull request for the project and merges that pull request for you automatically. This default behavior might not be right for all projects and also includes a risk. You might want an additional review for changes. In that case you can uncheck the Merge Automatically flag and disable the default behavior. This means a run of the gdocweb wizard will result with a new pull request for you to merge manually. For me that has been a valuable debugging tool as I could experiment with changes to the blog without merging them in.

You can see the document that generated this post here.

strace Revisited: Simple is Beautiful

Shai Almog — Tue, 30 Jan 2024 12:46:38 GMT

In the realm of system debugging, particularly on Linux platforms, strace stands out as a powerful and indispensable tool. Its simplicity and efficacy make it the go-to solution for diagnosing and understanding system-level operations, especially when working with servers and containers. In this blog post, we'll delve into the nuances of strace, from its history and technical functioning to practical applications and advanced features. Whether you're a seasoned developer or just starting out, this exploration will enhance your diagnostic toolkit and provide deeper insights into the workings of Linux systems.

https://youtu.be/bgi7PJXtEzc

Understanding strace and its Origins

A Look Back: strace and dtrace

The journey of strace begins with its predecessor, dtrace, which we covered last time. However, dtrace's availability is limited, particularly on Linux systems where most server and container debugging takes place. This is where strace comes into the picture, offering a simpler yet effective alternative.

Originating from Sun Microsystems

strace, like dtrace, traces its roots back to Sun Microsystems, emerging in the 90s (a decade before dtrace). This isn't surprising given the impressive array of technologies that originated from Sun. However, strace differentiates itself by its straightforwardness in both usage and capabilities. Unlike DTrace, which demands deep operating system support and thus remained absent as an official feature in common Linux distributions, strace thrives in the Linux environment. Its simplicity and ease of implementation make it a popular choice for Linux users, offering a distinct approach to system diagnostics.

Technical Functioning of strace

The Role of ptrace in strace

The cornerstone of strace's functionality is the ptrace kernel feature. ptrace, pre-existing in Linux, spares users from the need to add additional kernel code or modules, a requirement often associated with DTrace. This fundamental difference not only simplifies the use of strace but also broadens its accessibility.

Comparing with DTrace

While DTrace offers a more in-depth analysis through deeper kernel support, strace operates on a more surface level. This simplicity, however, does not undermine its effectiveness. strace works essentially by logging every kernel call made by a process, providing verbose but incredibly detailed insights into the system's operation. This method allows users to trace the inner workings of a process, understanding each interaction with the kernel.

Practical Usage and Advantages

Ease of Use and Accessibility

One of the most appealing aspects of strace is its user-friendly nature. It doesn't require special privileges or complex setup procedures. This ease of use is particularly beneficial for developers and system administrators who need to quickly diagnose and address issues in a Linux environment. Unlike DTrace, strace is readily available and doesnt demand advanced configurations or permissions.

Favored in Linux Environments

strace's popularity in Linux circles is not only due to its accessibility but also its practicality. Being able to run without special privileges makes it a go-to tool for diagnosing various system-related issues. However, it's important to note that strace should be used cautiously in production environments. Its extensive logging can create a significant performance overhead, potentially impacting the efficiency of a live system. This is why strace is generally recommended for use in development or isolated testing environments rather than in production.

strace in Action: A Closer Look at System Calls

Basic Usage and Output Analysis

Using strace is straightforward: you simply pass the command line to it.

strace java -classpath . PrimeMain

This simplicity belies its power, as the output offers a wealth of information. Each line in the strace output corresponds to a system call made by the process as you can see below:

execve("/home/ec2-user/jdk1.8.0_45/bin/java", ["java", "-classpath.", "PrimeMain"], 0x7fffd689ec20 /* 23 vars */) = 0brk(NULL)                               = 0xb85000mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0294272000readlink("/proc/self/exe", "/home/ec2-user/jdk1.8.0_45/bin/j"..., 4096) = 35access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)open("/home/ec2-user/jdk1.8.0_45/bin/../lib/amd64/jli/tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)stat("/home/ec2-user/jdk1.8.0_45/bin/../lib/amd64/jli/tls/x86_64", 0x7fff37af09a0) = -1 ENOENT (No such file or directory)open("/home/ec2-user/jdk1.8.0_45/bin/../lib/amd64/jli/tls/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)stat("/home/ec2-user/jdk1.8.0_45/bin/../lib/amd64/jli/tls", 0x7fff37af09a0) = -1 ENOENT (No such file or directory)

By analyzing these calls, users can gain insights into the intricate operations of their applications. For instance, if a Java process attempts to load a library and fails, strace can reveal the underlying system call and its exit code, providing clues about potential issues like missing files or directories. E.g. in this line:

open("/home/ec2-user/jdk1.8.0_45/bin/../lib/amd64/jli/tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Java tries to load the pthread library from the tls directory using a system call open to load the file. The exit code of the system call is -1, which means that the file isn't there. Under normal circumstances, we should get back a file descriptor value from this API (positive non-zero integer). Looking in the directory, it seems the tls directory is missing. I'm guessing that this is because of a missing JCE (Java Cryptography Extensions) installation. This is probably OK but might have been interesting in some cases.

Interpreting System Calls for Debugging

The output of strace, while verbose, is a goldmine for troubleshooting. For example, a negative exit code in a system call indicates an error, such as a missing file, which could be crucial for diagnosing issues in an application. This level of detail, although overwhelming at times, is invaluable for understanding the interactions between your application and the Linux system.

Advanced Features and Tips

Filtering System Calls for Efficiency

A common challenge with strace is managing its voluminous output. Fortunately, strace offers options to filter system calls, significantly enhancing its usability. By using the -e argument, you can instruct strace to log only specific types of system calls, such as open or connect e.g.:

strace -e open java -classpath . PrimeMain

This selective logging not only makes the output more manageable but also allows for focused troubleshooting, speeding up the debugging process.

Exploring a Variety of System Calls

strace's utility extends beyond just tracking file access or network interactions. It can be used to monitor a range of system calls, offering insights into various aspects of application behavior. By understanding and utilizing different system calls, users can gain a comprehensive view of their application's interaction with the operating system, leading to more effective debugging and optimization.

strace and Java: A Special Case

strace with the JVM

While strace predates Java and operates at a low level with no specific awareness of the Java Virtual Machine (JVM), it remains highly effective for debugging Java applications. The JVM, like most platforms, relies on system calls for its operations, which strace can monitor and report. However, certain aspects of the JVM's behavior may be less visible to strace due to its unique approach to problem-solving.

Allocations and Threading in Java

For instance, Java's memory management differs significantly from standard system tools. While typical applications use malloc, which directly maps to kernel allocation logic, Java manages its own memory. This approach, aimed at efficiency and streamlined garbage collection, means that some memory allocation activities are obscured from strace's view.

Similarly, Java threading is currently well-represented in strace output, but this is changing with Java 21 and Project Loom. Java 21 added support for Virtual Threads which are only partially visible to the operating system hence 1,000 threads can seem like 16 threads. These changes could affect the clarity of strace outputs in complex, heavily threaded Java applications.

Final Word

strace stands out as an exceptionally versatile and powerful tool in the Linux debugging arsenal. Its ability to provide detailed insights into system calls makes it invaluable for diagnosing and understanding the inner workings of applications. Despite its simplicity, strace is capable of handling complex debugging scenarios, especially when used with its advanced filtering options.

For developers and system administrators working in Linux environments, strace is more than just a diagnostic tool; it's a lens through which the intricate interactions between applications and the operating system can be viewed and understood. As technologies evolve, tools like strace adapt, continuing to offer relevant and critical insights into system behaviors.

Whether you are troubleshooting a stubborn issue or simply curious about how your applications interact with the Linux kernel, strace is a tool that you will likely find yourself returning to time and again.

Styling and Dark Mode

Shai Almog — Fri, 26 Jan 2024 13:13:14 GMT

Note: this post was originally published on the gdocweb blog.

Styling a Document isnt as Trivial as you Might Assume

When using Google Docs its often tempting to reach out to the font or color toolbar to design your document as you see fit. This would work great for simple sites but as gdocweb evolves you might find it produces a result that isnt as attractive as you might want. The reason for that is due to two important features we just added to gdocweb: custom colors and dark mode.

Dark Mode Support

We recently updated gdocweb with a new theme: Adaptive.

The Adaptive theme is identical to the default Basic theme but when its running on a device set to dark mode it will render the website with a dark version of the theme. As you can see below, this blog automatically adapts to light or dark mode. However, the transition wasnt seamless.

Title Styling

The initial design of the blog was based on a template from Google Docs. In this template Introducing gdocweb was defined as a Heading 1 style, but it was later customized to appear a bit differently from the actual Heading 1 definition.

Headings are an important part of the web, in general your page should have only one Heading 1 and multiple Heading 2 elements (and possibly 3, 4, 5 etc.). If you break these rules you might suffer a search engine penalty. But it goes deeper than that, once you customize the style of the heading we cant override it. That results in dark blue text on black background. Not very readable for the most important line within a post...

The solution is simple and it includes two parts. The first is to use styles in the document by placing the cursor on the heading and invoking Update Heading 1 to match (or the appropriate style you want to customize). We then need to select Apply Heading 1 to re-apply it to the current line. Besides the obvious advantages with dark mode, this also helps keep the document consistent. E.g. if we want to change the font or color for all Heading 2 entries we can do that in one place and they will all update.

But the real advantage is in gdocweb. We can now customize specific colors within the theme. This is useful for dark mode but also useful for applying brand colors on top of the theme. We can pick replacement default colors for background and foreground of various elements and make the replacements apply only for light or dark mode. This is all a part of the themes section as seen in the image below.

Future Improvements

Ideally the themes should include the right colors out of the box but the ability to customize these colors is a crucial one. Let us know in the comments if you need deeper styling and what your technical level is. One of our planned features is a style override that will let you inject a custom style file to the page. However, this would require technical understanding and control of CSS, Im not sure if this is something that would fit with our main demographic.

You can see the document that generated this post here.

DTrace Revisited: Advanced Debugging Techniques

Shai Almog — Tue, 23 Jan 2024 15:12:51 GMT

When we think of debugging we think of breakpoints in IDEs, stepping over, inspecting variables, etc. However, there are instances where stepping outside the conventional confines of an IDE becomes essential to track and resolve complex issues. This is where tools like DTrace come into play, offering a more nuanced and powerful approach to debugging than traditional methods. This blog post delves into the intricacies of DTrace, an innovative tool that has reshaped the landscape of debugging and system analysis.

https://youtu.be/3M0AhZnVoUk

As a side note, if you like the content of this and the other posts in this series check out my Debugging book that covers this subject. If you have friends that are learning to code I'd appreciate a reference to my Java Basics book. If you want to get back to Java after a while check out my Java 8 to 21 book.

DTrace Overview

DTrace was first introduced by Sun Microsystems in 2004, DTrace quickly garnered attention for its groundbreaking approach to dynamic system tracing. Originally developed for Solaris, it has since been ported to various platforms, including MacOS, Windows, and Linux. DTrace stands out as a dynamic tracing framework that enables deep inspection of live systems from operating systems to running applications. Its capacity to provide real-time insights into system and application behavior without significant performance degradation marks it as a revolutionary tool in the domain of system diagnostics and debugging.

Understanding DTraces Capabilities

DTrace, short for Dynamic Tracing, is a comprehensive toolkit for real-time system monitoring and debugging, offering an array of capabilities that span across different levels of system operation. Its versatility lies in its ability to provide insights into both high-level system performance and detailed process-level activities.

System Monitoring and Analysis

At its core, DTrace excels in monitoring various system-level operations. It can trace system calls, file system activities, and network operations. This enables developers and system administrators to observe the interactions between the operating system and the applications running on it. For instance, DTrace can identify which files a process accesses, monitor network requests, and even trace system calls to provide a detailed view of what's happening within the system.

Process and Performance Analysis

Beyond system-level monitoring, DTrace is particularly adept at dissecting individual processes. It can provide detailed information about process execution, including CPU and memory usage, helping to pinpoint performance bottlenecks or memory leaks. This granular level of detail is invaluable for performance tuning and debugging complex software issues.

Customizability and Flexibility

One of the most powerful aspects of DTrace is its customizability. With a scripting language based on C syntax, DTrace allows the creation of customized scripts to probe specific aspects of system behavior. This flexibility means that it can be adapted to a wide range of debugging scenarios, making it a versatile tool in a developers arsenal.

Real-World Applications

In practical terms, DTrace can be used to diagnose elusive performance issues, track down resource leaks, or understand complex interactions between different system components. For example, it can be used to determine the cause of a slow file operation, analyze the reasons behind a process crash, or understand the system impact of a new software deployment.

Performance and Compatibility of DTrace

A standout feature of DTrace is its ability to operate with remarkable efficiency. Despite its deep system integration, DTrace is designed to have minimal impact on overall system performance. This efficiency makes it a feasible tool for use in live production environments, where maintaining system stability and performance is crucial. Its non-intrusive nature allows developers and system administrators to conduct thorough debugging and performance analysis without the worry of significantly slowing down or disrupting the normal operation of the system.

Cross-Platform Compatibility

Originally developed for Solaris, DTrace has evolved into a cross-platform tool, with adaptations available for MacOS, Windows, and various Linux distributions. Each platform presents its own set of features and limitations. For instance, while DTrace is a native component in Solaris and MacOS, its implementation in Linux often requires a specialized build due to kernel support and licensing considerations.

Compatibility Challenges on MacOS

On MacOS, DTrace's functionality intersects with System Integrity Protection (SIP), a security feature designed to prevent potentially harmful actions. To utilize DTrace effectively, users may need to disable SIP, which should be done with caution. This process involves booting into recovery mode and executing specific commands, a step that highlights the need for a careful approach when working with such powerful system-level tools.

We can disable SIP using the command:

csrutil disable

We can optionally use a more refined approach of enabling SIP without dtrace using the following command:

csrutil enable --without dtrace

Be extra careful when issuing these commands and when working on machines where dtrace is enabled. Back up your data properly!

Customizability and Flexibility of DTrace

A key feature that sets DTrace apart in the realm of system monitoring tools is its highly customizable nature. DTrace employs a scripting language that bears similarity to C syntax, offering users the ability to craft detailed and specific diagnostic scripts. This scripting capability allows for the creation of custom probes that can be fine-tuned to target particular aspects of system behavior, providing precise and relevant data.

Adaptability to Various Scenarios

The flexibility of DTrace's scripting language means it can adapt to a multitude of debugging scenarios. Whether it's tracking down memory leaks, analyzing CPU usage, or monitoring I/O operations, DTrace can be configured to provide insights tailored to the specific needs of the task. This adaptability makes it an invaluable tool for both developers and system administrators who require a dynamic approach to problem-solving.

Examples of Customizable Probes

Users can define probes to monitor specific system events, track the behavior of certain processes, or gather data on system resource usage. This level of customization ensures that DTrace can be an effective tool in a variety of contexts, from routine maintenance to complex troubleshooting tasks. Following in a simple hello world dtrace probe:

sudo dtrace -qn 'syscall::write:entry, syscall::sendto:entry /pid == $target/ { printf("(%d) %s %s", pid, probefunc, copyinstr(arg1)); }' -p 9999

The kernel is instrumented with hooks that match various callbacks. dtrace connects to these hooks and can perform interesting tasks when these hooks are triggered. They have a naming convention, specially: provider:module:function:name. In this case the provider is a system call in both cases. We have no module so we can leave that part blank between the colon (:) symbols. We grab a write operation and sendto entries. When an application will write or tries to send a packet, the following code event will trigger.

These things happen frequently which is why we restrict the process ID to the specific target with pid == $target. This means the code will only trigger for the PID passed to us in the command line. The rest of the code should be simple for anyone with basic C experience, it's a printf that would list the processes and the data passed.

Real-World Applications of DTrace

DTrace's diverse capabilities extend far beyond theoretical use, playing a pivotal role in resolving real-world system complexities. Its ability to provide deep insights into system operations makes it an indispensable tool in a variety of practical applications.

To get a sense of how dtrace can be used we can use the man -k dtrace command whose output on my mac is below:

bitesize.d(1m)           - analyse disk I/O size by process. Uses DTracecpuwalk.d(1m)            - Measure which CPUs a process runs on. Uses DTracecreatbyproc.d(1m)        - snoop creat()s by process name. Uses DTracedappprof(1m)             - profile user and lib function usage. Uses DTracedapptrace(1m)            - trace user and library function usage. Uses DTracedispqlen.d(1m)           - dispatcher queue length by CPU. Uses DTracedtrace(1)                - dynamic tracing compiler and tracing utilitydtruss(1m)               - process syscall details. Uses DTraceerrinfo(1m)              - print errno for syscall fails. Uses DTraceexecsnoop(1m)            - snoop new process execution. Uses DTracefddist(1m)               - file descriptor usage distributions. Uses DTracefilebyproc.d(1m)         - snoop opens by process name. Uses DTracehotspot.d(1m)            - print disk event by location. Uses DTraceiofile.d(1m)             - I/O wait time by file and process. Uses DTraceiofileb.d(1m)            - I/O bytes by file and process. Uses DTraceiopattern(1m)            - print disk I/O pattern. Uses DTraceiopending(1m)            - plot number of pending disk events. Uses DTraceiosnoop(1m)              - snoop I/O events as they occur. Uses DTraceiotop(1m)                - display top disk I/O events by process. Uses DTracekill.d(1m)               - snoop process signals as they occur. Uses DTracelastwords(1m)            - print syscalls before exit. Uses DTraceloads.d(1m)              - print load averages. Uses DTracenewproc.d(1m)            - snoop new processes. Uses DTraceopensnoop(1m)            - snoop file opens as they occur. Uses DTracepathopens.d(1m)          - full pathnames opened ok count. Uses DTraceperldtrace(1)            - Perl's support for DTracepidpersec.d(1m)          - print new PIDs per sec. Uses DTraceplockstat(1)             - front-end to DTrace to print statistics about POSIX mutexes and read/write lockspriclass.d(1m)           - priority distribution by scheduling class. Uses DTracepridist.d(1m)            - process priority distribution. Uses DTraceprocsystime(1m)          - analyse system call times. Uses DTracerwbypid.d(1m)            - read/write calls by PID. Uses DTracerwbytype.d(1m)           - read/write bytes by vnode type. Uses DTracerwsnoop(1m)              - snoop read/write events. Uses DTracesampleproc(1m)           - sample processes on the CPUs. Uses DTraceseeksize.d(1m)           - print disk event seek report. Uses DTracesetuids.d(1m)            - snoop setuid calls as they occur. Uses DTracesigdist.d(1m)            - signal distribution by process. Uses DTracesyscallbypid.d(1m)       - syscalls by process ID. Uses DTracesyscallbyproc.d(1m)      - syscalls by process name. Uses DTracesyscallbysysc.d(1m)      - syscalls by syscall. Uses DTracetopsyscall(1m)           - top syscalls by syscall name. Uses DTracetopsysproc(1m)           - top syscalls by process name. Uses DTraceTcl_CommandTraceInfo(3tcl), Tcl_TraceCommand(3tcl), Tcl_UntraceCommand(3tcl) - monitor renames and deletes of a commandbitesize.d(1m)           - analyse disk I/O size by process. Uses DTracecpuwalk.d(1m)            - Measure which CPUs a process runs on. Uses DTracecreatbyproc.d(1m)        - snoop creat()s by process name. Uses DTracedappprof(1m)             - profile user and lib function usage. Uses DTracedapptrace(1m)            - trace user and library function usage. Uses DTracedispqlen.d(1m)           - dispatcher queue length by CPU. Uses DTracedtrace(1)                - dynamic tracing compiler and tracing utilitydtruss(1m)               - process syscall details. Uses DTraceerrinfo(1m)              - print errno for syscall fails. Uses DTraceexecsnoop(1m)            - snoop new process execution. Uses DTracefddist(1m)               - file descriptor usage distributions. Uses DTracefilebyproc.d(1m)         - snoop opens by process name. Uses DTracehotspot.d(1m)            - print disk event by location. Uses DTraceiofile.d(1m)             - I/O wait time by file and process. Uses DTraceiofileb.d(1m)            - I/O bytes by file and process. Uses DTraceiopattern(1m)            - print disk I/O pattern. Uses DTraceiopending(1m)            - plot number of pending disk events. Uses DTraceiosnoop(1m)              - snoop I/O events as they occur. Uses DTraceiotop(1m)                - display top disk I/O events by process. Uses DTracekill.d(1m)               - snoop process signals as they occur. Uses DTracelastwords(1m)            - print syscalls before exit. Uses DTraceloads.d(1m)              - print load averages. Uses DTracenewproc.d(1m)            - snoop new processes. Uses DTraceopensnoop(1m)            - snoop file opens as they occur. Uses DTracepathopens.d(1m)          - full pathnames opened ok count. Uses DTraceperldtrace(1)            - Perl's support for DTracepidpersec.d(1m)          - print new PIDs per sec. Uses DTraceplockstat(1)             - front-end to DTrace to print statistics about POSIX mutexes and read/write lockspriclass.d(1m)           - priority distribution by scheduling class. Uses DTracepridist.d(1m)            - process priority distribution. Uses DTraceprocsystime(1m)          - analyse system call times. Uses DTracerwbypid.d(1m)            - read/write calls by PID. Uses DTracerwbytype.d(1m)           - read/write bytes by vnode type. Uses DTracerwsnoop(1m)              - snoop read/write events. Uses DTracesampleproc(1m)           - sample processes on the CPUs. Uses DTraceseeksize.d(1m)           - print disk event seek report. Uses DTracesetuids.d(1m)            - snoop setuid calls as they occur. Uses DTracesigdist.d(1m)            - signal distribution by process. Uses DTracesyscallbypid.d(1m)       - syscalls by process ID. Uses DTracesyscallbyproc.d(1m)      - syscalls by process name. Uses DTracesyscallbysysc.d(1m)      - syscalls by syscall. Uses DTracetopsyscall(1m)           - top syscalls by syscall name. Uses DTracetopsysproc(1m)           - top syscalls by process name. Uses DTrace

There's a lot here, we don't need to read everything. The point is that when you run into a problem you can just search through this list and find a tool dedicated to debugging that problem.

Lets say you're facing elevated disk write issues that are causing the performance of your application to degrade... But is it your app at fault or some other app?

rwbypid.d can help you with that, it can generate a list of processes and the number of calls they have for read/write based on the process id as seen in the following screenshot:

We can use this information to better understand IO issues in our code or even in 3rd party applications/libraries. iosnoop is another tool that helps us track IO operations but with more details:

In diagnosing elusive system issues, DTrace shines by enabling detailed observation of system calls, file operations, and network activities. For instance, it can be used to uncover the root cause of unexpected system behaviors or to trace the origin of security breaches, offering a level of detail that is often unattainable with other debugging tools.

Performance optimization is main area where DTrace demonstrates its strengths. It allows administrators and developers to pinpoint performance bottlenecks, whether they lie in application code, system calls, or hardware interactions. By providing real-time data on resource usage, DTrace helps in fine-tuning systems for optimal performance.

Final Words

In conclusion, DTrace stands as a powerful and versatile tool in the realm of system monitoring and debugging. We've explored its broad capabilities, from in-depth system analysis to individual process tracing, and its remarkable performance efficiency that allows for its use in live environments. Its cross-platform compatibility, coupled with the challenges and solutions specific to MacOS, highlights its widespread applicability. The customizability through scripting provides unmatched flexibility, adapting to a myriad of diagnostic needs. Real-world applications of DTrace in diagnosing system issues and optimizing performance underscore its practical value.

DTrace's comprehensive toolkit offers an unparalleled window into the inner workings of systems, making it an invaluable asset for system administrators and developers alike. Whether it's for routine troubleshooting or complex performance tuning, DTrace provides insights and solutions that are essential in the modern computing landscape.

Blog Comments

Shai Almog — Fri, 19 Jan 2024 15:15:52 GMT

Note: this post was originally published on the gdocweb blog.

Comments are Now Available in gdocweb

gdocweb draws its power from Google Docs. It also inherits some of the Google Docs limitations. One such limitation is the rigidness of the platform, a great example would be in blog comments. With most tools we can simply add a comment widget and end on a high note. However, Google Docs has no feature like that.

One suggested solution is to extend Google Docs, but I think it goes a bit against the grain of a product like gdocweb. Ideally, we just want to give a small hint: comments should go here. This should be simple. Ideally, it would be hidden. That way the document can be used both in print and for the web, but we found no way to do that. In that spirit, we chose to borrow an idea from markdown and use a special syntax to convey special meanings.

We do that by using a special syntax ::COMMENTS::. The quotes arent part of the syntax. In order to use this hint it must be the first entry in a paragraph, otherwise it will be ignored. You can see how this works by looking at the Google Doc we used to generate this post.

Setting up Comments

Typically, a comment requires server side code and a lot of messy management. Its a pain. These comments rely on a tool called utterances. Utterances uses GitHubs issue tracker which was designed to track bugs, as part of that it includes extensive comment and discussion capabilities. If an issue doesnt exist, utterances will automatically create that issue for you. It created this issue for the comments in this page...

As the administrator of the GitHub repository I can delete comments and block commenters. Comments can include rich text using markdown and are pretty powerful.

The server side functionality is handled by a GitHub application from Utterance. To get it working you first need to install the Utterance app and give it permissions. When visiting https://github.com/apps/utterances you will need to click install:

This will lead you to this page in which you will need to select the account for the app. Since the blog is hosted on my personal GitHub account I chose that.

Next we need to pick the repositories or alternatively allow utterance access to all repositories. Once this is done the comments should work on any page where they are included...

Future Improvements

This is the tip of the iceberg. We plan to include additional hints to allow features ranging from custom HTML to sophisticated stylings and customizations. Let us know what you think in the comments!

You can see the document that generated this post here.

Building gdocweb with Java 21, Spring Boot 3.x and Beyond

Shai Almog — Tue, 16 Jan 2024 11:18:16 GMT

Starting a new project is always a mix of excitement and tough decisions, especially when you're stitching together familiar tools like Google Docs with powerhouses like GitHub Pages. This is the story of building gdocweb, a tool that I hoped would make life easier for many. I'll be diving into why I chose Java 21 and Spring Boot 3.x, ditched GraalVM after some trial and error, and why a simple VPS with Docker Compose won out over more complex options. I also went with Postgres and JPA, but steered clear of migration tools like Flyway. It's a no-frills, honest recount of the choices, changes, and the occasional 'aha' moments of an engineer trying to make something useful and efficient.

Introducing gdocweb

Before we dive into the technical intricacies and the decision-making labyrinth of building gdocweb, let's set the stage by understanding what gdocweb is and the problem it solves. In simple terms, gdocweb connects Google Docs to GitHub Pages. It's a simple web builder that generates free sites with all the raw power of GitHub behind it and all the usability of Google Docs.

I decided to build gdocweb to eliminate the complexities typically associated with website building and documentation. It's for users who seek a hassle-free way to publish and maintain their content, but also for savvy users who enjoy the power of GitHub but don't want to deal with markdown nuances.

Here's a short video explaining gdocweb for the general public:

https://youtu.be/aaDBFVx6qC8

Java 21 and Spring Boot 3.x: Innovation and Maturity

When you're spearheading a project on your own, like I was with gdocweb, you have the liberty to make technology choices that might be more challenging in a team or corporate environment. This freedom led me to choose Java 21 and Spring Boot 3.x for this project. The decision to go with the current Long-Term Support (LTS) version of Java was a no-brainer. It's always tempting to use the latest and greatest, but with Java 21, it wasn't just about using something new; it was about leveraging a platform that has stood the test of time and has evolved to meet modern development needs. Virtual threads were a major part of the decision to go with Java 21. Cost is a huge factor in such projects, and squeezing the maximum throughput from a server is crucial in these situations.

Java, being a mature technology, offered a sense of reliability even in its latest iteration. Similarly, Spring Boot 3.x, despite being a newer version, comes from a lineage of robust and well-tested frameworks. It's a conservative choice in the sense of its long-standing reputation, but innovative in its features and capabilities.

However, this decision wasn't without its hiccups. During the process of integrating Google API access, I had to go through a security CASA tier 2 review. Here's where the choice of Java 21 threw a curveball. The review tool was tailored for JDK 11, and although it worked with JDK 21, it still added a bit of stress to the process. It was a reminder that when you're working with cutting-edge versions of technologies, there can be unexpected bumps along the road. Even if they are as mature as Java.

The transition to Spring Boot 3.x had its own set of challenges, particularly with the changes in security configurations. These modifications rendered most online samples and guides obsolete, breaking a lot of what I had initially set up. It was a learning curve, adjusting to these changes and figuring out the new way of doing things. However, most other aspects were relatively simple and the best compliment I can give to Spring Boot 3.x is that it's very similar to Spring Boot 2.x.

GraalVM Native Image for Efficiency

My interest in GraalVM native image for gdocweb was primarily driven by its promise of reduced memory usage and faster startup times. The idea was that with lower memory requirements, I could run more server instances, leading to better scalability and resilience. Faster startup times also meant quicker recovery from failures, a crucial aspect for maintaining a reliable service.

Implementing GraalVM

Getting GraalVM to work was nontrivial but not too hard. After some trial and error, I managed to set up a Continuous Integration (CI) process that built the GraalVM project and uploaded it to Docker. This was particularly necessary because I'm using an M2 Mac, while my server runs on Intel architecture. This setup meant I had to deal with an 18-minute wait time for each update a significant delay for any development cycle.

Facing the Production Challenges

Things started getting rocky when I started to test the project production and staging environments. It became a 'whack-a-mole' scenario with missing library code from the native image. Each issue fixed seemed to only lead to another, and the 18-minute cycle for each update added to the frustration.

The final straw was realizing the incompatibility issues with Google API libraries. Solving these issues would require extensive testing on a GraalVM build, which was already burdened by slow build times. For a small project like mine, this became a bottleneck too cumbersome to justify the benefits.

The Decision to Move On

While GraalVM seemed ideal on paper for saving resources, the reality was different. It consumed my limited GitHub Actions minutes and required extensive testing, which was impractical for a project of this scale. Ultimately, I decided to abandon the GraalVM route.

If you do choose to use GraalVM then this was the GitHub Actions script I used, I hope it can help you with your journey:

name: Java CI with Mavenon:  push:    branches: [ "master" ]  pull_request:    branches: [ "master" ]jobs:  build:    runs-on: ubuntu-latest    services:      postgres:        image: postgres:latest        env:          POSTGRES_PASSWORD: yourpassword        ports:          - 5432:5432        options: >-          --health-cmd pg_isready          --health-interval 10s          --health-timeout 5s          --health-retries 5    steps:    - uses: actions/checkout@v3    - uses: graalvm/setup-graalvm@v1      with:        java-version: '21'        version: '22.3.2'         distribution: 'graalvm'        cache: 'maven'        components: 'native-image'        native-image-job-reports: 'true'        github-token: ${{ secrets.GITHUB_TOKEN }}    - name: Wait for PostgreSQL      run: sleep 10    - name: Build with Maven      run: mvn -Pnative native:compile    - name: Build Docker Image      run: docker build -t autosite:latest .    - name: Log in to Docker Hub      uses: docker/login-action@v1      with:        username: ${{ secrets.DOCKERHUB_USERNAME }}        password: ${{ secrets.DOCKERHUB_TOKEN }}    - name: Push Docker Image      run: |        docker tag autosite:latest mydockeruser/autosite:latest        docker push mydockeruser/autosite:latest

This configuration was a crucial part of my attempt to leverage GraalVM's benefits, but as the project evolved, so did my understanding of the trade-offs between idealism in technology choice and practicality in deployment and maintenance.

Deployment: VPS and Docker Compose

When it came to deploying gdocweb, I had a few paths to consider. Each option came with its pros and cons, but after careful evaluation, I settled on using a Virtual Private Server (VPS) with Docker Compose. Heres a breakdown of my thought process and why this choice made the most sense for my needs.

Avoiding Raw VPS Deployment

I immediately ruled out the straightforward approach of installing the application directly on a VPS. This method fell short in terms of migration ease, testing, and flexibility. Containers offer a more streamlined and efficient approach. They provide a level of abstraction and consistency across different environments, which is invaluable.

Steering Clear of Managed Containers & Orchestration

Managed containers and orchestration (e.g k8s) were another option, and while they offer scalability and ease of management, they introduce complexity in other areas. For instance, when using a managed Kubernetes service it would often mean relying on cloud storage for databases, which can get expensive quickly. My philosophy was to focus on cost before horizontal scale, especially in the early stages of a project.

If we don't optimize and stabilize when we're small, the problems will only get worse as we grow. Scaling should ideally start with vertical scaling before moving to horizontal, vertical scaling means more CPU/RAM while horizontal adds additional machines. Vertical scaling is not only more cost-effective but also crucial from a technical standpoint. It makes it easier to identify performance bottlenecks using simple profiling tools.

In contrast, horizontal scaling can often mask these issues by adding more instances, which could lead to higher costs and hidden performance problems.

The Choice of Docker Compose

Docker Compose emerged as the clear winner for several reasons. It allowed me to seamlessly integrate the database and the application container. Their communication is contained within a closed network, adding an extra layer of security with no externally open ports. Moreover, the cost is fixed and predictable, with no surprises based on usage.

This setup offered me the flexibility and ease of containerization without the overhead and complexity of more extensive container orchestration systems. It was the perfect middle ground, providing the necessary features without overcomplicating the deployment process.

By using Docker Compose, I maintained control over the environment and kept the deployment process straightforward and manageable. This decision aligned perfectly with the overall ethos of gdocweb simplicity, efficiency, and practicality.

Front-End: Thymeleaf Over Modern Alternatives

The front-end development of gdocweb presented a bit of a challenge for me. In an era where React and similar frameworks are dominating the scene, opting for Thymeleaf might seem like a step back. However, this decision was based on practical considerations and a clear understanding of the project's requirements and my strengths as a developer.

React: Modern but Not a One-Size-Fits-All Solution

React is undeniably modern and powerful, but it comes with its own set of complexities. My experience with React is akin to many developers dabbling outside their comfort zone - functional but not exactly proficient. I've seen the kind of perplexed expressions from seasoned React developers when they look at my code, much like the ones I have when I'm reading complex Java code written by others.

Reacts learning curve, coupled with its slower performance in certain scenarios and the risk of not achieving an aesthetically pleasing result without deep expertise, made me reconsider its suitability for gdocweb.

The Appeal of Thymeleaf

Thymeleaf, on the other hand, offered a more straightforward approach, aligning well with the project's ethos of simplicity and efficiency. Its HTML-based interfaces, while perhaps seen as antiquated next to frameworks like React, come with substantial advantages:

Simplicity in Page Flow: Thymeleaf provides an easy-to-understand and easy-to-debug flow, making it a practical choice for a project like this.
Performance and Speed: Its known for its fast performance, which is a significant factor in providing a good user experience.
No Need for NPM: Thymeleaf eliminates the need for additional package management, reducing complexity and potential vulnerabilities.
Lower Risk of Client-Side Vulnerabilities: The server-side nature of Thymeleaf inherently reduces the risk of client-side issues.

Considering HTMX for Dynamic Functionality

The idea of incorporating HTMX for some dynamic behavior in the front-end did cross my mind. HTMX has been on my radar for a while, promising to add dynamic functionalities easily. However, I had to ask myself if it was truly necessary for a tool like gdocweb, which is essentially a straightforward wizard. My conclusion was that opting for HTMX might be more of Resume Driven Design (RDD) on my part, rather than a technical necessity.

In summary, the choice of Thymeleaf was a blend of practicality, familiarity, and efficiency. It allowed me to build a fast, simple, and effective front-end without the overhead and complexity of more modern frameworks, which, while powerful, weren't necessary for the scope of this project.

Final Word

The key takeaway in this post is the importance of practicality in technology choices. When we're building our own projects it's much easier to experiment with newer technologies, but this is a slippery slope. We need to keep our feet grounded in familiar territories while experimenting.

My experience with GraalVM highlights the importance of aligning technology choices with project needs and being flexible in adapting to challenges. Its a reminder that in technology, sometimes the simpler, tried-and-tested paths can be the most effective.

Introducing gdocweb

Shai Almog — Fri, 12 Jan 2024 10:29:41 GMT

This Post is a Google Docs Document

Note: This post was originally published here.

Yes. This post started off as a Google Docs document that you can see by clicking that link. Im assuming youre seeing it on the web which is where its supposed to be. It was translated to a web page seamlessly using gdocweb. Gdocweb converts Google Docs to websites or more importantly to GitHub Pages. GitHub provides free website hosting, including custom domains. That means you end up with a no-strings-attached website.

This is pretty convenient as we dont have to write HTML or maintain files on GitHub. We can collaborate using Google Docs and get a fully functional site.

How does it Work?

In order to build this blog website we need to first create a few Google Docs documents. You can look for inspiration in the templates section on gdocweb.com. Since this is a blog site I will create two documents. The first is this document that youre reading right now, its the content of the blog. The second document is the homepage of the blog which will include the latest entries and links to the full posts. To link I can create a link to a web page as usual, or to another Google Docs document. Important: link translation will only work for documents that are in the current set of exported documents. If you link to a document that isnt a part of the site it would remain a link to a Google Docs document. Furthermore, theres a special case of linking to the current page. Those links arent translated since its assumed that you might want to provide readers with the ability to edit/review the actual document.

This is also a good time to sign up to GitHub if you dont have an account yet. Its the site that lets us host the website for free.

Once we create some documents we need to login on gdocweb.com and most crucially, approve the permissions to access both documents and drive. This is essential to scan the files and is 100% read-only. No data from your documents is retained or saved into disk. See the image below to understand the permission dialog from Google. Also notice that I just dragged this image into Google Docs and its now seamlessly a part of this website and blog post

TIP: When placing images into a document I would suggest checking their file size and trying to keep it below 200kb. Otherwise the performance of your website might suffer.

Once we go through that step we are faced with a list of our documents. You can use your browser's search function to find the relevant entry. Once there we can select the documents we want to include in the site. Notice that your selections are saved and will be shown the next time around.

In this case we need the index page and the first blog post. In the future we will need to add any new blog post we write. You can drag and drop the documents to arrange their order. You can hide them from navigation or make them a submenu. The first entry will always serve as an index.html file, the file that will be shown on the root directory.

Once we made the selection and arranged everything we press next. At this point we are presented with the themes page where we can choose the theme and the menu style. For menu the options are:

Horizontal - a standard pull down menu bar.
Vertical Sidebar

In this case I picked the vertical sidebar which will provide us with a list of all the posts as we write more of them. We can also pick the theme. I picked the Basic theme, were working on additional theme options to provide more choices.

The next step is Authorize GitHub Access which is required to save the generated website. This step happens only once, after you approve access to GitHub you will be redirected to a page where you can select the target repository. Since this is the first time Im running the wizard I typed in a new repository name, after the first time you can pick the same repository and the process will be smoother as you wont have to follow some of the additional steps.

The Finish button updates or creates the website for you. Once it finishes you will be presented with the final step. If this isnt the first time youre creating this website then your site will be up in a couple of minutes. If it is the first time youre running this process then there are a couple of additional steps.

You can check out the GitHub project for this blog right here.

Only on the First Time

You need to click the here button to go to the settings section on GitHub in the newly created repository. Which looks like the following picture.

Under the branch section we need to click the combo box that says None and select main (or master for older repositories). Then select root on the next combo and click Save as shown in the following image.

Once this is done it will take a couple of minutes to deploy the website to a free URL which you can see by clicking the View Your Site button.

Extra Credit - Domain

One of the cool features of GitHub is that it lets you bind a domain (a URL name like gdocweb.com) to a website for free. In this case I want to bind blog.gdocweb.com to the new blog. We can do that with the following steps. We go back to the GitHub settings page we saw previously as shown in the following image:

Notice theres a link on that page to a tutorial on domain binding. I wont repeat it here but the gist of it is that you need to add a CNAME entry for the new domain. In this case on cloudflare I used the following to bind the domain. Notice I disabled the proxy functionality as it isnt needed for GitHub.

Notice that this means you can only have one domain per GitHub account since this isnt project specific. Once this is saved I can type my new domain in GitHub and save. Notice that this might take a while since HTTPS certificate generation isnt immediate.

DNS sometimes takes a while to complete, if youre confident the settings are correct click retry a couple of times.

Future Improvements

gdocweb was designed as a general purpose tool for generating websites. Not as a blogging platform. As such it is lacking in many regards:

No commenting system.
We have to generate our own blog index.
No search or elaborate functionality.
Themes are still rudimentary.

I would like to improve on all of these when moving forward with this tool. If you feel strongly about any of these let us know on our social channels. This will help us define priorities for the basic issues.

Not a Single Trace

Shai Almog — Tue, 03 Oct 2023 12:44:34 GMT

Your team celebrates a success story where a trace identified a pesky latency issue in your application's authentication service. A fix was swiftly implemented, and we all celebrated a quick win in the next team meeting. But the celebrations are short-lived. Just days later, user complaints surged about a related payment gateway timeout. It turns out that the fix we made did improve performance at one point but created a situation in which key information was never cached. Other parts in the software react badly to the fix and we need to revert the whole thing.

While the initial trace provided valuable insights into the authentication service, it didnt explain why the system was built in this way. Relying solely on a single trace has given us a partial view of a broader problem.

This scenario underscores a crucial point: while individual traces are invaluable, their true potential is unlocked only when they are viewed collectively and in context. Let's delve deeper into why a single trace might not be the silver bullet we often hope for and how a more holistic approach to trace analysis can paint a clearer picture of our system's health and the way to combat problems.

The Limiting Factor

The first problem is the narrow perspective. Imagine debugging a multi-threaded Java application. If you were to only focus on the behavior of one thread, you might miss how it interacts with others, potentially overlooking deadlocks or race conditions.

Let's say a trace reveals that a particular method, fetchUserData(), is taking longer than expected. By optimizing only this method, you might miss that the real issue is with the synchronize block in another related method, causing thread contention and slowing down the entire system.

Temporal blindness is the second problem. Think of a Java Garbage Collection (GC) log. A single GC event might show a minor pause, but without observing it over time, you won't notice if there's a pattern of increasing pause times indicating a potential memory leak.

A trace might show that a Java application's response time spiked at 2 PM. However, without looking at traces over a longer period, you might miss that this spike happens daily, possibly due to a scheduled task or a cron job that's putting undue stress on the system.

The last problem is related to that and is the context. Imagine analyzing the performance of a Java method without knowing the volume of data it's processing. A method might seem inefficient, but perhaps it's processing a significantly larger dataset than usual.

A single trace might show that a Java method, processOrders(), took 5 seconds to execute. However, without context, you wouldn't know if it was processing 50 orders or 5,000 orders in that time frame. Another trace might reveal that a related method, fetchOrdersFromDatabase(), is retrieving an unusually large batch of orders due to a backlog, thus providing context to the initial trace.

Strength in Numbers

Think of traces as chapters in a book and metrics as the book's summary. While each chapter (trace) provides detailed insights, the summary (metrics) gives an overarching view. Reading chapters in isolation might lead to missing the plot, but when read in sequence and in tandem with the summary, the story becomes clear.

We need this holistic view. If individual traces show that certain Java methods like processTransaction() are occasionally slow, grouped traces might reveal that these slowdowns happen concurrently, pointing to a systemic issue. Metrics, on the other hand, might show a spike in CPU usage during these times, indicating that the system might be CPU-bound during high transaction loads.

This helps us distinguish between correlation and causation. Grouped traces might show that every time the fetchFromDatabase() method is slow, the updateCache() method also lags. While this indicates a correlation, metrics might reveal that cache misses (a specific metric) increase during these times, suggesting that database slowdowns might be causing cache update delays, establishing causation.

This is especially important in performance tuning. Grouped traces might show that the handleRequest() method's performance has been improving over several releases. Metrics can complement this by showing a decreasing trend in response times and error rates, confirming that recent code optimizations are having a positive impact.

I wrote about this extensively in a previous post about the Tong motion needed to isolate an issue. This motion can be accomplished purely through the use of observability tools such as traces, metrics, and logs.

Example

Observability is somewhat resistant to examples, everything I try to come up with feels a bit synthetic and unrealistic when I examine it after the fact. Having said that, I looked at my modified version of the venerable Spring Pet Clinic demo using digma.ai. Running it showed several interesting concepts taken by Digma.

Probably the most interesting feature is the ability to look at whats going on in the server at this moment. This is an amazing exploratory tool that provides a holistic view for a moment in-time. But the thing I want to focus on is the Insights column on the right. Digma tries to combine the separate traces into a coherent narrative. Its not bad at it but its still a machine, some of that value should probably still be done manually since it cant understand the why, only the what. It seems it can detect the venerable Spring N+1 problem seamlessly.

But this is only the start. One of my favorite things is the ability to look at tracing data next to a histogram and list of errors in a single view. Is performance impacted because there are errors?

How impactful is the performance on the rest of the application?

These become questions with easy answers at this point. When we see all the different aspects laid together.

Magical APIs

The N+1 problem I mentioned before is a common bug in Java Persistence API (JPA). The great Vlad Mihalcea has an excellent explanation. The TL;DR is rather simple. We write a simple database query using ORM. But we accidentally split the transaction causing the data to be fetched N+1 times where N is the number of records we fetch.

This is painfully easy to do since transactions are so seamless in JPA. This is the biggest problem in magical APIs like JPA. These are APIs that do so much that they feel like magic, but under the hood they still run regular old code, when that code fails its very hard to see what goes on. Observability is one of the best ways to understand why these things fail.

In the past, I used to reach to the profiler for such things, which would often entail a lot of work. Getting the right synthetic environment for running a profiling session is often very challenging. Observability lets us do that without the hassle.

Final Word

Relying on a single individual trace is akin to navigating a vast terrain with just a flashlight. While these traces offer valuable insights, their true potential is only realized when viewed collectively. The limitations of a single trace, such as a narrow perspective, temporal blindness, and lack of context, can often lead developers astray, causing them to miss broader systemic issues.

On the other hand, the combined power of grouped traces and metrics offers a panoramic view of system health. Together, they allow for a holistic understanding, precise correlation of issues, performance benchmarking, and enhanced troubleshooting. For Java developers, this tandem approach ensures a comprehensive and nuanced understanding of applications, optimizing both performance and user experience. In essence, while individual traces are the chapters of our software story, it's only when they're read in sequence and in tandem with metrics that the full narrative comes to life.

Debugging Tips and Tricks: A Comprehensive Guide

Shai Almog — Tue, 19 Sep 2023 14:31:11 GMT

Debugging is an integral part of software development. While we often discuss general strategies to tackle issues, it's essential to delve deeper into specific techniques that can enhance our debugging productivity. Here's a comprehensive guide to some core debugging tips and tricks.

https://youtu.be/vcT6QVdPN0g

Rubber Ducking: The Art of Talking it Out

The term "Rubber Ducking" traces back to a developer who carried a rubber duck to converse with when confronted with a problem.

Articulating the problem often highlights nuances we might overlook. This method remains effective even when conversing with an inanimate object or merely running the exercise mentally. Its a simple process:

Begin by stating, Heres the problem
Share your theory about the potential source of the issue.
Discuss why other parts of the code arent causing the problem.
Detail your verification process for each statement.

The duck is optional, but surprisingly helpful when practicing this approach. This is a helpful technique for developers who are often embarrassed by teammates. Some of us feel inadequate in such scenarios, speaking to a duck is freeing as theres no judgment.

As a very experienced developer, I gained the insight of no shame. I make fun of my bugs and have a great laugh when the junior developer finds my bug. Experienced developers make stupid bugs all the time, we just dont care because experience teaches us that everyone makes stupid mistakes

Moving the Goalposts: Redefining the Bug

Bugs often start with a user-level description. As we delve deeper, we might discover the root cause lies elsewhere. Redefining the bug narrows our focus, making it easier to pinpoint the issue. This iterative process is not only a mental exercise but also aids team communication.

The phrase "Moving the Goalposts" might initially sound like a negative term, suggesting inconsistency or a lack of clarity. However, when applied to debugging, it becomes a powerful technique that can streamline the problem-solving process. Let's delve deeper into this concept.

The Evolution of a Bug

When users or testers report a bug, they often describe it in terms of its symptoms. For instance, a user might say, "The application crashes when I click this button." This is a user-level description, which, while accurate, is symptomatic of a deeper issue. As developers, our task is to trace the symptom back to its root cause.

The Process of Redefinition

Initial Identification: Start with the user-level description. This is our starting point, our initial goalpost.
Dive Deeper: As we investigate, we might find that the crash is due to a particular function failing. Now, our understanding of the bug has evolved. It's no longer just about a button causing a crash; it's about a function not performing as expected.
Narrowing Down: Further investigation might reveal that the function fails because of incorrect data being fed into it. Now, the bug has been redefined again. The goalpost has moved from a UI element (the button) to a backend function, and then to data input.
Update Documentation: It's crucial to update the bug's description in the issue tracker to reflect our current understanding. This ensures that the team is aligned and that if someone else picks up the task, they have the most recent information.

Flipping the Direction: Multiple Angles of Attack

Every system has multiple angles of approach. If one direction doesn't yield results, try another. Engage in "pair debugging" with a teammate to gain fresh perspectives on the problem.

Just as a detective might approach a case from various angles to uncover the truth, developers can employ multiple strategies to identify and resolve bugs. The concept of "Flipping the Direction" emphasizes the importance of versatility and adaptability in the debugging process. Let's explore this idea further.

The Linear Approach to Debugging

Traditionally, when faced with a bug, a developer might follow a linear path:

Identify the Symptom: Recognize the issue based on user reports or personal observations.
Trace the Code: Follow the code flow to identify where things might be going awry.
Isolate the Issue: Narrow down to the specific section or line of code causing the problem.
Implement a Fix: Modify the code to resolve the issue.

While this approach is systematic and often effective, it might not always lead to a solution, especially with complex or elusive bugs.

"Flipping the Direction" is about challenging the conventional linear approach. It's about understanding that there isn't just one way to approach a problem. Here's how it can be done:

Reverse Engineering: Instead of starting from the symptom and tracing forward, begin at the end result and work backward. This can often highlight overlooked aspects or assumptions.
Change the Environment: If a bug is hard to reproduce in one environment, try replicating it in another. This can expose conditions or dependencies that might be causing the issue.
Collaborative Debugging: Engage in "pair debugging." A fresh pair of eyes can offer a different perspective, potentially identifying something you might have missed.
Challenge Assumptions: If you're convinced that a particular module or function is the source of the bug, deliberately look elsewhere. Sometimes, the real issue lies in the least expected places.

"Flipping the Direction" is more than just a debugging technique; it's a mindset. It encourages developers to be adaptable, to challenge their assumptions, and to recognize that there's always more than one way to solve a problem.

Disruptive Environments: Exposing Hidden Bugs

Hard-to-reproduce bugs can be maddening. To unearth them we can use disruption such as introducing external limiting factors e.g. network throttling or slow-motion modes. Disruption can even be switching your OS or development environment. For instance, toggling between Firefox and Chrome dev tools can offer different insights.

Hidden bugs are those that don't readily present themselves under standard testing or operational conditions. They might be triggered by:

Unusual user behaviors.
Specific combinations of actions.
Rare environmental conditions.
External system interactions.

Because of their elusive nature, these bugs often slip through standard testing phases and can be a source of significant frustration for developers.

Here are some tricks I used in the past to disrupt an environment I was debugging:

Network Throttling: By intentionally slowing down the network speed, developers can simulate conditions like poor connectivity. This can reveal issues related to data synchronization, timeouts, or resource loading.
Resource Limitation: Limiting system resources, such as memory or CPU, can expose bugs related to resource management, memory leaks, or inefficient algorithms.
Environment Switching: Changing the operating system, browser, or even hardware can bring to light compatibility issues or platform-specific bugs.
External Interferences: Connecting to different networks, like a tethered phone connection, can introduce unexpected variables. For instance, an application might inadvertently rely on specific network topologies or configurations.
Simulating Failures: Intentionally causing certain components or services to fail can help identify weaknesses in error handling or recovery mechanisms.
Time Manipulation: Altering system time or simulating different time zones can expose bugs related to scheduling, time calculations, or event triggering.

Leveraging Debugging Extensions and Tools

Familiarize yourself with the debugging tools specific to your development environment. These tools can provide deeper insights and even disrupt the application in ways that expose hidden issues.

While human intuition and experience play a significant role in debugging, the complexity of modern software often demands more precise and specialized approaches. Debugging tools provide insight that allows developers to peer into the inner workings of applications, revealing how data flows, how components interact, and where bottlenecks or errors might occur.

Automated tools can quickly pinpoint issues, reducing the time and effort required for manual debugging. Finally, they offer exact data, ensuring that developers address the root cause of a problem rather than its symptoms.

Tools are very domain-specific, in my current project I had to build custom tooling to enable debuggability however in most cases we can rely on some of these:

Browser Developer Tools: Modern browsers come equipped with powerful developer tools that allow for deep inspection of web pages. They can monitor network requests, inspect DOM elements, profile performance, and set breakpoints in JavaScript code. For instance, while Firefox's developer tools are popular for certain tasks, Chrome's DevTools might offer a different perspective on the same issue.
IDE Debuggers: Integrated Development Environments (IDEs) often have built-in debuggers that allow developers to step through code, watch variable values, and evaluate expressions in real time.
Profiling Tools: These tools monitor software performance, helping developers identify memory leaks, CPU bottlenecks, or inefficient algorithms.
Static Analysis Tools: By analyzing code without executing it, these tools can detect potential issues like code smells, security vulnerabilities, or violations of coding standards.
Logging, Observability and Monitoring Tools: Systems like Spring's actuator or automated logging aspects can provide real-time insights into application behavior, helping developers trace issues as they occur. Observability and developer observability tools can provide deep insight into production issues.
Specialized Environment Tools: Tools like JMX (Java Management Extensions) allow for deep monitoring, management, and configuration of Java applications.
Simulators and Emulators: For mobile app development, simulators (like iOS Simulator) or emulators (like Android Emulator) replicate how apps run on devices, revealing device-specific issues.
Extensions for Specific Tasks: Many tools offer extensions or plugins that provide additional functionality. For instance, browser extensions can simulate different visual impairments, helping developers ensure accessibility.

Disconnect and Reconnect: The Power of a Fresh Mindset

Sometimes, stepping away from the problem and returning with a fresh perspective can be the key to finding a solution. When you come back, approach the problem anew, without relying on previous assumptions.

When were engrossed in a problem we can sometimes develop a form of tunnel vision. We become so focused on a specific aspect or potential solution that we overlook other possibilities or simpler solutions. This narrowed perspective can limit Creativity due to fixation on one approach. This blocks thinking outside the box.

The increase in frustration is disheartening. Repeatedly hitting a wall with the same strategy can lead to mounting frustration and decreased productivity. Obsessing over a particular path might mean missing out on a quicker or more straightforward solution.

Going to sleep, lunch or just taking a walk can make a tremendous difference in your problem-solving process.

Embrace the Challenge: Finding Joy in Debugging

Debugging should be a stimulating puzzle. If you're not enjoying it, try debugging unfamiliar code or tasks outside your job scope. Remember, even the best developers face challenges, and it's okay to seek help or share your experiences. The developers who are best at debugging treat it like a challenge and enjoy the bugs more than coding.

At its core, debugging is a problem-solving exercise. It's about tracing anomalies, understanding intricate systems, and restoring harmony to a codebase. Like any challenge, it comes with its hurdles, but also with the potential for immense satisfaction upon resolution. It requires thinking outside of the box and holistic understanding.

Every debugging session is a learning opportunity. It allows developers to deepen their understanding of the system, discover new tools, or refine their problem-solving skills.

Strategies to Embrace the Debugging Challenge

You either love something or you dont and a lot of developers feel that they dont love debugging. I get that. Its frustrating. In fact, I often start my talks with the universal debugging gesture

However, since you made it here and are a software developer I think the potential for love is there. You just need to see debugging for what it is: a process. I think people dont hate debugging, were frustrated by our work environment, by the fact that we make bugs and by the fact we feel stupid. Debugging is just the process we use, its here to help.

Here are some of the common things we can do to make it more pleasant:

Reframe the Perspective: Instead of viewing debugging as a tedious chore, consider it a game or challenge. Adopting a playful mindset can reduce stress and make the process more enjoyable.
Celebrate Small Wins: Every bug resolved, no matter how minor, is a step forward. Celebrate these milestones to maintain motivation and positivity.
Collaborate: Engage in pair debugging or discuss the problem with colleagues. Sharing the challenge can introduce new perspectives, distribute the cognitive load, and make the process more social and enjoyable.
Take Breaks: As discussed in the "Disconnect and Reconnect" approach, taking breaks can refresh the mind, making it easier to enjoy the debugging process upon return.
Document and Reflect: Maintain a debugging journal. Documenting challenges faced, strategies employed, and solutions found can be a source of pride and a valuable resource for future challenges.
Seek External Challenges: If you find joy in debugging, consider seeking external challenges. Platforms like debugging competitions or bug bounties can offer exciting opportunities to test and hone your skills.

Most importantly, distinguish between job-related stress and personal embarrassment. Everyone makes mistakes, even seasoned developers. Sharing your experiences can be cathartic and offer perspective. If work stress is the culprit, consider discussing it with your manager or seeking mentorship.

Use a Process

I discussed the process of debugging and the underlying theory quite a bit in previous posts. Specifically the high-level process and the more hands-on tongs approach.

Both will make the process more rigid and less likely to drag you down a road chasing your own tail.

Conclusion

I dont know if love for debugging is in the cards for you. Its hard to enjoy yourself when youre feeling that something isnt working and you need to fix it. But I think that most of these tips circle around three core ideas:

Relax
Orient yourself
Use the tools at your disposal

You are not alone in this. We all have bug war stories and they are often stupid bugs. Its frustrating and most of us feel some antagonism towards that debugging process. Once we take a step back and get all of these things in order, the process will become more pleasing.

Theres nothing quite like solving a hard bug. Its an addictive feeling, even when its a bug in our own code.

The Systemic Process of Debugging

Shai Almog — Tue, 12 Sep 2023 14:17:44 GMT

Debugging is an integral part of software development. However, as projects grow in size and complexity, the process of debugging requires more structure and collaboration. This process is probably something you already do as this process is deeply ingrained into most teams.

https://youtu.be/JaYaDOHtbyA

It's also a core part of the academic theory behind debugging. Its purpose is to prevent regressions and increase collaboration in a team environment. Without this process, any issue we fix might come back to haunt us in the future. This process helps developers work cohesively and efficiently.

The Importance of Issue Tracking

I'm sure we all use an issue tracker. In that sense, we should all be aligned. But do you sometimes "just fix a bug"?

Without going through the issue tracker?

Honestly, I do that a lot. Mostly in hobby projects but occasionally even in professional settings. Even when working alone this can become a problem...

Avoiding Parallel Work on the Same Bug

When working on larger projects, it's crucial to avoid situations where multiple developers are unknowingly addressing the same issue. This can lead to wasted effort and potential conflicts in the codebase. To prevent this:

Always log bugs in your issue tracking system. Before starting work on a bug, ensure it's assigned to you and marked as active. This visibility allows the project manager and other team members to be aware, reducing the chances of overlapping work.
Stay updated on other issues. By keeping an eye on the issues your teammates are tackling, you can anticipate potential areas of conflict and adjust your approach accordingly.

Assuming you have a daily sync session or even a weekly session, it's important to discuss issues. This prevents collision where a teammate can hear the description of the bug and might raise a flag. This also helps in pinpointing the root cause of the bug in some situations, an issue might be familiar and communicating through it leaves a "paper trail".

As the project grows you will find that bugs keep coming back despite everything we do. History that was left behind in the issue tracker by teammates who are no longer on the team can be a lifesaver. Furthermore, the statistics we can derive from a properly classified issue tracker can help us pinpoint the problematic areas of the code that might need further testing and maybe refactoring.

The Value of Issue Over Pull Requests

We sometimes write the comments and information directly into the pull request instead of the issue tracker. This can work for some situations but isn't as ideal for the general case.

Issues in a tracking system are often more accessible than pull requests or specific commits. When addressing a regression, linking the pull request to the originating issue is vital. This ensures that all discussions and decisions related to the bug are centralized and easily traceable.

Communication: Issue Tracker vs. Ephemeral Channels

I use Slack a lot. This is a problem, it's convenient but it's ephemeral and in more than one case important information written in a Slack chat was gone. Emails aren't much of an improvement, especially in the long term. An email thread I had with a former colleague was cut short and I had no context as to where it ended.

Yes, having a conversation in the issue tracker is cumbersome and awkward but we have a record.

Why We Sometimes Avoid the Issue Tracker

Developers might sometimes avoid discussing issues in the tracker because:

Complex discussions: Some topics might feel too broad or intricate for the issue tracker.
Fear of public criticism: No one wants to appear ignorant or criticize a colleague in a permanent record. As a result, some discussions might shift to private or ephemeral channels.

However, while team cohesion and empathy are crucial, it's essential to log all relevant discussions in the issue tracker. This ensures that knowledge isn't lost, especially if a team member departs.

The Role of Daily Meetings

Daily meetings are invaluable for teams with multiple developers working on related tasks. These meetings provide a platform for:

Sharing updates: Inform the team about your current theories and direction.
Engaging in discussions: If a colleague's update sounds familiar, it's an opportunity to collaborate and avoid redundant work.

However, it's essential to keep these meetings concise. Detailed discussions should transition to the issue tracker for a comprehensive record. I prefer two weekly meetings as I find it's the optimal number. The first day of the week is usually a ramp-up day. Then we have the first meeting in the morning of the second day of the week and the second meeting two days later. That reduces the load of a daily meeting while still keeping information fresh.

The Role of Testing in Debugging

We all use tests when developing (hopefully) but debugging theory has a special place for tests.

Starting with Unit Tests

A common approach to debugging is to begin by creating a unit test that reproduces the issue. However, this might not always be feasible before understanding the problem. Nevertheless, once the problem is understood we should:

Create a test before fixing the issue. This test should be part of the pull request that addresses the bug.
Maintain a coverage ratio. Aim for a coverage ratio of 60% or higher per pull request to ensure that changes are adequately tested.

A test acts as a safeguard against a regression. If the bug resurfaces it will be a slightly different variant of that same bug.

Unit Tests vs. Integration Tests

While unit tests are fast and provide immediate feedback, they primarily prevent regressions. They might not be as effective in verifying overall quality. On the other hand, integration tests, though potentially slower, offer a comprehensive quality check. They can sometimes be the only way to reproduce certain issues. Most of the difficult bugs I ran into in my career were in the interconnect area between modules. This is an area that unit tests don't cover very well. That is why integration tests are far more important than unit tests for overall application quality.

To ensure quality focus on integration tests for coverage. Relying solely on unit test coverage can be misleading. It might lead to dead code and added complexity in the system. However, as part of the debugging process, it's very valuable to have a unit test as it's far easier to debug and much faster.

Final Word

A structured approach to debugging, combined with effective communication and a robust testing strategy, can significantly enhance the efficiency and quality of software development. This isn't about convenience, the process underlying debugging is like a paper trail for the debugging process.

I start every debugging session by searching the issue tracker. In many cases, it yields gold that might not lead me to the issue directly but still points me in the right direction.

The ability to rely on a unit test that was committed when solving a similar bug is invaluable. It gives me a leg up on resolving similar issues moving forward.

Eliminating Bugs Using the Tong Motion Approach

Shai Almog — Tue, 05 Sep 2023 10:55:09 GMT

Software debugging can often feel like a never-ending maze. Just when you think you're on the right track, you hit a dead-end. But, by employing the age-old technique of the process of elimination, and using the analogy of the 'Tong Motion,' we can navigate this maze more effectively.

https://youtu.be/K4FRRG4pnEM

As a sidenote, if you like the content of this and the other posts in this series check out my Debugging book that covers this subject. If you have friends that are learning to code I'd appreciate a reference to my Java Basics book. If you want to get back to Java after a while check out my Java 8 to 21 book.

Understanding the Process of Elimination in Debugging

The Basics

The process of elimination in debugging is straightforward in principle: continuously rule out non-problematic components until the root cause reveals itself. This can be achieved either by commenting out lines of code or using debugging techniques, such as the 'force return', which bypasses specific code paths.

Using External Tools

For front-end issues, replicating the problem using tools like curl or postman is valuable. It helps us determine if the bug is within the front-end code or elsewhere. This way, we can quickly narrow our focus, not merely addressing the symptoms but locating the actual bug.

The Power of Unit Tests in Debugging

Unit tests are our best allies when it comes to debugging. By focusing on isolated units, they hone in on potential problem areas.

Benefits of Mocking Frameworks

Mocking frameworks like Mockito come in handy as they can simulate large parts of the application. This way, we can drill down on the exact problem, circumventing potential disturbances. Moreover, using mocks can prevent regression and make our test cases cleaner.

However, while there are best practices regarding the extent of mocking, when debugging a specific problem, it's more pragmatic to mock as much as necessary to distill the problem to its essence.

The Challenges with Flaky Issues

The elimination technique is less straightforward with flaky issues - those bugs that appear irregularly or whose behavior changes as code is eliminated. The key strategy here is to focus on negatives. In simpler terms, if removing a certain block doesn't cause the problem to appear, it doesn't automatically indict that block. The absence could be due to the bug's unpredictable nature. Hence, it's crucial only to trust instances where the problem consistently reproduces.

The Concept of the 'Tong Motion'

Think of tongs. They grasp from both sides. Similarly, almost all software has at least two primary interfaces or points of input/output. For instance:

Enterprise Web Apps: Web UI on one side and the database on the other.
Operating System Kernel: User space app on one end and computer hardware on the other.
Video Games: The joystick and screen API on one side and the game database on the other.

Applying the Tong Motion to Debugging

Using the example of an enterprise web app:

Mocking the Web Tier: Begin by using tools like curl or postman to eliminate front-end issues.
Mocking the Database: Replace the actual database with mock data.
Narrowing Down Further: If the problem persists, move to testing the presentation tier directly, thereby eliminating the database from the equation.
Digging Deeper: Invoke the business method directly and mock its dependencies. This way, you are narrowing down on the actual method causing the issue while excluding the rest of the application.

One common pitfall is neglecting one prong of the tongs or misplacing the other. It's crucial to ensure both sides are appropriately positioned; otherwise, it might skew the results. If stuck, consider investigating from the opposite side, and then revert when needed.

An Illustrative Case: Debugging a Server Performance Issue

In a real-world scenario, while tackling a server performance issue, I employed the 'Tong Motion' technique. By replacing web calls with curl requests, I shifted focus to the problematic area. At the same time, I enhanced database logging to monitor its output as problematic SQL was replicated through curl. This dual-sided approach helped unearth a bug in the Object Relational Mapping layer.

This concrete example comprises of the following stages:

The tongs start by mocking the web tier with curl or postman. This eliminates front-end related issues.
The other side of the tong motion replaces the database with mock data.
If the issue can be reproduced we can further squeeze the tongs by invoking the presentation tier method directly in a test case.
We can then eliminate the database entirely from the equation by mocking it in a test case.
Finally, we can invoke the business method directly eliminating the presentation tier aspect.
We can mock its dependencies which means we narrow down on a specific method thats at fault while eliminating the rest of the application.

Wrapping Up

Debugging can be a daunting process. However, with the right techniques, like the process of elimination and the 'Tong Motion' approach, it becomes a more manageable task. Always remember to tackle issues methodically and from all angles to find and fix the root cause effectively.

Abstract: Once we press the merge button that code is no longer our responsibility. If it performs sub-optimally or has a bug it is now the problem of the DevOps team, the SRE, etc. Unfortunately, those teams work with a different toolset. If my code uses up too much RAM they will increase RAM. If the code runs slower they will increase CPU. If the code crashes they will increase concurrent instances.

It's 2AM Do you Know What Your Code is Doing?

Shai Almog — Tue, 22 Aug 2023 10:32:17 GMT

Once we press the merge button, that code is no longer our responsibility. If it performs sub-optimally or has a bug, it is now the problem of the DevOps team, the SRE, etc. Unfortunately, those teams work with a different toolset. If my code uses up too much RAM, they will increase RAM. When the code runs slower, they will increase CPU. In case the code crashes, they will increase concurrent instances.

If none of that helps they will call you up at 2AM. A lot of these problems are visible before they become a disastrous middle of the night call. Yes. DevOps should control production, but the information they gather from production is useful for all of us. This is at the core of developer observability which is a subject Im quite passionate about. Im so excited about it I dedicated a chapter to it in my debugging book.

Back when I wrote that chapter I dedicated most of it to active developer observability tools like Lightrun, Rookout, et al. These tools work like production debuggers, they are fantastic in that regard. When I have a bug and know where to look I can sometimes reach for one of these tools (I used to work at Lightrun so I always use it).

But there are other ways. Tools like Lightrun are active in their observability, we add a snapshot similarly to a breakpoint and get the type of data we expect. I recently started playing with Digma which takes a radically different approach to developer observability. To understand that we might need to revisit some concepts of observability first.

Observability isnt Pillars

Ive been guilty of listing the pillars of observability just as much as the next guy. Theyre even in my book (sorry). To be fair, I also discussed what observability really means

Observability means we can ask questions about our system and get answers or at least have a clearly defined path to get those answers. Sounds simple when running locally, but when you have a sophisticated production environment and someone asks you: is anyone even using that block of code?

How do you know?

You might have lucked out and had a log in that code and it might still be lucky that the log is in the right level and piped properly so you can check. The problem is that if you added too many logs or too much observability data, you might have created a disease worse than the cure: over-logging or over-observing.

Both can bring down your performance and significantly impact the bank account, so ideally we dont want too many logs (I discuss over-logging here) and we dont want too much observability.

Existing developer observability tools work actively. To answer the question if someone is using the code I can place a counter on the line and wait for results. I can give it a week's timeout and find out in a week. Not a terrible situation but not ideal either, I dont have that much patience.

Tracing and OpenTelemetry

Its a sad state of affairs that most developers dont use tracing in their day-to-day job. For those of you who dont know it, it is like a call stack for the cloud. It lets us see the stack across servers and through processes. No, not method calls. More at the entry point level, but this often contains details like the database queries that were made and similarly deep insights.

Theres a lot of history with OpenTelemetry which I dont want to get into, if youre an observability geek you already know it and if not then its boring. What matters is that OpenTelemetry is taking over the world of tracing. Its a runtime agent which means you just add it to the server and you get tracing information almost seamlessly. Its magic.

It also doesnt have a standard server which makes it very confusing. That means multiple vendors can use a single agent and display the information it collects to various demographics:

A vendor focused on performance can show the timing of various parts in the system.
A vendor focused on troubleshooting can detect potential bugs and issues.
A vendor focused on security can detect potential risky access.

Background Developer Observability

Im going to coin a term here since there isnt one: Background Developer Observability. What if the data you need was already here and a system already collected it for you in the background?

Thats what Digma is doing. In Digma's terms, it's called Continuous Feedback. Essentially, theyre collecting OpenTelemetry data, analyzing it and displaying it as information thats useful for developers. If Lightrun is like a debugger, then Digma is like SonarQube based on actual runtime and production information.

The cool thing is that you probably already use OpenTelemetry without even knowing it. DevOps probably installed that agent already, and the data is already there!

Going back to my question, is anyone using this API?

If you use Digma you can see that right away. OpenTelemetry already collected the information in the background and the DevOps team already paid the price of collection. We can benefit from that too.

Enough Exposition

I know, I go on Lets get to the meat and potatoes of why this rocks. Notice that this is a demo, when running locally the benefits are limited. The true value of these tools is in understanding production, still they can provide a lot of insight even when running locally and even when running tests.

Digma has a simple and well-integrated setup wizard for IntelliJ/IDEA. You need to have Docker Desktop running for setup to succeed. Note that you dont need to run your application using Docker, this is simply for the Digma server process where they collect the execution details.

Once it is installed, we can run our application, in my case I just ran the JPA unit test from my latest book and it produced standard traces which are already pretty cool, we can see them listed below:

When we click a trace for one of these, we get the standard trace view, this is nothing new, but its really nice to see this information directly in the IDE and readily accessible. I can imagine the immense value this will have for figuring out CI execution issues:

But the real value and where Digma becomes a Developer Observability tool instead of an Observability tool, is with the tool window here:

There is a strong connection to the code directly from the observability data and deeper analysis which doesnt show in my particular overly simplistic hello world. This Toolwindow highlights problematic traces, errors and helps understand real-world issues.

How Does This Help at 2AM?

Disasters happen because we arent looking. Id like to say I open my observability dashboard regularly but I dont. Then when theres a failure I take a while to get my bearings within it. The locality of the applicable data is important, it helps us notice issues when they happen. Detect regressions before they turn to failures and understand the impact of the code we just merged.

Prevention starts with awareness and as developers, we handed our situational awareness to the DevOps team.

When the failure actually happens the locality and accessibility of the data makes a big difference. Since we use tools that integrate in the IDE daily this reduces the meantime to a fix. No, a background developer observability tool might not include the information we need to fix a problem. But if it does, then the information is already there and we need nothing else. That is fantastic.

Final Word

With all the discussion about observability and open telemetry, you would think everyone is using them. Unfortunately, the reality is far from that. Yes, theres some saturation and familiarity in the DevOps crowd. This is not the case for developers.

This is a form of environmental blindness. How can our teams who are so driven by data and facts proceed with secondhand and often outdated data from OPS?

Should I spend time further optimizing this method or will I waste the effort, since few people use it?

We can benchmark things locally just fine, but real-world usage and impact are things that we all need to improve.

The Evolution of Bugs

Shai Almog — Tue, 15 Aug 2023 12:35:08 GMT

Programming, regardless of the era, has been riddled with bugs that vary in nature but often remain consistent in their basic problems. Whether we're talking about mobile, desktop, server, or different operating systems and languages, bugs have always been a constant challenge. Here's a dive into the nature of these bugs and how we can tackle them effectively.

https://youtu.be/KTtpr0JNn_o

As a sidenote, if you like the content of this and the other posts in this series check out my Debugging book that covers this subject. If you have friends that are learning to code I'd appreciate a reference to my Java Basics book. If you want to get back to Java after a while check out my Java 8 to 21 book.

Memory Management: The Past and The Present

Memory management, with its intricacies and nuances, has always posed unique challenges for developers. Debugging memory issues, in particular, has transformed considerably over the decades. Here's a dive into the world of memory-related bugs and how debugging strategies have evolved.

The Classic Challenges: Memory Leaks and Corruption

In the days of manual memory management, one of the primary culprits behind application crashes or slowdowns was the dreaded memory leak. This would occur when a program consumed memory but failed to release it back to the system, leading to eventual resource exhaustion.

Debugging such leaks was tedious. Developers would pour over code, looking for allocations without corresponding deallocations. Tools like Valgrind or Purify were often employed, which would track memory allocations and highlight potential leaks. They provided valuable insights but came with their own performance overheads.

Memory corruption was another notorious issue. When a program wrote data outside the boundaries of allocated memory, it would corrupt other data structures, leading to unpredictable program behavior. Debugging this required understanding the entire flow of the application and checking each memory access.

Enter Garbage Collection: A Mixed Blessing

The introduction of garbage collectors (GC) in languages brought in its own set of challenges and advantages. On the bright side, many manual errors were now handled automatically. The system would clean up objects not in use, drastically reducing memory leaks.

However, new debugging challenges arose. For instance, in some cases, objects remained in memory because unintentional references prevented the GC from recognizing them as garbage. Detecting these unintentional references became a new form of memory leak debugging. Tools like Java's VisualVM or .NET's Memory Profiler emerged to help developers visualize object references and track down these lurking references.

Memory Profiling: The Contemporary Solution

Today, one of the most effective methods for debugging memory issues is memory profiling. These profilers provide a holistic view of an application's memory consumption. Developers can see which parts of their program consume the most memory, track allocation and deallocation rates, and even detect memory leaks.

Some profilers can also detect potential concurrency issues, making them invaluable in multi-threaded applications. They help bridge the gap between the manual memory management of the past and the automated, concurrent future.

Concurrency: A Double-Edged Sword

Concurrency, the art of making software execute multiple tasks in overlapping periods, has transformed how programs are designed and executed. However, with the myriad of benefits it introduces, like improved performance and resource utilization, concurrency also presents unique and often challenging debugging hurdles. Let's delve deeper into the dual nature of concurrency in the context of debugging.

The Bright Side: Predictable Threading

Managed languages, those with built-in memory management systems, have been a boon to concurrent programming. Languages like Java or C# made threading more approachable and predictable, especially for applications that require simultaneous tasks but not necessarily high-frequency context switches. These languages provide in-built safeguards and structures, helping developers avoid many pitfalls that previously plagued multi-threaded applications.

Moreover, tools and paradigms, such as promises in JavaScript, have abstracted away much of the manual overhead of managing concurrency. These tools ensure smoother data flow, handle callbacks, and aid in better structuring asynchronous code, making potential bugs less frequent.

The Murky Waters: Multi-Container Concurrency

However, as technology progressed, the landscape became more intricate. Now, we're not just looking at threads within a single application. Modern architectures often involve multiple concurrent containers, microservices, or functions, especially in cloud environments, all potentially accessing shared resources.

When multiple concurrent entities, perhaps running on separate machines or even data centers, try to manipulate shared data, the debugging complexity escalates. Issues arising from these scenarios are far more challenging than traditional localized threading issues. Tracing a bug may involve traversing logs from multiple systems, understanding inter-service communication, and discerning the sequence of operations across distributed components.

Reproducing The Elusive: Threading Bugs

Thread-related problems have earned a reputation for being some of the hardest to solve. One of the primary reasons is their often non-deterministic nature. A multi-threaded application may run smoothly most of the time but occasionally produce an error under specific conditions, which can be exceptionally challenging to reproduce.

One approach to identify such elusive issues is logging the current thread and/or stack within potentially problematic code blocks. By observing logs, developers can spot patterns or anomalies that hint at concurrency violations. Furthermore, tools that create "markers" or labels for threads can help in visualizing the sequence of operations across threads, making anomalies more evident.

Deadlocks, where two or more threads indefinitely wait for each other to release resources, although tricky, can be more straightforward to debug once identified. Modern debuggers can highlight which threads are stuck, waiting for which resources, and which other threads are holding them.

In contrast, livelocks present a more deceptive problem. Threads involved in a livelock are technically operational, but they're caught in a loop of actions that render them effectively unproductive. Debugging this requires meticulous observation, often stepping through each thread's operations to spot a potential loop or repeated resource contention without progress.

Race Conditions: The Ever-Present Ghost

One of the most notorious concurrency-related bugs is the race condition. It occurs when software's behavior becomes erratic due to the relative timing of events, like two threads trying to modify the same piece of data. Debugging race conditions involves a paradigm shift: one shouldn't view it just as a threading issue but as a state issue. Some effective strategies involve field watchpoints, which trigger alerts when particular fields are accessed or modified, allowing developers to monitor unexpected or premature data changes.

The Pervasiveness of State Bugs

Software, at its core, represents and manipulates data. This data can represent everything from user preferences and current context to more ephemeral states, like the progress of a download. The correctness of software heavily relies on managing these states accurately and predictably. State bugs, which arise from incorrect management or understanding of this data, are among the most common and treacherous issues developers face. Let's delve deeper into the realm of state bugs and understand why they're so pervasive.

What Are State Bugs?

State bugs manifest when the software enters an unexpected state, leading to malfunction. This might mean a video player that believes it's playing while paused, an online shopping cart that thinks it's empty when items have been added, or a security system that assumes it's armed when it's not.

From Simple Variables to Complex Data Structures

One reason state bugs are so widespread is the breadth and depth of data structures involved. It's not just about simple variables. Software systems manage vast, intricate data structures like lists, trees, or graphs. These structures can interact, affecting one another's states. An error in one structure, or a misinterpreted interaction between two structures, can introduce state inconsistencies.

Interactions and Events: Where Timing Matters

Software rarely acts in isolation. It responds to user input, system events, network messages, and more. Each of these interactions can change the state of the system. When multiple events occur closely together or in an unexpected order, they can lead to unforeseen state transitions.

Consider a web application handling user requests. If two requests to modify a user's profile come almost simultaneously, the end state might depend heavily on the precise ordering and processing time of these requests, leading to potential state bugs.

Persistence: When Bugs Linger

State doesn't always reside temporarily in memory. Much of it gets stored persistently, be it in databases, files, or cloud storage. When errors creep into this persistent state, they can be particularly challenging to rectify. They linger, causing repeated issues until detected and addressed.

For example, if a software bug erroneously marks an e-commerce product as "out of stock" in the database, it will consistently present that incorrect status to all users until the incorrect state is fixed, even if the bug causing the error has been resolved.

Concurrency Compounds State Issues

As software becomes more concurrent, managing state becomes even more of a juggling act. Concurrent processes or threads may try to read or modify shared state simultaneously. Without proper safeguards like locks or semaphores, this can lead to race conditions, where the final state depends on the precise timing of these operations.

Tools and Strategies to Combat State Bugs

To tackle state bugs, developers have an arsenal of tools and strategies:

Unit Tests: These ensure individual components handle state transitions as expected.
State Machine Diagrams: Visualizing potential states and transitions can help in identifying problematic or missing transitions.
Logging and Monitoring: Keeping a close eye on state changes in real time can offer insights into unexpected transitions or states.
Database Constraints: Using database-level checks and constraints can act as a final line of defense against incorrect persistent states.

Exceptions: The Noisy Neighbor

When navigating the labyrinth of software debugging, few things stand out quite as prominently as exceptions. They are, in many ways, like a noisy neighbor in an otherwise quiet neighborhood: impossible to ignore and often disruptive. But just as understanding the reasons behind a neighbor's raucous behavior can lead to a peaceful resolution, diving deep into exceptions can pave the way for a smoother software experience.

What Are Exceptions?

At their core, exceptions are disruptions in the normal flow of a program. They occur when the software encounters a situation it wasn't expecting or doesn't know how to handle. Examples include attempting to divide by zero, accessing a null reference, or failing to open a file that doesn't exist.

The Informative Nature of Exceptions

Unlike a silent bug that might cause software to produce incorrect results without any overt indications, exceptions are typically loud and informative. They often come with a stack trace, pinpointing the exact location in the code where the issue arose. This stack trace acts as a map, guiding developers directly to the problem's epicenter.

Causes of Exceptions

There's a myriad of reasons why exceptions might occur, but some common culprits include:

Input Errors: Software often makes assumptions about the kind of input it will receive. When these assumptions are violated, exceptions can arise. For instance, a program expecting a date in the format "MM/DD/YYYY" might throw an exception if given "DD/MM/YYYY" instead.
Resource Limitations: If the software tries to allocate memory when none is available or opens more files than the system allows, exceptions can be triggered.
External System Failures: When software depends on external systems, like databases or web services, failures in these systems can lead to exceptions. This could be due to network issues, service downtimes, or unexpected changes in the external systems.
Programming Errors: These are straightforward mistakes in the code. For instance, trying to access an element beyond the end of a list or forgetting to initialize a variable.

Handling Exceptions: A Delicate Balance

While it's tempting to wrap every operation in try-catch blocks and suppress exceptions, such a strategy can lead to more significant problems down the road. Silenced exceptions can hide underlying issues that might manifest in more severe ways later.

Best practices recommend:

Graceful Degradation: If a non-essential feature encounters an exception, allow the main functionality to continue working while perhaps disabling or providing alternative functionality for the affected feature.
Informative Reporting: Rather than displaying technical stack traces to end-users, provide friendly error messages that inform them of the problem and potential solutions or workarounds.
Logging: Even if an exception is handled gracefully, it's essential to log it for developers to review later. These logs can be invaluable in identifying patterns, understanding root causes, and improving the software.
Retry Mechanisms: For transient issues, like a brief network glitch, implementing a retry mechanism can be effective. However, it's crucial to distinguish between transient and persistent errors to avoid endless retries.

Proactive Prevention

Like most issues in software, prevention is often better than cure. Static code analysis tools, rigorous testing practices, and code reviews can help identify and rectify potential causes of exceptions before the software even reaches the end user.

Faults: Beyond the Surface

When a software system falters or produces unexpected results, the term "fault" often comes into the conversation. Faults, in a software context, refer to the underlying causes or conditions that lead to an observable malfunction, known as an error. While errors are the outward manifestations we observe and experience, faults are the underlying glitches in the system, hidden beneath layers of code and logic. To understand faults and how to manage them, we need to dive deeper than the superficial symptoms and explore the realm below the surface.

What Constitutes a Fault?

A fault can be seen as a discrepancy or flaw within the software system, be it in the code, data, or even the software's specification. It's like a broken gear within a clock. You may not immediately see the gear, but you'll notice the clock's hands aren't moving correctly. Similarly, a software fault may remain hidden until specific conditions bring it to the surface as an error.

Origins of Faults

Design Shortcomings: Sometimes, the very blueprint of the software can introduce faults. This might stem from misunderstandings of requirements, inadequate system design, or failure to foresee certain user behaviors or system states.
Coding Mistakes: These are the more "classic" faults where a developer might introduce bugs due to oversights, misunderstandings, or simply human error. This can range from off-by-one errors, incorrectly initialized variables, to complex logic errors.
External Influences: Software doesn't operate in a vacuum. It interacts with other software, hardware, and the environment. Changes or failures in any of these external components can introduce faults into a system.
Concurrency Issues: In modern multi-threaded and distributed systems, race conditions, deadlocks, or synchronization issues can introduce faults that are particularly hard to reproduce and diagnose.

Detecting and Isolating Faults

Unearthing faults requires a combination of techniques:

Testing: Rigorous and comprehensive testing, including unit, integration, and system testing, can help identify faults by triggering the conditions under which they manifest as errors.
Static Analysis: Tools that examine the code without executing it can identify potential faults based on patterns, coding standards, or known problematic constructs.
Dynamic Analysis: By monitoring the software as it runs, dynamic analysis tools can identify issues like memory leaks or race conditions, pointing to potential faults in the system.
Logs and Monitoring: Continuous monitoring of software in production, combined with detailed logging, can offer insights into when and where faults manifest, even if they don't always cause immediate or overt errors.

Addressing Faults

Correction: This involves fixing the actual code or logic where the fault resides. It's the most direct approach but requires accurate diagnosis.
Compensation: In some cases, especially with legacy systems, directly fixing a fault might be too risky or costly. Instead, additional layers or mechanisms might be introduced to counteract or compensate for the fault.
Redundancy: In critical systems, redundancy can be used to mask faults. For example, if one component fails due to a fault, a backup can take over, ensuring continuous operation.

The Value of Learning from Faults

Every fault presents a learning opportunity. By analyzing faults, their origins, and their manifestations, development teams can improve their processes, making future versions of the software more robust and reliable. Feedback loops, where lessons from faults in production inform earlier stages of the development cycle, can be instrumental in creating better software over time.

Thread Bugs: Unraveling the Knot

In the vast tapestry of software development, threads represent a potent yet intricate tool. While they empower developers to create highly efficient and responsive applications by executing multiple operations simultaneously, they also introduce a class of bugs that can be maddeningly elusive and notoriously hard to reproduce: thread bugs.

This is such a difficult problem that some platforms eliminated the concept of threads entirely. This created a performance problem in some cases or shifted the complexity of concurrency to a different area. These are inherent complexities and while the platform can alleviate some of the difficulties, the core complexity is inherent and unavoidable.

A Glimpse into Thread Bugs

Thread bugs emerge when multiple threads in an application interfere with each other, leading to unpredictable behavior. Because threads operate concurrently, their relative timing can vary from one run to another, causing issues that might appear sporadically.

The Common Culprits Behind Thread Bugs

Race Conditions: This is perhaps the most notorious type of thread bug. A race condition occurs when the behavior of a piece of software depends on the relative timing of events, such as the order in which threads reach and execute certain sections of code. The outcome of a race can be unpredictable, and tiny changes in the environment can lead to vastly different results.
Deadlocks: These occur when two or more threads are unable to proceed with their tasks because they're each waiting for the other to release some resources. It's the software equivalent of a stand-off, where neither side is willing to budge.
Starvation: In this scenario, a thread is perpetually denied access to resources and thus can't make progress. While other threads might be operating just fine, the starved thread is left in the lurch, causing parts of the application to become unresponsive or slow.
Thread Thrashing: This happens when too many threads are competing for the system's resources, causing the system to spend more time switching between threads than actually executing them. It's like having too many chefs in a kitchen, leading to chaos rather than productivity.

Diagnosing the Tangle

Spotting thread bugs can be quite challenging due to their sporadic nature. However, some tools and strategies can help:

Thread Sanitizers: These are tools specifically designed to detect thread-related issues in programs. They can identify problems like race conditions and provide insights into where the issues are occurring.
Logging: Detailed logging of thread behavior can help identify patterns that lead to problematic conditions. Timestamped logs can be especially useful in reconstructing the sequence of events.
Stress Testing: By artificially increasing the load on an application, developers can exacerbate thread contention, making thread bugs more apparent.
Visualization Tools: Some tools can visualize thread interactions, helping developers see where threads might be clashing or waiting on each other.

Untangling the Knot

Addressing thread bugs often requires a blend of preventive and corrective measures:

Mutexes and Locks: Using mutexes or locks can ensure that only one thread accesses a critical section of code or resource at a time. However, overusing them can lead to performance bottlenecks, so they should be used judiciously.
Thread-safe Data Structures: Instead of retrofitting thread safety onto existing structures, using inherently thread-safe structures can prevent many thread-related issues.
Concurrency Libraries: Modern languages often come with libraries designed to handle common concurrency patterns, reducing the likelihood of introducing thread bugs.
Code Reviews: Given the complexity of multithreaded programming, having multiple eyes review thread-related code can be invaluable in spotting potential issues.

Race Conditions: Always a Step Ahead

The digital realm, while primarily rooted in binary logic and deterministic processes, is not exempt from its share of unpredictable chaos. One of the primary culprits behind this unpredictability is the race condition, a subtle foe that always seems to be one step ahead, defying the predictable nature we expect from our software.

What Exactly is a Race Condition?

A race condition emerges when two or more operations must execute in a sequence or combination to operate correctly, but the system's actual execution order is not guaranteed. The term "race" perfectly encapsulates the problem: these operations are in a race, and the outcome depends on who finishes first. If one operation 'wins' the race in one scenario, the system might work as intended. If another 'wins' in a different run, chaos might ensue.

Why are Race Conditions so Tricky?

Sporadic Occurrence: One of the defining characteristics of race conditions is that they don't always manifest. Depending on a myriad of factors such as system load, available resources, or even sheer randomness, the outcome of the race can differ, leading to a bug that's incredibly hard to reproduce consistently.
Silent Errors: Sometimes, race conditions don't crash the system or produce visible errors. Instead, they might introduce minor inconsistenciesdata might be slightly off, a log entry might get missed, or a transaction might not get recorded.
Complex Interdependencies: Often, race conditions involve multiple parts of a system or even multiple systems. Tracing the interaction that causes the problem can be like finding a needle in a haystack.

Guarding Against the Unpredictable

While race conditions might seem like unpredictable beasts, various strategies can be employed to tame them:

Synchronization Mechanisms: Using tools like mutexes, semaphores, or locks can enforce a predictable order of operations. For example, if two threads are racing to access a shared resource, a mutex can ensure that only one gets access at a time.
Atomic Operations: These are operations that run completely independently of any other operations and are uninterruptible. Once they start, they run straight through to completion without being stopped, altered, or interfered with.
Timeouts: For operations that might hang or get stuck due to race conditions, setting a timeout can be a useful fail-safe. If the operation doesn't complete within the expected time frame, it's terminated to prevent it from causing further issues.
Avoid Shared State: By designing systems that minimize shared state or shared resources, the potential for races can be significantly reduced.

Testing for Races

Given the unpredictable nature of race conditions, traditional debugging techniques often fall short. However:

Stress Testing: Pushing the system to its limits can increase the likelihood of race conditions manifesting, making them easier to spot.
Race Detectors: Some tools are designed to detect potential race conditions in code. They can't catch everything, but they can be invaluable in spotting obvious issues.
Code Reviews: Human eyes are excellent at spotting patterns and potential pitfalls. Regular reviews, especially by those familiar with concurrency issues, can be a strong defense against race conditions.

Performance Pitfalls: Monitor Contention and Resource Starvation

Performance optimization is at the heart of ensuring that software runs efficiently and meets the expected requirements of end users. However, two of the most overlooked yet impactful performance pitfalls developers face are monitor contention and resource starvation. By understanding and navigating these challenges, developers can significantly enhance software performance.

Monitor Contention: A Bottleneck in Disguise

Monitor contention occurs when multiple threads attempt to acquire a lock on a shared resource but only one succeeds, causing the others to wait. This creates a bottleneck as multiple threads are contending for the same lock, slowing down the overall performance.

Why It's Problematic

Delays and Deadlocks: Contention can cause significant delays in multi-threaded applications. Worse, if not managed correctly, it can even lead to deadlocks where threads wait indefinitely.
Inefficient Resource Utilization: When threads are stuck waiting, they aren't doing productive work, leading to wasted computational power.

Mitigation Strategies

Fine-grained Locking: Instead of having a single lock for a large resource, divide the resource and use multiple locks. This reduces the chances of multiple threads waiting for a single lock.
Lock-Free Data Structures: These structures are designed to manage concurrent access without locks, thus avoiding contention altogether.
Timeouts: Set a limit on how long a thread will wait for a lock. This prevents indefinite waiting and can help in identifying contention issues.

Resource Starvation: The Silent Performance Killer

Resource starvation arises when a process or thread is perpetually denied the resources it needs to perform its task. While it's waiting, other processes might continue to grab available resources, pushing the starving process further down the queue.

The Impact

Degraded Performance: Starved processes or threads slow down, causing the system's overall performance to dip.
Unpredictability: Starvation can make system behavior unpredictable. A process that should typically complete quickly might take much longer, leading to inconsistencies.
Potential System Failure: In extreme cases, if essential processes are starved for critical resources, it might lead to system crashes or failures.

Solutions to Counteract Starvation

Fair Allocation Algorithms: Implement scheduling algorithms that ensure each process gets a fair share of resources.
Resource Reservation: Reserve specific resources for critical tasks, ensuring they always have what they need to function.
Prioritization: Assign priorities to tasks or processes. While this might seem counterintuitive, ensuring critical tasks get resources first can prevent system-wide failures. However, be cautious, as this can sometimes lead to starvation for lower-priority tasks.

The Bigger Picture

Both monitor contention and resource starvation can degrade system performance in ways that are often hard to diagnose. A holistic understanding of these issues, paired with proactive monitoring and thoughtful design, can help developers anticipate and mitigate these performance pitfalls. This not only results in faster and more efficient systems but also in a smoother and more predictable user experience.

Final Word

Bugs, in their many forms, will always be a part of programming. But with a deeper understanding of their nature and the tools at our disposal, we can tackle them more effectively. Remember, every bug unraveled adds to our experience, making us better equipped for future challenges.

In previous posts in the blog I delved into some of the tools and techniques mentioned in this post.

Is OpenJDK Just a Drop-In Replacement?

Shai Almog — Tue, 08 Aug 2023 11:46:52 GMT

I dont know anyone who is still using the Oracle JDK. It has been my recommendation for quite a while to just switch to an OpenJDK distribution as they are roughly drop-in replacements for Oracles official JDK. Ive repeated that advice quite frequently but I guess I glossed over a lot of details that might be insignificant for hackers but can become a pretty big deal in an enterprise setting.

Following the review from Bazlur I chose to also pick up Simon Ritter's OpenJDK Migration for Dummies. This book has two things going against it:

For dummies - Ive never read one of these before and never considered reading those. While it does use overly simplified language I think the Dummies brand hurts this book. The subject matter is sophisticated and geared towards developers (and DevOps) who can follow the nuances. I think it might deter some developers from reading it, which is a shame.
It's a corporate book - Simon is the Deputy CTO at Azul. This creates the justified concern that the book is a promotion for Azul products. It has those. But having read through it, the material seems objective and valuable.
It does give one advantage: we're getting the book for free.

Unique Analysis

There are many Java books but this is the first time I read a book that explains these specific subjects. The first chapter discusses licensing, TCK (Test Compatibility Kit) and similar issues. Im familiar with all of them since I worked for Sun Microsystems and Oracle, I had a team composing TCKs for the mobile platform at Sun Microsystems. However, even experienced engineers outside of Sun might be unfamiliar with these tests.

The TCK is how we verify that our port of OpenJDK is still compatible with Java. The book illustrates why a reputable OpenJDK distribution can be trusted due to the TCK. This is knowledge thats probably not available elsewhere if you arent deeply involved in the JVM.

Simon explained nicely the scope of the current TCK (139k tests for Java 11), but I think he missed one important aspect: TCK isnt enforced. Oracle doesnt know you ran the TCK properly, it cant verify that. This is why OpenJDK vendors must have a good reputation and understanding of the underlying QA process.

This is just the beginning but pretty much every chapter covered material that I havent seen in other books.

As a side note, the whole TCK creation process is pretty insane. The engineers in my team would go over the JavaDoc like religious scholars and fill up Excel sheets with every statement made by the JavaDoc or implied by the Javadoc. Then devise tests to verify in isolation that every statement is indeed true. In that sense, TCK doesnt test quality. It tests compliance to a uniform, consistent standard. A JDK can fail after running for a week and we might not be able to tell from running the TCK alone, early releases of JDK 8 did exactly that at that time

Learning from a For Dummies Book

I mentioned at the top of this post that I treat the OpenJDK migration casually as a drop-in replacement. This book convinced me that this is not always the case, there are some nuances. I was casually aware of most of them e.g. I worked a lot with Pisces back in the day, but I never saw all of these nuances in a single place.

This is an important list for anyone considering a migration of this type. One should comb over these and verify the risks before embarking on such a migration. As a startup, you might not care about exact fonts or NTLM support, but in an enterprise environment, there are still projects that might rely on that.

In a later chapter comparing the various OpenJDK distributions, Simon included a great chart illustrating the differences. Take into consideration that Simon works for Azul and it is obvious in the chart. Still, the content of the chart is pretty accurate. I am missing the Microsoft VM in the comparison but I guess its a bit too new to be registered as a major vendor.

I did consulting work for major organizations quite often; such as banks, insurance companies etc. In these organizations commercial support is crucial. I used to scoff at that notion but as I ran into some of the edge cases those organizations run into, I get it. We had a senior engineer from IBM debug AIX and Websphere issues.

Similarly, a bank I worked with was having issues with RTL support in newer versions of Swing. As the older JDKs were nearing the end of their life cycle they were forced to migrate but had no way of addressing these issues. Oracles support for those issues was a dud in that case. Commercial support for the JVM isnt something I ever needed or wanted to buy, but I understand the motivation.

At the end of the book, Simon goes into more detail on the extra value that can be layered on top of an OpenJDK distribution. This was interesting to me as I often dont understand the its free business model. It helped me both in understanding the motivation for offering (and maintaining) an OpenJDK release. Its also valuable when I work with larger organizations, I can advise better on the value they can deliver for Java (e.g. fast response to zero-days, etc).

Who Should Read This Book?

Its not a book for everyone. If youre using the Oracle JDK then you need to pick this up and review it. Make sure the reasons you picked Oracle JDK are still valid, they probably arent. If your job includes picking the JDKs for provisioning or development then you should make sure youre familiar with the materials in the book.

If youre just learning Java or using it in a hobbyist capacity then theres an appendix on Javas history that might be interesting to you. But the book as a whole is probably more targeted at developers who are handling production. In that regard, its useful both for server and desktop developers.

BTW if you or someone you know is interested in learning Java please check out my new book for learning Java.

Debugging as a Process of Isolating Assumptions

Shai Almog — Tue, 01 Aug 2023 15:03:45 GMT

Debugging is an integral part of any software development process. It's a systematic hunt for bugs and mistakes that may be hidden in the intricate lines of your code. Much like a hunter and its prey, it requires a precise method and a set of specific tools. Let's delve deeper into the fascinating process of isolating assumptions to effectively debug your code.

https://youtu.be/Qw6coM798AE

Before I proceed with this week's post if you have friends who are learning to code I published a new book for Java beginners with no prior knowledge (for learning programming from scratch). Each chapter also has an accompanying video and I think theres no book quite like it for beginners. I would appreciate spreading the word on this.

The Role of Assumptions in Debugging

Debugging starts with defining the quadrants of the issue and then methodically eliminating possibilities until the root cause of the problem is found. However, this process can be dangerous, as it requires making assumptions about the code.

Often, debugging sessions fail or take longer than anticipated due to incorrect assumptions. A mistake at this stage can drastically elongate the debugging session compared to a mistake made at any other stage.

Think of it in terms of the popular TV show Dr. House. Its a Sherlock Holmes version of a medical drama where the lead character, a grumpy misanthrope, often faces challenges due to the incomplete or incorrect information provided by his patients. This is akin to debugging where we often spend most of our time on a wild goose chase due to incorrect assumptions or missing information.

The Solution to Wrong Assumptions: Double Verification

The best solution to erroneous assumptions is double verification. For every assumption, no matter how basic, we should find another approach to verify it. For instance, let's say we have a bug in code that depends on a result from a remote service. We assume the service works correctly, and we verified that by using the cURL program. To double-verify, we should also add a tracepoint in the code that shows we received the response.

As we narrow our assumptions, double verification may not always be necessary. However, if the process seems stuck, it's essential to revisit and ensure every stage has been verified. Often, we miss something simple and obvious, so it's important to verify the "low-hanging fruit" first.

Expanding the Assumptions' Perimeter: The Predator Analogy

Debugging can be likened to setting up a fence to trap a predator. If the fence is too small, we risk leaving the predator outside, wasting time searching within the confines of an empty enclosure. Conversely, if we fence off a large area, it will take considerable time to pinpoint the predator.

Similarly, in the context of debugging, the bug (predator) and its root cause might be located in different areas of the application (fenced territory). Therefore, we need to make careful assumptions and decide where to place our focus.

Understanding why the program behaves the way it does and the root cause is the ultimate goal of debugging. While locating the bug or even fixing it are crucial steps, the understanding gained is invaluable for future debugging sessions and long-term groundwork.

Prioritizing Assumptions to Find the Root Cause

When faced with a symptom, we need to prioritize our assumptions to find the root cause. Our intuition and experience often guide us toward that root cause. Yet, we must refrain from taking shortcuts that lead from the system directly to the root cause, as this approach often proves misleading. Instead, our focus should be on the next stage, which is the bug itself.

For instance, if a value is incorrect on the screen, we can start by verifying that the source returns the right value and work our way from there. However, when faced with a value that "shouldn't be here", we may not have an immediate course of action.

One effective solution for this could be time travel debugging, which lets us traverse the state of the application after execution is completed. This technique is especially beneficial when the issue is not easily reproducible, and we're dealing with a problematic state that's hard to investigate.

A Common Mistake: The Lure of a Quick Fix

A frequent mistake made during debugging is rushing to the line of code that contains the bug, as we assume we already know the problem. We then become focused on a specific area around the bug and waste a considerable amount of time investigating that code.

Instead, we should create a wider circle by debugging the code we assume works correctly. By doing this, we either discover the assumption isn't working as expected, and we widen our search, or we confirm that the assumption is functioning correctly, giving us a solid base to delve deeper into.

This is illustrated by the following diagram. We rush to narrow down the search scope and this leads us down the wrong path where we surround the symptom with assumptions and spend time there instead of looking for the root cause. It in itself can be in a completely different location from the location where the bug is expressed.

The downside is that the wide perimeter search is so much bigger, but it is sometimes a preferable scope of search.

Final Word

To sum it up, debugging is indeed a complex process that requires a careful and methodical approach to isolate and eliminate assumptions. By being aware of common pitfalls and adopting proven strategies such as double verification, prioritizing assumptions, and avoiding the lure of quick fixes, we can improve our debugging skills and make the process more efficient.

In the next article, we will further explore this topic by discussing common problems and solutions that can provide a practical application to these theoretical concepts. Stay tuned!

Can't Reproduce a Bug?

Shai Almog — Tue, 18 Jul 2023 12:58:32 GMT

The phrase it works on my machine can be a source of amusement, but it also represents a prevailing attitude in the world of development - an attitude that often forces users to prove bugs before we're willing to investigate them. But in reality, we need to take responsibility and chase the issue, regardless of where it takes us.

https://youtu.be/qGasKz-YsiU

A Two-Pronged Approach to Bug Solving

Solving bugs requires a two-pronged approach. Initially, we want to replicate the environment where the issue is occurring; it could be something specific to the user's machine. Alternatively, we may need to resort to remote debugging or use logs from the user's machine, asking them to perform certain actions on our behalf.

A few years back, I was trying to replicate a bug reported by a user. Despite matching JVM version, OS, network connectivity, and so forth, the bug simply wouldn't show up. Eventually, the user sent a video showing the bug, and I noticed they clicked differently within the UI. This highlighted the fact that often, the bug reproduction process is not just in the machine, but also in the user behavior.

The Role of User Behavior and Communication in Bug Solving

In these situations, it is crucial to isolate user behavior as much as possible. Using video to verify the behavior can prove helpful. Understanding the subtle differences in the replicated environment is a key part of this, and open, clear communication with the person who can reproduce the problem is a must.

However, there can be hurdles. Sometimes, the person reporting the issue is from the support department, while we might be in the R&D department. Sometimes, the customer might be upset, causing communication to break down. This is why I believe it's critical to integrate the R&D department with the support department to ensure a smoother resolution of issues.

Tools and Techniques for Bug Solving

Several tools such as strace, dtrace, and others can provide deep insights into a running application. This information can help us pinpoint differences and misbehaviors within the application. The advent of container technology like Docker has greatly simplified the creation of uniform environments, eliminating many subtle differences.

I was debugging a system that only failed at the customer's location. It turns out that their network connection was so fast, the round trip to the management server was completed before our local setup code finished its execution. I tracked it down by logging in remotely to their on-site machine and reproducing the issue there. Some problems can only manifest in a specific geographic location.

There are factors like networking differences, data source differences, and scale that can significantly impact the environment. How do you reproduce an issue that only appears when you have 1,000 requests per second in a large cluster? Observability tools can be extremely helpful in managing these situations. In that situation the debugging process changes, it's no longer about reproducing but rather about understanding the observable information we have for the environment as I discussed here.

Ideally, we shouldn't reach these situations since tests should have the right coverage. However, in practice, this is never the case. Many companies have long-run tests designed to run all night and stress the system to the max. They help discover concurrency issues before they even occur in the wild. Failures were often due to lack of storage (filled up everything with logs) but often when we got a failure it was hard to reproduce. Using a loop to re-run the code that failed many times was often a perfect solution. Another valuable tool was the Force Throw feature I discussed previously. This allowed us to fail gracefully and pass stumbling blocks in the long run.

Logging

Logging is an important feature of most applications; its the exact tool we need to debug these sorts of edge cases. I talked and wrote about logging before and its value.

Yes, logging requires forethought much like observability. We can't debug an existing bug without logging "already in place". Like many things, it's never too late to start logging properly and pick up best practices.

Concurrency

If a bug is elusive the odds of a concurrency-related issue are very high. If the issue is inconsistent then this is the place to start, verifying the threads involved and making sure the right threads are doing what you expect.

Use single thread breakpoints to pause only one specific thread and check if theres a race condition in a specific method. Use tracepoints where possible instead of breakpoints while debugging blocking hides or changes concurrency-related bugs, which are often the reason for the inconsistency.

Review all threads and try to give each one an edge by making the other threads sleep. A concurrency issue might only occur if some conditions are met. We can stumble onto a unique condition using such a technique.

Try to automate the process to get a reproduction. When running into issues like this, we often create a loop that runs a test case hundreds or even thousands of times. We do that by logging and trying to find the problem within the logs.

Notice that if the problem is indeed an issue in concurrent code, the extra logging might impact the result significantly. In one case I stored lists of strings in memory instead of writing them to the log. Then I dumped the complete list after execution finished. Using memory logging for debugging isnt ideal, but it lets us avoid the overhead of the logger or even direct console output (FYI console output is often slower than loggers due to lack of filtering and no piping).

When to "Give Up"

While it's never truly recommended to "give up," there may come a time when you must accept that reproducing the issue consistently on your machine is not feasible. In such situations, we should move on to the next step in the debugging process. This involves making assumptions about the potential causes and creating test cases to reproduce them.

In cases where we cannot resolve the bug, it's important to add logging and assertions into the code. This way, if the bug resurfaces, we'd have more information to work with.

The Reality of Debugging: A Case Study

At Codename One, we were using App Engine when our daily billing suddenly skyrocketed from a few dollars to hundreds. The potential cost was so high it threatened to bankrupt us within a month. Despite our best efforts, including educated guesses and fixing everything we could, we were never able to pinpoint the specific bug. Instead, we had to solve the problem through brute force.

In the end, bug-solving is about persistence and constant learning. It's about not only accepting the bug as a part of the development process but also understanding how we can improve and grow from each debugging experience.

TL;DR

The adage "it works on my machine" often falls short in the world of software development. We must take ownership of bugs, trying to replicate the user's environment and behaviors as closely as possible. Clear communication is key, and integration between R&D and support departments can be invaluable.

Modern tools can provide deep insights into running applications, helping us to pinpoint problems. While container technologies, like Docker, simplify the creation of uniform environments, differences in networking, data sources, and scale can still impact debugging.

Sometimes, despite our best efforts, bugs can't be consistently reproduced on our machines. In such cases, we need to make educated assumptions about potential causes, create test cases that reproduce these assumptions, and add logging and assertions into the code for future debugging assistance.

In the end, debugging is a learning experience that requires persistence and adaptability and is crucial for the growth and improvement of any developer.

The Theory of Debugging

Shai Almog — Tue, 11 Jul 2023 13:21:33 GMT

In the landscape of software development, bugs are an inevitable part of the journey, and debugging, albeit frustrating at times, is an integral part of the process. There's no escaping this truth, and the sooner we embrace it, the sooner we can master the art of debugging.

In the next few posts in this series, I will explain the little-known theory behind debugging. We all know the practice of debugging (to some degree) but there is also a theoretical underpinning that most of us never learned in University (I sure as hell didnt). Understanding this theory will help you apply a more methodical approach to problem resolution and will improve your understanding of your code.

Before we proceed Id like to mention that most of the content in this series is covered in my book Practical Debugging at Scale: Cloud Native Debugging in Kubernetes and Production (Apress).

Also while were on the subject of books, my new book Java 8 to 21: Explore and work with the cutting-edge features of Java 21 (BPB) is the #1 new release in Java Programming on Amazon and available now!

https://www.youtube.com/watch?v=sJkPV7njpVY

The Simplicity and Complexity of Bugs

Debugging is a labyrinthine journey, often reminiscent of Alice in Wonderland. It calls for acute observation, insatiable curiosity, calculated experimentation, and a sense of adventure. However, the general sentiment towards debugging is one of antagonism, largely because of the frustration it entails and the uncomfortable truths it uncovers.

The reality is, most bugs are embarrassingly simple in retrospect. When we finally pinpoint the issue, the common response is a groan of disbelief "How did I miss that?" While this reaction is natural, it breeds a sense of shame and inadequacy, often leading to impostor syndrome. Despite my 40 years of programming experience, I can confidently say that the bugs I encounter today are just as "stupid" as they were at the start. This constant humbling feeling, akin to a universal debugging facepalm, keeps me grounded.

The emotions experienced during debugging surprise, frustration, and humility serve as a reminder of our fallibility. Its akin to a form of meditation, keeping egos in check. Perhaps some leaders could even benefit from debugging as a method of grounding, bringing them closer to the realities of their tasks and teams.

An important principle I have when debugging is to start with stupid. I look for the dumbest mistake I can think of and in a surprising number of cases, its indeed the bug. This isnt a part of the theory

Embracing the Debugging Methodology

The first step in tracking a bug is identifying the likely area in the code. This involves searching through documentation and conducting basic research. From there, we need to devise a strategy to tackle the bug. This step is often overlooked in our haste to find a solution, leading to unstructured and disorganized approaches. We need to formulate a plan, make assumptions, and then test these assumptions.

Next, we should isolate the behavior causing the issue and aim to reproduce it consistently for testing. This could ideally be done in a local environment within the debugger. If we cant consistently reproduce a bug, we won't be able to truly verify our fix, adding uncertainty to the process.

Validation and Elimination

Following this, we must validate that the results of our tests and environment align with our initial assumptions. In the spirit of robustness, its advisable to have two forms of verification as one could potentially be flawed. I wrote about the importance of double verification in this post.

Once we've made these verifications, we proceed to the elimination stage, taking inspiration from Arthur Conan Doyle's famous quote:

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."

In other words, we need to "Sherlock Holmes" our problem and rule out possibilities until we're left with the most plausible explanation.

With a deeper understanding of the bug, we can move on to resolving the issue. The resolution process should include filing the issue, creating a failing test case, verifying the proposed fix resolves the test case, and committing both the bug and fix.

Reading the Docs: A Misconception

It's often said, 5 hours of debugging can save you 5 minutes of reading the docs. However, this saying is misleading. Reading the documentation is not the answer, especially considering the sheer volume of documents associated with APIs, platforms, systems, and more. Documentation is never read in five minutes and rarely memorized to a level that will solve a bug. In all the decades Ive been a developer I solved bugs by searching through the docs, but never by reading the docs in advance.

The key here is to know what, where, and when to search for the problem. Search engines and platforms like Stack Overflow have revolutionized debugging, enabling us to input error messages directly and find potential solutions. This method is not foolproof, but it's a good starting point.

The Importance of a Game Plan

Having a game plan saves us from being swallowed by the abyss of trial-and-error debugging. Many years ago, I lost two workdays due to a misplaced 'greater-than' character because I didn't use a methodical approach.

Before diving into a debugging process, it's crucial to answer questions like:

Can the user reproduce this?
Can I reproduce this on my machine?
Does the issue happen consistently?
Is the issue a regression?

The answers to these questions will shape your game plan and the course of your debugging process. In the end, patience and strategy can save precious time and prevent unnecessary frustration.

In our next installment, we'll explore gameplans for debugging issues that can't be reproduced. Stay tuned and embrace the debugging adventure!

Final Word

Debugging, despite being seen as frustrating, is an essential part of software development, offering moments of learning and personal growth. A methodical approach to debugging involves identifying the area of code responsible for the bug, formulating a strategic game plan, isolating and reproducing the bug for testing, and finally resolving the issue. A common misconception is that reading the documentation can save hours of debugging; however, it's more about knowing where and what to search for. Patience and a clear strategy can prevent unnecessary time waste and make the debugging process more efficient.

In the next installment, we will delve deeper into the game plan for debugging, particularly focusing on issues that are hard to reproduce. We will further explore strategies and tools that can assist in efficiently tackling such elusive bugs.

Building for Failure - Best Practices for Easy Production Debugging

Shai Almog — Tue, 04 Jul 2023 14:56:40 GMT

Before going into the content of this post, check out my new book which launched this week.

Quite a few years ago I was maintaining a database-driven system and ran into a weird production bug. The column I was reading from had a null value, but this wasnt allowed in the code and there was no place where that value could have been null. The database was corrupt in a bad way and we didnt have anything to go on. Yes, there were logs. But due to privacy concerns you cant log everything, even if we could, how would we know what to look for?

https://www.youtube.com/watch?v=uz06__w2mjI

Programs fail, thats inevitable. We strive to reduce failures but failure will happen. We also have another effort and it gets less attention: failure analysis. There are some best practices and common approaches, most famously logging. Ive often said before that logs are pre-cognitive debugging, but how do we create an application thats easier to debug?

How do we build the system so that when it fails like that we would have a clue of what went wrong?

A common military axiom goes Difficult training makes combat easy. Assuming the development stage is the training, any work we do here will be harder as we dont yet know the bugs we might face in production. But that work is valuable as we arrive prepared for production.

This preparation goes beyond testing and QA. It means preparing our code and our infrastructure for that point where a problem occurs. That point is where both testing and QA fail us. By definition, this is preparation for the unexpected.

Defining a Failure

We first need to define the scope of a failure. When I talk about production failures people automatically assume crashes, websites going down and disaster-level events. In practice those are rare. The vast majority of these cases are handled by OPS and system engineers.

When I ask developers to describe the last production problem they ran into they often stumble and cant recall. Then upon discussion and querying it seems that a recent bug they dealt with was indeed reported by a customer, in production. They had to somehow reproduce it locally or review information to fix it. We dont think of such bugs as production bugs but they are. The need to reproduce failures that already happened in the real world makes our job harder.

What if we could understand the problem just by looking at the way it failed, right in production?

Simplicity

The rule of simplicity is common and obvious but people use it to argue both sides. Simple is subjective. Is this block of code simple?

return obj.method(val).compare(otherObj.method(otherVal));

Or is this block simple?

var resultA = obj.method(val);var resultB = otherObj.method(otherVal);return resultA.compare(resultB);

In terms of lines of code, the first example seems simpler and indeed many developers will prefer that. This would probably be a mistake. Notice that the first example includes multiple points of failure in a single line. The objects might be invalid. There are three methods that can fail. If a failure occurs it might be unclear what part failed.

Furthermore, we cant log the results properly. We cant debug the code easily as we would need to step into individual methods. If a failure occurs within a method the stack trace should lead us to the right location even in the first example. Would that be enough?

Imagine if the methods we invoked there changed state. Was obj.method(val) invoked before otherObj.method(otherVal)?

With the second example, this is instantly visible and hard to miss. Furthermore, the intermediate state can be inspected and logged as the values of resultA and resultB.

Lets inspect a common example:

var result = list.stream()                 .map(MyClass::convert)                 .collect(Collectors.toList());

Thats a pretty common code that is similar to this code:

var result = new ArrayList();for(MyClass c: list) {    result.add(c.convert());}

There are advantages to both approaches in terms of debuggability and our decision can have a significant impact on the long-term quality. A subtle change in the first example is the fact that the returned list is unmodifiable. This is a boon and a problem. Unmodifiable lists fail at runtime when we try to change them, thats a potential risk of failure. However, the failure is clear. We know what failed.

A change to the result of the second list can create a cascading problem but might also simply solve a problem without failing in production.

Which should we pick?

The read-only list is a major advantage. It promotes the fail-fast principle which is a major advantage when we want to debug a production issue. When failing fast we reduce the probability of a cascading failure. Those are the worst failures we can get in production as they require a deep understanding of the application state which is complex in production.

When building big applications the word robust gets thrown around frequently. Systems should be robust, but they should offer that outside of your code which should fail fast.

Consistency

In my talk about logging best practices, I mention the fact that every company I ever worked for had a style guide for code, or at least aligned with a well-known style. Very few had a guide for logging, where should we log, what should we log etc. This is a sad state of affairs.

We need consistency that goes deeper than code formatting. When debugging we need to know what to expect. If specific packages are prohibited from use I would expect this to apply to the entire code base. If a specific practice in coding is discouraged Id expect this to be universal.

Thankfully, with CI these consistency rules are easy to enforce without burdening our review process. Automated tools such as SonarQube are pluggable and can be extended with custom detection code. We can tune these tools to enforce our set of consistent rules to limit usage to a particular subset of the code or require a proper amount of logging.

Every rule has an exception, we shouldnt be bound to overly strict rules. Thats why the ability to override such tools and merge a change with a developer review is important.

Double Verification

Debugging is the process of verifying assumptions as we circle the area of the bug. Typically this happens very quickly. We see whats broken, verify and fix it. But sometimes we spend an inordinate amount of time tracking a bug. Especially a hard-to-reproduce bug or a bug that only manifests in production.

As a bug becomes elusive its important to take a step back, usually, it means that one of our assumptions was wrong. In this case, it might mean that the way in which we verified the assumption was faulty. The point of double verification is to test the assumption that failed using a different approach to make sure the result is correct.

Typically we want to verify both sides of the bug, e.g. lets assume I have a problem in the backend. It would express itself via the front end where data is incorrect. To narrow the bug I initially made two assumptions:

The front end displays the data correctly from the backend
The database query returned the right data

To verify these assumptions I can open a browser and look at the data. I can inspect responses with the web developer tools to make sure the data displayed is what the server query returned. For the backend, I can issue the query directly against the database and see if the values are the correct ones.

But thats only one way of verifying this data. Ideally, we would want a second way. What if a cache returned the wrong result? What if the SQL made the wrong assumption?

The second way should ideally be different enough so it wouldnt simply repeat the failures of the first way. For the front-end code our knee-jerk reaction would be to try with a tool like cURL. Thats good and we probably should try that. But a better way might be to look at logged data on the server or invoke the WebService that underlies the front end.

Similarly, for the backend, we would want to see the data returned from within the application. This is a core concept in observability. An observable system is a system for which we can express questions and get answers. During development, we should aim our observability level at two different ways to answer a question.

Why not Three Ways to Verify?

We dont want more than two ways because that would mean were observing too much, and as a result, our costs can go up while performance goes down. We need to limit the information we collect to a reasonable amount. Especially given the risks of personal information retention which is an important aspect to keep in mind!

Observability is often defined through its tools, pillars or similar surface area features. This is a mistake. Observability should be defined by the access it provides us. We decide what to log and what to monitor. We decide the spans of the traces. We decide the granularity of the information and we decide whether we wish to deploy a developer observability tool.

We need to make sure that our production system will be properly observed. To do that we need to run failure scenarios and possibly chaos game days. When running such scenarios we need to think about the process of solving the issues that come up. What sort of questions would we have for the system, how could we answer such a question?

E.g. When a particular problem occurs we would often want to know how many users were actively modifying data in the system. As a result, we can add a metric for that information.

Verifying with Feature Flags

We can verify an assumption using observability tools but we can also use more creative verification tools. One unexpected tool is the feature flag system. A feature flag solution can often be manipulated with very fine granularity, we can disable or modify a feature only for a specific user, etc.

This is very powerful, we can toggle a feature that could provide us with verification of a specific behavior if that specific code is wrapped in a flag. I dont suggest spreading feature flags all over the code, but the ability to pull levers and change the system in production is a powerful debugging tool that is often underutilized as such.

Bug Debriefs

Back in the 90s I developed flight simulators and worked with many fighter pilots. They instilled in me a culture of debriefing. Up until that point, I thought of these things only for discussing failures but fighter pilots go to debrief immediately after the flight, whether it is successful or a failed mission.

There are a few important points we need to learn here:

Immediate - we need this information fresh in our minds. If we wait some things get lost and our recollection changes significantly.
On Success and Failure - Every mission gets things right and wrong. We need to understand what went wrong and what went right. Especially in successful cases.

When we fix a bug we just want to go home, we often dont want to discuss it anymore. Even if we do want to show off it's often our broken recollection of the tracking process. By conducting an open discussion of what we did right and wrong with no judgment. We can create an understanding of our current status. This information can then be used to improve our results when tracking issues.

Such debriefs can point at gaps in our observability data, inconsistencies and problematic processes. A common problem in many teams is indeed in the process. When an issue is raised it is often:

Encountered by the customer
Reported to support
Checked by ops
Passed to R&D

If youre in R&D youre four steps away from the customer and receive an issue that might not include the information you need. Refining these processes isnt a part of the code, but we can include tools within the code to make it easier for us to locate a problem. A common trick is to add a unique key to every exception object. This propagates all the way to the UI in case of a failure.

When a customer reports an issue theres a good possibility they will include the error key which R&D can find within the logs. These are the types of process refinements that often rise through such debriefs.

Review Successful Logs and Dashboards

Waiting for failure is a problematic concept. We need to review logs, dashboards, etc. regularly both to track potential bugs that arent manifesting, but also to get a sense of a baseline. What does a healthy dashboard or log look like

We have errors in a normal log, if during a bug hunt we spend time looking at a benign error, then were wasting our time. Ideally, we want to minimize the amount of these errors as they make the logs harder to read. The reality of server development is that we cant always do that. But we can minimize the time spent on this through familiarity and proper source code comments.

I went into more detail in the logging best practices post and talk.

Final Word

A couple of years after founding Codename One our Google App Engine bill suddenly jumped to a level that would trigger bankruptcy within days. This was a sudden regression due to a change on their backend.

This was caused because of uncached data but due to the way App Engine worked at the time there was no way to know the specific area of the code triggering the problem. There was no ability to debug the problem and the only way to check if the issue was resolved was to deploy a server update and wait a lot

We solved this through dumb luck. Caching everything we could think of in every single place. To this day I dont know what triggered the problem and what solved it.

What I do know is this:

I made a mistake when I decided to pick App Engine. It didnt provide proper observability and left major blind spots. Had I taken the time before the deployment to review the observability capabilities I would have known that. We lucked out but I could have saved a lot of our cash early on had we been more prepared.

GraphQL, JavaScript, Preprocessor, SQL and more in Manifold

Shai Almog — Tue, 27 Jun 2023 13:18:19 GMT

We reached the final installment of our Manifold series but not the end of its remarkable capabilities. Throughout this series, we have delved into various aspects of Manifold, highlighting its unique features and showcasing how it enhances Java development. In this article, we will cover some of the remaining features of Manifold, including its support for GraphQL, integration with JavaScript, and the utilization of a preprocessor. By summarizing these features and reflecting on the knowledge gained throughout the series, I hope to demonstrate the power and versatility of Manifold.

https://www.youtube.com/watch?v=7PzaVwZ68II

Expanding Horizons with GraphQL Support

GraphQL, a relatively young technology, has emerged as an alternative to REST APIs. It introduced a specification for requesting and manipulating data between client and server, offering an arguably more efficient and streamlined approach. However, GraphQL can pose challenges for static languages like Java.

Thankfully, Manifold comes to the rescue by mitigating these challenges and making GraphQL accessible and usable within Java projects. By removing the rigidness of Java and providing seamless integration with GraphQL, Manifold empowers developers to leverage this modern API style.

E.g. for this graphql file taken from the Manifold repository:

query MovieQuery($genre: Genre!, $title: String, $releaseDate: Date) {    movies(genre: $genre, title: $title, releaseDate: $releaseDate) {        id        title        genre        releaseDate    }}query ReviewQuery($genre: Genre) {    reviews(genre: $genre) {        id        stars        comment        movie {            id            title        }    }}mutation ReviewMutation($movie: ID!, $review: ReviewInput!) {    createReview(movie: $movie, review: $review) {        id        stars        comment    }}extend type Query {    reviewsByStars(stars: Int) : [Review!]!}

We can write this sort of fluent code:

var query = MovieQuery.builder(Action).build();var result = query.request(ENDPOINT).post();var actionMovies = result.getMovies();for (var movie : actionMovies) {  out.println(    "Title: " + movie.getTitle() + "\n" +    "Genre: " + movie.getGenre() + "\n" +    "Year: " + movie.getReleaseDate().getYear() + "\n");}

None of these objects need to be declared in advance, all we need are the GraphQL files.

Achieving Code Parity with JavaScript Integration

In some cases, regulatory requirements demand identical algorithms in both client and server code. This is common for cases like interest rate calculations where in the past we used Swing applications to calculate and display the rate. Since both the backend and front-end were in Java it was simple to have a single algorithm.

However, this can be particularly challenging when the client-side implementation relies on JavaScript. Manifold provides a solution by enabling the integration of JavaScript within Java projects. By placing JavaScript files alongside the Java code, developers can invoke JavaScript functions and classes seamlessly using Manifold. Under the hood, Manifold utilizes Rhino to execute JavaScript, ensuring compatibility and code parity across different environments.

E.g. this JavaScript snippet:

function calculateValue(total, year, rate) {  var interest = rate / 100 + 1;  return parseFloat((total * Math.pow(interest, year)).toFixed(4));}

Can be invoked from Java as if it was a static method:

var interest = Calc.calculateValue(4,1999, 5);System.out.println(interest);

Preprocessor for Java

While preprocessor-like functionality may seem unnecessary in Java due to its portable nature and JIT compilation, there are scenarios where conditional code becomes essential. For instance, when building applications that require different behavior in on-premises and cloud environments, configuration alone may not suffice. It would technically work, but it might leave proprietary bytecode in on-site deployments and that isnt something we would want. There are workarounds for this but they are often very heavy-handed for something relatively simple.

Manifold addresses this need by offering a preprocessor-like capability. By defining values in build.properties or through environment variables and compiler arguments, developers can conditionally execute specific code paths. This provides flexibility and control without resorting to complex build tricks or platform-specific code.

With Manifold, we can write preprocessor code such as:

#if SERVER_ON_PREM    onPremCode();#elif SERVER_CLOUD    cloudCode();#else    #error Missing definition: SERVER_ON_PREM or SERVER_CLOUD#endif

Reflecting on Manifold's Power

Throughout this series, we have explored the many features of Manifold, including type-safe reflection, extension methods, operator overloading, property support, and more. These features demonstrate Manifold's commitment to enhancing Java development and bridging the gap between Java and modern programming paradigms. By leveraging Manifold, developers can achieve cleaner, more expressive code while maintaining the robustness and type safety of the Java language.

Manifold is an evolving project with many niche features I didnt discuss in this series including the latest one SQL Support.

In a current Spring Boot project that Im developing, I chose to use Manifold over Lombok. My main reasoning was that this is a startup project so Im more willing to take risks. Manifold lets me tailor itself to my needs, I dont need many of the manifold features and indeed didnt add all of them. I will probably need to interact with GraphQL though and this was a big deciding factor when picking Manifold over Lombok. So far I am very pleased with the results and features such as entity beans work splendidly with property annotations.

I do miss the Lombok constructor's annotations though, I hope something like that will eventually make its way into Manifold. Alternatively, if I find the time I might implement this myself.

Final Word

As we conclude this journey through Manifold, it's clear that this library offers a rich set of features that elevate Java development to new heights. Whether it's simplifying GraphQL integration, ensuring code parity with JavaScript, or enabling conditional compilation through a preprocessor-like approach, Manifold empowers developers to tackle complex challenges with ease. We hope this series has provided valuable insights and inspired you to explore the possibilities that Manifold brings to your Java projects.

Dont forget to check out the past installments in this series to get the full scope of Manifolds power.

Understanding Security Vulnerabilities: A First Step in Preventing Attacks

Shai Almog — Tue, 13 Jun 2023 16:29:36 GMT

When I was a teenager, our local telephone company introduced a new service - the premium phone calls (AKA 1-900 numbers). The fun part was that we discovered a workaround to these charges by dialing the sequential local numbers which these 1-900 numbers would redirect to. If the "support number" for the 1-900 was 555-555 we would dial every number between 555-455 and 555-655 until we hit the jackpot...

Hours were spent dialing these numbers, leading us to make numerous calls for free. This attack is still prevalent today, and it's called Insecure Direct Object References (IDOR).

https://www.youtube.com/watch?v=HTp3cW1Sfq8

IDOR

In the digital world, IDOR is similar to our teen exploits. It means trying various ID numbers in sequence until we find the right one. A few years ago, a social network named Parler, which listed users by a sequential numeric ID, fell victim to this type of attack when a user was able to request and download the full list of users on that network.

E.g. their URLs looked like: https://site.com/viewUser?id=999

All a person needs to do is loop over valid sequential numbers and send the request to get the user information of everyone on that site. This is trivial and can be accomplished by anyone with relatively low technical skills.

To avoid such an attack, it is advised not to expose guessable or sequential numeric IDs to the end users. While UUID might seem long, it offers a more secure alternative. Additionally, request checking should be implemented. If a user is requesting information about a user they aren't connected to, that request should be blocked. Other effective mitigations include setting request quotas and delays between requests to prevent a massive data grab.

I won't go into these since they are typically implemented in the API gateway layer during provisioning. You can write this in code but it's a challenging task as you might have many endpoints with a great deal of complexity. The rule of thumb is to write as little code as you possibly can, more code means more bugs and a wider attack surface for a malicious hacker.

Vulnerabilities and Exploits

A crucial term in application security is vulnerability. It's a weakness or bug that can be likened to a hole in the fence surrounding your house. These vulnerabilities can reside in your code, libraries, Java itself, the operating system, or even physical hardware. However, not every vulnerability is exploitable. Just like a hole in your fence may not necessarily grant access to your house, vulnerabilities don't always mean your code can be hacked. Our aim is to plug as many holes as possible to make the task of exploiting our system more difficult.

I know the onion metaphor is tired by now but for security it makes a lot of sense. We need to enforce security at every layer. In the Log4Shell exploit that was exposed last year we had a major zero-day vulnerability. A zero-day vulnerability is a newly discovered vulnerability that no one knew about before, like a new hole in the fence.

The Log4Shell vulnerability relied on people logging information without validating it first. This was a bad practice before the vulnerability was known. If you used a Log4J version that had that vulnerability, but sanitized your data. You would have been safe despite that vulnerability.

SQL Injection

SQL injection involves building your own queries by concatenating a query string manually. Let's look at vulnerable SQL like this:

String sql = "SELECT * from Users WHERE id = " + id;

Considering the sample URL we used before we could request a URL like this: https://site.com/viewUser?id=1 OR true=true.

This URL would result in an attacker fetching all the users as the condition will become:

SELECT * from Users WHERE id = 1 OR true=true

Which is always true. This is a relatively tame outcome. SQL statements can be chained to drop tables deleting the entire database. A solution to this is using the prepared statement syntax, where the implementation treats all the content as a string. This prevents the SQL keywords from being exploited e.g.:

PreparedStatement sql = connection.prepareStatement("SELECT * from Users WHERE id = ?");sql.setString(1, id);

In this situation when we set the value for the id it will treat it as a string even if there are SQL keywords or special characters. Using APIs like JPA (Spring Data, Hibernate, etc.) will also protect you from SQL injection when using similar APIs.

Serialization

Java serialization is another common vulnerability. The lesson here is to avoid using serialization or requiring it, and instead running your app with a filter that blocks certain types of serialization.

This is something I discussed in a previous post so there's no point repeating it.

Cross-site Scripting (XSS)

Cross-site scripting, or XSS, is a complex attack. It involves injecting malicious scripts into websites that then run on every person's browser visiting the page. This can lead to theft of user cookies, which in turn allows the attacker to impersonate users on the website. Protecting against XSS involves validating user-supplied data, treating it as display content, not executable code.

Let's say I have a submit form that accepts user input that is saved to the database. Like the comments section in the blog. I can post in JavaScript code that would submit the user's cookies to a site I control. Then I can steal this information and impersonate a user. This is a very common and surprising attack, it's often performed by encoding the script into a link sent by email.

These are three types of XSS attacks:

Stored XSS (Persistent) - The attack I described here is a form of stored XSS since the comment I would submit is saved in the database. At this point, every user that looks at the comment is attacked.
Reflected XSS (Non-persistent) - In this form, the attacker sends a link to a user (or convinces the user to click on a link) that contains the malicious script. When the user clicks the link, the script runs, sending their data to the attacker. The script is embedded in the URL and reflected off the web server. This is usually part of a phishing attack.
DOM-Based XSS - This type of attack occurs entirely in the victim's browser. The web application's client-side scripts write user-provided data to the Document Object Model. The data is subsequently read from the DOM by the web application and outputted to the browser. If the data was interpreted as JavaScript, it's executed.

Protecting from XSS requires diligent validation of all input. We can protect against these attacks by checking if user-provided data is of the correct form and contains no malicious content. We must ensure any user-supplied content is treated as display content, not executable code.

There are many ways to validate user-submitted data and the Jsoup library contains one such API. Notice that Spring Boot contains XSS protection as part of the security configuration but Ill cover that later.

personName = Jsoup.clean(personName, Whitelist.basic());

Notice that validating input is a recurring theme when it comes to security vulnerabilities. As developers we often strive to provide the most generic and flexible tooling, this works against us when it comes to security vulnerabilities. It's important to limit input options even when we don't see a problem.

Content-Security-Policy (CSP)

One of the ways to carry out an XSS attack is by including foreign code into our own website. One way to block this is using special HTTP headers to define which sites can include our site. This is a rather elaborate process but the nice thing is that Spring Security handles that nicely for us as well.

HttpOnly Cookies

Cookies can be created in the browser using JavaScript. This is a bad practice. Ideally, cookies should always come from the server and be marked as HTTP only (and HTTPS only). This blocks JavaScript code from accessing the cookie.

That means that even if a script is added somehow or a bad link is clicked, it wont have access to the cookie value. This mitigates XSS attacks so even if your site vulnerable the attack can't steal the cookie. We can enable HttpOnly cookies when we set the cookie in the server code.

Unvalidated Redirects and Forwards

Another security concern is unvalidated redirects and forwards. Here, an attacker creates a URL that looks like it's coming from your domain, but redirects to another malicious site. The solution lies in validating and restricting included or submitted URLs, and never sending users blindly to third-party sites.

Lets say we have a login page. After we login were shown a splash screen and then were sent to the actual destination. This seems simple enough but some people need to go to page X and others need to go to page Y. We want to keep the code generic so we accept the destination URL as an argument. That way the login code can decide where to go next and we dont need to know about all the user types e.g.: https://bug.com/postLogin?dest=url.

The problem is that a person can create a URL that looks like its coming from our domain, but pass in another URL as the last argument. Our users can end up on a malicious site without realizing they were redirected to a new site.

The solution is to validate and restrict included or submitted URLs and never send a user blindly to a third-party site.

Server Side Request Forgery (SSRF)

SSRF attacks are similar conceptually, in these attacks our server performs a request based on the request we received. Our server can be manipulated to request arbitrary URLs for an attacker. This can serve as the basis for information theft, denial of service attacks, etc.

Cross-Site Request Forgery (CSRF)

CSRF is another challenging issue where an attacker tricks users into hacking their own account. Typically, were logged into a website. Our credentials and cookies are already set. If a different website knows were logged in to a website it can trick us and get us to hack ourselves...

Let's say you visit a website and it has a big button that you can press for your chance to win a million dollars. Would you press it?

Whats the harm right?

If that button is a form that submits the request directly to your bank, this can be used to steal currency and more.

The standard solution is to add a server-generated token into the HTML that changes with every request, thus validating that the HTML came from the legitimate site. This is a standard strategy supported by Spring Security.

We can also set our cookies to the SameSite policy which will mean a user wont be logged in if hes on a separate site. Turning this on for your login information is probably a good idea.

Final Word

In conclusion, while we did not delve into a lot of code writing in this post, the objective was to shed light on common security vulnerabilities and attacks, and how to prevent them. Understanding these concepts is fundamental in building secure applications, and the more we're aware, the better equipped we are to thwart potential threats.

There are many tools for security validation, if you use a decent linter like SonarQube you would be on your way to a more secure app. Snyk also has great tooling that can help catch various vulnerabilities.

This paragraph from the post probably sums up the most important aspects:

Notice that validating input is a recurring theme when it comes to security vulnerabilities. As developers we often strive to provide the most generic and flexible tooling, this works against us when it comes to security vulnerabilities. It's important to limit input options even when we don't see a problem.

Everything Bad in Java is Good for You

Shai Almog — Tue, 06 Jun 2023 16:17:08 GMT

Everything Bad is Good for You is a pop culture book that points out that some things we assume are bad (like TV) have tremendous benefits to our well-being. I love the premise of disrupting the conventional narrative and was reminded of that constantly when debating some of the more controversial features and problems in Java. Its a feature, not a bug

One of my favorite things about Java is its tendency to move slowly and deliberately. It doesnt give us what we want right away. The Java team understands the requirements and looks at the other implementations, then learns from them.

Id say Javas driving philosophy is that the early bird is swallowed by a snake.

https://www.youtube.com/watch?v=cWdkMfmpsps

Checked Exceptions

One of the most universally hated features in Java is checked exceptions. They are the only innovative feature Java introduced as far as I recall. Most of the other concepts in Java existed in other languages, checked exceptions are a brand new idea that other languages rejected. They arent a fun feature, I get why people dont like them. But they are an amazing tool.

The biggest problem with checked exceptions is the fact that they dont fit nicely into functional syntax. This is true for nullability as well (which I will discuss shortly). Thats a fair complaint. Functional programming support was tacked onto Java and in terms of exception handling it was poorly done. The Java compiler could have detected checked exceptions and required an error callback. This was a mistake made when these capabilities were introduced in Java 8. E.g. if these APIs were better introduced into Java we could have written code like this:

api.call1()    .call2(() -> codeThatThrowsACheckedException())    .errorHandler(ex -> handleError(ex))    .finalCall();

The compiler could force us to write the errorHandler callback if it was missing which would satisfy the spirit of the checked exceptions perfectly. This is possible because checked exceptions are a feature of the compiler, not the JVM. A compiler could detect a checked exception in the lambda and require a specially annotated exception handling callback.

Why wasnt something like this added?

This is probably because of the general dislike of checked exceptions. No one attempted to come up with an alternative. No one likes them because no one likes the annoying feature that forces you to tidy up after yourself. We just want to code, checked exceptions force us to be responsible even when we just want to write a simple hello world

This is, to a great extent, a mistake We can declare that main throws an exception and create a simple hello world without handling checked exceptions. In large application frameworks like Spring, checked SQLException is wrapped with a RuntimeException version of the same class. You might think Im against that but Im not. Its a perfect example of how we can use checked exceptions to clean up after the fact. Cleanup is performed internally by Spring, at this point the exception-handling logic is no longer crucial and can be converted to a runtime exception.

I think a lot of the hate towards the API comes from bad versions of this exception such as MalformedURLException or encoding exceptions. These exceptions are often thrown for constant input that should never fail. Thats just redundant and a bad use of language capabilities. Checked exceptions should only be thrown when theres cleanup we can do. Thats an API problem, not a problem with the language feature.

Null

Pouring hate on null has been trending for the past 15+ years. Yes, I know that quote. I think people misuse it.

Null is a fact of life today, whether you like it or not. Its inherent in everything: databases, protocols, formats, etc. Null is a deep part of programming and will not go away in the foreseeable future.

The debate over null is pointless. The debate that matters is whether the cure is better than the disease and Im yet unconvinced. What matters isnt if null was a mistake, what matters is what we do now.

To be fair, this directly correlates to your love of functional programming paradigms. Null doesnt play nicely in FP which is why it became a punching bag for the FP guys. But are we stepping back or stepping forward?

Lets break this down into three separate debates:

Performance
Failures
Ease of programming

Performance

Null is fast. Super fast. Literally free. The CPU performs a null check for us and handles exceptions as interrupts. We dont need to write code to handle null. The alternatives can be very low overhead and can sometimes translate to null for CPU performance benefits. But this is harder to tune.

Abstractions leak and null is the way our hardware works. For most intents and purposes, it is better.

There is a caveat. We need the ability to mark some objects as non-null for better memory layout (as Valhalla plans to do). This will allow for better memory layout and can help speed up code. Notice that we can accomplish this while maintaining object semantics, a marker would be enough.

I would argue that null takes this round.

Failures

People hate NullPointerException. This baffles me.

NullPointerException is one of the best errors to get. Its the fail-fast principle. The error is usually simple to understand and even when it isnt; it isnt far off. Its an easy bug to fix. The alternative might include initializing an empty object which we need to verify or setting a dummy object to represent null.

Open a database that has been around long enough and search for undefined. I bet it has quite a few entries Thats the problem with non-null values. You might not get a failure immediately. You will get something far worse. A stealth bug that crawls through the system and pollutes your data.

Since null is so simple and easy to detect theres a vast number of tools that can deal with it both in runtime and during development. When people mention getting a null pointer exception in production I usually ask: what would have been the alternative?

If you could have initialized the value to begin with then why didnt you do it?

Java has the final keyword, you can use that to keep non-null stateful values. Mutable values are the main reason for uninitialized or null values. Its very possible that a non-null language wouldnt fail. But would its result be worse?

In my experience, corrupt data in storage is far worse. The problem is insidious and hides under the surface. Theres no clue as to the origin of the problem and we need to set traps to track it down. Give me a fail-fast any day.

In my opinion, null has this one hands down

Ease of Programming

An important point to understand is that null is a requirement of modern computing. Our entire ecosystem is built on top of null. Languages like Kotlin demonstrate this perfectly, they have null and non-null objects.

This means we have duplication. Every concept related to objects is expressed twice, and we need to maintain semantics between null and non-null. This raises the bar of complexity for developers new to such languages and makes for some odd syntax.

This in itself would be fine if the complexity paid off. Unfortunately, such features only resolve the most trivial non-issue cases of null. The complex objects arent supported since they contain null retrieved from external sources. Were increasing language complexity for limited benefit.

Boilerplate

This used to be a bigger issue in the past but looking at a typical Java file vs. TypeScript or JavaScript the difference isnt as big. Still, people nitpick. A smart engineer I know online called the use of semicolons in languages: "Laziness".

I dont get that. I love the semicolon requirement and am always baffled by people who have a problem with that. As an author it lets me format my code while ignoring line length. I can line break wherever I want, the semicolon is the part that matters. If anything, I would have loved to cancel the ability to write conditional statements without the curly braces e.g.:

if(..) x();else y();

Thats terrible. I block these in my style requirements; they are a recipe for disaster with an unclear beginning or end.

Java forces organization, this is a remarkable thing. Classes must be in a specific file and packages map to directories. This might not matter when your project is tiny, but as you handle a huge code base, this becomes a godsend. You would instantly know where to look for clues. That is a powerful tool. Yet, it leads to some verbosity and some deep directory structures. But Java was designed by people who build 1M LoC projects, it scales nicely thanks to the boilerplate. We cant say the same for some other languages.

Moving Fast

Many things arent great in Java, especially when building more adventurous startup projects. Thats why Im so excited about Manifold. I think its a way to patch Java with all the cool stuff we want while keeping the performance, compatibility and stability we love.

This can let the community move forward faster and experiment, while Java as a platform can take the slow and steady route.

Final Word

Conventional wisdom is problematic. Especially when it is so one-sided and presents a single-dimension argument in which a particular language feature is inferior. There are tradeoffs to be made and my bias probably shines through my words.

However, the cookie cutter counterpoints dont cut it. The facts dont present a clear picture to their benefit. Theres always a tradeoff and Java has walked a unique tightrope. Even a slight move in the wrong direction can produce a fast tumbling-down effect. Yet it maintains its traction despite the efforts of multiple different groups to display it as antiquated. This led to a ridiculous perception among developers of Python and JavaScript as newer languages.

I think the solution for that is two-fold. We need to educate about the benefits of Java's approach to these solutions. We also need solutions like Manifold to explore potential directions freely. Without the encumberment of the JCP. Having a working proof of concept will make integrating new ideas into Java much easier.

Manifold vs. Lombok: Enhancing Java with Property Support

Shai Almog — Tue, 30 May 2023 13:24:02 GMT

Today, we'll explore an intriguing aspect of Manifold: its property support. In comparison, we'll also examine Lombok, a popular library that offers similar features. By analyzing the differences between these two projects, we can gain insight into their core philosophies and advantages. Let's dive in and discover how Manifold's property support stands out.

https://www.youtube.com/watch?v=z-9jSMq_Yk8

As always, you can find the code examples and additional resources for this post on my GitHub page.

Comparing Manifold and Lombok

Manifold and Lombok are very different projects but they do overlap in a few areas. Properties are where the overlap is greatest and the philosophical differences between the two shine out the most.

Let's explore these differences in detail, highlighting the strengths and limitations of each. The following table provides a high-level comparison:

	Lombok	Manifold
Maturity	Old	New
IDE Support	All Major IDEs Supported	IntelliJ/IDEA
Exit Strategy	Easy	None
Extensibility	Challenging	Modular
Scope	Limited	Extensive

Maturity, IDE Support and Exit Strategy

Lombok has been around for quite some time, and many of its features were designed for older versions of Java. Consequently, some of its functionalities may no longer be relevant or applicable. Furthermore, Lombok's integration with modern IDEs can be limited, and converting Lombok code back to plain Java sources may not be straightforward. While Lombok provides essential boilerplate removal, it is often perceived as a temporary solution or "band-aid" due to its underlying approach.

Manifold's Extensibility and Scope

Manifold offers a more extensible and pluggable framework with a well-designed architecture. While it may still face growing pains and limited IDE support, Manifold's flexibility allows for more ambitious projects and future enhancements. It leverages the strengths of Java's type system while providing unique features.

Property Support

Manifold's property support is based on a concept that has been discussed for decades in the Java community. However, reaching a consensus on the direction proved challenging so a JEP never took hold. On a personal note, I was deeply involved with this discussion and strongly advocated for object properties. With Valhalla, its possible that object properties will become the golden standard moving forward.

Manifold takes a more standard approach that aligns with other languages such as C# and Kotlin, providing familiar and powerful property notation. This is good, it can provide short-term relief to the verbosity of getters and setters.

Differentiating Manifold's Property Support

To understand Manifold's property support, let's examine its equivalent code to the Lombok example. This is a standard Plain Old Java Object (POJO). Notice the get and set methods? Those are standard Java properties, powerful but heavy on the boilerplate:

public class HelloPojo {    private int number;    private String str;    public int getNumber() {        return number;    }    public void setNumber(int number) {        this.number = number;    }    public String getStr() {        return str;    }    public void setStr(String str) {        this.str = str;    }}

We can write the same code using Lombok as such:

@Getter@Setterpublic class HelloLombok {    private int number;    private String str;}

The usage is identical:

obj.setNumber(5);

This is problematic. You will notice we define the fields as private, yet suddenly a setNumber() method appears out of thin air. It feels weird. I get the logic though. The creators of Lombok wanted it to be a drop-in replacement. You could write the rest of the code with setters so if you choose to remove Lombok you dont need to edit the rest of the code. It still stands out as weird.

The Manifold equivalent is similar but has some nuances:

public class HelloProperties {    @var int number;    @var String str;}

One notable distinction is that Manifold's properties are defined at the individual field level, rather than applying to an entire class. Although Manifold may introduce a similar feature to Lombok's Data annotation in the future, it does not currently exist. We need to explicitly define each property.

By default, Manifold makes the property private and generates public accessors (getters and setters). However, just like Lombok, Manifold allows customization through annotations.

The big difference is in the usage. Unlike Lombol, Manifold enables accessing the property directly, eliminating the need for explicit method calls. This behavior applies not only to properties but also to regular setters and getters, creating a consistent and intuitive experience. For example, accessing Calendar.getInstance() as Calendar.instance or retrieving the time from an instance using timeInMillis instead of getTimeInMillis() showcases the syntactic clarity and conciseness offered by Manifold:

// In Manifold a property acts like field accessvar properties = new HelloProperties();properties.number = 5;// even POJO getters and setters act like field accessvar pojo = new HelloPojo();pojo.number = 5;// This is equivalent to Calendar.getInstance().getTimeInMillis()var time = Calendar.instance.timeInMillis;

Customizing Manifold Properties

Similar to Lombok, Manifold offers customization options for individual properties. By using val, a read-only property with only a getter can be defined, behaving similarly to a final field. Conversely, set defines a write-only property. Additionally, scoping preferences can be passed as arguments, influencing the generated methods' visibility.

This is an example class that customizes such scoping:

public class HelloScoping {    @val(PropOption.Package) int number = 5;    final @set String str;}

Which we can use as such:

var scoping = new HelloScoping();scoping.str = "";System.out.println(scoping.number);// This line wont compile due to read only propertyscoping.number = 4;// This line wont compile due to write only propertySystem.out.println(scoping.str);

Encapsulation and Method Implementation

Despite the accessibility of properties in Manifold, encapsulation is not compromised. If a setter or getter method is explicitly implemented, Manifold recognizes the custom implementation and seamlessly interacts with the property. For instance, implementing setNumber() allows the code to treat access to the number field accordingly, maintaining consistent behavior throughout the codebase as such:

public class HelloComputed {    @var int number;    @var String str;    public void setNumber(int number) {        this.number = number - 1;    }}

The following code will print 4:

computed.number = 5;System.out.println(computed.number);

Other Considerations

While both Lombok and Manifold offer useful features, it's important to consider some aspects that may influence your decision. Lombok provides annotations for generating equals, hashCode, and toString methods. It offers annotations that generate constructors, builder patterns, and loggers, all of which are currently missing from Manifold.

Additionally, Lombok's @SneakyThrows annotation enables throwing checked exceptions without explicit declaration or handling. Manifold addresses this with a similar global configuration, disabling checked exceptions throughout the codebase. That might be a bit heavy-handed for most developers. However, it's worth noting that these are not direct equivalents and have different implications.

We can sum this up in the following table:

	Lombok	Manifold
Data & Value
Equals/Hashcode/toString
Constructors
Builders
Loggers
.?
Fluid Dot Syntax
Smart Scoping

Final Word

As we wrap up our exploration of property support in Manifold and Lombok, it's evident that both projects bring unique approaches to enhance Java development. While Lombok has been a popular choice, its limitations and legacy features may prompt developers to seek alternative solutions. On the other hand, Manifold's extensibility, architectural soundness, and distinctive features make it an enticing option for modern Java projects. Manifold is young and still has some limitations in this particular department. That might hinder its adoption in the short term.

Personally, Im still on the fence. I think Manifold has great potential here. I feel it's a better thought-out solution. Lombok feels a bit like a bandaid by comparison. The main benefit of Manifold is as a whole the project provides far more capabilities than Lombok.

Remember to check out the code examples and resources on my GitHub page to further deepen your understanding of Manifold's property support.

Operator Overloading in Java

Shai Almog — Tue, 23 May 2023 14:02:45 GMT

In this post, we'll delve into the fascinating world of operator overloading in Java. Although Java doesn't natively support operator overloading, we'll discover how Manifold can extend Java with that functionality. We'll explore its benefits, limitations, and use cases, particularly in scientific and mathematical code.

We will also explore three powerful features provided by Manifold that enhance the default Java type safety while enabling impressive programming techniques. We'll discuss unit expressions, type-safe reflection coding, and fixing methods like equals during compilation. Additionally, we'll touch upon a solution that Manifold offers to address some limitations of the var keyword. Let's dive in!

https://www.youtube.com/watch?v=pwQs-308OdY

Before we begin as always, you can find the code examples for this post and other videos in this series on my GitHub page. Be sure to check out the project, give it a star, and follow me on GitHub to stay updated!

Arithmetic Operators

Operator overloading allows us to use familiar mathematical notation in code, making it more expressive and intuitive. While Java doesn't support operator overloading by default, Manifold provides a solution to this limitation.

To demonstrate, let's start with a simple Vector class that performs vector arithmetic operations. In standard Java code, we define variables, accept them in the constructor, and implement methods like plus for vector addition. However, this approach can be verbose and less readable.

public class Vec {   private float x, y, z;   public Vec(float x, float y, float z) {       this.x = x;       this.y = y;       this.z = z;   }   public Vec plus(Vec other) {       return new Vec(x + other.x, y + other.y, z + other.z);   }}

With Manifold, we can simplify the code significantly. Using Manifold's operator overloading features, we can directly add vectors together using the + operator as such:

Vec vec1 = new Vec(1, 2, 3);Vec vec2 = new Vec(1, 1, 1);Vec vec3 = vec1 + vec2;

Manifold seamlessly maps the operator to the appropriate method invocation, making the code cleaner and more concise. This fluid syntax resembles mathematical notation, enhancing code readability.

Moreover, Manifold handles reverse notation gracefully. If we reverse the order of the operands, such as a scalar plus a vector, Manifold swaps the order and performs the operation correctly. This flexibility enables us to write code in a more natural and intuitive manner.

Lets say we add this to the Vec class:

public Vec plus(float other) {    return new Vec(x + other, y + other, z + other);}

This will make all these lines valid:

vec3 += 5.0f;vec3 = 5.0f + vec3;vec3 = vec3 + 5.0f;vec3 += Float.valueOf(5.0f);

In this code, we demonstrate that Manifold can swap the order to invoke Vec.plus(float) seamlessly. We also show that the plus equals operator support is built into the plus method support

As implied by the previous code Manifold also supports primitive wrapper objects specifically in the context of autoboxing. In Java, primitive types have corresponding wrapper objects. Manifold handles the conversion between primitives and their wrapper objects seamlessly, thanks to autoboxing and unboxing. This enables us to work with objects and primitives interchangeably in our code. There are caveats to this as we will find out.

BigDecimal Support

Manifold goes beyond simple arithmetic and supports more complex scenarios. For example, the manifold-science dependency includes built-in support for BigDecimal arithmetic. BigDecimal is a Java class used for precise calculations involving large numbers or financial computations. By using Manifold, we can perform arithmetic operations with BigDecimal objects using familiar operators, such as +, -, *, and /. Manifold's integration with BigDecimal simplifies code and ensures accurate calculations.

The following code is legal once we add the right set of dependencies which add method extensions to the BigDecimal class:

var x = new BigDecimal(5L);var y = new BigDecimal(25L);var z = x + y;

Under the hood, Manifold adds the applicable plus, minus, times, etc. methods to the class. It does so by leveraging class extensions which I discussed before.

Limits of Boxing

We can also extend existing classes to support operator overloading. Manifold allows us to extend classes and add methods that accept custom types or perform specific operations. For instance, we can extend the Integer class and add a plus method that accepts BigDecimal as an argument and returns a BigDecimal result. This extension enables us to perform arithmetic operations between different types seamlessly. The goal is to get this code to compile:

var z = 5 + x + y;

Unfortunately, this wont compile with that change. The number five is a primitive, not an Integer and the only way to get that code to work would be:

var z = Integer.valueOf(5) + x + y;

This isnt what we want. However, theres a simple solution. We can create an extension to BigDecimal itself and rely on the fact that the order can be swapped seamlessly. This means that this simple extension can support the 5 + x + y expression without a change:

@Extensionpublic class BigDecimalExt {    public static BigDecimal plus(@This BigDecimal b, int i) {        return b.plus(BigDecimal.valueOf(i));    }}

List of Arithmetic Operators

So far we focused on the plus operator but Manifold supports a wide range of operators. The following table lists the method name and the operators it supports:

Operator	Method
`+` , `+=`	`plus`
`-`, `-=`	`minus`
``, `=`	`times`
`/`, `/=`	`div`
`%`, `%=`	`rem`
`-a`	`unaryMinus`
`++`	`inc`
`--`	`dec`

Notice that the increment and decrement operators dont have a distinction between the prefix and postfix positioning. Both a++ and ++a would lead to the inc method.

Index Operator

The support for the index operator took me completely off guard when I looked at it. This is a complete game-changer The index operator is the square brackets we use to get an array value by index. To give you a sense of what Im talking about, this is valid code in Manifold:

var list = List.of("A", "B", "C");var v = list[0];

In this case, v will be A and the code is the equivalent to invoking list.get(0). The index operators seamlessly map to get and set methods. We can do assignment as well using:

var list = new ArrayList<>(List.of("A", "B", "C"));var v = list[0];list[0] = "1";

Notice I had to wrap the List in an ArrayList since List.of() returns an unmodifiable List. But this isnt the part Im reeling about. That code is nice. This code is absolutely amazing:

var map = new HashMap<>(Map.of("Key", "Value"));var key = map["Key"];map["Key"] = "New Value";

Yes!

Youre reading valid code in Manifold. An index operator is used to lookup in a map. Notice that a map has a put() method and not a set method. Thats an annoying inconsistency that Manifold fixed with an extension method. We can then use an object to look up within a map using the operator.

Relational and Equality Operators

We still have a lot to cover Can we write code like this (referring to the Vec object from before):

if(vec3 > vec2) {    // }

This wont compile by default. However, if we add the Comparable interface to the Vec class this will work as expected:

public class Vec implements Comparable<Vec> {    //     public double magnitude() {        return Math.sqrt(x  x + y  y + z * z);    }    @Override    public int compareTo(Vec o) {        return Double.compare(magnitude(), o.magnitude());    }}

These >=, >, <, <= comparison operators will work exactly as expected by invoking the compareTo method. But theres a big problem. You will notice that the == and != operators are missing from this list. In Java we often use these operators to perform pointer comparisons, this makes a lot of sense in terms of performance. We wouldnt want to change something so inherent in Java. To avoid that, Manifold doesnt override these operators by default.

However, we can implement the ComparableUsing interface which is a sub-interface of the Comparable interface. Once we do that the == and != will use the equals method by default. We can override that behavior by overriding the method equalityMode() which can return one of these values:

CompareTo - will use the compareTo method for == and !=
Equals (the default) - will use the equals method
Identity - will use pointer comparison as is the norm in Java

That interface also lets us override the compareToUsing(T, Operator) method. This is similar to the compareTo method but lets us create operator-specific behavior which might be important in some edge cases.

Unit Expressions for Scientific Coding

https://www.youtube.com/watch?v=r4iycWnI5fE

Notice that Unit expressions are experimental in Manifold. But they are one of the most interesting applications of operator overloading in this context.

Unit expressions are a new type of operator that significantly simplifies and enhances scientific coding while enforcing strong typing. With unit expressions, we can define notations for mathematical expressions that incorporate unit types. This brings a new level of clarity and type safety to scientific calculations.

For example, consider a distance calculation where speed is defined as 100 miles per hour. By multiplying the speed (miles per hour) by the time (hours), we can obtain the distance as such:

Length distance = 100 mph * 3 hr;Force force = 5kg * 9.807 m/s/s;if(force == 49.035 N) {    // true}

The unit expressions allow us to express numeric values (or variables) along with their associated units. The compiler checks the compatibility of units, preventing incompatible conversions and ensuring accurate calculations. This feature streamlines scientific code and enables powerful calculations with ease.

Under the hood, a unit expression is just a conversion call. The expression 100 mph is converted to:

VelocityUnit.postfixBind(Integer.valueOf(100))

This expression returns a Velocity object. The expression 3 hr is similarly bound to the postfix method and returns a Time object. At this point, the Manifold Velocity class has a times method which as you recall, is an operator and its invoked on both results:

public Length times( Time t ) {    return new Length( toBaseNumber() * t.toBaseNumber(), LengthUnit.BASE, getDisplayUnit().getLengthUnit() );}

Notice that the class has multiple overloaded versions of the times method that accept different object types. A Velocity times Mass will produce Momentum. A Velocity times Force results in Power.

Many units are supported as part of this package even in this early experimental stage, check them out here.

You might notice a big omission here: Currency. I would love to have something like:

var sum = 50 USD + 70 EUR;

If you look at that code the problem should be apparent. We need an exchange rate. This makes no sense without exchange rates and possibly conversion costs. The complexities of financial calculations dont translate as nicely to the current state of the code. I suspect that this is the reason this is still experimental. Im very curious to see how something like this can be solved elegantly.

Pitfalls of Operator Overloading

While Manifold provides powerful operator overloading capabilities, it's important to be mindful of potential challenges and performance considerations. Manifold's approach can lead to additional method calls and object allocations, which may impact performance, especially in performance-critical environments. It's crucial to consider optimization techniques, such as reducing unnecessary method calls and object allocations, to ensure efficient code execution.

Lets look at this code:

var n = x + y + z;

On the surface, it can seem efficient and short. It physically translates to this code:

var n = x.plus(y).plus(z);

This is still hard to spot but notice that in order to create the result we invoke two methods and allocate at least two objects. A more efficient approach would be:

var n = x.plus(y, z);

This is an optimization we often do for high-performance matrix calculations. You need to be mindful of this and understand what the operator is doing under the hood if performance is important. I dont want to imply that operators are inherently slower. In fact theyre as fast as a method invocation, but sometimes the specific method invoked and amount of allocations are unintuitive.

Type Safety Features

The following arent related to operator overloading but they were a part of the second video so I feel they make sense as part of a wide-sweeping discussion on type safety. One of my favorite things about Manifold is its support of strict typing and compile time errors. To me, both represent the core spirit of Java.

JailBreak: Type-Safe Reflection

@JailBreak is a feature that grants access to the private state within a class. While it may sound bad, @JailBreak offers a better alternative to using traditional reflection to access private variables. By jailbreaking a class, we can access its private state seamlessly, with the compiler still performing type checks. In that sense, its the lesser of two evils. If youre going to do something terrible (accessing private state), then at least have it checked by the compiler.

In the following code, the value array is private to String yet we can manipulate it thanks to the @JailBreak annotation. This code will print Ex0osed:

@Jailbreak String exposedString = "Exposed...";exposedString.value[2] = '0';System.out.println(exposedString);

JailBreak can be applied to static fields and methods as well. However, accessing static members requires assigning null to the variable, which may seem counterintuitive. Nonetheless, this feature provides a more controlled and type-safe approach to accessing the internal state, minimizing the risks associated with using reflection.

@Jailbreak String str = null;str.isASCII(new byte[] { 111, (byte)222 });

Finally, all objects in Manifold are injected with a jailbreak() method. This method can be used like this (notice that fastTime is a private field):

Date d = new Date();long t = d.jailbreak().fastTime;

Self Annotation: Enforcing Method Parameter Type

In Java, certain APIs accept objects as parameters, even when a more specific type could be used. This can lead to potential issues and errors at runtime. However, Manifold introduces the @Self annotation, which helps enforce the type of the object passed as a parameter.

By annotating the parameter with @Self, we explicitly state that only the specified object type is accepted. This ensures type safety and prevents the accidental use of incompatible types. With this annotation, the compiler catches such errors during development, reducing the likelihood of encountering issues in production.

Lets look at the MySizeClass from my previous posts:

public class MySizeClass {    int size = 5;    public int size() {        return size;    }    public void setSize(int size) {        this.size = size;    }    public boolean equals(@Self Object o) {        return o != null && ((MySizeClass)o).size == size;    }}

Notice I added an equals method and annotated the argument with Self. If I remove the Self annotation this code will compile:

var size = new MySizeClass();size.equals("");size.equals(new MySizeClass());

With the @Self annotation the string comparison will fail during compilation.

Auto Keyword: A Stronger Alternative to Var

Im not a huge fan of the var keyword. I feel it didnt simplify much and the price is coding to an implementation instead of to an interface. I understand why the devs at Oracle chose this path. Conservative decisions are the main reason I find Java so appealing. Manifold has the benefit of working outside of those constraints and it offers a more powerful alternative called auto. auto can be used in fields and method return values, making it more flexible than var. It provides a concise and expressive way to define variables without sacrificing type safety.

Auto is particularly useful when working with tuples, a feature not yet discussed in this post. It allows for elegant and concise code, enhancing readability and maintainability. You can effectively use auto as a drop-in replacement for var.

Finally

Operator overloading with Manifold brings expressive and intuitive mathematical notation to Java, enhancing code readability and simplicity. While Java doesn't natively support operator overloading, Manifold empowers developers to achieve similar functionality and use familiar operators in their code. By leveraging Manifold, we can write more fluid and expressive code, particularly in scientific, mathematical, and financial applications.

The type safety enhancements in Manifold make Java more Well, Java like. It lets Java developers build upon the strong foundation of the language and embrace a more expressive type-safe programming paradigm.

Should we add operator overloading to Java itself?

I'm not in favor. I love that Java is slow, steady and conservative. I also love that Manifold is bold and adventurous. That way I can pick it when I'm doing a project where this approach makes sense (e.g. a startup project), but pick standard conservative Java for an enterprise project.

Logging Best Practices Revisited

Shai Almog — Mon, 15 May 2023 11:54:42 GMT

As I write this my interview on DevCentral hasn't started yet so if you subscribe to my blog or follow me on socials you might be able to catch it live. If not the recording should appear right here:

https://www.youtube.com/watch?v=CFL--dAX3FQ

Either way, this isn't the first time I wrote about or talked about logging and the common pitfalls we see when logging in production or debugging. I covered this extensively in the old blog. I also did a video covering these ideas. But my ideas somewhat evolved around some of the concepts I discussed.

In my original post, I was a bit harsh on AOP logging. My opinion on this has evolved. I think the main problem with AOP logging is that it is often used as a sledgehammer when debugging. Another problem is leaving it on in production. But when it is used surgically it can uncover problems that would be much harder to uncover in any other way.

The main message of the original post is still the most important part: we need company-wide standardization of logging. Without that our code review process is useless.

Logging is Precognitive Debugging

In my debugging book, I spent quite a bit of time talking about logging. First, it's important to understand that logging is very different from print debugging. When you use print statements for debugging they are ephemeral, in a bad way. You should use tracepoints. But more importantly, logging is about the bug that hasn't happened yet. Print debugging is about the bug that is already there.

They are nothing alike.

A log should describe our system. When we read a log we can often see the code quality without inspecting a single line of source. Uniformity, consistency, conciseness, order and value. These are all properties of a good log which is the output of a well-oiled machine. Logging is a user interface designed for your field-work engineering team. If it is written badly, they won't be as effective when carrying out their jobs. Your product will suffer.

Logging will pay back dividends with early detection of problems and simpler debugging of tests. But to do that we need to give a lot of thought to the core process.

Which variables should we log?

Where should we place the log?

How many logs should we have per block of code?

These are all questions that we can answer for the general case. I answered them all in the original blog. But that isn't an authoritative answer, it's an opinion. We need to enforce standards around this.

Let me qualify that last statement. We need standards. I love that we can measure coverage and then standardize the amount of test coverage. I think that can be very helpful. However, standardizing a fixed number like test coverage without flexibility leads to terrible code that's only designed to reach the unattainable metric. We need flexibility, and a baseline to align against. Not rigid rules.

Costs, Energy and Performance

While managers might look at the financial bottom line for overlogging. To me, the more significant aspect is the environmental impact. This has a cascading effect throughout our industry. More logging and ingestion require more computing services. If big companies take up more computing resources it drives up pricing for all of us due to scarcity.

We can do our part for the environment, the company's bottom line and our industry. There are many strategies we can take to reduce logging to a reasonable minimum. Setting the log levels intelligently and consistently. Monitoring our logs regularly, etc.

It is often that a request that would have been served only by the cache is forced to perform an IO operation to satisfy logging. The impact on overall system performance can be tremendous yet hard to notice. If our production and dev environments differ these subtle differences can further mask such inconsistencies.

Join Us

I hope to see your questions in the live stream or here before/after the fact. When I gave my logging talk before, I got amazing and highly engaging feedback from the audience. Either way, the recording should be there after the fact so check it out.

Logging got a lot of attention as a pillar of observability. However, the developer perspective of logging seems to have fallen to the wayside and doesn't enjoy the same level of attention. Let's change that.

Extending Java APIs - Add Missing Features Without the Hassle

Shai Almog — Tue, 09 May 2023 14:00:59 GMT

The Java API is vast. Thats great, but sometimes a missing method or capability can be frustrating. With Manifold, developers can solve this problem without having to wait for Java to add a feature in a later version. Even more importantly, Manifold provides many such extensions out of the box, making Java better for developers. In this post, we will discuss the extension capability. It lets us change classes in the Java API in a compatible way without risk. If Java adds these APIs later then we can seamlessly switch to them.

https://www.youtube.com/watch?v=4wv-wtxa9-g

Before I proceed, the code for this tutorial and the previous tutorials in the series is available here. I will skip setting up as it was covered in the previous installments of the series. You can see the specific pom file settings for this time in the project directly.

Extensions

One frequent complaint about Java is verbosity. With Manifold, developers can remove unnecessary code by using a simple trick. For instance, the stream() method in Java is redundant. With Manifold, we can remove that stream method using extensions.

Extensions work similarly to interface default methods. We can define a method that appears to be part of an API, but we write it in a separate class. Manifold finds the extension by looking for packages with a name matching the class name, in the code below you will notice the package name is com.debugagent.extensions.java.util.Collection. This package includes the full class name for the Collection interface. In this package, we can create extension classes by marking them with the @Extension annotation. We can extend any object or interface in this way:

package com.debugagent.extensions.java.util.Collection;import com.debugagent.extensions.Sizable;import manifold.ext.rt.api.Extension;import manifold.ext.rt.api.This;import java.util.Collection;import java.util.function.Function;import java.util.stream.Stream;@Extensionpublic class CollectionExt {    public static  Stream map( @This Collection t, Functionsuper E, R> mapper ) {        return t.stream().map(mapper);    }}

The sample code here adds a map method to the Collection interface. Notice a few things about the code:

This is passed as the first argument and annotated as @This.
We use the @Extension annotation to denote that this is an extension.
The method itself is static.
Generics and other language features still work as expected.

Once we include that code in that package we can change this:

List numbers = strings.stream().map(Integer::parseInt).toList();

To this:

List numbers = strings.map(Integer::parseInt).toList();

That is a small change but it makes Java a bit less verbose. You might read that code and think this is a lot of code to write in order to save nine characters. Youd be right. We dont need to write that code at all

Extension Libraries

Manifold has ready-made extensions for many built-in Java classes. In fact, all the methods of Stream are already mapped to Collection by Manifold. We can package our extensions as libraries and use the ones from Manifold, they dont collide. E.g. if Manifold extends a class by adding method x() and add method y() to the same class they will both work as expected.

To understand this and some of the other capabilities around this feature we first need to understand how it works. Manifold cant add a method to an existing Java class. The JVM doesnt allow it. Instead, it does something very simple, it replaces the call to the class with a call to the extension. Since this is done during compile time the call is static and very efficient. Theres no inherent runtime overhead.

If during this process Manifold notices that a method with that signature already exists, it uses the actual method. This is important, if we want to use a feature that isnt yet available in the current version of Java, we can mock it with Manifold and then when we finally upgrade the transition would be seamless. We wont even need to remove Manifold!

The extension libraries include too many features to cover here. String is extended with many common methods such as substringAfterLast(delimiter), removePrefix(String), padStart(length, char), etc. Code like this does what you would expect:

var str = "https://debugagent.com";str = str.removePrefix("https://");

One of the great parts about this is that a control-click (or meta-click) navigates to the right code within Manifold. So we can see exactly what goes on under the hood. Manifold includes the following builtin extensions (taken from GitHub here):

Collections

Defined in module manifold-collections this library extends:

java.lang.Iterable
java.util.Collection
java.util.List
java.util.stream.Stream

Text
Defined in module manifold-text this library extends:

java.lang.CharSequence
java.lang.String

I/O
Defined in module manifold-io this library extends:

java.io.BufferedReader
java.io.File
java.io.InputStream
java.io.OutputStream
java.io.Reader
java.io.Writer

Web/JSON

Defined in module manifold-json this library extends:

java.net.URL
manifold.rt.api.Bindings

Extending Arrays

Arrays in Java are special objects. You cant inherit them or override them. Sure, they are technically an object, but there isnt much we can do. An array in Java has only one attribute and its pretty limited. With Manifold, any array has many methods that are already built in. Most of them are from the Arrays class and now occupy their rightful place within the object itself.

With Manifold, this is legal code:

String[] arr = strings.toArray(new String[0]);if(arr.isEmpty()) {    // ...}

We can extend an array like we can extend any other class. We extend it using the special case class name manifold.rt.api.Array. In that class, the methods are the same as the other extension code but since primitive arrays and object arrays are incompatible, the @This parameter is always an Object.

Structural Interfaces

Manifold can add methods to classes and interfaces. But it cant make a class implement an arbitrary interface e.g. I cant make ArrayList implement Runnable. However, we can do something else. The main limitation is due to the way Manifold works. Since Manifold changes the call at compile time, the method isnt really added to the class. As a result it cant generate an interface, only a proxy of sorts. This might seem good on the surface but it creates some compatibility issues as it can impact object identity, etc.

However, we can get around that problem partially using structural interfaces. We can annotate an interface with @Structural, and it will work differently than a regular interface.

We can define a structural interface like this:

@Structuralpublic interface Sizable {    int size();}

Notice that nothing is special about this interface other than the structural annotation. We can then write a String extension to implement the size() method and implement the Sizable interface as such:

package com.debugagent.extensions.java.lang.String;import com.debugagent.extensions.Sizable;import manifold.ext.rt.api.Extension;import manifold.ext.rt.api.This;@Extensionpublic abstract class StringExt implements Sizable {    public static int size(@This String str) {        return str.length();    }}

This might seem a bit weird at first as the class is abstract. This doesnt matter. Manifold only uses the static methods so it never needs to create an instance of the class. We implement the interface as a hint to Manifold. The abstract keyword is there just so we wont need to implement the interface.

We can make the CollectionExt example from before abstract as well and implement Sizable there too. Since it already has the size() method this is all it requires. As a result, we can write code like this showing that the interface that we made up can now be used to write polymorphic code utilizing String and Collection:

List strings = List.of("1", "2", "3");var str = "https://debugagent.com";Sizable s = str;Sizable v = strings;s.size();

But it doesnt end here. When we create a regular class like this one that doesnt implement the Sizable interface:

public class MySizeClass {    public int size() {        return 0;    }}

We can now do this:

Sizable z = new MySizeClass();

This effectively makes Java into a structurally typed language. I dont know how I feel about that particular feature, but it sure is an interesting feature.

Adding Annotations

Amazingly enough, Manifold supports extending a class with custom compile-time annotations. This means we can inject Structural to an interface in the JDK such as Runnable. This will surprisingly work to some degree.

However, since the JDK itself cant be changed, we wont see this impacting code in the JDK. Only our code will be impacted by any such change we make.

Final Word

Manifold makes it easy to change Java and experiment with new features. Developers can contribute to Project Manifold and help a broader audience see the value of their proposed change. If you ever felt like Java was missing something, Manifold is your way to add that to Java. Its much easier to change something there than in OpenJDK

I think this is one of the most important features in Manifold, not because its my favorite. But because it democratizes Java. If you never had the chance to contribute to the Java API then now is a great opportunity. Find a pet peeve in the JDK and fix it!

Java String Templates Today

Shai Almog — Tue, 02 May 2023 13:40:00 GMT

In our last post, we introduced you to the Manifold project and how it offers a revolutionary set of language extensions for Java, including the ability to parse and process JSON files seamlessly in Java. Today, we will take a look at another exciting feature of the Manifold project: string templates.

https://www.youtube.com/watch?v=ite6a5y50yQ

But before we get to that, some of the feedback I got from the previous post was that I was unclear about Manifold. Manifold is a combination of an IDE plugin and plugins to Maven or Gradle. Once used we can enhance the Java language (or environment) almost seamlessly in a fluid way.

A frequent question was how is it different from something like Lombok?

There are many similarities and in fact, if you understand Lombok then you are on your way to understand Manifold. Lombok is a great solution for some problems in the Java language. It is a bandaid on the verbosity of Java and some of its odd limitations (I mean bandaid as a compliment, no hate mail). Manifold differs from Lombok in several critical ways:

Its modular - all the extensions built into Manifold are separate from one another. You can activate a particular feature or leave it out of the compiler toolchain.
Its bigger - Lombok has many features but Manifold's scope is fantastic and far more ambitious.
It tries to do the right thing - Lombok is odd. We declare private fields but then use getters and setters as if they arent private. Manifold uses properties (which we will discuss later) that more closely resemble what the Java language should have offered.

Manifold also has some drawbacks:

It only works as a compiler toolchain and only in one way. Lombok can be compiled back to plain Java source code and removed.
It only supports one IDE - IntelliJ.

These partially relate to the age of Manifold which is a new project by comparison. But it also relates to the different focus. Manifold focuses on language functionality and a single working fluid result.

JEP 430 String Interpolation

One of the big features coming to JDK 21 is JEP 430, which is a string interpolation language change. It will allow writing code like this:

String name = "Joan";String info = STR."My name is \{name}";

In this case, info will have the value My name is Joan. This is just the tip of the iceberg in this JSR as the entire architecture is pluggable. I will discuss this in a future video but for now, the basic functionality we see here is pretty fantastic.

Unfortunately, it will take years to use this in production. It will be in preview in JDK 21, then it will be approved. We will wait for an LTS, and then wait for the LTS to reach critical mass. In the meantime, can we use something as nice as this today?

Maven Dependencies

Before we dive into the code, I want to remind you that all the code for this and other videos in this series is available on GitHub (feel free to star it and follow).

String templating has no dependencies. We still need to make changes to the pom file but we dont need to add dependencies. Im adding one dependency here for the advanced templates we will discuss soon. All thats needed is the compiler plugin. That means that string templates are a compile-time feature and have no runtime impact!

<dependencies>   <dependency>       <groupId>systems.manifoldgroupId>       <artifactId>manifold-templates-rtartifactId>       <version>${manifold.version}version>   dependency>dependencies><build>   <plugins>       <plugin>           <groupId>org.apache.maven.pluginsgroupId>           <artifactId>maven-compiler-pluginartifactId>           <version>3.8.0version>           <configuration>               <source>19source>               <target>19target>               <encoding>UTF-8encoding>               <compilerArgs>                                      <arg>-Xplugin:Manifoldarg>               compilerArgs>                              <annotationProcessorPaths>                   <path>                       <groupId>systems.manifoldgroupId>                       <artifactId>manifold-stringsartifactId>                       <version>${manifold.version}version>                   path>                   <path>                       <groupId>systems.manifoldgroupId>                       <artifactId>manifold-templatesartifactId>                       <version>${manifold.version}version>                   path>               annotationProcessorPaths>           configuration>       plugin>   plugins>build>

Manifold String Interpolation

To begin, we can create a new variable that we can use to get external input. In the second line, we integrate that variable into the printout:

String world = args.length > 0 ? args[0] : "world";System.out.println("Hello $world! I can write \$world as the variable...");

The backslash syntax implicitly disables the templating behavior, just like in other string elements in Java. This will print Hello world! I can write $world as the variable.

Theres something that you cant really see in the code, you need to look at a screenshot of the same code:

Its subtle, do you see it?

Notice the $world expression, it is colored differently. It's no longer just a string but a variable embedded in a string. This means that we can control-click it and go to the variable declaration, rename it, or see find its usage.

There's another way to escape a string, and we can use the @DisableStringLiteralTemplates annotation on a method or a class to disable this functionality in the respective block of code. This can be useful if we use the dollar sign frequently in a block of code:

@DisableStringLiteralTemplatesprivate static void noTemplate(String word) {   System.out.println("Hello $world!");}

Templates

The Manifold project allows us to create JSP-like templates without all of the baggage. We can define a base class to a template to create generic code for the templates and place common functionality in a single location. We can create a file called HelloTemplate.html.mtl in the resources/templates directory with the following content. Notice the params we define in the template file can be anything:

<%@ params(String title, String body) %>html><html lang="en"><head> <meta charset="UTF-8"> <title>${title}title>head><body>   ${body}body>html>

This will seem very familiar to those of us with a JSP background. We can then use the file in the Java code like this, we can pass the parameters and they will replace the appropriate blocks in the HTML file:

System.out.println(HelloTemplate.render("My Title", "My Body"));

Notice the generated template it compiled to a class, similarly to JSP. Unlike JSP this template isnt a servlet and can be used in any context. A local application, a server, etc. The templating language is more lightweight and doesnt depend on various server APIs. It is also less mature. The main value is in using such an API to generate arbitrary files like Java source files or configuration files.

The templating capabilities are powerful yet simple. Just like we could in JSP, we can embed Java source code into the template e.g. we can include control flow and similar restrictions just like we could in JSP:

<% if(body != null) {%>   ${body}<% } %>

Why Not: JSP, Velocity, Thymeleaf or Freemarker?

There are so many templating languages in Java already Adding yet another one seems like a heavy burden of replication. I think all of those are great and this isnt meant to replace them, at least not yet.

Their focus is very much on web generation, they might not be ideal for more fluid use cases like code generation or web frameworks like Spark.

Another big advantage is size and performance. All of these frameworks have many dependencies and a lot of runtime overhead. Even JSP performs the initial compilation in runtime by default. This templating support is compiled and Spartan, in a good way. Its fast, simple and deeply integrated into the application flow.

Import

We can import Java packages just like we can in every Java class using code like this:

<%@ import com.debugagent.stringtemplates.* %>

Once imported we can use any class within the code. Notice that this import statement must come above other lines in the code, just like a regular import statement.

Include

We can use include to simply include another template into the current template, allowing us to assemble sophisticated templates like headers and footers. If we want to generate a complex Java class, we can wrap the boilerplate in a generic template and include that in. We can conditionally include a template using an if statement and use a for loop to include multiple entries:

<%@ include JavaCode("ClassName", classBody) %>

Notice that we can include an entry with parameters and pass them along to the underlying template. We can pass hardcoded strings or variables along the include chain.

A Lot More

I skipped extends because of a documentation issue, which has since been fixed. it has a lot of potential. Theres layout functionality that has a lot of potential but is missing parameter passing at the moment. But the main value is in the simplicity and total integration.

When I define a dependency on a class and remove it from the code the error appears even in the template file. This doesnt happen in Thymeleaf.

Final Word

In conclusion, with the Manifold project, we can write fluent text processing code today without waiting for a future JVM enhancement. The introduction of String templates can help Java developers generate files that aren't a web application, which is useful in several cases e.g. where code generation is needed.

Manifold allows us to create JSP-like templates without all of the baggage and generate any arbitrary file we want. With the inclusion of sophisticated options like layout, the sky's the limit.

Theres a lot more to Manifold and we will dig deeper into it as we move forward.

Revolutionize JSON Parsing in Java with Manifold

Shai Almog — Tue, 25 Apr 2023 12:53:13 GMT

Java developers have often envied JavaScript for its ease of parsing JSON. Although Java offers more robustness, it tends to involve more work and boilerplate code. Thanks to the Manifold project, Java now has the potential to outshine JavaScript in parsing and processing JSON files. Manifold is a revolutionary set of language extensions for Java that completely changes the way we handle JSON (and much more).

https://www.youtube.com/watch?v=AoBnGZ7q6rk

Getting Started with Manifold

The code for this tutorial can be found on my GitHub page. Manifold is relatively young but already vast in its capabilities. You can learn more about the project on their website and Slack channel. To begin, you'll need to install the Manifold plugin, which is currently only available for JetBrains IDEs. The project supports LTS releases of Java, including the latest JDK 19.

We can install the plugin from IntelliJ/IDEAs settings UI by navigating to the marketplace and searching for Manifold. The plugin makes sure the IDE doesnt collide with the work done by the Maven/Gradle plugin.

Manifold consists of multiple smaller projects, each offering a custom language extension. Today, we'll discuss one such extension, but there's much more to explore.

Setting Up a Maven Project

To demonstrate Manifold, we'll use a simple Maven project (it also works with Gradle). We first need to paste the current Manifold version from their website and add the necessary dependencies. The main dependency for JSON is the manifold-json-rt dependency. Other dependencies can be added for YAML, XML, and CSV support. We need to add this to the pom.xml file in the project.

I'm aware of the irony where the boilerplate reduction for JSON starts with a great deal of configuration in the Maven build script. But this is configuration, not "actual code" and it's mostly copy & paste. Notice that if you want to reduce this code the Gradle equivalent code is terse by comparison.

This line needs to go into the properties section:

<manifold.version>2023.1.5manifold.version>

The dependencies we use are these:

<dependencies>   <dependency>       <groupId>systems.manifoldgroupId>       <artifactId>manifold-json-rtartifactId>       <version>${manifold.version}version>   dependency>

The compilation plugin is the boilerplate that weaves Manifold into the bytecode and makes it seamless for us. Its the last part of the pom setup:

<build>   <plugins>       <plugin>           <groupId>org.apache.maven.pluginsgroupId>           <artifactId>maven-compiler-pluginartifactId>           <version>3.8.0version>           <configuration>               <source>19source>               <target>19target>               <encoding>UTF-8encoding>               <compilerArgs>                                      <arg>-Xplugin:Manifoldarg>               compilerArgs>                              <annotationProcessorPaths>                   <path>                       <groupId>systems.manifoldgroupId>                       <artifactId>manifold-jsonartifactId>                       <version>${manifold.version}version>                   path>               annotationProcessorPaths>           configuration>       plugin>   plugins>build>

With the setup complete, let's dive into the code.

Parsing JSON with Manifold

We place a sample JSON file in the project directory under the resources hierarchy. I placed this file under src/main/resources/com/debugagent/json/Test.json:

{  "firstName": "Shai",  "surname": "Almog",  "website": "https://debugagent.com/",  "active": true,  "details":[    {"key": "value"}  ]}

In the main class, we refresh the Maven project, and you'll notice a new Test class appears. This class is dynamically created by Manifold based on the JSON file. If you change the JSON and refresh Maven, everything updates seamlessly. Its important to understand that Manifold isnt a code generator. It compiles the JSON we just wrote into bytecode.

The Test class comes with several built-in capabilities, such as a type-safe builder API that lets you construct JSON objects using builder methods. You can also generate nested objects and convert the JSON to a string by using the write() and toJson() methods.

It means we can now write:

Test test = Test.builder().withFirstName("Someone")        .withSurname("Surname")        .withActive(true)        .withDetails(List.of(                Test.details.detailsItem.builder().                        withKey("Value 1").build()        ))        .build();

Which will printout the following JSON:

{  "firstName": "Someone",  "surname": "Surname",  "active": true,  "details": [    {      "key": "Value 1"    }  ]}

We can similarly read a JSON file using code such as this:

Test readObject = Test.load().fromJson("""        {          "firstName": "Someone",          "surname": "Surname",          "active": true,          "details": [            {              "key": "Value 1"            }          ]        }        """);

Note the use of Java 15 TextBlock syntax for writing a long string. The load() method returns an object that includes various APIs for reading the JSON. In this case, it is read from a String but there are APIs for reading it from a URL, file, etc.

Manifold supports various formats, including CSV, XML, and YAML, allowing you to generate and parse any of these formats without writing any boilerplate code or sacrificing type safety. In order to add that support we will need to add additional dependencies to the pom.xml file:

   <dependency>       <groupId>systems.manifoldgroupId>       <artifactId>manifold-csv-rtartifactId>       <version>${manifold.version}version>   dependency>   <dependency>       <groupId>systems.manifoldgroupId>       <artifactId>manifold-xml-rtartifactId>       <version>${manifold.version}version>   dependency>   <dependency>       <groupId>systems.manifoldgroupId>       <artifactId>manifold-yaml-rtartifactId>       <version>${manifold.version}version>   dependency>dependencies>

With these additional dependencies, this code will print out the same data as the JSON file... With test.write().toCsv() the output would be:

"firstName","surname","active","details""Someone","Surname","true","[manifold.json.rt.api.DataBindings@71070b9c]"

Notice that the Comma Separated Values (CSV) output doesnt include hierarchy information. Thats a limitation of the CSV format and not the fault of Manifold.

With test.write().toXml() the output is familiar and surprisingly concise:

<root_object firstName="Someone" surname="Surname" active="true">  <details key="Value 1"/>root_object>

With test.write().toYaml() we again get a familiar printout:

firstName: Someonesurname: Surnameactive: truedetails:- key: Value 1

Working with JSON Schema

Manifold also works seamlessly with JSON schema, allowing you to enforce strict rules and constraints. This is particularly useful when working with dates and enums. Manifold seamlessly creates/updates byte code that adheres to the schema, making it much easier to work with complex JSON data.

This schema is copied and pasted from the Manifold github project:

{  "$schema": "http://json-schema.org/draft-07/schema#",  "$id": "http://example.com/schemas/User.json",  "type": "object",  "definitions": {    "Gender": {      "type": "string",      "enum": ["male", "female"]    }  },  "properties": {    "name": {      "type": "string",      "description": "User's full name.",      "maxLength": 80    },    "email": {      "description": "User's email.",      "type": "string",      "format": "email"    },    "date_of_birth": {      "type": "string",      "description": "Date of uses birth in the one and only date standard: ISO 8601.",      "format": "date"    },    "gender": {      "$ref" : "#/definitions/Gender"    }  },  "required": ["name", "email"]}

Its a relatively simple schema but Id like to turn your attention to several things here. It defines name and email as required. This is why when we try to create a User object using a builder in Manifold, the build() method requires both parameters:

User.builder("Name", "email@domain.com")

That is just the start... The schema includes a date. Dates are a painful prospect in JSON, the standardization is poor and fraught with issues. The schema also includes a gender field which is effectively an enum. This is all converted to type-safe semantics using common Java classes such as LocalDate:

User u = User.builder("Name", "email@domain.com")       .withDate_of_birth(LocalDate.of(1999, 10, 11))       .withGender(User.Gender.male)       .build();

That can be made even shorter with static imports but the gist of the idea is clear. JSON is effectively native to Java in Manifold.

The Tip of The Iceberg

Manifold is a powerful and exciting project. It revolutionizes JSON parsing in Java but thats just one tiny portion of what it can do!

We've only scratched the surface of its capabilities in this post. In the next article, we'll dive deeper into Manifold and explore some additional unexpected features.

Please share your experience and thoughts about Manifold in the comments section. If you have any questions, don't hesitate to ask.

Spring Boot Debugging with Aspect-Oriented Programming (AOP)

Shai Almog — Tue, 18 Apr 2023 15:08:57 GMT

Aspect-Oriented Programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of cross-cutting concerns. In simple terms, it helps you keep your code clean, modular, and easy to maintain. We can leverage AOP to debug Spring Boot applications more effectively and seamlessly. In this post, we will explore how to use AOP to debug a Spring Boot application effectively.

https://youtu.be/6kxpkcEqssw

AOP Concepts

Before diving into debugging with AOP, it is essential to understand the core concepts. AOP lets us write code that executes before or after a method. It includes the following common terms:

Aspect: An aspect represents the cross-cutting concerns or the functionality that needs to be applied throughout the application.
Join Point: A join point is a specific point in the execution flow of the application, where an aspect can be applied.
Advice: The action taken by an aspect at a specific join point is called advice. There are different types of advice, such as before, after, around, and after throwing.
Pointcut: A pointcut is a set of join points where an aspect should be applied.

I will skip the discussion on weaving since theres a lot of nuance related to that and I want to focus on the debugging aspects. I don't want to make this into an AOP tutorial. However, theres one general aspect of AOP I want to discuss.

While AOP offers significant benefits in terms of modularity, maintainability, and debugging capabilities, it's important to be aware of the potential performance implications.

Using AOP can introduce some overhead, primarily due to the creation and execution of proxy objects, which intercept method calls and apply the specified advices. The impact on performance can vary depending on the number of aspects, the complexity of pointcut expressions, and the type of advice used. For example, around advice is typically more expensive in terms of performance compared to before and after advice.

To minimize the performance impact of AOP, consider the following best practices:

Be selective: Apply AOP to the critical parts of your application where it provides the most value. Avoid using aspects for trivial or infrequent operations.
Optimize pointcut expressions: Ensure that your pointcut expressions are as precise as possible. This can help minimize the number of unnecessary interceptions and reduce the overhead.
Limit advice execution: Choose the appropriate type of advice for your use case. For example, use before or after advice instead of around advice when possible.
Use conditional aspects: If some aspects are only required for debugging or development purposes, use conditional aspects that can be enabled or disabled based on a configuration property. This ensures that the performance impact is limited to specific environments or scenarios.
Monitor and measure: Regularly monitor and measure the performance of your application to ensure that the overhead introduced by AOP is within acceptable limits. Use profiling tools to identify potential bottlenecks and optimize your AOP implementation accordingly.

Logging Aspect

Let's create a simple aspect to log the execution time of methods in our Spring Boot application. This can help identify performance bottlenecks.

We can create a new class called 'LoggingAspect' and annotate it with @Aspect and @Component to indicate that it's an aspect and a Spring-managed bean. We implement a pointcut expression to target the methods you want to measure. For example, you can target all public methods in a specific package. We then implement an around advice to measure the execution time and log the results.

Here's an example of a LoggingAspect:

@Aspect@Componentpublic class LoggingAspect {  private static final Logger logger = LoggerFactory.getLogger(LoggingAspect.class);  @Pointcut("execution(public * com.example.myapp..*.*(..))")  public void publicMethods() {}  @Around("publicMethods()")  public Object logExecutionTime(ProceedingJoinPoint joinPoint) throws Throwable {    long startTime = System.currentTimeMillis();    Object result = joinPoint.proceed();    long elapsedTime = System.currentTimeMillis() - startTime;    logger.debug("Method [{}] executed in {} ms", joinPoint.getSignature(), elapsedTime);    return result;  }}

Pointcut methods are empty because their sole purpose is to define a pointcut expression, which determines the join points (i.e., specific points in the execution of a program) where the aspect should be applied. The method body itself does not contain any logic, as the behavior of the aspect is defined in the advice methods (e.g., @Before, @After, @Around, etc.).

The pointcut method serves as a reusable reference to a specific pointcut expression, making it easier to maintain and update if needed. By referring to a pointcut method in your advice methods, you can apply the same pointcut expression to multiple pieces of advice without duplicating the expression. We will see that in action below.

After implementing the LoggingAspect, we can run our application and observe the logs. We should now see the execution time for each targeted method as a poor man's profiler. Just like a regular profiler, this tool has many disadvantages and impacts the observed application. However, we can extract valuable data if we tune this correctly.

Logging All Methods

One of the common problems we face in big projects is flaky tests and failures. These are especially hard to understand as we might not have enough logging data.

Adding logs to the entire application is impractical in most cases and wed only want such logs for specific CI executions. We wouldnt want to over-log in production or even maintain such logging code. However, knowing the parameters that every method received and its return value can lead us to understand a failure after the fact.

The following code shows such a logger that can print out every entry/exit and the arguments or return values:

@Aspect@Componentpublic class LoggingAspect {    private static final Logger logger = LoggerFactory.getLogger(LoggingAspect.class);    @Pointcut("execution(public * com.example.myapp..*.*(..))")    public void publicMethods() {}    @Around("publicMethods()")    public Object logMethodEntryAndExit(ProceedingJoinPoint joinPoint) throws Throwable {        // Log method entry and arguments        Object[] args = joinPoint.getArgs();        logger.debug("Entering method [{}] with arguments: {}", joinPoint.getSignature(), Arrays.toString(args));        // Execute the target method and capture the return value        Object result = joinPoint.proceed();        // Log method exit and return value        logger.debug("Exiting method [{}] with result: {}", joinPoint.getSignature(), result);        // Return the result        return result;    }}

The Tip of the Iceberg

Logging to keep track of performance or method entry/exit is powerful but basic. We can go much deeper than that. We can create an aspect to log incoming HTTP requests and responses. This aspect intercepts the methods with the @RequestMapping annotation, which is typically used for handling HTTP requests in Spring Boot applications:

@Aspect@Componentpublic class RequestResponseLoggingAspect {    private static final Logger logger = LoggerFactory.getLogger(RequestResponseLoggingAspect.class);    @Pointcut("@annotation(org.springframework.web.bind.annotation.RequestMapping)")    public void requestMappingMethods() {}    @Around("requestMappingMethods()")    public Object logRequestAndResponse(ProceedingJoinPoint joinPoint) throws Throwable {        HttpServletRequest request = ((ServletRequestAttributes) RequestContextHolder.currentRequestAttributes()).getRequest();        logger.debug("Request: {} {} - Parameters: {}", request.getMethod(), request.getRequestURI(), request.getParameterMap());        Object result = joinPoint.proceed();        if (result instanceof ResponseEntity) {            ResponseEntity responseEntity = (ResponseEntity) result;            logger.debug("Response: Status {} - Body: {}", responseEntity.getStatusCode(), responseEntity.getBody());        }        return result;    }}

Another important aspect is bean creation and destruction. We can create an aspect to log these events. This aspect intercepts the methods annotated with @PostConstruct and @PreDestroy which doesnt apply to all beans but would help us keep track of the applicable code in a large application:

@Aspect@Componentpublic class BeanLifecycleLoggingAspect {    private static final Logger logger = LoggerFactory.getLogger(BeanLifecycleLoggingAspect.class);    @Pointcut("@annotation(javax.annotation.PostConstruct)")    public void postConstructMethods() {}    @Pointcut("@annotation(javax.annotation.PreDestroy)")    public void preDestroyMethods() {}    @After("postConstructMethods()")    public void logBeanInitialization(JoinPoint joinPoint) {        logger.debug("Bean Initialized: {}", joinPoint.getTarget().getClass().getSimpleName());    }    @Before("preDestroyMethods()")    public void logBeanDestruction(JoinPoint joinPoint) {        logger.debug("Bean Destroyed: {}", joinPoint.getTarget().getClass().getSimpleName());    }}

We can even log dependency injection events. This aspect intercepts the methods with the @Autowired annotation. It doesnt track constructor injection though but we can use it to track the instance type that gets injected into a bean:

@Aspect@Componentpublic class DependencyInjectionLoggingAspect {    private static final Logger logger = LoggerFactory.getLogger(DependencyInjectionLoggingAspect.class);   @Pointcut("@annotation(org.springframework.beans.factory.annotation.Autowired)")    public void autowiredMethods() {}    @Before("autowiredMethods()")    public void logDependencyInjection(JoinPoint joinPoint) {        logger.debug("Autowired: {} - Target: {}", joinPoint.getSignature(), joinPoint.getTarget().getClass().getSimpleName());    }}

Final Word

AOP is a fantastic debugging tool. In this post, I skimmed a lot of big ideas and overused logging. When tracking performance a better approach would be to accumulate values and then log the statistics in the end, thus removing the overhead of the logger (which impacts the results).

The one thing I recommend is turning this off by default. Its important to run a CI cycle without AOP, the cost is too big and we can seriously impact the final result. We need to turn this on surgically when we need to understand something deep. In those situations, AOP tooling can make the difference when searching for that needle in the haystack.

Continuing Hello World

Shai Almog — Tue, 11 Apr 2023 14:45:40 GMT

I taught myself to code from scratch. I just started writing in Basic on my Sinclair when I was in kindergarten and got it. I later picked up the books and courses easily without that initial study period of formal training. When I got to advanced academic materials I already knew pretty much everything. As a matter of fact, I never learned much from teachers. I learned to read mostly on my own and followed through with most disciplines.

This isnt a brag because the reason for that is a learning disability. Its hard for me to understand teachers and communicate with them. This was a problem in my childhood but became a bigger problem when I started to teach programming in a local computer lab at 16. I was a terrible instructor. Typically we would mimic the skills of a good role model when performing a task, I didnt have any good teachers. At least not good for my learning disability.

I kept teaching on and off for over a decade. It wasnt because I was good (I wasnt), but you couldnt tell that from the feedback I received. Most students gave me high ratings, why did they?

The main reason was that I was knowledgeable and made an effort. The students who didnt get it would end up blaming themselves or making an extreme effort to understand. This dragged on for a while until I faced a class in which I got a bad review. It stung and I initially rejected the notion that I could be the one at fault. Eventually, it sunk in.

I took that to heart and improved, Im still not a great teacher. One of my shortcomings is my thought process isnt visual enough so I dont communicate as visually as most students need. Im working on that, its an ongoing process.

People like me who arent natural teachers need to overcompensate in other departments. One of the biggest tricks up my sleeve is my approach to learning. I teach theory in short bursts while focusing on building something cool.

Why do we Have Hello World?

As a teacher, I usually taught advanced classes. Those are the easy classes to teach. In these classes, the students already have a decent baseline. If they dont understand something they can check and they dont get intimidated or ashamed because they missed something the teacher mentioned.

A beginner class is an enormous challenge. If I explain something only once, or I forget to mention something, I might lose the entire class. People might be missing basic knowledge and if I explain it too much or too thoroughly some students might get bored and lose interest.

You lose if you do and you lose if you dont. Theres very little winning in this area.

Thats why we have the hello world. It isnt just to show how the language looks or how the tools function. Its a win. Once we compile and run a version of hello world we accomplish something and that is the most important part. It gives students the motivation to keep going and double down. Even if I missed something when teaching, they might make the extra effort to go back and learn or ask a question if they feel a sense of accomplishment.

A few years ago gamification was a hot trend. Thankfully its out of vogue, I never liked it. Stack overflow is great (that statement doesnt condone the actions of overzealous rogue moderators), but what I liked about stack overflow wasnt the gamification. Its the accomplishment that rewarded that. I enjoy the points or badges not because they are points or badges. I enjoy them because I earned them by giving a great answer. This is the core of the matter and what most learning experiences miss: winning.

In it to Win

I see a syllabus like this in practically every beginner oriented course:

Hello world
Variables
If statements
Loops

We accomplish one thing. Then its study, theres no reward for successful accomplishment or something measured that would trigger excitement. I get why we teach that way People need the theory. But is that theory sinking in if we learn it sequentially?

When we build an ongoing demo and constantly improve it; we learn the theory while continuously accomplishing such goals. Better yet, we can intertwine it with a narrative story of developing an application or a game that helps memorize and form a narrative trail. Theres one downside: the tutorial sucks as a reference since the ideas are mixed within an additional narrative.

But the advantage is the show dont tell approach. I love encapsulation. Its wonderful. Dont explain it. Show me why the code looks bad if I dont use it. Dont explain every type and waste my time, show me how to build stuff and then go back and refine it to explain the nuances involved in practical terms. This is the exact approach I tried to take with my Java Basics free course. The idea is that we can build a Wordle clone without knowing anything and slowly refine it into a real-world application.

https://www.youtube.com/watch?v=1Bum2gYETUQ&list=PL8GhfcywW9YMucwRw2IbpeCp1FBMEgsmk

Yes, we need to understand some basics before we can write something useful. But the bar is very low. Better yet, we need to make flawed and imperfect code so we can show how to improve and fix it.

Achievement Unlocked

When we say gamification this is what it really means; to achieve something meaningful and not merely satisfy a metric a company needs to further push a user through a funnel. This applies to teaching coding skills and to your SaaS.

Quite a few SaaS companies still show pointless awards and badges to get us engaged. Even kids dont buy that nonsense in their iPad games. A reward matters only when you made an actual effort and should signify something with meaning. Theres harm in those nonsense awards as they reduce your credibility. When you have an important milestone that award wont matter as much. In any product, we need to divide the steps into achievable hops where something meaningful is accomplished before moving to the next step.

An important aspect is milestone communication. A person needs to understand the goals, short and long-term. When I teach I start a lesson by discussing what they will know by the end of the lesson: why should you pay attention?

When I start the course I need to discuss the ending. What you will know and ideally show what we will build. I didnt do this for the videos in the Java Basics course since Im building them as I go along. That is not an ideal scenario. But I might revisit the first video in the series and redo it once Im done. This gives us the motivation to work towards a clear goal.

Relearning Java Thread Primitives

Shai Almog — Tue, 04 Apr 2023 15:49:54 GMT

Ive coded in Java since the first beta, even back then threads were at the top of my list of favorite features. Java was the first language to introduce thread support in the language itself, it was a controversial decision back then. In the past decade, every language raced to include async/await and even Java had some third-party support for that But Java zigged instead of zagging and introduced the far superior virtual threads (project Loom). This post isnt about that.

I think its wonderful and proves the core power of Java. Not just as a language but as a culture. A culture of deliberating changes instead of rushing into the fashionable trend.

In this post, I want to revisit the old ways of doing threading in Java. Im used to synchronized, wait, notify, etc. But it has been a long time since they were the superior approach for threading in Java. Im part of the problem, Im still used to these approaches and find it hard to get used to some APIs that have been around since Java 5. It's a force of habit. There are many great APIs for working with threads which I discuss in the videos here, but I want to talk about locks which are basic yet important.

https://www.youtube.com/watch?v=pUnX8r-6IIo

https://www.youtube.com/watch?v=pOrhJ8o9TT0

Synchronized vs. ReentrantLock

A reluctance I had with leaving synchronized is that the alternatives arent much better. The primary motivation to leave it today is that at this time synchronized can trigger thread pinning in Loom which isnt ideal. JDK 21 might fix this (when Loom goes GA), but it still makes some sense to leave it.

The direct replacement for synchronized is ReentrantLock. Unfortunately, ReentrantLock has very few advantages over synchronized so the benefit of migrating is dubious at best. In fact, it has one major disadvantage to get a sense of that lets look at an example, this is how we would use synchronized:

synchronized(LOCK) {    // safe code}LOCK.lock();try {    // safe code} finally {    LOCK.unlock();}

The first disadvantage of ReentrantLock is the verbosity. We need the try block since if an exception occurs within the block the lock will remain. Synchronized handles that seamlessly for us.

Theres a trick some people pull of wrapping the lock with AutoClosable which looks roughly like this:

public class ClosableLock implements AutoCloseable {   private final ReentrantLock lock;   public ClosableLock() {       this.lock = new ReentrantLock();   }   public ClosableLock(boolean fair) {       this.lock = new ReentrantLock(fair);   }   @Override   public void close() throws Exception {       lock.unlock();   }   public ClosableLock lock() {       lock.lock();       return this;   }   public ClosableLock lockInterruptibly() throws InterruptedException {       lock.lock();       return this;   }   public void unlock() {       lock.unlock();   }}

Notice I dont implement the Lock interface which would have been ideal. Thats because the lock method returns the auto-closable implementation instead of void.

Once we do that, we can write more concise code such as this:

try(LOCK.lock()) {    // safe code}

I like the reduced verbosity but this is a problematic concept since try-with-resource is designed for the purpose of cleanup and we reuse locks. It is invoking close but we will invoke that method again on the same object. I think it might be nice to extend the try with resource syntax to support the lock interface. But until that happens, this might not be a worthwhile trick.

Advantages of ReentrantLock

The biggest reason for using ReentrantLock is Loom support. The other advantages are nice but none of them is a killer feature.

We can use it between methods instead of in a continuous block. This is probably a bad idea as you want to minimize the lock area and failure can be a problem. I dont consider that feature as an advantage.

It has the option of fairness. This means that it will serve the first thread that stopped at a lock first. I tried to think of a realistic non-convoluted use case where this will matter and Im drawing blanks. If youre writing a complex scheduler with many threads constantly queued on a resource, you might create a situation where a thread is starved since other threads keep coming in. But such situations are probably better served by other options in the concurrency package. Maybe Im missing something here

lockInterruptibly() lets us interrupt a thread while its waiting for a lock. This is an interesting feature but again, hard to find a situation where it would realistically make a difference. If you write code that must be very responsive for interrupting you would need to use the lockInterruptibly() API to gain that capability. But how long do you spend within the lock() method on average?

There are edge cases where this probably matters but not something most of us will run into, even when doing advanced multi-threaded code.

ReadWriteReentrantLock

A much better approach is the ReadWriteReentrantLock. Most resources follow the principle of frequent reads and few write operations. Since reading a variable is thread safe theres no need for a lock unless were in the process of writing to the variable. This means we can optimize reading to an extreme while making the write operations slightly slower.

Assuming this is your use case, you can create much faster code. When working with a read-write lock we have two locks, a read lock as we can see in the following image. It lets multiple threads through and is effectively a free for all.

Once we need to write to the variable, we need to obtain a write lock as we can see in the following image. We try to request the write lock but there are threads still reading from the variable so we must wait.

Once the threads finished reading all reading will block and the write operation can happen from a single thread only as seen in the following image. Once we release the write lock, we will go back to the free for all situation in the first image.

This is a powerful pattern that we can leverage to make collections much faster. A typical synchronized list is remarkably slow. It synchronizes over all operations, read or write. We have a CopyOnWriteArrayList which is fast for reading, but any write is very slow.

Assuming you can avoid returning iterators from your methods you can encapsulate list operations and use this API. E.g. in the following code we expose the list of names as read-only but then when we need to add a name we use the write lock. This can outperform synchronized lists easily:

private final ReadWriteLock LOCK = new ReentrantReadWriteLock();private Collection listOfNames = new ArrayList<>();public void addName(String name) {   LOCK.writeLock().lock();   try {       listOfNames.add(name);   } finally {       LOCK.writeLock().unlock();   }}public boolean isInList(String name) {   LOCK.readLock().lock();   try {       return listOfNames.contains(name);   } finally {       LOCK.readLock().unlock();   }}

StampedLock

The first thing we need to understand about StampedLock is that it isnt reentrant. Say we have this block:

synchronized void methodA() {     //      methodB();    // }synchronized void methodB() {     // }

This will work. Since synchronized is reentrant. We already hold the lock so going into methodB() from methodA() wont block. This works with ReentrantLock too assuming we use the same lock or the same synchronized object.

StampedLock returns a stamp that we use to release the lock. Because of that, it has some limits. But its still very fast and powerful. It too includes a read-and-write stamp we can use to guard a shared resource. But unlike the ReadWriteReentrantLock, it lets us upgrade the lock. Why would we need to do that?

Look at the addName() method from before What if I invoke it twice with Shai?

Yes, I could use a Set But for the point of this exercise let's say that we need a list I could write that logic with the ReadWriteReentrantLock:

public void addName(String name) {   LOCK.writeLock().lock();   try {       if(!listOfNames.contains(name)) {           listOfNames.add(name);       }   } finally {       LOCK.writeLock().unlock();   }}

This sucks. I paid for a write lock only to check contains() in some cases (assuming there are many duplicates). We can call isInList(name) before obtaining the write lock. Then we would:

Grab the read lock
Release the read lock
Grab the write lock
Release the write lock

In both cases of grabbing we might be queued and it might not be worth the extra hassle.

With a StampedLock, we can update the read lock to a write lock and do the change on the spot if necessary as such:

public void addName(String name) {   long stamp = LOCK.readLock();   try {       if(!listOfNames.contains(name)) {           long writeLock = LOCK.tryConvertToWriteLock(stamp);           if(writeLock == 0) {               throw new IllegalStateException();           }           listOfNames.add(name);       }   } finally {       LOCK.unlock(stamp);   }}

It is a powerful optimization for these cases.

Finally

I cover many similar subjects in the video series above, check it out and let me know what you think.

I often reach for the synchronized collections without giving them a second thought. That can be reasonable sometimes but for most, its probably suboptimal. By spending a bit of time with the thread-related primitives we can significantly improve our performance. This is especially true when dealing with Loom where the underlying contention is far more sensitive. Imagine scaling read operations on 1M concurrent threads In those cases, the importance of reducing lock contention is far greater.

You might think, why cant synchronized collections use ReadWriteReentrantLock or even StampedLock?

This is problematic since the surface area of the API is so big its hard to optimize it for a generic use case. Thats where control over the low-level primitives can make the difference between high throughput and blocking code.

Boldness in Refactoring

Shai Almog — Tue, 28 Mar 2023 14:13:11 GMT

The old engineering adage: dont touch it, it works. Is terrible. Dont listen to it. It might be OK at a small scale but as time goes by the bit rot spreads through your code and servers polluting everything. Large swaths of your system become no-man's-land. As youre developing a new system, you must always touch it and make sure we hire engineers who arent afraid to do so.

Yes, I get it. I said that sentence frequently in the past. I understand the motivation. Management doesnt care about the bit rot in our future. They care about the here and now. Why are you wasting time on this feature?

Its working. Dont you have enough on your plate already?

Are you the Marie Kondo of coding? Does this code not spark joy?

Its more like a bad apple in a barrel. Bad code and forbidden zones tend to grow and metastasize. A living project needs to be fully accessible by the current team. It can keep working without that, but that makes every future step painful.

When we have a flexible team with a relatively small and familiar code base, touching everything isnt challenging. Its easy in that case.

The Legacy Project

The hard part is touching code in legacy projects. As a consultant, I had to do that often. How do you enter a project with a million lines of code and start refactoring?

The nice thing is that were all alike. The engineers that built the project were trained with similar books and similar thought processes. Once you understand their logic you can understand why they did something. But a large part of the difficulty is in the tooling. Projects that were built 20 years ago used tools that are no longer available. The code might no longer compile on a modern IDE. Our immediate reflex would be to try to use an old IDE and old tooling.

That might be a mistake.

Old tools keep the stale bit rot. This is an opportunity. Revisit the project and update the tools. A few years ago I did some work for an older C++ codebase. I didnt understand the code base, but the original developers built it in an older version of Visual Studio. Getting it to work on my Mac with LLVM and VS Code helped me visualize the moving pieces more clearly. Once I had a debugger up and running, fixing the bugs and weird issues became trivial. I cant say I fully understood that codebase. But the process of porting and updating the tools exposed me to many nuances and issues.

When You Cant

The flip side of that were cases where an existing legacy system is a customer requirement. I had to implement integrations with legacy systems that were external black boxes. We didnt need to touch their code, but we needed to interface with these systems and rely on their behaviors. This is a very challenging situation.

Our solution in those cases was to create a mock of the system so we can simulate and test various scenarios. In one such situation, we wrote an app that sent requests and saved responses from such a black box to create a simple recorder. We then used the recordings as the basis for tests in our implementation. This might not be an option since sometimes, the black box is directly wired to production (directly to the stock market in one case).

My rules for dealing with such a black box are:

A single isolated module handles all the connections - that way we can build uniform workarounds for failures. We can use a physically isolated microservice which is ideal for this specific case.
Expose results using asynchronous calls - this prevents deadlocks and overloading a legacy system. We can use a queue to map causes of failure and error handling is simpler since a failure just wont invoke the result callback.
We need to Code defensively. Use circuit breakers, logging and general observability tooling. Expect failure in every corner since this will be the most contentious part of the project.

Once we wrap that legacy we need to trigger alerts on the failures. Some failures might not bubble up to the user interface and might trigger retries that succeed. This can be a serious problem. E.g. in a case of a stock market purchase command that fails a trader might press retry which will issue a new successful command. But the original command might retry implicitly in the legacy system and we can end up with two purchases.

Such mistakes can be very costly and originate from that black box. Without reviewing the legacy code fully and understanding it, we can make no guarantee. What we can do is respond promptly and accurately to failures of this type. Debuggability is important in these situations hence the importance of observability and isolation in such a black box.

Confidence Through Observability

In the past, we used to watch the server logs whenever we pushed a new release. Waiting for user complaints to pour in. Thanks to observability were the first to know about a problem in our production. Observability flipped the script.

Unfortunately, theres a wide chasm between knowing about a problem and understanding it, fixing it and noticing it. If we look at the observability console, we might notice an anomaly that highlights a problem but it might not trigger an alert even though a regression occurs. A good example of that would be a miscalculation. A change to the application logic can report wrong results and this is very unlikely to show in the observability data.

In theory, tests should have found that issue but tests are very good at verifying that things we predicted didnt happen. They dont check against unexpected bugs. E.g. We might allocate a field size for financial calculations and it worked great for our developers based in the USA. However, a customer in Japan working in Yen might have a far larger number and experience a regression because of that limit.

We can debug such issues with developer observability tools but when we deeply integrate legacy systems, we must apply the fail-fast principles deeply, that way the observability layer will know of the problem. We need to assert expectations and check for conditions not in the test, but in the production code. Here an actual error will be better than a stealthy bug.

A lot of focus has been given in languages to the non-null capabilities of languages. But the concepts pioneered in languages like Eiffel of design by contract have gone out of fashion. This is understandable, its hard and awkward to write that sort of code. Checked exceptions are often the most hated feature of the Java language. Imagine having to write all the constraints you expect for every input.

Not to mention dependencies on the environmental state. This isnt tenable and enforcing this check-in runtime would be even more expensive. However, this is something we can consciously do in entry points to our module or microservice. The fail-fast principle is essential when integrating with legacy systems because of the unpredictable nature of the result.

Summary

In the 90s I used to take a bus to my job. Every day as I walked to the office I would pass by a bank machine and every time it would reboot as I came close. This was probably part of their cycling policy, banks have a culture of rebooting machines on a schedule to avoid potential issues.

One morning I went by the machine and it didnt reboot. I did what every good programmer/hacker would do; I pulled out my card and tried to use it. It instantly rebooted and wouldnt take my card, but the fact that my instinct was to try is good. Even if it isnt the smartest thing in the world, we need to keep code accessible and fresh. Legacy code isnt a haunted house and we shouldnt be afraid.

When Should we Move to Microservices?

Shai Almog — Tue, 21 Mar 2023 15:38:34 GMT

Last month I wrote about modular Monoliths and the value of modern Monolithic architecture. One of the more interesting discussions that came out of that post (and video) is the inverse discussion: when is it right to still pick Microservices?

Like any design choice, the answer is subjective and depends on many things. But there are still general rules of thumb and global metrics we can use. Before we get into these problems, we need to understand what it means to have a Microservice architecture. Then we can gauge the benefit and price of having such an architecture.

https://www.youtube.com/watch?v=PrFZB9NqZ5E

Small Monoliths

A common misconception is that microservices are simply broken down monoliths. This isnt the case. Ive talked to quite a few people who still hold that notion, to be fair they might have a point. This is how AWS defines Microservices:

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs. These services are owned by small, self-contained teams.
Microservices architectures make applications easier to scale and faster to develop, enabling innovation and accelerating time-to-market for new features.

Smaller monoliths might fit the definition, but they dont if you read between the lines. The words independent, and easier to scale hint at the problem. The problem (and advantage) of a monolith is a single point of failure. By having one service we can usually find problems more easily. The architecture is much simpler.

If we break this service down into smaller pieces, we essentially create distributed points of failure. If one piece along the chain fails, the entire architecture breaks down. That isnt independent, and it isnt easier to scale. Microservices are NOT small monoliths and breaking down the Monolith isnt only about working with smaller projects. Its about shifting the way we work.

What Makes a Microservice?

A good Microservice needs to follow these principles for robustness and scale:

Divided by business function this is a logical division. A Microservice is a standalone product that provides a complete package. This means that the team responsible for the Microservice can make all the changes required for the business without dependencies.
Automation through CI/CD without continuous delivery the cost of updating would eliminate every benefit of Microservices.
Independent deployment is implied, since a commit on one Microservice will only trigger the CD of that specific service. We can accomplish this through Kubernetes and Infrastructure as Code (IaC) solutions.
Encapsulation it should hide the underlying implementation details. A service acts as a standalone product that publishes an API for other products.
We commonly accomplished this via REST interfaces but also messaging middleware. This is further enhanced with API Gateways.
Decentralized with no single point of failure otherwise, we would distribute failure.
Failures should be isolated without this, a single service going down could create a domino effect. Circuit breakers are probably the most important tools for isolating failures. To satisfy this dependency every microservice handles its own data. This means many databases which can be challenging at times.
Observable this is required to deal with failures on a scale. Without proper observability, we are effectively blind as the various teams can deploy automatically.

This is all good and well, but what does that mean in practical terms?

Most of what it means is that we need to make several big changes to the way we handle some big ideas. We need to move more of the complexity to the DevOps team. We need to handle cross-microservice transactional state differently. This is one of the hardest concepts to grasp when dealing with Microservices.

In an ideal world, all our operations will be simple and contained in a small microservice. The service mesh framework surrounding our microservices will handle all the global complexities and manage our individual services for us. But that isnt the real world. In reality, our Microservices might have a transactional state that carries between the services. External services might fail and for that, we need to take some unique approaches.

Reliance on the DevOps Team

If your company doesnt have good DevOps and Platform Engineering teams, Microservices arent an option. Instead of deploying one application, we might deploy hundreds because of migration. While the individual deployments are simple and automated, you will still throw a lot of work at operations.

When something doesnt work or doesnt connect. When a new service needs to integrate or the service configuration should be adopted. Operations carry a greater burden when working with Microservices. This requires great communication and collaboration. It also means the team managing a specific service needs to take some of the OPS burdens back. That isnt a simple task.

As developers we need to know many of the tools used to tie our separate services back to a single unified service:

Service Mesh lets us combine separate services and effectively acts as a load balancer between them. It also provides security, authorization, traffic control and much more.
API Gateways should be used instead of invoking the API directly. This can be awkward at times but it's often essential to avoid costs, prevent rate limiting and more.
Feature Flags & Secrets are useful in a monolith as well. But theyre impossible to manage at a Microservice scale without dedicated tools.
Circuit Breaking lets us kill a broken web service connection and recover gracefully. Without this, a single broken service can bring down the entire system.
Identity management must be separate. You cant get away with an authentication table in the database when dealing with a Microservice environment.

Ill skip orchestration, CI/CD, etc. but they too need to be adapted for every service that comes out. Some of those tools are opaque to developers but we need the help of DevOps in all the phases.

Saga Pattern

Stateless services would be ideal, carrying a state makes everything far more complex. If we stored the state in the client, we need to send it back and forth all the time. If it is on the server, we would need to either fetch it constantly, cache it or save it locally and then all interaction would be performed against the current system. That eliminates the scalability of the system.

A typical Microservice will store in its own database and work with local data. A service that needs remote information will typically cache some data to avoid round-trips to the other service. This is one of the biggest reasons Microservices can scale. In a Monolith the database should become the bottleneck of the application, which means the Monolith is efficient and limited by the speed we can store and retrieve the data. This has two major drawbacks:

Size the more data we have the larger the database and performance impacts all users at once. Imagine querying an SQL table of every purchase ever made on Amazon just to find your specific purchase.
Domain databases have different use cases. Some databases are optimized for consistency, write speed, read speed, time data, spatial data and more. A microservice that tracks user information would probably use a time series database which is optimized for time-related information, whereas a purchase service will focus on a traditional conservative ACID database.

Note that a Monolith can use more than one database. That can work perfectly well and can be very useful. But its the exception. Not the rule.

The Saga pattern works by using compensating transactions to undo the effects of a saga if it fails. When a saga fails, the compensating transaction is executed to undo the changes made by the previous transaction. This allows the system to recover from failures and maintain a consistent state. We can accomplish this with tools such as Apache Camel but this is non-trivial and requires far more involvement than a typical transaction in a modern system. That means that for every major cross-service operation you would need to do the equivalent undo operation that will restore the state back. That is non-trivial. There are several tools for saga orchestration but this is a big subject that is beyond the scope of this post, still I will explain it in broad terms.

Whats important to understand about Saga is that it avoids the classic ACID database principles and focuses on eventual consistency. That means operations would bring the database to a consistent state at some point. That is a very difficult process. Imagine debugging a problem that only occurs when the system is in an inconsistent state

The following image demonstrated the idea in broad terms. Lets say we have a money transfer process.

For the money transfer, we need to first allocate funds.
We then verify that the recipient is valid and exists.
Then we need to deduct the funds from our account.
And finally, we need to add the money to the recipient's account.

That is a successful transaction. With a regular database, this would be one transaction and we can see this in the blue column on the left. But if something goes wrong we need to run the reverse process.

If a failure occurs when allocating funds, we need to remove the allocation. We need to create a separate block of code that does the inverse operation of the allocation.
If verifying a recipient fails we need to remove that recipient. But then we need to also remove the allocation.
If deducting the funds fails we need to restore the funds, remove the recipient and remove the allocation.
Finally if adding the funds to the recipient fails we need to run all the undo operations!

Another problem in Saga is illustrated in the CAP theorem. CAP stands for Consistency, Availability and Partition Tolerance. The problem is we need to pick any two Dont get me wrong, you might have all three. But in a case of a failure you can only guarantee two.

Availability means that requests receive responses. But theres no guarantee that they contain the most recent writes.

Consistency means that every read receives the most recent write on an error.

Tolerance means that everything will keep working even if many messages get dropped along the way.

This differs greatly from our historic approach to failure with transactions.

Should We Pick Microservices?

Hopefully, you now understand how hard it is to deploy Microservices properly. We need to make some big compromises. This new way isnt necessarily better, in some regards, it is worse. But the proponents of Microservices still have a point, we can gain a lot through Microservices and should focus on those benefits too.

We mentioned the first requirement upfront: DevOps. Having a good DevOps team is a prerequisite to considering Microservices. I saw teams trying to hack their way through this without an OPS team and they ended up spending more time on operational complexity than writing code. It wasnt worth the effort.

The biggest benefit of Microservice is to the team. That is why having a stable team and scope is crucial. Splitting teams into vertical teams that work independently is a tremendous benefit. The most modular monolith in the world cant compete with that. When we have hundreds of developers following the git commits alone and tracking the code changes in scale becomes untenable. The value of Microservices is only realized in a large team. This sounds reasonable enough, but in a startup environment, things shift suddenly. A colleague of mine works for a startup that employed dozens of developers. They decided to follow a Microservice architecture and built a lot of them Then came the downsizing and maintaining dozens of services in multiple languages became a problem.

Splitting a Monolith is hard but doable. Unifying Microservices to a Monolith is probably harder, Im unaware of anyone who seriously tried to do that but would be curious to hear stories.

Not One Size

In order to move to a Microservice architecture we need a bit of a mind shift. A good example is in the databases. A good example would be a user tracking Microservice. In a Monolith, we would write the data to a table and move on with our work. But this is problematic

As data scales, this user tracking table can end up containing a great deal of data that is hard to analyze in real-time without impacting the rest of the operating system. With a Microservice we can offer several advantages:

The interface to the microservice can use messaging - which means the cost to send tracking information will be minimal.
Tracking data can use a Time Series database which would be more efficient for this use case.
We can stream the data and process it asynchronously to derive additional value from that data.

There are complexities, data will no longer be localized. So if we send tracking data asynchronously we need to send everything necessary as the tracking service wont be able to go back to the original service to get additional meta-data. But it has a locality advantage, if regulation changes about tracking storage theres a single place where this is stored.

Dynamic Control and Rollout

Did you ever push a button to a release that broke production?

I did, more than once (way too many times). Thats a terrible feeling. Microservices can still fail in production and can still fail catastrophically, but often their failure is more localized. It is also easier to roll them out to a specific subset of the system (Canary) and verify. These are all policies that can be controlled in depth by the people who actually have their fingers on the user's pulse: OPS.

Observability for Microservices is essential, expensive, but also more powerful. Since everything occurs at the network layer, it is all exposed to the observability tools. An SRE or a DevOps can understand a failure with greater detail. This comes at the expense of the developer who might need to face increased complexity and limited tooling.

Applications can become too big to fail. Even with modularity, some of the largest monoliths around have so much code it takes hours to run through a full CI/CD cycle. Then if the deployment fails reverting to the last good version might also take a while.

Segmentation

Back in the day, we used to divide teams based on layers. Client, Server, DB, etc. This made sense since each of those required a unique set of skills. Today, vertical teams make more sense, but we still have specialties.

Typically, a mobile developer wouldnt work on the backend. But lets say we have a mobile team that wants to work with GraphQL instead of REST. With a Monolith we would either tell them to live with it or we would have to do the work. With Microservices we can create a simple service for them with very little code. A simple facade to the core services. We wont need to worry about a mobile team writing server code since this would be relatively isolated. We can do the same for every client layer, this makes it easier to integrate a team vertically.

Too Big

It is hard to put the finger on a size that makes a monolith impractical but heres what you should ask yourself:

How many teams do we have or want?

If you have a couple of teams, then a monolith is probably great. If you have a dozen teams, then you might face a problem there.

Measure pull request, and issue resolution times.

As a project grows your pull requests will spend more time waiting to merge and issues will take longer to resolve. This is inevitable as complexity tends to grow in the project. Notice that a new project will have larger features and that might sway the results once you account for that in the project stats the decrease in productivity should be measurable.

Notice that this is one metric. In many cases, it can indicate other things such as the need to optimize the test pipeline, the review process, modularity, etc.

Do we have experts who know the code?

At some point, a huge project becomes so big that the experts start losing track of the details. This becomes a problem when bugs become untenable and theres no authoritative figure that can make a decision without consultation.

Are you comfortable spending money?

Microservices will cost more. Theres no way around that. There are special cases where we can tune scale, but ultimately observability and management costs would remove any potential cost savings. Since personnel costs usually exceed the costs of cloud hosting the total might still play in your favor as those costs might decrease if the scale is big enough.

Trade-Offs

The trade-offs of monolith vs. microservice are illustrated nicely in the following radar chart. Notice that this chart was designed with a large project in mind. The smaller the project, the better the picture is for the Monolith.

Notice that Microservices deliver a benefit in larger projects in fault tolerance and team independence. But they pay a price in cost. They can reduce R&D spend but they mostly shift it to DevOps so that isnt a major benefit.

Final Word

The complexity of Microservices is immense and sometimes ignored by the implementing teams. Developers use Microservices as a cudgel to throw away parts of the system they dont want to maintain, instead of building a sustainable, scalable architecture worthy of replacing a monolith.

I firmly believe that projects should start off with a monolith. Microservices are an optimization for scaling a team and optimizing prematurely is the root of all evil. The question is, when is the right time to do such an optimization?

There are some metrics we can use to make that decision easier. Ultimately, the change isnt just splitting a monolith. It means rethinking transactions and core concepts. By starting with a monolith we have a blueprint we can use to align our new implementation as it strengthens.

DevOps For Developers: Continuous Integration, GitHub Actions & Sonar Cloud

Shai Almog — Tue, 14 Mar 2023 14:12:41 GMT

I first ran into the concept of Continuous Integration (CI) when the Mozilla project launched. It included a rudimentary build server as part of the process and this was revolutionary at the time. I was maintaining a C++ project that took 2 hours to build and link. We rarely went through a clean build which created compounding problems as bad code was committed into the project.

A lot has changed since those old days. CI products are all over the place and as Java developers we enjoy a richness of capabilities like never before. But Im getting ahead of myself Lets start with the basics.

https://youtu.be/NaL5il6cLGQ

Continuous Integration is a software development practice in which code changes are automatically built and tested in a frequent and consistent manner. The goal of CI is to catch and resolve integration issues as soon as possible, reducing the risk of bugs and other problems slipping into production.

CI often goes hand in hand with Continuous Delivery (CD) which aims to automate the entire software delivery process, from code integration to deployment in production. The goal of CD is to reduce the time and effort required to deploy new releases and hotfixes, enabling teams to deliver value to customers faster and more frequently. With CD, every code change that passes the CI tests is considered ready for deployment, allowing teams to deploy new releases at any time with confidence. I wont discuss continuous delivery in this post but I will go back to it as theres a lot to discuss. Im a big fan of the concept but there are some things we need to monitor.

Continuous Integration Tools

There are many powerful continuous integration tools. Here are some commonly used tools:

Jenkins: Jenkins is one of the most popular CI tools, offering a wide range of plugins and integrations to support various programming languages and build tools. It is open-source and offers a user-friendly interface for setting up and managing build pipelines. Its written in Java and was often my go to tool. However, its a pain to manage and set-up. There are some Jenkins as a service solutions that also clean up its user experience which is somewhat lacking.
Travis CI: Travis CI is a cloud-based CI tool that integrates well with GitHub, making it an excellent choice for GitHub-based projects. Since it predated GitHub Actions it became the default for many open source projects on GitHub.
CircleCI: CircleCI is a cloud-based CI tool that supports a wide range of programming languages and build tools. It offers a user-friendly interface but its big selling point is the speed of the builds and delivery.
GitLab CI/CD: GitLab is a popular source code management tool that includes built-in CI/CD capabilities. The GitLab solution is flexible yet simple, it has gained some industry traction even outside of the GitLab sphere.
Bitbucket Pipelines: Bitbucket Pipelines is a cloud-based CI tool from Atlassian that integrates seamlessly with Bitbucket, their source code management tool. Since it's an Atlassian product it provides seamless JIRA integration and very fluid, enterprise oriented functionality.

Notice I didnt mention GitHub Actions which we will get to shortly. There are several factors to consider when comparing CI tools:

Ease of Use: Some CI tools have a simple setup process and user-friendly interface, making it easier for developers to get started and manage their build pipelines.
Integration with Source Code Management (SCM) Tools such as GitHub, GitLab, and Bitbucket. This makes it easier for teams to automate their build, test, and deployment processes.
Support for Different Programming Languages and Build Tools: Different CI tools support different programming languages and build tools, so it's important to choose a tool that is compatible with your development stack.
Scalability: Some CI tools are better suited to larger organizations with complex build pipelines, while others are better suited to smaller teams with simpler needs.
Cost: CI tools range in cost from free and open-source to commercial tools that can be expensive, so it's important to choose a tool that fits your budget.
Features: Different CI tools offer distinct features, such as real-time build and test results, support for parallel builds, and built-in deployment capabilities.

In general, Jenkins is known for its versatility and extensive plugin library, making it a popular choice for teams with complex build pipelines. Travis CI and CircleCI are known for their ease of use and integration with popular SCM tools, making them a good choice for small to medium-sized teams. GitLab CI/CD is a popular choice for teams using GitLab for their source code management, as it offers integrated CI/CD capabilities. Bitbucket Pipelines is a good choice for teams using Bitbucket for their source code management, as it integrates seamlessly with the platform.

Cloud vs. On Premise

The hosting of agents is an important factor to consider when choosing a CI solution. There are two main options for agent hosting: cloud-based and on-premise.

Cloud-based: Cloud-based CI solutions, such as Travis CI, CircleCI, GitHub Actions, and Bitbucket Pipelines, host the agents on their own servers in the cloud. This means that you don't have to worry about managing the underlying infrastructure, and you can take advantage of the scalability and reliability of the cloud.
On-premise: On-premise CI solutions, such as Jenkins, allow you to host the agents on your own servers. This gives you more control over the underlying infrastructure, but also requires more effort to manage and maintain the servers.

When choosing a CI solution, it's important to consider your team's specific needs and requirements. For example, if you have a large and complex build pipeline, an on-premise solution such as Jenkins may be a better choice, as it gives you more control over the underlying infrastructure. On the other hand, if you have a small team with simple needs, a cloud-based solution such as Travis CI may be a better choice, as it is easy to set up and manage.

Agent Statefulness

Statefulness determines whether the agents retain their data and configurations between builds.

Stateful Agents: Some CI solutions, such as Jenkins, allow for stateful agents, which means that the agents retain their data and configurations between builds. This is useful for situations where you need to persist data between builds, such as when you're using a database or running long-running tests.
Stateless Agents: Other CI solutions, such as Travis CI, use stateless agents, which means that the agents are recreated from scratch for each build. This provides a clean slate for each build, but it also means that you need to manage any persisted data and configurations externally, such as in a database or cloud storage.

Theres a lively debate among CI proponents regarding the best approach. Stateless agents provide a clean and easy to reproduce environment. I choose them for most cases and think they are the better approach.

Stateless agents can be more expensive too as they are slower to set up. Since we pay for cloud resources that cost can add up. But the main reason some developers prefer the stateful agents is the ability to investigate. With a stateless agent when a CI process fails you are usually left with no means of investigation other than the logs. With a stateful agent, we can log into the machine and try to run the process manually on the given machine. We might reproduce an issue that failed and gain insight thanks to that. A company I worked with chose Azure over GitHub Actions because Azure allowed for stateful agents. This was important to them when debugging a failed CI process.

I disagree with that but thats a personal opinion. I feel I spent more time troubleshooting bad agent cleanup than I benefited from investigating a bug. But thats a personal experience and some smart friends of mine disagree.

Repeatable Builds

Repeatable builds refer to the ability to produce the same exact software artifacts every time a build is performed, regardless of the environment or the time the build is performed. From a DevOps perspective, having repeatable builds is essential to ensuring that software deployments are consistent and reliable. Intermittent failures are the bane of DevOps everywhere and they are painful to track. Unfortunately, theres no easy fix. As much as wed like it, some flakiness finds its way into projects with reasonable complexity. It is our job to minimize this as much as possible. There are two blockers to repeatable builds:

Dependencies - if we dont use specific versions for dependencies even a small change can break our build.
Flaky Tests - tests that fail occasionally for no obvious reasons are the absolute worst.

When defining dependencies we need to focus on specific versions. There are many versioning schemes but over the past decade, the standard three-number semantic versioning took over the industry. This scheme is immensely important for CI as its usage can significantly impact the repeatability of a build e.g. with maven we can do:

<dependency>     <groupId>groupgroupId>     <artifactId>artifactartifactId>     <version>2.3.1version>dependency>

This is very specific and great for repeatability. However, this might become out of date quickly. We can replace the version number with LATEST or RELEASE which will automatically get the current version. This is bad as the builds will no longer be repeatable. However, the hard-coded three-number approach is also problematic. Its often the case that a patch version represents a security fix for a bug. In that case, we would want to update all the way to the latest minor update but not newer versions. E.g. For that previous case I would want to use version 2.3.2 implicitly and not 2.4.1. This trades off some repeatability for minor security updates and bugs. But a better way would be to use the Maven Versions Plugin and periodically invoke the mvn versions:use-latest-releases command. This updates the versions to the latest to keep our project up to date.

This is the straightforward part of repeatable builds. The difficulty is in the flaky tests. This is such a common pain that some projects define a reasonable amount of failed tests and some projects re-run the build multiple times before acknowledging failure.

A major cause of test flakiness is state leakage. Tests might fail because of subtle side effects left over from a previous test. Ideally, a test should clean up after itself so each test will run in isolation. In a perfect world, we would run every test in a completely isolated fresh environment, but this isnt practical. It would mean tests would take too long to run and we would need to wait a great deal of time for the CI process. We can write tests with various isolation levels, sometimes we need complete isolation and might need to spin up a container for a test. But most times we dont and the difference in speed is significant.

Cleaning up after tests is very challenging. Sometimes state leaks from external tools such as the database can cause a flaky test failure. To ensure repeatability of failure, it is a common practice to sort the test cases consistently, this ensures future runs of the build will execute in the same order.

This is a hotly debated topic. Some engineers believe that this encourages buggy tests and hides problems we can only discover with a random order of tests. From my experience, this did indeed find bugs in the tests, but not in the code. My goal isnt to build perfect tests and so I prefer running the tests in a consistent order such as alphabetic ordering.

It is important to keep statistics of test failures and never simply press retry. By tracking the problematic tests and the order of execution for a failure, we can often find the source of the problem. Most times the root cause of the failure happens because of faulty cleanup in a prior test, which is why the order matters and its consistency is also important.

Developer Experience and CI Performance

Were here to develop a software product, not a CI tool. The CI tool is here to make the process better. Unfortunately, many times the experience with the CI tool is so frustrating that we end up spending more time on logistics than actually writing code. Often I spent days trying to pass a CI check so I could merge my changes. Every time I get close, another developer would merge their change first and would break my build.

This contributes to a less than stellar developer experience, especially as a team scales and we spend more time in the CI queue than merging our changes. There are many things we can do to alleviate these problems:

Reduce duplication in tests - over testing is a common symptom that we can detect with coverage tools.
Flaky test elimination - I know deleting or disabling tests is problematic. Dont do that lightly. But if you spend more time debugging the test than debugging your code, its value is debatable.
Allocate additional or faster machines for the CI process.
Parallelize the CI process. We can parallelize some build types and some tests.
Split the project into smaller projects - notice that this doesnt necessarily mean microservices.

Ultimately, this connects directly to the productivity of the developers. But we dont have profilers for these sorts of optimizations. We have to measure each time, this can be painstaking.

GitHub Actions

GitHub Actions is a continuous integration/continuous delivery (CI/CD) platform built into GitHub. It is stateless although it allows the self-hosting of agents to some degree. Im focusing on it since its free for open-source projects and has a decent free quota for closed-source projects.

This product is a relatively new contender in the field, it is not as flexible as most other CI tools mentioned before. However, it is very convenient for developers thanks to its deep integration with GitHub and stateless agents.

To test GitHub Actions, we need a new project which in this case I generated using JHipster with the configuration seen here:

I created a separate project that demonstrates the use of GitHub Actions here. Notice you can follow this with any project, although we include maven instructions in this case, the concept is very simple. Once the project is created, we can open the project page on GitHub and move to the actions tab. We will see something like this:

In the bottom right corner, we can see the Java with Maven project type. Once we pick this type, we move to the creation of a maven.yml file as shown here:

Unfortunately, the default maven.yml suggested by GitHub includes a problem. This is the code we see in this image:

name: Java CI with Mavenon:  push:    branches: [ "master" ]  pull_request:    branches: [ "master" ]jobs:  build:    runs-on: ubuntu-latest    steps:    - uses: actions/checkout@v3    - name: Set up JDK 11      uses: actions/setup-java@v3      with:        java-version: '11'        distribution: 'temurin'        cache: maven    - name: Build with Maven      run: mvn -B package --file pom.xml    # Optional: Uploads the full dependency graph to GitHub to improve the quality of Dependabot alerts this repository can receive    - name: Update dependency graph      uses: advanced-security/maven-dependency-submission-action@571e99aab1055c2e71a1e2309b9691de18d6b7d6

The last three lines update the dependency graph. But this feature fails or at least it failed for me. Removing them solved the problem. The rest of the code is standard YAML configuration.

The pull_request and push lines near the top of the code, declare that builds will run on both a pull request and a push to the master. This means we can run our tests on a pull request before committing. If the test fails, we will not commit. We can disallow committing with failed tests in the project settings. Once we commit the YAML file, we can create a pull request and the system will run the build process for us. This includes running the tests, since the package target in maven runs tests by default. The code that invokes the tests is in the line starting with run near the end. This is effectively a standard unix command line. Sometimes it makes sense to create a shell script and just run it from the CI process. Its sometimes easier to write a good shell script than deal with all the YAML files and configuration settings of various CI stacks. Its also more portable if we choose to switch the CI tool in the future. Here we dont need it though since maven is enough for our current needs.

We can see the successful pull request here:

To test this out, we can add a bug to the code by changing the /api endpoint to /myapi. This produces the failure shown below. It also triggers an error email sent to the author of the commit.

When such a failure occurs, we can click the Details link on the right side. This takes us directly to the error message you see here:

Unfortunately, this is typically a useless message that does not provide help in the issue resolution. However, scrolling up will show the actual failure which is usually conveniently highlighted for us as seen here:

Note that there are often multiple failures so it would be prudent to scroll up further. In this error, we can see the failure was an assertion in line 394 of AccountResourceIT which you can see here, note that the line numbers do not match. In this case, line 394 is the last line of the method:

@Test@Transactionalvoid testActivateAccount() throws Exception {    final String activationKey = "some activation key";    User user = new User();    user.setLogin("activate-account");    user.setEmail("activate-account@example.com");    user.setPassword(RandomStringUtils.randomAlphanumeric(60));    user.setActivated(false);    user.setActivationKey(activationKey);    userRepository.saveAndFlush(user);    restAccountMockMvc.perform(get("/api/activate?key={activationKey}", activationKey)).andExpect(status().isOk());    user = userRepository.findOneByLogin(user.getLogin()).orElse(null);    assertThat(user.isActivated()).isTrue();}

This means the assert call failed. isActivated() returned false and failed the test. This should help a developer narrow down the issue and understand the root cause.

Going Beyond

As we mentioned before CI is about developer productivity. We can go much further than merely compiling and testing. We can enforce coding standards, lint the code, detect security vulnerabilities and much more. In this example lets integrate Sonar Cloud which is a powerful code analysis tool (linter). It finds potential bugs in your project and helps you improve code quality.

SonarCloud is a cloud-based version of SonarQube that allows developers to continuously inspect and analyze their code to find and fix issues related to code quality, security, and maintainability. It supports various programming languages such as Java, C#, JavaScript, Python, and more. SonarCloud integrates with popular development tools such as GitHub, GitLab, Bitbucket, Azure DevOps, and more. Developers can use SonarCloud to get real-time feedback on the quality of their code and improve the overall code quality.

On the other hand, SonarQube is an open-source platform that provides static code analysis tools for software developers. It provides a dashboard that shows a summary of the code quality and helps developers to identify and fix issues related to code quality, security, and maintainability.

Both SonarCloud and SonarQube provide similar functionalities, but SonarCloud is a cloud-based service and requires a subscription, while SonarQube is an open-source platform that can be installed on-premise or on a cloud server. For simplicity's sake, we will use SonarCloud but SonarQube should work just fine. To get started we go to sonarcloud.io, and sign up. Ideally with our GitHub account. We are then presented with an option to add a repository for monitoring by Sonar Cloud as shown here:

When we select the Analyze new page option, we need to authorize access to our GitHub repository. The next step is selecting the projects we wish to add to Sonar Cloud as shown here:

Once we select and proceed to the setup process, we need to pick the analysis method. Since we use GitHub Actions, we need to pick that option in the following stage as seen here:

Once this is set, we enter the final stage within the Sonar Cloud wizard as seen in the following image. We receive a token that we can copy (entry 2 that is blurred in the image), we will use that shortly. Notice there are also default instructions to use with maven that appear once you click the button labeled Maven.

Going back to the project in GitHub we can move to the project settings tab (not to be confused with the account settings in the top menu). Here we select Secrets and variables as shown here:

In this section we can add a new repository secret, specifically the SONAR_TOKEN key and value we copied from the SonarCloud as you can see here:

GitHub Repository Secrets are a feature that allows developers to securely store sensitive information associated with a GitHub repository, such as API keys, tokens, and passwords, which are required to authenticate and authorize access to various third-party services or platforms used by the repository.

The concept behind GitHub Repository Secrets is to provide a secure and convenient way to manage and share confidential information, without having to expose the information publicly in code or configuration files. By using secrets, developers can keep sensitive information separate from the codebase and protect it from being exposed or compromised in case of a security breach or unauthorized access.

GitHub Repository Secrets are stored securely and can only be accessed by authorized users who have been granted access to the repository. Secrets can be used in workflows, actions, and other scripts associated with the repository. They can be passed as environment variables to the code, so that it can access and use the secrets in a secure, reliable way.

Overall, GitHub Repository Secrets provide a simple and effective way for developers to manage and protect confidential information associated with a repository, helping to ensure the security and integrity of the project and the data it processes.

We now need to integrate this into the project. First, we need to add these two lines to the pom.xml file. Notice that you need to update the organization name to match your own. These should go into the section in the XML:

<sonar.organization>shai-almogsonar.organization> <sonar.host.url>https://sonarcloud.iosonar.host.url>

Notice that the JHipster project we created already has SonarQube support which should be removed from the pom file before this code will work.

After this we can replace the Build with Maven portion of the maven.yml file with the following version:

 - name: Build with Maven    env:       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Needed to get PR information, if any       SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}    run: mvn -B verify org.sonarsource.scanner.maven:sonar-maven-plugin:sonar -Dsonar.projectKey=shai-almog_HelloJHipster package

Once we do that, SonarCloud will provide reports for every pull request merged into the system as shown here:

We can see a report that includes the list of bugs, vulnerabilities, smells, and security issues. Clicking every one of those issues leads us to something like this:

Notice that we have tabs that explain exactly why the issue is a problem, how to fix it and more. This is a remarkably powerful tool that serves as one of the most valuable code reviewers in the team.

Two additional interesting elements we saw before are the coverage and duplication reports. SonarCloud expects that tests will have 80% code coverage (trigger 80% of the code in a pull request), this is high and can be configured in the settings. It also points out duplicate code which might indicate a violation of the Dont Repeat Yourself (DRY) principle.

Finally

CI is a huge subject with many opportunities to improve the flow of your project. We can automate the detection of bugs. Streamline artifact generation, automated delivery and so much more. But in my humble opinion, the core principle behind CI is developer experience. Its here to make our lives easier.

When it is done badly, the CI process can turn this amazing tool into a nightmare. Passing the tests becomes an exercise in futility. We retry again and again until we can finally merge. We wait for hours to merge because of slow, crowded queues. This tool that was supposed to help becomes our nemesis. This shouldnt be the case. CI should make our lives easier, not the other way around.

Open Source Maintenance is Community Organizing

Shai Almog — Tue, 28 Feb 2023 19:23:04 GMT

About six months ago, I wrote a piece about the state of open-source commercialization that garnered some notice. Almost immediately after that Hacker News featured this rather problematic piece of prose on the front page. The title is Fix it, Fork it, F*ck off (censorship is my own). I think this is a great sample of where OSS is failing both commercially and as a movement. A lot of the attention in OSS goes into the code of conduct policies (which are important), but not enough goes into what it takes to build a successful open-source project. It isnt just the coding. Id argue that coding is often less important than this.

The Clash of Entitlement

I understand where the author is coming from. Hes writing from a point of frustration. We put a lot of work into our OSS project and an entitled end user can be very difficult. I agree some people go over the line but in my experience, this is less than 0.1% of complaints. Its very rare to have a truly toxic user.

Thats the fundamental problem I have with that post, it throws the task to the users. Fix it yourself. Thats the maintainer being entitled. Dont get me wrong, its OK to do a code dump. But then declare it as such and let it go. Being a maintainer is something completely different.

I will support and help users who want to fix it themselves, but I get the people who dont want to do so. I still think its OK for them to complain as if they paid for the software. Yes, its frustrating to hear complaints. But I didnt start an open-source project just to be told how great I am. I want users to feel comfortable enough to vent, this helps me understand the pain points. It helps the authenticity of the community, and its ability to criticize me helps everyone. Especially me. It draws the attention of other community members who might have a similar issue and might be more motivated to fix it. It makes me more aware of the problems and points that require improvement.

A Community Organizer

An OSS maintainer is a community organizer of sorts. A bit of a politician mixed with a project manager and benevolent dictator. All rolled into one. Good managers and leaders are usually pretty nice empathetic people Most of the time. People sometimes mention a manager or OSS leader and mention what a terrible personality he has. This is nonsense. They often derived these stories from a few incidents out of thousands. A successful project starts with a patient leader. Yes, that leader might have a bad day occasionally. Or they might use language that isnt politically correct. But the exception doesnt make the rule and such language doesnt always translate as hostile.

The myth of an a*hole genius leader is often just that. A myth. When people like that succeed its usually despite that, not because of that. Yet some people choose to push that narrative hiding behind false meritocracy instead of taking accountability. Dont take this the wrong way, I dont think we need to walk on eggshells around everyone. I dont think that helps and I dont think it solves anything either.

Its OK to forgive a maintainer who had a bad day or did something wrong. But it probably shouldnt be celebrated. Why is that? Why should I maintain an open-source project for which I might not get paid and also support users?

Why do we write open-source code in the first place? We love to code. But I think we love it even more when people use our software. I enjoy cooking too, but if people dont eat my food Its worse than burning it on the stove. One of my failures as a maintainer is that I overly lean on my technocratic background. I provide too much support. This often stifled the community. Why answer a question in the forum if Shai will just answer right away?

The solution was to not overwhelm. I answer once a day and not faster. As a result, the community can develop its strengths. This was a version of micromanagement that applies to maintainers. Its common for coding tasks where we tend to take up too much of the work and delegate too little.

Why This is Important

Last time I wrote about the corporate takeover of open source. This has good elements to it but also poses many dangers. We need the community to drive this. We need maintainers who understand the work and are not bound to corporate overlords.

The big problem in any project, open source or other, is that people dont care. Garnering community traction is something we all need. Thats how you build up a project. You might be lucky if youre in the right place at the right time. But if you support your community you might succeed even if you missed the boat.

This all brings me to why I suddenly remembered I have this post lying around in my backlog for six months garnering dust.

Should I Start a new OSS Project?

Ive been contemplating a new type of AoT JVM that takes a radically different approach from what we have right now. This is something I want to build but should I build it as an OSS project or start in private?

Id like to raise funding for that and traction from the community will help. But if I don't have an OSS project yet, the uncertainty of upcoming traction might be a better selling point to investors than the reality of currently limited traction. By its nature, a JVM project will take time to bear fruits and provide value to customers, it wont bring a community or traction early on. So this might not be an ideal project to build in the open. Im also not a fan of coding with people staring over my shoulder. I feel self-conscious since there are expectations about my coding skills.

I 100% want to build in the open but I can only do that when I have something running that shows the value of the work Im doing. This will take some time. This is very subjective. Some projects make more sense as open-source from the start.

Do We Need This New Project?

I was talking to a friend a while back and discussing my idea when he challenged me on the need for yet another JVM. Did you conduct a market study? Whats the viability of building something like that?

I dont want to reach the point where I invest so much of my time and soul into a product that ends up in a small niche. Theres the adage: If I had asked people what they wanted, they would have said faster horses.. Its attributed to Henry Ford who as far as I know never said that. I dont think it would have been true either. We need to be product-oriented even when building an open-source project.

We need to think about customer value, willingness to adapt and market penetration even as we build something free and open-source. Before we launched LWUIT at Sun Microsystems, we spent a tremendous amount of time with potential target users. We conducted interviews, meetings and helped in the integration. The feedback we got was tremendous.

Since Codename One came from the same roots, it benefited from those roots. Still, we worked with accelerator companies to see how they worked closely. We also had an extensive early beta. Its something pretty important for the initial traction of the project. When LWUIT was announced we could delight the initial users with a product that was more adapted to customer needs. This helped a lot with the initial users and first customers we got for Codename One.

Finally

When we build an open-source project, we often end up doing everything. Product, DevOps, User/Developer Experience, etc. This is a unique experience that most companies wont give you. I know that my experience in those other disciplines has made me a better manager and entrepreneur in the long run.

Its a job. It doesnt pay (at least at first) and the hours are always overtime. But its still a job. Because otherwise, youre asking other developers to trust your hobby. Thats unfair to them. Even when its free, we need to take the project seriously and respect our users.

Maintainers get a lot in return. We improve our skills. People are more likely to use and advocate our work and its more likely to end up as a profitable endeavor. Ending up with an open-source job from our hobby project is a pretty great outcome for many side projects. It starts with being reasonably patient with people.

DevOps for Developers - Introduction and Version Control

Shai Almog — Tue, 21 Feb 2023 16:47:18 GMT

I start some of my talks with a joke: back in my day we didnt have monitoring or observability. Wed go to the server and give it a kick. Hear the HD spin? Its working!

We didnt have DevOps. If we were lucky we had some admins and a technician to solve hardware issues. Thats it. In a small company, we would do all of that ourselves. Today this is no longer practical. The complexity of deployment and scale is so big, its hard to imagine a growing company without an engineer dedicated to the operations.

In this series, I hope to introduce you to some of the core principles and tools used by DevOps. This is an important skill that we need to master in a startup where we might not have a DevOps role at all. Or in a big corporation, where we need to communicate with the DevOps team and explain our needs or requirements.

https://www.youtube.com/watch?v=2HatFLh4xoA

Whats DevOps?

DevOps is a software development methodology that aims to bridge the gap between development and operations teams. It emphasizes collaboration and communication between these two teams to ensure the seamless delivery of high-quality software products.

The core principles behind it are:

Continuous Integration and Continuous Delivery (CI/CD) - CI/CD is one of the key principles of DevOps. It involves automated processes for building, testing, and deploying software. With CI/CD, developers can identify and fix bugs early in the development cycle, leading to faster and more reliable delivery of software.
As a developer, CI/CD can help you by giving you a faster feedback loop, enabling you to make changes to the code and see the results in real-time. This helps you to quickly identify and fix any issues, which saves time and ensures that your code is always in a releasable state.
Notice that CD stands for both Continuous Delivery and Deployment. This is a terribly frustrating acronym. The difference between the two is simple though. Deployment relies on the delivery, we cant deploy an application unless it was built and delivered. The deployment aspect means that merging our commits into the main branch will result in a change to production at some point, without any user involvement.

Automation - Automation involves automating repetitive tasks such as building, testing, and deploying software. This helps to reduce the time and effort required to perform these tasks, freeing up developers to focus on more important tasks.
As a developer, automation can help you by freeing up your time and allowing you to focus on writing code, rather than spending time on manual tasks. Additionally, automation helps reduce the risk of human error, ensuring that your code is always deployed correctly.
Collaboration and Communication - DevOps emphasizes collaboration and communication between development and operations teams. This helps ensure that everyone is on the same page and working towards a common goal. It also helps reduce the time and effort required to resolve any issues that may arise.

Platform Engineering

Recently theres been a rise in the field of platform engineering. This is somewhat confusing as the overlap between the role of DevOps and a Platform Engineer isnt necessarily clear. However, they are two related but distinct fields within software development. While both are concerned with improving software delivery and operation processes, they have different focuses and approaches.

Platform Engineering is a discipline that focuses on building and maintaining the infrastructure and tools required to support the software development process. This includes the underlying hardware, software, and network infrastructure, as well as the tools and platforms used by development and operations teams.

In other words, DevOps is concerned with improving the way software is developed and delivered, while Platform Engineering is concerned with building and maintaining the platforms and tools that support that process.

While both DevOps and Platform Engineering complement each other, they serve different purposes. DevOps helps teams to work together more effectively and deliver software faster, while Platform Engineering provides the infrastructure and tools needed to support that process.

Where do we Start?

When learning DevOps, it is important to have a solid understanding of the tools and techniques commonly used in the field. Here are some of the most important tools and techniques to learn:

Version control systems: Understanding how to use version control systems, such as Git, is a key component of DevOps. Version control systems allow teams to track changes to their code, collaborate on projects, and roll back changes if necessary. I assume you know git and will skip it and go directly to the next stage.
Continuous Integration (CI) and Continuous Deployment (CD) tools: CI/CD tools are at the heart of DevOps and are used to automate the build, test, and deployment of code. Popular CI/CD tools include Jenkins, Travis CI, CircleCI, and GitLab CI/CD. I will focus on GitHub Actions. It isnt a popular tool in the DevOps space since its relatively limited, but for our needs as developers, its pretty great.
Infrastructure as Code (IaC) tools: IaC tools let us manage our infrastructure as if it was source code. This makes it easier to automate the provisioning, configuration, and deployment of infrastructure. Popular IaC tools include Terraform, CloudFormation, and Ansible. I also like Pulumi which lets you use regular programming languages to describe the infrastructure, including Java.
Containerization: Containerization technologies, such as Docker, allow you to package and deploy applications in a consistent and portable way, making it easier to move applications between development, testing, and production environments.
Orchestration: Orchestration refers to the automated coordination and management of multiple tasks and processes, often across multiple systems and technologies. In DevOps, orchestration is used to automate the deployment and management of complex, multi-tier applications and infrastructure.
Popular orchestration tools include Kubernetes, Docker Swarm, and Apache Mesos. These tools allow teams to manage and deploy containers, automate the scaling of applications, and manage the overall health and availability of their systems.
Monitoring and logging tools: Monitoring and logging tools allow you to keep track of the performance and behavior of your systems and applications. Popular monitoring tools include Nagios, Zabbix, and New Relic. Prometheus and Grafana are probably the most popular in this field in recent years. Popular logging tools include ELK Stack (Elasticsearch, Logstash, and Kibana), Graylog, and Fluentd.
Configuration management tools: Configuration management tools, such as Puppet, Chef, and Ansible, allow you to automate the configuration and management of your servers and applications.
Cloud computing platforms: Cloud computing platforms, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). They provide the infrastructure and services necessary for DevOps practices.

In addition to these tools, it is also important to understand DevOps practices and methodologies, such as agile.

Remember, the specific tools and techniques you need to learn will depend on the needs of your organization and the projects you are working on. However, by having a solid understanding of the most commonly used tools and techniques in DevOps, you will be well-prepared to tackle a wide range of projects and challenges.

Most features and capabilities are transferable. If you learn CI principles in one tool, moving to another wont be seamless. But it will be relatively easy.

Version Control

We all use git, or at least I hope so. Git's dominance in version control has made it much easier to build solutions that integrate deeply. As developers, Git is primarily viewed as a version control system that helps us manage and track changes to our codebase. We use Git to collaborate with other developers, create and manage branches, merge code changes, track issues, and bugs. Git is an essential tool for developers as it allows them to work efficiently and effectively on code projects.

DevOps have a different vantage point. Git is viewed as a critical component of the CI/CD pipeline. In this context, Git is used as a repository to store code and other artifacts such as configuration files, scripts, and build files. DevOps professionals use Git to manage the release pipeline, automate builds, and manage deployment configurations. Git is an important part of the DevOps toolchain as it allows for the seamless integration of code changes into the CI/CD pipeline, ensuring the timely delivery of software to production.

Branch Protection

By default, GitHub projects allow anyone to commit changes to the main (master) branch. This is problematic in most projects. We usually want to prevent commits to that branch so we can control the quality of the mainline. This is especially true when working with CI as a break in the master can stop the work of other developers.

We can minimize this risk by forcing everyone to work on branches and submit pull requests to the master. This can be taken further with code review rules that require one or more reviewers. GitHub has highly configurable rules that can be enabled in the project settings. As you can see here.

Enabling branch protection on the master branch in GitHub provides several benefits, including:

Preventing accidental changes to the master branch: By enabling branch protection on the master branch, you can prevent contributors from accidentally pushing changes to the branch. This helps to ensure that the master branch always contains stable and tested code.
Enforcing code reviews: You can require that all changes to the master branch be reviewed by one or more people before they are merged. This helps to ensure that changes to the master branch are high quality and meet the standards of your team.
Preventing force pushes: Enabling branch protection on the master branch can prevent contributors from force-pushing changes to the branch, which can overwrite changes made by others. This helps to ensure that changes to the master branch are made intentionally and with careful consideration.
Enforcing status checks: You can require that certain criteria, such as passing tests or successful builds, are met before changes to the master branch are merged. This helps to ensure that changes to the master branch are of high quality and do not introduce new bugs or issues.

Overall, enabling branch protection on the master branch in GitHub can help to ensure that changes to your codebase are carefully reviewed, tested, and of high quality. This can help to improve the stability and reliability of your software.

Working with Pull Requests

As developers, we find that working with branches and pull requests allow us to collect multiple separate commits and changes to a single feature. This is one of the first areas of overlap between our role as developers and the role of DevOps. Pull requests let us collaborate and review each other's code before merging it into the main branch. This helps identify issues and ensures that the codebase remains stable and consistent. With pull requests, the team can discuss and review code changes, suggest improvements, and catch bugs before they reach production. This is critical for maintaining code quality, reducing technical debt, and ensuring that the codebase is maintainable. The role of DevOps is to tune the quality vs. churn.

How many reviewers should we have for a pull request? Is a specific reviewer required? Do we require test coverage levels?

A DevOps needs to tune the ratio between developer productivity, stability and churn. By increasing the reviewer count or forcing review by a specific engineer we create bottlenecks and slow development. The flip side is a potential increase in quality. We decide on these metrics based on rules of thumb and best practices. But a good DevOps engineer will follow through everything with metrics that help an informed decision down the road. E.g. If we force two reviewers we can then look at the time it takes to merge a pull request which will probably increase. But we can compare it to the number of regressions and issues after the policy took place. That way we have a clear and factual indication of the costs and benefits of a policy.

The second benefit of pull requests is their crucial role in the CI/CD process. When a developer creates a pull request, it triggers an automated build and testing process, which verifies that the code changes are compatible with the rest of the codebase and that all tests pass. This helps identify any issues early in the development process and prevents bugs from reaching production. Once the build and test processes are successful, the pull request can be merged into the main branch, triggering the release pipeline to deploy the changes to production. I will cover CI more in-depth in the next installment of this series.

Finally

I feel that the discussion of DevOps is often very vague. There are no hard lines between the role of a DevOps engineer and the role of a developer since they are developers and are a part of the R&D team. DevOps navigate over that fine line between administration and development, they need to satisfy the sometimes conflicting requirements on both ends. I think understanding their jobs and tools can help make us better developers, better teammates and better managers.

Next time well discuss building a CI pipeline using GitHub actions. Working on your artifacts. Managing secrets and keeping everything in check. Notice we wont discuss the continuous delivery in great detail at this stage because that would drag us into a discussion of deployment. I fully intend to circle back to it and discuss CD as well once we cover deployment technologies such as IaC, Kubernetes, Docker etc.

Java Serialization Filtering - Prevent 0-day Security Vulnerabilities

Shai Almog — Tue, 14 Feb 2023 18:57:23 GMT

Ive been a Java developer long enough to remember the excitement when Sun introduced the concept of serialization in the JVM. In the world of C, we could just write a struct into a file but this was always problematic. It wasnt portable and had many issues. But for Java we could just write the class and it worked. This was pure magic!

Java was still mostly in use in the client side and when we thought about security, we had different vulnerabilities in mind. The sandbox occupied most of our security discussions. Fast forward a couple of decades and today when most developers discuss serialization the discussion isnt so positive. Serialization as Brian Vermeer puts it is: the gift that keeps giving.

https://www.youtube.com/watch?v=xLXFhRLkxLc

In fact, just after I created this video, a new deserialization vulnerability in SnakeYaml was exposed. Serialization is one of the biggest security problems in many programming languages, it isnt just a JVM problem. Hackers can use tools designed to deliver a serialization exploit chain. You can then generate a gadget used to deliver the exploit without too much knowledge of the system. That is scary stuff...

Im not a security expert, I care more about the solution. How do we make sure that the next zero-day doesnt affect us?

How do we harden our server code against serialization attacks?

Do we need Serialization?

We rarely need serialization. Ideally, if you can remove serialization entirely from your code and can avoid 3rd party code that uses serialization, you can just block it completely. This will mean that even if a zero-day comes up, the serialization portion will fail. You might have a bug, but it wont be an exploitable vulnerability.

Sometimes we need a bit of serialization. In that case, we can include only the well-known classes needed and block everything else out.

JEP 290: Serialization Filtering

The solution came in Java 9 in the form of serialization filtering as part of JEP 290. There are critical patch updates for older JDKs such as JDK 8u121. So if you must stay in an older version its still possible to use this feature.

It is possible to use this feature with no code changes, although we might want to change the code for additional functionality. The fundamental problem is testing that important systems dont break. We might run into difficulty when verifying serialization usage, e.g. in distributed caching layers. A cache might serialize objects to synchronize them between nodes over the network. We might miss that dependency when running tests locally, but fail in production. In those cases, you can follow the strategies listed below to solve the problems.

Whitelist vs. Blacklist

There are two approaches we can take when filtering specific serializable objects:

Whitelist - block everything and allow specific classes or packages in.
Blacklist - block specific problematic classes and packages.

A blacklist lets us block well-known vulnerabilities and that might be enough. But we have no guarantee that we blocked everything. A whitelist is usually the more secure option, yet it might break your code if you missed a class thats required in an edge case.

We can set the filter on the JDK itself by editing the java.security properties file. That might make sense if you package a JDK with your application. Personally, I prefer using the command line argument to configure that e.g.:

java -Djdk.serialFilter=!* -jar MyJar.jar

This command will block all serialization. Notice I need to use the quotes to prevent bash from expanding the star sign. The exclamation point means we wish to block and the star means we block everything.

The following code is a blacklist. Were blocking a specific package. We can also narrow it down to a specific class. But as I said before, this isnt ideal:

java -Djdk.serialFilter=!mypackage.* -jar MyJar.jar

Besides the inherent problems with the blacklist, a major problem is knowing what to block. There are obvious targets like classes that have been vulnerable in the past e.g.:

java.rmi.server.UnicastRemoteObject
java.util.logging.Handler
java.util.zip.Inflater
org.apache.commons.collections.functors.InvokerTransformer
org.apache.commons.collections4.functors.InvokerTransformer

Unfortunately, this list is not exhaustive and I couldnt find any list that I can use as a source for a proper blacklist. We can look through the Common Vulnerabilities and Exposures (CVE) database for exploits but thats painstaking work.

Finally, we have a whitelist where we allow the classes under the package mypackage. We can serialize them as usual. The JVM seamlessly blocks everything else. This is pretty close to the ideal situation. We can add additional classes and packages as necessary by adding them and separating them with a semicolon:

java -Djdk.serialFilter=mypackage.*;!* -jar MyJar.jar

What about Complexity?

How do you know which classes are serialized in the code? How do you get an alert if your code blocked a serialization attempt? This might be something you would want to track since it might be the system breaking or it might be an attempted hack. Both are valid reasons for an alert. You cant do that declaratively but you can write code that can use sophisticated logic to determine whether serialization should succeed.

This is a sample from the Oracle documentation of a simple serialization filter. Notice it can reject the serialization or leave it undecided. This is part of a filter chain where each stage in the validation process can reject the serialization or pass it on to the next stage. We can bind the filter globally as we do here, or do it on a per-stream basis. The API is remarkably flexible and provides a lot of information about the process:

ObjectInputFilter.Config.setSerialFilter(info -> info.depth() > 10 ? Status.REJECTED : Status.UNDECIDED);

TL;DR

You should always use serialization filtering when running a JVM deployment. This should always be the case.

Serialization filtering was backported to older JVM versions so there is absolutely no excuse.

Serialization filtering requires no code changes and we can enable it via global configuration or command line.

At the very least, you can use it to blacklist known vulnerabilities. Ideally, we should block all serialization and whitelist specific classes or packages as needed.

Is it Time to go Back to the Monolith?

Shai Almog — Tue, 07 Feb 2023 16:51:22 GMT

History repeats itself. Everything old is new again and Ive been around long enough to see ideas discarded, rediscovered and return triumphantly to overtake the fad. In recent years SQL has made a tremendous comeback from the dead. We love relational databases all over again. I think the Monolith will have its space odyssey moment again. Microservices and serverless are trends pushed by the cloud vendors, designed to sell us more cloud computing resources. Microservices make very little sense financially for most use cases. Yes, they can ramp down. But when they scale up, they pay the costs in dividends. The increased observability costs alone line the pockets of the big cloud vendors.

You can check out a video version of this post here:

https://youtu.be/NWu7AJJlLM8

I recently led a conference panel that covered the subject of microservices vs. monoliths. The consensus in the panel (even with the pro monolith person), was that monoliths dont scale as well as microservices.

This is probably true for the monstrous monoliths of old that Amazon, eBay et al. replaced. Those were indeed huge code bases in which every modification was painful and their scaling was challenging. But that isnt a fair comparison. Newer approaches usually beat the old approaches. But what if we build a monolith with newer tooling, would we get better scalability?

What would be the limitations and what does a modern monolith even look like?

Modulith

To get a sense of the latter part you can check out the Spring Modulith project. Its a modular monolith that lets us build a monolith using dynamic isolated pieces. With this approach we can separate testing, development, documentation and dependencies. This helps with the isolated aspect of microservice development with little of the overhead involved. It removes the overhead of remote calls and the replication of functionality (storage, authentication, etc.).

The Spring Modulith isnt based on Java platform modularization (Jigsaw). They enforce the separation during testing and in runtime, this is a regular Spring Boot project. It has some additional runtime capabilities for modular observability but its mostly an enforcer of best practices. This value of this separation goes beyond what were normally used to with microservices but also has some tradeoffs. Lets give an example. A traditional Spring monolith would feature a layered architecture with packages like:

com.debugagent.myappcom.debugagent.myapp.servicescom.debugagent.myapp.dbcom.debugagent.myapp.rest

This is valuable since it can help us avoid dependencies between layers. E.g. the DB layer shouldnt depend on the service layer. We can use modules like that and effectively force the dependency graph in one direction: downwards. But this doesnt make much sense as we grow. Each layer will fill up with business logic classes and database complexities. With a Modulith, wed have an architecture that looks more like this:

com.debugagent.myapp.customerscom.debugagent.myapp.customers.servicescom.debugagent.myapp.customers.dbcom.debugagent.myapp.customers.restcom.debugagent.myapp.invoicingcom.debugagent.myapp.invoicing.servicescom.debugagent.myapp.invoicing.dbcom.debugagent.myapp.invoicing.restcom.debugagent.myapp.hrcom.debugagent.myapp.hr.servicescom.debugagent.myapp.hr.dbcom.debugagent.myapp.hr.rest

This looks pretty close to a proper microservice architecture. We separated all the pieces based on the business logic. Here the cross dependencies can be better contained and the teams can focus on their own isolated area without stepping on each other's toes. Thats a lot of the value of microservices without the overhead.

We can further enforce the separation deeply and declaratively using annotations. We can define which module uses which and force one-way dependencies. So the human resources module will have no relation to invoicing. Neither would the customers module. We can enforce a one-way relation between customers and invoicing and communicate back using events. Events within a Modulith are trivial, fast and transactional. They decouple dependencies between the modules without the hassle. This is possible to do with microservices but would be hard to enforce. Say invoicing needs to expose an interface to a different module. How do you prevent customers from using that interface?

With modules we can. Yes. A user can change the code and provide access, but this would need to go through code review and that would present its own problems. Notice that with modules we can still rely on common microservice staples such as feature-flags, messaging systems, etc. You can read more about the Spring Modulith in the docs and in Nicolas Frnkels blog.

Every dependency in a module system is mapped out and documented in code. The Spring implementation includes the ability to document everything automatically with handy up-to-date charts. You might think, dependencies are the reason for Terraform. Is that the right place for such high level design?

An Infrastructure as Code (IaC) solution like Terraform could still exist for a Modulith deployment. But they would be much simpler. The problem is the division of responsibilities. The complexity of the monolith doesnt go away with microservices as you can see in the following image (taken from this thread). We just kicked that can of worms down to the DevOps team and made their lives harder. Worse, we didnt give them the right tools to understand that complexity so they have to manage this from the outside.

Thats why infrastructure costs are rising in our industry, where traditionally prices should trend downwards When the DevOps team runs into a problem they throw resources at it. This isnt the right thing to do in all cases.

Other Modules

We can use Standard Java Platform Modules (Jigsaw) to build a Spring Boot application. This has the advantage of breaking down the application and a standard Java syntax. But it might be awkward sometimes. This would probably work best when working with external libraries or splitting some work into common tools.

Another option is the module system in maven. This system lets us break our build into multiple separate projects. This is a very convenient process that saves us from the hassle of enormous projects. Each project is self-contained and easy to work with. It can use its own build process. Then as we build the main project everything becomes a single monolith. In a way, this is what many of us really want

What about Scale?

We can use most of the microservice scaling tools to scale our monoliths. A great deal of the research related to scaling and clustering was developed with monoliths in mind. Its a simpler process since theres only one moving part: the application. We replicate additional instances and observe them. Theres no individual service thats failing. We have fine grained performance tools and everything works as a single unified release.

I would argue that scaling is simpler than the equivalent microservices. We can use profiling tools and get a reasonable approximation of bottlenecks. Our team can easily (and affordably) set up staging environments to run tests. We have a single view of the entire system and its dependencies. We can test an individual module in isolation and verify performance assumptions.

Tracing and observability tools are wonderful. But they also affect production and sometimes produce noise. When we try to follow through on a scaling bottleneck or a performance issue, they can send us down the wrong rabbit hole.

We can use Kubernetes with monoliths just as effectively as we can use it with Microservices. Image size would be larger but if we use tools like GraalVM, it might not be much larger. With this we can replicate the monolith across regions and provide the same fail-over behavior we have with microservices. Quite a few developers deploy monoliths to Lambdas, Im not a fan of that approach as it can get very expensive. But it works...

The Bottleneck

But theres still one point where a monolith hits a scaling wall: the database. Microservices achieve a great deal of scale thanks to the fact that they inherently have multiple separate databases. A monolith typically works with a single data store. That is often the real bottleneck of the application. There are ways to scale a modern DB. Clustering and distributed caching are powerful tools that let us reach levels of performance that would be very difficult to match within a microservice architecture.

Theres also no requirement for a single database within a monolith. It isnt out of the ordinary to have an SQL database while using Redis for cache. But we can also use a separate database for time series or spatial data. We can use a separate database for performance as well, although in my experience this never happened. The advantages of keeping our data in the same database is tremendous.

The Benefits

The fact that we can complete a transaction without relying on eventual consistency is an amazing benefit. When we try to debug and replicate a distributed system, we might have an interim state thats very hard to replicate locally or even understand fully from reviewing observability data.

The raw performance removes a lot of the network overhead. With properly tuned level 2 caching we can further remove 80-90% of the read IO. This is possible in a microservice but would be much harder to accomplish and probably wont remove the overhead of the network calls.

As I mentioned before, the complexity of the application doesnt go away in a microservice architecture. We just moved it to a different place. In my experience so far, this isnt an improvement. We added many moving pieces into the mix and increased overall complexity. Returning to a smarter and simpler unified architecture makes more sense.

Why use Microservices

The choice of programming language is one of the first indicators of affinity to microservices. The rise of microservices correlates with the rise of Python and JavaScript. These two languages are great for small applications. Not so great for larger ones.

Kubernetes made scaling such deployments relatively easy, thus it added gasoline to the already growing trend. Microservices also have some capability of ramping up and down relatively quickly. This can control costs in a more fine grained way. In that regard microservices were sold to organizations as a way to reduce costs. This isnt completely without merit. If the previous server deployment required powerful (expensive) servers this argument might hold some water. This might be true for cases where usage is extreme, a sudden very high load followed by no traffic. In these cases, resources might be acquired dynamically (cheaply) from hosted Kubernetes providers.

One of the main selling points for microservices is the logistics aspect. This lets individual agile teams solve small problems without fully understanding the big picture. The problem is, it enables a culture where each team does its own thing. This is especially problematic during downsizing where code rot sets in. Systems might still work for years but be effectively unmaintainable.

Start with Monolith, Why Leave?

One point of consensus in the panel was that we should always start with a monolith. Its easier to build and we can break it down later if we choose to go with microservices. But why should we?

The complexities related to individual pieces of software make more sense as individual modules. Not as individual applications. The difference in resource usage and financial waste is tremendous. In this time of cutting down costs, why would people still choose to build microservices instead of a dynamic, modular monolith?

I think we have a lot to learn from both camps. Dogmatism is problematic as is a religious attachment to one approach. Microservices did wonders for Amazon. To be fair their cloud costs are covered

On the other hand, the internet was built on monoliths. Most of them arent modular in any way. Both have techniques that apply universally. I think the right choice is to build a modular monolith with proper authentication infrastructure that we can leverage in the future if we want to switch to microservices.

What are you Missing by Debugging in VS Code?

Shai Almog — Tue, 31 Jan 2023 13:55:56 GMT

In the first chapter of my debugging book, I discuss IDE debugging. In that chapter, I mostly talk about IntelliJ/IDEA. People often ask me why I didnt write as much about VS Code The reason is that there isnt much to write about. Its debugger is simpler for better and for worse. It isnt as powerful as other IDEs. I created the following video that covers the content of this post:

https://www.youtube.com/watch?v=OBgLeRwjlAc

This isnt a slam against VS Code or against Microsoft. Visual Studio has one of the most powerful debuggers around. But Visual Studio Code doesnt have a lot of the features from Visual Studio or other IDEs. I believe this is intentional. I think this is a user experience-driven decision in which they removed features to simplify usability. One thing VS Code did well was exposing the logpoint (tracepoint) feature, so it is more discoverable to the casual developer. Thats pretty great and wouldnt have been practical if the IDE had all the salient features.

But theres a price that comes with simplicity. As you can see in the following table there are many missing features that are available in IntelliJ. These are all features I covered in blog posts or videos. Notice that the video links in the following table are direct links to the specific time within the video.

Feature	VS Code	Comments	Links
Breakpoint			Video, Post
Conditional Breakpoint			Video, Post
Logpoint/Tracepoint			Video, Post
Step Over			Video, Post
Step Into			Video, Post
Step Out			Video, Post
Continue			Video, Post
Run to Cursor			Video, Post
Return Immediately		Restart Frame is available	Video, Post
Jump to Line			Video, Post
Return Value Display		(on by default)	Video, Post
Evaluate			Video, Post
Watch			Video, Post
Inline Watch			Video, Post
Set Value			Video, Post
Object Marking			Video, Post
Method Breakpoints			Video, Post
Field watchpoints			Video, Post
Exception Breakpoints		They suck without filters	Video, Post
Grouping/Naming Breakpoints			Video, Post
Disable Breakpoints			Video, Post
Instance Filters			Video, Post
Class Filters			Video, Post
Caller Filters			Video, Post
Filtering		Array and Collection filtering	Video, Post
Stream Debugger			Video, Post
Basic rendering		Very simplistic	Video, Post
Entry Rendering			Video, Post
Rendering Annotations			Video, Post
Thread View			Video, Post
Async Stack Traces		No custom support	Video, Post
Searchable memory View			Video, Post
Track new Instances			Video, Post

The Missing Features

Following is a high-level overview of the missing features.

Flow Control

Return immediately lets us return right away from a method and potentially return an arbitrary value. This is fantastic when you want to test edge cases.

There are also drop frame and throw exception features.

To be fair, VS Code has restart frame which is similar to drop frame and also nice.

Jump to line requires a plugin for IntelliJ. It lets us drag the execution pointer to an arbitrary location. If you have a bug, just drag the execution back and try again.

Need to skip a line of code because your app is in a problematic state but you still want to debug?

Drag forward. This is a fantastic killer feature when you need it.

Watch Area

Both IDEs contain a watch but only IntelliJ can show the values of the watch variables directly in the editor itself. This is very convenient when watching multiple values. It lets us see the stack at a glance as we scroll through the code.

Object marking is one of my favorite obscure features. It lets us dynamically declare a global variable that helps us track a value. We can use this global variable in a conditional breakpoint to verify things. One such example is saving the current thread as a marked object and then only breaking if we hit the method with a different thread.

Breakpoints

Method breakpoints are pretty problematic but they have some edge uses. One of the big values is the ability to break when returning from a long method. This is helpful in tracing threading issues.

Field watchpoints are very useful when tracking field mutation and new values.

We can manage breakpoints, name, group and disable them as a hierarchy group. When dealing with multiple tasks and switching branches in the middle of a debugging session, we can keep that session on hold by grouping all the breakpoints together.

When we return to the task, we can instantly jump right back!

VS Code has exception breakpoints. But without filters they absolutely suck!

We can filter breakpoint hits based on multiple criteria such as instance, class or a specific method in the stack. I spent so much time pressing continue over and over again. We can reduce this pain using these tools.

Arrays, Collections and Streams

Theres another spectacular type of filtering. We can filter the content of an array or collection right in the watch or evaluate area. I spent a great deal of time digging through arrays of image data with thousands of elements. This was a nightmare. With this, we can find the entries that we need in a collection or array, instantly!

This is about the Java 8 and newer stream API which is a functional programming construct. Its a fantastic tool, but it makes debugging awkward. The stream debugger borrows concepts from time travel debuggers to make stream debugging easier than regular debugging sometimes.

Entry Rendering

This is one of the most fantastic features you can think of. We can completely customize the way entries look in the watch. In the demo here, I show how I can expose the content of an Object Relational Mapping object as I step over in the debugger.

But this is hard to configure every time for every case. Annotations let us configure this globally so we can see this every time for specific library objects when running in the debugger.

Thread and Asynchronous Debugging

VS Code shows threads, but it has very limited display functionality and configurability. IntelliJ can open a dedicated thread view, hierarchies and much more.

It also supports gluing asynchronous stack traces together to make it easier to debug asynchronous code. This works seamlessly with well known APIs and the really cool thing is that we can use annotations to add this to our custom APIs.

Memory

We can search through memory to find any object instance. We can find VM internal instances and investigate issues by reviewing the objects in the system.

Better yet. We can track every new instance of a particular class. Get full stack traces to every new instance created between one breakpoint and another. This can track what happened under the hood with surgical precision.

Finally

Theres a lot I didnt cover because theres just so much. I dont think VS Code is inherently bad. It just went for simplicity. Personally, I think of myself as a power user. If youre like me I hope this post gave you a sense of what youre missing.

Please check out my book, my course, and follow me for videos like the one embedded above.

Remote Debugging Dangers and Pitfalls

Shai Almog — Tue, 24 Jan 2023 18:34:50 GMT

This is the last part of the debugging series, to learn the rest you'll need to get the book or the course. One of the most frequently asked questions I receive is: can we do these things in VS Code?

The answer is unfortunately no. But I elaborate on the debugging capabilities of VS Code in this video. I'll do a blog post that covers that next week.

Below is the last video in the series:

https://www.youtube.com/watch?v=ImOmu4GHOls

Transcript

Welcome back to the ninth part of debugging at scale where we really know the quality of your code. Remote debugging doesnt always deal with a remote machine, we often need it when debugging into Kubernetes or Docker.

Well delve more into that later but for now well discuss the basic mechanics. How to connect, how to make it slightly less vulnerable to security attacks, and then well discuss the problems of remote debugging.

The Connection and Command Line

Well start with a discussion around the connection. We first need to run the process that well connect to remotely. To do that we need to run a command similar to this one. Notice that this is a simplified version, in many cases the argument should be embedded in configuration files. When you inspect your maven or gradle files you might see many of the arguments listed here. This is how these things work under the hood. Lets go over the command and break it down piece by piece to see that we understand it correctly.

The first part is the launch of the Java command line. This is pretty obvious. We need quotes in bash since theres a star at the end of the line and bash wants to expand it. Without this quote the command wont work properly.

Agent lib is the system that loads the native library wiring directly into the virtual machine and JDWP is the Java Debug Wire Protocol. This is the underlying networking protocol used to communicate between the debugger and the running process. Its a high level protocol, that means it can be implemented on top of various transports. Typically, its implemented over TCP sockets, but its the same protocol we used to debug devices directly. You dont need to know too much about JDWP but the concept is simple, you send commands and can query the system. Thats what the IDE does for you. When you add a breakpoint the IDE sends a JDWP command to add a breakpoint at the given location. When the breakpoint is hit JDWP sends back an event to the IDE indicating that. The IDE can then query the details about the current environment, stack, variables, etc.

In this case we transfer the details via a server socket. We can use dt_shmem which stands for shared memory as the wire protocol. This is faster and useful for processes that have access to a shared memory area. This is actually pluggable, and you can build your own JDWP transport. This isnt useful usually but speaks to the power and flexibility of the API.

We can optionally suspend the virtual machine on launch if you want to debug something right from the start. Ive set this to no which means the VM will start running right away. If you set it to yes with the letter y the vm will pause on launch and wait for the JDWP connection. This is the address and port we are listening on. In this case I allow anyone to connect on port 5005. I can limit this to localhost only by changing the star character. This is probably the better approach, although it wont make the protocol fully secure.

This is the rest of the command, the class were running. Typically you would have something more substantial here. In this case Im just running the PrimeMain class. To start debugging we need to edit the run configuration in intellij.

Connecting from IntelliJ/IDEA

Next we need to locate a configuration for remote debugging. Once I select that we can add it. Notice its pre-configured with the defaults such as port 5005. I give the new run configuration a name, and were ready to go with debugging the app. Notice there are many options to tune here, but we dont need any of them. Also check out this area right here. Seems familiar? Thats the exact line we discussed before. The IDE is showing us how to set up the command line for the remote process. This lets us verify that we entered everything correctly.

We now have a new Debug Remote run configuration. We can switch to a different configuration from the same location. But when we want to do remote debugging we need to toggle it here. Next we need to press the debug button to run this command.

We are now instantly connected to the running process. Once that is done this feels and acts like any debugger instance launched from within the IDE. I can set a breakpoint, step over, inspect variables etc. So why do it?

In some cases running the server locally in the IDE is impractical. A good example would be debugging a container on your own machine. That might not be trivial.

Security Implications of JDWP

Calling JDWP insecure is inaccurate.

That would be like putting your house keys and home address wrapped in a nice gift wrapping with an itemized list of your valuables sorted by value in front of your house. This is an open door. An open door isnt a security vulnerability. Its an open door!

JDWP is very insecure when used remotely. Locally on your own machine it isnt a problem, but it has almost no security protections. Theres no solution for that. But theres a very partial workaround of tunneling it over SSH. This is relatively trivial. Just use this command to open a tunnel between the remote machine to your local machine. For both sides it will seem like local debugging. So the example I showed before of connecting to a local host server, would work perfectly with this remote host as SSH will move all the packets back and forth. Securely.

We cant ssh into a Kubernetes container but we can port forward which is almost identical. We can do something similar to this command to forward the port from the given pod to the local machine and vice versa. Same idea as the SSH tunneling but appropriate to the Kubernetes world.

Dangers of Remote Debugging

In this final section I want to talk about the dangers of remote debugging in production. Breakpoints break seems obvious. Thats what theyre here to do. But if we run on a server we block it completely by mistake. We can use tracepoints. As I said, theyre great. But they are no replacement to breakpoints and an accidental click in the gutter can literally stop your server in its tracks.

JDWP effectively allows remote code execution. Lets you access all the bytecode of the app which is effectively the same as giving access to your full source code. It lets attackers do almost anything since it wasnt designed with security in mind. We need to relaunch the application with debugging enabled. That means killing the running process and starting it over again. Disconnecting existing users, etc. That isnt great.

Some operations in the debugger require more than one step in terms of the protocol. As a result you could send a request to the debuggee, lose your connection and the debuggee could be stuck in a problematic state. This is an inherent limitation of the JDWP protocol and cant be worked around in a standard debugger. The problem is that even unintentional actions can demolish a server. A simple conditional breakpoint that invokes a method as part of the condition can demolish server performance and crash it.

Imagine placing a breakpoint where the user password is passed to authentication If JDWP is open for your server a member of your team might use that, and you will never know! Theres no tracking at all! 60% of security hacks happen from within the organization. If your company does remote debugging they have no way of knowing whether an employee used that to manipulate the application state or siphon user details. Theres no tracking or anything. This can be in violation of various rules and regulations since it might expose personal user data. Remote debugging into production can trigger liability risks.

I discuss some of the solutions for those problems both in the low level tooling and in higher level observability solutions. This is covered in the book and in the full course.

Final Word

With this we finished the first part of the course. If you want to check out the full course go to debugagent.com to learn more The next video covers the strategies for debugging and the science of debugging. If you have any questions please use the comments section. Thank you!

Memory Debugging - a Deep Level of Insight

Shai Almog — Tue, 17 Jan 2023 18:56:26 GMT

When I mention memory debugging the first thing that comes to the minds of many developers is the profiler. That isn't wrong but it's still a partial picture. Profilers are amazing at mapping that "big picture" but when you want to understand the domain, they fall short.

Modern debuggers let us gain a level of insight into the application that's unrivaled. We can inspect and locate a specific object instance with surgical precision.

https://www.youtube.com/watch?v=dFOFOEg2W4k

Transcript

Welcome back to the eighth part of debugging at scale where we know exactly which object was allocated by whom and why.

Profiler vs. Debugger

Profilers expose a lot of information about memory, but they dont give us the fine-grained view a debugger offers. The debugger can solve that last mile problem, it can connect the information you see in the debugger to actual actionable changes you need to make in the code.

The debugger perfectly complements the insights of the profiler. In the debugger we can pinpoint specific objects and memory locations. A profiler is a blunt instrument and the debugger is a fine one. By combining both we can zoom in on a specific issue and understand the root cause.

Searchable Memory View

Well start by launching the IDE memory view. We can enable the memory view by clicking the widget on the right side of the IDE here. Once we click it we can see the memory view in the same area. Notice that the memory view is empty by default even after we launch it. This keeps the IDE responsive. In order to see the actual objects in memory we need to click the load link in the center. Once loaded we can see the instance count for every object. This helps us get a sense of what exactly is taking up memory.

But thats only part of the story. When we step over there are allocations happening. We can see the difference between the current point and the one before when we look at the diff column. Notice when I say point I mean either the line before with a step over, but it can also apply for pressing continue between two breakpoints. In this case I can see the line I stepped over triggered the allocation of 70 byte arrays. That might seem like a lot but the IDE cant distinguish threads and a deep call graph, so we need to take the number with a grain of salt.

We can double-click an individual entry and see all the instances of the given object which is a remarkably powerful feature. Ill dig a bit deeper into this feature soon enough. As a reminder we can filter the objects we see here using the field on the top of the dialog and locate any object in memory. This is a very powerful tool.

Update Loaded Classes

Clicking load classes every time is tedious. I have a fast machine with a lot of RAM. I can enable Update Loaded Classes on Debugger Stop and I will no longer need to press load explicitly. Only do that if your machine is a monster as this will slow down your debugging sessions noticeably. Im enabling this here because I think it will make the video clearer.

Track New Instances

You might have noticed that area on the right side of the instance view. We can enable it with the track new instances option. This option lets us explicitly track the individual allocations that are going on between two point. We can enable that by right-clicking any non-array object and enabling this option like we do here.

Once enabled we see a small watch sign next to the tracked object but theres a lot more involved as we continue the execution. I can now see only the objects allocated in this diff. We can understand exactly what happened in terms of RAM at great detail. Notice that here I can see the exact number of elements that were allocated here. There were a lot because I took a long pause waiting before stepping over. By clicking show new instances I get a special version of the instances dialog.

In this version of the dialog I only see the new instances created. The IDE knows exactly which objects were created between the last stop on a breakpoint and now. It only shows me these objects. For each of the individual objects. I can see the stack trace that triggered it all the way up to the literal call to new!

I can understand who created every object and follow the logic to why an object was created. I can double-click an entry in the stack and go to the applicable source code. This is a fantastic level of insight.

Step-Over and Breakpoints

I discussed this before but these updates dont just work for step over. Everything I showed works exactly the same when jumping between two breakpoints. Even if theyre in separate methods. The diff will be between those two points!

This is very powerful. You can slowly narrow the gap between two points as you discover which area of the code is taking up memory. Notice that memory allocation directly correlates to performance as garbage collection is a major source of performance overhead. This lets us narrow down the root cause.

Final Word

In the next video well discuss remote debugging and its risks. I know what you might be thinking. I already know how to use remote debugging This is a different video, well discuss tunneling, port-forwarding and the risks involved in doing all of that. If you have any questions please use the comments section. Thank you!

Debugging Threads and Asynchronous Code

Shai Almog — Tue, 10 Jan 2023 16:52:06 GMT

My debugging book is finally out in printed form. I'm so thrilled! The course is also complete with 7 hours of content and 50 total videos.

If any of you are in the Tel Aviv area I will be speaking at two events on the 25th. In the morning/noon I will speak at LynxDevCon and later that night I will speak at the JavaIL meetup. I also have a few podcast talks and other events coming up.

This week we'll discuss one of the harder problems in programming: threading. For many cases threading issues aren't as difficult to debug. At least not in higher abstractions. Asynchronous programming is supposed to simplify the threading model but oftentimes it makes a bad situation worse by detaching us from the core context. We discuss why that is and how debuggers solve that problem. We also explain how you can create custom asynchronous APIs that are almost as easy to debug as synchronous applications!

https://www.youtube.com/watch?v=fPiTRdkJ6AQ

Transcript

Welcome back to the seventh part of debugging at scale where we dont treat debugging like taking out the garbage.

Concurrency and parallelism are some of the hardest problems in computer science. But debugging them doesnt have to be so hard. In this section well review some of the IDE capabilities related to threading as well various tricks and asynchronous code features.

Thread Views

Lets start by discussing some of the elements we can enable in terms of the thread view. In the stack frame we can look at all the current threads in the combo box above the stack frame. We can toggle the currently selected thread and see the stack for that thread and the thread status. Notice that here we chose to suspend all threads on this breakpoint. If the threads were running we wouldnt be able to see their stack as its constantly changing.

In the stack frame we can look at all the current threads in the combo box above the stack frame. We can toggle the currently selected thread and see the stack for that thread and the thread status. Notice that here we chose to suspend all threads on this breakpoint. If the threads were running we wouldnt be able to see their stack as its constantly changing. We can enable the threads view on the right hand side pull down menu to see more

As you can see viewing the stack is more convenient in this state when were working with many threads. Furthermore, we can customize this view even more by going into the customize thread view and enabling additional options.The thread groups option is probably the most obvious change as it arranges all the threads based on their groups and provides a pretty deep view of the hierarchy. Since most frameworks arrange their threads based on categories in convenient groups this is often very useful when debugging many threads.

Other than that we can show additional information such as the file name, line number, class name and argument types. I personally like showing everything but this does create a somewhat noisy view that might not be as helpful. Now that we switched on the grouping we can see the hierarchy of the threads. This mode is a bit of a double-edged sword since you might miss out on an important thread in this case but If you have a lot of threads in a specific group it might be the only way you can possibly work. I think well see more features like this as project Loom becomes the standard and the thread count increases exponentially. Im sure this section will see a lot of innovation moving forward.

Debugging a Race Condition

Next well discuss debugging race conditions. The first step of debugging a race condition is a method breakpoint. I know what I said about them but in this case we need it. Notice the return statement in this method includes a lot of code. If I place a breakpoint on the last line it will happen before that code executes and my coverage wont include that part.

So lets open the breakpoint dialog and expand it to the fully customizable dialog. Now we need to define the method breakpoint. I type the message and then get the thread name. I only use the method breakpoint for the exit portion because if I used it for both Id have no way to distinguish between exit and enter events. I make this a tracepoint by unchecking the suspend option. So now we have a tracepoint that prints the name of the thread that just exited the method.

I now do the exact same thing for a line breakpoint on the first line in the method. A line breakpoint is fine since entry to the method makes sense here. I change the label and make it also into a tracepoint instead of a breakpoint. Now we look at the console. I copy the name of the thread from the first printout in the console and add a condition to reduce the noise. If theres a race condition there must be at least one other thread right? So lets remove one thread to be sure

Going down the list its obvious that multiple threads enter the code. That means theres a risk of a race condition. Now it means I need to read the logs and see if an enter for one thread happened before the exit of another thread. This is a bit of work but is doable.

Debugging a Deadlock

Next lets discuss deadlocks. Here we have two threads each is waiting on a monitor held by the other thread. This is a trivial deadlock but debugging is trivial even for more complex cases. Notice the bottom two threads have a MONITOR status. This means theyre waiting on a lock and cant continue until its released. Typically, youd see this in Java as a thread is waiting on a synchronized block. You can expand these threads and see whats going on and which monitor is held by each thread. If youre able to reproduce a deadlock or a race in the debugger they are both simple to fix.

Asynchronous Stack Traces

Stack traces are amazing in synchronous code but what do we do when we have asynchronous callbacks?

Here we have a standard Async Example from JetBrains that uses a list of tasks and just sends them to the executor to perform on a separate thread. Each task sleeps and prints a random number. Nothing to write home about, as far as demos go this is pretty trivial.

Heres where things get interesting. As you can see theres a line that separates the async stack from the current stack on the top. The IDE detected the invocation of a separate thread and kept the stack trace on the side. Then when it needed the information it took the stack trace from before and glued it to the bottom. The lower part of the stack trace is from the main thread and the top portion is on the executor thread. Notice that this works seamlessly with Swing, executors, Spring Async annotation etc. Very cool!

Asynchronous Annotations

Thats pretty cool but theres still a big problem. How does that work and what if I have custom code?

It works by saving the stack trace in places where we know an asynchronous operation is happening and then placing it later on when needed. How does it connect the right traces? It uses variable values. In this demo I created a simple listener interface. Youll notice it has no asynchronous elements in the stack trace.

By adding the async schedule and async executor annotations I can determine the point where an async code might launch which is the schedule marker. I can place it on a variable to indicate the variable I want to use to lookup the right stack trace. I do the same thing with execute and get custom async stack traces. I can put the annotations on a method and the current object will be used instead.

Final Word

In the next video well discuss memory debugging. This goes beyond what the profiler provides, the debugger can be a complimentary surgical tool you can use to pinpoint a specific problem and find out the root cause.If you have any questions please use the comments section. Thank you!

Watch Area and Renderers

Shai Almog — Tue, 03 Jan 2023 16:58:29 GMT

This is it. The debugging book is now live. I would really appreciate reviews and feedback!

I also finished recording and editing the entire course. There are 50 total videos which total in 7 hours... I also recorded additional videos for the two other free courses for beginners and for modern Java. So keep an eye on those.

Renderers

In today's video we discuss one of my favorite obscure IDE features: renderers. Very few people are aware of them. I explained them in the past but I feel I didn't properly explain why they are so much better than any other alternative. This time I think I got the explanation right.If you work with JPA or any elaborate API you should check this out, I think the demo is revolutionary. If you provide a complex library to developers this can also be an amazing tool.

https://www.youtube.com/watch?v=oaUf8KXHsd0

Transcript

Welcome back to the sixth part of debugging at scale where the bugs come to die.In this section we discuss the watch area. The watch is one of the most important areas in the debugging process.

Yet we dont give it nearly as much attention as we should. In this segment well discuss a few of the powerful things we can do in the watch area and how we can extend it to support some fantastic capabilities.

Mute Renderers

Lets start with mute renderers which lets us improve the performance of the watch area. Before we discuss that Id like to talk about the watch area itself. This is the watch area. In it, we can see most of the visible variables for the current stack frame and their values. We can expand entries within the watch area. Every variable value or expression we have in the watch is an entry. We can add arbitrary elements to the watch and even embed watch expressions directly into the IDE user interface.

Notice the values on the right hand side these are typically the results of the toString() method. In IntelliJ, we can customize these via renderers which we will discuss further. But theres more to it as well see later on. For now just consider this. Every time I step over a line of code the IDE needs to collect all the variables in scope and invoke toString() on every one of them. In some case even more elaborate code. This is expensive and slow

In the right click menu we have the mute renderers option. By checking this option we can disable that behavior and potentially speed up the step-over speed significantly. Once we select that you will notice that all the values turn to three dots followed by the word toString. This means the renderers dont fetch the value anymore. They instead show this placeholder. This removes the overhead of the renderers completely and can speed up your step-over performance.

If we want to see the value we can click the toString label and the value is extracted dynamically. Notice that this only impacts the objects. Primitives, arrays etc. are unaffected by this feature.

Customize Rendering

Rendering is the process of drawing the element values in the watch. To get started with rendereres we need to customize them through the right click menu here. This launches the renderer customization dialog which lets us do amazing things in intellij.

For the most basic customization we can toggle and enable multiple features within this dialog. Then press apply to instantly see them in the variables view below. I can see the declared type of the field.

We can include fully qualified names for class files we can see the static field values.We can include hex values for primitives by default which is a feature I always enable because its so useful for me. This is an amazing view thats worth exploring and customizing to fit your own preference where you can tune the verbosity level in the watch area.

But the real power of this dialog is in the second tab. The java type renderer which is the next subject.

Data Rendering

We can go so much further with renderers. You might recall the visit objects Ive shown before. This is from a standard Spring Boot demo called pet clinic. Spring Boot has the concept of a Repository which is an interface that represents a datasource. Often a repository is just a table, it can do more, but it has a strong relation to an underlying SQL table, and it helps to think about it in these terms.If you look at the visitRepository and perRepository objects at the bottom of the screen youll notice that we dont have much to go on. These are just object IDs with no data thats valuable for a person debugging the objects. I didnt expand them but theres nothing under the variables here either.

Lets fix that in the customize data view as we did before. We add a renderer that applies to JpaRepository which is the base interface of this instance. Then we just write the expression to represent the rendering here. This renderer will apply to JPA repository and its sub interfaces or classes.

Next instead of using the default renderer I use an expression to indicate what we will show. The JPARepository includes a method called count() which queries the database and counts the number of elements within the database. I simply invoke it, notice I assume that the current object is the object being rendered. I dont provide an object instance. The IDE automatically runs in the context of the object. You can also use this to represent the rendered object. Notice I dont need to cast to a JPARepository.

This means the expression will be rendered directly in the watch without changing the toString method which in this case I obviously cant change and usually might not want to. The toString() method is useful in production, I wouldnt want expensive code in there. But in the renderer I can just go wild and do things that dont make sense in the repository.

Notice the on-demand checkbox. If we check this the expression will act like a muted renderer by default. You will need to click it to see the value.

Lets apply this change to the code and youll notice the visitRepository instantly changes to use the new expression we defined. We can now immediately see that the repository has 4 elements which is pretty cool. Right?

Notice that petRepository isnt changed, this is because its a repository too, but it isnt a JPARepository.

So far we did stuff that can theoretically be done by toString() methods. It might be hacky, but its not something unique. Lets take this up a notch.

The When expanding node option lets us define the behavior when a user expands the entry. The findAll() method of JPARepository returns all the entities in the repository, this will be invoked when we expand the entry.

We can optionally check if theres a reason to show the expand widget. In this case I use the count() method which would be faster than repeatedly calling findAll(). Once we apply the changes we can see the elements from the repository listed. Youll notice all 4 elements are here and since they are objects we can see all the attributes like any object we see in the watch.

This is truly spectacular, and you cant fake it with a toString() call

Doing it for Everyone

That was a cool feature right? But its so annoying to configure all of that stuff for every project. Here we see the same renderer from before, youll notice that everything looks exactly the same. The numbering, the list of entities etc. But when we open the list of renderers its blank, there are no renderers!

How does this suddenly work without a renderer?

Whats going on?

We use code annotations to represent the renderer.

This way you can commit the renderer to the project repository, and you dont need to configure individual IDE instances. This is pretty simple, we add a dependency on the JetBrains annotation library into the POM.

This is an annotation library. That means the code doesnt change in a significant way. Its just markers. Since its just hints to the debugger its ignored in runtime and doesnt have any implications or overhead.

We add an import to the renderer then we scroll down and we added simple renderer annotation code. Notice that this is pretty much the code I typed in the dialog but this time its used in an annotation this way our entire team can benefit from the improved view of repository objects!

If youre building libraries or frameworks you can integrate this to make the debugging experience easier to your users without impacting the behavior of the toString() methods or similar semantics.

Finally

In the next video well discuss threading issues. Their reputation as hard to debug isnt always justified.

If you have any questions please use the comments section. Thank you!

Debugging Streams and Collections

Shai Almog — Tue, 27 Dec 2022 18:35:19 GMT

I will run a book giveaway promotion on the Code Ranch on January 17th. Be sure to be there and let your friends know. It would be great to answer your questions about debugging. I'm very excited by this and by the feedback I'm getting for the course and new videos.

I also launched a free new Java course for complete beginners. No prior knowledge needed. This is probably not the audience for this course. But if you know someone that might be interested I'll appreciate a share. I hope people find it useful and learn a bit about Java. I'm working on a cadence of one video per week in this beginner course.

I also have another upcoming course for modern Java development. It's landing really soon. Subscribe to my channel to keep track of that. It's also free at this time.

Finally, I finished scripting all the videos for the full debugging course. It came out to a nice round 50 videos. I will hopefully finish filming and uploading everything in early January. This weeks video covers collections and Java 8 stream debugging. Check it out...

https://www.youtube.com/watch?v=fok4Icxsl2k

Transcript

Welcome back to the fifth part of debugging at Scale where we no longer stare blankly at the screen. We know where to look for that bug!

In this section we discuss streams and collections. These constructs are much harder to debug because of the issue of scale.

Notice that here Im talking about Java 8 and higher streams. These streams come from the realm of functional programming. Well dig deeper into them. But first I want to start by talking about filtering. Filtering is such a basic feature, that Im amazed it took me so long to notice that its there.

Filtering

When we have code that views an array or collection content, we can filter the content using an expression and reduce the noise significantly. I can right-click any such collection and select the filter operation.

In this case I have four elements in the pet clinic demo but this is especially valuable for large collections. Try reviewing hundreds of results to find the entry youre looking for

Here I reduce the content by testing against the pet id. Notice I use the keyword this to represent the current element in the list I dont need to do that, I can just type the getter but using this and a dot opened up the code completion support.

Once I apply this you will notice the numbers skip everything that doesnt match. That means elements one and two are hidden and we go from zero to three. That makes it pretty easy to instantly see where the filter took effect.This is very versatile if we have a list of names we can filter it so we can only see applicable names etc. It works with arbitrary objects, arrays and all collection types.

Java 8 Streams

For the next two features we need to discuss streams. Here we have a simple Java stream expression. These are functional expressions we can use to process multiple elements, lets review the various pieces of this expression. visits is a standard Java collection on which we invoke the stream method to get a functional interface we can work with.

Each operation in the stream transforms it to a different stream. In this case we filter out the duplicates within the stream. Map converts elements within the stream to a different type. In this case the visits are of type Visit. The map method is invoked for every element in the stream. In this case it converts the elements to PetDTO types. This operation maps from one type to another.

Finally, we collect all the elements in the stream into a set. We could collect them to a list or a different collection type. This is the value we return to the user. Now that we understand this I want to talk about a few important principals we saw here.The stream is self-contained. If we run it again, and again it will be idempotent that means the result doesnt change. Thats good. In the map code I could just change a global variable or add an element to a list. This would be called a side effect, and its a very bad thing to do in a stream. It will make it less debuggable and can cause problems with parallel streams, etc. I strongly suggest avoiding this and some things just wont work if you do it.

Debugging Streams With Breakpoints

Lets review the process of debugging the stream. We can debug it like we would debug a loop. Add a breakpoint and place a condition on it, so it will only stop for the pet we care about which is pet ID number 7.As you can see I have that condition right here and I can stop at the specific entry.

As I press continue the loop keeps going and stops when the applicable entry is hit. I can also use a non-conditional breakpoint to stop and just keep pressing continue. This is tedious for a larger list.This would work exactly the same as a loop and would let us see everything. But is there a better way?

The Stream Debugger

IntelliJ ships with a stream debugger which is a fantastic dedicated tool.When you stop on a breakpoint that includes a stream expression you can. See this button. Notice that it might be folded into a sub menu category depending on your version of intellij. This tool will only work if the stream has no side effects. If it does rely on an external variable it will fail, since the tool needs to manipulate the stream and run it to produce the results. If the stream has side effects it will trigger them and cause a problem. You will get a cryptic error message thats really hard to debug. So make sure the stream expression doesnt change anything outside the stream itself!

This launches the stream debugger. On the top you can see the stages of the stream expression. Notice that as I traverse through the stages of the expression the objects change and draw a line between their original mapping to the new one. Initially the list was of visits and now we see the conversion to the pets in each visit. Since the final stage is a set we will only get one instance of each pet. That isnt shown here.The advantage here is that you can see the entire process in a single view that you can take back and forth. Unlike a loop which you normally debug by stepping in the debugger. The view here is a bit more complete.This is inspired by time travel debuggers which is a unique branch of debugging I talk about in the book.

Final Word

In the next video well discuss watch expressions which are far more elaborate than what you might expect Specifically renderers which are some of the cooler features in JetBrains IDEs.If you have any questions please use the comments section. Thank you!

Java, Debugging, DevOps & Open Source

Software Testing as a Debugging Tool

The Intersection of Debugging and Testing

Unit Tests

Integration Tests

Coverage

The Debug-Fix Cycle

Composing Tests with Debuggers

Test-Driven Development

Final Word

Wireshark & tcpdump: A Debugging Power Couple

Introduction to Wireshark

Browser Network Monitors

Installation and Getting Started

Navigating Through Noise with Filters

Deep Dive into Data Analysis

Beyond Basic Usage

The Basics of HTTPS Encryption

Methods for Decrypting HTTPS in Wireshark

Pre-Master Secret Key Logging

Using a Proxy

Integrating tcpdump with Wireshark for Enhanced Network Analysis

The Role of tcpdump in Network Troubleshooting

Key Scenarios for tcpdump Usage:

Using tcpdump Effectively

Challenges and Considerations

Final Word

Mastering jhsdb: The Hidden Gem for Debugging JVM Issues

Introduction

Getting Started with jhsdb

Understanding and Using debugd

Leveraging jstack for Thread Dumps

Heap Memory Analysis with jmap

Basic JVM Insights with jinfo

Performance Metrics with jsnap

GUI Debugging: A Visual Approach

Final Word

Debugging Streams with Peek

Understanding Java Streams

A Simple Stream Example

What is the peek() Method?

Debugging with peek()

Uncovering Common Bugs with peek()

Filtering Issues

Large Data Sets

Addressing Side Effects

Limitations and Pitfalls

Potential for Misuse in Production Code

Performance Overhead

Side Effects and Functional Purity

The Right Tool for the Job

Navigating the Pitfalls

Final Thoughts

Debugging Using JMX Revisited

The Need for Advanced Management Tools in Development

Introduction to JMX (Java Management Extensions)

Understanding MBeans

Spring and Management Beans

Tooling for JMX Management

Getting Started with JMXTerm

Leveraging JMX in Debugging and Management

Exposing MBeans in Spring Boot

Understanding Spring Boot JMX Support

Expose an MBean in Spring Boot

Example: Exposing a Simple Configuration MBean

Final Word

Unleashing the Power of Git Bisect

The Essence of Debugging with Git

Setting the Stage for Debugging

Initiating Bisect Mode

Marking the Known Good Revision

Marking the Known Bad Revision

Bisecting to Find the Culprit

Expected Output

Reset and Analysis

Advanced Usage and Tips

Script Automation for Precision and Efficiency

Handling Flaky Tests with Strategy

Skipping Commits with Care

Unraveling a Regression Mystery

What is the `peek()` Method?

Debugging with `peek()`

Uncovering Common Bugs with `peek()`