Published on 26 July 2024.
In this blog post we’re going to explore how to write and test a Git client using the Testing Without Mocks approach. Specifically we’re going to explore how to apply Output Tracking to this example and also contrast it with mocks. All details about the example are not explained, but the full source code is included at the end. Check out James’ article for more details about the testing without mocks approach.
The example Git client is a CLI-application that provides a simplified interface to Git. This represents a real world scenario yet can be made small enough for an example.
The application implements two commands:
myscm save -> git commit
myscm share -> git push
The application consists of the following classes:
App --+--> SaveCommand ---> Process
|
+--> ShareCommand --> Process
|
+--> Args
|
+--> Terminal
Process
, Args
, and Terminal
are low-level infrastructure wrappers that are made nullable using embedded stubs.
Process
is for running external processes. (git
in this example.)Args
is for reading command line arguments.Terminal
is for writing text to the terminal.SaveCommand
and ShareCommand
are application code that perform a function in the domain of a Git client. They are made nullable using fake it once you make it.
App
is also application code that routes commands to the correct sub-command. It is also made nullable using “fake it once you make it”.
App
?We want to write sociable, state-based test.
What does that mean in the context of testing App
?
Sociable means that we should use its real dependencies. That is, we should inject a real SaveCommand
, ShareCommand
, Args
, and Terminal
. We should not inject test doubles like mocks or stubs.
So the test setup will look something like this:
"""
>>> app = App(
... save_command=SaveCommand(...),
... share_command=ShareCommand(...),
... terminal=Terminal(...),
... args=Args(...),
... )
"""
However, if we were to invoke methods on app
now, it would interact with the outside world. It would read command line arguments, execute git
commands, and write to the terminal.
We don’t want to do that. It takes a long time and is brittle. We therefore inject null versions of dependencies like this:
"""
>>> app = App(
... save_command=SaveCommand.create_null(),
... share_command=ShareCommand.create_null(),
... terminal=Terminal.create_null(),
... args=Args.create_null(),
... )
"""
Creating a null version is exactly like creating a real version except that at the very edge of the application boundary, the communication with the outside world is turned off. We put this in a factory-method:
class App:
@classmethod
def create_null(cls):
return cls(
save_command=SaveCommand.create_null(),
share_command=ShareCommand.create_null(),
terminal=Terminal.create_null(),
args=Args.create_null(),
)
...
App
has only one method, and that is run
:
def run(self):
...
So the only test we can write is this:
"""
>>> app = App.create_null()
>>> app.run()
"""
There is no way to control what the command line arguments are, and there is no way to observe what the application is doing.
Here are two scenarios that would be useful to test:
When the application is called with ["save", "message"]
, then git commit is performed.
When the application is called with ["share"]
, then git push is performed.
In order to write those test, we need a way to control the outside world to simulate that a given set of command line arguments are present. We also need a way to observe what commands are run.
We can solve the first part by passing simulated command line arguments to create_null
. The test then becomes this:
"""
>>> app = App.create_null(args=["save", "message"])
>>> app.run()
"""
App.create_null
is modified to this:
class App:
@classmethod
def create_null(cls, args):
return cls(
save_command=SaveCommand.create_null(),
share_command=ShareCommand.create_null(),
terminal=Terminal.create_null(),
args=Args.create_null(args=args),
)
...
Args
supports configuring responses when creating the null version. In that case it would return the configured command line arguments instead of the real ones. The communication with the outside world has been turned off, and we simulate the part of the outside world that reads command line arguments from the environment.
Now we can write our two scenarios like this:
"""
>>> app = App.create_null(args=["save", "message"])
>>> app.run()
# How to assert that git commit was called?
>>> app = App.create_null(args=["share"])
>>> app.run()
# How to assert that git push was called?
"""
And now we come to the main topic of this blog post: output tracking.
App
performs actions by delegating to SaveCommand
and ShareCommand
. Both of them take the rest of the command line arguments and perform an action without returning anything. To observe that with output tracking, we introduce state in the commands so that we can query them and see if they were run. A slightly more elegant solution, instead of introducing state, is to fire events. Here is how we implement it in SaveCommand
:
class SaveCommand(Trackable):
def run(self, args):
self.notify(f"SAVE_COMMAND {args!r}")
...
To track events, we can do this:
"""
>>> events = Events()
>>> SaveCommand.create_null().track_events(events).run(["message"])
>>> events
SAVE_COMMAND ['message']
"""
We use the event tracking pattern for both commands and the terminal like this:
class App:
@classmethod
def create_null(cls, events, args):
return cls(
save_command=SaveCommand.create_null().track_events(events),
share_command=ShareCommand.create_null().track_events(events),
terminal=Terminal.create_null().track_events(events),
args=Args.create_null(args=args),
)
...
And now we can write our tests like this:
"""
>>> events = Events()
>>> App.create_null(events, args=["save", "message"]).run()
>>> events
SAVE_COMMAND ['message']
>>> events = Events()
>>> App.create_null(events, args=["share"]).run()
>>> events
SHARE_COMMAND []
>>> events = Events()
>>> App.create_null(events, args=["unknown", "sub", "command"]).run()
>>> events
TERMINAL_WRITE 'Unknown command.'
"""
The implementation looks like this:
def run(self):
args = self.args.get()
if args[0:1] == ["save"]:
self.save_command.run(args[1:])
elif args[0:1] == ["share"]:
self.share_command.run([])
else:
self.terminal.write("Unknown command.")
The tests for App
are similar to end-to-end-test in that the whole stack is executed. Except right at the application boundary. So if we supply incorrect arguments to the save command for example, this test will blow up:
"""
>>> App.create_null(Events(), args=["save"]).run()
Traceback (most recent call last):
...
ValueError: Expected one argument as the message, but got [].
"""
This is overlapping, sociable testing. We are actually testing that App
calls SaveCommand
correctly. However, the behavior of the save command is not tested here. We only test that application parses command line arguments correctly and calls the appropriate sub-command.
Let’s contrast how the first test can be written using mocks and stubs instead. Here it is again:
"""
>>> events = Events()
>>> App.create_null(events, args=["save", "message"]).run()
>>> events
SAVE_COMMAND ['message']
"""
And here is the mock/stub version:
"""
>>> save_command_mock = Mock()
>>> App(
... save_command=save_command_mock,
... share_command=None,
... terminal=None,
... args=Mock(**{"get.return_value": ["save", "message"]})
... ).run()
>>> save_command_mock.run.call_args_list
[call(['message'])]
"""
The share command and terminal are not exercised in this test, so we inject None
. For args
we inject a stub that is configured to return ["save", "message"]
when its get
method is called. For the save_command
, we inject a mock. After we call the run
method on the application, we assert that the run
method was called on the mock with the ['message']
argument.
Let’s contrast the two assertions:
"""
>>> events
SAVE_COMMAND ['message']
>>> save_command_mock.run.call_args_list
[call(['message'])]
"""
They look very similar. Almost to the point that output tracking feels like mocking.
But there is one crucial difference:
The mock version creates isolated tests whereas the output tracking version creates sociable tests.
We have already seen what happens in the output tracking version when we call the save command with incorrect arguments. What happens in the mock based version? It happily passes:
"""
>>> save_command_mock = Mock()
>>> App(
... save_command=save_command_mock,
... share_command=None,
... terminal=None,
... args=Mock(**{"get.return_value": ["save"]})
... ).run()
>>> save_command_mock.run.call_args_list
[call([])]
"""
To make the mock based test suite “equivalently powerful” we need to augment it with “contract tests”. In this case we need a test saying something like when the save command is called with no arguments, it does not blow up. And we have to write such tests for every example in our test suite. When we assert that a dependency is called in a certain way or returns a certain thing under certain conditions, we also have to write a “contract test” that checks that the dependency can actually accept those arguments and return those things under said conditions. That seems like a whole lot more work to me. (I think the term “contract test” is mostly used in the context of external services, but I think the reasoning is the same for two classes where one is a dependency. That’s how J.B. Rainsberger uses the term in J B Rainsberger Integrated Tests Are A Scam HD and Getting Started with Contract Tests.)
Another more subtle difference between output tracking and mocks is that output tracking tracks the action that was performed whereas mocks record function calls.
Here are the two assertions again:
"""
>>> events
SAVE_COMMAND ['message']
>>> save_command_mock.run.call_args_list
[call(['message'])]
"""
The save command emits an event that indicates that the save action was performed with the given arguments. We are free to rename individual functions and the test will still pass.
In the mock version we explicitly check that the run
method was called. If we want to rename it, we have to update the test as well.
I struggle a bit with this difference. I think in most cases, the function call and the event should contain the same information.
I asked James about this:
I have a question regarding output tracking.
“Output Trackers should write objects that represent the action that was performed, not just the function that was called to perform it.”
Shouldn’t those in most cases be very similar? I mean, the name of a function should match what it does, right? Sure, you can refactor them separately, but wouldn’t you often want to rename the object written if you rename the function?
He replied
It’s really a prescription against treating the tracker as a Spy that records the function name and arguments. You should be able to refactor the API without feeling like you have to change the tracker, as long as behavior remains the same.
I can see cases where tracking events can be a little more flexible. Take this logging class for example:
class Logger:
def info(self, message):
self.log("INFO", message)
def error(self, message):
self.log("ERROR", message)
def log(self, level, message):
self.notify(f"LOG {level} {message}")
...
In application code you call info
and error
. But in tests you don’t need to care about which exact method was called. Only that the relevant LOG ...
event was emitted.
Perhaps the SAVE_COMMAND
event should not include raw arguments? Here is the implementation:
def run(self, args):
self.notify(f"SAVE_COMMAND {args!r}")
if len(args) != 1:
raise ValueError(f"Expected one argument as the message, but got {args!r}.")
self.process.run(["git", "commit", "-a", "-m", args[0]])
Perhaps it makes more sense to record the event with an explicit message like this?
def run(self, args):
if len(args) != 1:
raise ValueError(f"Expected one argument as the message, but got {args!r}.")
message = args[0]
self.notify(f"SAVE_COMMAND message={message}")
self.process.run(["git", "commit", "-a", "-m", message])
An assertion would then look like this:
"""
>>> events
SAVE_COMMAND message=message
"""
Is this event more relevant from the point of view of the caller? Maybe. In any case, the event approach is more flexible compared to the mock version.
#!/usr/bin/env python
from unittest.mock import Mock
import doctest
import subprocess
import sys
class Trackable:
def __init__(self):
self.events = []
def track_events(self, events):
if events:
self.events.append(events)
return self
def notify(self, event):
for events in self.events:
events.append(event)
class Events:
def __init__(self):
self.events = []
def append(self, event):
self.events.append(event)
def __repr__(self):
return "\n".join(self.events)
class App:
@classmethod
def create(cls):
"""
>>> isinstance(App.create(), App)
True
"""
return cls(
save_command=SaveCommand.create(),
share_command=ShareCommand.create(),
terminal=Terminal.create(),
args=Args.create(),
)
@classmethod
def create_null(cls, events=None, args=[]):
return cls(
save_command=SaveCommand.create_null().track_events(events),
share_command=ShareCommand.create_null().track_events(events),
terminal=Terminal.create_null().track_events(events),
args=Args.create_null(args=args),
)
def __init__(self, save_command, share_command, terminal, args):
self.save_command = save_command
self.share_command = share_command
self.terminal = terminal
self.args = args
def run(self):
"""
>>> events = Events()
>>> App.create_null(events, args=["save", "message"]).run()
>>> events
SAVE_COMMAND ['message']
>>> save_command_mock = Mock()
>>> App(
... save_command=save_command_mock,
... share_command=None,
... terminal=None,
... args=Mock(**{"get.return_value": ["save", "message"]})
... ).run()
>>> save_command_mock.run.call_args_list
[call(['message'])]
>>> App.create_null(Events(), args=["save"]).run()
Traceback (most recent call last):
...
ValueError: Expected one argument as the message, but got [].
>>> save_command_mock = Mock()
>>> App(
... save_command=save_command_mock,
... share_command=None,
... terminal=None,
... args=Mock(**{"get.return_value": ["save"]})
... ).run()
>>> save_command_mock.run.call_args_list
[call([])]
>>> events = Events()
>>> App.create_null(events, args=["share"]).run()
>>> events
SHARE_COMMAND []
>>> events = Events()
>>> App.create_null(events, args=["unknown", "sub", "command"]).run()
>>> events
TERMINAL_WRITE 'Unknown command.'
"""
args = self.args.get()
if args[0:1] == ["save"]:
self.save_command.run(args[1:])
elif args[0:1] == ["share"]:
self.share_command.run([])
else:
self.terminal.write("Unknown command.")
class SaveCommand(Trackable):
@classmethod
def create(cls):
return cls(
process=Process.create()
)
@classmethod
def create_null(cls, events=None):
return cls(
process=Process.create_null().track_events(events)
)
def __init__(self, process):
Trackable.__init__(self)
self.process = process
def run(self, args):
"""
>>> events = Events()
>>> SaveCommand.create_null().track_events(events).run(["message"])
>>> events
SAVE_COMMAND ['message']
>>> events = Events()
>>> SaveCommand.create_null(events=events).run(['message'])
>>> events
PROCESS_RUN ['git', 'commit', '-a', '-m', 'message']
>>> SaveCommand.create_null().run(['message', '--force'])
Traceback (most recent call last):
...
ValueError: Expected one argument as the message, but got ['message', '--force'].
"""
self.notify(f"SAVE_COMMAND {args!r}")
if len(args) != 1:
raise ValueError(f"Expected one argument as the message, but got {args!r}.")
self.process.run(["git", "commit", "-a", "-m", args[0]])
class ShareCommand(Trackable):
@classmethod
def create(cls):
return cls(
process=Process.create()
)
@classmethod
def create_null(cls, events=None):
return cls(
process=Process.create_null().track_events(events)
)
def __init__(self, process):
Trackable.__init__(self)
self.process = process
def run(self, args):
"""
>>> events = Events()
>>> ShareCommand.create_null(events=events).run([])
>>> events
PROCESS_RUN ['git', 'push']
"""
self.notify(f"SHARE_COMMAND {args!r}")
self.process.run(["git", "push"])
class Terminal(Trackable):
@classmethod
def create(cls):
return cls(sys=sys)
@classmethod
def create_null(cls):
class NullStream:
def write(self, text):
pass
def flush(self):
pass
class NullSysModule:
stdout = NullStream()
return cls(sys=NullSysModule())
def __init__(self, sys):
Trackable.__init__(self)
self.sys = sys
def write(self, text):
"""
>>> events = Events()
>>> Terminal.create().track_events(events).write("hello")
hello
>>> events
TERMINAL_WRITE 'hello'
>>> Terminal.create_null().write("hello")
"""
self.notify(f"TERMINAL_WRITE {text!r}")
print(text, file=self.sys.stdout, flush=True)
class Args(Trackable):
@classmethod
def create(cls):
return cls(sys=sys)
@classmethod
def create_null(cls, args):
class NullSysModule:
argv = ["null program"]+args
return cls(sys=NullSysModule())
def __init__(self, sys):
Trackable.__init__(self)
self.sys = sys
def get(self):
"""
>>> Args.create().get()
['--test']
>>> Args.create_null(args=["configured", "args"]).get()
['configured', 'args']
"""
return self.sys.argv[1:]
class Process(Trackable):
@classmethod
def create(cls):
return cls(subprocess=subprocess)
@classmethod
def create_null(cls):
class NullSubprocessModule:
def call(self, command):
pass
return cls(subprocess=NullSubprocessModule())
def __init__(self, subprocess):
Trackable.__init__(self)
self.subprocess = subprocess
def run(self, command):
"""
>>> events = Events()
>>> Process.create().track_events(events).run(["echo", "hello"])
>>> events
PROCESS_RUN ['echo', 'hello']
>>> Process.create_null().run(["echo", "hello"])
"""
self.notify(f"PROCESS_RUN {command!r}")
self.subprocess.call(command)
if __name__ == "__main__":
if Args.create().get() == ["--test"]:
doctest.testmod()
print("OK")
else:
App.create().run()
What is Rickard working on and thinking about right now?
Every month I write a newsletter about just that. You will get updates about my current projects and thoughts about programming, and also get a chance to hit reply and interact with me. Subscribe to it below.