Published on 17 June 2012.
Recently I’ve been writing interpreters for the Brainfuck programming language in Haskell. I’ve also held two workshops where I’ve taught Haskell, and in those workshops, the last exercise is to work on a Brainfuck interpreter.
I’ve experimented with different solutions both to see what they feel like and also to see how they affect performance.
The fifth version became particularly beautiful, and I shall share that beauty with you in this post. This version is inspired by ideas that raek presented to me during and after a workshop.
If you are not familiar with Brainfuck, go read on Wikipedia. I will not explain it here.
I’m going to assume that you are a bit familiar with Haskell. If there is something in particular that you do not understand, feel free to ask in the comments.
The function we are implementing is this:
execute :: String -> IO ()
= interact (run (compile program) emptyDataMap) execute program
The Brainfuck program enters our execute
function as a string. We compile the program to an internal representation and then run
it together with a data tape, emptyDataMap
, that holds the data for the Brainfuck program.
interact
passes stdin to our run
function and prints the output of it on stdout.
We represent the Brainfuck data tape with a typeclass. That allows us to experiment with different data structures without changing the interpreter. The typeclass looks like this:
class Data d where
emptyData :: d
dataGet :: d -> Int
dataModifyValue :: (Int -> Int) -> d -> d
dataModifyPos :: (Int -> Int) -> d -> d
First of all we need a way to create a data tape. Next we need a way to extract the integer at the current position. And finally we need to be able to change the value at the current position and also change the position itself.
A concrete implementation of this typeclass is presented below:
data DataMap = DataMap
currentPos :: Int
{ values :: M.Map Int Int
,
}
emptyDataMap :: DataMap
= DataMap 0 M.empty
emptyDataMap
instance Data DataMap where
= emptyDataMap
emptyData
= M.findWithDefault 0 (currentPos dat) (values dat)
dataGet dat
= dat { values = newValues }
dataModifyValue fn dat where
= dataGet dat
oldValue = M.insert (currentPos dat) (fn oldValue) (values dat)
newValues
= dat { currentPos = fn (currentPos dat) } dataModifyPos fn dat
The most interesting thing to note is that we use a map instead of a list structure. This makes it easy to create an infinite tape. If we ask for a value at a position that does not yet exist, we just return the default value 0.
To make it easier to work with the data tape, we have these general helper functions that only depend on the interface of the typeclass:
dataMoveRight :: Data d => d -> d
= dataModifyPos inc
dataMoveRight
dataMoveLeft :: Data d => d -> d
= dataModifyPos dec
dataMoveLeft
dataGetAscii :: Data d => d -> Char
= C.chr $ dataGet dat
dataGetAscii dat
dataWriteAscii :: Data d => Char -> d -> d
= dataModifyValue (const $ C.ord i)
dataWriteAscii i
dataIncValue :: Data d => d -> d
= dataModifyValue inc
dataIncValue
dataDecValue :: Data d => d -> d
= dataModifyValue dec
dataDecValue
inc :: Int -> Int
= (+1)
inc
dec :: Int -> Int
= x - 1 dec x
With the data tape out of the way, let’s look at how a Brainfuck program is represented in Haskell. The source code is transformed in two steps: first the program is converted to a list of tokens, and then those tokens are converted to a sort of byte code that the run
function interprets.
Making this separation has the benefit that each function that does a transformation can focus on doing only one transformation. The parsing function only needs to deal with the problem of getting the structure of a Brainfuck program represented in Haskell, and does not have to know anything about how this later gets executed.
The data structure for tokens looks like this:
data Token
= TInc
| TDec
| TLeft
| TRight
| TPrint
| TRead
| TLoop [Token]
deriving (Show, Eq)
This is a tree-like structure where all tokens are leaves except for loops which in turn have a list of tokens inside of them.
The function that converts a Brainfuck program to a list of tokens uses the Parsec library and looks like this:
parseTokens :: String -> [Token]
=
parseTokens input case parse bfTokens fileName (removeComments input) of
Left err -> error (show err)
Right x -> x
where
fileName :: String
= ""
fileName removeComments :: String -> String
= filter (`elem` "+-<>.,[]")
removeComments bfTokens :: Parser [Token]
= many bfToken
bfTokens bfToken :: Parser Token
= fmap (const TInc) (char '+')
bfToken <|> fmap (const TDec) (char '-')
<|> fmap (const TLeft) (char '<')
<|> fmap (const TRight) (char '>')
<|> fmap (const TPrint) (char '.')
<|> fmap (const TRead) (char ',')
<|> fmap TLoop (between (char '[') (char ']')
bfTokens)
bfTokens
is our parser. It parses a list of bfToken
. bfToken
defines a few cases saying that a token is either a plus, or a minus, and so on. The last case says: match a left bracket, then a list of tokens, then a right bracket, and put those tokens inside a TLoop
.
Our parser is run on the line with the case-expression. Parsing something with Parsec can either fail or succeed. If parsing fails, we call error
and make the interpreter crash. If parsing succeeds, we return the result (which is a list of tokens).
Before we pass the input string to the parser, we remove all comments by keeping only the characters from the input string that mean something in Brainfuck.
The byte code that run
interprets looks like this:
data ByteCode
= BInc ByteCode
| BDec ByteCode
| BLeft ByteCode
| BRight ByteCode
| BPrint ByteCode
| BRead ByteCode
| BLoop ByteCode ByteCode
| BEND
deriving (Show, Eq)
This is a recursive data type. Every byte code instruction except BEND
contains the byte code instruction following it. The loop instruction has two byte codes to continue with: either we enter the loop or we continue after it.
The function that converts tokens to byte code looks like this:
toByteCode :: [Token] -> ByteCode
= toByteCode' tokens BEND
toByteCode tokens where
toByteCode' :: [Token] -> ByteCode -> ByteCode
= end
toByteCode' [] end TInc :xs) end = BInc (toByteCode' xs end)
toByteCode' (TDec :xs) end = BDec (toByteCode' xs end)
toByteCode' (TLeft :xs) end = BLeft (toByteCode' xs end)
toByteCode' (TRight :xs) end = BRight (toByteCode' xs end)
toByteCode' (TPrint :xs) end = BPrint (toByteCode' xs end)
toByteCode' (TRead :xs) end = BRead (toByteCode' xs end)
toByteCode' (TLoop ls:xs) end = let inner = toByteCode' ls loop
toByteCode' (= toByteCode' xs end
rest = BLoop inner rest
loop in loop
There is one special trick going on here, and that is in the last case where we create the loop instruction. toByteCode'
takes as last argument the instruction that should be at the end. At the top level, we pass in BEND
, but when creating the loop, we pass in the loop itself. So the last instruction of the loop links to the loop itself.
Below is a picture of the byte code representation of this Brainfuck program “.[-.]>
” :
Compiling the whole program is just a matter of combining the parsing with the byte code convertion:
compile :: String -> ByteCode
= toByteCode . parseTokens compile
The run
function looks like this:
run :: Data d => ByteCode -> d -> String -> String
BEND dat input = "done!\n"
run BInc next) dat input = run next (dataIncValue dat) input
run (BDec next) dat input = run next (dataDecValue dat) input
run (BLeft next) dat input = run next (dataMoveLeft dat) input
run (BRight next) dat input = run next (dataMoveRight dat) input
run (BPrint next) dat input = (dataGetAscii dat) : run next dat input
run (BRead next) dat (i:is) = run next (dataWriteAscii i dat) is
run (BLoop loop next) dat input
run (| dataGet dat == 0 = run next dat input
| otherwise = run loop dat input
It takes as input a byte code, the Brainfuck data tape, and the user input. The return value is a string which is the output of the program.
The formatting reveals that there are only two byte codes which modify the output, and that is BEND
and BPrint
. In the same way, we can see that BRead
is the only byte code that consumes input. All other byte codes just pass the input along to the next instruction. About half of the byte codes do something with the data tape. In those cases, a modified data tape is passed to the recursive call to run
.
When running a loop instruction, we examine the current value on data tape and decide which branch of instructions to continue with.
I find this implementation beautiful because every piece of the program does one specific thing and the pieces are combined to form a whole. The program is also compact. But it’s not compact because text has been crammed together. It’s compact because combining different pieces in Haskell does not require much syntactic noise.
Because I’ve never felt this kind of beauty in my object oriented code, I suspect that it is the way functional languages allow you to combine pieces that make them more beautiful to me.
If you want to look closer at the source code or look at other implementations, it is up on github.
What is Rickard working on and thinking about right now?
Every month I write a newsletter about just that. You will get updates about my current projects and thoughts about programming, and also get a chance to hit reply and interact with me. Subscribe to it below.