Finding ioctls with Clang and Cxx.jl

21 February 2017 | Keno Fischer

Among the more popular tools in the Linux debugging toolbox is strace, which allows users to easily trace and print out all system calls made by a program (as well as the arguments to these system calls). I recently found myself writing a similar tool to trace all the requests made to a proprietary network driver. I had access to the sources of the userspace-facing API for this driver, but strace proper did not know about it. I thus took this opportunity to write a general tool to extract ioctls from header files. The result is a compelling application of Cxx.jl, the Julia-C++ FFI, and Clang, LLVM's C/C++ compiler project, that is nevertheless compact enough for a blog post. In this blog post, I will walk you through my approach to this problem, highlighting both how to use Cxx.jl and how to use the Clang C++ API. I will be focusing solely on extracting this data from header files. How to use it to write an strace-like tool is a topic for another time.

Aside: About Cxx.jl

If you already know about Cxx.jl, feel free to move on to the next section. Cxx.jl is a julia package (available through the julia package manager) that allows julia to seamlessly interact with C++ code. It does this by taking advantage of Julia's capabilities for staged code generation to put an actual C++ compiler into julia's compilation pipeline. This looks roughly as follows:

  1. Julia parses code in julia syntax

  2. The Cxx.jl package provides macros that translate from Julia to C++ code (either by punning on julia syntax, or by providing a string macro that allows the user to write C++ code directly). The package remembers what C++ code the user wants to run and leaves behind a generated function (basically a way to ask the compiler to call back into the package when it wants to generate code for this particular function).

  3. Later, when Julia wants to run a function that includes C++ code, it sees the generated function and calls back into Cxx.jl, which then performs proper type translation (of any julia values used in the C++ code, and back to julia from C++ code) and creates a C++ AST, which it then passes to Clang to compile. Clang compiles this AST and hands back LLVM IR, which Cxx.jl can then declare to julia as the implementation of the generated function.

Note that we could have generated native code in step 3 instead of LLVM IR, but using LLVM IR allows cross-language LTO-style optimization.

The easiest way to interact with the Cxx.jl package is through the C++ REPL that comes with the package. After the Cxx package is loaded, this mode is automatically added to the julia REPL and can be accessed by pressing the '<' key:

julia> using Cxx

julia> # Press '<' here

C++ > #include <iostream>
true

C++ > std::cout << "Hello World" << std::endl;
Hello World

The problem

Before getting into the code, let's first carefully understand the problem at hand. We're interested in ioctl requests made to a certain driver. ioctl is essentially the catch-all system call for all requests to drivers that don't fit anywhere else. Such requests generally look like so:

    int result = ioctl(fd, REQUEST_CODE, argument);

Where fd is a file descriptor associated with a resource managed by the driver, and argument is generally either an integer or (more commonly) a pointer to a more complicated structure of arguments for this request. REQUEST_CODE is a driver-specific code that specifies what kind of request to make. In practice, there are exceptions to these rules, for a variety of reasons, but that's the general structure. So let's look at how the possible ioctl requests (I'll just call them ioctls for short, even though there's only one ioctl system call) are declared in the kernel headers. To be concrete, I'll pick out the USB ioctls, but the discussion applies generally. Let's look at an excerpt from the linked file:


    #define USBDEVFS_SETCONFIGURATION  _IOR('U', 5, unsigned int)
    #define USBDEVFS_GETDRIVER         _IOW('U', 8, struct usbdevfs_getdriver)
    #define USBDEVFS_SUBMITURB32       _IOR('U', 10, struct usbdevfs_urb32)
    #define USBDEVFS_DISCARDURB        _IO('U', 11)

Each of these lines defines an ioctl request (what I called request code above). Regular ioctls (ioctls defined like the ones above) have their request code split up as follows:

  0xAAAABBCC
    \__/ | |
    Size | Code
      Category

which are the three values encoded by the #define above. The category is a (in theory unique by driver) 8-bit value that identifies to the kernel which driver to route the request to. The code is then used by the driver to identify the requested function. The size portion of the ioctl is ignored by the kernel, but may be used by the driver for backwards compatibility purposes. In the above define, the category is always 'U' (an ASCII-printable value is often chosen, but this is not a requirement), the numerical code follows, and lastly, we have the argument struct, which is used to compute the size.
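The bit layout described above can be made concrete with a little arithmetic. The following is a minimal C sketch, assuming the common asm-generic Linux layout: an 8-bit code, an 8-bit category, a 14-bit size and, in the top two bits of what the diagram labels as size, a 2-bit transfer direction. Some architectures split the upper bits differently. The names `ioctl_encode`, `ioctl_decode` and `struct ioctl_fields` are made up for this illustration and are not part of any kernel header.

```c
#include <stdint.h>

/* Bit positions in the common (asm-generic) Linux ioctl layout.
 * Assumption: some architectures (e.g. PowerPC, MIPS) use 13 size bits
 * and 3 direction bits instead, so these shifts are not universal. */
enum {
    IOC_CODE_SHIFT     = 0,   /* 8 bits: request number within the driver */
    IOC_CATEGORY_SHIFT = 8,   /* 8 bits: driver "magic", e.g. 'U' for USB */
    IOC_SIZE_SHIFT     = 16,  /* 14 bits: sizeof the argument type        */
    IOC_DIR_SHIFT      = 30   /* 2 bits: transfer direction               */
};

/* Direction values as used by _IO (none), _IOW (write) and _IOR (read). */
enum { IOC_DIR_NONE = 0, IOC_DIR_WRITE = 1, IOC_DIR_READ = 2 };

/* Hypothetical container for the decoded pieces of a request code. */
struct ioctl_fields {
    unsigned dir, size, category, code;
};

/* Pack the four fields into a 32-bit request code. */
static uint32_t ioctl_encode(unsigned dir, unsigned category,
                             unsigned code, unsigned size)
{
    return ((uint32_t)dir      << IOC_DIR_SHIFT)      |
           ((uint32_t)size     << IOC_SIZE_SHIFT)     |
           ((uint32_t)category << IOC_CATEGORY_SHIFT) |
           ((uint32_t)code     << IOC_CODE_SHIFT);
}

/* Split a 32-bit request code back into its fields. */
static struct ioctl_fields ioctl_decode(uint32_t request)
{
    struct ioctl_fields f;
    f.dir      = (request >> IOC_DIR_SHIFT)      & 0x3u;
    f.size     = (request >> IOC_SIZE_SHIFT)     & 0x3fffu;
    f.category = (request >> IOC_CATEGORY_SHIFT) & 0xffu;
    f.code     = (request >> IOC_CODE_SHIFT)     & 0xffu;
    return f;
}
```

Under these assumptions, `ioctl_encode(IOC_DIR_READ, 'U', 5, 4)` (the USBDEVFS_SETCONFIGURATION line with a four-byte unsigned int) gives 0x80045505, which `ioctl_decode` splits back into size 4, category 'U' (0x55) and code 5.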

For our ioctl dumper, we want four pieces of information:

  1. The name of the ioctl

  2. The category

  3. The code

  4. The argument struct (for size, as well as to extract field names such that we can print the argument structures in our ioctl dumper).

With a clear understanding of what our goal is, let's get to work!

Playing with the Preprocessor

It is probably possible to accomplish a lot of this using regexes or similar text processing, but there are a few distinct advantages to using a proper C compiler, such as Clang, for the task:

  1. It has a correct preprocessor, so we can see through any defines, as well as making sure to ignore anything not reachable due to ifdef or similar.

  2. It is easier to use it to automatically extract the field names, offsets, etc., while seeing through typedefs and anything else that might make it hard for a text processor to understand what's going on.
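To make the first point concrete, here is a small hypothetical header fragment in C (none of the `DEMO_*` names exist in any real driver). The category, the code and the argument type are all hidden behind other macros, a typedef and an `#ifdef`, so a text search for a literal category, number and struct name would miss them, whereas a real preprocessor and compiler resolve everything. `DEMO_IOR` is a stand-in that assumes the common Linux bit layout; it is not the kernel's own `_IOR`.

```c
/* --- hypothetical driver header --------------------------------------- */
typedef struct {
    unsigned int  id;
    unsigned char flags;
} demo_info_t;                  /* argument type, only reachable via typedef */

#define DEMO_MAGIC 'D'          /* category, hidden behind a macro           */
#define DEMO_BASE  0x10         /* codes are written relative to a base      */

/* Stand-in for _IOR, assuming the common layout: dir=2, size, category, code. */
#define DEMO_IOR(cat, nr, type) \
    ((2u << 30) | ((unsigned)sizeof(type) << 16) | \
     ((unsigned)(cat) << 8) | (unsigned)(nr))

#ifdef DEMO_LEGACY_API          /* not defined here, so this branch is dead  */
#define DEMO_GET_INFO DEMO_IOR(DEMO_MAGIC, DEMO_BASE + 9, int)
#else
#define DEMO_GET_INFO DEMO_IOR(DEMO_MAGIC, DEMO_BASE + 1, demo_info_t)
#endif
/* ---------------------------------------------------------------------- */

/* Only after preprocessing and type checking are the actual values known. */
static unsigned demo_get_info_request(void)  { return DEMO_GET_INFO; }
static unsigned demo_get_info_code(void)     { return DEMO_GET_INFO & 0xffu; }
static unsigned demo_get_info_category(void) { return (DEMO_GET_INFO >> 8) & 0xffu; }
static unsigned demo_get_info_size(void)     { return (DEMO_GET_INFO >> 16) & 0x3fffu; }
```

Here the code resolves to 0x11 and the size to `sizeof(demo_info_t)`, including whatever padding the compiler adds; neither number appears literally anywhere in the header text.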

So, let's get us a Clang instance. Setting one up from the C++ API requires a bit of boilerplate, but luckily for us, the Cxx.jl package comes with the ability to create Clang instances separate from the one it is using to process C++ code:

julia> CCompiler = Cxx.new_clang_instance(
    false #= don't include julia definitions =#,
    true #= C mode (as opposed to C++) =#)
Cxx.CxxInstance{2}()

Now, let's use that instance to load up the header file we discussed above:

julia> Cxx.cxxinclude(CCompiler, "linux/usbdevice_fs.h")
true

To achieve our goal, we'll need to manually work with Clang's Parser and Preprocessor objects, so let's extract those for easy reference:

julia> PP = icxx"&$(Cxx.active_instances[2].CI)->getPreprocessor();"
(class clang::Preprocessor *) @0x000055ff269d2380

julia> P  = Cxx.active_instances[2].Parser
(class clang::Parser *) @0x000055ff26338870

Ok, great. We have confirmed that the compiler parsed the header file and that it knows about our macro of interest. Let's see where we can go from there. Consulting the clang documentation we find out about clang::Preprocessor::getMacroInfo and clang::MacroInfo::tokens, which would give us what we want. Let's encode this into some julia functions for easy reference:

getIdentifierInfo(PP, name) = icxx"$PP->getIdentifierInfo($name);"
getMacroInfo(PP, II::pcpp"clang::IdentifierInfo") = icxx"$PP->getMacroInfo($II);"
getMacroInfo(PP, name::String) = getMacroInfo(PP, getIdentifierInfo(PP, name))
tokens(MI::pcpp"clang::MacroInfo") = icxx"$MI->tokens();"

We can now do:

julia> tokens(getMacroInfo(PP, "USBDEVFS_SETCONFIGURATION"))
(class llvm::ArrayRef<class clang::Token>) {
 .Data = (const class clang::Token *&) (const class clang::Token *) @0x000055ff269cec60
 .Length = (unsigned long &) 9
}

So we have our array of tokens. Of course, this is not very useful to us in this form, so let's do two things. First, we'll teach julia how to properly display Tokens:

# Convert Tokens that are identifiers to strings, we'll use these later
tok_is_identifier(Tok) = icxx"$Tok.is(clang::tok::identifier);"
Base.String(II::pcpp"clang::IdentifierInfo") = unsafe_string(icxx"$II->getName().str();")
function Base.String(Tok::cxxt"clang::Token")
    @assert tok_is_identifier(Tok)
    II = icxx"$Tok.getIdentifierInfo();"
    @assert II != C_NULL
    String(II)
end
getSpelling(PP, Tok) = unsafe_string(icxx"$PP->getSpelling($Tok);")
function Base.show(io::IO, Tok::Union{cxxt"clang::Token",cxxt"clang::Token&"})
    print(io, unsafe_string(icxx"clang::tok::getTokenName($Tok.getKind());"))
    print(io, " '", getSpelling(PP, Tok), "'")
end

Which will look something like this (note that I used the pointer from above) [1]:

C++ > *(clang::Token*) 0x000055ff269cec60
identifier '_IOR'

[1] The astute reader may complain that I'm using the global PP instance to print this value. That is a valid complaint, and in the actual code, I made it an IOContext property, but I did not want to complicate this blog post with that discussion.

Great, we're on the right track. Let's also teach julia how to automatically iterate over llvm::ArrayRefs:

# Iteration for ArrayRef
import Base: start, next, length, done
const ArrayRef = cxxt"llvm::ArrayRef<$T>" where T
start(AR::ArrayRef) = 0
function next(AR::cxxt"llvm::ArrayRef<$T>", i) where T
    (icxx"""
        // Force a copy, otherwise we'll retain reference semantics in julia
        // which is not what people expect.
        $T element = ($AR)[$i];
        return element;
    """, i+1)
end
length(AR::ArrayRef) = icxx"$AR.size();"
done(AR::ArrayRef, i) = i >= length(AR)

Even though this may look a bit complicated, all this is saying is that ArrayRefs are indexed from zero to AR.size() - 1, and we can use the C++ bracket operator to access elements. With this defined, we get:
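For readers more at home in C, the same start/next/done protocol can be sketched over a pointer-plus-length pair, which is roughly what an `llvm::ArrayRef` is (compare the `.Data` and `.Length` fields printed earlier). This is only an analogy under that assumption: `struct int_ref` and the `ref_*` functions are hypothetical names, and the element type is fixed to `int` for brevity.

```c
#include <stddef.h>

/* A non-owning view: pointer to the first element plus element count. */
struct int_ref {
    const int *data;
    size_t     length;
};

/* The iteration state is just a zero-based index. */
static size_t ref_start(void) { return 0; }

/* Iteration is finished once the index reaches the length. */
static int ref_done(struct int_ref r, size_t i) { return i >= r.length; }

/* Return a copy of the current element and advance the state. */
static int ref_next(struct int_ref r, size_t *i) { return r.data[(*i)++]; }

/* A consumer written only against the three functions above. */
static long ref_sum(struct int_ref r)
{
    long total = 0;
    for (size_t i = ref_start(); !ref_done(r, i); )
        total += ref_next(r, &i);
    return total;
}
```

The consumer never touches `data` or `length` directly, which is the point of the protocol: any container that supplies these three operations can be fed to the same generic tools.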

julia> collect(tokens(getMacroInfo(PP, "USBDEVFS_SETCONFIGURATION")))
9-element Array{Any,1}:
 identifier '_IOR'
 l_paren '('
 char_constant ''U''
 comma ','
 numeric_constant '5'
 comma ','
 unsigned 'unsigned'
 int 'int'
 r_paren ')'

We're off to a great start.

Getting all the ioctls

As the previous section may have indicated, defining iteration over an object is an enormously powerful way to work with said object. Because everything in julia is generic, enabling iteration over an object immediately allows us to use any of the standard iteration tools (e.g. filters, maps, etc.) to work with our objects.

With this in mind, let's see what we want to do. We know that ioctls are introduced by a macro that expands to _IO(...), _IOR(...), _IOW(...) or _IOWR(...). So let's define iteration over Clang's identifier table and write down exactly that:

start(tab::rcpp"clang::IdentifierTable") = icxx"$tab.begin();"
next(tab::rcpp"clang::IdentifierTable", it) = (icxx"$it->second;", icxx"++$it;")
done(tab::rcpp"clang::IdentifierTable", it) = icxx"$it == $tab.end();"
length(tab::rcpp"clang::IdentifierTable") = icxx"$tab.size();"
# Get all identifier that are macros
macros = Iterators.filter(II->icxx"$II->hasMacroDefinition();", icxx"$PP->getIdentifierTable();")
# Expand into tuples of (II, tokens)
IItokens = map(II->(II, collect(tokens(getMacroInfo(PP, II)))), macros)
# Now filter down to the ones we're interested in
ioctl_defs = filter(IItokens) do x
    II, tokens = x
    isempty(tokens) && return false
    tok_is_identifier(tokens[1]) && String(tokens[1]) in ["_IO","_IOR","_IOW","_IOWR"]
end;

And if all worked well, we end up with:

julia> map(x->(String(x[1]),x[2]), ioctl_defs)
34-element Array{Tuple{String,Array{Any,1}},1}:
 ("USBDEVFS_FREE_STREAMS", Any[identifier '_IOR', l_paren '(', char_constant ''U'', comma ',', numeric_constant '29', comma ',', struct 'struct', identifier 'usbdevfs_streams', r_paren ')'])
 ("USBDEVFS_BULK32", Any[identifier '_IOWR', l_paren '(', char_constant ''U'', comma ',', numeric_constant '2', comma ',', struct 'struct', identifier 'usbdevfs_bulktransfer32', r_paren ')'])
 ("USBDEVFS_DISCARDURB", Any[identifier '_IO', l_paren '(', char_constant ''U'', comma ',', numeric_constant '11', r_paren ')'])

Extracting the fields from the structures

Now, it's fairly simple to do any post-processing we want here, and what to do exactly will depend on our intended application, but I do want to highlight how to extract the fields. At first I attempted to simply use the second to last token as the type name, but that doesn't work very well, because some types are multiple tokens (e.g. unsigned int) and some others are only defined via (sometimes complicated) preprocessor rules. Instead, the right way to do this is to simply feed those tokens back through the parser. We'll use a couple of definitions:

"Given an array of tokens, queue them up for parsing"
function EnterTokenStream(PP::pcpp"clang::Preprocessor", tokens::Vector{cxxt"clang::Token"})
  # Vector memory layout is incompatible, convert to clang::Token**
  toks = typeof(tokens[1].data)[x.data for x in tokens]
  icxx"$PP->EnterTokenStream(llvm::ArrayRef<clang::Token>{
    (clang::Token*)$(pointer(toks)),
    (size_t)$(length(toks))
  },false);"
end
"Advance the parser if it's currently at EOF. This happens in incremental parsing mode and should be called before parsing."
function AdvanceIfEof(P)
  icxx"""
  if ($P->getPreprocessor().isIncrementalProcessingEnabled() &&
    $P->getCurToken().is(clang::tok::eof))
      $P->ConsumeToken();
  """
end
"Parse a type name"
function ParseTypeName(P::pcpp"clang::Parser")
  AdvanceIfEof(P)
  res = icxx"$P->ParseTypeName(nullptr, clang::Declarator::TypeNameContext);"
  !icxx"$res.isUsable();" && error("Parsing failed")
  Cxx.QualType(icxx"clang::Sema::GetTypeFromParser($res.get());")
end
"Parse a constant expression"
function ParseConstantExpression(P::pcpp"clang::Parser")
  AdvanceIfEof(P)
  res = icxx"$P->ParseConstantExpression();"
  !icxx"$res.isUsable();" && error("Parsing failed")
  e = icxx"$res.get();"
  e
end
"Convert a parsed constant literal to a julia Char (asserts on failure)"
function CharFromConstExpr(e)
  Char(icxx"""
    clang::cast<clang::CharacterLiteral>($e)->getValue();
  """)
end
tok_is_comma(Tok) = icxx"$Tok.is(clang::tok::comma);"
tok_is_numeric(Tok) = icxx"$Tok.is(clang::tok::numeric_constant);"

With these definitions:

julia> ioctl_tokens = first(ioctl_defs)[2]
9-element Array{Any,1}:
 identifier '_IOR'
 l_paren '('
 char_constant ''U''
 comma ','
 numeric_constant '29'
 comma ','
 struct 'struct'
 identifier 'usbdevfs_streams'
 r_paren ')'

julia> typename_tokens = Vector{cxxt"clang::Token"}(ioctl_tokens[findlast(tok_is_comma, ioctl_tokens)+1:end-1])
2-element Array{Cxx.CppValue{Cxx.CxxQualType{Cxx.CppBaseType{Symbol("clang::Token")},(false, false, false)},N} where N,1}:
 struct 'struct'
 identifier 'usbdevfs_streams'

julia> EnterTokenStream(PP, typename_tokens); QT = Cxx.desugar(ParseTypeName(P))
Cxx.QualType(Ptr{Void} @0x000055ff24a13960)

C++ > ((clang::RecordType*)&*$QT)->getDecl()->dump()
RecordDecl 0x55951860c240 </usr/lib/gcc/x86_64-linux-gnu/6.2.0/../../../../include/linux/usbdevice_fs.h:153:1, line:157:1> line:153:8 struct usbdevfs_streams definition
|-FieldDecl 0x55951860c300 <line:154:2, col:15> col:15 num_streams 'unsigned int'
|-FieldDecl 0x55951860c358 <line:155:2, col:15> col:15 num_eps 'unsigned int'
`-FieldDecl 0x55951860c418 <line:156:2, col:21> col:16 eps 'unsigned char [0]'

We could process these, for example, as follows. Here I'll be using a different approach, where instead of using julia to do the iteration, I'll just write most of the function in C++ and only call back into julia once at the end:

# C structure to julia array of fields
function inspectStruct(CC, S)
    CC = Cxx.instance(CC)
    ASTCtx = icxx"&$(CC.CI)->getASTContext();"
    fields = Any[]
    icxx"""
    auto &ARL = $ASTCtx->getASTRecordLayout($S);
    for (auto field : ($S)->fields()) {
      unsigned i = field->getFieldIndex();
      // Skip these for now
      if (field->isImplicit())
        continue;
      if (field->getType()->isUnionType())
        continue;
      if (field->getType()->isArrayType())
        continue;
      if (field->getType()->isRecordType() ||
          field->getType()->isEnumeralType())
        continue;
      if (field->getType()->isPointerType() &&
          field->getType()->getPointeeOrArrayElementType()->isRecordType())
        continue;
      $:(begin
        QT = Cxx.QualType(icxx"return field->getType();")
        push!(fields, (
          String(icxx"return field;"),
          Cxx.juliatype(QT),
          icxx"return $ASTCtx->toCharUnitsFromBits(ARL.getFieldOffset(i)).getQuantity();"
        ))
        nothing
    end);
    }
    """
    fields
end

julia> inspectStruct(CCompiler, icxx"((clang::RecordType*)&*$QT)->getDecl();")
2-element Array{Any,1}:
 ("num_streams", UInt32, 0)
 ("num_eps", UInt32, 4)
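One way to sanity-check such a result, and to see how a dumper could consume it, is a small C sketch using `offsetof`. It assumes a local mirror of the structure as it appears in the dump above; `struct streams_mirror`, `struct field_info` and `read_uint_field` are names invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Local mirror of the struct from the dump; the kernel header declares the
 * trailing array as eps[0], written here as a C99 flexible array member. */
struct streams_mirror {
    unsigned int  num_streams;
    unsigned int  num_eps;
    unsigned char eps[];
};

/* One row of a (name, offset) table like the one inspectStruct returns. */
struct field_info {
    const char *name;
    size_t      offset;
};

static const struct field_info streams_fields[] = {
    { "num_streams", offsetof(struct streams_mirror, num_streams) },
    { "num_eps",     offsetof(struct streams_mirror, num_eps)     },
};

/* Read an unsigned int field out of a raw ioctl argument buffer. */
static unsigned int read_uint_field(const void *arg, const struct field_info *f)
{
    unsigned int value;
    /* memcpy avoids alignment and aliasing problems on arbitrary buffers. */
    memcpy(&value, (const unsigned char *)arg + f->offset, sizeof value);
    return value;
}
```

On a platform where `unsigned int` is four bytes wide, the table holds offsets 0 and 4, matching the julia output above. A dumper that intercepts the argument pointer can then walk the table and print each field by name.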

Conclusion

With the above code, we can easily extract and work with the definitions of ioctls in the linux headers. I hope this blog post has given you an idea of both how to use the Clang C++ API to do some C introspection and how to use some of the generic programming features in julia. The above is a pretty decent summary of some of the first things I do when working with new data sources in julia:

  1. Define printing methods for the relevant types

  2. Define iteration on any container data structures

  3. Use julia's iteration tools to write whatever query I'm interested in

Following this strategy usually gets one pretty far. In this case, it was essentially sufficient to solve our problem and provide a useful list of ioctls and the fields of their arguments to use in our ioctl dumping tool.
