Tuesday, October 1, 2019

Rene Dudfield: post modern C tooling - draft 2

DRAFT 1 - 9/16/19, 7:19 PM, I'm still working on this, but it's already useful and I'd like some feedback - so I decided to share it early.
DRAFT 2 - 10/1/19



This is a post about contemporary C tooling. Tooling for making higher quality C, faster.

In 2001 or so people started using the phrase "Modern C++". So now that it's 2019, I guess we're in the post modern era? Anyway, this isn't a post about C++ code, but some of this information applies there too.
The C language has no logo, but it's everywhere.

Welcome to the post modern era.

Some of the C++ people have pulled off one of the cleverest and sneakiest tricks ever. They required 'modern' C99 and C11 features in 'recent' C++ standards. Microsoft has famously still clung onto some 80s version of C with their compiler for the longest time. So it's been a decade of hacks for people writing portable code in C. For a while I thought we'd be stuck in the 80s with C89 forever. However, now that some C99 and C11 features are more widely available in the Microsoft compiler, we can use these features in highly portable code (but forget about C17/C18 ISO/IEC 9899:2018/C2X stuff!!).
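
To give a feel for the kind of C99 and C11 features meant here, this is a minimal sketch of my own. The struct and function names are made up for illustration, and whether your particular compiler version accepts each feature is something you would need to check.

```c
#include <assert.h>   /* C11: static_assert */
#include <stdbool.h>  /* C99: bool */
#include <stdint.h>   /* C99: fixed width integers */

/* Hypothetical settings struct, only for illustration. */
struct config {
    int32_t width;
    int32_t height;
    bool fullscreen;
};

/* C11: checked at compile time, costs nothing at run time. */
static_assert(sizeof(int32_t) == 4, "int32_t must be 4 bytes");

/* C99: designated initializers; members not named are zeroed. */
struct config default_config(void) {
    struct config c = { .width = 640, .height = 480 };
    return c;
}

/* C99: declare variables where they are used, including in for loops. */
int32_t total_pixels(const struct config *configs, int count) {
    int32_t total = 0;
    for (int i = 0; i < count; i++) {
        total += configs[i].width * configs[i].height;
    }
    return total;
}
```

None of this is exotic, but each of these lines was a portability problem for code that also had to build with a C89-only compiler.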

So, we have some pretty modern language features in C with C11.  But what about tooling?

Tools and protection for our feet.

C, whilst a work horse being used in everything from toasters, trains, phones, web browsers, ... (everything basically) - is also an excellent tool for shooting yourself in the foot.

Noun

footgun (plural footguns)
  1. (informal, humorous, derogatory) Any feature whose addition to a product results in the user shooting themselves in the foot. C.

Tools like linters, test coverage checkers, static analyzers, memory checkers, documentation generators, thread checkers, continuous integration, nice error messages, ... and such help protect our feet.

How do we do continuous delivery with a language that lets us do the most low level footgunie things ever? On a dozen CPU architectures, 32 bit, 64bit, little endian, big endian, 64 bit with 32bit pointers (wat?!?), with multiple compilers, on a dozen different OS, with dozens of different versions of your dependencies?

Surely there won't be enough time to do releases, and have time left to eat my vegan shaved ice dessert after lunch?



Debuggers

Give me 15 minutes, and I'll change your mind about GDB. --
https://www.youtube.com/watch?v=PorfLSr3DDI
Firstly, did you know gdb had a curses based 'GUI' which works in a terminal? It's quite a bit easier to use than the command line text interface. It's called TUI. It's built in, and uses emacs key bindings.

But what if you are used to VIM key bindings? https://cgdb.github.io/

Also, for those who don't use an IDE with debugging support built in (such as Visual Studio by Microsoft or Xcode by Apple), there's a fairly easy to use web based front end for GDB called gdbgui (https://www.gdbgui.com/).





Reverse debugger

Normally a program runs forwards. But what about when you are debugging and you want to run the program backwards?

Set breakpoints and data watchpoints and quickly reverse-execute to where they were hit.

How do you tame non determinism to allow a program to run the same way it did when it crashed? In C, and with threads, it is sometimes really hard to reproduce problems.

rr helps with this. It's actual magic.

https://rr-project.org/






LLDB - the LLVM debugger.

Apart from the ever improving gdb, there is a new debugger from the LLVM people - lldb ( https://lldb.llvm.org/ ).


IDE debugging

Visual Studio by Microsoft, and XCode by Apple are the two heavy weights here.

The free Visual Studio Code also supports debugging with GDB. https://ift.tt/1RZAkzn

Sublime is another popular editor, and there is good GDB integration for it too in the SublimeGDB package (https://ift.tt/2vs5x6S).



Portable building, and package management

C doesn't have a package manager... or does it?

Ever since Debian dpkg, Red Hat rpm, and Perl started doing package management in the 90s, people world wide have been able to share pieces of software more easily. Following those systems, many other systems like Ruby gems, JavaScript npm, and Python's cheese shop came into being, allowing many to share code easily.

But what about C? How can we define dependencies on different 'packages' or libraries and have them compile on different platforms?

How do we build with Microsoft's compiler, with gcc, with clang, or Intel's C compiler? How do we build on Mac, on Windows, on Ubuntu, on Arch Linux?

Part of the answer to that is CMake. "Modern CMake" lets you define your dependencies.


Conan package manager

There are several packaging tools for C these days, but one of the top contenders is Conan.

https://conan.io/




Testing coverage.

Tests let us know that a certain function is running ok. But which code do we still need to test?

gcov, a tool you can use in conjunction with GCC to test code coverage in your programs.
lcov, LCOV is a graphical front-end for GCC's coverage testing tool gcov.


Instructions from codecov.io on how to use it with C, and clang or gcc. (codecov.io is free for public open source repos).
https://github.com/codecov/example-c


Here's documentation for how CPython gets coverage results for C.
 https://devguide.python.org/coverage/#measuring-coverage-of-c-code-with-gcov-and-lcov

Here is the CPython Travis CI configuration they use.
https://ift.tt/2QdE9sn
    - os: linux
      language: c
      compiler: gcc
      env: OPTIONAL=true
      addons:
        apt:
          packages:
            - lcov
            - xvfb
      before_script:
        - ./configure
        - make coverage -s -j4
        # Need a venv that can parse covered code.
        - ./python -m venv venv
        - ./venv/bin/python -m pip install -U coverage
        - ./venv/bin/python -m test.pythoninfo
      script:
        # Skip tests that re-run the entire test suite.
        - xvfb-run ./venv/bin/python -m coverage run --pylib -m test --fail-env-changed -uall,-cpu -x test_multiprocessing_fork -x test_multiprocessing_forkserver -x test_multiprocessing_spawn -x test_concurrent_futures
      after_script:  # Probably should be after_success once test suite updated to run under coverage.py.
        # Make the `coverage` command available to Codecov w/ a version of Python that can parse all source files.
        - source ./venv/bin/activate
        - make coverage-lcov
        - bash <(curl -s https://codecov.io/bash)




Static analysis

"Static analysis has not been helpful in finding bugs in SQLite." -- https://ift.tt/1lrHmyE

According to David Wheeler in "How to Prevent the next Heartbleed" (https://dwheeler.com/essays/heartbleed.html#static-not-found), only one static analysis tool found the Heartbleed vulnerability (the security problem with a logo, a website, and a marketing team) before it was known. This tool is called CQual++. One reason for projects not using these tools is that they have been (and some still are) hard to use. The LLVM project only started using the clang static analysis tool on its own projects recently, for example. However, since Heartbleed, tools have improved in both usability and their ability to detect issues.
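
Heartbleed was, at its core, a missing bounds check: the code trusted a length field sent by the other side. The sketch below is my own simplified illustration of that class of bug and its fix. It is not the OpenSSL code, and the function and parameter names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a payload back to the sender (a simplified, hypothetical echo).
 * claimed_len is the length the request says it has; actual_len is how
 * many payload bytes really arrived. Returns the number of bytes
 * copied, or 0 if the request is rejected. */
size_t echo_payload(unsigned char *out, size_t out_size,
                    const unsigned char *payload, size_t actual_len,
                    size_t claimed_len) {
    assert(out != NULL && payload != NULL);
    /* Without this check, memcpy() would read past the end of payload
     * whenever claimed_len is larger than actual_len. */
    if (claimed_len > actual_len || claimed_len > out_size) {
        return 0;
    }
    memcpy(out, payload, claimed_len);
    return claimed_len;
}
```

The hard part for a tool is noticing that claimed_len comes from untrusted input, which is one reason a bug like this can slip past many analyzers.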

I think it's generally accepted that static analysis tools are incomplete, in that no single tool guarantees detecting every problem, or even detecting the same issues every time. Different tools find different types of problems, so it can be worth using more than one.

Compilers are kind of smart

The most basic of static analysis tools are compilers themselves. Over the years they have been getting more and more tools which used to only be available in dedicated Static Analyzers and Lint tools.
Variable shadowing and format-string mismatches can be detected reliably and quickly because both gcc and clang do this detection as part of their regular compile. --  Bruce Dawson
Here we see two issues (which used to be) very common in C being detected by the two most popular C compilers themselves.
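
To make those two issues concrete, here is a small sketch of my own (it is not from Bruce Dawson). It shows the corrected code, with comments describing what the flagged version would look like. In gcc and clang the format checks come with -Wall, while -Wshadow has to be asked for separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "<count> items" into buf. Returns what snprintf() returns. */
int describe_count(char *buf, size_t buf_size, size_t count) {
    assert(buf != NULL);
    /* Writing "%d" here would be a format-string mismatch, because count
     * is a size_t and not an int. -Wformat reports that. "%zu" is the
     * C99 conversion for size_t. */
    return snprintf(buf, buf_size, "%zu items", count);
}

/* Sum the values in an array. */
long sum_values(const int *values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Declaring another variable called "total" inside this loop
         * would shadow the outer one, and the function would return 0.
         * -Wshadow reports that. */
        total += values[i];
    }
    return total;
}
```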

Compiling code with gcc "-Wall -Wextra -pedantic" options catches quite a number of potential or actual problems (https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html). Other compilers check different things as well. So using multiple compilers with their warnings can find plenty of different types of issues for you.

Compiler warnings should be turned into errors on CI.

By getting your warnings down to zero on Continuous Integration there is less chance of new warnings being introduced that are missed in code review. However, you should not distribute your code with warnings turned into errors. People building it with a different compiler, or a different compiler version, can hit warnings you have never seen, and then the build fails for them.

Some points for people implementing this:
  • -Werror can be used to turn warnings into errors.
  • -Wno-error=unknown-pragmas keeps one particular warning (here unknown pragmas) as a warning, even when -Werror is on.
  • -Werror should be used only in CI, and not in the build by default. See werror-is-not-your-friend (https://ift.tt/2otugX2).
  • Use the most recent gcc, and the most recent clang (change two travis linux builders to do this).
  • First you have to fix all the existing warnings (and hopefully not break something in the process).
  • Consider adding extra warnings to gcc: "-Wall -Wextra -Wpedantic". See C tooling
  • The Microsoft compiler MSVC on Appveyor can also be configured to treat warnings as errors. The /WX option treats all warnings as errors. See MSVC warning levels
  • For MSVC on Appveyor, /wdnnnn suppresses the compiler warning that is specified by nnnn. For example, /wd4326 suppresses compiler warning C4326.

If you compile and run your code on different CPU architectures, these compilers can find even more issues. For example on 32 bit and 64 bit, big endian and little endian systems.
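
As an example of the kind of code that behaves differently across architectures, here is a minimal sketch (the function names are mine) of reading a 32 bit value from a byte buffer without depending on the byte order or alignment rules of the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32 bit little endian value from a byte buffer.
 * Shifting bytes gives the same answer on little and big endian CPUs,
 * and avoids the unaligned access that casting buf to uint32_t*
 * can cause on some architectures. */
uint32_t read_u32_le(const unsigned char *buf) {
    assert(buf != NULL);
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Returns 1 if this CPU stores the least significant byte first. */
int is_little_endian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Code that instead casts the buffer to a uint32_t pointer can work for years on x86 and then fail on the first big endian or strict alignment machine it meets, which is why building and testing on more than one architecture is worth the CI time.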

Static analysis tool overview.

Note that static analysis can be much slower than the analysis usually provided by compilation. It trades off more CPU time for (perhaps) better results.

The talk "Clang Static Analysis" (https://www.youtube.com/watch?v=UcxF6CVueDM) talks about an LLVM tool called codechecker (https://ift.tt/1N3XoYK). Clang's Static Analyzer is a free static analyzer based on Clang. Note that the Xcode IDE on Mac includes the Clang static analyzer.

Visual Studio by Microsoft can also do static code analysis. ( https://docs.microsoft.com/en-us/visualstudio/code-quality/code-analysis-for-c-cpp-overview?view=vs-2017)

cppcheck focuses on low false positives and can find many actual problems.
Coverity, a commercial static analyzer, free for open source developers
CppDepend, a commercial static analyzer based on Clang
codechecker, https://ift.tt/1N3XoYK
cpplint, Cpplint is a command-line tool to check C/C++ files for style issues following Google's C++ style guide.
Awesome static analysis, a page full of static analysis tools for C/C++. https://ift.tt/297jUXZ
PVS-Studio, a commercial static analyzer, free for open source developers.




cppcheck 

Cppcheck is an analysis tool for C/C++ code. It provides unique code analysis to detect bugs and focuses on detecting undefined behaviour and dangerous coding constructs. The goal is to detect only real errors in the code (i.e. have very few false positives).

The quote below was particularly interesting to me because it echoes the sentiments of other developers, that testing will find more bugs. But here is one of the static analysis tools saying so as well.
"You will find more bugs in your software by testing your software carefully, than by using Cppcheck."

To Install cppcheck

http://cppcheck.sourceforge.net/ and https://github.com/danmar/cppcheck
The manual can be found here: http://cppcheck.net/manual.pdf

brew install cppcheck bear
sudo apt-get install cppcheck bear

To run cppcheck on C code.

You can use the bear (the build ear) tool to record a compilation database (compile_commands.json). cppcheck then knows which C files and header files you are using.

# call your build tool, like `bear make` to record. 
# See cppcheck manual for other C environments including Visual Studio.
bear python setup.py build
cppcheck --quiet --language=c --enable=all -D__x86_64__ -D__LP64__ --project=compile_commands.json

It does seem to find some errors, and style improvements that other tools do not suggest. Note that you can control which kinds of issues are reported, from errors only, through to portability and style issues and more. See cppcheck --help and the manual for more details about --enable options.

For example these ones from the pygame code base:
[src_c/math.c:1134]: (style) The function 'vector_getw' is never used.
[src_c/base.c:1309]: (error) Pointer addition with NULL pointer.
[src_c/scrap_qnx.c:109]: (portability) Assigning a pointer to an integer is not portable.
[src_c/surface.c:832] -> [src_c/surface.c:819]: (warning) Either the condition '!surf' is redundant or there is possible null pointer dereference: surf.

cppcheck reports 942 things in the pygame codebase. (633 without cython related things).
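
The last kind of warning in that output is worth a closer look. Here is a minimal sketch of the pattern and its fix. This is not the actual pygame code; the struct and function are stand-ins I made up.

```c
#include <stddef.h>

/* Hypothetical stand-in for a surface type; not the pygame struct. */
struct surface {
    int width;
    int height;
};

/* Returns the area, or -1 if surf is NULL.
 *
 * The pattern cppcheck warns about is the reverse order:
 *
 *     int w = surf->width;    (dereference first ...)
 *     if (!surf) return -1;   (... NULL check afterwards)
 *
 * Either the check can never be true, or the line before it can crash.
 * Checking before the first dereference removes the ambiguity. */
int surface_area(const struct surface *surf) {
    if (!surf) {
        return -1;
    }
    return surf->width * surf->height;
}
```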




Custom static analysis for API usage

Probably one of the most useful parts of static analysis is being able to write your own checks. This allows you to do checks specific to your code base, where general checks will not work. One example of this is the gcc cpychecker (https://ift.tt/2QdDE1t). With this, gcc can find API usage issues within CPython extensions written in C, including reference counting bugs, NULL pointer dereferences, and other types of issues. You can write custom checkers with LLVM as well; see the "Checker Developer Manual" (https://ift.tt/2AouWCL).

There is a list of GCC plugins (https://ift.tt/2nio7Qp); among them are some Linux security plugins by grsecurity.




"Using SAL annotations to reduce code defects." (https://docs.microsoft.com/en-us/visualstudio/code-quality/using-sal-annotations-to-reduce-c-cpp-code-defects?view=vs-2019)

"In GNU C and C++, you can use function attributes to specify certain function properties that may help the compiler optimize calls or check code more carefully for correctness."
https://ift.tt/YnYDi9
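
As a small sketch of what function attributes buy you, here are two of them on a printf style helper. The helper and the macro names are made up for this post. The attributes themselves (format and warn_unused_result) are documented by GCC, and clang understands them too.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Only use the attributes on compilers that understand them. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt, args) __attribute__((format(printf, fmt, args)))
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define PRINTF_LIKE(fmt, args)
#define MUST_CHECK
#endif

/* Hypothetical helper: format a message into buf.
 * format(printf, 3, 4) asks the compiler to check the arguments of
 * every call against the format string, as it does for printf().
 * warn_unused_result warns when a caller ignores the return value. */
MUST_CHECK PRINTF_LIKE(3, 4)
int format_message(char *buf, size_t buf_size, const char *fmt, ...) {
    va_list args;
    int written;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    written = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);
    return written;
}
```

With these in place, a call that passes a string where the format says %d gets the same warning a direct printf() call would get, so your own wrappers stop being a blind spot.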




Performance profiling and measurement

“The objective (not always attained) in creating high-performance software is to make the software able to carry out its appointed tasks so rapidly that it responds instantaneously, as far as the user is concerned.”  Michael Abrash. “Michael Abrash’s Graphics Programming Black Book.”
Reducing the energy usage and run time of apps is often a requirement. For a mobile or embedded application it can mean the difference between being able to run the program and not being able to run it at all. Performance can be directly related to user happiness, but also to the financial performance of a piece of software.

But how do we measure the performance of a program, and how do we know what parts of a program need improvement? Tooling can help.

Valgrind

Valgrind has its own section here because it does lots of different things for us. It's a great tool, or set of tools for improving your programs. It used to be available only on linux, but is now also available on MacOS.

Apparently Valgrind would have caught the Heartbleed issue if it had been used with a fuzzer.

https://ift.tt/1rsSquc

Apple Performance Tools

Apple provides many performance related development tools. Along with the gcc and llvm based tools, the main tool is called Instruments. Instruments (part of Xcode) allows you to record and analyse programs for lots of different aspects of performance - including graphics, memory activity, file system, energy and other program events. Being able to record and analyse different types of events together makes it convenient to find performance issues.

Many of the low level parts of the tools in XCode are made open source through the LLVM project. See "LLVM Machine Code Analyzer" ( https://ift.tt/2AmtaC3) as one example.

Free and Open Source performance tools.



Microsoft performance tools.


Intel performance tools.

https://ift.tt/2nrboKY




Caching builds

https://ift.tt/1p9Qaer

ccache is very useful for reducing the compile time of large C projects, especially when you are doing a 'rebuild from scratch'. This is because ccache caches the result of compiling each file, so files that have not changed do not need to be compiled again.
https://ift.tt/2AkIUWi

This is also useful for speeding up CI builds, and especially when large parts of the code base rarely change.


Distributed building.


distcc https://ift.tt/1IhPrLe
icecream https://ift.tt/1nYVgri


Complexity of code.


How complex is your code?
https://ift.tt/2QdDGGD

complexity src_c/*.c


Testing your code on different OS/architectures.

Sometimes you need to be able to fix an issue on an OS or architecture that you don't have access to. Luckily these days there are many tools available to quickly use a different system through emulation, or container technology.


Vagrant
Virtualbox
Docker
Launchpad, compile and run tests on many architectures.
Mini cloud (ppc machines for debugging)

If you pay Travis CI, they allow you to connect to the testing host with ssh when a test fails.


Code Formatting

clang-format

clang-format - rather than manually fix various formatting errors found with a linter, many projects are just using clang-format to format the code into some coding standard.



Services

LGTM is an 'automated code review tool' with github (and other code repos) support. https://ift.tt/2An3sNP

Coveralls provides a store for test coverage results with github (and other code repos) support. https://coveralls.io/




Coding standards for C

There are lots of coding standards for C, and there are tools to check them.

An older set of standards is MISRA C (https://ift.tt/1aHRHMw), which aims to facilitate code safety, security, and portability for embedded systems.

The Linux Kernel Coding standard (https://ift.tt/2raUSfL) is well known mainly because of the popularity of the Linux Kernel. But this is mainly concerned with readability.

A newer one is the CERT C coding standard (https://ift.tt/2AilH9l), and it is a secure coding standard (not a safety one).

The website for the CERT C coding standard is quite amazing. It links to tools that can detect each of the problems automatically (when they can be). It is very well researched, and links each problem to other relevant standards, and gives issues priorities. A good video to watch on CERT C is "How Can I Enforce the SEI CERT C Coding Standard Using Static Analysis?" (https://www.youtube.com/watch?v=awY0iJOkrg4). They do releases of the website, which is edited as a wiki. At the time of writing the last release into book form was in 2016.







How are other projects tested?

We can learn a lot from how other C projects are going about their business today.
Also, thanks to CI testing tools defining things in code we can see how automated tests are run on services like Travis CI and Appveyor.

SQLite

"How SQLite Is Tested"

Curl

"Testing Curl"
https://ift.tt/2QdDHdF

Python

"How is CPython tested?"
https://ift.tt/2AqGbuy

OpenSSL

"How is OpenSSL tested?"

https://ift.tt/2QdbW4I
They use Coverity too: https://ift.tt/2AouZhV
https://ift.tt/2Qg5gDd

libsdl

"How is SDL tested?" [No response]


Linux

As of early 2019, Linux used no unit testing within the kernel tree (some unit tests exist outside of the kernel tree).

There are no in-tree unit tests, but Linux is probably one of the most highly tested pieces of code there is.

Linux relies a lot on community testing. With thousands of developers working on Linux every day, that is a lot of people testing things out. Additionally, because all of the source code is available for Linux many more people are able to try things out, and test things on different systems.


https://ift.tt/2ssxdKn

https://ift.tt/2LSv8PV


Haproxy

https://ift.tt/2Azy2nH








