I know it's not really this simple, but if the file extension is ".ino", I feel like your detection algorithm should be free to treat that as a very strong signal that it's Arduino code.
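
To make the idea concrete, here is a rough Python sketch of what I mean. It is not a claim about how your detector actually works: `EXTENSION_PRIORS`, `detect_language`, and the bonus value of 5.0 are names and numbers I made up for illustration, and `content_scores` stands in for whatever your existing content-based scoring produces.

```python
from pathlib import Path

# Hypothetical table: file extension -> (language, bonus weight).
# The weight is an arbitrary illustrative value, not a tuned one.
EXTENSION_PRIORS = {
    ".ino": ("Arduino", 5.0),
}


def detect_language(filename, content_scores):
    """Return the best-scoring language after adding an extension bonus.

    content_scores is assumed to be a dict mapping language name -> score
    from some content-based classifier that exists elsewhere.
    """
    scores = dict(content_scores)  # copy so the caller's dict is untouched
    ext = Path(filename).suffix.lower()  # ".INO" and ".ino" are treated alike
    if ext in EXTENSION_PRIORS:
        language, bonus = EXTENSION_PRIORS[ext]
        scores[language] = scores.get(language, 0.0) + bonus
    if not scores:
        return None  # nothing to go on
    return max(scores, key=scores.get)
```

The bonus is added to the content score rather than replacing it, so the extension acts as a heavy thumb on the scale instead of an absolute override, and the weight can be tuned if ".ino" ever turns out to be ambiguous.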