0:00
Hi, thank you for coming, thank you for having me
0:05
I'm here to talk to you about AI versus rule-based static code analysis
0:11
Who am I? Why do I have standing to talk to you about this
0:15
Look, here's a picture of me. It's the same picture of me that's on everything. I love this picture of me
0:20
It looks like I'm Ernest Shackleton. I'm actually on the boat from Nyon to Yvoire, going across Lake Geneva in the winter
0:29
So I'm Kendrick, I'm Director of Technology at Codacy; what Codacy does, we'll get to in a minute. My developer background: a four-year master's in computer systems and software engineering
0:42
18 years in software, mostly PHP and JavaScript. Sorry, sorry I did do some C-sharp once in 2011
0:49
I hope this is still relevant. I'm a scrum master, I'm a game designer, and what I call a thought follower
0:54
I don't write posts on LinkedIn with new thoughts, I just consume everybody else's
0:59
So it's Friday night. We could be here talking about AI or we could be having a beer, right
1:10
I just got a Coke right here. So let's just do a quick check on some of the features of beer versus AI, to make sure we're spending our time well before we start
1:19
So beer is fun. It is easy to overconsume, can get expensive, and it'll lead you to making bad decisions and talking nonsense
1:26
On the other hand, AI, yeah, okay, let's do the talk. Right, so this little bit's going to be a bit salesy
1:34
It's not really, you'll see the point in a minute. But what Codacy does is we run static code analysis tools for you on your code, on your repos, and give you the results
1:47
And we let you gate your PRs based on them, which is pretty cool. But fundamentally we support 40 programming languages, of which C-sharp is one, and more than 50 tools
1:58
And there are 10,000 configurable rules in our universe for static code analysis, which is pretty cool
2:05
But that's a lot of work to maintain, right? Keeping all those tools and rules up to date
2:12
is a big piece of work for us. So what if we didn't have to do that? What if AI could just replace all of
2:20
those tools for us? Wouldn't that be great because then we would have a much easier job or we'd be
2:24
out of business, not sure which. Well let's dive in and have a look. So I'm gonna stop sharing
2:34
the presentation for a sec and instead we can switch to VS Code. I hope everybody can still see
2:42
I've pulled up an old version of one of Codacy's own files, which is in Scala, sorry, in VS Code, with a little plugin that I've been messing around with called Codicide, which does all sorts of stuff
2:57
But one of the things that it does is allow you to run static analysis through OpenAI, using GPT-3.5 Turbo
3:05
if that means anything to you. It will create a prompt, and it will fire chunks of your code off
3:15
and ask it to pretend to be a static code analysis tool. And what happens if we do that? So
3:23
I pressed the button earlier to save us a little time, because it is a bit slow, and I will push
3:28
the button again at the end to show you that that's the case. So one limitation, as AI stands
3:36
right now is that it's not that quick in solving this problem. But yeah let's just dive in and have
3:42
a look at a few of the things that it's picked out. It's finding a naming-constant issue. Something that you
3:47
will see over and over again is terrible at actually finding the right line number of your
3:53
issue. So we're looking at line number 15, which presumably is talking about this here,
3:59
but it's actually only managed to figure out how to put the line number for the
4:06
issue it's found all the way up here. So, not super accurate. And we've got some hardcoded credentials
4:16
we've got some unvalidated input which it picks up quite a few of
4:23
it loves talking about unencrypted cookies if it can find them. So it'll say, and this is a
4:30
file like CookieHelper, about doing cookies, right, so it will give you a few interesting things. I like
4:36
this one particularly: the HttpOnly cookie flag is missing, making it vulnerable to cross-site
4:41
scripting. That is great, it's found an error. Somewhere, we can just go and find that there is
4:48
a reference to HttpOnly in a few places in the file, HttpOnly being set to false
4:55
so that's pretty cool it's finding stuff it's not getting it quite in the right place but you know
4:59
that's still interesting, it's getting there. What I'm going to do for you now is I'm going to
5:04
push this button again so like I say when I push this button what's happening is that we're starting
5:09
again we're taking the whole file from scratch we're taking in 60 line chunks
5:14
pumping them off to OpenAI and getting the response, which has its obvious limitations already, right? Very often, as it might this time, it will complain that you've got unused import statements, because it finds something, like a Scala import,
5:29
That's not used in the first 60 lines of this file, right? And so it will say, as far as it's concerned
5:36
in the context that it's got, that there's an error here and there's not
5:41
Obviously, as AI gets more advanced, and we can either pump more context in
5:46
or models' context windows get bigger, that will improve. But for right now, that is a pain point
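As a rough sketch of what's happening under the hood here, the chunk-and-prompt loop looks something like this. The function names and prompt wording are my own illustration, not the plugin's actual code:

```python
CHUNK_SIZE = 60  # lines per request, matching the demo


def chunk_lines(source: str, size: int = CHUNK_SIZE):
    """Split a file into (first_line_number, chunk_text) pairs."""
    lines = source.splitlines()
    for start in range(0, len(lines), size):
        yield start + 1, "\n".join(lines[start:start + size])


def build_prompt(chunk: str) -> str:
    # The model only ever sees this one chunk, which is why imports
    # used later in the file get flagged as "unused".
    return (
        "Pretend you are a static code analysis tool. "
        "Report up to 10 issues, one per line, as 'LINE: MESSAGE'.\n\n"
        + chunk
    )
```

Each response's line numbers then presumably have to be offset by the chunk's first line number, which is exactly the bookkeeping step the model keeps getting wrong in the demo.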
5:52
And I've done a lot of talking so far, and it's still not returned any results. I do wonder if I actually clicked the button correctly
5:59
It must have been working that out. Yeah, here we go. We're starting to get something here
6:04
And indeed, it has picked up somewhere. It's just underlined that one
6:11
So having pushed the button again, here's another interesting thing about using AI for doing static code analysis
6:16
it's given different results, and some of those might be valid as well
6:22
There's still a fair amount of noise here. But if you are relying on your static code analysis for doing compliance
6:30
or for definitely detecting a specific set of issues, then again, AI, in inverted commas, is really not quite there yet, right
6:41
So that's what we've got there. By comparison, I'll just flip over to the repository view from Codacy
6:52
It's done some weird autocomplete there. And wait a few seconds again
6:57
What this is going to do is pull down all of the issues within this file from a previous static code analysis run, using ScalaMeta Pro and whatever else is configured in Codacy for this file
7:11
there are indeed issues within this file but they're not the ones that the AI has picked out
7:20
So it's very big on... sorry, somehow the AI has executed on top of that; let me just run that again, just for the repository
7:30
for you. There's one specific error that it finds lots and lots of times about prohibiting insecure
7:38
cookies, but not anything else. So the AI found different useful things compared to what we found
7:46
but it wasn't consistent and it found a lot of noise as well and something that people hate about
7:50
static code analysis is when it generates a lot of noise
7:59
you can go and do the same sort of activity on other file types. So we can go and look at this
8:06
markdown file if we want to do the same thing. It's actually interesting that the AI either gets
8:15
markdown really right and it will correct your spelling and tabbing and formatting errors within
8:25
markdown, and you can see the white space between the two hashes and the text here and not there, or it will
8:32
think that you're trying to execute a code file, and it'll complain about things that look like
8:37
HTML or that look like code and completely get it wrong which again in terms of bad decisions
8:45
not great. You can ameliorate a load of this stuff, of course. Talking about the import statements: you
8:45
could, before we pump stuff into your AI, say, okay, well, we've spotted the import statements as
8:50
a problem, so we're going to put a special condition in and we're just going to skip over the import statements, right, and then we'll just ask it to not do those. Or maybe we can just search
9:06
for references manually for all of those, so we don't need to do that. And again, you can, but you've
9:14
just generated a new piece of work that you then have to maintain forever and you certainly can't
9:19
do that easily across 40 different languages. So that's a problem. We're still not getting any
9:27
response from the AI at this moment. Oh, there you go. It's found lots. That's fun
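That import-skipping special case is only a few lines for one language. Here's a hypothetical sketch for Scala/Java-style imports, but it's exactly the kind of thing you'd then own forever, once per supported language:

```python
import re

# Hypothetical pre-filter: drop import lines before sending a chunk to
# the model, so it can't flag "unused imports" it has no context for.
# This pattern only covers Scala/Java-style imports -- each of the 40
# supported languages would need its own rule, maintained forever.
IMPORT_RE = re.compile(r"^\s*import\s+[\w.{}, =>]+\s*$")


def strip_imports(chunk: str) -> str:
    kept = [line for line in chunk.splitlines() if not IMPORT_RE.match(line)]
    return "\n".join(kept)
```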
9:34
Another weakness I found with AI, we can have a look at the errors in a sec, is that
9:44
in Codacy we support 10,000 rules. You can turn them on and off. You are getting noise in your
9:50
linter either because you work in a particular way or because you have a legacy
9:54
code base that's got lots of the same kind of problems in it. You know, something we find a lot in Codacy.
10:02
I'll flip back over to the repo view in a minute, but you'll see that there's always a ton
10:07
of code style errors in pretty much any repository that's analysed within Codacy
10:12
using rule-based tools, which a lot of people don't care about, or they're going to
10:20
use Scalafmt or whatever to wipe them out anyway, and they're not interested in fixing those. Whereas AI is going to pick those
10:27
things up every single time unless you start writing a very long and complicated prompt to say
10:33
and don't consider this and don't consider that and don't consider that in which case you're going
10:39
to have that problem where you're filling up your prompt with instructions, and then you don't have enough of a window to pump in enough code to get some good results. So yeah, let's just have a tiny look
10:56
at what it's telling us about Markdown. So somehow it's decided that this is unused code. The public
11:08
API v3 section is no longer relevant. I can tell you the public API v3 is pretty relevant in
11:14
Codacy. But it's picked up some unnecessary white space. Unfortunately, because it can't get the line
11:19
number right, we don't know where the unnecessary white space is. Never mind. So today it seems like
11:26
it's decided that this is code and is trying to treat it like code
11:32
rather than necessarily thinking that it's markdown. Oh well, never mind. So let's
11:44
flip back to my presentation. I tried to come up with a name for this, CODIS-AID
11:54
I don't know. Yeah, so what do we get? Inconsistent results, incorrect results
12:01
We can't necessarily put enough context in for it to make sense. We don't necessarily get enough output
12:08
So something you didn't see is that the code behind the prompt to OpenAI asks it to limit itself to only the top 10 issues that it finds in any particular 60-line chunk
12:21
But actually, there could be more than 10 issues, right? But you don't know how many you're going to get
12:26
so you can't optimize the prompt size to send you back absolutely everything
12:31
and so there's potential again as it's set up now that you might actually miss issues
12:38
You could do something cleverer again by breaking it into functions and trying to pump the entire function context in
12:43
for each call. Again as someone who's involved in code quality I know there's a lot of code
12:50
where you're going to have functions that are more than 60 lines
12:55
And so if you've got a 100, 200 line function, I mean, A, that's a code smell by itself
12:59
But at the same time, your AI static code analysis is going to struggle
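For what it's worth, the chunk-by-function idea could be sketched like this, assuming a brace-delimited language and ignoring awkward cases like braces inside strings or comments (a real version needs a proper parser):

```python
def split_functions(source: str):
    """Naively yield top-level brace-delimited blocks (e.g. functions),
    so each one can be sent to the model with its whole body as context.
    Braces inside strings or comments will break this -- it's a sketch."""
    depth = 0
    start = 0
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:
                # back up to the start of the line holding the signature
                start = source.rfind("\n", 0, i) + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                yield source[start:i + 1]
```

And you'd still need a fallback to fixed-size chunks for those 100-plus-line functions, which defeats the point for exactly the code that needs the most help.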
13:04
It is still kind of fun, though. It is kind of fun to play with these toys
13:10
But there is maybe a different way that AI can help us
13:15
because, you know, AI is always very insouciantly confident about what it's doing
13:23
It will always give you results, whether there are results there to give or not
13:26
It will always try to do something. Well, that means we can do something different
13:32
So we've been messing around with this at Codacy ourselves, on the principle of automatic fixing
13:39
Here's a little screen grab and I'll go and find some in a sec
13:44
I've closed my browsers without going through my history, unfortunately. But we can take rule-based static code analysis failures, and then we can actually get AI to do its very confident, not necessarily always correct, job of describing what the error is to the developer, with a bit of context, because we don't just shoot it the line with the issue in; we'll put a few lines either side
14:13
we can get a nice description out and even suggest what you might change it to
14:20
now even this suggestion here has a mistake in it, right? So, great, it's
14:26
changing the double equals into a dot-equals method, which is great, but it's
14:33
also removing all the white space off the beginning of the line
14:38
and that's something that we're working on fixing right now but you can see that
14:42
it's doing something pretty useful there. This is built on GitHub's suggested-fixes
14:49
mechanism, where in open PRs you can then just press a button to, like, commit that suggestion,
14:54
which is pretty cool from GitHub. Yeah, so we're not going to try and do static code analysis with AI
15:01
because we kind of proved that it doesn't actually work that well and it's expensive as
15:07
well. I haven't mentioned that so far, but each time I pressed that button there were actually whole numbers of cents
15:15
flying out of a bank account for that sort of crummy static analysis on one file
15:20
If you extend it across an entire repository, you're going to be spending fifty dollars for a pretty miserable output. And sure, it will get better, but fundamentally I think the commenter who
15:30
said that everyone's trying to use an AI-shaped hammer on a nail is bang on. Can I say bang on?
15:38
Because actually, if you look at what the problem you're trying to solve is, really you're running a bunch
15:47
of regular expressions which have pretty well-defined constraints on them, and that's
15:52
giving you output. What you're trying to do is use a fantastically complicated, like, Markov-chain,
15:57
Bayesian-probability large language model, with, you know, billions of lines of similar code put
16:04
in there, to try and infer how a regex should work. So, like, for me the regex is always going to be cheaper and faster. You saw how long it took to actually produce some output for that markdown file
16:19
Well, actually, if you run markdownlint on that file... you can run markdownlint on the entire repository in like a few milliseconds
16:29
You know, so AI is definitely not winning on speed or cost at the moment
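To make the "it's really just regular expressions" point concrete, a rule-based checker is essentially a table of compiled patterns: deterministic, effectively free, and done in microseconds per file. A toy sketch with two made-up rules of the kind markdownlint applies:

```python
import re

# Two toy lint rules: a compiled pattern plus a message.
# Deterministic, instant, and free per run -- unlike an LLM call.
RULES = [
    (re.compile(r"[ \t]+$"), "trailing whitespace"),
    (re.compile(r"^#+[^ #]"), "no space after heading hashes"),
]


def lint(text: str):
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                issues.append((lineno, message))
    return issues
```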
16:34
And again, maybe it will get better. You know, maybe there's a future where just everything is AI, because it's so cheap the cost
16:40
is irrelevant anyway. Yeah. So I've got a few examples; in fact, we've got some C-sharp ones, so
16:48
let's maybe look at those, since they're relevant. So this is one of our sample repos, where we've
16:54
tried to put in some issues and then ask the AI to describe what the fix should be, right? So here's
17:05
a medium error-prone issue. I haven't looked at this C-sharp one before; I've looked at another C-sharp one
17:11
and the TypeScript one. This is asking it to put in a static when defining this class
17:22
I'm not sure why. Ah: it doesn't have a constructor, and it's not marked as static, so presumably
17:28
if it doesn't have a constructor it should be marked as static. So there you go. Now we'll try and stick
17:33
that in. And the same again. And it's not a great example, this one. There's a
17:46
few where it's actually trying to detect if you're declaring your loop variable outside the loop and reusing
17:58
it afterwards. Yeah, here we are, this one, which, I mean, this is listed as a minor code style issue
18:04
but I think this is actually fairly serious, because you could wind up using
18:09
j in a separate for loop further down the function, not realizing that it wound up at
18:17
some value in the earlier loop before you started. Moving the declaration inside the loop means that you can't
18:26
do things like that. So I thought that was a nice one. Again, we're still not doing the indentation
18:32
properly, but I think that is the most easily fixable part of this problem here, right
18:49
So initialization, it's assigned a value that's never used, so let's not initialize it because
18:54
you don't want to do that. There's another nice one about not messing around with, this might be
18:59
the TypeScript one, don't change the... here's a nice, like, on the one hand this is great, this is like
19:09
it contains a magic number and we explain what magic numbers are, magic numbers make the code less
19:13
readable and less maintainable, it's unclear what the number represents. To fix the issue, assign it to
19:18
a constant with a descriptive name. Great. All great stuff. Except, of course, the AI is not clever enough to tell you what that five actually represents, right
19:28
And I think probably because this is in FizzBuzz, that's perhaps an unfair challenge to ask of the AI, right
19:36
Because probably the fact that it's five is entirely arbitrary, and so assigning the constant to be five is possibly AI having a laugh at us
19:48
Maybe if it had context of what the problem was with better variable names, it would actually suggest a better constant name
19:54
But possibly it wouldn't. I don't know. But even so, like solve magic numbers, perfectly valid use case
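The mechanical fix the AI is describing looks like this in FizzBuzz terms, where, as noted, the constant's name can't carry much more meaning than the number did (names here are my own illustration):

```python
# Before: the flagged pattern -- what does 5 represent?
def buzz_before(i: int) -> bool:
    return i % 5 == 0


# After: the suggested fix, a named constant. In FizzBuzz the 5 is
# genuinely arbitrary, so the "descriptive" name is doing little work.
BUZZ_DIVISOR = 5


def buzz_after(i: int) -> bool:
    return i % BUZZ_DIVISOR == 0
```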
20:04
There's one in here which is to do with changing the value that's in the argument
20:14
Again, it's really nice. It's like: if you've got an input that includes n, then don't actually
20:25
reassign the argument, assign it to a new variable. I thought it was in here somewhere. And it gives you a very nice, like, autocorrect for it
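That parameter-reassignment rule looks like this in miniature (illustrative code, not the repo's actual sample):

```python
# Flagged: the parameter n is reassigned, so the original input is lost
# partway through the function and later reads of n are misleading.
def scale_flagged(n: int) -> int:
    n = n * 2
    return n


# Suggested fix: leave the argument alone and bind a new variable.
def scale_fixed(n: int) -> int:
    doubled = n * 2
    return doubled
```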
20:38
And for any of these things, again, this is GitHub's feature, not ours, you can just
20:43
ask GitHub to just commit that suggestion right now. So you could just press the button
20:49
I'm not going to press the button, because that's going to mess around with someone's dev work at work
20:55
But that means that you've now got a way where any of the suggestions
20:58
that your AI does come up with that you do like the look of, you're actually only two clicks away from actually implementing it
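For reference, the GitHub feature in play here is "suggested changes": a PR review comment containing a fenced suggestion block, which GitHub renders with a one-click "Commit suggestion" button. Building such a comment body is trivial (the helper name is mine):

```python
def suggestion_comment(description: str, fixed_line: str) -> str:
    # A PR review comment whose body contains a ```suggestion fence;
    # posted on a diff line, GitHub offers to commit fixed_line
    # in place of the commented line with one click.
    return f"{description}\n\n```suggestion\n{fixed_line}\n```"
```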
21:04
And in terms of making developer micro-cycles faster, where you're just going, yeah, that's good
21:12
Yeah, that's good. Yeah, that's good. no, I'll ignore that one, no, no, I'll ignore that one
21:16
then that's pretty cool. So yeah, that was the other bit of the stuff
21:22
I wanted to show you