42

I'm not talking about a diff tool. I'm really looking to see if a project contains code that may have been "refactored" from another project. It would be likely that function names, variable names and whatnot would be changed. Conditionals might be reversed, etc.

8 Answers

12

When I was teaching software engineering, I used the (free) service at Stanford called MOSS (Measure of Software Similarity). This allowed me to detect plagiarism between student projects very easily. The system also allowed me to enter "known good" code examples that I had used during class that were to be ignored.

The great thing (completely a side issue) about the results that came back was that we could tell which students worked together: even if they didn't blatantly copy the code, they had discussed the problems enough that their code was similar. The sad part was finding the odd student with NO SIMILARITY to any other code. They usually didn't do so well.

Peter K.
8

You might be able to use the PMD tool to find what you are looking for. It is meant to detect cut and paste within a code base, but if you include the source of the suspected origin project, it might help you see where code was copied from.

busyspin
5

The closest thing I know of to what you are looking for is Clone Detective. It is a Visual Studio plug-in.

Clone Detective is a Visual Studio integration that allows you to analyze C# projects for source code that is duplicated somewhere else. Having duplicates can easily lead to inconsistencies and often is an indicator for poorly factored code.

epotter
4

It sounds like you want to compute the difference between two abstract syntax trees (AST), so you might be interested in the Smart Differencer tool.

Found on https://stackoverflow.com/questions/974855/eclipse-abstract-syntax-tree-diff.

1

Even if you're not talking about a diff tool, you can still use one for this, to a certain extent at least. If I see two sections of code that look similar, for example, I frequently paste both into BeyondCompare to see how much work it would be to simplify it by refactoring the common functionality out.

On the other hand, if you don't know where the similar code is, but you're just wondering if any exists somewhere... what are you looking for? An automated tool to detect plagiarism? I'm not sure any such thing exists.

Mason Wheeler
1

This article on Wikipedia on the subject also includes links to several tools that can be used to find similar or duplicate code. We have an internal tool for this, so I'm not familiar with the external tools mentioned in the article.

Alan
1

What you really want to do is see if there is code cloned (copied) across the two projects (both projects consisting of possibly large sets of files). You can do this by running a clone detection tool. Wikipedia lists a variety of them.

To decide roughly whether there is a lot of copying, you only need to match source lines, and there are a variety of exact source-line clone detectors out there. I believe PMD is one of them. What these won't do is find code that is copy-paste-edited; they will find the boilerplate copy-paste-unchanged code that is likely wrapped around the copy-paste-edited stuff.
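As a rough illustration of exact line matching (a minimal sketch, not how PMD or any other tool mentioned here actually works), the idea can be expressed in a few lines of Python. The function names `normalized_lines` and `shared_line_ratio` and the minimum line length of 8 characters are assumptions made up for this example.

```python
def normalized_lines(source: str) -> set[str]:
    """Return the set of non-trivial lines with whitespace collapsed."""
    lines = set()
    for raw in source.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if len(line) >= 8:  # assumed threshold: skip blanks, lone braces, etc.
            lines.add(line)
    return lines


def shared_line_ratio(a: str, b: str) -> float:
    """Fraction of the smaller file's lines that also appear in the other."""
    lines_a, lines_b = normalized_lines(a), normalized_lines(b)
    if not lines_a or not lines_b:
        return 0.0
    return len(lines_a & lines_b) / min(len(lines_a), len(lines_b))
```

An unchanged copy scores 1.0, but renaming a single variable on every line drops the score to 0.0, which is exactly the weakness described above.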

If you want to see the details of the copying for copy-paste-edited code, you need a clone detector that finds "parameterized" clones. Token-based detectors do this for edits that replace just variable names or constants.
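To make the token-based idea concrete, here is a small sketch for Python source using the standard `tokenize` module. It replaces every identifier and literal with a placeholder and compares the resulting token sequences. The helper names and the placeholders `ID`, `NUM` and `STR` are invented for this example, and real token-based detectors are considerably more sophisticated.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry layout only and are ignored in the comparison.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, replacing names and literals with placeholders."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")    # any identifier
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")   # any numeric constant
        elif tok.type == tokenize.STRING:
            out.append("STR")   # any string constant
        else:
            out.append(tok.string)  # keywords and operators kept as-is
    return out


def token_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized token streams."""
    matcher = SequenceMatcher(
        None, normalized_tokens(a), normalized_tokens(b), autojunk=False
    )
    return matcher.ratio()
```

Two functions that differ only in their names and constants produce identical token streams here, so `token_similarity` returns 1.0 for them.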

Abstract syntax tree (AST) based detectors do this for edits involving larger chunks, such as expressions, statements, insertions, deletions, etc. These latter tend to give better answers, because unlike the token detectors, they can use the language structure of the computer source code as a guide.
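A toy version of the AST approach, again for Python source and using only the standard `ast` module, might look like the sketch below. It is not how CloneDR or any other tool named in this thread works: it only finds whole functions whose trees are identical once names and constants are blanked out, whereas real AST detectors also match partial and near-miss subtrees. The names `_Normalizer` and `shared_function_shapes` are hypothetical.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Blank out identifiers and constant values, keeping the tree shape."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_"
        return node

    def visit_Constant(self, node):
        # Keep only the type of the constant, not its value.
        placeholder = ast.Constant(value=type(node.value).__name__)
        return ast.copy_location(placeholder, node)


def function_shapes(source: str) -> set[str]:
    """Return a structural fingerprint for every function in the source."""
    tree = _Normalizer().visit(ast.parse(source))
    return {
        ast.dump(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


def shared_function_shapes(a: str, b: str) -> set[str]:
    """Fingerprints of functions that are structurally identical in both."""
    return function_shapes(a) & function_shapes(b)
```

Because the comparison is on tree structure, comments and layout changes have no effect. A reversed conditional still produces a different tree, so this sketch does not catch that kind of edit.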

Our CloneDR tool is such a detector.

I don't know of tools that will actually find "equivalent" code (reversed conditionals, etc.). Researchers have built clone detectors that do something like this, but the combinatorics make this very expensive to execute, and the research prototypes scaled poorly.

Ira Baxter
1

I really like how CCFinderX visualizes similarity, so you might want to check that one out too. It supports quite a few languages, it's free, and it's fairly easy to set up (it requires Python 2.6).

MaR