4

I am not sure if this is the best place to ask this question, but I was looking at a Wikipedia article and noticed there were a lot of edits for that article. Since you can view each edited iteration of the article, I figured the amount of space those web pages take would add up. There are millions of articles and Wikipedia doesn't seem to get rid of vandalized edits. Alternatively, I was thinking that Wikipedia could keep the base article and use the edit history as instructions for how to edit the base article.

My question, as a person with a limited background in programming, is: how does a site like Wikipedia store previous edits of pages? Does it store each page in full, is each edit a set of instructions to modify a base article, or is some other concept used?

Or, I guess, is the data storage used so minimal, that this isn't even an issue?

2 Answers

5

The software that runs Wikipedia is called MediaWiki, and you can see in its documentation that it does indeed keep the full text (the "wikitext") of each revision, though that text may be compressed or stored in a separate database.

Alternatively, I was thinking that Wikipedia could keep the base article and use the edit history as instructions for how to edit the base article.

This is possible, but it would be unnecessarily complicated.
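To make the trade-off concrete, here is a minimal sketch of the "base article plus edit instructions" idea, using Python's standard `difflib`. The helper names (`make_patch`, `apply_patch`, `reconstruct`) and the sample revisions are made up for illustration; this is not how MediaWiki works internally.

```python
from difflib import SequenceMatcher

def make_patch(old, new):
    """Return (start, end, replacement_lines) edits that turn old into new."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_patch(old, patch):
    """Apply one patch to a list of lines and return the new list."""
    result = list(old)
    # Work backwards so that earlier indices are still valid.
    for start, end, replacement in reversed(patch):
        result[start:end] = replacement
    return result

def reconstruct(base, patches, n):
    """Rebuild revision n by replaying the first n patches on the base."""
    lines = base
    for patch in patches[:n]:
        lines = apply_patch(lines, patch)
    return lines

# Hypothetical revision history of a tiny article (one string per line).
revisions = [
    ["Wikipedia is a free encyclopedia.",
     "It is written by volunteers."],
    ["Wikipedia is a free online encyclopedia.",
     "It is written by volunteers."],
    ["Wikipedia is a free online encyclopedia.",
     "It is written by volunteers.",
     "It runs on MediaWiki."],
]

base = revisions[0]
patches = [make_patch(a, b) for a, b in zip(revisions, revisions[1:])]

# Only the base and the patches are stored; any revision can be rebuilt.
latest = reconstruct(base, patches, len(patches))
```

Note that showing revision n means replaying n patches, and one damaged patch breaks every later revision. That extra work and fragility is the complication you avoid by simply storing each revision in full.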

Or, I guess, is the data storage used so minimal, that this isn't even an issue?

Yep, you nailed it. This is text; it takes very little space compared to images or videos, and nowadays storage space is ridiculously abundant and cheap.

0

Some revisions are stored as plain text in the database, some are stored in concatenated gzip format, and some are stored as compressed diffs.
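As a rough illustration of why concatenating revisions before compressing helps, here is a small sketch using Python's standard `zlib`. The sample text, the NUL separator, and the variable names are assumptions made for this example; it is not MediaWiki's actual storage format.

```python
import zlib

# Hypothetical article text; each revision adds one short sentence.
base = ("MediaWiki is the software that runs Wikipedia. It keeps the full "
        "wikitext of each revision, so the history of a heavily edited "
        "article contains many nearly identical copies of the same text.")
revisions = [base + f" Minor edit {i}." for i in range(20)]

# Option 1: compress every revision on its own.
separate_size = sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)

# Option 2: join the revisions (NUL-separated) and compress them together,
# so the compressor can reuse text it has already seen.
blob = zlib.compress("\0".join(revisions).encode("utf-8"))
combined_size = len(blob)

# All revisions can still be recovered from the single blob.
restored = zlib.decompress(blob).decode("utf-8").split("\0")

print(f"compressed separately: {separate_size} bytes")
print(f"compressed together:   {combined_size} bytes")
```

Because consecutive revisions are almost identical, the combined blob comes out much smaller than the sum of the individually compressed revisions, without needing any diff logic.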

Tgr