I'm fairly new to working with databases, but I currently have a web crawler that I've written in C# using a MySQL database. The crawler is frequently writing and deleting records in the database as sites are scraped.
Each record has a primary key, which is the MD5 checksum of the URL, to ensure that no table contains duplicate entries.
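For context, the key is just an MD5 hex digest of the URL, along these lines (a minimal sketch rather than my exact crawler code):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class UrlKey
{
    // Produce a 32-character hex MD5 digest of the URL to use as the primary key.
    public static string ForUrl(string url)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(url));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```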
Is it good practice to check the table for a duplicate before inserting, i.e. two operations against the database?
Or is it sufficient to attempt the insert and let the database gracefully reject the duplicate?
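To make the comparison concrete, here is a rough sketch of the single-statement approach I mean, using MySql.Data; the table and column names (`pages`, `url_md5`, `url`) are just placeholders, not my real schema:

```csharp
using MySql.Data.MySqlClient;

static class PageWriter
{
    // Attempt the insert and let MySQL skip the row if the key already exists.
    // INSERT IGNORE avoids the need for a prior SELECT; an alternative would be
    // a plain INSERT and catching MySqlException with Number == 1062 (duplicate key).
    public static bool InsertPage(MySqlConnection conn, string urlMd5, string url)
    {
        const string sql = "INSERT IGNORE INTO pages (url_md5, url) VALUES (@md5, @url)";
        using (var cmd = new MySqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@md5", urlMd5);
            cmd.Parameters.AddWithValue("@url", url);
            // ExecuteNonQuery returns 0 when the row was ignored as a duplicate.
            return cmd.ExecuteNonQuery() > 0;
        }
    }
}
```

The alternative I'm weighing against this is a SELECT for the key first, followed by the INSERT only if nothing comes back.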
The same question applies to other operations, such as deletes.
At the moment I'm operating on thousands of records a minute from a single client (with multiple connections from that client). Does the answer change given that level of database activity?
Also, duplicates are fairly common, so the add is frequently skipped: roughly only one in every ten attempted adds is actually a new record.