1 - The OLEDB provider is initiated on Server A, Server B sees it as a normal connection and query, it doesn't know that Sever A is a SQL Server, it's just a client to it.
2 - The query is executed on both. Server A passes the query to Server B which executes it, then Server A works with the results. The optimizer comes into play on both, but can't talk to the other servers.
Given a query of no complexity the performance will be equivalent. However, when you start getting into complex queries or large datasets, you will see performance gains by fetching just what you need from Server B into a temp table and then using that with the data contained on Server A. (Assuming you need to mix and match data across servers). Essentially, with the OPENQUERY syntax, you pass the query as a whole to server B (hopefully including a where clause) which then returns data to Serer A. With the four part naming, you have no control and the optimizer rapidly will decide to just fetch all of the data in the table from Server B and then eliminate based on other criteria (joins, where, etc.).
3 - The result is not cached (except as SQL caches data normally). Unless you store the results from the linked server query into a temp table somewhere.
4 - If you are familiar with how applications connect to a database and execute queries then the process is very similar. Except for authentication differences it's the same, just that it allows for easily mixing and matching data from multiple servers into a single result set. But your performance will depend greatly on how much data is pulled from Server B and staying aware of the optimizer's limited ability to prevent you from pulling the entire table from Server B, even if you only ended up needing 5 of those rows.