I imagine that I need to pre-process it somehow and simplify it so that white is free space, black lines are walls, and gray is unknown. Then I point map server to that pair of files (pgm and yaml) and continue with localisation and so on.

You got the idea right. Regarding the question about the yaml, it is quite simple: the origin is up to you (it is the pose of the lower-left pixel of the image in the map frame, given as [x, y, yaw]), and the resolution is the size in meters of one pixel of your image.
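To make that concrete, here is a minimal sketch of the pre-processing step, assuming the floor plan is already loaded as rows of 0-255 gray values. The function names, the two gray thresholds (`wall_max`, `free_min`) and the toy image are my own assumptions for illustration. The gray levels 254/0/205 and the `occupied_thresh`/`free_thresh` values are the ones map_saver conventionally writes, so adjust them if your setup differs.

```python
import os
import tempfile

# Gray levels conventionally used in saved maps: free, occupied, unknown.
FREE, OCCUPIED, UNKNOWN = 254, 0, 205


def quantize(pixel, wall_max=100, free_min=200):
    """Map one 0-255 gray value to free/occupied/unknown.

    wall_max and free_min are assumed thresholds; tune them for your plan.
    """
    if pixel <= wall_max:
        return OCCUPIED
    if pixel >= free_min:
        return FREE
    return UNKNOWN


def resolution_from_scale(length_m, length_px):
    """Meters per pixel, from a feature of known real length in the plan."""
    return length_m / length_px


def write_pgm(path, rows):
    """Write rows of 0-255 ints as a binary (P5) PGM image."""
    height, width = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in rows:
            f.write(bytes(row))


def write_yaml(path, image_name, resolution, origin=(0.0, 0.0, 0.0)):
    """Write the map description; origin is [x, y, yaw] of the lower-left pixel."""
    text = (
        f"image: {image_name}\n"
        f"resolution: {resolution}\n"
        f"origin: [{origin[0]}, {origin[1]}, {origin[2]}]\n"
        "negate: 0\n"
        "occupied_thresh: 0.65\n"
        "free_thresh: 0.196\n"
    )
    with open(path, "w") as f:
        f.write(text)
    return text


def convert(rows, out_dir, name, resolution, origin=(0.0, 0.0, 0.0)):
    """Quantize the plan and write <name>.pgm and <name>.yaml into out_dir."""
    cleaned = [[quantize(p) for p in row] for row in rows]
    write_pgm(os.path.join(out_dir, name + ".pgm"), cleaned)
    yaml_text = write_yaml(
        os.path.join(out_dir, name + ".yaml"), name + ".pgm", resolution, origin
    )
    return cleaned, yaml_text


if __name__ == "__main__":
    # Toy 2x3 "floor plan": white floor, black wall, one mid-gray pixel.
    plan = [[255, 0, 150], [255, 255, 0]]
    # Example scale: a wall known to be 10 m long spans 200 pixels.
    res = resolution_from_scale(10.0, 200)
    out = tempfile.mkdtemp()
    _, yaml_text = convert(plan, out, "floorplan", res)
    print(yaml_text)
```

For a real floor plan you would load the image with whatever library you prefer and pass its pixel rows to `convert`. The easiest way to get the resolution is to measure, in pixels, something whose real length you know.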

However, beware when using the approach you propose. In a real situation, what the robot "sees" with its sensors is quite different from a perfect map drawn from the floor plan. So, even though you'll have a nice map, you may have trouble localizing the robot in it because of the mismatch between what the robot is expected to "see" and what it actually "sees".

An example would be glass walls, which would be in your floor plan but are invisible to a LIDAR.