Another solution used by the Inception authors to reduce the number of parameters was to use an average-pooling layer instead of a fully connected one after the last convolutional block. With a 7 × 7 window size and a stride of 1, this layer reduces the feature volume from 7 × 7 × 1,024 to 1 × 1 × 1,024 without any parameters to train. A dense layer would have added (7 × 7 × 1,024) × 1,024 = 51,380,224 parameters. Though the network loses a bit of expressiveness with this replacement, the computational gain is enormous (and the network already contains enough non-linear operations to capture the information it needs for the final prediction).
The last and only FC layer in GoogLeNet has 1,024 × 1,000 = 1,024,000 parameters, roughly a fifth of the network's total parameter count!
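To make the parameter savings concrete, here is a minimal sketch, written in TensorFlow/Keras purely for illustration (the framework choice and the surrounding model are assumptions; only the layer sizes come from the figures above). It builds the two alternative heads on a 7 × 7 × 1,024 feature volume and lets Keras count their trainable parameters:

```python
import tensorflow as tf

# Hypothetical 7 x 7 x 1,024 feature volume, standing in for the output of
# GoogLeNet's last convolutional block (sizes taken from the text above).
features = tf.keras.Input(shape=(7, 7, 1024))

# Head A -- GoogLeNet-style: 7 x 7 average pooling (no trainable parameters),
# followed by the single 1,000-way FC layer (~1.02M parameters, biases included).
pooled = tf.keras.layers.AveragePooling2D(pool_size=7, strides=1)(features)
logits_a = tf.keras.layers.Dense(1000)(tf.keras.layers.Flatten()(pooled))

# Head B -- dense alternative: flattening the volume and projecting it to
# 1,024 values costs 51,380,224 weights (plus biases) for that layer alone,
# before the final 1,000-way layer is even added.
flat = tf.keras.layers.Flatten()(features)
hidden = tf.keras.layers.Dense(1024)(flat)
logits_b = tf.keras.layers.Dense(1000)(hidden)

# summary() reports ~1.03M trainable parameters for head A
# versus ~52.4M for head B.
tf.keras.Model(features, logits_a).summary()
tf.keras.Model(features, logits_b).summary()
```

The two summaries make the trade-off visible: the average-pooling head keeps only the roughly one million parameters of the final classification layer, while the dense alternative would multiply the network's parameter count several times over for that single block.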