Hands-On High Performance with Go

You're reading from Hands-On High Performance with Go: Boost and optimize the performance of your Golang applications at scale with resilience

Product type: Paperback
Published in: Mar 2020
Publisher: Packt
ISBN-13: 9781789805789
Length: 406 pages
Edition: 1st Edition
Author (1):
Bob Strecansky

Table of Contents (20 chapters)

Preface
1. Section 1: Learning about Performance in Go
2. Introduction to Performance in Go
3. Data Structures and Algorithms
4. Understanding Concurrency
5. STL Algorithm Equivalents in Go
6. Matrix and Vector Computation in Go
7. Section 2: Applying Performance Concepts in Go
8. Composing Readable Go Code
9. Template Programming in Go
10. Memory Management in Go
11. GPU Parallelization in Go
12. Compile Time Evaluations in Go
13. Section 3: Deploying, Monitoring, and Iterating on Go Programs with Performance in Mind
14. Building and Deploying Go Code
15. Profiling Go Code
16. Tracing Go Code
17. Clusters and Job Queues
18. Comparing Code Quality Across Versions
19. Other Books You May Enjoy

CUDA – powering the program

With the CUDA dependencies installed and working, we can start with a simple CUDA C++ program:

  1. First, we include the header files we need and define the number of elements to process. 1 << 20 is 1,048,576 elements, which is more than enough for a meaningful GPU test. You can change the shift amount to see how the element count affects processing time:
#include <cstdlib>
#include <iostream>

const int ELEMENTS = 1 << 20;
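
As a quick aside on sizing, the sketch below shows how the shift amount maps to an element count and to the memory one array would need. It assumes the arrays hold 4-byte float values; the helper names elementsForShift and bytesForFloats are hypothetical and do not come from the book. It uses only host code, so it compiles with nvcc or with any C++ compiler.

```cuda
#include <cstddef>
#include <cstdio>

// Hypothetical helper: 1 << shift is 2 raised to the power of shift.
constexpr std::size_t elementsForShift(int shift) {
    return static_cast<std::size_t>(1) << shift;
}

// Hypothetical helper: bytes needed for one array of n floats.
constexpr std::size_t bytesForFloats(std::size_t n) {
    return n * sizeof(float);
}

int main() {
    // Each extra shift doubles both the element count and the memory needed.
    for (int shift = 18; shift <= 22; ++shift) {
        const std::size_t n = elementsForShift(shift);
        std::printf("1 << %d = %zu elements, %zu bytes per float array\n",
                    shift, n, bytesForFloats(n));
    }
    return 0;
}
```

Each step up in the shift doubles the work, which makes it a convenient knob when comparing processing times.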

Our multiply function is marked with the __global__ specifier. This tells nvcc, the CUDA compiler driver, to compile the function as a kernel: code that runs on the GPU and is launched from host code. The multiply function is relatively straightforward: it takes the a and b arrays, multiplies them together on the GPU, and stores the result in the c array. A __global__ function must return void, so the c array is how the result gets back to the caller:

__global__ void multiply(int j, float...
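
The listing above is cut off, so the following is only a minimal sketch of how an element-wise multiply kernel is commonly written and launched, not the book's own code. The names multiplySketch and runMultiplySketch, the parameter list, the grid-stride loop, the use of unified memory, and the 256-thread block size are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not the book's signature): c[i] = a[i] * b[i].
__global__ void multiplySketch(int n, const float *a, const float *b, float *c) {
    // Global index of this thread across the whole grid.
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Total number of threads launched, used as the loop stride.
    int stride = blockDim.x * gridDim.x;
    // Grid-stride loop: each thread handles every stride-th element.
    for (int i = index; i < n; i += stride) {
        c[i] = a[i] * b[i];
    }
}

// Runs the kernel on n elements and returns the number of wrong results,
// or -1 if a CUDA call fails.
int runMultiplySketch(int n) {
    float *a = nullptr, *b = nullptr, *c = nullptr;
    // Unified (managed) memory is accessible from both host and device.
    if (cudaMallocManaged(&a, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&b, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&c, n * sizeof(float)) != cudaSuccess) {
        cudaFree(a);
        cudaFree(b);
        cudaFree(c);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = 2.0f;
        b[i] = 3.0f;
    }

    const int threadsPerBlock = 256;  // assumed block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    multiplySketch<<<blocks, threadsPerBlock>>>(n, a, b, c);

    // Launches are asynchronous: check the launch, then wait for completion
    // before reading c on the host.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    int mismatches = 0;
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        mismatches = -1;
    } else {
        for (int i = 0; i < n; ++i) {
            if (c[i] != 6.0f) {
                ++mismatches;
            }
        }
    }
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return mismatches;
}

int main() {
    const int result = runMultiplySketch(1 << 20);
    std::printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Saved under a hypothetical name such as multiply_sketch.cu, this can be built with nvcc multiply_sketch.cu -o multiply_sketch. The grid-stride loop is a design choice that keeps the kernel correct even when fewer threads are launched than there are elements.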