Hi everyone, my name is Ketan Date, and I'm a postdoctoral researcher in the Department of Industrial and Enterprise Systems Engineering at the University of Illinois at Urbana-Champaign. Today I'm going to talk about our research project titled "Collaborative CPU plus GPU algorithms for triangle counting and truss decomposition." This project was done in response to the Amazon/DARPA/IEEE Graph Challenge that was hosted recently by MIT Lincoln Labs. It was a collaborative effort between faculty and students from the Department of Industrial and Enterprise Systems Engineering (ISE) and the Center for Cognitive Computing Systems Research (C3SR) in the Department of Electrical and Computer Engineering (ECE) at UIUC, with support from the Watson Research Center at IBM. The team members on this project, including myself, are Professor Rakesh Nagi from ISE; Keven Feng, Professor Wen-mei Hwu, and Professor Nam Sung Kim from ECE; and Jinjun Xiong from IBM.
The objective of this competition was to design algorithms that can perform triangle counting and truss decomposition in large graphs or networks. Triangles and trusses represent the most fundamental substructures in a network. A triangle is defined as a cycle of length 3, while a k-truss is defined as a non-trivial subgraph in which each edge is contained in at least k-2 triangles. The triangle counting task is to count all the triangles in the given network, while the truss decomposition task is to find all k-trusses for k ≥ 2. Both tasks are extremely important in graph analytics for calculating various metrics that describe community structure in the given network.
Our solution for this competition was to design collaborative algorithms that use both CPU and GPU threads for triangle counting and truss decomposition, specifically targeted at the IBM Minsky platform. Through this project, we also benchmarked and compared the performance of two memory management schemes: zero-copy memory and CUDA unified memory.
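To make the two task definitions concrete, here is a minimal sequential sketch in Python. It is not the collaborative CPU plus GPU implementation described in this talk, and the function names (`build_adjacency`, `count_triangles`, `k_truss`, `truss_decomposition`) are hypothetical, chosen only for illustration. It assumes an undirected simple graph given as a list of vertex pairs with comparable vertex labels such as integers.

```python
def build_adjacency(edges):
    """Return {vertex: set(neighbors)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops; they cannot be part of a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(edges):
    """Count cycles of length 3 by intersecting neighbor sets along each edge."""
    adj = build_adjacency(edges)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                # every common neighbor w of u and v closes a triangle (u, v, w)
                total += len(adj[u] & adj[v])
    return total // 3  # each triangle was found once per each of its 3 edges


def k_truss(edges, k):
    """Return the edges of the k-truss: every edge lies in >= k - 2 triangles."""
    adj = build_adjacency(edges)
    changed = True
    while changed:  # repeat until no edge violates the support condition
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if u < v and len(adj[u] & adj[v]) < k - 2:
                    # too few triangles through (u, v): remove the edge
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {(u, v) for u in adj for v in adj[u] if u < v}


def truss_decomposition(edges):
    """Return {k: edge set of the k-truss} for k = 2, 3, ... while non-empty."""
    result = {}
    k = 2
    current = k_truss(edges, k)
    while current:
        result[k] = current
        k += 1
        # the (k+1)-truss is a subgraph of the k-truss, so prune from there
        current = k_truss(current, k)
    return result
```

As a small check, take a complete graph on vertices 0 to 3 plus one pendant edge (3, 4). `count_triangles` finds 4 triangles, the 3-truss and 4-truss both consist of the six edges of the complete subgraph, and no 5-truss exists because each of those edges lies in only 2 triangles.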
We used the Minsky machine that was given to the C3SR group by IBM. It contains two Power8 CPUs with 80 cores each and four NVIDIA Tesla P100 accelerators built on the Pascal architecture and connected by the NVLink interconnect. This evolutionary architecture makes Minsky extremely amenable to big data analytics. We tested our collaborative algorithms on various networks and compared their performance with the baseline sequential implementation provided by the Graph Challenge organizers. We found that the collaborative algorithms achieved a 28x speedup on average for triangle counting and a 165x speedup on average for truss decomposition. We also found that the zero-copy memory management scheme is on average three to four times faster than the unified memory management scheme. Our paper received an honorable mention among the Graph Challenge competition participants. Future directions of research are to solve larger problems by applying graph partitioning approaches and to optimize the memory access patterns for these two algorithms.