Final Report
Overview
The final written report for the project will be delivered to me and to your liaison. I can provide a color-printed copy for you to distribute to the liaison and for their offices. The final report is to be created in R Markdown
and knit as a Pdf or Word document. An example starter file can be found here. You are not to write the document in GoogleDocs, or other sharing platforms. The course is designed for you to acquire new skills and build confidence in those skills rather than support other skills. Investing in yourself is what will get you internships, jobs, and leadership roles in labs and lab manager roles in graduate school. Thus, the report is to be put in R Markdown
and maintained in the remote repository. Because RStudio
does not have grammar check capability, you are free to create your own personal content in a Word Processing software like MS Word before you add the content to R Markdown
. However, I advise you change the font in that working file to sans serif font because R Markdown
may get confused with certain characters (e.g., serif font apostrophes, etc.) and you may need to fix these issues later. After running your grammar and spelling check, then add the content to the .Rmd
file. If you do work in a separate file, you will need to make sure to save this file to the project /report
directory and ensure you push that file to the remote repository. Also, make sure you use your initials in the file name so that team members don’t all have the same file names.
A code lead should take on the responsibility to save the file to /report
the team project, add
, commit
, and push
the file to the remote repo on GitHub. When the team works on this report file shared with others on GitHub, I recommend two different approaches:
The team agrees to request adding changes to the document on a case-by-case basis. The PM oversees the process and controls delegation. A member requests permission from the PM to edit the file through the team Discord channel. The PM approves the request and broadcasts to the team on Discord that Member A will issue a
pull
request to update the local file with the remote file. Afterpull
ing the changes, Member A will (1) update the file locally by adding their content, (2) save the file on their local system, (3) knit anHTML
version to ensure there are no errors in knitting and so that others can review your edits without actually opening the.Rmd
file, (4) and then exit the file so that it is not active inRStudio
. Then, they will (5)add
,commit
(with a description), andpush
the changed file to the remote repo on GitHub. Once complete, Member A broadcasts to the team on Discord that they have pushed their changes. The PM asks to verify (just to double check) that Member A has actually exited the file. Once verified, the PM broadcasts that other team members can request from the PM permission to edit the file. Do not edit without permission from the PM.The team agrees upon a contribution schedule (dates, hours) and the PM creates and controls delegation of who will be working on which section and at what time. The PM broadcasts to the team on Discord that Member A will issue a
pull
request to update the local file with remote file. Afterpull
ing the changes, Member A will (1) update the file locally by adding their content, (2) save the file on their local system, (3) knit anHTML
version to ensure there are no errors in knitting and so that others can review your edits without actually opening the.Rmd
file, (4) and then exit the file so that it is not active inRStudio
. Then, they will (5)add
,commit
(with a description), andpush
the changed file to the remote repo on GitHub. Once complete, Member A broadcasts to the team on Discord that they have pushed their changes. The PM asks to verify (just to double check) that Member A has closed the file. Then the PM broadcasts that the file can be edited by the next team member according to the schedule agreed upon. Do not edit without permission from the PM.The team decides to work in separate files and create copies of the main
.Rmd
document that include team member’s initials as a suffix to the filename. The team delegates the sections to be written by specific team members. Each member edits their file, saves it, and ensures they can knit anHTML
version of the file. If successful without error, then, they andadd
,commit
, andpush
changes to the remote so. Team members can view theHTML
version without actually opening your file. Although this approach may appear easiest, it presents other problems which is a reason why I recommend it as a third option. A major limitation of this approach is with integrating work. Arranging, rearranging, revising work is more difficult when you are dealing with multiple files. Weaving pieces into one another can be difficult. Team members who need to integrate sections will need to identify where content will be inserted (e.g., between what sentences, etc.). This is a common problem with working with separate files whether by yourself or with others. Also, pieces that are in one file may be overlooked and not incorporated. Problems mentioned here as well as others does not make this approach the most convenient in the long long. Moreover, this approach does not involve as much communication between team members, which all of you know is always essential to successful teamwork. Options #1 and #2 ensure this communication.
Note: Remember that with a version control system like Git, you always have access to all versions of the files pushed to GitHub. You can always revert to a prior version. I can help you with this if you get to this point. Good messages in your commit
s will be helpful here.
Keeping the Repository up to Date
Whatever your approach to managing versions of the report, you will need to update the repo weekly with your contributions. Updates should be done as part of you weekly team meeting to ensure it’s completed.
Sections of Final Report
- Title Page (page1)
- Abstract (page 2)
- Table of Contents (page 3)
- Acknowledgments (page 4)
- Introduction (approximately 1-2 pages)
- Data (approximately 1-2 pages)
- Results/Findings (approximately 8-10 pages including visualizations)
- Discussion (approximately 1/2 to 1 pages)
- Conclusion (approximately 1/2 of page)
- References (last page)
Abstract
The abstract provides a main summary of data, problem, methods, and key findings.
Acknowledgments
People to acknowledge for the project. Certainly acknowledge the liaisons, the organization for whom the liaison represents, CMS Athletics, the college you represent for the opportunity, etc.
Contents
A contents pages, or table of contents, provides a listing of the document sections and subsections as we as a page for location.
- Title Page
- Abstract
- Table of Contents
- Acknowledgments
Chapters:
- Introduction
- Data
- Results/Findings
- Discussion
- Conclusion
- References
Table of Contents
Insert a table of contents for the chapters and sections.
Chapter 1: Introduction
This chapter would be approximately 1 to 2 single-spaced pages. Here, you will describe both the organization and the project. Respectfully, start with the organization for whom you are completing the work and only after discuss the project.
Some key details to address include:
Client Context: Provide an overview of the client’s business and emphasize how data visualization will assist in addressing their challenges or needs. Who are they? What do they do? What is their relevance/importance?
Project Description: Next, provide a description of the problem that you worked out with your liaison. Make sure that the problem is stated clearly and is detailed enough to be evaluated and replicated. Why is the problem important to address? Clearly define the project’s goals, emphasizing the utilization of data visualization techniques to extract insights and patterns from the data.
Scope: The project scope defines the boundaries, objectives, deliverables, and tasks that need to be accomplished to successfully complete a project. The scope helps define what content will and will not be included in the project. Detail the project’s boundaries, specifically highlighting the use of data visualization methods for analysis and understanding.
Methodology and Approach: After describing the nature of the problem, explain how you set out to address the problem. Explain the key sporting event of interest, time period of interest, athletes of interest, etc. Explain the variables of interest used to help answer problem. If the problem was broken down into smaller parts or sub-goals, describe them in the order you plan to discuss the data in the data chapter.What are the means by which you are solving the problem. For this project, you are not fitting mathematical models to data but focusing on visualizing data to communicate your findings in a descriptive sense so make sure you are clear about this detail.
Tools and Techniques: Highlight the specific visualization tools, software, or libraries used and explain their relevance in uncovering insights from the data.
Chapter 2: Data
This chapter would be approximately 1 or 2 single-spaced pages. In this section, you would provide a description of the data, its source, format, etc.
Sources of Data
What was the data source/where did you obtain it from? Include the source URL of the website from which you accessed the data. Include information about where and how the data were collected or obtained. Specify whether the data were obtained from internal databases, external sources, or gathered through specific methods (surveys, sensors, web scraping, etc.).
Data Characteristics
Discuss the data in detail. In which format was the data stored? How many cases were there in total? How many variables were contained? What variables were contained? What were the key variables you used?
Describe the types of variables present in the data set (numerical, categorical, text, etc.). When discussing variables of the visualization in the results chapter, make sure to provide clarity about the variable, its metric, and reason for using that variable (e.g., mean, max, median, mean of max values, median of max values, dispersion measures, etc.).
List and briefly describe each attribute, feature, or variable in the data set, paying special attention to those used for the project.
Data Quality and Data Preprocessing
Describe the steps taken to clean and prepare the data for investigation. This description may include removing duplicates, standardizing formats, trimming, correcting inconsistencies, etc. Explain any criteria used to select variables or features for visualization, focusing on those with the greatest impact or insight for the organization’s understanding.
Some key details to address include:
Missing Values: Explain the presence and treatment of any missing data. Explain how missing values were handled during analysis (removal, replacement, etc.).
Outliers and Anomalies: Mention any identified outliers or anomalies and how they were addressed (treatment or exclusion).
Variables Created: Describe the variables created, their units of measurement, etc. Explain if any normalization or scaling procedures applied to create the variables and to ensure data consistency and comparability across measures.
Also, specify where the cleaned data may be obtained.
Data Limitations
Highlight any limitations or constraints of the data set that affected the team’s ability to address the initial problem. Similarly, describe how any limitations might affect the interpretation of the findings.
Chapter 3: Results/Findings
This chapter would be approximately 8 or 10 single-spaced pages, including data visualizations. In this section, you will tell your story about what you have discovered in the data, making sure that the findings are relevant to the project description/proposal. You will summarize the insights and patterns discovered through data visualization, emphasizing key visual representations that reveal significant information. Your goal is to communicate relevant patterns, trends, relationships or or anomalies within the data set.
This section should be organized logically for telling your story about data. Consider the parts of the story and how to organize those sections appropriately. The organizational structure can be based on importance, hierarchy, impact, etc. To address this organization issue, first, outline your subsections. Second, outline the written and visual content relevant for each subsection. If some findings depend on other findings, unpack the subsections in an order that facilitates understanding. Similarly, do so within sections. Construct the message and then discuss ways to organize more effectively to: (1) state the facts; (2) build up a hierarchy of elements (e.g., athletes to teams; meets in a season, quarters in a season, season within career); (3) break down elements (e.g., teams to athletes; years, quarters or seasons, meets in season); (4) address from least controllable aspects to most controllable aspects (e.g., admission, coaching strategies, etc.); (5) create excitement with rising tension or significance. In other words, this section’s organization should serve a purpose but the organization is up to you. You should not just present the data in an order based on order of authorship. Related to authorship, consider a way to organize authors. You may even wish to explain why the order what chosen as some people may infer something from the order.
Create subsections where relevant. For each section, introduce the topic or problem and then unpack it.
Example:
3.1 Topic/Question A
Introduce the topic or question of interest. Describe the data used to answer this question. Be clear about the metrics used and why those metrics were chosen to answer the question. For example, why the mean? Why the max? Why the median? Why a z-score? If the story involves a comparison of metrics, explain why they are needed and then determine the order to present them. Although the data visualizations should be able to stand along, provide some description of it and direct your audience to particular comparisons or areas of importance within that description. Make reference to the figure (the data visualization) in this text. Insert data visualizations at appropriate locations (trying to ensure that a visualization is referenced and appears on the same page).
When discussing the visualization, organize sentences such that the predictor precedes the outcome variable as the outcome is contingent on the other variables anyway; order in terms of cause and then effect. Interpret data in terms of the direction of comparisons. Do not state that there appears to be a “difference” when that difference is due to an ordering of data. Just describe the finding in terms of the direction of the data. Don’t make your reader work more than necessary. Write using the active voice. Avoid the passive voice and remove references to “it”, “this”, “these”, etc. that otherwise provide ambiguity to the reader trying to resolve the sentence. “It has been shown” lacks clarity. What does “it” refer to? Sure, the reader may be able to infer what you mean if there is only one noun in the previous sentence. Your writing, however, should not require them to work harder than you did when you wrote the sentence; just state the reference “the increase in variable x …”. Don’t make your audience guess what you mean and potentially infer erroneously. Such errors are a consequence of your lack of clarity, not the consequence of them interpreting incorrectly. To avoid these errors, read you work aloud after sufficient distraction or have peers who don’t understand the project read your writing aloud and see where they stumble or where they are unable to explain what you are talking about. Make sure that your figures have titles, captions, a reference to your social profile and the course PSYC167: Data Visualization. The data source may not necessary if you explain this in the report but if the image is placed elsewhere, like on the Internet, then having the source would also be relevant in the caption.
3.2 Topic/Question B
3.3 Topic/Question C
3.4 Topic/Question D
If you have exploratory findings that are worth sharing that are beyond the scope of the main project goal, you can add then in a sub-section.
Chapter 4: Discussion
This chapter would be approximately .5 or 1 single-spaced page. In this section, remind the audience of the problem that the project set out to address and the approach by which you addressed it. The previous section on research findings provided a description of the findings. In this section, you explain those findings more broadly and explain how they may relate to the entire problem.
Chapter 5: Conclusion
This chapter would be approximately .5 to 2 single-spaced pages.
Your project report should contain some messaging of a conclusion which summarizes the results you have discovered. Recap the main achievements and the value derived from the use of data visualization in understanding the data. Highlight the most significant visual findings or insights that hold relevance and importance for the client. Propose potential actions or strategies based on the insights obtained through data visualization for further exploration or implementation. The conclusion should provide some action for moving forward. What should your client, organization, or liaison consider moving forward? Should they focus on further development of their current trajectory or direction? Should they consider aspects that have been ignored or that that differ from current current trajectory or direction? Should the organization consider whether to triage or abort some program, direction, venture, or experiment? Consider discussing other methods, statistical perhaps, to further address the problem in the future. When doing so, however, make sure to explain why those statistical methods are relevant and what they might inform.
Chapter 5: Challenges and Limitations
Present any important challenges or limitation in this optional section. Discuss any limitations or challenges encountered during the project that may affect the interpretation or application of the findings. Propose potential ways to address or mitigate these limitations in the future.
Appendix
If necessary.
References
A list of references.