Adventures of a Techie Academic with Lightweight DITA (LwDITA): Conversation with Carlos Evia
Article for Discussion
“Structured Authoring without XML: Evaluating Lightweight DITA for Technical Documentation” in the Technical Communication Journal (63.1, Feb 2016). You can view the article on STC or Ingenta. (If you don’t have access, email Carlos for a copy.) This article received a Distinguished article of 2017 STC award.
TJ: Let’s start by getting to know you a bit more. I was reading another academic article the other day and came across this quote from a person in one of the article’s studies:
I pay a good bit of attention to Carlos Evia at Virginia Tech, who’s one of the few people that’s doing more techie stuff on the academic side…. (Toward Multidirectional Knowledge Flows)
Are you known as a techie academic? What do you do with tech that has led to this reputation?
CE: I think the label of “techie” has been with me mainly because I came to the field of technical communication via a master’s degree in computing systems. I have always seen technical communication products and projects as related to some form of technology (for a while it was computers, and then it moved to mobile devices and other wearables). That label may also reflect my affiliation with one of the “camps” in academia that sees tech comm as related to information technology and social sciences (other camps see it closer to the humanities, rhetoric, cultural studies, etc.). As part of a generation of tech comm scholars that had role models in people like Clay Spinuzzi, Mark Zachry, or Dave Clark, I can see myself in that camp of “techies.”
Another factor is the experimental nature of my courses, where over the years my students have tried different platforms, applications, and approaches for content development and delivery. At conferences like the Summit of the Society for Technical Communication, I would visit the vendors and ask for demos or trial licenses that I would bring back to the classroom. Thus, there is a “techie” component in my courses that can sound adventurous but has worked well for me for the past 15 years.
TJ: I know you’re heavily involved in the lightweight DITA (short name, LwDITA) effort, and you’re writing a book on this and have other more current efforts. But let’s ground this conversation in an article you co-authored with Michael Priestley about LwDITA in 2016 when you were still proposing and evaluating LwDITA and introducing it to the academic and practitioner community. The article I’m referring to is “Structured Authoring without XML: Evaluating Lightweight DITA for Technical Documentation” in the 2016 Technical Communication Journal.
Sam Dragga, the journal editor, summarized the article as follows:
“Carlos Evia and Michael Priestley propose a new direction for the field. Theirs is a feasibility study of using HDITA (i.e., a lightweight version of DITA using HTML5 tags and attributes) to simplify the authoring process for technical documentation. Evia and Priestley test their hypothesis on a class of student writers and report positive findings, both in attitudes about HDITA and in evaluation of the projects created using it” (Visions and Directions of Research).
Can you tell us why you’re interested in DITA in the first place? Isn’t it a bit odd for an academic to focus on DITA? I mean, are there any other academics with similar DITA interests? For an academic to focus on more techie things like DITA, do you lose credibility in academia?
CE: I can’t give you a list of academics who use DITA because I simply don’t have that information. Furthermore, are we talking exclusively about four-year colleges or can we also include community colleges and academic certificates? For example, the Technical Writing and Communication certificate at the University of California, Santa Cruz Extension has been teaching DITA for many years. As anecdotes, I can tell you that Bill Hart-Davidson, from Michigan State University, and Joyce Locke Carter, now at the University of Arkansas at Little Rock, were covering DITA, structured authoring, and content reuse way before me. Actually, it was in a graduate course taught by Joyce at Texas Tech University that I first heard about DITA.
Like I told you before, I came to technical communication from a computing systems background, so my main focus was to develop (I was a very bad programmer) a tool that would automate some of the human processes in a documentation project. At the time, I was trapping all my content in SQL tables and writing web scripts to filter and reuse specific sections. Thus, when I heard about DITA I knew that it was something I had to explore. When I joined Virginia Tech as faculty, DITA was one of many approaches I explored in my teaching, but it became central to my research through a project I was conducting with funding from the National Institute for Occupational Safety and Health. With a team of collaborators in health sciences and building construction, we were creating training materials on topics of workplace safety for Spanish-speaking construction workers. Originally, I was on the team to conduct participatory design sessions with workers and to lead a multimedia production effort. However, we realized that the content we were developing for the project could be reused in similar training events on new topics and for workers who spoke English or another language. Our content needed structure and methods for reuse and single-sourcing. Since I was already using DITA in the classroom, it made sense to implement it in the project.
Once we were done with that project, I still had more work to do with my DITA specializations and templates, so I decided to make that a core component of my research agenda. I focused on ways to simplify the DITA authoring experience and to make the standard more attractive for non-English speaking authors … and that takes us to Lightweight DITA (LwDITA), which we can talk about later.
Now, you are going to have to ask some of my colleagues if this focus has had any effect on my credibility as an academic ;-)
TJ: I don’t want to dive into too many details around LwDITA or DITA, as that’s not really the focus of these academic/practitioner conversations. Readers can check out some other posts for more information:
Where DITA is Now and Where It is Going: Lightweight DITA and DITA 2.0 – How to Create and Deliver (Intelligent Information blog)
Lightweight DITA: I’ve seen the light (Larry Kunz)
I also don’t want to be drawn into the structured versus non-structured authoring debate, as that’s also outside the purpose of this conversation.
DITA has been one of the most significant and shaping forces in tech comm in the past decade. Yet almost no academic articles have focused on DITA besides yours. For example, a search for DITA in the Technical Communication Journal archive yields very few results. Yet whether to implement DITA or not presents huge decisions in companies that require massive time commitments and costs. Another question is whether writing under the DITA model yields better or worse outcomes (better from consistent re-use, worse through information fragmentation). These are questions that thousands of practitioners have had. Why are academics (other than you) mute on this topic? Why hasn’t DITA been the focus of more academic research? Is this an example of why so many practitioners feel academics are out of touch with issues that practitioners face in the workplace?
CE: I don’t think academics have been necessarily mute about DITA. If you go back to the Technical Communication archives, you can see papers talking about single-sourcing and content reuse since the early 2000s. Many of those mentioned XML and some even talked about DITA. But here I just can’t argue: there has been little research about the DITA standard conducted by full-time tech comm academics. Some colleagues at conferences would even see me and say “according to so and so’s survey, no one cares about DITA,” and others would tell me that I was focusing on vocational training, and that companies were supposed to teach DITA to new hires if that was a job requirement. I am convinced that tech comm in academia is a very diverse arena and there are tons of legitimate research questions and interests. I am here doing my thing and I have a solid group of colleagues in industry and academia who also care about it. Tech comm would be sad if we were all focusing on the same thing. At least that’s my positive view of the field, although some days I am a little more pessimistic about the cacophony of interests and ideas that in academia can be grouped under the “tech comm” umbrella.
TJ: I’m trying to understand why there aren’t more journal articles on DITA. Do journals frown on more technical topics because they try to establish tech comm’s connection and roots in the humanities? Don’t academics search for more general principles and guidelines related to tech comm that stand the test of time, rather than a specific technology? I mean, your article on LwDITA appeared only in 2016 and is already somewhat dated (since the LwDITA proposal has now been accepted).
CE: That is true, and that’s why I focus on principles like structured authoring, content reuse, and making tools more accessible and inclusive. Those concepts apply to DITA but go beyond that one specific standard. If I were doing research solely on DITA (say, an analysis of the etymological origins of each tag available in DITA 1.3), my work would not stand that test of time. DITA might not be here in 2030, but I want to think that intelligent content and reuse will still be important for the new gadgets and devices that increasingly diverse and demanding audiences will use in the future.
TJ: You wrote,
“To evaluate the feasibility of authoring and publishing with HDITA, we modified the Instructions assignment of an introductory college course called Technical Writing. Students wrote blog posts during the authoring process and completed a survey on the perceived difficulty of HDITA. We evaluated the quality of HDITA web deliverables with college students from diverse technical and academic backgrounds.”
Isn’t there a danger in using students as test subjects and then making conclusions about the usefulness of a thing? For example, I can hardly imagine that these students tested the LwDITA format using real scenarios that might involve content re-use or integration into other systems. Did you reach out to practitioners for the initial study? Did you pilot LwDITA in any company with tech writers who have real needs?
CE: Let me focus on the first part of your question. In the specific case of the class that tested the HDITA experiment, the course syllabus already required a module on basic web writing and another module on authoring instructions and procedures. The students received instruction similar to what their peers in other sections not using HDITA were receiving. We satisfied those core requirements, and on top of that the students learned a little about structured authoring, content reuse, and multi-platform publishing. Michael Priestley and I wanted to evaluate if the HDITA tags would intimidate novice authors who were not experienced technical writers, and the students were the right audience for that project. We did not claim that, based on those results, HDITA would be useful in industry or with other groups. HDITA, the authoring format of LwDITA based on HTML5, is aimed at casual content creators who do not have experience in XML. The students in our study were casual content creators who did not have experience in XML. Therefore, our findings answered our research question for that specific case. We did not attempt to generalize.
Oh, and keep in mind that my co-author in that article, Michael Priestley, is a practitioner and not an academic. We did reach out to practitioners in IBM and other companies when we were planning and conducting that study. If you take a look at the email archives of the Lightweight DITA subcommittee at OASIS (by the way, all the OASIS communication is archived and open), you will see that we planned to work with a startup here at the Virginia Tech Corporate Research Center. We wanted to train their staff so they could use LwDITA in a project. However, their content needs had hard deadlines that we couldn’t satisfy. LwDITA has been under development since 2013 because we are creating it in an open format that depends on rounds of feedback and revision that involve experts and the general public. We could not have formal changes and revisions ready for a company that needed deliverables in two weeks. Of course, I still want to pilot LwDITA with a company. My colleagues Stan Doherty and Alan Houser (who are hybrid academics and practitioners … that’s an interesting strain of undecidability in the academic-practitioner binary!) are either doing some of this work now or plan to do it in the near future. Alan wants to collaborate on a project of that nature. I am definitely interested.
TJ: I know you co-chair the OASIS committee for LwDITA. Besides the classroom study, did the OASIS committee do other studies to evaluate whether LwDITA will take off? Is the committee hoping that companies who have a docs-as-code model with Markdown will make the switch? What will persuade them to abandon an open-source, flexible tool that’s already working and integrated in their continuous delivery workflows?
CE: Well … LwDITA is not a multi-level marketing scheme. We, members of the subcommittee at OASIS, won’t get points for every company that we convince or recruit. Now, the vendors have evangelists who probably have that type of recruiting as a standard job function.
In my opinion, anyone using the docs-as-code model is welcome to test and/or adopt LwDITA. If they are already using GitHub Flavored Markdown, they can easily move to MDITA, the LwDITA authoring format based on Markdown. You ask: “What will persuade them to abandon an open-source, flexible tool that’s already working and integrated in their continuous delivery workflows?” Are you describing DITA? ;-) DITA, as a standard, is open-source, flexible, and integrated into the workflows of many companies on a global scale. We hope that LwDITA is a more flexible approach to creating and delivering content in DITA-like environments. DITA and LwDITA are 100% open standards. If a company is using Markdown in a specific flavor that works for its docs-as-code approach, no one will actively persuade them to abandon that workflow. Now, if the company needs to collaborate with authors or groups that use XML or HTML5, they should take a look at LwDITA, which can be represented in those three authoring formats and take advantage of the reuse and publishing features of DITA XML.
TJ: I think it’s great that you’re on an OASIS committee. To be honest, I don’t know much about OASIS. I have in mind about 8-9 people sitting around a table giving a thumbs up or down, kind of like an aristocracy or some other elite ruling class. How does a committee like OASIS arrive at a standard? Are there “standards” that these standards committees follow in order to arrive at a “declaration” about the way something should be?
CE: Oh no … not at all. I promise there is nothing aristocratic or elitist about OASIS. Quite the contrary. Let’s see … OASIS is the Organization for the Advancement of Structured Information Standards, a non-profit consortium with more than 5,000 participants representing over 600 organizations and individual members in more than 65 countries. Instead of an elite ruling class, OASIS runs on principles of transparent governance and operating procedures. Membership in any OASIS technical committee is open to everyone; leadership in groups and committees is chosen by democratic elections and does not depend on financial contribution, corporate standing, or special appointment.
In the particular case of the DITA Technical Committee (TC), all the work created by the group is visible to everyone and not password protected on the OASIS website. You can go there and see email archives, drafts, meeting minutes, and reports that date back to the committee’s creation. Additionally, anyone can comment on a standard and the TC has the obligation to acknowledge and track all comments.
Let’s look at an example through my personal involvement with the DITA TC. Like I mentioned, I used DITA for a project involving the development of training and reference materials for construction workers. As I was working on that project, I identified a problem with the DITA authoring experience: Even in software tools that did not require full XML code and presented a WYSIWYG-like environment, non-English speaking authors had problems remembering XML tags, like <shortdesc>, that were written in English. In an autocratic system, I would have requested an audience with the philosopher king (or queen, in the DITA TC case, since the current chair is Kris Eberlein), presented my case with feedback, and been thrown to the lions for questioning the authority. In the actual OASIS workflow, I could have emailed the committee and they would have been required by their bylaws to acknowledge and discuss my feedback. However, I decided to take it to the next level and join the committee as a member. I did not need a special recommendation or endorsement: I just expressed my interest to Michael Priestley, followed the OASIS guidelines, and started calling in to the weekly DITA TC conference calls.
The guidelines for OASIS committees are quite transparent and specify that any decision that can affect a standard needs to be voted on and approved by committee members, the overall OASIS membership, and the general public.
TJ: The LwDITA spec is developed very differently from the grassroots model that other Markdown formats follow (e.g., the original Gruber Markdown, GitHub-flavored Markdown, kramdown Markdown, MultiMarkdown, CommonMark, and more). Do you think the top-down model (decision by committee) has as much momentum as the bottom-up model (grassroots promotion) has for adoption? Are there other standards that have originated from OASIS in a top-down way that have taken off?
CE: Ok, let’s analyze this one. You are mixing the lone-wolf approach to Markdown of John Gruber’s original and, say, Thomas Leitner’s kramdown with attempts to standardize Markdown like CommonMark and GitHub Flavored Markdown. Those are different approaches, and they can be used to show why standards like DITA are important. Gruber has said: “I created Markdown for my own use, and, well, I know the formatting rules pretty well.” That means that Gruber’s Markdown is not a standard (it’s just Gruber’s) and does not have a detailed spec. It has documentation, and those are not the same. John MacFarlane et al., in their preface to the CommonMark spec, point out that Gruber’s ambiguous language definition produced “implementations (that) have diverged considerably over the last 10 years.” They add: “As a result, users are often surprised to find that a document that renders one way on one system (say, a GitHub wiki) renders differently on another (say, converting to docbook (sic) using Pandoc).”
That is what an open standard, like DITA or the proposed LwDITA, wants to avoid. Let’s say that you decided to create your own Markdown or XML flavor: TomJML (I never said I was good at coming up with names for fake projects). Some readers of your blog love it and implement it into their work. A user wants to use Bear or Ulysses as a platform to write content in TomJML, but the unique syntax you created for, say, managing conrefs and filters is not supported in those applications. The user complains to the developers and to you. “It is all Markdown and XML, isn’t it? It should work,” she complains. To solve her problem (because you are a nice guy), you create the official TomJML app and a) sell it for $5 on the Mac App Store, or b) make it open source and set up a GitHub repo for it. The user is happy and more users adopt it. But one day you get busier and stop maintaining the app. Projects crash, people have to convert legacy files to a new version (or an old one) of Markdown. Tears.
With a standard like DITA, or what CommonMark aims to accomplish, you and other software developers can use the same rules and the users can go back and forth between apps. If you want other examples of open standards, take a look at the XML and HTML5 specs from the W3C: they both work with committees that receive input from the general public and experts, and they do not sell software. They do not sell anything.
CommonMark has a committee behind it. They describe themselves as “a group of Markdown fans who either work at companies with industrial scale deployments of Markdown, have written Markdown parsers, have extensive experience supporting Markdown with end users – or all of the above.” The DITA technical committee at OASIS is comprised of similar people (hint: they are not evil billionaires).
OASIS maintains many other standards, and you might be familiar with DocBook and XLIFF, which are also OASIS standards and have their own committees.
TJ: You wrote,
“Another focus of the subcommittee is to free the lightweight specification from a dependency on XML, which allows DITA as a standard to exist across formats, including HTML5 and Markdown.”
Even after LwDITA is formally adopted/embraced, don’t we have to wait until the tool vendors catch up? It seems like tools that are built on the XML stack might have a tough time incorporating this new format (unless the same XML stack can process it). Do you worry about the feasibility of the technology to create the needed solutions, or is the role of OASIS just to formulate the standard and let tool vendors figure it out?
CE: I can’t talk about all the vendors and their adoption of a new standard, but I am going to focus on the specific case of Syncro Soft, the maker of Oxygen XML Editor. The Oxygen developers are not members of the DITA technical committee, but they pay attention to our work, participate in public reviews, give us feedback through the traditional OASIS email channels, and submit pull requests and issues to our GitHub repositories. They have been early implementers of our LwDITA grammar files and have adopted the LwDITA support that Jarno Elovirta has built into the DITA Open Toolkit. As a result, their product has some solid LwDITA features.
There are other vendors who have been in touch with us and are working hard to implement the proposed LwDITA standard. The subcommittee maintains a list of LwDITA-aware tools in a wiki page hosted by OASIS.
TJ: A lot of people pin the blame on tool vendors for the stagnancy of DITA. I’ve heard people say that DITA didn’t live up to its hype in part because the tools around DITA didn’t deliver. Are you depending on tool vendors to deliver on integration and transformation of LwDITA? Will they let us down again?
CE: I don’t think I agree with the stagnancy of DITA or not living up to its hype. DITA is an open architecture that has been widely adopted. It is intended for a rather niche audience of developers of certain types of content, so the standard does not have the goal of taking over the world or being the only approach for creating technical content.
Talking again about vendors, I see different levels of interest in LwDITA, and I hope that translates into new, easy-to-use apps and tools.
TJ: I find that people are polarized around DITA, either drinking its Kool-Aid or spitting it out like poison. In your interactions with others around DITA, do you encounter a lot of similar polarized attitudes? I try not to present myself as anti-DITA, but it seems like you’re either for it or against it in the minds of many people. Why is DITA so polarizing? Can’t we refer to it with the same nonchalance as a tool like Flare?
CE: Here’s the thing: we cannot put Flare and DITA in the same basket (and some people do). Think of Flare as Dreamweaver and DITA as HTML5. Flare is authoring software that imports and exports content formatted according to the DITA standard, so they are not really in the same category.
I do see some of that polarization, and I wish someone would decide to work as a buffer agent between the DITA technical committee and the vendors to improve user experience and overall support. OASIS has a separate committee, the DITA Adoption Committee, that works on that kind of effort.
I think that the strong anti-DITA reactions we see come from the Yelp-ization of online feedback. In the case of a tool like Flare: if you don’t like it, you don’t buy it or you ask MadCap for a refund. Many anti-DITA posts are actually about a specific software tool, and people can write a gut reaction and publish it online without considering that the DITA Technical Committee is a group of volunteers who do not (at least not directly) make money out of this kind of work.
I respect and support people’s right to complain about an open standard. I complained when the HTML committee moved from XHTML to HTML5. However, plain mean online comments like those that you save for a restaurant that gave you rotten chicken are not helpful.
I also think we need better explanations of the DITA standard versus DITA-aware tools, and that’s what I want to address in some of my research and publication work. I think every DITA user or potential user needs to read the “DITA versus DITA-OT!” blog post by Robert Anderson, who is a very important member of the DITA TC and co-editor of the DITA 1.3 spec.
TJ: You wrote,
“There are no official figures about the usage and adoption of DITA as a platform for authoring technical documentation.”
Why is it so difficult to get a realistic count of the number of people using DITA? Aren’t academics supposed to know how to figure this info out through sampling and statistical analysis? I know Keith Schengili-Roberts did an analysis of LinkedIn skills to make an inference that DITA was being used by these authors, but that seems a bit suspect to me as a way of establishing DITA’s usage and adoption.
CE: Let me ask: do we have similar numbers for HTML5? Do we know how many companies are using HTML5? Also, are we talking about companies on a global scale or just in the United States? Collecting that kind of data in a mix of corporate and non-profit organizations using DITA would really be difficult … and … what’s the gain? What are the big societal or even disciplinary problems that such a census would solve?
TJ: Someone recently approached me to ask if there has been any research done around the viability of Markdown as a sustainable and robust enough format for a tech pubs team. The person wanted to leverage any research on the topic (to inform their company’s decision) but could not locate any. Markdown is emerging as another huge format for tech docs, and yet again, just like with DITA, the academic community is silent on this. Can we expect any academic research focusing on Markdown (besides your LwDITA research)?
CE: I have been using Markdown in my courses probably since 2007, and I have done informal research with students. I do not have a big research question about it, so that has not been a focus in my research work. But I do use it a lot and even wrote most of my book in Markdown (or MDITA, to be more precise).
I hope your conversation series helps in the formulation of research questions worth exploring in this area.
TJ: I’m curious whether you’ve ever used DITA on an actual project of a substantial length and size (outside the classroom or a test project)? If so, what was your experience? A larger question is how academics can speak credibly on more technical topics while being removed from the practitioner workspace where this kind of exposure and usage of technology is more common. Maybe this is why academics rarely focus on the technology side of “tech comm”?
CE: Yes, I have. Virginia Tech has an office of Corporate and Foundations Relations that connects companies and nonprofits with faculty, staff, and students to provide training or solve problems. I have worked with a few companies through that office and in some cases the solution to a problem was a Microsoft Word document, and in others it was a detailed DITA troubleshooting system. I blogged about one of those projects for Scriptorium a few years ago, but the companies had non-disclosure agreements that kept us from publishing full studies based on that work. I gained a lot of experience and even involved some of my graduate and undergraduate students, who really liked the opportunity for experiential learning.
I have been talking about this (academics’ credibility with practitioners) for a long time with some colleagues and we see it as a priority in our work. That’s why I am so involved in the DITA TC (where currently I am the only full-time academic).
TJ: Finally, you mentioned that you recently transitioned from the English department to the Communication department. Is that where “technical communication” rightly belongs, as a subgenre of “Communication”? I guess this is a bit off-topic, but I’m curious whether the context of this larger academic grouping (at Virginia Tech at least) indicates some uncertainty about just where tech comm fits into academia as a discipline. I mean, it wouldn’t be common for another discipline to be transferred from one division to another (e.g., moving Philosophy into the Engineering department). In companies, it’s similarly common for practitioners to be bounced around in various divisions, from Engineering to Product Management to Marketing or Support. It seems no one knows where documentation really belongs. Is it the same in academia?
CE: The monster that we call “technical communication” in this country exists in many academic units and departments all over the world. When I lived in Mexico, for example, we did not have English departments. Well, we did, but they were teaching English as a second language. And even in this country, technical communication does not live exclusively in English departments (I bet you Lisa Meloncon has some data about this). Here at Virginia Tech, we are creating a major in Communication Science and Social Inquiry that will have a concentration in Professional Communication. That’s where I see my work, and not necessarily restricted to written documentation. Again, this is my particular experience and I know many colleagues are happy and successful in English departments. I just don’t feel like an English person (indeed, some of my work is in Spanish … LOL), but I had a good run of almost two decades in English departments and I respect and appreciate the work of my peers.
TJ: Can you tell us a little bit about the program where you teach?
CE: I am currently in the Department of Communication at Virginia Tech. The Department has three undergraduate academic majors: Public Relations, Multimedia Journalism, and Communication Science and Social Inquiry. I teach some courses that are for all three majors, but I am primarily affiliated with the Communication Science and Social Inquiry program, which has a concentration in Professional Communication. That’s where all the good stuff about content strategy and intelligent content will take place.
About Carlos Evia
Carlos Evia is an Associate Professor in the Department of Communication at Virginia Tech, where he also conducts research for the Centers for Human-Computer Interaction and Innovation in Construction Safety, Health, and Wellbeing. He is also a voting member of the DITA Technical Committee and co-chair (with Michael Priestley) of the Lightweight DITA subcommittee. Carlos is a former director of the Professional and Technical Writing undergraduate program at Virginia Tech.
About Tom Johnson
I'm a technical writer based in the San Francisco Bay area. In this blog, I write about topics related to technical writing and communication, such as software documentation, API documentation, visual communication, information architecture, writing techniques, plain language, tech comm careers, and more.